SQL & MySQL Interview Questions 36-40 (JOINs & Subqueries)
Welcome! In our previous lessons, we've covered database structure, reliability (ACID), and performance (indexes). Now, we get to the most powerful part of SQL: combining data from multiple tables.
This lesson is all about JOINs. We'll cover the practical differences between the major join types, look at the special-case `SELF JOIN` and `CROSS JOIN`, and then finish with an essential concept: subqueries. Understanding how to link tables together correctly is perhaps the most-used skill in day-to-day SQL development.
36. Explain different types of joins (INNER, LEFT, RIGHT, FULL).
A JOIN is an SQL clause used to combine rows from two or more tables based on a related column between them (usually a `FOREIGN KEY` and `PRIMARY KEY`).
Analogy: Imagine you have two lists.
- Table A (`Customers`): A list of all your customers.
- Table B (`Orders`): A list of all orders that have been placed.
A `JOIN` helps you answer questions by linking these two lists. Let's use this sample data:
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
item VARCHAR(50),
customer_id INT
);
INSERT INTO Customers VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Carol'); -- Carol has placed no orders
INSERT INTO Orders VALUES
(101, 'Book', 1), -- Belongs to Alice
(102, 'Pen', 1), -- Belongs to Alice
(103, 'Paper', 2), -- Belongs to Bob
(104, 'Stapler', 99); -- Orphan order, no customer
INNER JOIN
Analogy: "The Intersection"
Shows only the rows that have a match in both tables. It ignores customers with no orders (Carol) and orders with no customer (Stapler).
SELECT C.name, O.item
FROM Customers C
INNER JOIN Orders O ON C.customer_id = O.customer_id;
Expected Output:
+-------+-------+
| name | item |
+-------+-------+
| Alice | Book |
| Alice | Pen |
| Bob | Paper |
+-------+-------+
3 rows in set
LEFT JOIN (or LEFT OUTER JOIN)
Analogy: "All from the Left"
Shows all rows from the left table (`Customers`), and the matched rows from the right table (`Orders`). If there is no match, the right side columns are filled with `NULL`.
SELECT C.name, O.item
FROM Customers C
LEFT JOIN Orders O ON C.customer_id = O.customer_id;
Expected Output:
+-------+-------+
| name | item |
+-------+-------+
| Alice | Book |
| Alice | Pen |
| Bob | Paper |
| Carol | NULL |
+-------+-------+
4 rows in set
RIGHT JOIN (or RIGHT OUTER JOIN)
Analogy: "All from the Right"
Shows all rows from the right table (`Orders`), and the matched rows from the left table (`Customers`). If there is no match, the left side columns are `NULL`.
SELECT C.name, O.item
FROM Customers C
RIGHT JOIN Orders O ON C.customer_id = O.customer_id;
Expected Output:
+-------+---------+
| name | item |
+-------+---------+
| Alice | Book |
| Alice | Pen |
| Bob | Paper |
| NULL | Stapler |
+-------+---------+
4 rows in set
FULL OUTER JOIN
Analogy: "All from Both"
Shows all rows from both tables. It includes matches, left-only rows (Carol), and right-only rows (Stapler).
Important: MySQL does not have a `FULL OUTER JOIN` command. You must simulate it by combining a `LEFT JOIN` and a `RIGHT JOIN` with `UNION`.
(SELECT C.name, O.item
FROM Customers C
LEFT JOIN Orders O ON C.customer_id = O.customer_id)
UNION
(SELECT C.name, O.item
FROM Customers C
RIGHT JOIN Orders O ON C.customer_id = O.customer_id);
Expected Output:
+-------+---------+
| name | item |
+-------+---------+
| Alice | Book |
| Alice | Pen |
| Bob | Paper |
| Carol | NULL |
| NULL | Stapler |
+-------+---------+
5 rows in set
37. What is a self-join? Provide an example.
A self-join is a regular `JOIN`, but instead of joining two different tables, you join a table to itself.
This is used when a table contains a reference to itself. The classic example is an `Employees` table where a `manager_id` column points to the `employee_id` of another employee in the same table.
Analogy: You have a list of people at a party, and one column is "who introduced you." To find out the name of the introducer, you have to look that person up in the same party list.
To do this, you must use aliases (like `e` and `m`) to treat the table as two separate copies.
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
manager_id INT
);
INSERT INTO Employees VALUES
(1, 'Alice', 3),
(2, 'Bob', 3),
(3, 'Carol', 4),
(4, 'David', NULL); -- David is the CEO
-- Find each employee and their manager's name
SELECT
e.name AS EmployeeName,
m.name AS ManagerName
FROM
Employees e
-- We use LEFT JOIN to include the CEO (David)
LEFT JOIN
Employees m ON e.manager_id = m.employee_id;
Expected Output:
+--------------+-------------+
| EmployeeName | ManagerName |
+--------------+-------------+
| Alice | Carol |
| Bob | Carol |
| Carol | David |
| David | NULL |
+--------------+-------------+
4 rows in set
38. What is a cross join?
A CROSS JOIN is a type of join that returns the Cartesian product of two tables. This means it matches every single row from the first table with every single row from the second table.
Analogy: You have a `Shirts` table with 3 colors (Red, Green, Blue) and a `Sizes` table with 3 sizes (S, M, L). A `CROSS JOIN` would produce every possible combination: (Red, S), (Red, M), (Red, L), (Green, S), (Green, M)... for a total of 3 * 3 = 9 rows.
It's rarely used in production, as it can create massive result sets. Its main use is for generating all possible combinations for test data or specific statistical analysis.
CREATE TABLE Colors (color VARCHAR(10));
CREATE TABLE Sizes (size VARCHAR(5));
INSERT INTO Colors VALUES ('Red'), ('Blue');
INSERT INTO Sizes VALUES ('S'), ('M');
-- Explicit CROSS JOIN syntax
SELECT * FROM Colors CROSS JOIN Sizes;
-- You can also get an implicit cross join like this:
-- SELECT * FROM Colors, Sizes;
Expected Output:
+-------+------+
| color | size |
+-------+------+
| Red | S |
| Blue | S |
| Red | M |
| Blue | M |
+-------+------+
4 rows in set
39. What are subqueries in MySQL?
A subquery (or nested query) is a complete `SELECT` query that is written inside another main query.
Analogy: It's a multi-step question. To find the answer to your main question, you first have to find the answer to a smaller, secondary question.
Main Question: "Which employees work in the 'Sales' department?"
Subquery: "First, what is the `department_id` for 'Sales'?" (Answer: 3)
Subqueries are most often used in the `WHERE`, `FROM`, or `SELECT` clause.
-- Example: Find all employees who work
-- in the 'Sales' department.
-- The (inner) subquery runs first:
-- (SELECT department_id FROM Departments WHERE name = 'Sales')
-- This query returns the ID '102'.
-- Then the outer query runs:
-- SELECT * FROM Employees WHERE department_id = 102;
SELECT * FROM Employees
WHERE department_id IN (
SELECT department_id
FROM Departments
WHERE name = 'Sales'
);
40. What is the difference between correlated and non-correlated subqueries?
This is a critical follow-up to the previous question. It describes how the inner query and outer query interact.
Non-Correlated Subquery (Simple/Independent)
This is the "In-N-Out" query. The inner query is independent of the outer query.
- 1. The inner query runs one time, by itself.
- 2. It builds a result set (e.g., `(102)`).
- 3. It passes this result to the outer query, which then uses it to run.
The query from the previous question (`...WHERE department_id IN (SELECT ... 'Sales')`) is a perfect example of a non-correlated subquery.
Correlated Subquery (Dependent)
This is the "line-by-line" query. The inner query depends on the outer query and runs repeatedly—once for every row being processed by the outer query.
Question: "Find all employees who earn more than the average salary in their own department."
-- 'e1' is the outer query's employee
SELECT
e1.name, e1.salary, e1.department_id
FROM
Employees e1
WHERE
e1.salary > (
-- This inner query is "correlated"
-- It must re-run for every single row
-- to calculate the average for *that row's* department.
SELECT AVG(e2.salary)
FROM Employees e2
WHERE e2.department_id = e1.department_id
);
Key Takeaway:
- Non-Correlated: Inner query runs 1 time. Usually fast.
- Correlated: Inner query runs N times (where N is the number of rows in the outer table). Can be very slow and is often a target for optimization.