Navigate by Range

SQL & MySQL Interview Questions 21-25 (Practical Queries & Normalization Theory)

Hello! In our previous lessons, we've covered the fundamentals of database structure, indexes, and complex queries. We've learned how to find and delete data.

In this lesson, we'll round out our practical SQL skills by learning how to handle missing data (`NULL`) and modify existing data (`UPDATE`). We'll also look at a crucial set operator, `UNION`. Then, we will pivot to the most important theory in database design: Normalizationand Denormalization. Understanding this theory is what separates a good developer from a great one.

21. Write SQL to retrieve employees who do not have managers.

This is a classic question that tests your understanding of `NULL` values. In a typical `Employees` table, a `manager_id` column links to the `employee_id` of that person's manager. The top-level person (like the CEO) won't have a manager, so their `manager_id` will be `NULL`.

The key is that you cannot write `WHERE manager_id = NULL`. In SQL, `NULL` is not equal to anything, not even itself. You must use the`IS NULL` operator.

CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(50),
    manager_id INT
);

INSERT INTO Employees VALUES
(1, 'Alice Smith', 3),  -- Manager is Carol
(2, 'Bob Johnson', 3),  -- Manager is Carol
(3, 'Carol White', 4),  -- Manager is David
(4, 'David Brown', NULL); -- David is the CEO

-- Select the employee(s) with no manager
SELECT name
FROM Employees
WHERE manager_id IS NULL;

Expected Output:

+-------------+
| name        |
+-------------+
| David Brown |
+-------------+
1 row in set

22. Write SQL to increase the salary of all employees by 10%.

This is a straightforward question to test your knowledge of the `UPDATE` statement. This command modifies existing rows in a table.

You use `UPDATE TableName` to specify the table, and `SET column = new_value` to perform the calculation. Because the question says "all employees," we do not use a `WHERE` clause.

-- Let's use a simple salary table
CREATE TABLE Salaries (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO Salaries VALUES
(1, 'Alice', 80000.00),
(2, 'Bob', 100000.00);

-- Show data *before* the update
SELECT * FROM Salaries;

-- Give everyone a 10% raise
UPDATE Salaries
SET salary = salary * 1.10;

-- Show data *after* the update
SELECT * FROM Salaries;

Output (Before):

+----+-------+-----------+
| id | name  | salary    |
+----+-------+-----------+
|  1 | Alice |  80000.00 |
|  2 | Bob   | 100000.00 |
+----+-------+-----------+
2 rows in set

Output (After):

+----+-------+-----------+
| id | name  | salary    |
+----+-------+-----------+
|  1 | Alice |  88000.00 |
|  2 | Bob   | 110000.00 |
+----+-------+-----------+
2 rows in set

23. What is the difference between UNION and UNION ALL?

Both `UNION` and `UNION ALL` are set operators used to combine the result sets of two or more `SELECT` statements. The columns in the `SELECT` statements must be the same number and have similar data types.

Analogy: You have two guest lists for a party.

UNION: This is the final guest list. You combine both lists and then cross off any duplicate names. This is a slower operation because it has to sort the results to find duplicates.
UNION ALL: This is just taping the two lists together. If "John Smith" is on both lists, his name will appear twice. This is much faster because no check for duplicates is performed.

CREATE TABLE Table1 (name VARCHAR(50));
CREATE TABLE Table2 (name VARCHAR(50));

INSERT INTO Table1 VALUES ('Alice'), ('Bob'), ('Carol');
INSERT INTO Table2 VALUES ('Bob'), ('David'), ('Eve');

-- 1. Using UNION (removes the duplicate 'Bob')
SELECT name FROM Table1
UNION
SELECT name FROM Table2;

-- 2. Using UNION ALL (keeps the duplicate 'Bob')
SELECT name FROM Table1
UNION ALL
SELECT name FROM Table2;

Output (UNION):

+-------+
| name  |
+-------+
| Alice |
| Bob   |
| Carol |
| David |
| Eve   |
+-------+
5 rows in set

Output (UNION ALL):

+-------+
| name  |
+-------+
| Alice |
| Bob   |
| Carol |
| Bob   |
| David |
| Eve   |
+-------+
6 rows in set

Tip: Always use `UNION ALL` unless you specifically need to remove duplicates. It is significantly more performant.

24. What is normalization? Explain 1NF, 2NF, 3NF, and BCNF.

Normalization is the formal process of organizing the columns and tables of a relational database to reduce data redundancy and improve data integrity.

Analogy: It's like cleaning a very messy closet. Instead of one giant pile of "clothes" (a single table), normalization is the process of creating a "shirt drawer" (a `Shirts` table), a "sock drawer" (a `Socks` table), and a "pants rack" (a `Pants` table). This makes it easier to add new clothes (insert), change an outfit (update), or get rid of an item (delete) without messing up the whole closet.

The Normal Forms:

1NF (First Normal Form):
- Rule: Each cell must be "atomic" (hold only a single value).
- Bad (Unnormalized):A `User` table with a `phone_numbers` column containing `"555-1234, 555-5678"`.
- Good (1NF):Create a separate `PhoneNumbers` table with a `user_id` to link back to the user.
2NF (Second Normal Form):
- Rule: Must be in 1NF + All non-key columns must depend on the entire primary key. (This rule only applies to tables with composite keys).
- Bad (1NF, not 2NF):A table `(order_id, product_id, order_date)`. Here, `order_date` depends only on `order_id`, not the full `(order_id, product_id)` key.
- Good (2NF):Split into two tables: `Orders(order_id, order_date)` and `Order_Items(order_id, product_id)`.
3NF (Third Normal Form):
- Rule: Must be in 2NF + No "transitive dependencies." This means a non-key column cannot depend on another non-key column.
- Bad (2NF, not 3NF):A `Users` table with `(user_id, zip_code, city)`. Here, `city` depends on `zip_code`, which depends on `user_id`. This is a transitive dependency. If the city for a zip code changes, you have to update thousands of user rows.
- Good (3NF):Split into two tables: `Users(user_id, zip_code)` and `ZipCodes(zip_code, city)`.
BCNF (Boyce-Codd Normal Form):
- Rule: A stricter version of 3NF. For any dependency `A -> B`, `A` must be a superkey.
- In simple terms:It handles rare anomalies that 3NF doesn't, usually involving multiple overlapping composite keys. For most industry applications, 3NF is considered sufficiently normalized.

25. What is denormalization, and when is it used?

Denormalization is the intentional process of adding redundant (duplicate) data back into a database.

We just spent all that time cleaning our closet (normalization), so why would we undo it?

The Reason: Performance.

A highly normalized database (like in 3NF) is fantastic for `INSERT`, `UPDATE`, and `DELETE` operations because data is in one place. However, it can be very slow for `SELECT` queries, as you have to `JOIN` many small tables back together (e.g., `Users`, `Posts`, `Comments`, `ZipCodes`).

Analogy: Your closet is perfectly normalized (shirts, pants, socks all separate). But it takes you 5 minutes to assemble an outfit (`JOIN`s are slow). You decide to create a "work uniform" on a single hanger with the shirt, pants, and tie all bundled together. This is redundant(you now have a shirt on a hanger *and* one in the drawer), but your "get ready for work" query is now blazing fast.

When is it used?

Read-Heavy Applications: If your app reads data 10,000 times for every 1 time it writes, you optimize for the read.
Reporting & Analytics: In a data warehouse, you intentionally denormalize data so that analysts can run huge queries without complex `JOIN`s.
Real-World Example: On a social media site, when you `SELECT` a post, you also want the author's username. Instead of `JOIN`ing the `Users` table every time, you might denormalize by storing both `user_id` and `username` directly on the `Posts` table. When a user changes their username, you have a more complex `UPDATE`, but your most common query (reading posts) is much faster.

Previous Next

Chat about this topic?