SQL & MySQL Interview Questions 11-15 (Advanced Indexing & Query Optimization)
Welcome back! In our last lesson, we introduced the concept of an index—the "textbook index" that makes your database reads fast. Now, we're going to take that concept to the next level.
This lesson is all about performance. We'll cover advanced index types, like composite and covering indexes. Then, we'll learn how to ask the database how it's running your query by reading an execution plan. Finally, we'll tie it all together with a practical checklist for optimizing a slow query, which is one of the most common tasks you'll face as a developer. This is a really valuable set of skills, so let's take our time.
11. What is a composite index? When should you use one?
A composite index(or multi-column index) is a single index that includes two or more columns.
Analogy: Let's use our phone book analogy again.
- A simple index on `last_name` is like the phone book itself. It's sorted by `last_name`.
- A composite index on (`last_name`, `first_name`) is a phone book that is sorted first by last name, and then, within each last name, it's sorted by first name.
When to use it:
You should use a composite index when you frequently search for data using more than one column in your `WHERE` clause.
The order of the columns in the index is critically important. If you have an index on `(`last_name`, `first_name`)`:
- This query is FAST:
`WHERE last_name = 'Smith' AND first_name = 'John'` - This query is FAST:
`WHERE last_name = 'Smith'` - This query is SLOW:
`WHERE first_name = 'John'`
Why is the last one slow? Because the phone book is not sorted by `first_name`. The index is only useful if you use its columns from left to right.
-- We have a table...
CREATE TABLE Users (
user_id INT PRIMARY KEY,
last_name VARCHAR(100),
first_name VARCHAR(100),
zip_code VARCHAR(10)
);
-- We often run this query:
-- SELECT * FROM Users
-- WHERE last_name = 'Jones' AND first_name = 'David';
-- So, we create a COMPOSITE index to make it fast:
CREATE INDEX idx_name
ON Users(last_name, first_name);
Sample Output:
Query OK, 0 rows affected
12. What is a covering index?
A covering indexis a special case of an index that contains all the columns needed to satisfy a specific query.
Analogy: Let's say you need a student's phone number.
- Normal Index: You go to the student directory (the index), find "Alice Smith," and it tells you, "Go to student file #123" (the main table). You then walk to the filing cabinet, pull file #123, and get the phone number. This is a two-step lookup.
- Covering Index: You go to a special "Quick Contact" sheet (the covering index). This sheet is sorted by name and also includes the phone number right on it. You find "Alice Smith" and get her number immediately. You never had to go to the main filing cabinet.
When a query uses a covering index, the database reads only the index and never has to touch the main table data. This is one of the most powerful performance boosts you can get.
-- This is the query we want to optimize:
-- SELECT email, last_login FROM Users WHERE username = 'alice';
-- The query NEEDS three columns: email, last_login, and username.
-- We create a composite index that "covers" all three:
CREATE INDEX idx_user_cover
ON Users(username, email, last_login);
Now, when the database looks up `username` in the index, it finds that the index also contains the `email` and `last_login` it needs. It can return the result directly from the index.
13. What is a query execution plan, and how do you read it?
A query execution plan(or "query plan") is the database's battle plan. When you send a query (e.g., `SELECT * ...`), the database's optimizer thinks about several ways to get your data and chooses the one it thinks is the cheapest (fastest).
Analogy: You ask your phone for directions to the airport. The "optimizer" checks traffic and computes multiple routes. The "execution plan" is the final, step-by-step list of instructions it gives you (e.g., "1. Turn left. 2. Merge onto highway...").
In MySQL, you read this plan by adding the `EXPLAIN` keyword in front of your query.
EXPLAIN SELECT * FROM Users WHERE username = 'alice';
Sample Output (Simplified):
+----+-------------+-------+------+---------------+------+---------+...
| id | select_type | table | type | possible_keys | key | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+...
| 1 | SIMPLE | Users | ALL | NULL | NULL | 1000 | ... |
+----+-------------+-------+------+---------------+------+---------+...
How to Read It (Key Columns):
- `type`: This is the most important column.
- `ALL`: This is a Full Table Scan. The database is reading every single row. This is BAD on large tables.
- `index`: This is a full scan of just the index. Better, but not great.
- `range`: This is good. It means it's scanning a "range" of an index (e.g., `WHERE age > 18`).
- `ref`: This is great. It means it's joining tables using a good index.
- `const` / `system`: This is the best. The database read 1 row (or 0) using a Primary or Unique key. - `possible_keys`: Indexes the database thinks it could use.
- `key`: The index the database actually chose. If this is `NULL` (and `type` is `ALL`), you need an index!
- `rows`: The database's estimate of how many rows it will have to scan. Lower is better.
- `Extra`: Contains vital clues.
- `Using index`: This is GOOD. It means a covering index was used!
- `Using filesort`: This is BAD. The database had to do a slow sort on disk or in memory. You need an index on your `ORDER BY` column.
14. How do you optimize a slow SELECT query?
This is a practical checklist question. Here is the step-by-step process you should follow.
- Benchmark First: First, measure the query speed so you have a baseline. Don't optimize blindly.
- Run `EXPLAIN`: Put `EXPLAIN` in front of the query. This is your map.
- Look for the Bottleneck: Find the line in the plan with a `type: ALL` or a very high `rows` count. Look for `Using filesort` or `Using temporary` in the `Extra` column.
- Add Indexes (The 90% Solution):
- Add indexes to columns in your `WHERE` clause.
- Add indexes to columns in your `JOIN ... ON` clauses.
- Add indexes to columns in your `ORDER BY` or `GROUP BY` clauses.
- Consider composite or covering indexes. - Rewrite the Query:
- Avoid `SELECT *`. Only select the columns you actually need. This makes covering indexes possible.
- Avoid `LIKE '%value%'` (leading wildcard) as it cannot use an index.
- Break complex subqueries into simpler `JOIN`s or temporary tables. - Introduce Caching: If the query is still slow and the data doesn't change often, cache the result in your application (e.g., in Redis or memory) for a few minutes.
15. Scenario: A table has millions of rows and SELECT queries are slow. How would you improve performance?
This is a scenario-based question that is just a re-phrasing of the previous question. It's designed to see if you can apply the checklist in a real-world context.
Here's how you should answer:
"My first step is to get the exact query that is slow. I would not start by guessing or adding random indexes.
1. Analyze: I will run `EXPLAIN` on that exact query in a production-like environment.
2. Identify: With millions of rows, the problem is almost certainly a Full Table Scan(which I'd see as `type: ALL` in the plan) or a very slow `filesort`.
3. Hypothesize: I will look at the `WHERE` clause and `ORDER BY` clause of the slow query. For example, if the query is `SELECT ... FROM large_table WHERE last_name = 'Smith' ORDER BY created_at`, the bottleneck is that the database is scanning all million rows for 'Smith' and then sorting them.
4. Implement (Index):The solution is likely to add a composite index that matches the query. In this case, I would add:
CREATE INDEX idx_name_created
ON large_table(last_name, created_at);
This index is sorted by `last_name` (for the `WHERE`) and then by `created_at` (for the `ORDER BY`), which allows the database to both find the rows and return them in the correct order, all from the index. This should eliminate the full scan *and* the filesort.
5. Verify: I would run `EXPLAIN` again. I would expect to see the `type` change from `ALL` to `ref` or `range`, the `key` to be my new `idx_name_created` index, and the `rows` to be a very small number."
Tip: The answer to almost every "slow query" question is `EXPLAIN` followed by `CREATE INDEX`. Showcasing this methodical, non-guessing approach is exactly what interviewers want to see.