Python Full-Stack Interview Questions 66–70 (Testing, Context Managers, Large Files, Multiprocessing, C Extensions)

Hello! This lesson covers some powerful, advanced concepts in Python. We'll start with the essential skill of testing, then move into elegant resource management with context managers, how to handle massive files without running out of memory, the complexities of parallel processing, and finally, how to break the speed barrier with C extensions. These topics show a deep level of mastery, so let's take our time and explore each one calmly.

66. Explain unit testing and test frameworks in Python (unittest, pytest) and mocking (unittest.mock).

Unit Testing is the practice of testing the smallest possible "units" of your code—typically individual functions or methods—in isolation. The goal is to verify that each unit does exactly what it's supposed to do, without worrying about the rest of the application.

Analogy: Think of building a car. A unit test is like testing just the headlight to make sure it turns on. It's not testing the headlight, the battery, and the dashboard switch all together (that would be an integration test).

Test Frameworks: unittest vs. pytest

  • unittest: This is the "batteries-included" framework built into Python's standard library. It's class-based (you must subclass unittest.TestCase) and uses specific assertion methods (like self.assertEqual()). It's solid but can be verbose.
  • pytest: This is the community standard and is much simpler to use. It's function-based (you just write a function named test_something) and uses plain assert statements. It has a powerful plugin ecosystem and provides better failure messages.

Mocking (unittest.mock): This is a technique for replacing parts of your system (like an external API, a database, or a slow function) with a "fake" object during a test.

Analogy: You're testing a function that calls a payment API. You don't want to charge a real credit card every time you run your tests! So, you "mock" the API. You replace it with a stunt double that instantly returns a fake "Success" message, allowing your test to continue without any real-world side effects.

Example: Let's test this simple function.

# File: calculator.py
def add(a, b):
    return a + b

Example 1: `unittest` style (Verbose)

# File: test_calculator_unit.py
import unittest
from calculator import add

class TestCalculator(unittest.TestCase):
    def test_add(self):
        # Specific assertion method
        self.assertEqual(add(2, 3), 5)
        self.assertEqual(add(-1, 1), 0)

# To run: python -m unittest test_calculator_unit.py

Example 2: `pytest` style (Preferred)

# File: test_calculator_pytest.py
from calculator import add

def test_add():
    # Just a plain assert!
    assert add(2, 3) == 5
    assert add(-1, 1) == 0

# To run: pytest

Example 3: Mocking with `unittest.mock.patch`

# File: data_fetcher.py
import requests

def get_user_name(user_id):
    # This makes a real network request! We must mock it.
    response = requests.get(f"https://api.example.com/users/{user_id}")
    return response.json()["name"]

# File: test_data_fetcher.py
import unittest.mock
from data_fetcher import get_user_name

def test_get_user_name():
    # Create a fake response object
    mock_response = unittest.mock.Mock()
    mock_response.json.return_value = {"name": "Alice"}

    # Use 'patch' to replace 'requests.get' with our mock
    # The 'with' block ensures the patch is only active here
    with unittest.mock.patch("data_fetcher.requests.get", return_value=mock_response) as mock_get:
        
        name = get_user_name(1)
        
        # Check that our function returned the mock data
        assert name == "Alice"
        
        # Check that 'requests.get' was called correctly
        mock_get.assert_called_with("https://api.example.com/users/1")

Tip: In an interview, always express a preference for pytest for its simplicity and power. Explain mocking as a way to ensure isolation and avoid external dependencies like networks or databases.
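
As one example of that power, pytest's built-in parametrization runs the same test body against a whole table of inputs. A minimal sketch, reusing the add function from above:

# File: test_calculator_param.py
import pytest
from calculator import add

# Each tuple becomes its own independent test case
@pytest.mark.parametrize("a, b, expected", [
    (2, 3, 5),
    (-1, 1, 0),
    (100, -100, 0),
])
def test_add(a, b, expected):
    assert add(a, b) == expected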

67. What are contextlib.contextmanager and ExitStack — when to use them?

These tools are all about resource management, specifically for use with the with statement.

The with statement (the "Context Manager Protocol") is for code that needs a setup and a teardown step. The most common example is `with open(...) as f:`.

  • Setup: The file is opened.
  • Work: You read or write to f .
  • Teardown: The file is always closed, even if an error happens.

@contextlib.contextmanager

This is a decorator that lets you create a context manager using a simple generator function, instead of writing a full class with __enter__ and __exit__ methods.
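
For contrast, here is a minimal class-based sketch of the same timer we'll build with the decorator below; the decorator collapses all of this boilerplate into one generator function:

import time

class Timer:
    """Class-based equivalent of the @contextmanager timer below."""
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        # SETUP: runs when the 'with' block is entered
        print(f"Timer '{self.name}' started...")
        self.start = time.time()
        return self  # bound to the 'as' target, if one is used

    def __exit__(self, exc_type, exc_value, traceback):
        # TEARDOWN: always runs, even if the block raised
        print(f"Timer '{self.name}' finished in {time.time() - self.start:.2f}s")
        return False  # don't suppress exceptions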

Analogy: Think of it as a "pop-up" restaurant kit. Writing a full class is like building a restaurant from scratch. Using @contextmanager is like a kit that just asks you for three things:

  • Code before yield: This is your setup (prep the kitchen).
  • The yield statement: This is when the customer (the with block) is served.
  • Code after yield (in a finally block): This is your teardown (clean the kitchen).

from contextlib import contextmanager
import time

@contextmanager
def timer(name):
    # 1. SETUP (code before yield)
    print(f"Timer '{name}' started...")
    start_time = time.time()
    
    try:
        # 2. YIELD (control goes back to the 'with' block)
        yield
    finally:
        # 3. TEARDOWN (code after yield, in 'finally')
        end_time = time.time()
        print(f"Timer '{name}' finished in {end_time - start_time:.2f}s")

# How to use it:
with timer("MyProcess"):
    # This code runs where 'yield' is
    time.sleep(1)
    print("Doing work...")
    time.sleep(1)

Expected Output:

Timer 'MyProcess' started...
Doing work...
Timer 'MyProcess' finished in 2.00s

contextlib.ExitStack

ExitStack is a "manager for managers." You use it when you need to manage a dynamic or variable number of resources. A regular withstatement requires you to "stack" them:

`with open('a') as f, open('b') as g:`

But what if you don't know if you need `f`, `g`, or `h` until runtime?

Analogy: An ExitStack is like a coat check at a party. You (the `ExitStack`) open the door. As guests (resources) arrive, you take their coats and add them to your rack (stack.enter_context(resource)). When the party (the with block) is over, you, the coat check, are responsible for handing all the coats back in reverse order, ensuring everyone gets their coat (all resources are cleaned up).

from contextlib import ExitStack

# Imagine we have a list of filenames to open
filenames = ["file1.txt", "file2.txt"] 
# We'll mock 'open' for this example
from unittest.mock import mock_open, patch

# Create mock file objects that print when their __exit__
# is called (ExitStack calls __exit__, not close())
mock_file1 = mock_open().return_value
mock_file1.name = "file1.txt"
mock_file1.__exit__.side_effect = lambda *args: print("file1.txt closed")

mock_file2 = mock_open().return_value
mock_file2.name = "file2.txt"
mock_file2.__exit__.side_effect = lambda *args: print("file2.txt closed")

# A 'patch' to return our specific mocks
def side_effect(filename, *args):
    if filename == "file1.txt": return mock_file1
    if filename == "file2.txt": return mock_file2

with patch("builtins.open", side_effect=side_effect):
    with ExitStack() as stack:
        opened_files = []
        for fname in filenames:
            # Dynamically open each file and add it to the stack
            # 'enter_context' calls the resource's __enter__
            f = stack.enter_context(open(fname, "r"))
            opened_files.append(f)
            print(f"Opened {f.name}")
        
        # All files are open here
        print("--- All files are open, 'with' block is ending ---")

    # The ExitStack automatically calls __exit__ on
    # file2.txt, then file1.txt (LIFO order)

Expected Output:

Opened file1.txt
Opened file2.txt
--- All files are open, 'with' block is ending ---
file2.txt closed
file1.txt closed

68. Explain the buffer/streaming techniques to process very large files (chunked reads, iterators, lazy evaluation).

This is about handling files that are too large to fit in your computer's RAM (e.g., a 50GB log file).

The Problem: If you do `data = file.read()`, Python will try to load all 50GB into a single string in memory, which will crash your program.

The Solution (Streaming): You read the file piece by piece, processing each piece and then discarding it before reading the next one.

Analogy: You don't try to "read" a 50-ton mountain of sand by picking it all up at once. You process it one scoop (a chunk) or one grain (a line) at a time.

  • 1. Iterators (Line-by-Line): This is the simplest and most Pythonic way for text files. When you iterate over a file object, Python automatically reads it line by line.
    `for line in file:`
  • 2. Chunked Reads: This is for binary files or when "lines" don't make sense. You read a fixed number of bytes (e.g., 8192 bytes, or 8KB) in a loop until the file is empty; see the sketch after this list.
  • 3. Lazy Evaluation (Generators): This is the pattern that enables efficient processing. You create a "pipeline" of generators. The data flows through this pipeline one piece at a time, and no large lists are ever created in memory.
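
Here is a minimal chunked-read sketch (the filename and the 8 KB buffer size are illustrative):

def process_in_chunks(path, chunk_size=8192):
    """Count bytes in a binary file, one 8 KB chunk at a time."""
    total_bytes = 0
    with open(path, "rb") as f:
        # read() returns an empty (falsy) bytes object at end of file
        while chunk := f.read(chunk_size):
            total_bytes += len(chunk)
    return total_bytes

# total = process_in_chunks("huge_video.bin")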

Example: Let's build a generator pipeline to find "ERROR" lines in a huge log file. (We'll mock the large file with a generator).

# 1. Simulate a huge log file (our 'file' object)
# This generator yields one line at a time (lazy)
def huge_file_reader(n_lines):
    print("(Reader: Starting to read file...)")
    for i in range(n_lines):
        if i % 3 == 0:
            yield f"Line {i}: INFO: System ok\n"
        else:
            yield f"Line {i}: ERROR: System failure {i}\n"
    print("(Reader: Reached end of file.)")

# 2. A generator to filter the lines (lazy)
def grep_errors(lines):
    print("(Filter: Starting to filter...)")
    for line in lines:
        if "ERROR" in line:
            # 'strip()' removes the newline character
            yield line.strip()

# 3. A function to process the results (the consumer)
def count_errors(error_lines):
    print("(Counter: Starting to count...)")
    count = 0
    for error in error_lines:
        count += 1
        # To show it's lazy, we'll print the first few
        if count <= 3:
            print(f"  > Found error: {error}")
    return count

# --- Build the lazy pipeline ---
# No file is read, no filtering happens yet.
file_lines = huge_file_reader(10_000)
error_lines = grep_errors(file_lines)

print("--- Pipeline created. Now, pulling data... ---")
# Calling 'count_errors' is what starts pulling data.
# Each pull goes all the way back to 'huge_file_reader':
# 'huge_file_reader' yields one line,
# 'grep_errors' checks it and yields it if it's an error,
# and 'count_errors' receives it.
# This repeats for all 10,000 lines.
total_errors = count_errors(error_lines)

print(f"\nTotal errors found: {total_errors}")

Expected Output:

--- Pipeline created. Now, pulling data... ---
(Counter: Starting to count...)
(Filter: Starting to filter...)
(Reader: Starting to read file...)
  > Found error: Line 1: ERROR: System failure 1
  > Found error: Line 2: ERROR: System failure 2
  > Found error: Line 4: ERROR: System failure 4
(Reader: Reached end of file.)

Total errors found: 6666

Notice how the print statements show the flow. The "Counter" starts, which pulls from the "Filter," which pulls from the "Reader." The whole file is processed without ever holding more than one line in memory at a time.

69. How do you use multiprocessing safely with shared state and inter-process communication?

This is a key challenge in Python. Because of the Global Interpreter Lock (GIL), Python threads can't achieve true parallelism for CPU-bound tasks.

The multiprocessing module bypasses the GIL by creating new processes . Each process has its own Python interpreter and its own memory.

Analogy:

  • Threading: Two chefs in the same kitchen (shared memory). They can work on different tasks (I/O), but only one of them can use the single cutting board (the GIL) at any moment.
  • Multiprocessing: Two chefs in two separate kitchens (separate memory). They can both chop at full speed (true parallelism), but they can't see each other's ingredients.

The problem becomes: how do the chefs (processes) communicate?

Option 1: Shared State (Dangerous, Avoid if Possible)

This is like creating a "magic window" between the kitchens. multiprocessing provides special objects like Value and Array that live in shared memory. You must pair them with a Lock to prevent two processes from writing at the exact same time (a race condition).
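
A minimal sketch of that pattern, assuming two processes bumping one shared counter:

import multiprocessing

def increment(counter, lock):
    for _ in range(10_000):
        # Without the lock, two processes could read the same value
        # and both write back value + 1, losing an update
        with lock:
            counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # a shared C integer
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=increment, args=(counter, lock))
             for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 20000, reliably, because of the lock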

Option 2: Inter-Process Communication (IPC) (Preferred)

This is the clean, safe way. Instead of sharing memory, processes send messages to each other.

  • Queue: A process-safe "conveyor belt." One process calls .put() to place a task on the belt, and a worker process calls .get() to take it off. This is the most common pattern; see the sketch after this list.
  • Pipe: A "walkie-talkie" connection between two specific processes. Good for two-way communication.

Example: Using a Pool of worker processes. pool.map chops up the task list, distributes the pieces to the workers, and collects the results in order.

import multiprocessing
import time
import os

# This is the 'work' our processes will do
def worker_task(x):
    pid = os.getpid()
    print(f"Process {pid}: Working on {x}")
    time.sleep(1) # Simulate CPU-bound work
    result = x * x
    print(f"Process {pid}: Finished {x}")
    return result

if __name__ == "__main__":
    # The __main__ guard is required when processes are
    # 'spawned' (the default on Windows and macOS)
    print("Main: Starting pool...")
    
    # Create a pool of 4 worker processes
    # 'with' statement handles setup and teardown
    with multiprocessing.Pool(processes=4) as pool:
        
        tasks = [1, 2, 3, 4, 5, 6, 7, 8]
        
        # 'pool.map' is a powerful shortcut
        # It takes the list 'tasks', chops it up,
        # sends the pieces to the worker processes,
        # and collects the results in order.
        results = pool.map(worker_task, tasks)
        
    print(f"Main: All tasks complete.")
    print(f"Main: Results: {results}")

Expected Output (order of 'Working'/'Finished' may vary):

Main: Starting pool...
Process 12345: Working on 1
Process 12346: Working on 2
Process 12347: Working on 3
Process 12348: Working on 4
Process 12345: Finished 1
Process 12345: Working on 5
Process 12346: Finished 2
Process 12346: Working on 6
Process 12347: Finished 3
Process 12347: Working on 7
Process 12348: Finished 4
Process 12348: Working on 8
Process 12345: Finished 5
Process 12346: Finished 6
Process 12347: Finished 7
Process 12348: Finished 8
Main: All tasks complete.
Main: Results: [1, 4, 9, 16, 25, 36, 49, 64]

Tip: Emphasize that IPC (Queues, Pipes) is almost always safer and better than shared state (Values, Locks). Using Pool.map is the easiest, cleanest way to get started.

70. How do you write C extensions (or use cffi) for Python to speed up CPU-bound code?

This is an advanced performance-tuning technique. You use it only after profiling your code (using a tool like cProfile) and discovering a "hot loop": a small, CPU-bound part of your code that accounts for 99% of the runtime.
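
A minimal profiling sketch (the hot_loop function is a hypothetical stand-in for your own slow code):

import cProfile
import pstats

def hot_loop():
    # Hypothetical stand-in for a CPU-bound hot spot
    return sum(i * i for i in range(2_000_000))

# Profile the call, then print the 10 most expensive functions
cProfile.run("hot_loop()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)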

Analogy: Your Python code is a friendly, flexible head waiter (easy to work with, but not the fastest). A C extension is like hiring a master chef (your C function) who works at lightning speed in the kitchen. You don't ask the chef to greet guests; you ask Python to handle the logic and then "call" the chef to do the one, intense, time-consuming task.

Methods:

  • Python/C API: The original, manual, and very difficult way. You write C code that directly manipulates PyObject* pointers. It's powerful but error-prone.
  • Cython: A very popular choice. You write in a Python-like language that gets compiled to C. It's an easier transition for Python developers.
  • cffi (C Foreign Function Interface): A modern, easy-to-use library. It's fantastic for interfacing with existing C libraries (.dll, .so), as the short sketch after this list shows, or for compiling a small snippet of your own C code.
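
As a quick taste of cffi's "ABI mode", here is a minimal sketch that calls a function straight out of the C standard library. (It assumes a POSIX system, where ffi.dlopen(None) loads the C library; on Windows you would name a specific DLL instead.)

from cffi import FFI

ffi = FFI()
# Declare the C signature we want to call (copied from the C headers)
ffi.cdef("int abs(int x);")
# Load the C standard library itself (POSIX behavior of dlopen(None))
libc = ffi.dlopen(None)

print(libc.abs(-42))  # 42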

The cffi "API mode" is the easiest way to demonstrate this. You write a "build" script that you run once to compile your C code into a Python module.

Example: Using cffi to create a super-fast add function.

Step 1: The Build Script (e.g., `build_my_math.py`)

from cffi import FFI
ffibuilder = FFI()

# 1. Define the C function 'header' for CFFI
ffibuilder.cdef("""
    int my_add(int x, int y);
""")

# 2. Define the actual C source code
ffibuilder.set_source("_my_math",  # name of the module to create
"""
    int my_add(int x, int y) {
        // This is pure C code
        return x + y;
    }
""")

if __name__ == "__main__":
    # This tells CFFI to build the extension
    ffibuilder.compile(verbose=True)

You would run this once from your terminal: `python build_my_math.py`. This compiles the C code and creates a file like `_my_math.cpython-310-x86_64-linux-gnu.so`.

Step 2: Your Python Application (e.g., `main.py`)

# Now, we can import the module we just built
from _my_math import ffi, lib

# 'lib' contains our C functions
result = lib.my_add(10, 20)

print(f"The C function returned: {result}")

# This Python function does the 'same' thing
def python_add(x, y):
    return x + y

# For a simple 'add', the call overhead between Python
# and C outweighs any gain. The real speedup comes from
# moving an entire hot loop into C, so the Python/C
# boundary is crossed once instead of millions of times.

Expected Output (after running build, then main):

The C function returned: 30