Python Full-Stack Interview Questions 56–60 (Performance, Memory, Profiling, Tracing, Serialization)
This lesson answers five practical interview scenarios: speeding up slow API code, strategies for processing millions of rows with low memory, how to profile and benchmark Python programs, techniques to measure and reduce memory usage, and the trade-offs between common serialization formats. Each section uses short explanations, clear analogies, and runnable examples with expected output so you can try them locally.
56. Scenario: The API response is slow. How do you optimize Python code?
When an API is slow, measure first: is the bottleneck network, CPU, or I/O? Common approaches: add concurrency (threads or asyncio) so you are not blocked on one request at a time, reuse connections (connection pooling), cache repeated responses, and cut unnecessary work (repeated parsing, large allocations). Think of standing in line: instead of queueing for each errand one after another, send several people to wait in parallel.
- Measure first: log timings and use a profiler or timers.
- Use connection pooling (requests.Session) or an async HTTP client (aiohttp/httpx).
- Cache responses when possible; use retries with backoff for transient failures.
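The concurrency example below covers the first bullet. For pooling, caching, and retries with backoff, here is a minimal sketch; it assumes the requests library is installed, and the URL is a placeholder, so adapt it to your own API.
# Connection pooling, retries with backoff, and a tiny in-process cache (sketch; assumes `requests` is installed)
from functools import lru_cache
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()  # reuses TCP connections across calls
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retry))
@lru_cache(maxsize=256)  # only cache idempotent GETs whose responses may repeat
def cached_get(url):
    return session.get(url, timeout=5).text
# cached_get("https://api.example.com/items")  # hypothetical endpoint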
Example: simulate calling a slow API (time.sleep) and compare sequential vs ThreadPoolExecutor and asyncio approaches.
# Simulated slow API: compare sequential vs concurrent calls
import time
from concurrent.futures import ThreadPoolExecutor
import asyncio
def slow_api(x):
    time.sleep(0.5)  # simulate network delay
    return f"resp:{x}"
# Sequential
start = time.time()
res = [slow_api(i) for i in range(4)]
print("sequential:", res, "took", time.time() - start)
# ThreadPoolExecutor
start = time.time()
with ThreadPoolExecutor(max_workers=4) as ex:
    res2 = list(ex.map(slow_api, range(4)))
print("threads: ", res2, "took", time.time() - start)
# Asyncio with simulated async sleep
async def async_slow(x):
    await asyncio.sleep(0.5)
    return f"aresp:{x}"
async def run_async():
    start = time.time()
    res3 = await asyncio.gather(*(async_slow(i) for i in range(4)))
    print("async: ", res3, "took", time.time() - start)
asyncio.run(run_async())
Expected output (timings approximate):
# Output:
sequential: ['resp:0', 'resp:1', 'resp:2', 'resp:3'] took 2.01
threads: ['resp:0', 'resp:1', 'resp:2', 'resp:3'] took 0.52
async: ['aresp:0', 'aresp:1', 'aresp:2', 'aresp:3'] took 0.50
Tip: If the bottleneck is I/O (waiting on the network), concurrency typically wins. If CPU-bound, consider multiprocessing, C extensions, or optimizing algorithms.
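For the CPU-bound case in the tip, a minimal multiprocessing sketch; fib here is just a stand-in for expensive pure-Python work:
# CPU-bound work: use processes to sidestep the GIL
from concurrent.futures import ProcessPoolExecutor
def fib(n):
    # intentionally slow recursive Fibonacci as a stand-in workload
    return n if n < 2 else fib(n - 1) + fib(n - 2)
if __name__ == "__main__":  # guard required when processes are spawned (Windows/macOS)
    with ProcessPoolExecutor() as ex:
        print(list(ex.map(fib, [28, 29, 30, 31])))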
57. Scenario: You must handle millions of rows in Python. How do you optimize memory usage?
When data is large, avoid loading everything into memory. Use streaming, chunked processing, generators, iterators, and lightweight data representations (tuples, arrays, or memoryviews). If using pandas, use chunked reading (read_csv with chunksize) or process with dask/vaex for out-of-core workflows.
- Stream data: read and process line-by-line or in small chunks.
- Avoid building large lists; yield results and write outputs incrementally.
- Use compact types: array.array, struct, or numpy with specific dtypes.
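For the pandas route mentioned above, chunked reading keeps only one chunk in memory at a time. A minimal sketch, assuming pandas is installed; the file and column names are placeholders:
# Chunked aggregation with pandas (sketch; "big.csv" and the "value" column are hypothetical)
import pandas as pd
total = 0
for chunk in pd.read_csv("big.csv", chunksize=100_000, dtype={"value": "int32"}):
    total += chunk["value"].sum()  # aggregate per chunk, then let the chunk be garbage-collected
print("total:", total)
To illustrate the compact-types bullet, a quick size comparison between a plain list of ints and an array.array (exact numbers vary by Python version):
# Compact numeric storage: list of Python ints vs array.array of 4-byte ints
import sys
from array import array
nums_list = list(range(1_000_000))
nums_array = array("i", range(1_000_000))
print("list container bytes:", sys.getsizeof(nums_list))   # container only; each int object adds ~28 bytes more
print("array total bytes:   ", sys.getsizeof(nums_array))  # roughly 4 bytes per element plus overhead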
Example: process a large CSV in chunks and aggregate without storing all rows.
# Process CSV-like data in streaming fashion
def lines_generator(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")
# Simulate processing: count lines that meet a condition
def count_matching(path):
    count = 0
    for line in lines_generator(path):
        # simple parse: assume CSV and check second column
        parts = line.split(",")
        if len(parts) > 1 and parts[1] == "match":
            count += 1
    return count
# Create small test file for demo
with open("demo.csv", "w", encoding="utf-8") as f:
for i in range(1000):
val = "match" if i % 10 == 0 else "other"
f.write(f"{i},{val}\n")
print("matches:", count_matching("demo.csv"))Expected output:
# Output:
matches: 100
58. How do you profile and benchmark Python code? (tools: cProfile, timeit, line profilers)
Profiling finds where time is spent. Use cProfile for whole-program profiling, pstats to inspect results, and timeit for micro-benchmarks of small snippets. Line profilers (e.g., line_profiler) show per-line costs and are very helpful for hotspots.
- Use cProfile to collect call stats, then inspect with pstats or snakeviz for visualization.
- Use timeit for small, repeatable timing comparisons.
- Use line profiler for detailed line-level analysis when you know the suspect function.
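Before the timeit example, a minimal cProfile + pstats sketch; the work function is defined here purely for demonstration:
# Profile a function and print the top calls by cumulative time (sketch)
import cProfile
import io
import pstats
def work():
    return sum(i * i for i in range(200_000))
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)  # show the top 5 entries
print(buf.getvalue())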
Example: compare list comprehension vs generator consumption with timeit.
# timeit example
import timeit
setup = "N = 10000"
stmt1 = "[i*i for i in range(N)]"
stmt2 = "sum(i*i for i in range(N))"
t1 = timeit.timeit(stmt1, setup=setup, number=100)
t2 = timeit.timeit(stmt2, setup=setup, number=100)
print("list comp (100 runs):", t1)
print("generator sum (100 runs):", t2) Expected output (numbers vary by machine):
# Output:
list comp (100 runs): 0.85
generator sum (100 runs): 1.10
59. How do you measure and reduce memory usage (profilers, tracemalloc, mmap, streaming)?
Use tracemalloc to take snapshots and compare memory usage over time. For large files consider mmap to access data without full copies. Reduce memory by streaming, reusing buffers, and selecting compact types. Think of tracemalloc like taking photos of memory at different times to see what's growing.
- tracemalloc: take snapshots and compare to find leaks.
- Use generators and memoryview to avoid big copies.
- Use mmap for very large, read-only file access without loading it all.
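Before the main example, a quick sketch of the memoryview point: slicing bytes copies data, while slicing a memoryview does not:
# memoryview: slice a buffer without copying it
data = bytes(range(256)) * 4096      # about 1 MB of raw bytes
view = memoryview(data)
chunk = view[:65536]                  # no copy; still backed by the original buffer
print(len(chunk), chunk.obj is data)  # 65536 True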
Example: trace memory for a list vs generator and show a simple mmap usage.
# tracemalloc demo
import tracemalloc
def build_list(n):
    return [i for i in range(n)]
def build_gen(n):
    for i in range(n):
        yield i
tracemalloc.start()
a = build_list(100000)
snap1 = tracemalloc.take_snapshot()
# drop reference to list and use generator instead
del a
b = build_gen(100000) # generator uses little memory
snap2 = tracemalloc.take_snapshot()
top = snap2.compare_to(snap1, "lineno")
print("Top differences (summary):")
for stat in top[:3]:
    print(stat)  # shows where memory changed
# mmap demo (create small file)
with open("mmap_demo.txt", "wb") as f:
f.write(b"hello world " * 1000)
import mmap
with open("mmap_demo.txt", "r+b") as f:
mm = mmap.mmap(f.fileno(), 0)
print("first 11 bytes:", mm[:11])
mm.close()Expected output (tracemalloc lines vary):
# Output (example):
Top differences (summary): <StatisticDiff filename:... lineno:... size_diff: ... count_diff: ...>
first 11 bytes: b'hello world'
60. Explain serialization options (JSON, pickle, msgpack) and their safety/performance trade-offs.
Serialization converts in-memory objects to bytes for storage or network transfer. JSON is text, portable, and safe to parse from untrusted sources but slower and limited to basic types. pickle is Python-specific, fast, and supports arbitrary objects but is unsafe to load from untrusted data. msgpack (or similar binary formats) is compact and faster than JSON while staying language-neutral.
- JSON: interoperable and safe for untrusted input; slower and limited to basic types.
- pickle: powerful and fast but insecure for untrusted data; avoid for external inputs.
- msgpack: compact and faster than JSON, good when size and speed matter and you trust the data source.
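The msgpack bullet can be sketched too; this assumes the third-party msgpack package is installed (pip install msgpack):
# msgpack: compact binary serialization (sketch; requires `pip install msgpack`)
import msgpack
obj = {"id": 1, "name": "Alice", "scores": list(range(100))}
packed = msgpack.packb(obj)        # bytes, usually smaller than the equivalent JSON text
unpacked = msgpack.unpackb(packed)
print("msgpack len", len(packed), "name", unpacked["name"])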
Example: json vs pickle serialization and a tiny timeit comparison.
# Serialization demo: json vs pickle
import json
import pickle
import timeit
obj = {"id": 1, "name": "Alice", "scores": list(range(100))}
# JSON serialization
s_json = json.dumps(obj)
o_json = json.loads(s_json)
# Pickle serialization
s_pickle = pickle.dumps(obj)
o_pickle = pickle.loads(s_pickle)
print("json len", len(s_json))
print("pickle len", len(s_pickle))
# simple timing
t_json = timeit.timeit("json.dumps(obj)", globals=globals(), number=2000)
t_pickle = timeit.timeit("pickle.dumps(obj)", globals=globals(), number=2000)
print("json dumps (2000):", t_json)
print("pickle dumps (2000):", t_pickle)Expected output (sizes/timings vary):
# Output:
json len 823
pickle len 1123
json dumps (2000): 0.42
pickle dumps (2000): 0.30
Final tips: always measure before optimizing; prefer streaming and generators for large data; use safe serializers for untrusted data; and add profiling and tracemalloc jobs to your CI so regressions in performance or memory are visible early.