Python Full-Stack Interview Questions 71–75 (Security, Unicode, mmap, API Resilience, Advanced Generators)
Welcome! This lesson covers some very important topics for building robust, professional applications. We'll start with critical security pitfalls, dive into the complexities of Unicode, explore advanced file handling with memory-mapping, learn how to build resilient APIs that don't fail, and finish with a deep dive into advanced generators and coroutines. These are fantastic topics to show your depth as an engineer, so let's explore them calmly.
71. What are common security pitfalls in Python web apps (pickle misuse, insecure deserialization, injection risks)?
This is one of the most important topics in a web interview. It's about protecting your application and your users from attack. The core idea: never trust user input.
Insecure Deserialization (Pickle Misuse)
This is the most severe and easy-to-miss vulnerability. Serialization is converting a Python object (like a dict or a custom class) into a byte stream (e.g., to save to disk or send over the network). Deserialization is the reverse.
The Pitfall: The pickle module is not secure. When pickle.load() rebuilds an object, it can be tricked into running arbitrary code. If an attacker can control the data you are unpickling (e.g., from a web request, a cookie, or a file they uploaded), they can take over your server.
Analogy: Using pickle on untrusted data is like accepting a "magic food replicator" capsule from a stranger. You put it in your machine (your app) expecting it to create a sandwich (an object), but the stranger designed it to create a bomb (execute code) inside your kitchen.
The Solution: Never use pickle, cPickle, or dill on data that did not 100% originate from your trusted code. For communicating with users or other services, always use a safe, data-only format like JSON.
import pickle
import os
import json
# --- DANGEROUS PICKLE EXAMPLE ---
# An attacker creates a malicious byte string.
# This payload runs 'os.system("echo hello from attacker")'
# when unpickled. In real life, this would be a reverse shell.
malicious_payload = (
    b"cposix\nsystem\n(S'echo DANGEROUS: hello from attacker'\ntR."
)
print("--- Running DANGEROUS pickle.load() ---")
try:
    # This is the vulnerability!
    pickle.loads(malicious_payload)
except Exception as e:
    print(f"Pickle error: {e}")
# --- SAFE JSON EXAMPLE ---
# An attacker tries to send a similar payload via JSON.
malicious_json = '{"command": "os.system(\\"echo pwned\\")"}'
print("\n--- Running SAFE json.loads() ---")
# This is SAFE. json.loads() only parses data.
# It does not execute anything.
data = json.loads(malicious_json)
print(f"Data received: {data}")
print("No code was executed.")
Expected Output:
--- Running DANGEROUS pickle.load() ---
DANGEROUS: hello from attacker
--- Running SAFE json.loads() ---
Data received: {'command': 'os.system("echo pwned")'}
No code was executed.
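If you genuinely must persist objects with pickle (e.g., a trusted cache file), one mitigation is to sign the byte stream so you only ever unpickle data you wrote yourself. A minimal HMAC-signing sketch; the key and helper names are made up for illustration, and this only proves the data came from you — it does not make pickle safe for strangers' data:

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"example-secret-key"  # hypothetical; load from config in real apps

def sign_pickle(obj):
    """Serialize obj and prepend an HMAC-SHA256 signature."""
    data = pickle.dumps(obj)
    sig = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    return sig + data  # 32-byte signature, then the pickle bytes

def verify_and_load(blob):
    """Unpickle only if the signature matches; otherwise refuse."""
    sig, data = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("Signature mismatch: refusing to unpickle")
    return pickle.loads(data)

blob = sign_pickle({"user": "admin"})
print(verify_and_load(blob))  # {'user': 'admin'}

# Any tampering with the payload is detected BEFORE unpickling
tampered = blob[:32] + bytes([blob[32] ^ 1]) + blob[33:]
try:
    verify_and_load(tampered)
except ValueError as e:
    print(e)
```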
Injection Risks (e.g., SQL Injection)
This happens when you mix data (from a user) with instructions (like an SQL query or a shell command).
The Pitfall: Using string formatting (like f-strings) to build queries. An attacker can "inject" their own commands into your data.
The Solution: Use parameterized queries (also called prepared statements). This separates the query logic from the data. Your database driver (or an ORM like SQLAlchemy) handles sanitizing the data for you.
import sqlite3
# User input from an attacker
# They want to log in as 'admin' without a password
user_input = "' OR '1'='1" # This string is the attack
# --- DANGEROUS: SQL Injection Vulnerability ---
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, pass TEXT)")
db.execute("INSERT INTO users VALUES ('admin', 'abc123')")
# Building the query with an f-string is DANGEROUS
query = f"SELECT * FROM users WHERE name = '{user_input}'"
print(f"Dangerous query: {query}")
cursor = db.execute(query)
print(f"Dangerous result: {cursor.fetchone()}") # Attacker logs in!
# --- SAFE: Parameterized Query ---
# Notice the '?' placeholder.
safe_query = "SELECT * FROM users WHERE name = ?"
# We pass the user input as a separate tuple.
# The database driver safely inserts the data.
# The attacker's string is treated as just a string,
# not as part of the SQL command.
cursor = db.execute(safe_query, (user_input,))
print(f"\nSafe query result: {cursor.fetchone()}") # Returns None!
db.close()
Expected Output:
Dangerous query: SELECT * FROM users WHERE name = '' OR '1'='1'
Dangerous result: ('admin', 'abc123')
Safe query result: None
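The same rule, keeping data out of the instruction stream, applies to shell commands. A small sketch (the filename string is made up):

```python
import subprocess

# Attacker-controlled "filename"
user_input = "menu.txt; echo HACKED"

# DANGEROUS (shown only as a comment): with shell=True the shell
# parses the whole string, so "; echo HACKED" runs as a second command:
#   subprocess.run(f"cat {user_input}", shell=True)

# SAFE: pass an argument list. The input becomes a single argv
# entry; no shell ever parses it, so ";" is just a character.
result = subprocess.run(
    ["echo", user_input], capture_output=True, text=True
)
print(result.stdout.strip())  # menu.txt; echo HACKED
```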
72. Explain Unicode handling in Python and common gotchas (normalization, encodings, bytes vs str).
This is a fundamental concept in Python 3. Getting this wrong leads to the most common errors: UnicodeEncodeError and UnicodeDecodeError.
Analogy: Think of `str` as the idea of a character (like the musical note C#). `bytes` is the physical storage (the MP3 file of that note). An encoding (like 'utf-8' or 'ascii') is the instruction manual that tells you how to convert between the idea and the physical file.
- `str` (Strings): This is what you should use for all text inside your application. It's a sequence of Unicode "code points" (abstract numbers for characters).
- `bytes` (Byte Strings): This is what you use for binary data (images, zip files) and for text that is on its way to/from the outside world (network, disk).
- `encode()`: Converts `str` → `bytes` (the idea → the physical file). You do this when writing to a file or sending over a network.
- `decode()`: Converts `bytes` → `str` (the physical file → the idea). You do this when reading from a file or network.
Gotcha 1: The Default Encoding Trap
`open('file.txt', 'w')` uses your OS's default encoding, which can be 'ascii' or 'cp1252' on Windows. This will crash if you try to save a character like 'é' or '😊'.
Solution: Always be explicit. Use `open('file.txt', 'w', encoding='utf-8')`. UTF-8 is the universal standard and can handle all characters.
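For example, a round trip through a file only works reliably when the encoding is pinned on both sides (the temp-file path here is arbitrary):

```python
import os
import tempfile

text = "café 😊"
path = os.path.join(tempfile.gettempdir(), "utf8_demo.txt")

# Writing: pin the encoding instead of trusting the OS default
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading: pin it again; mismatched defaults cause mojibake or errors
with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()

print(round_tripped == text)  # True
os.remove(path)
```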
Gotcha 2: Normalization
Some characters can be represented in multiple ways. For example, "é" can be a single code point OR an "e" + "´" (a combining accent). These two strings will look identical but fail an `==` check.
Solution: Use unicodedata.normalize() to convert both strings to a canonical (standard) form before comparing. 'NFC' (Normalization Form C) is the most common.
import unicodedata
# --- str vs bytes (Encode/Decode) ---
my_str = "café 😊"
print(f"Original str: {my_str}")
# Encode (str -> bytes) using UTF-8
my_bytes = my_str.encode("utf-8")
print(f"Encoded bytes: {my_bytes}")
# Decode (bytes -> str)
decoded_str = my_bytes.decode("utf-8")
print(f"Decoded str: {decoded_str}")
# --- Gotcha 1: Encoding Error ---
try:
    my_str.encode("ascii")
except UnicodeEncodeError as e:
    print(f"\nCaught expected error: {e}")
# --- Gotcha 2: Normalization ---
s1 = "caf\u00e9"   # single code point 'é' (U+00E9)
s2 = "cafe\u0301"  # 'e' + combining acute accent (U+0301)
# They look the same, but are they?
print(f"\nString 1: {s1}, String 2: {s2}")
print(f"Are they equal? {s1 == s2}")
print(f"Length of s1: {len(s1)}, Length of s2: {len(s2)}")
# Fix with normalization
s1_norm = unicodedata.normalize("NFC", s1)
s2_norm = unicodedata.normalize("NFC", s2)
print(f"Are they equal after NFC normalization? {s1_norm == s2_norm}")
Expected Output:
Original str: café 😊
Encoded bytes: b'caf\xc3\xa9 \xf0\x9f\x98\x8a'
Decoded str: café 😊
Caught expected error: 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)
String 1: café, String 2: café
Are they equal? False
Length of s1: 4, Length of s2: 5
Are they equal after NFC normalization? True
73. What are mmap and memory-mapped files, and when should they be used?
mmap (memory-mapped file) is an advanced OS feature that Python exposes via the mmap module.
It allows you to map a file on disk directly into your process's virtual memory. Instead of using file.read() and file.write(), you can interact with the file as if it were a giant byte array (like a memoryview) right in RAM.
Analogy: Normally, to edit a huge ledger book (a file), you (your app) ask a clerk (the OS) to `read()` a page, bring it to your desk (RAM), you write on it, and then hand it back with `write()`.
Using mmap is like the OS giving you a "magic portal" to the ledger. The book stays on the shelf, but any change you make on a "portal page" on your desk is instantly reflected in the book itself, and vice-versa. The OS handles all the syncing efficiently in the background.
When to use mmap:
- Random Access in Large Files: You have a 20GB file and need to read or write 100 bytes at offset 15,000,000,000. Normally, you'd have to seek() and read(). With mmap, you just do `mm[15000000000:15000000100]`. The OS handles paging in only the tiny part of the file you need.
- Inter-Process Communication (IPC): This is a very fast way for two different processes on the same machine to share data. If both processes mmap the same file, they are sharing the same memory. One process can write data, and the other can read it instantly (with proper locking).
It is not generally faster for sequentially reading a whole file (streaming is better for that). It's a specific tool for random access or shared memory.
import mmap
import os
# Create a dummy file
filename = "test.dat"
with open(filename, "wb") as f:
    f.write(b"Hello World. This is a test file.")
# Open the file for reading and writing
with open(filename, "r+b") as f:
    # Create the memory map.
    # fileno() gets the underlying file descriptor from the OS.
    # Length 0 means map the whole file.
    with mmap.mmap(f.fileno(), 0) as mm:
        # 1. Read from the map like a byte array
        print(f"Original (bytes 0-13): {mm[0:13]}")
        # 2. Write to the map (this changes the file on disk)
        print("Modifying file via mmap...")
        mm[0:5] = b"HELLO"
        # 3. Move to a different part of the file
        # (offset 28 is where "file." starts)
        mm.seek(28)
        mm.write(b"FILE.")
        # 4. Flush changes to disk (good practice)
        mm.flush()
    # The mmap is closed automatically here
# The file is closed automatically here
# Verify the changes by reading the file normally
print("\nReading file from disk:")
with open(filename, "rb") as f:
    print(f"Contents: {f.read()}")
os.remove(filename)  # Clean up
Expected Output:
Original (bytes 0-13): b'Hello World. '
Modifying file via mmap...
Reading file from disk:
Contents: b'HELLO World. This is a test FILE.'
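To make the random-access point concrete, here is a sketch that locates a marker in a large file without reading it sequentially into Python (the filename and marker bytes are made up):

```python
import mmap
import os

# Build a file with a marker buried after a million bytes
filename = "search_demo.dat"
with open(filename, "wb") as f:
    f.write(b"x" * 1_000_000 + b"NEEDLE" + b"y" * 1_000_000)

with open(filename, "rb") as f:
    # ACCESS_READ creates a read-only mapping
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The OS pages in only the parts find() actually touches
        offset = mm.find(b"NEEDLE")
        print(f"Found marker at offset {offset}")  # 1000000

os.remove(filename)
```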
74. How do you build resilient APIs in Python — retries, timeouts, circuit breakers, backoff strategies?
Building a resilient service means accepting that networks will fail . The service you are calling (a database, another microservice) will be slow or go down. Resiliency patterns are how you handle this gracefully.
Analogy: You're trying to call a busy pizza place (an API).
- Timeouts: This is the most important first step. You decide you'll only wait 15 seconds for them to pick up. If they don't, you hang up. This prevents your app from hanging forever waiting for a dead service. (e.g., `requests.get(..., timeout=15)`).
- Retries: You hang up, and you immediately call again. Maybe the line was just busy for a second. This handles temporary, transient failures.
- Exponential Backoff: Calling again immediately might be a bad idea if the pizza place is overwhelmed (you would effectively be DDoS-ing them). So, you retry with backoff:
  - 1st fail: Wait 1 second, retry.
  - 2nd fail: Wait 2 seconds, retry.
  - 3rd fail: Wait 4 seconds, retry...
  This gives the service time to recover.
- Circuit Breaker: You've tried 5 times, and it failed every time. The pizza place is clearly closed. You "trip the breaker." For the next 5 minutes, you don't even try to call. You instantly fail (return an error, maybe "Service Unavailable").
  - After 5 minutes, the breaker moves to "half-open." You allow one call to go through.
  - If it succeeds, you "close" the circuit and resume normal calls.
  - If it fails, you "open" the circuit again and wait another 5 minutes.
  This prevents your app from wasting time calling a dead service and gives the service maximum time to recover.
You can implement these yourself using decorators, or use battle-tested libraries like `tenacity` (for retries/backoff) or `pybreaker` (for circuit breakers).
Example: A retry-with-backoff decorator (a simplified version of what `tenacity` does).
import time
import random
from functools import wraps
# A decorator for retries with exponential backoff
def retry_with_backoff(tries=3, delay=1, backoff=2):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            _tries, _delay = tries, delay
            while _tries > 0:
                try:
                    # Try to run the function
                    return func(*args, **kwargs)
                except Exception as e:
                    _tries -= 1
                    if _tries == 0:
                        print("Final attempt failed. Raising error.")
                        raise  # Re-raise the last exception
                    print(f"Failed: {e}. Retrying in {_delay}s...")
                    time.sleep(_delay)
                    _delay *= backoff  # Double the delay
        return wrapper
    return decorator
# --- Simulate a flaky API call ---
@retry_with_backoff(tries=4, delay=1, backoff=2)
def get_pizza_status():
    # Fails 75% of the time
    if random.random() < 0.75:
        raise ConnectionError("Pizza shop line is busy!")
    return "SUCCESS: Pizza is on its way!"
# --- Run the resilient function ---
try:
    status = get_pizza_status()
    print(f"\nFinal Status: {status}")
except Exception as e:
    print(f"\nFinal Status: FAILED ({e})")
Expected Output (will vary due to random):
Failed: Pizza shop line is busy!. Retrying in 1s...
Failed: Pizza shop line is busy!. Retrying in 2s...
Failed: Pizza shop line is busy!. Retrying in 4s...
Final attempt failed. Raising error.
Final Status: FAILED (Pizza shop line is busy!)
(If you run it a few times, you will eventually see it succeed).
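The decorator above covers retries and backoff; the circuit-breaker state machine described earlier can also be sketched in a few lines. This is a toy, single-threaded illustration (class name and thresholds are made up, with no locking); `pybreaker` is the battle-tested option for real services:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: closed -> open -> half-open -> closed."""

    def __init__(self, fail_max=3, reset_timeout=5.0):
        self.fail_max = fail_max            # failures before opening
        self.reset_timeout = reset_timeout  # seconds before half-open
        self.failures = 0
        self.opened_at = None               # None means circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the service
                raise RuntimeError("Circuit open: failing fast")
            # Timeout elapsed: half-open, let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.fail_max:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        # Success closes the circuit and resets the counters
        self.failures = 0
        self.opened_at = None
        return result

# --- Demo: two real failures trip the breaker; the third fails fast ---
breaker = CircuitBreaker(fail_max=2, reset_timeout=60)

def broken_service():
    raise ConnectionError("service down")

for attempt in range(3):
    try:
        breaker.call(broken_service)
    except ConnectionError:
        print("Real failure reached the service")
    except RuntimeError as e:
        print(e)
```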
75. Explain generator.send(), throw(), and how to implement producer/consumer patterns with generators.
This question is about "coroutines"—a more advanced form of generator.
A normal generator is a producer. You call `next()`, and it produces one value for you (a one-way street).
A coroutine is a consumer (or a two-way street). It pauses at `yield` to receive data that you `.send()` into it.
The key is the expression `value = yield`.
- `generator.send(data)`: Sends a value into the generator. The generator's code resumes, and the `yield` expression itself evaluates to `data`.
- `next(gen)` or `gen.send(None)`: This is required to "prime" the coroutine, i.e., to run its code until it hits the first `yield` and pauses, ready to receive data.
- `generator.throw(ex)`: "Throws" an exception into the generator at the point where it is paused. This lets you signal an error or tell it to clean up.
Producer/Consumer Pattern: The producer is the outside code (e.g., your main loop) that generates data. The consumer is the coroutine, which is "sent" data and processes it.
Example: A coroutine that calculates a running average.
def coroutine_averager():
    """A coroutine to calculate a running average."""
    print("Coroutine started...")
    total = 0.0
    count = 0
    average = None
    try:
        while True:
            try:
                # This is the key:
                # 1. It pauses here.
                # 2. When '.send(value)' is called, 'value' is
                #    assigned to 'term', and the loop continues.
                term = yield average
            except ValueError:
                # We can catch '.throw()'. Catching it INSIDE the
                # loop lets the coroutine reset and keep running;
                # outside the loop, the generator would finish and
                # later send() calls would raise StopIteration.
                print("Coroutine caught a ValueError. Resetting.")
                total, count, average = 0.0, 0, None
            else:
                total += term
                count += 1
                average = total / count
    except GeneratorExit:
        # We can catch '.close()'
        print("Coroutine closing. Final average:", average)
# 1. Create the consumer
consumer = coroutine_averager()
# 2. Prime the coroutine by calling next().
# It runs until the first 'yield average'.
avg = next(consumer)
print(f"Primed. Initial value: {avg}")  # Is None
# 3. The 'producer' loop sends data
print(f"Sending 10... Average: {consumer.send(10)}")
print(f"Sending 20... Average: {consumer.send(20)}")
print(f"Sending 30... Average: {consumer.send(30)}")
# 4. Throw an exception into the coroutine. It catches the
# ValueError, resets, and pauses at the next 'yield', so
# throw() returns the freshly yielded value (None here).
consumer.throw(ValueError)
print(f"Sending 5... Average: {consumer.send(5)}")
# 5. Close the coroutine
consumer.close()
Expected Output:
Coroutine started...
Primed. Initial value: None
Sending 10... Average: 10.0
Sending 20... Average: 15.0
Sending 30... Average: 20.0
Coroutine caught a ValueError. Resetting.
Sending 5... Average: 5.0
Coroutine closing. Final average: 5.0
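Coroutines can also be chained into pipelines: one coroutine filters the data it receives and `.send()`s matches downstream to another. A small sketch (the stage names are illustrative):

```python
def printer():
    """Sink: prints whatever is sent in."""
    while True:
        line = yield
        print(f"MATCH: {line}")

def grep(pattern, target):
    """Filter: forwards matching lines to the target coroutine."""
    while True:
        line = yield
        if pattern in line:
            target.send(line)

# Build and prime the pipeline (sink first, then the filter)
sink = printer()
next(sink)
pipeline = grep("error", sink)
next(pipeline)

# The producer loop pushes data through the pipeline
for line in ["ok", "error: disk full", "ok", "error: timeout"]:
    pipeline.send(line)
```

This prints only the two "error" lines. The same shape (a chain of primed coroutines connected by `.send()`) scales to multi-stage processing pipelines.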