Pythonic Code: Idioms Every Developer Should Know
The 15 Python idioms that separate Java-style Python from code that actually belongs in the language
TLDR: Writing for i in range(len(arr)): works, but Python veterans will flag it in your first code review. Idiomatic Python uses enumerate, zip, comprehensions, context managers, unpacking, the walrus operator, and truthiness checks - not because they're clever, but because they compress intent into the minimum readable form. Learn these 10+ idioms and your Python will look like it belongs in the language.
Your First Python Code Review: What a Senior Dev Actually Sees
Imagine you're three months into your first Python role. You came from five years of Java. You've been writing Python the same way you wrote Java - and it all runs. Then a senior engineer opens your pull request.
Here's your code:
# PR under review - looks fine, runs fine, fails the culture test
def get_active_user_names(users):
    result = []
    for i in range(len(users)):
        if users[i]["active"] == True:
            result.append(users[i]["name"])
    return result
The reviewer leaves six comments in under two minutes:
- range(len(users)) - use for user in users directly
- == True - use truthiness: if user["active"]:
- Manual .append() in a loop - this is a list comprehension
- users[i] indexed access in a loop - you don't need the index
- The function body is 5 lines for a one-liner
- No type hints, no dict.get() for safe access
The reviewer isn't being harsh. They're teaching you to speak Python. Every language has idioms - patterns so common that every practitioner uses them automatically, and their absence signals inexperience more loudly than any bug.
In Python, these idioms aren't just style points. They affect readability (other developers can scan your code faster), performance (some idioms are measurably faster at the bytecode level), and hiring signals (every Python interviewer mentally categorizes candidates the moment they see how they iterate a list).
The same function written by a Python veteran:
def get_active_user_names(users: list[dict]) -> list[str]:
    return [user["name"] for user in users if user.get("active")]
One line. Self-documenting. No index arithmetic. No manual accumulator. That's the destination this post will take you to.
The Zen of Python: Why Idiomatic Code Is a First-Class Value in the Language
Open a Python REPL and type import this. You'll see the Zen of Python, a set of 19 aphorisms written by Tim Peters that act as the language's design philosophy. Three of them explain why Python idioms exist:
Beautiful is better than ugly. Explicit is better than implicit. Readability counts.
Python was designed from the start to be a language humans read, not just machines execute. Guido van Rossum has said in interviews that Python is read roughly ten times for every one time it is written, so optimising for the reader is the rational choice.
This philosophy has a practical consequence: the community doesn't just tolerate a "best way" to do common tasks - it actively enforces it through code review culture, linting tools, and the idioms baked into the standard library. When CPython adds enumerate() to the builtins, it's saying: "iterating with an index is common enough that we want to make the idiomatic form obvious and fast."
The Pythonic philosophy can be summarised as three rules:
- Explicit over clever - If a reader must trace through two layers of abstraction to understand what a line does, the abstraction is not earning its complexity budget.
- Terse when terse IS readable - A list comprehension is terser than a four-line loop AND more readable, because its structure matches the reader's mental model of "transform a collection." A lambda inside a lambda inside a map call is terse but not readable.
- Use the language's built-ins - The standard library was designed with idiomatic use in mind. Using dict.get() instead of a try/except KeyError is not laziness; it is alignment with what the data structure was built to do.
The Ten Idioms That Separate Python Experts from Java Refugees
These idioms appear in every professional Python codebase. Each one has a before/after pair showing the non-Pythonic version (which runs) and the Pythonic version (which belongs).
1. Iterating with enumerate Instead of range(len(...))
# Non-Pythonic: manual index tracking
fruits = ["apple", "banana", "cherry"]
for i in range(len(fruits)):
    print(f"{i}: {fruits[i]}")
# Pythonic: enumerate gives you index AND value
for i, fruit in enumerate(fruits):
    print(f"{i}: {fruit}")
# enumerate starts at any offset
for i, fruit in enumerate(fruits, start=1):
    print(f"{i}: {fruit}")
enumerate returns (index, value) tuples. It eliminates the [i] noise and makes it explicit that you need both the position and the element. Use it whenever you need the index alongside the value.
2. Pairing Two Iterables with zip
# Non-Pythonic: index-based parallel iteration
names = ["Alice", "Bob", "Carol"]
scores = [92, 87, 95]
for i in range(len(names)):
    print(f"{names[i]}: {scores[i]}")
# Pythonic: zip pairs them automatically
for name, score in zip(names, scores):
    print(f"{name}: {score}")
zip produces tuples of corresponding elements and stops at the shorter iterable. Use itertools.zip_longest if you need to handle mismatched lengths. In Python 3.10+, zip gained a strict=True parameter that raises a ValueError if the iterables have different lengths - useful for catching data alignment bugs.
3. List, Dict, and Set Comprehensions
# Non-Pythonic: accumulator pattern
squares = []
for n in range(10):
    squares.append(n ** 2)
# Pythonic: list comprehension
squares = [n ** 2 for n in range(10)]
# Dict comprehension: {key: value for item in iterable}
word_lengths = {word: len(word) for word in ["python", "java", "go"]}
# Set comprehension: unique values only
emails = ["ann@example.com", "bob@example.org", "cat@example.com"]
unique_domains = {email.split("@")[1] for email in emails}
# Filter with a condition
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]
Comprehensions are not just syntactic sugar - they have their own scope in Python 3, and CPython compiles them to faster bytecode than the equivalent loop (covered in the Deep Dive section).
4. Extended Unpacking and Star Expressions
# Non-Pythonic: slicing to extract parts
data = [1, 2, 3, 4, 5]
first = data[0]
rest = data[1:]
last = data[-1]
middle = data[1:-1]
# Pythonic: unpacking with star expressions
first, *rest = data # first=1, rest=[2,3,4,5]
*init, last = data # init=[1,2,3,4], last=5
first, *middle, last = data # first=1, middle=[2,3,4], last=5
# Swap without a temp variable
a, b = 10, 20
a, b = b, a # a=20, b=10
Tuple unpacking is one of Python's most expressive features. The * syntax allows flexible extraction of head, tail, and interior elements without any slicing arithmetic.
5. The Walrus Operator (:=) for Assignment Expressions
Introduced in Python 3.8, the walrus operator assigns a value inside an expression - most useful in while loops and list comprehensions that would otherwise compute the same value twice.
# Non-Pythonic: compute once, assign, then check
import re
line = "Error: disk full on /dev/sda1"
match = re.search(r"Error: (.+)", line)
if match:
    print(match.group(1))
# Pythonic: assign and test in one expression
if match := re.search(r"Error: (.+)", line):
    print(match.group(1))
# Also useful in while loops reading streams
import io
data = io.BytesIO(b"hello world")
while chunk := data.read(4):
    print(chunk)
The walrus operator eliminates the pattern of computing a value into a throwaway variable just to check it. Use it when the assignment and the conditional test are semantically the same operation.
6. Context Managers with with
# Non-Pythonic: manual resource management
f = open("data.txt", "r")
try:
    contents = f.read()
finally:
    f.close()
# Pythonic: context manager handles open and close
with open("data.txt", "r") as f:
    contents = f.read()
# Multiple context managers in one with statement
with open("input.txt") as src, open("output.txt", "w") as dst:
    dst.write(src.read())
Context managers guarantee cleanup even when exceptions occur. The with statement calls __enter__ on entry and __exit__ on exit (including on exceptions). This pattern applies to database connections, threading locks, network sockets, temporary directories - any resource that must be released.
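Writing your own context manager takes a few lines with contextlib. A minimal sketch - the timed() helper here is a hypothetical example, not a stdlib function - showing the same enter/exit protocol that open() implements:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    """Hypothetical timing context manager: prints elapsed time on exit."""
    start = time.perf_counter()
    try:
        yield  # the body of the with block runs here
    finally:
        # runs on normal exit AND when the body raises
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f}s")

with timed("sum of squares"):
    total = sum(n ** 2 for n in range(100_000))
```

The try/finally around yield is what gives the guarantee: cleanup runs even if the with body raises, exactly as __exit__ would.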
7. Truthiness Checks Instead of Explicit Comparisons
# Non-Pythonic: explicit comparison to None/True/empty
if name != None and name != "":
    print(name)
if len(items) > 0:
    process(items)
if flag == True:
    do_something()
# Pythonic: use truthiness
if name:
    print(name)
if items:
    process(items)
if flag:
    do_something()
In Python, the following values are all falsy: None, 0, 0.0, "", [], {}, set(), and any object whose __bool__ returns False. Everything else is truthy. Testing truthiness directly is more concise and handles None, empty strings, and empty collections in a single check.
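The falsy values can be checked directly, and custom classes opt into truthiness by defining __bool__. A short sketch (the Basket class is a made-up example):

```python
# Every one of these is falsy
for value in (None, 0, 0.0, "", [], {}, set(), ()):
    assert not value

# Custom objects define their own truthiness via __bool__
class Basket:
    def __init__(self, items):
        self.items = items

    def __bool__(self):
        # An empty basket behaves like an empty list in an if statement
        return bool(self.items)

assert not Basket([])        # empty basket is falsy
assert Basket(["apple"])     # non-empty basket is truthy
```

This is why `if items:` works for your own container types too, as long as they implement __bool__ (or __len__, which Python falls back to).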
8. dict.get() Instead of Catching KeyError
# Non-Pythonic: try/except for a missing key
try:
    value = config["timeout"]
except KeyError:
    value = 30
# Pythonic: dict.get() with a default
value = config.get("timeout", 30)
# For nested dicts, chain get calls
host = config.get("database", {}).get("host", "localhost")
dict.get(key, default) is the canonical way to read a potentially-absent key. It is more readable, requires no exception-handling boilerplate, and avoids the cost of raising and catching a KeyError when the key is missing.
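dict.get() is for reading; when you are accumulating into a dict, collections.defaultdict often expresses the intent more directly. A sketch contrasting the two (the grouping example is illustrative):

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# dict.get() works for accumulation, but repeats the default on every write
by_letter = {}
for word in words:
    by_letter[word[0]] = by_letter.get(word[0], []) + [word]

# defaultdict says it once: missing keys start as an empty list
grouped = defaultdict(list)
for word in words:
    grouped[word[0]].append(word)

assert dict(grouped) == by_letter
```

Rule of thumb: get() for lookups with a fallback, defaultdict (or Counter) for building up a mapping in a loop.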
9. f-Strings for String Formatting
# Non-Pythonic: string concatenation
name = "Alice"
score = 95
msg = "User " + name + " scored " + str(score) + " points."
# Also non-Pythonic: old %-formatting or .format()
msg = "User %s scored %d points." % (name, score)
msg = "User {} scored {} points.".format(name, score)
# Pythonic: f-strings (Python 3.6+)
msg = f"User {name} scored {score} points."
# f-strings support expressions and formatting specs
pi = 3.14159
print(f"Pi to 2 decimals: {pi:.2f}")
print(f"Debug: {name=}") # prints: name='Alice'
f-strings are parsed at compile time and are the fastest string interpolation mechanism in Python. They also support full expressions inside {}, format specifiers, and the = specifier for debug output.
10. Generator Expressions for Memory-Efficient Pipelines
# Non-Pythonic: build a full list just to sum it
total = sum([x ** 2 for x in range(1_000_000)])
# Pythonic: generator expression - no intermediate list
total = sum(x ** 2 for x in range(1_000_000))
# Generators are lazy: values computed only when consumed
large_log = (line.strip() for line in open("app.log"))
errors = (line for line in large_log if "ERROR" in line)
for error in errors:
    print(error)
A list comprehension builds the entire result in memory. A generator expression (() instead of []) creates a lazy iterator that produces values one at a time. For large datasets or pipelines, this is the difference between constant memory usage and memory that grows with input size.
Under the Hood: Why Python Comprehensions Are Faster and What Bytecode Reveals
The Internals of Python Comprehensions
When CPython compiles a list comprehension, it generates bytecode that is fundamentally different from a for loop with .append(). Understanding this helps you make better decisions about when to use each form.
Consider this simple case:
# Loop with append
result = []
for x in range(5):
    result.append(x * 2)
# List comprehension
result = [x * 2 for x in range(5)]
Using the dis module to inspect the bytecode of each reveals the key difference:
import dis
def loop_version():
    result = []
    for x in range(5):
        result.append(x * 2)
    return result

def comp_version():
    return [x * 2 for x in range(5)]

dis.dis(loop_version)
dis.dis(comp_version)
The loop version must look up result.append on every iteration - a local-variable load (LOAD_FAST result), then an attribute lookup (.append), then a function call. CPython's attribute lookup goes through the descriptor protocol and a dictionary probe every single time.
The comprehension version uses the LIST_APPEND opcode directly. LIST_APPEND is a single C-level operation that bypasses the attribute lookup entirely. For a loop of N iterations, the comprehension saves N attribute lookups.
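You can verify the LIST_APPEND claim mechanically. A sketch that collects opcode names recursively, so it works whether the comprehension compiles to a nested code object (Python 3.11 and earlier) or is inlined into the enclosing function (PEP 709, Python 3.12+):

```python
import dis

def loop_version():
    result = []
    for x in range(5):
        result.append(x * 2)
    return result

def comp_version():
    return [x * 2 for x in range(5)]

def opnames(code):
    """Opcode names of a code object plus any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested code object, e.g. a comprehension
            names |= opnames(const)
    return names

print("LIST_APPEND" in opnames(loop_version.__code__))  # False - calls list.append
print("LIST_APPEND" in opnames(comp_version.__code__))  # True - dedicated opcode
```

Exact opcode sets vary between CPython versions, but the LIST_APPEND distinction between the two forms holds across them.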
There is also a scope difference that trips up many developers. In Python 3, list comprehensions have their own scope:
x = 100
result = [x for x in range(5)] # x here is a comprehension-local variable
print(x) # prints 100, not 4 - the comprehension did not leak x
In Python 2, the loop variable would have leaked. Python 3 corrected this by giving comprehensions their own scope (a separate code object up to 3.11, inlined into the enclosing frame since Python 3.12), which also contributes to the bytecode efficiency: local variable access is faster than global.
Performance Analysis: Comprehension vs Loop vs map vs Generator
The performance hierarchy for transforming a collection in Python follows a consistent pattern that holds across Python 3.8-3.12:
| Approach | Relative Speed | Memory Usage | When to Use |
| --- | --- | --- | --- |
| List comprehension | Fastest for materialised lists | O(n) - full list in memory | Default choice for in-memory transformation |
| Generator expression | Slightly slower per-item | O(1) - lazy | Large sequences, pipeline stages, passed to sum/any/all |
| map() with lambda | Similar to comprehension | O(1) - lazy iterator | Rarely preferred; use comprehension for readability |
| For loop with .append() | Slowest | O(n) - full list | When you need side effects or multi-statement bodies |
| map() with built-in function | Fastest for built-in transforms | O(1) - lazy | map(str, numbers) beats [str(n) for n in numbers] |
A quick benchmark on Python 3.11 transforming 1,000,000 integers:
import timeit

# List comprehension
t1 = timeit.timeit("[x * 2 for x in range(1_000_000)]", number=10)

# for loop with append
loop_stmt = "result = []\nfor x in range(1_000_000): result.append(x * 2)"
t2 = timeit.timeit(loop_stmt, number=10)

# map with lambda
t3 = timeit.timeit("list(map(lambda x: x * 2, range(1_000_000)))", number=10)

# map with operator.mul (often the fastest)
t4 = timeit.timeit(
    "list(map(operator.mul, range(1_000_000), [2] * 1_000_000))",
    setup="import operator",
    number=10,
)

print(f"Comprehension:  {t1:.2f}s")
print(f"Loop + append:  {t2:.2f}s")
print(f"map + lambda:   {t3:.2f}s")
print(f"map + operator: {t4:.2f}s")
Typical results: the list comprehension runs about 15-20% faster than the equivalent loop. The difference grows proportionally with loop length because the attribute lookup overhead is paid once per iteration. For a 10-element list the difference is irrelevant; for a 10-million-element transformation it is measurable.
Bottlenecks to watch for:
- Nested comprehensions - [[f(x) for x in row] for row in matrix] is fine, but three levels deep becomes a CPU-cache and readability problem simultaneously.
- Comprehensions with expensive guards - [f(x) for x in data if expensive_check(x)] calls expensive_check once per item before f. Consider pre-filtering or caching.
- Generator + list() wrapping - list(x for x in data) is always slower than [x for x in data] because the generator object creation and list() call add overhead. Use generators only when you intend lazy evaluation.
Should You Use a Comprehension or a For Loop? A Decision Flow
Choosing between a comprehension and a for loop is not always obvious. The decision turns on four questions: whether you need a materialised collection, whether the body has side effects, whether the logic is deeply nested, and whether memory efficiency matters. The flowchart below captures the complete decision path.
flowchart TD
    A[Need to produce a list, set, or dict from an iterable?] -->|Yes| B{Does the body produce side effects?}
    A -->|No| C[Use a for loop directly]
    B -->|Yes| C
    B -->|No| D{Is the logic more than two conditions deep?}
    D -->|Yes| E[Use a for loop for readability]
    D -->|No| F{Does it fit on one readable line?}
    F -->|No| E
    F -->|Yes| G{Will the result be consumed once or is the input large?}
    G -->|Yes| H[Use a generator expression]
    G -->|No| I[Use a list, set, or dict comprehension]
Read this flowchart from top to bottom, choosing the branch that matches your situation. The key insight is that comprehensions are the right default for simple, side-effect-free transformations of moderate size, and generators become the right default the moment memory becomes a concern or the result feeds into a single consuming expression like sum(), any(), or max(). If the body of your loop does anything beyond computing a value - logging, mutating external state, printing, appending to multiple lists - a regular for loop is clearer and more appropriate.
Idiomatic Python in the Wild: How Flask, requests, and Django Use These Patterns
The Python ecosystem's most-loved libraries are themselves textbooks of idiomatic style. Reading their source code is one of the fastest ways to internalise these patterns.
Flask โ context managers for application setup:
# Flask's test client uses a context manager
from flask import Flask
app = Flask(__name__)
with app.test_client() as client:
    response = client.get("/api/users")
    assert response.status_code == 200
Flask's app.test_client() returns a context manager that sets up and tears down the test request context automatically. The same pattern appears in app.app_context() and app.test_request_context().
requests โ zip and dict comprehensions in header processing:
import requests
# requests uses dict comprehensions internally for header normalisation
# In user code, zip pairs up query parameter names and values cleanly
params = dict(zip(["page", "per_page", "sort"], [1, 20, "created_at"]))
response = requests.get("https://api.example.com/posts", params=params)
Django ORM โ truthiness and dict.get() in view logic:
from django.http import JsonResponse
def user_profile(request, user_id):
try:
user = User.objects.get(pk=user_id)
except User.DoesNotExist:
return JsonResponse({"error": "Not found"}, status=404)
# dict.get() with defaults for optional profile fields
profile_data = {
"name": user.get_full_name() or user.username,
"bio": getattr(user, "bio", ""),
"active": user.is_active, # truthiness used in template
}
return JsonResponse(profile_data)
Django's ORM methods like .filter(), .values(), and .annotate() all return lazy querysets - the generator-expression philosophy applied at the database layer. Data is only fetched when you iterate, slice, or call .count().
The pathlib module โ comprehension-based file discovery:
from pathlib import Path
# Find all Python files in a project tree (generator over pathlib glob)
python_files = [p for p in Path(".").rglob("*.py") if not p.name.startswith("_")]
# Walrus operator: capture the size while filtering on it
large_files = [
    (p, size)
    for p in Path(".").rglob("*")
    if p.is_file() and (size := p.stat().st_size) > 1_000_000
]
When Pythonic Style Becomes a Liability
Idiomatic Python is a tool, not a religion. Several of these idioms have well-known failure modes when applied beyond their intended scope.
Nested comprehensions collapse readability past two levels. A nested list comprehension like [cell for row in matrix for cell in row] is fine and common. Three levels deep - [f(x) for block in file for line in block for x in line.split()] - requires the reader to mentally unroll three for clauses in an order that does not match top-to-bottom reading. Switch to named loops.
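The refactor from three for-clauses to named loops preserves behaviour exactly. A minimal sketch with made-up sample data (blocks of lines of space-separated numbers):

```python
# Hypothetical input: a "file" of blocks, each block a list of lines
file_blocks = [
    ["1 2", "3 4"],
    ["5 6"],
]

# Three levels of for-clauses: legal, but read in an unintuitive order
flat = [int(x) for block in file_blocks for line in block for x in line.split()]

# Named loops: same result, top-to-bottom reading order
values = []
for block in file_blocks:
    for line in block:
        for x in line.split():
            values.append(int(x))

assert flat == values == [1, 2, 3, 4, 5, 6]
```

The for-clauses in a comprehension nest left to right, same as the loop version, but the result expression sits at the front, forcing the reader to jump back.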
The walrus operator creates cognitive debt when overused. The := operator is excellent in while/if guard conditions. Using it inside a comprehension filter that already has a transformation creates a line that is semantically dense in two directions at once:
# Pushing walrus too far - technically correct, mentally expensive
results = [y for x in data if (y := expensive(x)) is not None]
# Better: name the intermediate step explicitly
processed = (expensive(x) for x in data)
results = [y for y in processed if y is not None]
Truthiness checks can mask None vs empty-string bugs. if name: passes when name is any non-empty string. If your logic should distinguish None (field not provided) from "" (field explicitly cleared), use explicit if name is not None: instead.
Generator expressions are not automatically a win for small inputs. A generator adds object-creation overhead. For a 5-element list, list(x*2 for x in items) is slower than [x*2 for x in items] because allocating the generator object costs more than the list appends it saves. The memory savings only materialise for sequences large enough that the O(n) list allocation is the bottleneck.
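You can measure the small-input overhead yourself with timeit; a sketch (absolute numbers vary by machine and Python version, so no specific timings are claimed):

```python
import timeit

# Tiny input, many repetitions: compare list comp vs generator + list()
comp = timeit.timeit(
    "[x * 2 for x in items]",
    setup="items = [1, 2, 3, 4, 5]",
    number=100_000,
)
gen = timeit.timeit(
    "list(x * 2 for x in items)",
    setup="items = [1, 2, 3, 4, 5]",
    number=100_000,
)
print(f"list comp:  {comp:.3f}s")
print(f"gen + list: {gen:.3f}s")
```

On typical CPython builds the generator form loses at this size; rerun with range(1_000_000) and the gap narrows because the per-element work dominates the fixed setup cost.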
Non-Pythonic to Pythonic: A Quick-Reference Conversion Guide
| Non-Pythonic Pattern | Pythonic Equivalent | Performance Note |
| --- | --- | --- |
| for i in range(len(arr)): | for i, v in enumerate(arr): | Same O(n); cleaner access, no repeated indexing |
| if x == True: | if x: | One fewer comparison |
| if x == None: | if x is None: | is tests identity, not equality - correct for None |
| result = []; for x in ...: result.append(f(x)) | result = [f(x) for x in ...] | ~15-20% faster due to LIST_APPEND opcode |
| "Hello " + name + "!" | f"Hello {name}!" | f-strings are compiled; concatenation allocates N-1 temps |
| try: v = d[k] except KeyError: v = default | v = d.get(k, default) | Single dict probe vs exception setup |
| open(f); ... ; close(f) | with open(f) as fh: | Guarantees close on exception |
| for x in data: if cond: result.append(x) | [x for x in data if cond] | ~15% faster for filtering |
| [f(x) for x in range(N)] passed to sum() | sum(f(x) for x in range(N)) | O(1) memory vs O(N) |
| zip(a, b) without strict= on Python 3.10+ | zip(a, b, strict=True) | Catches length mismatch as ValueError |
Three Refactoring Exercises: From Java-Style Python to Idiomatic Python
These three exercises show complete, runnable before/after pairs. Each demonstrates a cluster of idioms working together. As you read, notice how the Pythonic version is not just shorter but expresses the programmer's intent more directly: the structure of the code matches the structure of the problem.
Exercise 1: Inventory Report
The goal is to read a list of product dictionaries and return a formatted report string for all in-stock items whose price exceeds a threshold.
# === BEFORE: Java-style Python ===
def generate_report(products, min_price):
    report_lines = []
    for i in range(len(products)):
        product = products[i]
        if product["in_stock"] == True and product["price"] > min_price:
            line = product["name"] + ": $" + str(product["price"])
            report_lines.append(line)
    result = ""
    for j in range(len(report_lines)):
        result = result + report_lines[j]
        if j < len(report_lines) - 1:
            result = result + "\n"
    return result
# === AFTER: Pythonic Python ===
def generate_report(products: list[dict], min_price: float) -> str:
    lines = [
        f"{p['name']}: ${p['price']:.2f}"
        for p in products
        if p.get("in_stock") and p.get("price", 0) > min_price
    ]
    return "\n".join(lines)
# Test both versions
inventory = [
    {"name": "Widget", "price": 9.99, "in_stock": True},
    {"name": "Gadget", "price": 24.99, "in_stock": True},
    {"name": "Doohickey", "price": 5.00, "in_stock": False},
    {"name": "Thingamajig", "price": 49.99, "in_stock": True},
]
print(generate_report(inventory, 10.0))
# Gadget: $24.99
# Thingamajig: $49.99
The Pythonic version uses: list comprehension with filter, dict.get() for safe key access, f-strings with a format specifier (:.2f), and str.join() instead of manual concatenation with index boundary checks.
Exercise 2: Log File Parser with Walrus and Generator
Parse a large log file, extract error messages, and count them by error code - without loading the entire file into memory.
# === BEFORE: memory-heavy, index-based ===
def count_errors(filepath):
    lines = open(filepath).readlines()  # loads entire file!
    error_counts = {}
    for i in range(len(lines)):
        line = lines[i].strip()
        if "ERROR" in line:
            parts = line.split(":")
            if len(parts) >= 2:
                code = parts[1].strip().split()[0]
                if code in error_counts:
                    error_counts[code] = error_counts[code] + 1
                else:
                    error_counts[code] = 1
    return error_counts
# === AFTER: lazy generator + Counter + walrus ===
import re
from collections import Counter

def count_errors(filepath: str) -> dict[str, int]:
    pattern = re.compile(r"ERROR:\s*(\w+)")
    with open(filepath) as f:
        codes = (
            match.group(1)
            for line in f
            if (match := pattern.search(line))
        )
        return dict(Counter(codes))  # consume the generator while the file is open
# Create a small test file and verify
import io
fake_log = io.StringIO(
    "INFO: server started\n"
    "ERROR: DISK_FULL writing to /var/log\n"
    "ERROR: CONN_TIMEOUT after 30s\n"
    "ERROR: DISK_FULL again\n"
    "INFO: cleanup complete\n"
)
pattern = re.compile(r"ERROR:\s*(\w+)")
codes = (match.group(1) for line in fake_log if (match := pattern.search(line)))
print(dict(Counter(codes)))  # {'DISK_FULL': 2, 'CONN_TIMEOUT': 1}
The Pythonic version uses: context manager for file handling, walrus operator to assign and test the regex match in one expression, a generator expression so the file is processed line-by-line without loading it all, and Counter from the standard library instead of manual dictionary increment logic.
Exercise 3: Pairing and Transforming Parallel Datasets
Given two parallel lists (user IDs and raw scores), produce a sorted list of (username, grade) tuples where grade is a letter based on score, skipping any user not found in the user lookup dictionary.
# === BEFORE: nested indexing, manual grading ===
def pair_and_grade(user_ids, scores, user_lookup):
    result = []
    for i in range(len(user_ids)):
        uid = user_ids[i]
        score = scores[i]
        if uid in user_lookup:
            name = user_lookup[uid]
            if score >= 90:
                grade = "A"
            elif score >= 80:
                grade = "B"
            elif score >= 70:
                grade = "C"
            else:
                grade = "F"
            result.append((name, grade))
    result.sort(key=lambda t: t[0])
    return result
# === AFTER: zip, dict.get, comprehension, ternary ===
def pair_and_grade(
    user_ids: list[str],
    scores: list[int],
    user_lookup: dict[str, str],
) -> list[tuple[str, str]]:
    def letter_grade(score: int) -> str:
        return "A" if score >= 90 else "B" if score >= 80 else "C" if score >= 70 else "F"

    pairs = [
        (user_lookup[uid], letter_grade(score))
        for uid, score in zip(user_ids, scores)
        if uid in user_lookup
    ]
    return sorted(pairs, key=lambda t: t[0])
# Test
ids = ["u1", "u2", "u3", "u99"]
raw = [95, 72, 88, 61]
lookup = {"u1": "Alice", "u2": "Bob", "u3": "Carol"}
print(pair_and_grade(ids, raw, lookup))
# [('Alice', 'A'), ('Bob', 'C'), ('Carol', 'B')]
The Pythonic version uses: zip to pair parallel lists, dict.__contains__ (the in operator) for safe key-existence checks, a comprehension that filters and transforms in one pass, a helper function for the grading logic (keeping the comprehension at one level of complexity), and sorted() with a lambda key instead of in-place .sort() (which also works, but sorted() is more composable).
pylint, flake8, and ruff: How the Python Community Enforces Idiomatic Style
Pythonic style is not just convention - it is enforced at scale by the community's linting ecosystem. Three tools dominate Python projects today, each with a different design philosophy.
pylint performs deep semantic analysis and catches both style issues and logic bugs. It is the most thorough but also the slowest, and its output can be overwhelming on a first pass. Enable only the rules relevant to Pythonic idioms:
# .pylintrc - focus on style-relevant rules
[MESSAGES CONTROL]
enable =
    consider-using-enumerate,                  # flags range(len(...))
    consider-using-dict-items,                 # flags d.keys() when you need both k and v
    use-implicit-booleaness-not-len,           # flags len(x) > 0 instead of x
    use-implicit-booleaness-not-comparison,    # flags == True / == False
    consider-using-f-string,                   # flags % and .format() formatting
    consider-using-with,                       # flags manual open/close without with
flake8 focuses on PEP 8 compliance and common errors. Extend it with flake8-comprehensions for comprehension-specific idiom checks:
# setup.cfg or .flake8
[flake8]
max-line-length = 99
extend-select = C4 # flake8-comprehensions plugin rules
# C400: rewrite list() call as comprehension
# C401: rewrite set() call as comprehension
# C407: rewrite sum(list comprehension) as sum(generator)
# C416: unnecessary list comprehension โ use list() directly
ruff is the modern replacement for both. Written in Rust, it runs 10-100x faster than flake8 and implements over 500 rules including the full set of pycodestyle, pyflakes, flake8-comprehensions, pylint, and more:
# pyproject.toml
[tool.ruff]
line-length = 99
select = [
    "E",    # pycodestyle errors
    "F",    # pyflakes
    "C4",   # flake8-comprehensions (comprehension idioms)
    "B",    # flake8-bugbear (common bugs)
    "SIM",  # flake8-simplify (simplification idioms including walrus suggestions)
    "UP",   # pyupgrade (flags old-style formatting, range(len), etc.)
]
Run ruff on a file and it will flag every range(len(...)), every == True, every %-style formatting pattern, and every list comprehension wrapped in list(). For a team adopting Pythonic style, configuring ruff in CI is the fastest way to enforce consistency without code review friction.
For a full deep-dive on Python linting and code quality tooling, a companion post on configuring ruff, mypy, and pre-commit hooks together is planned as a follow-up in this series.
Lessons Learned From Reviewing Thousands of Lines of Python
These observations come from the experience of reviewing Python codebases ranging from solo scripts to multi-million-line production systems.
The most common mistake is not learning from the first code review. Developers who receive the range(len(...)) comment once and fix it mechanically, without understanding why, repeat the same pattern in a slightly different context a week later. Understanding the Pythonic alternative - not just the surface form but why it exists - is what separates developers who improve from developers who accumulate a personal debt of linting suppressions.
Comprehensions are the entry point, walrus is the graduation test. Most developers learn comprehensions quickly because the transformation from loop-to-comprehension is mechanical. The walrus operator reveals deeper fluency because it requires understanding that Python expressions can have side effects (assignment), which is unusual in a language that usually separates assignment from evaluation cleanly.
Context managers are underused outside of file I/O. Every developer knows with open(...), but the same pattern applies to any resource: threading.Lock(), unittest.mock.patch(), psycopg2.connect(), tempfile.TemporaryDirectory(), and custom contextlib.contextmanager-decorated functions. Reaching for with whenever you acquire and release a resource is a habit that eliminates entire categories of resource-leak bugs.
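The threading.Lock case is worth seeing concretely, since a forgotten release deadlocks the program. A minimal sketch with four threads incrementing a shared counter:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:  # acquire on entry, release on exit - even if the body raises
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 - no lost updates
```

The manual equivalent (lock.acquire() ... lock.release()) leaks the lock if the body raises between the two calls; with handles that the same way it handles file close.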
Generator expressions require you to think about "consumed once" vs "reused." A generator is an iterator - once exhausted, it is empty. Passing a generator to two different functions will give the second function nothing. This surprises developers who treat generator expressions like lazy lists. When in doubt, materialise with list() unless you specifically need lazy evaluation or infinite sequences.
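The exhaustion behaviour is easy to demonstrate in two lines:

```python
squares = (n ** 2 for n in range(5))

print(sum(squares))  # 30 - consumes the generator
print(sum(squares))  # 0  - already exhausted, nothing left to yield

# Materialise when the result is needed more than once
squares_list = [n ** 2 for n in range(5)]
print(sum(squares_list))  # 30
print(sum(squares_list))  # 30 - lists can be re-iterated freely
```

The second sum() over the generator is not an error; it just iterates over nothing, which makes this bug silent and easy to miss.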
dict.get() is the single highest-ROI idiom for reducing line count. In most business logic code, the pattern of reading a potentially-absent key with a fallback appears many more times than any other pattern. Replacing every try/except KeyError and if key in d: ... else: block with d.get(key, default) typically reduces line count by 20โ30% in view and configuration code.
Key Takeaways: The Pythonic Mindset in One Page
The ten idioms that every Python developer should reach for automatically:
1. `enumerate` – use when you need both the index and the value during iteration
2. `zip` – use to pair two parallel sequences without index arithmetic
3. Comprehensions – default choice for building lists, dicts, and sets from iterables
4. Extended unpacking – use `*rest` to extract head, tail, or interior without slicing
5. Walrus operator – assign-and-test in `if`/`while`/comprehension guards
6. Context managers – use `with` for any resource that must be released
7. Truthiness – test `if x:` rather than `if x is not None and x != ""`
8. `dict.get()` – default return without try/except for missing keys
9. f-strings – the only string formatting you should reach for in Python 3.6+
10. Generator expressions – lazy evaluation for large data or single-pass consumption
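Several of these idioms compose naturally in a few lines; a compact demonstration on hypothetical data:

```python
users = [
    {"name": "alice", "active": True},
    {"name": "bob", "active": False},
]

# Comprehension + truthiness + dict.get in one line: the original
# six-line PR example from the intro becomes a single expression.
active_names = [u["name"] for u in users if u.get("active")]
print(active_names)  # ['alice']

# enumerate + zip + f-strings: paired iteration with a 1-based index,
# no manual index arithmetic anywhere.
labels = ["first", "second"]
for i, (label, user) in enumerate(zip(labels, users), start=1):
    print(f"{i}. {label}: {user['name']}")
```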
The meta-skill behind all of them: read the standard library. Every built-in, every stdlib function, every language feature was designed with an idiomatic use in mind. When you find yourself writing boilerplate, the standard library almost always has a cleaner form waiting.
Practice Quiz: Test Your Pythonic Fluency
1. What is wrong with `for i in range(len(items)):` and what should you use instead?

   A) Nothing is wrong with it – it is valid Python
   B) It is slower than a while loop and should be replaced with `while`
   C) It creates an unnecessary index variable; use `for item in items:` or `for i, item in enumerate(items):`
   D) Python's `range()` does not accept `len()` as an argument

   Correct Answer: C. `range(len(...))` forces you to manually index the list on every iteration. When you need the element only, `for item in items:` is clearer. When you need both the index and the element, `enumerate` provides both without manual indexing.

2. You have a dictionary `config` and want the value at key `"timeout"`, or `30` if it is absent. Which is the Pythonic form?

   A) `config["timeout"] if "timeout" in config else 30`
   B) `try: v = config["timeout"] except KeyError: v = 30`
   C) `v = config.get("timeout", 30)`
   D) `v = config.setdefault("timeout", 30)`

   Correct Answer: C. `dict.get(key, default)` is the canonical one-liner for this pattern. Option D (`setdefault`) also returns the default, but it writes the default back into the dictionary, a side effect you rarely want when you are just reading.

3. Which of the following correctly uses a generator expression instead of a list comprehension to avoid building an intermediate list?

   A) `total = sum([x ** 2 for x in data])`
   B) `total = sum(x ** 2 for x in data)`
   C) `total = sum(list(x ** 2 for x in data))`
   D) `total = sum({x ** 2 for x in data})`

   Correct Answer: B. Passing a generator expression directly to `sum()` means values are computed one at a time without ever building an O(n) list in memory. Option A builds the full list first. Option C wraps the generator in `list()`, defeating the purpose. Option D uses a set comprehension, which removes duplicates (incorrect) and builds O(n) memory.

4. What does the walrus operator (`:=`) do, and in which Python version was it introduced?

   A) It is the augmented assignment operator for walrus (marine mammal) data types; Python 2.7
   B) It assigns a value to a variable as part of an expression, allowing assignment inside `if`, `while`, and comprehension conditions; Python 3.8
   C) It is the deep-copy assignment operator, equivalent to `import copy; copy.deepcopy()`; Python 3.6
   D) It merges two dictionaries in place, introduced alongside the `|` operator for dicts; Python 3.9

   Correct Answer: B. The walrus operator (PEP 572, Python 3.8) assigns a value to a name inside an expression. Its most common uses are assigning the result of a condition before testing it (`if match := re.search(...)`) and consuming a stream in a `while` loop (`while chunk := f.read(4096)`).

5. A colleague writes `if len(items) > 0:` to check whether a list is non-empty. How would you suggest they improve this?

   A) Use `if items != []:` to be explicit about comparing to an empty list
   B) Use `if items:` – Python's truthiness rules make any non-empty sequence truthy
   C) Use `if bool(items):` – wrapping in `bool()` is clearer than implicit truthiness
   D) Use `if items.__len__() > 0:` to call the special method directly

   Correct Answer: B. Any non-empty list, string, dict, set, or tuple is truthy in Python. Testing `if items:` is idiomatic, more readable, and works correctly for all sequence types, including custom objects that implement `__bool__` or `__len__`. Calling `bool()` explicitly adds no clarity.

6. Open-ended challenge: You have a function that reads a CSV file row by row, filters rows where the `status` column equals `"active"`, converts the `value` column to a float, and sums all values. Rewrite it using the most memory-efficient, idiomatic Python approach. What idioms do you reach for, and why? Would your answer change if the CSV had 10 rows versus 100 million rows?
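One possible answer to the open-ended challenge, sketched with an in-memory stand-in for the file (the CSV contents here are hypothetical):

```python
import csv
import io

# Hypothetical CSV contents standing in for a file on disk;
# with a real file you would use `with open(path, newline="") as f:`.
raw = "status,value\nactive,1.5\ninactive,9.9\nactive,2.5\n"
reader = csv.DictReader(io.StringIO(raw))

# A generator expression fed straight into sum(): each row is read,
# filtered, and converted one at a time, so memory stays O(1)
# whether the file has 10 rows or 100 million.
total = sum(
    float(row["value"])
    for row in reader
    if row["status"] == "active"
)
print(total)  # 4.0
```

The idioms at work: a context manager for the file handle, `csv.DictReader` for named column access, and a generator expression instead of a list comprehension. Only the last choice matters at 100 million rows; at 10 rows a list comprehension would be equally fine.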
Further Reading in This Series and Beyond
- Python Basics: Variables, Types, and Control Flow – Foundation post: if you are still building the mental model of how Python variables differ from Java's, start here before applying the idioms above.
- Exploring the Strategy Design Pattern: Simplifying Software Design – Many Pythonic patterns (callable objects, duck typing, comprehensions over polymorphism) directly parallel the Strategy pattern. Understanding why the pattern exists in statically typed OOP languages illuminates why Python often doesn't need it.
- Java Memory Model Demystified: Key Concepts and Usage – If you came to Python from Java, understanding why the JMM exists and how it contrasts with Python's GIL helps explain why Python concurrency idioms (`asyncio`, `multiprocessing`) look so different from Java's `synchronized` and `volatile`.