“Pythonic” code does things the way the language makes easiest: list comprehensions instead of manual loops, unpacking instead of indexing, with instead of try/finally, generators instead of eager lists when the whole sequence isn't needed. Pythonic code is usually also more efficient because the idioms map to fast built-in machinery.
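For instance, two of those idioms side by side (a minimal sketch; the temporary directory stands in for any resource that needs cleanup):

```python
from tempfile import TemporaryDirectory

# with instead of try/finally: cleanup runs even if the body raises.
with TemporaryDirectory() as tmp:
    print("working in", tmp)  # the directory is removed on exit

# Generator instead of an eager list when only the aggregate is needed:
# no million-element list is ever materialized in memory.
total = sum(n * n for n in range(1_000_000))
print(total)
```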
The single biggest performance trap is picking the wrong data structure. Looking up items in a 1-million-element list with in is O(n) and painful; converting the list to a set makes membership O(1) on average. Counting duplicates with a manual loop is slow; collections.Counter does the hot counting loop in C and is noticeably faster. Pick the right tool and most performance issues disappear.
The second trap is doing work in Python when a built-in or standard-library function could do it in C. sum, min, max, any, all, sorted, "".join, itertools, and map/filter are all implemented in C. Use them: a hand-written accumulation loop is measurably slower than the equivalent sum(generator), and summing a plain list or range is faster still because the loop never re-enters Python bytecode.
The third trap is premature optimization. Profile first, optimize second. cProfile and timeit identify the real bottlenecks in a few minutes, and the answer often surprises you — the slow line is rarely the one you suspected. Only after measurement should you reach for micro-optimizations, specialized containers, or C extensions.
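A minimal cProfile session might look like this (the function names are made up for illustration):

```python
import cProfile
import pstats

def hot(n):                      # deliberately does real work
    return sum(i * i for i in range(n))

def cold(n):                     # deliberately trivial
    return n

def main():
    for _ in range(200):
        hot(10_000)
        cold(10_000)

profiler = cProfile.Profile()
profiler.runcall(main)
# Sort by cumulative time and show the top entries: hot() dominates.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```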
Pythonic idioms in everyday code
Prefer for x in seq to for i in range(len(seq)). Prefer unpacking (a, b = t) to positional indexing. Prefer ",".join(parts) to building strings with +. Prefer set/dict membership to linear list search. Prefer comprehensions and generators to hand-rolled loops.
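A quick sketch of those preferences (the values are arbitrary):

```python
point = (3, 4)
x, y = point                          # unpacking instead of point[0], point[1]

parts = ["a", "b", "c"]
csv = ",".join(parts)                 # join instead of repeated +

allowed = {"red", "green", "blue"}
print("red" in allowed)               # set membership, not a linear list scan

squares = [n * n for n in range(5)]   # comprehension instead of an append loop

for part in parts:                    # iterate directly, not over range(len(...))
    print(part)
```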
Use EAFP (“easier to ask forgiveness than permission”) when attempting the operation and catching the exception is cheaper than checking first; wrapping a d[k] lookup in try/except KeyError is the typical case. Use LBYL (“look before you leap”) when the check is obvious and cheap.
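Both styles on the same dictionary lookup, as a minimal sketch:

```python
d = {"a": 1}

# EAFP: attempt the operation and handle the failure.
try:
    value = d["b"]
except KeyError:
    value = 0

# LBYL: check first, because the test is obvious and cheap.
value = d["b"] if "b" in d else 0

# For this particular case, dict.get covers both in one call.
value = d.get("b", 0)
print(value)
```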
Benchmark before you optimize
timeit runs a snippet many times (one million by default) and reports the total; timeit.repeat runs several such trials so you can take the minimum. Use it on two candidate implementations to pick the faster one objectively. cProfile profiles a full script; use it to find the hottest function, not to measure a one-liner.
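A sketch of picking between two candidates with timeit.repeat, taking the minimum of several trials (the statements here are illustrative):

```python
import timeit

setup = "items = list(range(1_000))"
candidates = {
    "loop":    "total = 0\nfor x in items:\n    total += x",
    "builtin": "sum(items)",
}
for name, stmt in candidates.items():
    # repeat=5 runs five independent trials; the minimum is the least noisy.
    best = min(timeit.repeat(stmt, setup=setup, number=1_000, repeat=5))
    print(f"{name}: {best * 1e6 / 1_000:.2f} us per call")
```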
Don't tune code without a benchmark that shows the change helped. Complexity class dominates: moving from O(n^2) to O(n) wins by orders of magnitude on big inputs; most micro-tweaks save microseconds no one will notice.
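To see complexity class dominate, compare a quadratic and a linear duplicate check (a toy sketch; the input size is arbitrary):

```python
def has_duplicate_quadratic(items):
    # O(n^2): checks every pair.
    for i, a in enumerate(items):
        if a in items[i + 1:]:
            return True
    return False

def has_duplicate_linear(items):
    # O(n): one pass, remembering what has been seen.
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False

items = list(range(5_000))  # worst case: no duplicates at all
# Both return False, but the quadratic version does ~12.5 million
# comparisons while the linear one does 5,000 set operations.
print(has_duplicate_quadratic(items), has_duplicate_linear(items))
```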
Efficient-Python toolbox
| Tool | Purpose |
|---|---|
| `timeit` module | Micro-benchmark a snippet. |
| `cProfile` module | Function-level CPU profiler. |
| `collections` module | Counter, defaultdict, deque: fast specialized containers. |
| `itertools` module | Lazy combinators implemented in C. |
| `functools.lru_cache` decorator | Memoize pure functions. |
| `heapq` module | Priority queue on a plain list. |
| `bisect` module | O(log n) search in a sorted list. |
| `numpy` library | Vectorized numeric computation. |
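Three of the toolbox entries in action, as a minimal sketch:

```python
import heapq
from bisect import bisect_left
from functools import lru_cache

@lru_cache(maxsize=None)        # memoize: each fib(n) is computed once
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))                 # instant thanks to caching

sorted_data = list(range(0, 100, 2))
i = bisect_left(sorted_data, 42)          # O(log n) search in a sorted list
print(sorted_data[i] == 42)               # True

heap = [5, 1, 4, 2]
heapq.heapify(heap)                       # min-heap on a plain list, O(n)
print(heapq.heappop(heap))                # 1: smallest element first
```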
Writing Efficient and Pythonic Code: code example
The script benchmarks slow vs Pythonic approaches to the same tasks and prints the per-call times.
```python
# Lesson: Writing Efficient and Pythonic Code
import timeit
from collections import Counter
# Task 1: sum squares of 1..n
def slow_sum_squares(n: int) -> int:
total = 0
for i in range(1, n + 1):
total += i * i
return total
def fast_sum_squares(n: int) -> int:
return sum(i * i for i in range(1, n + 1))
# Task 2: count words
text = "red blue red green blue red yellow blue red"
def slow_count(text: str) -> dict[str, int]:
counts: dict[str, int] = {}
for w in text.split():
counts[w] = counts.get(w, 0) + 1
return counts
def fast_count(text: str) -> dict[str, int]:
return dict(Counter(text.split()))
# Task 3: membership
data = list(range(10_000))
needle = 9_999
def slow_member(data, needle):
return needle in data # O(n) on a list
def fast_member(data_set, needle):
return needle in data_set # O(1) on a set
data_set = set(data)
# Benchmarks
print("sum_squares:")
print(f" slow: {timeit.timeit(lambda: slow_sum_squares(5000), number=50)*1000/50:.2f} ms")
print(f" fast: {timeit.timeit(lambda: fast_sum_squares(5000), number=50)*1000/50:.2f} ms")
print("word_count:")
print(f" slow: {timeit.timeit(lambda: slow_count(text), number=10_000)*1000/10_000:.3f} ms")
print(f" fast: {timeit.timeit(lambda: fast_count(text), number=10_000)*1000/10_000:.3f} ms")
print("membership:")
print(f" slow: {timeit.timeit(lambda: slow_member(data, needle), number=1000)*1000/1000:.3f} ms")
print(f" fast: {timeit.timeit(lambda: fast_member(data_set, needle), number=1000)*1000/1000:.4f} ms")
assert slow_sum_squares(100) == fast_sum_squares(100)
assert slow_count(text) == fast_count(text)
```
Three typical upgrades:
1) Replace a for-loop + accumulator with `sum(generator)` — measurably faster.
2) Swap hand-rolled counting for `collections.Counter`.
3) Convert a list to a set for O(1) membership checks.
Micro-benchmark a small tuning candidate: string concatenation with + versus "".join.
```python
import timeit
def concat_plus(items):
s = ""
for x in items:
s += x
return s
def concat_join(items):
return "".join(items)
items = ["py"] * 10_000
print(f"plus: {timeit.timeit(lambda: concat_plus(items), number=50)*1000/50:.2f} ms")
print(f"join: {timeit.timeit(lambda: concat_join(items), number=50)*1000/50:.2f} ms")
```
Sanity checks on both implementations.
```python
from collections import Counter
def slow(text):
counts = {}
for w in text.split():
counts[w] = counts.get(w, 0) + 1
return counts
def fast(text):
return dict(Counter(text.split()))
text = "a b a c"
assert slow(text) == fast(text) == {"a": 2, "b": 1, "c": 1}
```
Output varies by machine but looks like:
```
sum_squares:
 slow: 0.64 ms
 fast: 0.47 ms
word_count:
 slow: 0.005 ms
 fast: 0.003 ms
membership:
 slow: 0.123 ms
 fast: 0.0005 ms
```