Writing Efficient and Pythonic Code

“Pythonic” code does things the way the language makes easiest: list comprehensions instead of manual loops, unpacking instead of indexing, with instead of try/finally, generators instead of eager lists when the whole sequence isn't needed. Pythonic code is usually also more efficient because the idioms map to fast built-in machinery.
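As a minimal sketch of the "generators when the whole sequence isn't needed" idiom, assuming a hypothetical `first_square_over` helper:

```python
# Find the first number whose square exceeds a limit without
# materializing the whole sequence of squares.
def first_square_over(limit: int, numbers):
    # Generator expression: squares are produced one at a time, lazily.
    squares = (n * n for n in numbers)
    # next() stops at the first hit; the remaining squares are never computed.
    return next((s for s in squares if s > limit), None)

print(first_square_over(50, range(1, 1_000_000)))  # 64 (8 * 8), computed after only 8 steps
```

An eager version would square all 999,999 numbers first; the generator stops as soon as the answer is found.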

The single biggest performance trap is picking the wrong data structure. Looking up items in a 1-million-element list with in is O(n) and painful; converting the list to a set makes membership O(1) on average. Counting duplicates with a manual loop is slow; collections.Counter does the counting in C and is substantially faster. Pick the right tool and most performance issues disappear.

The second trap is doing work in Python when a built-in or standard-library function could do it in C. sum, min, max, any, all, sorted, "".join, itertools, and map/filter are all implemented in C. Use them. A tight for-loop with a Python-level accumulator is measurably slower than the equivalent sum(generator), and sum over a list or range is faster still because the whole loop runs in C.

The third trap is premature optimization. Profile first, optimize second. cProfile and timeit identify the real bottlenecks in a few minutes, and the answer often surprises you — the slow line is rarely the one you suspected. Only after measurement should you reach for micro-optimizations, specialized containers, or C extensions.

Pythonic idioms in everyday code

Prefer for x in seq to for i in range(len(seq)). Prefer unpacking (a, b = t) to positional indexing. Prefer ",".join(parts) to building strings with +. Prefer set/dict membership to linear list search. Prefer comprehensions and generators to hand-rolled loops.
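A minimal sketch of those idioms side by side (the names `point`, `parts`, and `valid` are illustrative):

```python
point = (3, 4)
x, y = point                          # unpacking instead of point[0], point[1]

parts = ["a", "b", "c"]
joined = ",".join(parts)              # join instead of s = s + part in a loop

squares = [n * n for n in range(5)]   # comprehension instead of append in a loop

valid = {"red", "green", "blue"}
is_valid = "red" in valid             # set membership instead of a linear list scan

for color in ("red", "blue"):         # iterate items directly, not range(len(...))
    pass

print(x, y, joined, squares, is_valid)  # 3 4 a,b,c [0, 1, 4, 9, 16] True
```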

Use EAFP (“easier to ask forgiveness than permission”) when try/except is cheaper than checking first: try: d[k] except KeyError: ... is typical. Use LBYL (“look before you leap”) when the check is obvious and cheap.
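A short sketch of both styles on a dictionary (the `host`/`port` keys are illustrative):

```python
d = {"host": "localhost"}

# EAFP: attempt the lookup and handle the rare failure.
try:
    host = d["host"]
except KeyError:
    host = "default"

# LBYL: the check is obvious and cheap, so test first.
port = d["port"] if "port" in d else 8080

print(host, port)  # localhost 8080
```

EAFP wins when the key is almost always present (no redundant check); LBYL reads better when a missing key is the common case.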

Benchmark before you optimize

timeit runs a snippet many times (one million by default) and reports the total time; timeit.repeat returns several such runs so you can take the minimum. Use it on two candidate implementations to pick the faster one objectively. cProfile profiles a full script; use it to find the hottest function, not to measure a one-liner.
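A minimal sketch combining both tools, assuming a hypothetical hot() function worth measuring:

```python
import cProfile
import io
import pstats
import timeit


def hot(n: int) -> int:
    # The function under test: sum of squares via a generator.
    return sum(i * i for i in range(n))


# timeit: repeat the measurement and take the minimum of several runs.
best = min(timeit.repeat(lambda: hot(10_000), number=100, repeat=3))
print(f"best of 3 runs: {best:.4f} s for 100 calls")

# cProfile: profile a whole call tree, then list the top functions.
profiler = cProfile.Profile()
profiler.enable()
hot(100_000)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(3)
print("\n".join(out.getvalue().splitlines()[:4]))
```

Taking the minimum of repeated timeit runs filters out noise from other processes; the profiler report shows which functions dominate the call tree.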

Don't tune code without a benchmark that shows the change helped. Complexity class dominates: moving from O(n^2) to O(n) wins by orders of magnitude on big inputs; most micro-tweaks save microseconds no one will notice.
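A sketch of a complexity-class win, using duplicate detection as an illustrative task:

```python
def has_duplicate_quadratic(items) -> bool:
    # O(n^2): compare every pair of elements.
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if a == b:
                return True
    return False


def has_duplicate_linear(items) -> bool:
    # O(n): one pass, remembering what has been seen in a set.
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False


data = list(range(5_000)) + [0]   # the duplicate hides at the very end
print(has_duplicate_quadratic(data), has_duplicate_linear(data))  # True True
```

On this input the quadratic version does roughly 12.5 million comparisons; the linear one does about 5,000 set lookups. No micro-tweak to the inner loop could close that gap.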

Efficient-Python toolbox

Tool                  Kind       Purpose
timeit                module     Micro-benchmark a snippet.
cProfile              module     Function-level CPU profiler.
collections           module     Counter, defaultdict, deque: fast specialized containers.
itertools             module     Lazy combinators in C.
functools.lru_cache   decorator  Memoize pure functions.
heapq                 module     Priority queue on a plain list.
bisect                module     O(log n) search in a sorted list.
numpy                 library    Vectorized numeric computation.
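A quick sketch of three toolbox entries not exercised elsewhere in this lesson (the fib/tasks/grades examples are illustrative):

```python
import bisect
import heapq
from functools import lru_cache


# functools.lru_cache: memoize a pure function.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# heapq: a priority queue on a plain list; smallest item pops first.
tasks = []
heapq.heappush(tasks, (2, "write docs"))
heapq.heappush(tasks, (1, "fix bug"))
urgent = heapq.heappop(tasks)

# bisect: O(log n) lookup in a sorted list.
grades = [60, 70, 80, 90]
slot = bisect.bisect_left(grades, 75)   # insertion point that keeps order

print(fib(30), urgent, slot)  # 832040 (1, 'fix bug') 2
```

Without the cache, fib(30) recomputes subproblems exponentially; with it, each fib(n) is computed once.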

Writing Efficient and Pythonic Code: code example

The script benchmarks slow vs Pythonic approaches to the same task and prints the speedups.

# Lesson: Writing Efficient and Pythonic Code
import timeit
from collections import Counter


# Task 1: sum squares of 1..n
def slow_sum_squares(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total


def fast_sum_squares(n: int) -> int:
    return sum(i * i for i in range(1, n + 1))


# Task 2: count words
text = "red blue red green blue red yellow blue red"


def slow_count(text: str) -> dict[str, int]:
    counts: dict[str, int] = {}
    for w in text.split():
        counts[w] = counts.get(w, 0) + 1
    return counts


def fast_count(text: str) -> dict[str, int]:
    return dict(Counter(text.split()))


# Task 3: membership
data = list(range(10_000))
needle = 9_999


def slow_member(data, needle):
    return needle in data   # O(n) on a list


def fast_member(data_set, needle):
    return needle in data_set   # O(1) on a set


data_set = set(data)


# Benchmarks
print("sum_squares:")
print(f"  slow: {timeit.timeit(lambda: slow_sum_squares(5000), number=50)*1000/50:.2f} ms")
print(f"  fast: {timeit.timeit(lambda: fast_sum_squares(5000), number=50)*1000/50:.2f} ms")

print("word_count:")
print(f"  slow: {timeit.timeit(lambda: slow_count(text), number=10_000)*1000/10_000:.3f} ms")
print(f"  fast: {timeit.timeit(lambda: fast_count(text), number=10_000)*1000/10_000:.3f} ms")

print("membership:")
print(f"  slow: {timeit.timeit(lambda: slow_member(data, needle), number=1000)*1000/1000:.3f} ms")
print(f"  fast: {timeit.timeit(lambda: fast_member(data_set, needle), number=1000)*1000/1000:.4f} ms")

assert slow_sum_squares(100) == fast_sum_squares(100)
assert slow_count(text) == fast_count(text)
assert slow_member(data, needle) and fast_member(data_set, needle)

Three typical upgrades:

1) Replace a for-loop + accumulator with `sum(generator)` — measurably faster.
2) Swap hand-rolled counting for `collections.Counter`.
3) Convert a list to a set for O(1) membership checks.

Benchmark a small tuning candidate: building a string with + versus "".join.

import timeit

def concat_plus(items):
    s = ""
    for x in items:
        s += x
    return s

def concat_join(items):
    return "".join(items)

items = ["py"] * 10_000

print(f"plus: {timeit.timeit(lambda: concat_plus(items), number=50)*1000/50:.2f} ms")
print(f"join: {timeit.timeit(lambda: concat_join(items), number=50)*1000/50:.2f} ms")

Sanity checks on both implementations.

from collections import Counter
def slow(text):
    counts = {}
    for w in text.split():
        counts[w] = counts.get(w, 0) + 1
    return counts
def fast(text):
    return dict(Counter(text.split()))
text = "a b a c"
assert slow(text) == fast(text) == {"a": 2, "b": 1, "c": 1}

Output varies by machine but looks like:

sum_squares:
  slow: 0.64 ms
  fast: 0.47 ms
word_count:
  slow: 0.005 ms
  fast: 0.003 ms
membership:
  slow: 0.123 ms
  fast: 0.0005 ms