Counter and defaultdict

Deep dive · part of Python Dictionaries

collections.Counter is a dict subclass for counting hashable items. On top of normal dict behavior it adds most_common(), multiset arithmetic (+, -, &, |), and, from Python 3.10, subset tests with <= and >=. collections.defaultdict supplies a value for any missing key by calling a factory (list for grouping edges, int for tallies), which removes the usual "if key not in d" boilerplate.

Together they replace most hand-written tallying and frequency loops in log analysis, text mining, and graph adjacency construction.
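A minimal sketch of a multiset subset test, assuming we want to know whether the letters of one string can be found inside another. The helper name can_spell is hypothetical. The portable form checks that need - have is empty; on Python 3.10+ the expression need <= have performs the same test directly.

```python
from collections import Counter

def can_spell(word: str, letters: str) -> bool:
    """Hypothetical helper: True if `letters` contains every character of
    `word` at least as many times as `word` uses it."""
    need = Counter(word)
    have = Counter(letters)
    # Binary - keeps only positive counts, so an empty result means nothing
    # in `need` exceeds what `have` provides.
    # On Python 3.10+, `need <= have` expresses the same multiset test.
    return not (need - have)

print(can_spell("deck", "pythondeck"))  # True
print(can_spell("seed", "pythondeck"))  # False: no 's' and only one 'e'
```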

Production code combines this topic with logging, tests, and clear module boundaries so refactors stay safe when requirements grow.

Counter('abbccc') counts characters; elements() expands multiplicities.
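A short sketch of both behaviors, plus the zero that Counter returns for a missing key:

```python
from collections import Counter

c = Counter("abbccc")
print(c)                       # Counter({'c': 3, 'b': 2, 'a': 1})
print(c["z"])                  # 0: missing keys read as zero, no KeyError
# elements() repeats each key as many times as its count
# (keys with zero or negative counts are skipped).
print(sorted(c.elements()))    # ['a', 'b', 'b', 'c', 'c', 'c']
```

Reading c["z"] does not insert "z", so the Counter stays unchanged.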

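The binary + and - operators add or subtract counts key by key and keep only positive results, so a key whose count reaches zero or below is dropped rather than stored as 0. & takes the minimum of each count and | the maximum.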

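most_common(n) returns a list of (item, count) pairs sorted by count, highest first; ties keep the order in which items were first seen, and omitting n returns every item.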

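defaultdict calls its factory only when d[key] is read for a missing key: the factory's result is inserted and returned instead of raising KeyError. d.get(key) and key in d never call the factory, and a defaultdict whose factory is None raises KeyError like a plain dict. The factory must be a zero-argument callable.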
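A minimal sketch of which reads trigger the factory; the key names are arbitrary:

```python
from collections import defaultdict

tally = defaultdict(int)          # int() returns 0, so new keys start at zero
tally["apple"] += 1               # missing key: factory runs, then += 1
print(tally["apple"])             # 1

print(tally["pear"])              # 0, and "pear" is now stored as a key
print("pear" in tally)            # True: indexing inserted the default

print(tally.get("plum"))          # None: get() never calls the factory
print("plum" in tally)            # False

plain = defaultdict(None)         # no factory: behaves like a normal dict
try:
    plain["missing"]
except KeyError as exc:
    print("KeyError:", exc)       # KeyError: 'missing'
```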

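update() adds counts from an iterable or a mapping (unlike dict.update(), which replaces values), and subtract() removes counts the same way; subtract() keeps zero and negative results.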
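A small sketch of update() accepting both input kinds, contrasted with plain dict.update(); the sample strings are arbitrary:

```python
from collections import Counter

counts = Counter()
counts.update("hello")            # iterable: each character adds 1
counts.update({"l": 5})           # mapping: adds 5 to the existing count
print(counts["l"])                # 7 (2 from "hello" + 5), not replaced by 5

plain = {"l": 2}
plain.update({"l": 5})            # dict.update replaces the value instead
print(plain["l"])                 # 5
```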

Counter keys must be hashable: it cannot count lists or dicts directly, so convert each list to a tuple before counting.
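A minimal sketch, assuming the items to count arrive as lists (the request rows are made-up sample data):

```python
from collections import Counter

rows = [["GET", "/"], ["POST", "/login"], ["GET", "/"]]

try:
    Counter(rows)                 # raises TypeError because lists are unhashable
except TypeError as exc:
    print("TypeError:", exc)

pairs = Counter(tuple(row) for row in rows)   # tuples are hashable
print(pairs[("GET", "/")])        # 2
```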

For weighted counts, accumulate totals in a Counter and normalize at the end. In graph algorithms, defaultdict(list) is the standard adjacency structure, and a Counter over edge endpoints gives each node's degree.
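A sketch of both ideas, assuming an undirected graph given as (u, v) tuples; the tokens, weights, and edges are made-up sample data. sum(weights.values()) is used for portability; Counter.total() (Python 3.10+) returns the same sum.

```python
from collections import Counter, defaultdict

# Weighted tallies: add a weight instead of 1, normalize at the end.
weights = Counter()
for token, w in [("cat", 2.0), ("dog", 1.0), ("cat", 1.0)]:
    weights[token] += w
total = sum(weights.values())
shares = {tok: w / total for tok, w in weights.items()}
print(shares)                     # {'cat': 0.75, 'dog': 0.25}

# Graph: adjacency lists with defaultdict(list), degrees with Counter.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)              # undirected: record both directions
degree = Counter(node for edge in edges for node in edge)
print(degree)                     # Counter({3: 3, 1: 2, 2: 2, 4: 1})
```

Each node's degree matches the length of its adjacency list, which is a cheap consistency check when building graphs this way.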

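Serialization: because Counter is a dict subclass, json.dumps() writes it as a JSON object, and json.loads() returns a plain dict that you can wrap in Counter again. JSON object keys are always strings: int, float, bool, and None keys are converted to strings automatically (so they come back as strings), while tuple keys raise TypeError and need an explicit string encoding.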
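A sketch of a JSON round trip and of the key conversions; the "method status" string format for tuple keys is only one possible convention, chosen here for illustration:

```python
import json
from collections import Counter

word_counts = Counter("a b a".split())
text = json.dumps(word_counts)            # Counter is a dict subclass
print(text)                               # {"a": 2, "b": 1}
restored = Counter(json.loads(text))      # loads() returns a plain dict
print(restored == word_counts)            # True

by_status = Counter([200, 404, 200])
print(json.dumps(by_status))              # {"200": 2, "404": 1}: int keys become strings

by_pair = Counter([("GET", 200)])
try:
    json.dumps(by_pair)                   # tuple keys are rejected
except TypeError as exc:
    print("TypeError:", exc)
safe = {f"{method} {status}": n for (method, status), n in by_pair.items()}
print(json.dumps(safe))                   # {"GET 200": 1}
```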

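most_common(n) already selects the top n with a heap (CPython implements it with heapq.nlargest), so calling heapq.nlargest on a Counter yourself gains little. Neither approach avoids holding the full Counter in RAM: exact top-k frequencies need a count for every distinct token.

For very large logs, stream tokens into Counter.update one line at a time so memory grows with the number of distinct tokens rather than the size of the input, then report most_common(k) when only the top-k frequencies matter.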
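A minimal streaming sketch; top_tokens is a hypothetical helper, and io.StringIO stands in for an opened log file (the name app.log in the comment is illustrative only):

```python
import io
from collections import Counter

def top_tokens(lines, k):
    """Hypothetical helper: count whitespace-separated tokens one line at a
    time and return the k most frequent. Memory grows with the number of
    distinct tokens, not with the number of lines."""
    counts = Counter()
    for line in lines:                # a file object yields one line at a time
        counts.update(line.split())
    return counts.most_common(k)      # CPython uses heapq.nlargest when k is given

# In practice you might pass open("app.log"); StringIO keeps this runnable.
log = io.StringIO("GET /\nGET /login\nPOST /login\nGET /\n")
print(top_tokens(log, 2))             # [('GET', 3), ('/', 2)]
```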

Read the parent tutorial on pythondeck.com for runnable snippets, then reproduce them locally in a virtual environment with pinned dependency versions matching your deployment target.

When pairing with teammates, agree on one idiomatic pattern per concern—mixed styles in one repo slow reviews and invite subtle integration bugs during merges.

Materializing a huge stream into one list before counting instead of feeding Counter.update incrementally; even when streaming, memory still grows with every distinct key.

Passing list() instead of list to defaultdict: the factory must be the callable itself, so write defaultdict(list); defaultdict(list()) raises TypeError.
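A short sketch of the wrong and right forms; the key names and the "unset" default are arbitrary:

```python
from collections import defaultdict

try:
    defaultdict(list())           # wrong: passes an empty list instance
except TypeError as exc:
    print("TypeError:", exc)      # first argument must be callable or None

groups = defaultdict(list)        # right: passes the list type itself
groups["even"].append(2)
print(dict(groups))               # {'even': [2]}

# Any zero-argument callable works, for example a lambda for a custom default.
settings = defaultdict(lambda: "unset")
print(settings["theme"])          # unset
```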

Expecting a - b to behave like a.subtract(b): the - operator drops keys whose count falls to zero or below, while subtract() keeps zero and negative counts.
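A short sketch contrasting the operator with the method; the stock and sales figures are made up:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
sold = Counter(apples=1, pears=2)

print(stock - sold)               # Counter({'apples': 2}): pears (1 - 2) is dropped

remaining = stock.copy()
remaining.subtract(sold)          # in place, keeps zero and negative counts
print(remaining)                  # Counter({'apples': 2, 'pears': -1})
print(+remaining)                 # Counter({'apples': 2}): unary + drops non-positive counts
```

Copying first keeps stock unchanged, since subtract() mutates the Counter it is called on.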

Using unhashable lists as Counter keys, which raises TypeError; count tuples instead.

Use Counter.update with generators for large corpora line by line.

Build one Counter per worker or shard, then merge them with + or update() for MapReduce-style tallies.
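A minimal sketch of the merge step, assuming each worker has already returned its own Counter (the shard contents are made up):

```python
from collections import Counter

# Hypothetical per-worker results: each shard was counted independently.
shard_counts = [
    Counter({"error": 2, "warn": 1}),
    Counter({"error": 1, "info": 4}),
    Counter({"warn": 3}),
]

# Option 1: sum() with an empty Counter as the start value.
merged = sum(shard_counts, Counter())

# Option 2: update() a single Counter in place.
merged_inplace = Counter()
for part in shard_counts:
    merged_inplace.update(part)

print(sorted(merged.items()))     # [('error', 3), ('info', 4), ('warn', 4)]
print(merged == merged_inplace)   # True
```

sum() builds a new Counter at every step, so for many shards the in-place update() loop avoids repeated copying. Note that + also drops non-positive counts, while update() keeps them.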

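Pick defaultdict when the access pattern is many writes per key (appending to groups, accumulating totals); for read-mostly lookups, dict.get avoids inserting a default on every probe.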
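A sketch of the write-heavy grouping case, assuming log lines shaped like "LEVEL message" (the lines are made-up sample data):

```python
from collections import defaultdict

log_lines = [
    "ERROR disk full",
    "INFO started",
    "ERROR timeout",
    "WARN slow response",
]

# Many writes per key: every line appends to its level's list.
by_level = defaultdict(list)
for line in log_lines:
    level, _, message = line.partition(" ")
    by_level[level].append(message)

print(by_level["ERROR"])          # ['disk full', 'timeout']

# Probing for a key: get() does not insert an empty list.
print(by_level.get("DEBUG", []))  # []
print("DEBUG" in by_level)        # False
```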

Print most_common(20) in CLI tools for quick log summaries.
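A minimal sketch of such a summary; format_top is a hypothetical helper, and counting the first word of each line (for example a log level) is an assumed log format. A real tool might pass sys.stdin; a list keeps the sketch runnable.

```python
from collections import Counter

def format_top(lines, n=20):
    """Hypothetical CLI helper: return the n most common first words of
    non-blank lines as report rows with right-aligned counts."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return [f"{count:>8}  {word}" for word, count in counts.most_common(n)]

for row in format_top(["ERROR disk full", "INFO started", "ERROR timeout", ""]):
    print(row)
```

Returning rows instead of printing inside the helper keeps it easy to test; the caller decides where the report goes.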

Re-read the examples below with these ideas in mind; change variable names and inputs to match your own project.

The program below counts word frequencies and prints the three most common words. Run the code, then change the text to see how the output shifts.

# Example: Word frequency
# Run in the REPL or save as a .py file and execute with python.
from collections import Counter
text = "the quick brown fox jumps over the lazy dog the"
c = Counter(text.split())
print(c.most_common(3))  # [('the', 3), ('quick', 1), ('brown', 1)]

This sample walks through counter arithmetic in a small, runnable script. Paste it into the REPL or save it as a .py file before you continue to the next block.

# Example: Counter arithmetic
# Run in the REPL or save as a .py file and execute with python.
from collections import Counter
a = Counter("pythondeck")
b = Counter("deckpython")
print(a == b)         # True
print(a + Counter("xx"))  # adds 'x': 2 to the ten letters
print(a - Counter("yyy"))  # 'y' is dropped: 1 - 3 is not positive

Here is a hands-on illustration of defaultdict of lists: each edge (a, b) appends b to the adjacency list of a, and the list is created on first use. Follow the inline comments first; only then execute the snippet and compare the result with what you expected.

# Example: defaultdict of lists
# Run in the REPL or save as a .py file and execute with python.
from collections import defaultdict
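edges = defaultdict(list)  # missing keys start as an empty list
for a, b in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    edges[a].append(b)  # first access to a creates its list
print(dict(edges))  # {1: [2, 3], 2: [4], 3: [4]}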

The program below builds a word counter, reads individual counts, and increments them with update(). Read the comments on each line, run the code, then change names or values to see how the output shifts.

# Counter is a dict subclass for tallies
from collections import Counter  # multiset
text = "the cat and the hat"  # sample
counts = Counter(text.split())  # token counts
print(counts.most_common(2))  # top two
print(counts["the"])  # 2
counts.update(["the", "hat"])  # increment
print(counts["hat"])  # 2

This sample walks through counter math in a small, runnable script. Paste it into the REPL or save it as a .py file before you continue to the next block.

# Counters support arithmetic like multisets
from collections import Counter  # Counter
a = Counter("aab")  # multiset a
b = Counter("abb")  # multiset b
print(a + b)  # combine counts
print(a - b)  # keep only positive counts: 'b' (1 - 2) is dropped
print(a & b)  # intersection (min)
print(a | b)  # union (max)
