Choosing the Right Data Structure

With list, tuple, dict and set in your toolbox, most programming questions become a variant of "which container should I use for this?". The answer depends on three axes you already know: ordering, mutability, and access pattern (by position, by key, or by membership). A clear decision here makes the rest of the code almost write itself; a wrong choice leads to awkward loops and subtle bugs.

Use a list when order matters and you need to grow or shrink the collection. Use a tuple when you have a small, fixed-length record whose positions mean something distinct (a coordinate, a row from a database, a multi-value return from a function). Use a dict when you access data by a label rather than a position. Use a set when you care only about uniqueness or fast membership tests.

Performance should rarely decide the container, but it is worth knowing the rough costs. Membership tests (x in c) are O(1) for set and dict, O(n) for list and tuple. Indexing is O(1) for list and tuple, undefined for set. Inserting at the end is O(1) amortized for list; inserting in the middle is O(n). For counts up to a few hundred, these differences don't matter; for millions, they decide feasibility.

The standard library adds specialized structures for cases the built-ins handle poorly: collections.deque for fast pops from either end, heapq for priority queues, collections.Counter for frequency tables, collections.ChainMap for layered configuration. Start with the plain built-ins; reach for these when profiling or clarity demands it.

A practical decision tree

Ask yourself: am I indexing by position or by label? By label → dict. By position → list or tuple. Then: does the collection change over time? Yes → list. No → tuple. Finally: do I only need to know whether something is present? → set.

When two options look equally good, prefer the more restrictive one. A tuple reveals more intent than a list; a frozen set more than a set; a NamedTuple more than a plain tuple. Tighter types catch bugs earlier and make code self-documenting.

When to reach for the collections module

deque is a doubly-ended queue: O(1) append/pop at both ends and maxlen=N caps its size. Ideal for sliding windows and recent-event caches. Counter is a dict subclass specialized for counts and supports arithmetic: Counter("aab") + Counter("abc").

defaultdict(factory) removes the "is the key already there?" boilerplate; ChainMap stacks several dicts into a single view, perfect for configuration layered over defaults.

A side-by-side summary of the four built-ins and their specialized cousins.

ToolPurpose
list
built-in type
Ordered, mutable; index by position.
tuple
built-in type
Ordered, immutable; fixed-length record.
dict
built-in type
Key-to-value mapping; O(1) lookup.
set
built-in type
Unordered unique items; O(1) membership.
collections.deque
class
Fast append/pop at both ends.
heapq
module
Min-heap priority queue on a plain list.
collections.Counter
class
Frequency counts with arithmetic.
collections.ChainMap
class
Layered mappings (defaults + overrides).

Choosing the Right Data Structure code example

The script below picks one container per question so you can compare them side by side.

# Lesson: Choosing the Right Data Structure
from collections import Counter, ChainMap, deque
import heapq

events = ["click", "view", "click", "buy", "view", "view"]

# Ordered history (list): keep duplicates, add at the end
history: list[str] = list(events)
history.append("click")

# Record of current user (tuple): fixed positions, no mutation
user = ("ana", 30, "oslo")
name, age, city = user

# Config lookup (dict): access by name
config = {"host": "localhost", "port": 5432}

# Dedup (set): do we have this token already?
seen = set(events)

# Recent 3 events (deque with maxlen): evict old ones automatically
recent: deque[str] = deque(maxlen=3)
for e in events: recent.append(e)

# Frequency of events (Counter)
counts = Counter(events)

# Priority queue (heapq): smallest priority pops first
pq: list[tuple[int, str]] = []
for priority, label in [(3, "c"), (1, "a"), (2, "b")]:
    heapq.heappush(pq, (priority, label))

# Layered config (ChainMap)
overrides = {"port": 6543}
effective = ChainMap(overrides, config)

print("history :", history)
print("user    :", user, "->", name, city)
print("config  :", config["host"])
print("seen    :", sorted(seen))
print("recent  :", list(recent))
print("counts  :", counts.most_common())
print("pop 1st :", heapq.heappop(pq))
print("port    :", effective["port"])

Each block picks a container to answer a specific question:

1) History = list (order + duplicates matter).
2) User record = tuple / NamedTuple (fixed fields).
3) Unique events = set (dedup with O(1) membership).
4) Recent events = deque(maxlen=N) (automatic eviction).

Practice matching structure to problem.

# Example A: top 5 most common words
from collections import Counter
text = "to be or not to be that is the question"
print(Counter(text.split()).most_common(5))

# Example B: sliding window average using deque
from collections import deque
window: deque[float] = deque(maxlen=3)
values = [1, 2, 3, 4, 5]
for v in values:
    window.append(v)
    if len(window) == window.maxlen:
        print("avg:", sum(window) / len(window))

Assert the canonical properties.

from collections import deque, Counter
assert list(deque([1, 2, 3], maxlen=2)) == [2, 3]
assert Counter("aab") == {"a": 2, "b": 1}
assert {1, 2} | {2, 3} == {1, 2, 3}
assert dict(zip(["a", "b"], [1, 2])) == {"a": 1, "b": 2}

Running prints:

history : ['click', 'view', 'click', 'buy', 'view', 'view', 'click']
user    : ('ana', 30, 'oslo') -> ana oslo
config  : localhost
seen    : ['buy', 'click', 'view']
recent  : ['buy', 'view', 'view']
counts  : [('view', 3), ('click', 2), ('buy', 1)]
pop 1st : (1, 'a')
port    : 6543