Sequences — ordered collections of items — show up everywhere: lists of records, streams of events, ranges of numbers, lines in a file. Python's sequence toolkit is deep: built-in operations like slicing and iteration, the itertools module for combinators, and a handful of patterns (windowing, grouping, chunking) that recur across almost every data task.
Start with the basics: slicing (seq[a:b]), iteration (for x in seq), membership (x in seq), enumeration (enumerate(seq)), and zipping (zip(a, b)). These are the LEGO bricks that everything else is built from. Learn to spot when each one simplifies a manual loop.
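A quick tour of those bricks on a toy list (`readings` and `labels` are names invented for this sketch):

```python
readings = [3, 1, 4, 1, 5, 9]
labels = ["a", "b", "c", "d", "e", "f"]

print(readings[1:4])                # slicing: [1, 4, 1]
print(9 in readings)                # membership: True
print(list(enumerate(readings)))    # (index, item) pairs
print(list(zip(labels, readings)))  # pair up parallel lists
```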
itertools is the next layer. chain glues sequences together; islice takes a lazy slice; pairwise yields overlapping pairs; groupby groups consecutive equal items; combinations and permutations enumerate arrangements; accumulate computes running totals. Each one is a one-liner that replaces a dozen lines of hand-written code.
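A short sketch of the combinators that the worked example later on does not reuse (toy inputs chosen for illustration):

```python
from itertools import chain, combinations, islice, permutations

print(list(chain([1, 2], [3, 4])))   # glue sequences: [1, 2, 3, 4]
print(list(islice(range(100), 3)))   # lazy slice: [0, 1, 2]
print(list(combinations("abc", 2)))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(list(permutations("ab")))      # [('a', 'b'), ('b', 'a')]
```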
Patterns you will reuse: windowing (look at N consecutive items at a time), chunking (split into batches of N), running totals, most-common-k, and stable sort by multiple keys. Knowing these by name turns “how do I do this?” into “apply this recipe”.
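The multi-key stable sort is the one recipe the worked example below skips, so here is a sketch. Python's sort is stable, so a tuple key and two chained single-key sorts agree (the `people` data is hypothetical):

```python
people = [("ana", 30), ("ben", 25), ("ana", 25), ("cai", 30)]

# One pass with a tuple key: name ascending, then age ascending.
by_name_then_age = sorted(people, key=lambda p: (p[0], p[1]))

# Two passes, secondary key first, relying on sort stability.
by_age = sorted(people, key=lambda p: p[1])
by_name_then_age_2 = sorted(by_age, key=lambda p: p[0])

assert by_name_then_age == by_name_then_age_2
print(by_name_then_age)  # [('ana', 25), ('ana', 30), ('ben', 25), ('cai', 30)]
```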
Built-in sequence operations
enumerate(seq, start=1) yields (index, item) pairs with a configurable starting index. zip(a, b) pairs two iterables; zip(*matrix) transposes a list-of-lists. sorted(seq, key=..., reverse=...) returns a new sorted list; reversed(seq) returns an iterator.
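For instance, zip(*matrix) as a transpose, plus the sorted/reversed pair on a small word list:

```python
matrix = [[1, 2, 3], [4, 5, 6]]
print(list(zip(*matrix)))               # transpose: [(1, 4), (2, 5), (3, 6)]

words = ["pear", "fig", "apple"]
print(sorted(words, key=len))           # new list: ['fig', 'pear', 'apple']
print(list(reversed(words)))            # iterator: ['apple', 'fig', 'pear']
print(list(enumerate(words, start=1)))  # [(1, 'pear'), (2, 'fig'), (3, 'apple')]
```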
For homogeneous numeric sequences, sum, min, max, statistics.mean, and statistics.median cover most aggregations. For heterogeneous records, pass key= to min, max, or sorted to pick the field you care about.
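For example (the `orders` records are made up):

```python
import statistics

nums = [4, 8, 15, 16, 23, 42]
print(sum(nums), min(nums), max(nums))  # 108 4 42
print(statistics.mean(nums))            # 18
print(statistics.median(nums))          # 15.5

orders = [("widget", 3), ("gadget", 9), ("gizmo", 5)]  # hypothetical records
print(max(orders, key=lambda o: o[1]))  # ('gadget', 9)
```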
itertools patterns
Windowing: from itertools import pairwise; for a, b in pairwise(seq):. Chunking: from itertools import batched; list(batched(seq, 3)) (Python 3.12+). Running totals: list(accumulate(seq)). Group consecutive: groupby(seq, key=...).
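accumulate also accepts a binary function, which turns the running total into a general running reduction:

```python
import operator
from itertools import accumulate

print(list(accumulate([3, 1, 4, 1, 5], max)))        # running max: [3, 3, 4, 4, 5]
print(list(accumulate([1, 2, 3, 4], operator.mul)))  # running product: [1, 2, 6, 24]
```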
Most-common: Counter(seq).most_common(k). Top-k largest: heapq.nlargest(k, seq, key=...). These one-liners cover a huge fraction of data cleanup and summarization.
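A sketch with a toy word list:

```python
import heapq
from collections import Counter

words = ["spam", "eggs", "spam", "ham", "spam", "eggs"]
counts = Counter(words)
print(counts.most_common(2))                              # [('spam', 3), ('eggs', 2)]
print(heapq.nlargest(2, counts.items(), key=lambda kv: kv[1]))  # same top-2
```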
The sequence toolkit.
| Tool | Kind | Purpose |
|---|---|---|
| `enumerate(seq)` | built-in | Yield (index, item) pairs. |
| `zip(*iters)` | built-in | Combine parallel iterables. |
| `sorted(seq, key=...)` | built-in | Return a sorted copy. |
| `itertools.pairwise` | function | Yield overlapping (a, b) pairs. |
| `itertools.batched` (3.12+) | function | Chunk into tuples of size n. |
| `itertools.accumulate` | function | Running total / running function. |
| `itertools.groupby` | function | Group consecutive equal items. |
| `heapq.nlargest` | function | Top-k by key without sorting everything. |
Working with Data Sequences code example
The script applies the main sequence patterns to a small time series.
```python
# Lesson: Working with Data Sequences
import heapq
from collections import deque
from itertools import accumulate, groupby, pairwise
from operator import itemgetter
samples = [
("2026-01-01", 10),
("2026-01-02", 12),
("2026-01-03", 11),
("2026-01-04", 15),
("2026-01-05", 14),
("2026-01-06", 20),
("2026-01-07", 18),
]
dates = [s[0] for s in samples]
values = [s[1] for s in samples]
# Enumerate and zip
for i, (d, v) in enumerate(zip(dates, values), start=1):
print(f" #{i}: {d} -> {v}")
# Day-over-day diff with pairwise
deltas = [b - a for a, b in pairwise(values)]
print("daily deltas:", deltas)
# Running total
print("running sum :", list(accumulate(values)))
# Top 3 busiest days by value
top3 = heapq.nlargest(3, samples, key=itemgetter(1))
print("top 3:", top3)
# Group consecutive days where the value strictly increased vs not
def trend(pair):
    return "up" if pair[1] > pair[0] else "flat-or-down"
pairs = list(pairwise(values))
for key, group in groupby(pairs, key=trend):
group = list(group)
print(f"{key:14s} run of {len(group)} pair(s)")
# Windowed mean of size 3
def windowed(seq, n):
    window = deque(maxlen=n)  # deque(maxlen=n) evicts the oldest item once full
    for v in seq:
        window.append(v)
        if len(window) == n:  # only yield once the window is full
            yield sum(window) / n

print("3-day avg :", [round(x, 2) for x in windowed(values, 3)])
```
Each block is one pattern:
1) `zip` + `enumerate` turns two parallel lists into labeled rows.
2) `pairwise` is the canonical way to take differences.
3) `accumulate` gives you running totals in a single call.
4) `heapq.nlargest` picks the top-k records without sorting everything.
5) `groupby` clusters consecutive equal keys, so sort by the key first when you need full groups (see the sketch below).
6) a `deque(maxlen=n)` gives a sliding window that evicts the oldest item automatically.
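A minimal sketch of that groupby caveat: it only merges adjacent items, so a shuffled sequence must be sorted by the grouping key first (the `records` data here is made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

records = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]  # hypothetical data

# Unsorted input: each key shows up as several one-item runs.
assert [k for k, _ in groupby(records, key=itemgetter(0))] == ["b", "a", "b", "a"]

# Sort by the same key first to get one full group per key.
by_key = sorted(records, key=itemgetter(0))
full = {k: [v for _, v in g] for k, g in groupby(by_key, key=itemgetter(0))}
assert full == {"a": [1, 4], "b": [2, 3]}
```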
Practice chunking and top-k.
```python
import heapq
from itertools import islice

def chunks(seq, n):
    # Pull n items at a time from a shared iterator; the last batch may be short.
    it = iter(seq)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

print(list(chunks(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]

scores = [("ana", 91), ("ben", 78), ("cai", 85), ("dev", 95)]
print(heapq.nlargest(2, scores, key=lambda p: p[1]))  # [('dev', 95), ('ana', 91)]
```
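On Python 3.12+, itertools.batched does the same job as the hand-rolled chunks, except that it yields tuples rather than lists:

```python
from itertools import batched  # Python 3.12+ only

assert list(batched(range(7), 3)) == [(0, 1, 2), (3, 4, 5), (6,)]
```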
Tiny truths about the patterns.
```python
from itertools import accumulate, pairwise

assert list(accumulate([1, 2, 3])) == [1, 3, 6]
assert list(pairwise([1, 2, 3, 4])) == [(1, 2), (2, 3), (3, 4)]
assert sorted([(1, "b"), (2, "a")], key=lambda p: p[1]) == [(2, "a"), (1, "b")]
```
Running prints:
```text
  #1: 2026-01-01 -> 10
  #2: 2026-01-02 -> 12
  #3: 2026-01-03 -> 11
  #4: 2026-01-04 -> 15
  #5: 2026-01-05 -> 14
  #6: 2026-01-06 -> 20
  #7: 2026-01-07 -> 18
daily deltas: [2, -1, 4, -1, 6, -2]
running sum : [10, 22, 33, 48, 62, 82, 100]
top 3: [('2026-01-06', 20), ('2026-01-07', 18), ('2026-01-04', 15)]
up             run of 1 pair(s)
flat-or-down   run of 1 pair(s)
up             run of 1 pair(s)
flat-or-down   run of 1 pair(s)
up             run of 1 pair(s)
flat-or-down   run of 1 pair(s)
3-day avg : [11.0, 12.67, 13.33, 16.33, 17.33]
```