Working with Data Sequences

Sequences — ordered collections of items — show up everywhere: lists of records, streams of events, ranges of numbers, lines in a file. Python's sequence toolkit is deep: built-in operations like slicing and iteration, the itertools module for combinators, and a handful of patterns (windowing, grouping, chunking) that recur across almost every data task.

Start with the basics: slicing (seq[a:b]), iteration (for x in seq), membership (x in seq), enumeration (enumerate(seq)), and zipping (zip(a, b)). These are the LEGO bricks for everything above them. Learn to spot when each one simplifies a manual loop.
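A minimal sketch of those bricks in action, using an illustrative list of event names (the data here is invented for the example):

```python
# Illustrative data: a short list of event names and parallel durations.
events = ["start", "load", "compute", "save", "stop"]
durations = [0.1, 2.5, 7.0, 1.2, 0.1]

# Slicing: everything between the first and last event.
middle = events[1:-1]
assert middle == ["load", "compute", "save"]

# Membership: no manual search loop needed.
assert "compute" in events

# enumerate: index + item without range(len(...)).
for i, name in enumerate(events):
    print(i, name)

# zip: walk two parallel sequences together.
for name, secs in zip(events, durations):
    print(f"{name}: {secs}s")
```

Each line above replaces a small hand-written loop; that substitution is the whole skill.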

itertools is the next layer. chain glues sequences together; islice takes a lazy slice; pairwise yields overlapping pairs; groupby groups consecutive equal items; combinations and permutations enumerate arrangements; accumulate computes running totals. Each one is a one-liner that replaces a dozen lines of hand-written code.
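For instance, chain and islice compose naturally because both are lazy (a sketch with made-up inputs):

```python
from itertools import chain, islice

# chain glues iterables of any type without building an intermediate list.
merged = chain([1, 2], (3, 4), range(5, 7))

# islice is a lazy slice: it works on any iterator, not just sequences,
# and consumes only as many items as it needs.
first_four = list(islice(merged, 4))
assert first_four == [1, 2, 3, 4]
```

Because nothing is materialized until `list(...)`, the same pattern works on arbitrarily long or infinite iterators.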

Patterns you will reuse: windowing (look at N consecutive items at a time), chunking (split into batches of N), running totals, most-common-k, and stable sort by multiple keys. Knowing these by name turns “how do I do this?” into “apply this recipe”.
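Of these, stable sort by multiple keys is the one not demonstrated later, so here is a sketch (the records are invented for illustration):

```python
# Stable multi-key sort. Python's sort is stable: items that compare
# equal keep their relative order, so you can sort in passes.
records = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]

# One call: ascending letter, then descending number, via a key tuple.
by_tuple = sorted(records, key=lambda r: (r[0], -r[1]))

# Two stable passes: secondary key first, primary key last. Useful when
# the keys sort in opposite directions and are not negatable numbers.
two_pass = sorted(records, key=lambda r: r[1], reverse=True)
two_pass = sorted(two_pass, key=lambda r: r[0])

assert by_tuple == two_pass == [("a", 2), ("a", 1), ("b", 2), ("b", 1)]
```

The two-pass form generalizes to any mix of ascending and descending string or object keys.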

Built-in sequence operations

enumerate(seq, start=1) yields (index, item) pairs with a configurable starting index. zip(a, b) pairs two iterables; zip(*matrix) transposes a list-of-lists. sorted(seq, key=..., reverse=...) returns a new sorted list; reversed(seq) returns an iterator.
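A quick sketch of the two less obvious ones, the zip transpose trick and reversed-as-iterator:

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# zip(*matrix) unpacks the rows as arguments, so zip pairs up columns.
transposed = list(zip(*matrix))
assert transposed == [(1, 4), (2, 5), (3, 6)]

# enumerate with start=1 gives human-friendly numbering.
for row_no, row in enumerate(matrix, start=1):
    print(f"row {row_no}: {row}")

# reversed() returns an iterator, not a list; materialize if needed.
assert list(reversed([1, 2, 3])) == [3, 2, 1]
```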

For homogeneous numeric sequences, sum, min, max, statistics.mean, and statistics.median cover most aggregations. For heterogeneous records, pass key= to min, max, or sorted to pick the field you care about; sum takes no key=, so project the field with a comprehension first.
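A sketch of both cases, using invented sensor readings as the heterogeneous records:

```python
import statistics

# Illustrative records: (sensor name, temperature).
readings = [("sensor-a", 21.5), ("sensor-b", 19.0), ("sensor-c", 23.5)]

# Homogeneous numbers: project the field, then aggregate.
temps = [r[1] for r in readings]
print(statistics.mean(temps))    # arithmetic mean of the temperatures
print(statistics.median(temps))  # middle value

# Heterogeneous records: key= tells min/max which field to compare.
hottest = max(readings, key=lambda r: r[1])
assert hottest == ("sensor-c", 23.5)

# sum has no key= parameter, so project the field first.
assert sum(r[1] for r in readings) == 64.0
```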

itertools patterns

Windowing: from itertools import pairwise; for a, b in pairwise(seq): (Python 3.10+). Chunking: from itertools import batched; list(batched(seq, 3)) (Python 3.12+). Running totals: list(accumulate(seq)). Group consecutive: groupby(seq, key=...).

Most-common: Counter(seq).most_common(k). Top-k largest: heapq.nlargest(k, seq, key=...). These one-liners cover a huge fraction of data cleanup and summarization.
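Both one-liners in a short sketch (the word list and price records are made up for the example):

```python
import heapq
from collections import Counter

words = ["spam", "eggs", "spam", "ham", "spam", "eggs"]

# Counter tallies occurrences; most_common(k) returns the top k
# (count, ties broken by first insertion order).
assert Counter(words).most_common(2) == [("spam", 3), ("eggs", 2)]

# nlargest picks the top k by an arbitrary key without sorting
# the whole sequence -- O(n log k) instead of O(n log n).
prices = [("ham", 9.5), ("spam", 1.2), ("eggs", 3.4)]
top2 = heapq.nlargest(2, prices, key=lambda p: p[1])
assert top2 == [("ham", 9.5), ("eggs", 3.4)]
```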

The sequence toolkit.

| Tool | Kind | Purpose |
| --- | --- | --- |
| enumerate(seq) | built-in | Yield (index, item) pairs. |
| zip(*iters) | built-in | Combine parallel iterables. |
| sorted(seq, key=...) | built-in | Return a sorted copy. |
| itertools.pairwise | function | Yield overlapping (a, b) pairs. |
| itertools.batched (3.12+) | function | Chunk into tuples of size n. |
| itertools.accumulate | function | Running total / running function. |
| itertools.groupby | function | Group consecutive equal items. |
| heapq.nlargest | function | Top-k by key without sorting everything. |

Working with Data Sequences code example

The script applies the main sequence patterns to a small time series.

# Lesson: Working with Data Sequences
import heapq
from itertools import accumulate, groupby, pairwise
from operator import itemgetter


samples = [
    ("2026-01-01", 10),
    ("2026-01-02", 12),
    ("2026-01-03", 11),
    ("2026-01-04", 15),
    ("2026-01-05", 14),
    ("2026-01-06", 20),
    ("2026-01-07", 18),
]

dates = [s[0] for s in samples]
values = [s[1] for s in samples]

# Enumerate and zip
for i, (d, v) in enumerate(zip(dates, values), start=1):
    print(f"  #{i}: {d} -> {v}")

# Day-over-day diff with pairwise
deltas = [b - a for a, b in pairwise(values)]
print("daily deltas:", deltas)

# Running total
print("running sum :", list(accumulate(values)))

# Top 3 busiest days by value
top3 = heapq.nlargest(3, samples, key=itemgetter(1))
print("top 3:", top3)

# Group consecutive days where the value strictly increased vs not
def trend(pair):
    return "up" if pair[1] > pair[0] else "flat-or-down"

pairs = list(pairwise(values))
for key, group in groupby(pairs, key=trend):
    group = list(group)
    print(f"{key:14s} run of {len(group)} pair(s)")

# Windowed mean of size 3
from collections import deque

def windowed(seq, n):
    # deque(maxlen=n) evicts the oldest item automatically, avoiding
    # the O(n) cost of list.pop(0) on every step.
    window: deque = deque(maxlen=n)
    for v in seq:
        window.append(v)
        if len(window) == n:
            yield sum(window) / n

print("3-day avg   :", [round(x, 2) for x in windowed(values, 3)])

Each block is one pattern:

1) `zip` + `enumerate` turns two parallel lists into labeled rows.
2) `pairwise` is the canonical way to take differences.
3) `accumulate` gives you running totals in a single call.
4) `groupby` clusters consecutive equal keys (data must already be sorted by key for full groups).
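The caveat in point 4 is worth seeing directly; a minimal sketch with invented labels showing why unsorted input yields runs rather than full groups:

```python
from itertools import groupby

labels = ["a", "b", "a", "b", "b"]

# Unsorted input: groupby sees four consecutive runs, not two groups.
runs = [(k, len(list(g))) for k, g in groupby(labels)]
assert runs == [("a", 1), ("b", 1), ("a", 1), ("b", 2)]

# Sort first to get exactly one group per distinct key.
full = [(k, len(list(g))) for k, g in groupby(sorted(labels))]
assert full == [("a", 2), ("b", 3)]
```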

Practice chunking and top-k.

from itertools import islice

def chunks(seq, n):
    it = iter(seq)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

print(list(chunks(range(7), 3)))

import heapq
scores = [("ana", 91), ("ben", 78), ("cai", 85), ("dev", 95)]
print(heapq.nlargest(2, scores, key=lambda p: p[1]))

Tiny truths about the patterns.

from itertools import accumulate, pairwise
assert list(accumulate([1, 2, 3])) == [1, 3, 6]
assert list(pairwise([1, 2, 3, 4])) == [(1, 2), (2, 3), (3, 4)]
assert sorted([(1, "b"), (2, "a")], key=lambda p: p[1]) == [(2, "a"), (1, "b")]

Running prints:

  #1: 2026-01-01 -> 10
  #2: 2026-01-02 -> 12
  #3: 2026-01-03 -> 11
  #4: 2026-01-04 -> 15
  #5: 2026-01-05 -> 14
  #6: 2026-01-06 -> 20
  #7: 2026-01-07 -> 18
daily deltas: [2, -1, 4, -1, 6, -2]
running sum : [10, 22, 33, 48, 62, 82, 100]
top 3: [('2026-01-06', 20), ('2026-01-07', 18), ('2026-01-04', 15)]
up             run of 1 pair(s)
flat-or-down   run of 1 pair(s)
up             run of 1 pair(s)
flat-or-down   run of 1 pair(s)
up             run of 1 pair(s)
flat-or-down   run of 1 pair(s)
3-day avg   : [11.0, 12.67, 13.33, 16.33, 17.33]