Creating Dictionaries and Sets Efficiently

Dict and set comprehensions carry the same philosophy as list comprehensions into mapping and membership territory. A dict comprehension is {key_expr: value_expr for x in it [if cond]}; a set comprehension has the same shape with a single element expression in place of the key: value pair, {expr for x in it [if cond]}. Both produce a fresh object from an iterable in a single expression, and both are the idiomatic choice when you want to transform or filter into a new dict or set.

Dict comprehensions shine whenever a mapping is derived from something else: a list of pairs, two parallel sequences, an existing dict you want to transform or filter. {n: n*n for n in range(5)} gives you {0: 0, 1: 1, 2: 4, 3: 9, 4: 16} in one line; {k: v.upper() for k, v in d.items()} returns a copy with uppercased values.

Set comprehensions are the natural partner when you also need deduplication. {word.lower() for word in text.split()} gives you a set of unique, lowercased tokens; the set takes care of duplicates implicitly. For problems where the final order doesn't matter, a set comprehension is cleaner than building a list and converting.

The familiar restrictions apply. Dict keys and set elements must be hashable; trying to build a set of lists raises TypeError, and converting each list to a tuple is the usual fix. Comprehensions run eagerly and allocate memory for the whole result; if the input is huge and you only need to iterate once, use a generator expression instead.
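A minimal sketch of both restrictions, using made-up data (the names rows and total are illustrative only): an unhashable element fails, a tuple conversion fixes it, and a generator expression avoids materializing an intermediate collection.

```python
# Lists are unhashable, so they cannot be set elements or dict keys.
rows = [[1, 2], [3, 4], [1, 2]]
try:
    unique_rows = {row for row in rows}
except TypeError as exc:
    error = str(exc)  # the message mentions an unhashable type

# Converting each row to a tuple makes it hashable.
unique_rows = {tuple(row) for row in rows}
print(unique_rows == {(1, 2), (3, 4)})  # True

# A generator expression feeds sum() one value at a time
# instead of building a full list or set first.
total = sum(n * n for n in range(1_000_000))
```

The generator form only helps when the consumer needs a single pass; if you need the dict or set itself, the comprehension is still the right tool.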

From pairs and from other dicts

dict(zip(keys, values)) is often shorter than a dict comprehension when you already have two parallel sequences. Use the comprehension form when you need a transformation: {k: int(v) for k, v in pairs}. Inverting a dict is the classic one-liner: {v: k for k, v in d.items()}. Remember that if two keys share a value, the later one silently overwrites the earlier one, so the inverted dict can be smaller than the original.
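The collision behaviour during inversion is easy to miss, so here is a small sketch with hypothetical data (ports and by_port are names invented for this example). It also shows one way to keep every key when values repeat.

```python
from collections import defaultdict

keys = ["host", "port", "timeout"]
raw_values = ["localhost", "5432", "30"]

# No transformation needed: dict(zip(...)) is the shortest form.
plain = dict(zip(keys, raw_values))

# Two keys share the value 80, so the later key wins on inversion.
ports = {"http": 80, "www": 80, "https": 443}
inverted = {v: k for k, v in ports.items()}
print(inverted)  # {80: 'www', 443: 'https'}

# To keep every key, group them per value instead of inverting.
by_port = defaultdict(list)
for name, port in ports.items():
    by_port[port].append(name)
print(dict(by_port))  # {80: ['http', 'www'], 443: ['https']}
```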

Filter values as they go in: {k: v for k, v in d.items() if v is not None}. Transform keys at the same time: {k.upper(): v for k, v in d.items()}. Both read naturally once you're used to the pattern.
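The filter condition deserves care: `if v is not None` and `if v` are not interchangeable, because 0 and the empty string are falsy. A short sketch with invented settings data:

```python
settings = {"Retries": 0, "Name": "", "Timeout": None, "Host": "localhost"}

# Drops only the None entry; 0 and "" are kept.
not_none = {k: v for k, v in settings.items() if v is not None}

# Drops every falsy value: 0, "" and None all disappear.
truthy = {k: v for k, v in settings.items() if v}

# Filtering and key transformation combine in one expression.
normalized = {k.lower(): v for k, v in settings.items() if v is not None}

print(not_none)    # {'Retries': 0, 'Name': '', 'Host': 'localhost'}
print(truthy)      # {'Host': 'localhost'}
print(normalized)  # {'retries': 0, 'name': '', 'host': 'localhost'}
```

Note that transforming keys can make two distinct keys collide (for example "Host" and "host" after lowercasing), in which case the later entry wins.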

Deduplication and grouping with sets

{token.lower() for token in tokens} is the textbook dedup. For grouping into sets of unique items, fall back to defaultdict(set) and write groups[key].add(item) inside a loop; with a plain dict, the equivalent is groups.setdefault(key, set()).add(item). Set comprehensions don't handle grouping in a single expression, and that's fine.
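The two grouping spellings can be compared side by side. This is a sketch over invented (category, item) pairs; both loops build the same mapping.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear"), ("fruit", "apple")]

# defaultdict(set) creates the empty set on first access to a missing key.
groups = defaultdict(set)
for key, item in pairs:
    groups[key].add(item)

# A plain dict needs setdefault to supply the empty set.
plain = {}
for key, item in pairs:
    plain.setdefault(key, set()).add(item)

print(dict(groups) == plain)  # True
```

The defaultdict version avoids constructing a throwaway set() on every iteration, which setdefault does even when the key already exists.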

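If you need a stable order after deduplication, a list comprehension with a "seen" set is the idiom: [x for x in xs if x not in seen and not seen.add(x)]. It works because set.add returns None, so not seen.add(x) is always True and only serves to record x. Slightly hacky, but well-known and efficient.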
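A runnable version of the idiom, next to an alternative that is not mentioned in the text above: list(dict.fromkeys(xs)), which relies on dicts preserving insertion order (guaranteed since Python 3.7).

```python
xs = ["b", "a", "b", "c", "a"]

# The "seen" set idiom: seen.add(x) returns None, so `not seen.add(x)`
# is always True and exists only for its side effect.
seen = set()
ordered = [x for x in xs if x not in seen and not seen.add(x)]

# Alternative without the side-effect trick: dict keys are unique
# and keep first-insertion order.
also_ordered = list(dict.fromkeys(xs))

print(ordered)       # ['b', 'a', 'c']
print(also_ordered)  # ['b', 'a', 'c']
```

Both approaches require hashable elements, the same restriction that applies to sets and dict keys generally.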

Dict and set comprehensions in context.

Tool	Purpose
`{k: v for ...}` syntax	Dict comprehension.
`{e for ...}` syntax	Set comprehension.
`dict(zip(k, v))` idiom	Pairs two parallel iterables.
`d.items()` method	Pairs of (key, value) for iteration.
`defaultdict(set)` class	Auto-creates a set per missing key.
`frozenset(...)` built-in	Immutable, hashable set.
`itemgetter(i)` factory	Extracts position i; handy as a key function for sorted(), min(), and max().
`Counter` class	Specialized dict for counting occurrences.
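As one illustration of how the table's tools combine, the sketch below sorts a dict's items by value with operator.itemgetter and rebuilds a smaller dict. The scores data is invented for the example.

```python
from operator import itemgetter

scores = {"ann": 82, "bob": 95, "cy": 77}

# itemgetter(1) pulls the value out of each (key, value) pair,
# so sorted() orders the pairs by score.
ranked = sorted(scores.items(), key=itemgetter(1), reverse=True)

# Rebuild a dict from the two highest-scoring pairs.
top_two = dict(ranked[:2])
print(top_two)  # {'bob': 95, 'ann': 82}

# A set comprehension over the same pairs collects just the names.
top_names = {name for name, _ in ranked[:2]}
```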

Creating Dictionaries and Sets Efficiently code example

The script derives dicts and sets from the same data source using comprehensions throughout.

# Lesson: Creating Dictionaries and Sets Efficiently
from collections import defaultdict

text = "The quick brown fox jumps over the lazy dog the quick fox"
tokens = text.lower().split()

# Set comprehension: unique tokens
unique = {t for t in tokens}
print("unique:", sorted(unique))

# Dict comprehension: token -> length
lengths = {t: len(t) for t in unique}
print("lengths:", dict(sorted(lengths.items())))

# Dict comprehension from pairs + filter
pairs = [("host", "localhost"), ("port", "5432"), ("debug", "")]
config = {k: v for k, v in pairs if v}
print("config:", config)

# Invert a dict (values must be unique to avoid collisions)
src = {"a": 1, "b": 2, "c": 3}
inv = {v: k for k, v in src.items()}
print("inverted:", inv)

# Group tokens by first letter (not a one-liner; defaultdict helps)
groups: defaultdict[str, set[str]] = defaultdict(set)
for t in tokens:
    groups[t[0]].add(t)
print("groups:", {k: sorted(v) for k, v in sorted(groups.items())})

# Frozenset as a dict key
seen: dict[frozenset, str] = {}
seen[frozenset({"py", "sql"})] = "tag-1"
print("frozen key:", seen[frozenset({"sql", "py"})])

Watch how each comprehension earns its keep:

1) Set comprehension replaces 'start with set() and add inside a loop'.
2) Dict comprehension from `.items()` is the standard copy-with-transform form.
3) Filtering in the same expression avoids a second pass.
4) Grouping needs `defaultdict(set)`; there's no one-liner equivalent and that's fine.

Practice with a frequency map and a whitelist filter.

from collections import Counter

words = "red blue red green blue red".split()
counts = Counter(words)
print(counts)

# Keep entries whose count is at least the threshold (2)
freq = {w: n for w, n in counts.items() if n >= 2}
print(freq)

# Whitelisted tokens
allowed = {"red", "green"}
only_allowed = {w for w in words if w in allowed}
print(sorted(only_allowed))

Core comprehension invariants.

assert {n: n*n for n in range(3)} == {0: 0, 1: 1, 2: 4}
assert {n % 2 for n in range(5)} == {0, 1}
assert dict(zip("ab", [1, 2])) == {"a": 1, "b": 2}
d = {"a": 1, "b": 2}
assert {v: k for k, v in d.items()} == {1: "a", 2: "b"}

Running the first script prints:

unique: ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']
lengths: {'brown': 5, 'dog': 3, 'fox': 3, 'jumps': 5, 'lazy': 4, 'over': 4, 'quick': 5, 'the': 3}
config: {'host': 'localhost', 'port': '5432'}
inverted: {1: 'a', 2: 'b', 3: 'c'}
groups: {'b': ['brown'], 'd': ['dog'], 'f': ['fox'], 'j': ['jumps'], 'l': ['lazy'], 'o': ['over'], 'q': ['quick'], 't': ['the']}
frozen key: tag-1
