Creating Dictionaries and Sets Efficiently

Dict and set comprehensions carry the same philosophy as list comprehensions into mapping and membership territory. A dict comprehension is {key_expr: value_expr for x in it [if cond]}; a set comprehension is {expr for x in it [if cond]}, the same shape with a single expression in place of the key: value pair. Both produce a fresh object from an iterable in a single expression, and both are the idiomatic choice when you want to transform or filter into a new dict or set.

Dict comprehensions shine whenever a mapping is derived from something else: a list of pairs, two parallel sequences, an existing dict you want to transform or filter. {n: n*n for n in range(5)} gives you {0: 0, 1: 1, 2: 4, 3: 9, 4: 16} in one line; {k: v.upper() for k, v in d.items()} returns a copy with uppercased values.
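The three sources named above can be sketched as follows. This is a minimal illustration; the sample data and the names pairs, names, scores, and prices are invented for the example.

```python
# Hypothetical sample data, for illustration only.
pairs = [("a", 1), ("b", 2)]
names = ["ada", "linus", "grace"]
scores = [91, 78, 85]
prices = {"apple": 1.20, "pear": 0.80, "kiwi": 2.50}

# From a list of pairs: unpack each tuple into key and value.
from_pairs = {k: v * 10 for k, v in pairs}

# From two parallel sequences: zip them, then transform while building.
by_name = {name.title(): score for name, score in zip(names, scores)}

# From an existing dict: filter some entries out and transform the rest.
cheap_in_cents = {item: round(p * 100) for item, p in prices.items() if p < 2}

print(from_pairs)      # {'a': 10, 'b': 20}
print(by_name)         # {'Ada': 91, 'Linus': 78, 'Grace': 85}
print(cheap_in_cents)  # {'apple': 120, 'pear': 80}
```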

Set comprehensions are the natural partner when you also need deduplication. {word.lower() for word in text.split()} gives you a set of unique, lowercased tokens; the set takes care of duplicates implicitly. For problems where the final order doesn't matter, a set comprehension is cleaner than building a list and converting.
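To make the comparison concrete, here is a small sketch of both routes on a sample sentence chosen for illustration. Both give the same set; the comprehension simply skips the intermediate list.

```python
text = "To be or not to be"  # sample sentence, for illustration only

# Route 1: build a list first, then convert it to a set.
lowered = [word.lower() for word in text.split()]
unique_via_list = set(lowered)

# Route 2: build the set directly; duplicates never get stored twice.
unique_direct = {word.lower() for word in text.split()}

assert unique_via_list == unique_direct
# Sets are unordered, so sort only when a stable display order is wanted.
print(sorted(unique_direct))  # ['be', 'not', 'or', 'to']
```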

The familiar restrictions apply. Dict keys and set elements must be hashable; if you try to build a set of lists you'll get a TypeError. Comprehensions run eagerly and allocate memory for the whole result; if the input is huge and you only need to iterate over the transformed items once, use a generator expression instead.
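Both restrictions are easy to see in a short sketch. The rows data is hypothetical, and converting each list to a tuple is one common workaround, not the only one.

```python
rows = [[1, 2], [3, 4], [1, 2]]  # hypothetical data: lists are not hashable

try:
    unique_rows = {row for row in rows}
except TypeError as exc:
    print("TypeError:", exc)

# Tuples of hashable items are hashable, so this version works.
unique_rows = {tuple(row) for row in rows}
print(sorted(unique_rows))  # [(1, 2), (3, 4)]

# A generator expression feeds sum() one item at a time;
# no million-element list or set is ever built.
total = sum(n * n for n in range(1_000_000))
print(total)
```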

From pairs and from other dicts

dict(zip(keys, values)) is often shorter than a dict comprehension when you already have two parallel sequences. Use the comprehension form when you need a transformation: {k: int(v) for k, v in pairs}. Inverting a dict is the classic one-liner: {v: k for k, v in d.items()}. The values must be hashable, and if two keys share a value, the later one silently overwrites the earlier.
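A short sketch of the three forms side by side, using made-up configuration and color data. The last case shows what happens to an inverted dict when a value repeats.

```python
keys = ["host", "port"]
values = ["localhost", "5432"]

# No transformation needed: dict(zip(...)) is the shortest form.
raw = dict(zip(keys, values))

# Transformation needed: the comprehension converts each value on the way in.
pairs = [("port", "5432"), ("timeout", "30")]
numeric = {k: int(v) for k, v in pairs}

# Inversion with a repeated value: "cherry" overwrites "apple" under "red".
colors = {"apple": "red", "cherry": "red", "lime": "green"}
inverted = {v: k for k, v in colors.items()}

print(raw)       # {'host': 'localhost', 'port': '5432'}
print(numeric)   # {'port': 5432, 'timeout': 30}
print(inverted)  # {'red': 'cherry', 'green': 'lime'}
```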

Filter values as they go in: {k: v for k, v in d.items() if v is not None}. Transform keys at the same time: {k.upper(): v for k, v in d.items()}. Both read naturally once you're used to the pattern.
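The two patterns also combine in a single expression. The sketch below uses an invented headers dict; note that the test `v is not None` keeps falsy values such as "" and 0, which a bare `if v` would drop.

```python
# Hypothetical data: one missing value (None) and one empty string.
headers = {"content-type": "text/html", "etag": None, "x-note": ""}

# Filter only: drop entries whose value is None.
present = {k: v for k, v in headers.items() if v is not None}

# Transform keys only.
upper = {k.upper(): v for k, v in headers.items()}

# Filter and transform in the same pass.
both = {k.upper(): v for k, v in headers.items() if v is not None}

print(present)  # {'content-type': 'text/html', 'x-note': ''}
print(both)     # {'CONTENT-TYPE': 'text/html', 'X-NOTE': ''}
```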

Deduplication and grouping with sets

{token.lower() for token in tokens} is the textbook dedup. For grouping into sets of unique items, fall back to a loop: with groups = defaultdict(set) the body is groups[key].add(item); with a plain dict it is groups.setdefault(key, set()).add(item). A single set comprehension can't build several groups at once, and that's fine.
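Here is a minimal sketch of the grouping loop in both spellings, over an invented list of (category, item) pairs. The two produce the same mapping; defaultdict(set) just moves the "create a set if missing" step out of the loop body.

```python
from collections import defaultdict

# Hypothetical (category, item) pairs, with one repeated entry.
items = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear"), ("fruit", "apple")]

# defaultdict(set): a missing key gets an empty set on first access.
groups = defaultdict(set)
for key, item in items:
    groups[key].add(item)

# Plain dict: setdefault returns the existing set or installs a new one.
plain = {}
for key, item in items:
    plain.setdefault(key, set()).add(item)

assert dict(groups) == plain
print({k: sorted(v) for k, v in plain.items()})
# {'fruit': ['apple', 'pear'], 'veg': ['kale']}
```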

If you need a stable order after deduplication, a list comprehension with a "seen" set is the idiom: [x for x in xs if x not in seen and not seen.add(x)]. Slightly hacky, but well-known and efficient.
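The "seen" idiom works because set.add returns None: the `not seen.add(x)` clause is always true and only runs when x has not been seen yet. The sketch below also shows list(dict.fromkeys(xs)), an alternative not mentioned above that relies on dicts preserving insertion order (guaranteed since Python 3.7).

```python
xs = [3, 1, 3, 2, 1, 4]  # sample input with repeats

# "seen" idiom: keep x the first time it appears, recording it as a side effect.
seen = set()
ordered = [x for x in xs if x not in seen and not seen.add(x)]

# Alternative: dict keys are unique and keep insertion order.
ordered_alt = list(dict.fromkeys(xs))

print(ordered)      # [3, 1, 2, 4]
print(ordered_alt)  # [3, 1, 2, 4]
```

Both versions require hashable items. The dict.fromkeys form avoids the side effect inside the comprehension, which some readers find easier to review.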

Dict and set comprehensions in context.

Tool              Kind      Purpose
{k: v for ...}    syntax    Dict comprehension.
{e for ...}       syntax    Set comprehension.
dict(zip(k, v))   idiom     Pairs two parallel iterables.
d.items()         method    Pairs of (key, value) for iteration.
defaultdict(set)  class     Auto-creates a set per missing key.
frozenset(...)    built-in  Immutable, hashable set.
itemgetter(i)     factory   Extracts position i; handy as a key function for sorted(), min(), and max().
Counter           class     Specialized dict for counting occurrences.
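Of the tools listed, itemgetter is the one the scripts below do not exercise, so here is a small sketch pairing it with Counter. The input word is arbitrary, and itemgetter comes from the standard operator module.

```python
from collections import Counter
from operator import itemgetter

counts = Counter("banana")  # counts each character: a=3, n=2, b=1

# itemgetter(1) pulls the count out of each (letter, count) pair,
# so sorted() orders the pairs by frequency.
by_count = sorted(counts.items(), key=itemgetter(1), reverse=True)
print(by_count)  # [('a', 3), ('n', 2), ('b', 1)]

# A dict comprehension turns the top entries back into a mapping.
top_two = {letter: n for letter, n in by_count[:2]}
print(top_two)  # {'a': 3, 'n': 2}
```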

Creating Dictionaries and Sets Efficiently code example

The script derives dicts and sets from a few small data sources, using comprehensions wherever they fit and a defaultdict loop for the one grouping step.

# Lesson: Creating Dictionaries and Sets Efficiently
from collections import defaultdict

text = "The quick brown fox jumps over the lazy dog the quick fox"
tokens = text.lower().split()

# Set comprehension: unique tokens
unique = {t for t in tokens}
print("unique:", sorted(unique))

# Dict comprehension: token -> length
lengths = {t: len(t) for t in unique}
print("lengths:", dict(sorted(lengths.items())))

# Dict comprehension from pairs + filter
pairs = [("host", "localhost"), ("port", "5432"), ("debug", "")]
config = {k: v for k, v in pairs if v}
print("config:", config)

# Invert a dict (values must be unique to avoid collisions)
src = {"a": 1, "b": 2, "c": 3}
inv = {v: k for k, v in src.items()}
print("inverted:", inv)

# Group tokens by first letter (not a one-liner; defaultdict helps)
groups: defaultdict[str, set[str]] = defaultdict(set)
for t in tokens:
    groups[t[0]].add(t)
print("groups:", {k: sorted(v) for k, v in sorted(groups.items())})

# Frozenset as a dict key
seen: dict[frozenset, str] = {}
seen[frozenset({"py", "sql"})] = "tag-1"
print("frozen key:", seen[frozenset({"sql", "py"})])

Watch how each comprehension earns its keep:

1) Set comprehension replaces 'start with set() and add inside a loop'.
2) Dict comprehension from `.items()` is the standard copy-with-transform form.
3) Filtering in the same expression avoids a second pass.
4) Grouping needs `defaultdict(set)`; there's no one-liner equivalent and that's fine.

Practice with a frequency map and a whitelist filter.

from collections import Counter

words = "red blue red green blue red".split()
counts = Counter(words)
print(counts)

# Keep keys only if value is above a threshold
freq = {w: n for w, n in counts.items() if n >= 2}
print(freq)

# Whitelisted tokens
allowed = {"red", "green"}
only_allowed = {w for w in words if w in allowed}
print(sorted(only_allowed))

Core comprehension invariants.

assert {n: n*n for n in range(3)} == {0: 0, 1: 1, 2: 4}
assert {n % 2 for n in range(5)} == {0, 1}
assert dict(zip("ab", [1, 2])) == {"a": 1, "b": 2}
d = {"a": 1, "b": 2}
assert {v: k for k, v in d.items()} == {1: "a", 2: "b"}

Running the main lesson script (the first one above) prints:

unique: ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']
lengths: {'brown': 5, 'dog': 3, 'fox': 3, 'jumps': 5, 'lazy': 4, 'over': 4, 'quick': 5, 'the': 3}
config: {'host': 'localhost', 'port': '5432'}
inverted: {1: 'a', 2: 'b', 3: 'c'}
groups: {'b': ['brown'], 'd': ['dog'], 'f': ['fox'], 'j': ['jumps'], 'l': ['lazy'], 'o': ['over'], 'q': ['quick'], 't': ['the']}
frozen key: tag-1