Understanding Sets and Their Properties

A set is an unordered collection of unique, hashable items. You can think of it as a dict that only remembers its keys. Sets are the right tool any time you care about membership or uniqueness and not about order: removing duplicates from a list, checking whether something is in a whitelist, finding the common elements between two groups — all of these are one-liners with a set.
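
As a rough sketch, the three tasks mentioned above each fit on one line. The names `raw_ids`, `allowed`, `team_a` and `team_b` are made up for illustration and do not come from any particular codebase.

```python
# Hypothetical sample data, for illustration only.
raw_ids = [3, 1, 3, 2, 1]
allowed = {"alice", "bob"}
team_a = {"ann", "ben", "cy"}
team_b = {"ben", "cy", "dee"}

unique_ids = set(raw_ids)          # duplicates collapse
is_allowed = "alice" in allowed    # membership test
common = team_a & team_b           # intersection of the two groups

print(sorted(unique_ids))  # [1, 2, 3]
print(is_allowed)          # True
print(sorted(common))      # ['ben', 'cy']
```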

Because sets rely on hashing, their two hallmark properties come essentially for free. First, x in s is O(1) on average, regardless of how many items the set contains. Second, duplicates are automatically discarded: set([1, 2, 2, 3]) is {1, 2, 3}. The trade-off is that sets do not preserve insertion order and cannot contain unhashable items like lists.
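
When first-seen order matters, a set alone is not enough. One common workaround, shown here as a sketch and not as the only option, is `dict.fromkeys`, which relies on dicts keeping insertion order (guaranteed since Python 3.7).

```python
items = [3, 1, 3, 2, 1]

as_set = set(items)                    # unique, but order is not guaranteed
in_order = list(dict.fromkeys(items))  # unique, first-seen order kept

print(as_set == {1, 2, 3})  # True
print(in_order)             # [3, 1, 2]
```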

Creating a set is slightly tricky because {} is an empty dict, not an empty set. Use set() for an empty set and {1, 2, 3} for non-empty ones. The set() constructor accepts any iterable and deduplicates as it goes, so set("hello") becomes {'h', 'e', 'l', 'o'}.
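
A quick check makes the `{}` pitfall visible. The variable names are illustrative only.

```python
empty_dict = {}      # braces alone always mean dict
empty_set = set()    # the only spelling of an empty set

print(type(empty_dict).__name__)             # dict
print(type(empty_set).__name__)              # set
print(set("hello") == {"h", "e", "l", "o"})  # True: the repeated 'l' is dropped
print(set(range(3)) == {0, 1, 2})            # True: any iterable works
```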

Python also ships a frozenset for cases where you need a set that cannot be changed afterwards. A frozenset is hashable, which means it can itself be used as a dict key or put inside another set — handy for representing, for example, immutable tags attached to a record.
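
Putting frozensets inside another set can be sketched with undirected edges, where ("a", "b") and ("b", "a") should count as the same edge. The `pairs` data is hypothetical.

```python
# Hypothetical undirected edges; direction should not matter.
pairs = [("a", "b"), ("b", "a"), ("b", "c")]

# Each frozenset is hashable, so it can be an element of the outer set.
edges = {frozenset(pair) for pair in pairs}

print(len(edges))                      # 2
print(frozenset({"b", "a"}) in edges)  # True
```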

What goes in, what does not

Every item in a set must be hashable: numbers, strings, tuples of hashables, frozensets. Putting a list into a set raises TypeError: unhashable type. If your data is mutable, convert it first (for example, turn each row into a tuple) or use a different container.
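
The "convert it first" advice might look like this in practice. The `rows` data is invented for the example, and the exact wording of the error message can vary between Python versions.

```python
rows = [[1, "x"], [1, "x"], [2, "y"]]  # lists are unhashable

try:
    set(rows)
except TypeError as err:
    print("cannot hash rows:", err)

# Tuples of hashable values are themselves hashable.
unique_rows = {tuple(row) for row in rows}
print(sorted(unique_rows))  # [(1, 'x'), (2, 'y')]
```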

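Sets compare by content: {1, 2} == {2, 1} is True because order doesn't matter. len(s) gives the size; for x in s iterates in an arbitrary order. That order stays the same within one interpreter run as long as the set is not modified, but it can differ between runs because string hashes are randomized per process by default.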
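
A small sketch of content-based comparison follows. It uses `sorted` when printing so the output does not depend on iteration order.

```python
a = {1, 2}
b = {2, 1}
c = frozenset([1, 2])

print(a == b)     # True: same elements, order is irrelevant
print(a == c)     # True: set and frozenset also compare by content
print(len(a))     # 2
print(sorted(a))  # [1, 2]
```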

Mutable vs frozen

s.add(x), s.discard(x) and s.remove(x) modify a set in place. (discard silently ignores a missing item; remove raises KeyError.) None of these methods exist on a frozenset: its contents are fixed by whatever you pass to the constructor.
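
The difference between the two flavors can be sketched as follows. The last lines show one way to get a "changed" frozenset, namely building a new one with the union operator; this is an illustration, not the only approach.

```python
s = {1, 2, 3}
s.add(4)
s.discard(99)      # missing item: silently ignored
try:
    s.remove(99)   # missing item: raises KeyError
except KeyError:
    print("99 was not in the set")

fs = frozenset(s)
print(hasattr(fs, "add"))     # False: no in-place mutators

bigger = fs | {5}             # builds a new frozenset; fs is unchanged
print(sorted(bigger))         # [1, 2, 3, 4, 5]
print(type(bigger).__name__)  # frozenset
```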

Because both flavors implement the same read-only interface (membership tests, union, intersection, iteration), a function that only reads its argument can accept either and leave the choice to the caller. Annotate such parameters as AbstractSet[T] when you need set operations, or as Iterable[T] when you only loop over the items.
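
A minimal sketch of such a signature, assuming a hypothetical helper named `shared_tags`. It uses the `typing` aliases so the annotations also work on older Python 3 versions.

```python
from typing import AbstractSet, FrozenSet


def shared_tags(left: AbstractSet[str], right: AbstractSet[str]) -> FrozenSet[str]:
    """Return the tags present in both inputs (hypothetical helper)."""
    # The & operator is part of the read-only set interface,
    # so it works for set and frozenset alike.
    return frozenset(left & right)


# A set and a frozenset can be mixed freely by the caller.
print(sorted(shared_tags({"py", "sql"}, frozenset({"sql", "go"}))))  # ['sql']
```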

The core tools for working with sets.

| Tool | Kind | Purpose |
|---|---|---|
| set | built-in type | Mutable unordered collection of unique items. |
| frozenset | built-in type | Immutable, hashable set. |
| s.add(x) | method | Adds x if it is not already present. |
| s.discard(x) | method | Removes x if present; no error otherwise. |
| s.remove(x) | method | Removes x; raises KeyError if absent. |
| x in s | operator | O(1) membership test on average. |
| hash(obj) | built-in | Raises TypeError for unhashable items. |
| s.copy() | method | Returns a shallow copy of the set. |

Understanding Sets and Their Properties code example

The script below exercises creation, membership, uniqueness and the hashability rule that trips beginners up.

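# Lesson: Understanding Sets and Their Properties
from typing import Iterable


def dedupe(items: Iterable) -> set:
    """Return the unique items from any iterable of hashable values."""
    return set(items)


empty = set()                        # {} would create an empty dict
small = {1, 2, 3}
from_string = dedupe("mississippi")  # same as set("mississippi")
frozen = frozenset([1, 2, 3])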

print("empty, len:", empty, len(empty))
print("small:     ", small)
print("letters:   ", from_string)
print("frozen:    ", frozen)

small.add(4)
small.discard(10)        # safe no-op
try:
    small.remove(10)
except KeyError as err:
    print("missing:   ", err)

print("grown:     ", sorted(small))
print("contains 2:", 2 in small)
print("equal?    :", {1, 2, 3, 4} == small)

# Unhashable items fail loudly
try:
    bad = {[1, 2]}
except TypeError as err:
    print("unhashable:", err)

# Frozensets are hashable (can be dict keys)
tags = {frozenset({"py", "sql"}): "tag-1"}
print("tag key:   ", tags[frozenset({"sql", "py"})])

As you read, look for four ideas:

1) `set()` is the only way to get an empty set; `{}` is a dict.
2) Adding an existing item is a no-op; there is no such thing as a duplicate entry.
3) `discard` is safe on missing items; `remove` raises KeyError.
4) Frozensets are hashable and therefore suitable as dict keys.

Try two tiny exercises that practice uniqueness and hashability.

# Example A: deduplicate email addresses case-insensitively
emails_a = ["A@x.com", "b@x.com", "A@x.com"]
unique = {e.lower() for e in emails_a}
print(unique)

# Example B: tag a record with a frozenset
record = {"id": 1, "tags": frozenset(["python", "oop"])}
print("python" in record["tags"])

Each assertion targets a core set property.

assert len({1, 1, 2}) == 2
assert 2 in {1, 2, 3} and 4 not in {1, 2, 3}
assert set("aab") == {"a", "b"}
assert hash(frozenset([1, 2])) == hash(frozenset([2, 1]))  # hashable, order-independent

Running the script prints roughly the following (set iteration order may differ):

empty, len: set() 0
small:      {1, 2, 3}
letters:    {'m', 'i', 's', 'p'}
frozen:     frozenset({1, 2, 3})
missing:    10
grown:      [1, 2, 3, 4]
contains 2: True
equal?    : True
unhashable: unhashable type: 'list'
tag key:    tag-1