Sometimes the data you want to persist isn't plain text — it is a web of live Python objects: a partially-trained model, a cache of computed values, a game state. The pickle module serializes almost any Python object to a compact binary stream and reconstructs it later, references and all. It is the answer to “save this whole Python object to disk”.
The API is intentionally tiny: pickle.dumps(obj) produces bytes, pickle.loads(data) reverses it. The file variants are pickle.dump(obj, f) and pickle.load(f). Always open the file in binary mode ("wb"/"rb"); pickle is not text.
The headline caveat: pickle is not safe for untrusted data. Loading a pickle can execute arbitrary code embedded in it. Only unpickle data you produced yourself or received over a trusted channel. For portable or secure serialization, prefer json, msgpack, or a schema-based tool like protobuf.
For key-value persistence without setting up a database, shelve wraps pickle into a dictionary-like store backed by dbm. You open a shelf, treat it like a dict, and the values are automatically pickled. Perfect for small caches and session stores; not a replacement for SQLite or Redis.
Pickle basics
Protocol 5 (the default on 3.8+) is the fastest and smallest. Picklable objects include lists, dicts, sets, classes you defined, dataclasses, most built-ins. Non-picklable: open files, lambdas, locks, generators.
For bigger streams, pickle can be combined with gzip or lzma to shrink the output. Always remember: the file format is tied to the class definitions. If you rename a class, old pickles break.
Shelve as a persistent dict
with shelve.open("cache.db") as db: db["ana"] = [1, 2, 3]. The keys are strings; the values are anything pickle can serialize. Changes are written on close (or on writeback=True, immediately).
shelve is process-local and not safe for concurrent writers. For multi-process storage reach for a proper database. But within a single script, it removes the boilerplate of “load a pickle at start, save on exit”.
Binary serialization tools.
| Tool | Purpose |
|---|---|
picklemodule | Serialize Python objects to bytes. |
pickle.dumps(obj)function | Returns a bytes pickle of obj. |
pickle.loads(bytes)function | Reconstructs obj from bytes. |
pickle.HIGHEST_PROTOCOLconstant | Use the newest pickle protocol. |
shelve.open(path)function | Dict-like persistent store. |
dbmmodule | Low-level key-value store underneath shelve. |
gzip.open(path, 'wb')function | Transparent gzip wrapper for any write. |
copyreg.pickle(cls, fn)function | Customize pickling of a class. |
Saving and Restoring Program Data code example
The script pickles a small state bundle, restores it, and demonstrates shelve as a persistent dict.
# Lesson: Saving and Restoring Program Data
import pickle
import shelve
from dataclasses import dataclass
from pathlib import Path
from tempfile import gettempdir
@dataclass
class Game:
score: int
seed: int
items: list[str]
tmp = Path(gettempdir())
# -- pickle round trip --
state = Game(score=42, seed=2025, items=["key", "map", "coin"])
path = tmp / "save.pkl"
with open(path, "wb") as f:
pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)
with open(path, "rb") as f:
restored = pickle.load(f)
print("restored:", restored)
print("same type:", type(restored) is Game, "| equal:", restored == state)
# -- shelve as a persistent dict --
shelf_path = tmp / "cache"
with shelve.open(str(shelf_path)) as db:
db["ana"] = {"score": 91, "streak": 4}
db["ben"] = {"score": 78, "streak": 1}
with shelve.open(str(shelf_path)) as db:
print("ana in shelf:", db["ana"])
print("keys: ", sorted(db.keys()))
# Cleanup
path.unlink()
for ext in ("", ".db", ".bak", ".dat", ".dir"):
p = Path(str(shelf_path) + ext)
if p.exists():
p.unlink()
Key points to internalize:
1) Pickle preserves the exact Python type (Game stays Game).
2) Open files in binary mode (`wb`/`rb`); pickle is bytes.
3) Shelve = dict whose values are auto-pickled; perfect for small caches.
4) Never unpickle untrusted bytes; use JSON for safe interchange.
Practice a pickle + gzip combo.
import gzip, pickle
from pathlib import Path
from tempfile import gettempdir
data = {"nums": list(range(1000)), "tag": "demo"}
p = Path(gettempdir()) / "snapshot.pkl.gz"
with gzip.open(p, "wb") as f:
pickle.dump(data, f)
print("size:", p.stat().st_size, "bytes")
with gzip.open(p, "rb") as f:
back = pickle.load(f)
print(back["tag"], len(back["nums"]))
p.unlink()
Round-trip checks.
import pickle
obj = {"a": [1, 2, 3], "b": {1, 2}, "c": (7, 8)}
back = pickle.loads(pickle.dumps(obj))
assert back == obj
assert type(back["b"]) is set
Running prints:
restored: Game(score=42, seed=2025, items=['key', 'map', 'coin'])
same type: True | equal: True
ana in shelf: {'score': 91, 'streak': 4}
keys: ['ana', 'ben']