JSON is the lingua franca of structured data on the internet. Python's json module maps cleanly between Python objects and JSON text: dict becomes an object, list becomes an array, int/float/bool/None become their JSON equivalents, and str stays a string. That near one-to-one mapping (tuples also become arrays, and dict keys are coerced to strings) makes JSON the easiest serialization format to reach for.
The four entry points you will use 99% of the time are json.dumps(obj) (Python object → JSON string), json.loads(text) (string → object), and their file-oriented cousins json.dump(obj, f) and json.load(f). Pass indent=2 for human-readable output; omit it for compact machine-to-machine payloads.
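A quick tour of all four, as a minimal self-contained sketch (the temp-file path is just an illustrative choice):

```python
import json
from pathlib import Path
from tempfile import gettempdir

obj = {"name": "ana", "tags": ["admin", "ops"], "active": True}

text = json.dumps(obj, indent=2)   # object -> pretty JSON string
back = json.loads(text)           # string -> object
assert back == obj

path = Path(gettempdir()) / "demo.json"
with open(path, "w", encoding="utf-8") as f:
    json.dump(obj, f)             # object -> file
with open(path, encoding="utf-8") as f:
    assert json.load(f) == obj    # file -> object
path.unlink()
```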
Some Python types do not round-trip through JSON natively: datetime, Decimal, set, dataclasses. For them, either convert to a JSON-friendly form before dumping (dt.isoformat(), list(s), dataclasses.asdict(d)) or supply a custom default= callback to json.dumps. On the way back in, convert strings to proper types yourself.
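For example, a minimal sketch converting each awkward type by hand before dumping:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from decimal import Decimal

@dataclass
class Point:
    x: int
    y: int

record = {
    "when": datetime.now(timezone.utc).isoformat(),  # datetime -> ISO string
    "price": str(Decimal("19.99")),                  # Decimal -> string (no float rounding)
    "tags": sorted({"b", "a"}),                      # set -> sorted list (deterministic)
    "origin": asdict(Point(0, 0)),                   # dataclass -> dict
}
print(json.dumps(record, indent=2))
```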
For stricter schemas — validating types, applying default values, rejecting unknown keys — use pydantic or dataclasses + dacite. They trade a small setup cost for large reliability wins in anything longer-lived than a one-off script. For very large JSON documents, use ijson to stream rather than materialize the whole structure.
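For the streaming case, a hedged sketch assuming the third-party ijson package is installed and the document is one top-level JSON array:

```python
import json
from pathlib import Path

import ijson  # third-party: pip install ijson

# Create a sample file so the sketch is self-contained.
path = Path("big.json")
path.write_text(json.dumps([{"id": i} for i in range(1000)]))

# The "item" prefix addresses each element of the root array; records
# arrive one at a time instead of the whole list being materialized.
with open(path, "rb") as f:
    total = sum(rec["id"] for rec in ijson.items(f, "item"))
print(total)  # 499500
path.unlink()
```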
dumps, loads, and indentation
json.dumps(obj, indent=2, sort_keys=True) is the most useful form for files humans will read. sort_keys=True makes output stable for diffing. separators=(",", ":") yields compact output when bytes matter.
ensure_ascii=False keeps Unicode characters verbatim rather than escaping them as \uXXXX; almost always what you want for non-English text.
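Side by side, the effect of each flag:

```python
import json

data = {"b": 2, "a": 1, "city": "Zürich"}

print(json.dumps(data, indent=2, sort_keys=True))  # stable, diff-friendly
print(json.dumps(data, separators=(",", ":")))     # compact: no spaces at all
print(json.dumps(data))                            # default: "Z\u00fcrich"
print(json.dumps(data, ensure_ascii=False))        # keeps "Zürich" verbatim
```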
Custom types and validation
A quick default= handler is three lines: check isinstance(o, datetime), return o.isoformat() if it matches, and raise TypeError otherwise. Pass it to json.dumps(obj, default=_enc). On the way back, object_hook converts dicts into domain objects before json.loads returns them.
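A minimal sketch wiring both ends together; the _enc/_hook names and the "_dt" marker key are illustrative choices, not a json-module convention:

```python
import json
from datetime import datetime, timezone

def _enc(o):
    if isinstance(o, datetime):
        return {"_dt": o.isoformat()}   # tag the value so the hook can spot it
    raise TypeError(f"cannot encode {type(o).__name__}")

def _hook(d):
    if "_dt" in d:                      # object_hook sees every decoded dict
        return datetime.fromisoformat(d["_dt"])
    return d

text = json.dumps({"at": datetime.now(timezone.utc)}, default=_enc)
restored = json.loads(text, object_hook=_hook)
print(type(restored["at"]).__name__)    # datetime
```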
For validated structures use pydantic.BaseModel: define fields with types, and get validation, conversion, and serialization for free. It is a far heavier dependency than the stdlib json module; worth it the moment the data crosses a trust boundary.
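A hedged sketch, assuming pydantic v2 is installed (the Event model is just an example shape):

```python
from pydantic import BaseModel, ValidationError  # third-party: pip install pydantic

class Event(BaseModel):
    kind: str
    user: str

# Validation and conversion happen on the way in...
ev = Event.model_validate_json('{"kind": "login", "user": "ana"}')
print(ev.kind)

# ...and serialization falls out for free on the way out.
print(ev.model_dump_json())

try:
    Event.model_validate_json('{"kind": 1}')  # wrong type, missing field
except ValidationError as e:
    print("rejected:", e.error_count(), "errors")
```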
JSON-handling tools.
| Tool | Kind | Purpose |
|---|---|---|
| `json.dumps(obj, indent=2)` | function | Python object → JSON string. |
| `json.loads(text)` | function | JSON string → Python object. |
| `json.dump(obj, f)` | function | Write JSON to a file object. |
| `json.load(f)` | function | Read JSON from a file object. |
| `json.JSONEncoder` | class | Subclass for custom types. |
| `dataclasses.asdict` | function | Convert dataclass to dict. |
| `object_pairs_hook` | parameter | Preserve duplicate keys or order. |
| `pydantic` | library | Rich validation / schema / JSON layer. |
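Of these, object_pairs_hook is the least obvious: it receives the raw (key, value) pairs before they are collapsed into a dict, so duplicates survive:

```python
import json

# A plain loads() silently keeps only the last duplicate key.
print(json.loads('{"a": 1, "a": 2}'))                                # {'a': 2}

# object_pairs_hook sees every pair, duplicates included.
pairs = json.loads('{"a": 1, "a": 2}', object_pairs_hook=lambda p: p)
print(pairs)                                                         # [('a', 1), ('a', 2)]
```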
Code example: Storing and Retrieving Structured Data
The script serializes a dict containing a datetime and a dataclass, round-trips it through a file, and re-parses the timestamp.
```python
# Lesson: Storing and Retrieving Structured Data
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from tempfile import gettempdir

@dataclass
class Event:
    kind: str
    user: str

def default_encoder(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    if hasattr(obj, "__dataclass_fields__"):
        return asdict(obj)
    raise TypeError(f"cannot encode {type(obj).__name__}")

payload = {
    "recorded_at": datetime.now(timezone.utc),
    "count": 2,
    "events": [Event(kind="login", user="ana"), Event(kind="logout", user="ana")],
}

text = json.dumps(payload, indent=2, default=default_encoder, ensure_ascii=False)
print(text)

# File round trip
p = Path(gettempdir()) / "events.json"
with open(p, "w", encoding="utf-8") as f:
    json.dump(payload, f, indent=2, default=default_encoder)
with open(p, "r", encoding="utf-8") as f:
    reloaded = json.load(f)

# Convert strings back to domain objects manually
reloaded["recorded_at"] = datetime.fromisoformat(reloaded["recorded_at"])
reloaded["events"] = [Event(**ev) for ev in reloaded["events"]]
print("type of recorded_at:", type(reloaded["recorded_at"]).__name__)
print("events:", reloaded["events"])
p.unlink()
```
Focus on the serialization boundary:
1) Custom types travel as strings or dicts through JSON.
2) `default=default_encoder` handles anything `json` doesn't recognize.
3) On the way back, convert ISO strings to `datetime` yourself.
4) Dataclass -> dict -> dataclass via `Event(**ev)` preserves the Python type.
Parse a small JSONL (one JSON object per line) log:
```python
import json
from io import StringIO

lines = StringIO(
    '{"kind": "login", "user": "ana"}\n'
    '{"kind": "view", "user": "ana", "page": "/home"}\n'
)

events = [json.loads(line) for line in lines if line.strip()]
print(events)
print("unique users:", {e["user"] for e in events})
```
Round-trip invariants:
```python
import json

payload = {"a": 1, "b": [1, 2, 3], "c": None, "d": True}
assert json.loads(json.dumps(payload)) == payload
assert json.loads("[1, 2, 3]") == [1, 2, 3]
```
Running the script prints something like:
```text
{
  "recorded_at": "2026-04-21T10:00:00+00:00",
  "count": 2,
  "events": [
    {
      "kind": "login",
      "user": "ana"
    },
    {
      "kind": "logout",
      "user": "ana"
    }
  ]
}
type of recorded_at: datetime
events: [Event(kind='login', user='ana'), Event(kind='logout', user='ana')]
```