Storing and Retrieving Structured Data

JSON is the lingua franca of structured data on the internet. Python's json module maps cleanly between Python objects and JSON text: dict becomes an object, list becomes an array, int/float/bool/None become their JSON equivalents, and str stays a string. That one-to-one mapping makes JSON the easiest serialization format to reach for.

The four entry points you will use 99% of the time are json.dumps(obj) (Python object → JSON string), json.loads(text) (string → object), and their file-oriented cousins json.dump(obj, f) and json.load(f). Pass indent=2 for human-readable output; omit it for compact machine-to-machine payloads.
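A minimal sketch of the string-oriented pair (the record dict is just illustrative data):

```python
import json

record = {"name": "ana", "tags": ["admin", "ops"], "active": True}

text = json.dumps(record, indent=2)   # human-readable, multi-line
compact = json.dumps(record)          # default: single line

# Both forms decode back to the same object
assert json.loads(text) == record
assert json.loads(compact) == record
assert "\n" in text and "\n" not in compact
```

json.dump and json.load behave identically, except they write to and read from an open file object instead of a string.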

Some Python types do not round-trip through JSON natively: datetime, Decimal, set, dataclasses. For them, either convert to a JSON-friendly form before dumping (dt.isoformat(), list(s), dataclasses.asdict(d)) or supply a custom default= callback to json.dumps. On the way back in, convert strings to proper types yourself.
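A sketch of the convert-before-dumping approach for Decimal, set, and datetime (the data/friendly names are illustrative):

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

data = {
    "price": Decimal("19.99"),
    "tags": {"b", "a"},
    "when": datetime(2026, 4, 21, tzinfo=timezone.utc),
}

# Convert to JSON-friendly forms before dumping
friendly = {
    "price": str(data["price"]),       # Decimal → string keeps exact digits
    "tags": sorted(data["tags"]),      # set → sorted list
    "when": data["when"].isoformat(),  # datetime → ISO-8601 string
}
text = json.dumps(friendly)

# On the way back in, re-create the original types yourself
back = json.loads(text)
restored = {
    "price": Decimal(back["price"]),
    "tags": set(back["tags"]),
    "when": datetime.fromisoformat(back["when"]),
}
assert restored == data
```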

For stricter schemas — validating types, applying default values, rejecting unknown keys — use pydantic or dataclasses + dacite. They trade a small setup cost for large reliability wins in anything longer-lived than a one-off script. For very large JSON documents, use ijson to stream rather than materialize the whole structure.

dumps, loads, and indentation

json.dumps(obj, indent=2, sort_keys=True) is the most useful form for files humans will read. sort_keys=True makes output stable for diffing. separators=(",", ":") yields compact output when bytes matter.

ensure_ascii=False keeps Unicode characters verbatim rather than escaping them as \uXXXX; almost always what you want for non-English text.
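A quick sketch of how these flags behave (the obj dict is illustrative):

```python
import json

obj = {"b": 2, "a": 1, "name": "Zoë"}

pretty = json.dumps(obj, indent=2, sort_keys=True)
compact = json.dumps(obj, separators=(",", ":"))
unicode_ok = json.dumps(obj, ensure_ascii=False)

assert pretty.index('"a"') < pretty.index('"b"')  # sort_keys orders keys
assert " " not in compact                         # no spaces after , or :
assert "Zoë" in unicode_ok                        # verbatim non-ASCII
assert "\\u00eb" in json.dumps(obj)               # default escapes it
```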

Custom types and validation

A quick default= handler is only a few lines:

def _enc(o):
    if isinstance(o, datetime):
        return o.isoformat()
    raise TypeError(f"cannot encode {type(o).__name__}")

Pass it as json.dumps(obj, default=_enc). On the way back, an object_hook callback converts decoded dicts into domain objects before json.loads returns them.
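The decode direction can be sketched with object_hook; the hook runs on every decoded JSON object, innermost first (the Event class and the key check are illustrative):

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    kind: str
    user: str


def as_event(d):
    # Called for every decoded JSON object (dict), innermost first
    if d.keys() == {"kind", "user"}:
        return Event(**d)
    return d


text = '{"batch": [{"kind": "login", "user": "ana"}]}'
decoded = json.loads(text, object_hook=as_event)
assert decoded["batch"][0] == Event(kind="login", user="ana")
```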

For validated structures use pydantic.BaseModel: define fields with types and get validation, conversion, and serialization for free. It is a far heavier dependency than the stdlib json module, but worth it the moment the data crosses a trust boundary.

JSON-handling tools

Tool                       Kind       Purpose
json.dumps(obj, indent=2)  function   Python object → JSON string.
json.loads(text)           function   JSON string → Python object.
json.dump(obj, f)          function   Write JSON to a file object.
json.load(f)               function   Read JSON from a file object.
json.JSONEncoder           class      Subclass for custom types.
dataclasses.asdict         function   Convert dataclass to dict.
object_pairs_hook          parameter  Preserve duplicate keys or order.
pydantic                   library    Rich validation / schema / JSON layer.

Storing and Retrieving Structured Data code example

The script serializes a dict containing a datetime and a dataclass, round-trips it through a file, and re-parses the timestamp.

# Lesson: Storing and Retrieving Structured Data
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from tempfile import gettempdir


@dataclass
class Event:
    kind: str
    user: str


def default_encoder(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    if hasattr(obj, "__dataclass_fields__"):
        return asdict(obj)
    raise TypeError(f"cannot encode {type(obj).__name__}")


payload = {
    "recorded_at": datetime.now(timezone.utc),
    "count": 2,
    "events": [Event(kind="login", user="ana"), Event(kind="logout", user="ana")],
}

text = json.dumps(payload, indent=2, default=default_encoder, ensure_ascii=False)
print(text)

# File round trip
p = Path(gettempdir()) / "events.json"
with open(p, "w", encoding="utf-8") as f:
    json.dump(payload, f, indent=2, default=default_encoder)

with open(p, "r", encoding="utf-8") as f:
    reloaded = json.load(f)

# Convert strings back to domain objects manually
reloaded["recorded_at"] = datetime.fromisoformat(reloaded["recorded_at"])
reloaded["events"] = [Event(**ev) for ev in reloaded["events"]]

print("type of recorded_at:", type(reloaded["recorded_at"]).__name__)
print("events:              ", reloaded["events"])

p.unlink()

Focus on the serialization boundary:

1) Custom types travel as strings or dicts through JSON.
2) The `default=` callback handles anything `json` doesn't recognize.
3) On the way back, convert ISO strings to `datetime` yourself.
4) Dataclass -> dict -> dataclass via `Event(**ev)` preserves the Python type.
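The same boundary can be expressed with a json.JSONEncoder subclass from the table above; a minimal sketch (the ISOEncoder name is made up for illustration):

```python
import json
from datetime import datetime, timezone


class ISOEncoder(json.JSONEncoder):
    # default() is only called for types json cannot handle natively
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)  # raises TypeError for anything else


stamp = datetime(2026, 4, 21, 10, 0, tzinfo=timezone.utc)
text = json.dumps({"at": stamp}, cls=ISOEncoder)
assert text == '{"at": "2026-04-21T10:00:00+00:00"}'
```

Passing cls=ISOEncoder to json.dumps is equivalent to passing default=, but the subclass is reusable across call sites.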

Parse a small JSONL (one JSON object per line) log.

import json
from io import StringIO

lines = StringIO(
    '{"kind": "login", "user": "ana"}\n'
    '{"kind": "view", "user": "ana", "page": "/home"}\n'
)

events = [json.loads(line) for line in lines if line.strip()]
print(events)
print("unique users:", {e["user"] for e in events})

Round-trip invariants.

import json
payload = {"a": 1, "b": [1, 2, 3], "c": None, "d": True}
assert json.loads(json.dumps(payload)) == payload
assert json.loads("[1, 2, 3]") == [1, 2, 3]

Running the script prints something like:

{
  "recorded_at": "2026-04-21T10:00:00+00:00",
  "count": 2,
  "events": [
    {
      "kind": "login",
      "user": "ana"
    },
    {
      "kind": "logout",
      "user": "ana"
    }
  ]
}
type of recorded_at: datetime
events:               [Event(kind='login', user='ana'), Event(kind='logout', user='ana')]