Handling Errors in Large Applications

In small scripts, failing fast with a bare traceback is usually enough. In large applications, errors have to be categorised, logged, retried where it makes sense, turned into user-friendly messages at the edges, and measured over time. The stdlib gives you the primitives (exceptions, logging, contextlib); the rest is discipline.

Design your exceptions. Create a small hierarchy rooted at a single base class (class AppError(Exception): ...) and derive specific types from it (NotFoundError, ValidationError, TransientError). Catching the base at the edge gives you one place to translate internal errors into user-facing responses.
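Here is a minimal sketch of that hierarchy, using the class names from the paragraph above. The edge() helper and bad_input() are hypothetical stand-ins for a real boundary and a real failing call. The point is that a single except AppError clause catches every subclass.

```python
class AppError(Exception):
    """Root of every exception the application raises on purpose."""


class NotFoundError(AppError):
    """A requested resource does not exist."""


class ValidationError(AppError):
    """Input failed validation."""


class TransientError(AppError):
    """A temporary failure (timeout, dropped connection) that may succeed on retry."""


def edge(action):
    """Hypothetical boundary: one except clause covers every AppError subclass."""
    try:
        return action()
    except AppError as err:
        # type(err).__name__ tells the caller which category failed
        return f"{type(err).__name__}: {err}"


def bad_input():
    raise ValidationError("age must be positive")


print(edge(bad_input))  # ValidationError: age must be positive
```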

Don't swallow exceptions. Wrapping code in except Exception: and merely logging a warning is the most common source of silent bugs in Python code. Catch specific types, or catch broadly only at clearly defined boundaries (HTTP entry point, background task worker, CLI main). Always log the full traceback with log.exception("...") so the cause survives.
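The sketch below shows the acceptable form of a broad catch. A hypothetical background worker (run_worker and process are illustrative names) catches Exception only at its loop boundary. It logs the traceback with log.exception and moves on to the next job, rather than hiding the failure behind a warning.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("worker")


def process(job):
    # Hypothetical handler that fails on one specific input.
    if job == "bad":
        raise ValueError(f"cannot process {job!r}")
    return job.upper()


def run_worker(jobs):
    """Boundary: log each failed job with its traceback and keep processing."""
    results = []
    for job in jobs:
        try:
            results.append(process(job))
        except Exception:
            # Not log.warning(...): log.exception logs at ERROR and adds the traceback.
            log.exception("job %r failed", job)
    return results


print(run_worker(["a", "bad", "b"]))  # ['A', 'B']
```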

Chain exceptions instead of losing information. raise ProfileError("load failed") from err stores the original exception as __cause__, so the traceback shows both errors. raise ... from None hides the original exception from the traceback, which is rarely what you want. Centralize retry logic in a small helper with explicit backoff and a ceiling: unbounded retries hide outages rather than report them.
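To see the difference in practice, the sketch below renders both variants with traceback.format_exception. ProfileError and load_profile are hypothetical. With from err, the output keeps the ValueError and the "direct cause" banner; with from None, both disappear. The original exception is still reachable as __context__ in that case, though it is no longer printed.

```python
import traceback


class ProfileError(Exception):
    """Hypothetical application error used for illustration."""


def load_profile(raw: str, *, keep_cause: bool) -> int:
    try:
        return int(raw)
    except ValueError as err:
        if keep_cause:
            raise ProfileError("load failed") from err  # err stored as __cause__
        raise ProfileError("load failed") from None  # original hidden from the traceback


def render(exc: BaseException) -> str:
    """Format exc and its chain the way an uncaught exception would be printed."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


for keep in (True, False):
    try:
        load_profile("abc", keep_cause=keep)
    except ProfileError as exc:
        text = render(exc)
        print(f"keep_cause={keep}: cause shown = {'direct cause' in text}")
```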

Exception hierarchies and edges

A project's exceptions live in one module: from myapp.errors import NotFoundError, ValidationError. That is also where error-code mappings and HTTP translations live. Every module raises from this set, never directly from the stdlib.
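Here is one possible shape for such a module. It assumes a convention of class-level code and http_status attributes; these names are illustrative, not a standard. Each subclass carries its own mapping, and a stdlib error is wrapped at the point where it occurs, so callers only ever see the app's own types.

```python
# Sketch of a central errors module (for example myapp/errors.py). The `code`
# and `http_status` class attributes are an assumed convention, not a Python feature.


class AppError(Exception):
    code = "internal"
    http_status = 500


class NotFoundError(AppError):
    code = "not_found"
    http_status = 404


class ValidationError(AppError):
    code = "invalid"
    http_status = 422


def to_http(err: AppError) -> tuple:
    """Translate any AppError into a (status, JSON-ready body) pair."""
    return err.http_status, {"error": err.code, "message": str(err)}


# Elsewhere in the app: wrap the stdlib error where it happens.
def find_user(users: dict, uid: int) -> dict:
    try:
        return users[uid]
    except KeyError as err:
        raise NotFoundError(f"user {uid} not found") from err


try:
    find_user({1: {"name": "ana"}}, 2)
except AppError as err:
    print(to_http(err))  # (404, {'error': 'not_found', 'message': 'user 2 not found'})
```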

At the edge (HTTP handler, CLI, message consumer), catch AppError and translate; catch Exception only to log and re-raise or return a 500. Never let internal types leak to users.
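The following sketch applies the same rule at a CLI edge, using hypothetical run() and main() functions. Expected AppError failures become a one-line message and exit status 1. Anything else is logged with its traceback and reported generically with status 2.

```python
import logging
import sys

log = logging.getLogger("cli")


class AppError(Exception):
    """Base of the app's own errors (same idea as the hierarchy described earlier)."""


class ValidationError(AppError):
    pass


def run(argv):
    # Hypothetical command: doubles a single integer argument.
    if len(argv) != 1:
        raise ValidationError("usage: tool N")
    try:
        n = int(argv[0])
    except ValueError as err:
        raise ValidationError(f"not an integer: {argv[0]!r}") from err
    print(n * 2)


def main(argv) -> int:
    """CLI edge: every failure becomes a message plus an exit status."""
    try:
        run(argv)
        return 0
    except AppError as err:
        # Expected failure: short user-facing message, no traceback.
        print(f"error: {err}", file=sys.stderr)
        return 1
    except Exception:
        # A bug: keep the traceback for developers, show users something generic.
        log.exception("unhandled error")
        print("internal error", file=sys.stderr)
        return 2


# A real entry point would end with: sys.exit(main(sys.argv[1:]))
for args in (["21"], ["x"]):
    print("exit status:", main(args))
```

The demo calls main() directly so that both the success path and the AppError path are visible without ending the process.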

Logging, retries, observability

Use logging with named loggers (log = logging.getLogger(__name__)). Call log.exception inside except to include the traceback. Configure handlers once at startup; prefer JSON output in production for easy ingestion.
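Below is a minimal JSON-lines setup that uses only the stdlib. The JsonFormatter class and its field names are assumptions, and many teams use a dedicated library instead. configure_logging() is meant to run once at startup.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (field names are a choice)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # formatException turns (type, value, traceback) into text
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(stream=None, level=logging.INFO) -> None:
    """Run once at startup; other modules only call logging.getLogger(__name__)."""
    handler = logging.StreamHandler(stream)  # None means sys.stderr
    handler.setFormatter(JsonFormatter())
    # force=True (Python 3.8+) replaces any handlers configured earlier
    logging.basicConfig(level=level, handlers=[handler], force=True)


configure_logging()
log = logging.getLogger("app.orders")
try:
    1 / 0
except ZeroDivisionError:
    log.exception("order total failed")
```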

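Retries belong behind a helper (such as retry(fn, attempts=..., base_delay=...)) that uses exponential backoff with jitter and distinguishes retryable from fatal errors. The retry() in the code example below omits jitter to keep its output predictable. Error tracking and metrics (Sentry, OpenTelemetry, custom counters) complete the picture by showing error rates over time.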
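Here is a sketch of such a helper. It assumes TransientError marks retryable failures and uses a Counter as a stand-in for real metrics. The jitter strategy is "full jitter", a random sleep between zero and the exponential cap, which is one common choice among several. The sleep parameter exists only so callers and tests can skip real delays.

```python
import random
import time
from collections import Counter

metrics = Counter()  # stand-in for real error-rate metrics (assumption)


class TransientError(Exception):
    """Assumed marker for failures that may succeed if tried again."""


def retry(fn, *, attempts=4, base_delay=0.1, max_delay=2.0,
          retryable=(TransientError,), sleep=time.sleep):
    """Call fn(); retry only `retryable` errors with capped, jittered backoff.

    Any other exception is treated as fatal and propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retryable:
            metrics["retryable_failures"] += 1
            if attempt == attempts:
                metrics["gave_up"] += 1
                raise  # ceiling reached: report the outage instead of hiding it
            # Full jitter: sleep a random time in [0, cap], where cap doubles per attempt.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


# Usage: a call that fails twice, then succeeds (no real sleeping in the demo).
outcomes = iter([TransientError("timeout"), TransientError("timeout"), "ok"])


def flaky_call():
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item


print(retry(flaky_call, sleep=lambda seconds: None))  # ok
print(dict(metrics))  # {'retryable_failures': 2}
```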

Error-handling tools.

Tool	Purpose
`try` statement (language feature)	try/except/else/finally clauses.
`raise X from err` syntax	Chain an exception with its cause.
`logging` module	Leveled, hierarchical logging.
`log.exception` method	Log a message at ERROR level with the current traceback.
`traceback` module	Programmatic traceback formatting.
`contextlib.suppress` class	Silently swallow specific exceptions.
`warnings` module	Non-fatal notices.
`Sentry` service	Centralized error tracking (third-party).
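This small illustrative sketch shows two of the less familiar tools from the table in action (risky and old_api are hypothetical names). traceback renders an exception as text, and warnings issues a non-fatal notice that tests can capture.

```python
import traceback
import warnings


def risky():
    return 1 / 0


try:
    risky()
except ZeroDivisionError as exc:
    # traceback renders the exception as text, e.g. for an error report
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
print(text.splitlines()[-1])  # ZeroDivisionError: division by zero


def old_api():
    # A non-fatal notice: callers keep working but learn they should migrate.
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record the warning even if it was seen before
    result = old_api()
print(result, caught[0].category.__name__, caught[0].message)
```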

Handling Errors in Large Applications code example

The script defines a small exception hierarchy, a retry helper, and a boundary handler that translates errors.

# Lesson: Handling Errors in Large Applications
import logging
import random
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s %(message)s")
log = logging.getLogger("app")


class AppError(Exception):
    """Base class for everything our app raises."""


class NotFoundError(AppError):
    pass


class TransientError(AppError):
    pass


def retry(fn, *, attempts=3, base_delay=0.01):
    for i in range(1, attempts + 1):
        try:
            return fn()
        except TransientError as err:
            if i == attempts:
                raise
            delay = base_delay * (2 ** (i - 1))
            log.warning("attempt %d failed: %s (retry in %.3fs)", i, err, delay)
            time.sleep(delay)


calls = {"n": 0}

def load_user(uid: int) -> dict:
    calls["n"] += 1
    if uid < 0:
        raise NotFoundError(f"uid {uid} does not exist")
    if calls["n"] < 2 and random.random() < 0.5:
        raise TransientError("temporary glitch")
    return {"id": uid, "name": "ana"}


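def handle(uid: int) -> str:
    """Boundary: translate every failure into user-facing text."""
    try:
        user = retry(lambda: load_user(uid))
        return f"ok: {user}"
    except NotFoundError as err:
        return f"not found: {err}"
    except AppError:
        log.exception("unexpected app error")
        return "internal error"
    except Exception:
        # Bugs outside the hierarchy: log the traceback, still hide internals.
        log.exception("unhandled error")
        return "internal error"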


random.seed(1)
print(handle(7))
print(handle(-1))

# Chained exceptions preserve the cause for debuggers
try:
    try:
        int("notanumber")
    except ValueError as err:
        raise AppError("bad input") from err
except AppError as err:
    print("top  :", err)
    print("cause:", err.__cause__)

Study the layers:

1) The custom hierarchy (`AppError` → `NotFoundError`/`TransientError`) lets each failure category be handled separately.
2) `retry()` only retries the transient category, not every exception.
3) The boundary handler `handle()` returns user-friendly text; internals never leak.
4) `raise AppError("bad input") from err` preserves the original cause for tracebacks.

Use contextlib.suppress for an optional operation.

import logging
from contextlib import suppress
from pathlib import Path

log = logging.getLogger(__name__)


def remove_tmp(path: Path) -> None:
    # A missing file is fine here; other errors (e.g. PermissionError) still propagate.
    with suppress(FileNotFoundError):
        path.unlink()
    log.info("cleanup done")


remove_tmp(Path("does/not/exist"))
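Two details of suppress are easy to miss. Once a listed exception fires, the rest of the with block is skipped, and exceptions that are not listed still propagate. The snippet below checks both. For deleting a file specifically, Path.unlink(missing_ok=True) (Python 3.8+) does the same job without a context manager.

```python
from contextlib import suppress

steps = []
with suppress(KeyError):
    steps.append("before")
    empty = {}
    empty["missing"]  # raises KeyError: suppressed, and the block ends here
    steps.append("after")  # never runs
print(steps)  # ['before']

escaped = None
try:
    with suppress(KeyError):
        int("x")  # ValueError is not listed, so it propagates
except ValueError as err:
    escaped = err
print("not suppressed:", escaped)
```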

Chained-exception facts.

class A(Exception): pass
class B(Exception): pass
try:
    try:
        raise A("x")
    except A as e:
        raise B("y") from e
except B as outer:
    assert isinstance(outer.__cause__, A)
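For contrast, raising inside an except block without from still links the two exceptions. The link is implicit, through __context__ rather than __cause__, and the traceback banner changes accordingly:

```python
import traceback


class A(Exception): pass
class B(Exception): pass


try:
    try:
        raise A("x")
    except A:
        raise B("y")  # no "from": A is recorded implicitly as __context__
except B as outer:
    assert outer.__cause__ is None
    assert isinstance(outer.__context__, A)
    assert outer.__suppress_context__ is False
    text = "".join(traceback.format_exception(type(outer), outer, outer.__traceback__))
print("During handling of the above exception" in text)  # True
```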

Running the main script prints something like this (the WARNING line is written by logging to stderr):

WARNING app attempt 1 failed: temporary glitch (retry in 0.010s)
ok: {'id': 7, 'name': 'ana'}
not found: uid -1 does not exist
top  : bad input
cause: invalid literal for int() with base 10: 'notanumber'
