Beyond the basics of open and read, real programs need to handle large files, binary files, files that must be replaced atomically, and files you iterate line-by-line over a network. Python's file API covers all of these, once you know which tools to combine. This lesson walks through the most useful advanced patterns.
For large files, avoid f.read(), which loads the entire file into memory at once. Iterate line by line instead: for line in f:. Lines are read lazily, so memory stays flat even on multi-gigabyte files. For binary streams with fixed record sizes, the equivalent is f.read(n) in a loop. Combine either pattern with generators to build pipelines that never materialize the whole file.
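A minimal sketch of the binary equivalent: a chunked-read generator. The helper name read_chunks is hypothetical, not part of the standard library.

```python
import os
import tempfile

# Sketch: yield fixed-size chunks until EOF; memory use stays at one chunk.
# read_chunks is a hypothetical helper, not a stdlib function.
def read_chunks(path, size=64 * 1024):
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk

# Usage: total the bytes of a 200 kB file without loading it whole.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"x" * 200_000)
tmp.close()
total = sum(len(c) for c in read_chunks(tmp.name))
print(total)  # 200000
os.unlink(tmp.name)
```

Because the generator holds only one chunk at a time, the same pipeline works identically on a 200 kB file or a 200 GB one.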
For atomic replacement (“write the new file or leave the old one untouched”), write to a temporary file next to the target, then call os.replace(tmp, target). os.replace is atomic on POSIX systems and overwrites the destination in a single operation on Windows. Never open the real file for writing until the new content is fully produced; otherwise a crash mid-write leaves a corrupt half-file.
For structured binary formats (images, archives, floats, network packets), struct packs and unpacks fixed-layout records (struct.pack("<If", 42, 3.14)). memoryview lets you slice a large bytes object without copying. mmap exposes a file as if it were a bytearray in memory, which can be far faster than read/seek for random access on large files.
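A sketch of struct and memoryview working together: decoding consecutive fixed-layout records from one buffer with struct.unpack_from, which reads at an offset without slicing out copies. The record layout here is made up for the demo.

```python
import struct

# Sketch: three packed <unsigned int, float32> records in one bytes object.
payload = b"".join(struct.pack("<If", i, i * 0.5) for i in range(3))
view = memoryview(payload)          # zero-copy view of the buffer
rec_size = struct.calcsize("<If")   # 8 bytes per record
for off in range(0, len(view), rec_size):
    n, x = struct.unpack_from("<If", view, off)
    print(n, x)
# 0 0.0
# 1 0.5
# 2 1.0
```

On a real file you would pair this with mmap, which also satisfies the buffer protocol, so unpack_from can walk a file on disk the same way.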
Streaming and binary
Line streaming: for line in f: yields one line at a time. Combine with generator expressions for transformations: (l.strip() for l in f if not l.startswith("#")).
Binary: always open with "rb"/"wb". Read fixed records with f.read(N). Use struct to decode well-defined layouts; use int.from_bytes for simple integers.
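A small sketch of int.from_bytes on a toy length-prefixed record. The 4-byte big-endian length header is a made-up layout for illustration, not a standard format.

```python
import io

# Sketch: [4-byte big-endian length][body] — a fabricated record layout.
stream = io.BytesIO(len(b"hello").to_bytes(4, "big") + b"hello")
length = int.from_bytes(stream.read(4), "big")  # decode the header
body = stream.read(length)                      # read exactly that many bytes
print(length, body.decode())  # 5 hello
```

io.BytesIO stands in for a real file opened with "rb"; the reads behave identically.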
Safe writes and mmap
Atomic write (with a pathlib.Path target): tmp = target.with_suffix(target.suffix + ".tmp"); tmp.write_text(content); os.replace(tmp, target). Readers never see a half-written file.
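For durability as well as atomicity, the write can be fsynced before the rename so the bytes are on disk when the switch happens. A minimal sketch, assuming a POSIX-style filesystem; atomic_write is a hypothetical helper name.

```python
import os
import tempfile

# Sketch: flush + fsync before os.replace so a crash cannot leave the
# target pointing at an unwritten temp file. atomic_write is hypothetical.
def atomic_write(target, text):
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target) or ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())   # force bytes to disk before the rename
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)        # clean up the temp file on any failure
        raise

target = os.path.join(tempfile.mkdtemp(), "config.ini")
atomic_write(target, "new=2\n")
print(open(target, encoding="utf-8").read().strip())  # new=2
```

tempfile.mkstemp creates the temp file in the target's directory so the final os.replace is a same-filesystem rename.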
mmap: with open(path, "rb") as f: mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ). Search, slice, or iterate mm like bytes with near-instant seeks.
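A concrete sketch of that pattern: finding a byte pattern in a file through mmap without reading the file into memory. The file contents are fabricated for the demo.

```python
import mmap
import os
import tempfile

# Sketch: build a file with a known pattern buried 10 kB in.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"header\n" + b"x" * 10_000 + b"NEEDLE" + b"y" * 10_000)
tmp.close()

with open(tmp.name, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(b"NEEDLE")   # byte offset, same semantics as bytes.find
print(pos)  # 10007
os.unlink(tmp.name)
```

The OS pages in only the parts of the file the search touches, so this scales to files far larger than RAM.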
Advanced file tools
| Tool | Kind | Purpose |
|---|---|---|
| `open(p, mode)` | built-in | Text or binary file object. |
| `Path.read_bytes` | method | Read a whole binary file. |
| `os.replace(src, dst)` | function | Atomic rename that overwrites dst. |
| `struct.pack`/`unpack` | module | Fixed-layout binary records. |
| `mmap` | module | Memory-mapped files. |
| `io.BytesIO` / `io.StringIO` | class | In-memory file-like objects. |
| `shutil.copyfileobj` | function | Stream bytes between files. |
| `gzip.open` | function | Transparent compressed read/write. |
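Of the tools above, gzip.open deserves a quick sketch: it returns a file-like object, so the same line-streaming pattern from earlier works on compressed files. The path here is a throwaway temp file.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt.gz")

# Write text lines through transparent gzip compression.
with gzip.open(path, "wt", encoding="utf-8") as f:
    for i in range(3):
        f.write(f"event {i}\n")

# Read back with the ordinary line-streaming pattern.
with gzip.open(path, "rt", encoding="utf-8") as f:
    n_lines = sum(1 for _ in f)
print(n_lines)  # 3
```

Note the "wt"/"rt" modes: gzip.open defaults to binary, so text mode (and an encoding) must be requested explicitly.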
Performing Advanced File Tasks code example
The script streams a big text file line-by-line, does an atomic replace, reads a binary record with struct, and copies a file in chunks with shutil.copyfileobj.
```python
# Lesson: Performing Advanced File Tasks
import os
import shutil
import struct
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    root = Path(tmp)

    # 1) Build a large-ish file
    big = root / "big.txt"
    with open(big, "w", encoding="utf-8") as f:
        for i in range(1, 1001):
            f.write(f"line {i}\n")

    # 2) Stream-count lines that contain a '5'
    with open(big, "r", encoding="utf-8") as f:
        total = sum(1 for line in f if "5" in line)
    print("lines containing '5':", total)

    # 3) Atomic replace
    target = root / "config.ini"
    target.write_text("old=1\n", encoding="utf-8")
    tmp_target = target.with_suffix(".ini.tmp")
    tmp_target.write_text("new=2\n", encoding="utf-8")
    os.replace(tmp_target, target)
    print("config now:", target.read_text(encoding="utf-8").strip())

    # 4) Binary record with struct
    # Format: < little-endian, I unsigned int, f float, 4s 4-byte string
    rec = root / "rec.bin"
    payload = struct.pack("<If4s", 42, 3.14, b"ABCD")
    rec.write_bytes(payload)
    data = rec.read_bytes()
    idx, val, tag = struct.unpack("<If4s", data)
    print(f"record idx={idx} val={val:.2f} tag={tag.decode()}")
    print("record size:", struct.calcsize("<If4s"), "bytes")

    # 5) Copy the file efficiently with copyfileobj
    copy = root / "big.copy.txt"
    with open(big, "rb") as src, open(copy, "wb") as dst:
        shutil.copyfileobj(src, dst, length=64 * 1024)
    print("copy size:", copy.stat().st_size, "bytes")
```
Trace each block:
1) Streaming `for line in f:` scales to huge files.
2) `os.replace(tmp, target)` makes the switch atomic.
3) `struct.pack`/`unpack` encodes a fixed binary record.
4) `shutil.copyfileobj` copies a stream in chunks without loading it all.
Strip blank lines from a file without materializing it.
```python
from pathlib import Path
from tempfile import NamedTemporaryFile

src = NamedTemporaryFile("w", delete=False, encoding="utf-8")
src.write("a\n\nb\n\n\nc\n")
src.close()

dst = Path(src.name + ".clean")
with open(src.name, encoding="utf-8") as i, open(dst, "w", encoding="utf-8") as o:
    for line in i:
        if line.strip():   # skip blank and whitespace-only lines
            o.write(line)
print(dst.read_text(encoding="utf-8"))
```
Binary round-trip for struct.
```python
import struct

fmt = "<Iif"   # little-endian: unsigned int, int, float32
raw = struct.pack(fmt, 10, -3, 1.5)
a, b, c = struct.unpack(fmt, raw)
assert a == 10 and b == -3 and abs(c - 1.5) < 1e-6
```
Running prints:

```
lines containing '5': 271
config now: new=2
record idx=42 val=3.14 tag=ABCD
record size: 12 bytes
copy size: 8893 bytes
```