Beyond the basics of open and read, real programs need to handle large files, binary files, files that must be replaced atomically, and files you iterate line-by-line over a network. Python's file API covers all of these, once you know which tools to combine. This lesson walks through the most useful advanced patterns.
For large files, avoid f.read(), which loads the entire file into memory at once. Iterate line by line instead: for line in f:. Lines are read lazily, so memory stays flat even on multi-gigabyte files. For binary streams with fixed record sizes, the equivalent is f.read(n) in a loop. Combine either pattern with generators to build pipelines that never materialize the whole file.
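A minimal sketch of the binary equivalent: a chunked-read generator. The helper name read_chunks is hypothetical, not part of the standard library.

```python
import os
import tempfile

# Sketch: yield fixed-size chunks until EOF; memory use stays at one chunk.
# read_chunks is a hypothetical helper, not a stdlib function.
def read_chunks(path, size=64 * 1024):
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk

# Usage: total the bytes of a 200 kB file without loading it whole.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"x" * 200_000)
tmp.close()
total = sum(len(c) for c in read_chunks(tmp.name))
print(total)  # 200000
os.unlink(tmp.name)
```

Because the generator holds only one chunk at a time, the same pipeline works identically on a 200 kB file or a 200 GB one.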
For atomic replacement (“write the new file or leave the old one untouched”), write to a temporary file next to the target, then call os.replace(tmp, target). os.replace is atomic on POSIX systems and overwrites the destination in a single operation on Windows. Never open the real file for writing until the new content is fully produced; otherwise a crash mid-write leaves a corrupt half-file.
For structured binary formats (images, archives, floats, network packets), struct packs and unpacks fixed-layout records (struct.pack("<If", 42, 3.14)). memoryview lets you slice a large bytes object without copying. mmap exposes a file as if it were a bytearray in memory, which can be far faster than read/seek for random access on large files.
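A sketch of struct and memoryview working together: decoding consecutive fixed-layout records from one buffer with struct.unpack_from, which reads at an offset without slicing out copies. The record layout here is made up for the demo.

```python
import struct

# Sketch: three packed <unsigned int, float32> records in one bytes object.
payload = b"".join(struct.pack("<If", i, i * 0.5) for i in range(3))
view = memoryview(payload)          # zero-copy view of the buffer
rec_size = struct.calcsize("<If")   # 8 bytes per record
for off in range(0, len(view), rec_size):
    n, x = struct.unpack_from("<If", view, off)
    print(n, x)
# 0 0.0
# 1 0.5
# 2 1.0
```

On a real file you would pair this with mmap, which also satisfies the buffer protocol, so unpack_from can walk a file on disk the same way.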
Streaming and binary
Line streaming: for line in f: yields one line at a time. Combine with generator expressions for transformations: (l.strip() for l in f if not l.startswith("#")).
Binary: always open with "rb"/"wb". Read fixed records with f.read(N). Use struct to decode well-defined layouts; use int.from_bytes for simple integers.
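A small sketch of int.from_bytes on a toy length-prefixed record. The 4-byte big-endian length header is a made-up layout for illustration, not a standard format.

```python
import io

# Sketch: [4-byte big-endian length][body] — a fabricated record layout.
stream = io.BytesIO(len(b"hello").to_bytes(4, "big") + b"hello")
length = int.from_bytes(stream.read(4), "big")  # decode the header
body = stream.read(length)                      # read exactly that many bytes
print(length, body.decode())  # 5 hello
```

io.BytesIO stands in for a real file opened with "rb"; the reads behave identically.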
Safe writes and mmap
Atomic write (with a pathlib.Path target): tmp = target.with_suffix(target.suffix + ".tmp"); tmp.write_text(content); os.replace(tmp, target). Readers never see a half-written file.
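For durability as well as atomicity, the write can be fsynced before the rename so the bytes are on disk when the switch happens. A minimal sketch, assuming a POSIX-style filesystem; atomic_write is a hypothetical helper name.

```python
import os
import tempfile

# Sketch: flush + fsync before os.replace so a crash cannot leave the
# target pointing at an unwritten temp file. atomic_write is hypothetical.
def atomic_write(target, text):
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target) or ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())   # force bytes to disk before the rename
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)        # clean up the temp file on any failure
        raise

target = os.path.join(tempfile.mkdtemp(), "config.ini")
atomic_write(target, "new=2\n")
print(open(target, encoding="utf-8").read().strip())  # new=2
```

tempfile.mkstemp creates the temp file in the target's directory so the final os.replace is a same-filesystem rename.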
mmap: with open(path, "rb") as f: mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ). Search, slice, or iterate mm like bytes with near-instant seeks.
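A concrete sketch of that pattern: finding a byte pattern in a file through mmap without reading the file into memory. The file contents are fabricated for the demo.

```python
import mmap
import os
import tempfile

# Sketch: build a file with a known pattern buried 10 kB in.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"header\n" + b"x" * 10_000 + b"NEEDLE" + b"y" * 10_000)
tmp.close()

with open(tmp.name, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(b"NEEDLE")   # byte offset, same semantics as bytes.find
print(pos)  # 10007
os.unlink(tmp.name)
```

The OS pages in only the parts of the file the search touches, so this scales to files far larger than RAM.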
Advanced file tools
| Tool | Kind | Purpose |
|---|---|---|
| `open(p, mode)` | built-in | Text or binary file object. |
| `Path.read_bytes` | method | Read a whole binary file. |
| `os.replace(src, dst)` | function | Atomic rename that overwrites dst. |
| `struct.pack`/`unpack` | module | Fixed-layout binary records. |
| `mmap` | module | Memory-mapped files. |
| `io.BytesIO` / `io.StringIO` | class | In-memory file-like objects. |
| `shutil.copyfileobj` | function | Stream bytes between files. |
| `gzip.open` | function | Transparent compressed read/write. |
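Of the tools above, gzip.open deserves a quick sketch: it returns a file-like object, so the same line-streaming pattern from earlier works on compressed files. The path here is a throwaway temp file.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt.gz")

# Write text lines through transparent gzip compression.
with gzip.open(path, "wt", encoding="utf-8") as f:
    for i in range(3):
        f.write(f"event {i}\n")

# Read back with the ordinary line-streaming pattern.
with gzip.open(path, "rt", encoding="utf-8") as f:
    n_lines = sum(1 for _ in f)
print(n_lines)  # 3
```

Note the "wt"/"rt" modes: gzip.open defaults to binary, so text mode (and an encoding) must be requested explicitly.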
Performing Advanced File Tasks code example
The script streams a big text file line-by-line, does an atomic replace, reads a binary record with struct, and copies a file in chunks with shutil.copyfileobj.
```python
# Lesson: Performing Advanced File Tasks
import os
import shutil
import struct
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    root = Path(tmp)

    # 1) Build a large-ish file
    big = root / "big.txt"
    with open(big, "w", encoding="utf-8") as f:
        for i in range(1, 1001):
            f.write(f"line {i}\n")

    # 2) Stream-count lines that contain a '5'
    with open(big, "r", encoding="utf-8") as f:
        total = sum(1 for line in f if "5" in line)
    print("lines containing '5':", total)

    # 3) Atomic replace
    target = root / "config.ini"
    target.write_text("old=1\n", encoding="utf-8")
    tmp_target = target.with_suffix(".ini.tmp")
    tmp_target.write_text("new=2\n", encoding="utf-8")
    os.replace(tmp_target, target)
    print("config now:", target.read_text(encoding="utf-8").strip())

    # 4) Binary record with struct
    # Format: < little-endian, I unsigned int, f float, 4s 4-byte string
    rec = root / "rec.bin"
    payload = struct.pack("<If4s", 42, 3.14, b"ABCD")
    rec.write_bytes(payload)
    data = rec.read_bytes()
    idx, val, tag = struct.unpack("<If4s", data)
    print(f"record idx={idx} val={val:.2f} tag={tag.decode()}")
    print("record size:", struct.calcsize("<If4s"), "bytes")

    # 5) Copy the file efficiently with copyfileobj
    copy = root / "big.copy.txt"
    with open(big, "rb") as src, open(copy, "wb") as dst:
        shutil.copyfileobj(src, dst, length=64 * 1024)
    print("copy size:", copy.stat().st_size, "bytes")
```
Trace each block:
1) Streaming `for line in f:` scales to huge files.
2) `os.replace(tmp, target)` makes the switch atomic.
3) `struct.pack`/`unpack` encodes a fixed binary record.
4) `shutil.copyfileobj` copies a stream in chunks without loading it all.
Strip blank lines from a file without materializing it.
```python
from pathlib import Path
from tempfile import NamedTemporaryFile

src = NamedTemporaryFile("w", delete=False, encoding="utf-8")
src.write("a\n\nb\n\n\nc\n")
src.close()

dst = Path(src.name + ".clean")
with open(src.name, encoding="utf-8") as i, open(dst, "w", encoding="utf-8") as o:
    for line in i:
        if line.strip():   # skip blank and whitespace-only lines
            o.write(line)
print(dst.read_text(encoding="utf-8"))
```
Binary round-trip for struct.
```python
import struct

fmt = "<Iif"   # little-endian: unsigned int, int, float32
raw = struct.pack(fmt, 10, -3, 1.5)
a, b, c = struct.unpack(fmt, raw)
assert a == 10 and b == -3 and abs(c - 1.5) < 1e-6
```
Running prints:

```
lines containing '5': 271
config now: new=2
record idx=42 val=3.14 tag=ABCD
record size: 12 bytes
copy size: 8893 bytes
```