Python Read Files

Tutorial 35 of 65 · pythondeck.com Python course

To read text efficiently, iterate the file object line by line instead of loading the whole file. Use read() for the full content, readlines() for a list of lines (newlines included), or call splitlines() on the string returned by read() to get the lines without their newline characters.

Reading is the first half of most I/O workflows: ingest configs, parse CSV, load models. Choose the API that matches the shape of your data: whole file, line by line, or fixed-size chunks. A text decoding error (UnicodeDecodeError) usually means the encoding assumption was wrong, not that the file is "corrupt".
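As a small sketch of the decoding point above: the same bytes fail under one encoding and decode cleanly under another. The sample word and the choice of Latin-1 are assumptions made purely for illustration.

```python
# Sketch: a UnicodeDecodeError points to a wrong encoding guess, not a damaged file.
data = "café".encode("latin-1")  # bytes as a Latin-1 program would write them (assumed)

try:
    data.decode("utf-8")  # wrong assumption: a lone 0xE9 byte is not valid UTF-8
    decoded_as_utf8 = True
except UnicodeDecodeError as exc:
    decoded_as_utf8 = False
    print("utf-8 failed:", exc.reason)

text = data.decode("latin-1")  # the matching encoding recovers the text
print(text)
```

The bytes never changed between the two attempts; only the decoding assumption did.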

For structured formats, use specialised modules (json, csv, configparser) instead of hand-parsing lines when possible.
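A minimal sketch of that advice, using in-memory stand-ins for real files. The JSON and INI contents, and the names settings and config, are made up for illustration.

```python
# Sketch: let the standard library parse structured text instead of splitting lines by hand.
import configparser
import io
import json

# In-memory stand-ins for real files (contents are invented for this example).
json_file = io.StringIO('{"name": "Ada", "scores": [99, 97]}')
ini_file = io.StringIO("[server]\nhost = localhost\nport = 8080\n")

settings = json.load(json_file)  # parses the whole document into Python objects
print(settings["name"], sum(settings["scores"]))

config = configparser.ConfigParser()
config.read_file(ini_file)  # accepts any iterable of lines, including file objects
port = config.getint("server", "port")  # typed accessor converts "8080" to an int
print(config["server"]["host"], port)
```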

read(), readline(), and iteration with for line in f.

Path.read_text / read_bytes for small files in one call.

Chunked reads in binary mode: while chunk := f.read(8192):.

errors="replace" or "ignore" when lossy recovery is acceptable.

CSV: csv.reader / DictReader; JSON: json.load.

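Existence checks: Path.exists() or Path.is_file() before open. Still handle FileNotFoundError, because the file can disappear between the check and the open.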
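Several of the items above can be sketched together on a temporary file. The file name sample.txt and the deliberately non-UTF-8 contents are assumptions for the demonstration; hashing is just one plausible use of a chunked read.

```python
# Sketch: one-call reads, lossy decoding, and chunked binary reads on a temporary file.
import hashlib
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"  # hypothetical file name
    path.write_bytes("caf\u00e9\n".encode("latin-1") * 3)  # deliberately not UTF-8

    if path.is_file():  # existence check before reading
        raw = path.read_bytes()  # whole file as bytes in one call
        # errors="replace" turns each undecodable byte into U+FFFD instead of raising.
        lossy = path.read_text(encoding="utf-8", errors="replace")

    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(8192):  # loop ends when read() returns b""
            digest.update(chunk)

print(len(raw), repr(lossy[:5]))
print(digest.hexdigest() == hashlib.sha256(raw).hexdigest())
```

The chunked loop produces the same digest as hashing the full contents, without ever holding more than one chunk in memory at a time.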

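Iterating over a file yields lines with their trailing newline intact unless you strip it. .strip() removes all leading and trailing whitespace, which can break fixed-width formats; rstrip("\n") removes only the newline. Use splitlines() on strings when the data is already in memory.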
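The difference is easiest to see side by side. This is a small sketch on a made-up fixed-width sample; io.StringIO stands in for a real file.

```python
# Sketch: how iteration, rstrip, strip, and splitlines treat line endings and padding.
import io

text = "  id  name\n  01  Ada \n"  # invented fixed-width sample
f = io.StringIO(text)  # stand-in for an open text file

raw_lines = list(f)  # iteration keeps the trailing "\n"
newline_only = [line.rstrip("\n") for line in raw_lines]  # column padding survives
stripped = [line.strip() for line in raw_lines]  # padding on both sides is lost

in_memory = text.splitlines()  # no newlines, no empty last element

print(raw_lines)
print(newline_only)
print(stripped)
print(in_memory == newline_only)
```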

Network filesystems and cloud mounts may raise transient errors—retry with backoff for batch jobs. Local SSD reads are usually fast enough that micro-optimising buffer size rarely matters before profiling.
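One way to sketch retry with backoff is shown here. read_with_retry is a hypothetical helper, and the attempt count, the delays, and the decision to treat FileNotFoundError as permanent are all assumptions to adjust for your environment.

```python
# Sketch: retry a read on transient OSError with exponential backoff.
import tempfile
import time
from pathlib import Path


def read_with_retry(path, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Return the file's text, retrying on OSError (hypothetical helper)."""
    for attempt in range(attempts):
        try:
            with open(path, encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            raise  # treated as permanent here; fail immediately
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...


# Demo on a local temporary file, which succeeds on the first attempt.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "job.txt"  # hypothetical file name
    sample.write_text("ok\n", encoding="utf-8")
    result = read_with_retry(sample)
print(repr(result))
```

Passing sleep as a parameter keeps the helper easy to test, since a test can record the delays instead of waiting for them.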

When reading user-provided paths, validate against directory traversal (resolve paths, ensure result stays under an allowed base).
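A minimal sketch of that check follows. safe_resolve is a hypothetical helper, and Path.is_relative_to requires Python 3.9 or later. Real applications may need stricter rules than this.

```python
# Sketch: reject user-supplied paths that resolve outside an allowed base directory.
import tempfile
from pathlib import Path


def safe_resolve(base, user_path):
    """Return the resolved path if it stays under base, else raise ValueError."""
    base = Path(base).resolve()
    candidate = (base / user_path).resolve()  # collapses ".." and follows symlinks
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    inside = safe_resolve(tmp, "reports/q1.txt")  # stays under the base
    try:
        safe_resolve(tmp, "../../etc/passwd")  # climbs out of the base
        escaped = True
    except ValueError:
        escaped = False

print(inside.name, escaped)
```

Because resolve() follows symlinks before the comparison, a link inside the base that points elsewhere is rejected as well.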

Common mistakes:

Using read() on multi-gigabyte files without chunking.

Default locale encoding on Windows causing mojibake for UTF-8 files.

Parsing CSV with split(",") instead of the csv module.

Not handling FileNotFoundError with a clear message.

Following symlinks outside intended directories when paths come from users.
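The CSV pitfall in this list can be seen in a few lines. The sample row is invented; the point is only that a quoted field may contain a comma.

```python
# Sketch: why split(",") breaks on quoted fields and csv.reader does not.
import csv
import io

line = 'Ada,"Lovelace, Countess",99\n'  # made-up row with a comma inside quotes

naive = line.rstrip("\n").split(",")  # splits inside the quoted field
parsed = next(csv.reader(io.StringIO(line)))  # honours the quoting rules

print(naive)   # four pieces, quote characters left in place
print(parsed)  # three fields
```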

Best practices:

Iterate large text files line by line; use binary chunks for binary data.

Open with encoding="utf-8" and an explicit error-handling policy (the errors argument of open).

Use csv and json modules for structured formats.

Validate and normalise paths with Path.resolve() against a base directory.
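Two of these recommendations, combined with a clear message for a missing file, might look like the sketch below. count_matching is a hypothetical helper, and the file names and log contents are invented.

```python
# Sketch: stream a file with an explicit encoding and error policy, and fail clearly.
import tempfile
from pathlib import Path


def count_matching(path, needle, errors="strict"):
    """Count lines containing needle without loading the whole file."""
    try:
        with open(path, encoding="utf-8", errors=errors) as f:
            return sum(1 for line in f if needle in line)
    except FileNotFoundError:
        # Re-raise with a message that names the path the caller asked for.
        raise FileNotFoundError(f"Input file not found: {path}") from None


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "app.log"  # hypothetical file name
    log.write_text("INFO start\nERROR disk\nERROR net\n", encoding="utf-8")
    error_count = count_matching(log, "ERROR")
    try:
        count_matching(Path(tmp) / "missing.log", "ERROR")
        message = ""
    except FileNotFoundError as exc:
        message = str(exc)

print(error_count)
print(message)
```

Keeping errors="strict" as the default means a decoding problem is reported rather than silently papered over; callers opt in to "replace" or "ignore" when lossy recovery is acceptable.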

Re-read the examples below with these ideas in mind; change variable names and inputs to match your own project.

The program below scans a log file line by line and prints every line that contains "ERROR", prefixed with its line number. Run the code, then change names or values to see how the output shifts.

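# Example: Line by line
# Run in the REPL or save as a .py file and execute with python.
# Assumes big.log exists in the current directory and is UTF-8 encoded.
with open("big.log", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):  # number lines starting at 1
        if "ERROR" in line:
            print(i, line.rstrip())  # drop the trailing newline before printing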

This sample reads an entire file into memory with read() and reports its size. Paste it into the REPL or save it as a .py file before you continue to the next block.

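# Example: Read all
# Run in the REPL or save as a .py file and execute with python.
# Assumes notes.md exists in the current directory and is UTF-8 encoded.
with open("notes.md", encoding="utf-8") as f:
    text = f.read()  # entire contents as one string
# splitlines() counts a final line even if it has no trailing newline
print(len(text), "chars,", len(text.splitlines()), "lines")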

Here is a hands-on illustration of the csv module: DictReader yields each row as a dict keyed by the header row. Run the snippet and compare the result with what you expected.

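# Example: CSV
# Run in the REPL or save as a .py file and execute with python.
# Assumes data.csv exists and has a header row with "name" and "score" columns.
import csv
with open("data.csv", newline="", encoding="utf-8") as f:  # newline="" lets csv handle line endings
    for row in csv.DictReader(f):  # each row is a dict keyed by header
        print(row["name"], row["score"])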

The program below reads the same text file three ways: line by line, all at once, and only the first few characters. Read the comments on each line, run the code, then change names or values to see how the output shifts.

# open() returns a file object — always specify encoding for text
path = "data/notes.txt"  # sample path
with open(path, encoding="utf-8") as fh:  # context manager closes file
    for line in fh:  # iterate without loading whole file
        print(line.rstrip())  # strip trailing newline
with open(path, encoding="utf-8") as fh:  # read all at once
    text = fh.read()  # entire contents
print(len(text), text[:5])  # size + prefix
with open(path, encoding="utf-8") as fh:  # read only part of the file
    chunk = fh.read(5)  # first five characters
print(repr(chunk))  # quoted view

This sample reads CSV rows as dicts with csv.DictReader, then rewinds and reads the same data as plain lists with csv.reader. Paste it into the REPL or save it as a .py file before you continue to the next block.

# csv.DictReader maps columns to field names
import csv, io  # csv + in-memory file
buffer = io.StringIO("name,score\nAda,99\nGrace,97\n")  # fake CSV file
reader = csv.DictReader(buffer)  # row dicts keyed by header
for row in reader:  # each row is dict
    print(row["name"], row["score"])  # column access
buffer.seek(0)  # rewind for second pass
rows = list(csv.reader(buffer))  # list of lists
header, *body = rows  # unpack header vs data
print(header, body)  # show structure
