Reading Data from Files

Reading a file is where many programs start. The challenge is rarely the single call to `open()` but choosing the right shape for the data: one big string, a list of lines, or a stream of records. Each style has a distinct cost profile, and picking the wrong one is a frequent cause of memory spikes on large files.

For small files (a few MB) the read-all style is the simplest: `text = path.read_text(encoding="utf-8")`. Split it with `text.splitlines()` and process the list. It is fast, easy to reason about, and fine for configuration files, user-written text, and small exports.

For files that might be large or unbounded (logs, CSV exports, streamed data), prefer the streaming style: `with open(p, "r", encoding="utf-8") as f:` followed by `for line in f: ...`. The file object is an iterator that yields one line at a time, so memory usage stays constant regardless of file size. This is the default to reach for when in doubt.

For structured data, Python ships domain-specific readers: `csv` for spreadsheet-style tabular data, `json` for JSON, `configparser` for INI files, and `tomllib` (Python 3.11+) for TOML. Each wraps an open file and gives you back native Python objects (lists, dicts); note that `tomllib.load()` expects the file opened in binary mode. Using them is almost always preferable to parsing by hand.
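Neither `configparser` nor `tomllib` shows up in the script later in this lesson, so here is a minimal sketch of both. The `app.ini` file name, the INI contents, and the TOML string are invented for the demo:

import configparser
import tomllib  # standard library since Python 3.11
from pathlib import Path
from tempfile import gettempdir

# Write a throwaway INI file, then read it back.
ini_path = Path(gettempdir()) / "app.ini"
ini_path.write_text("[server]\nhost = localhost\nport = 8080\n", encoding="utf-8")

cp = configparser.ConfigParser()
cp.read(ini_path, encoding="utf-8")
print(cp["server"]["host"])          # localhost
print(cp.getint("server", "port"))   # 8080

# tomllib.load(f) wants a binary-mode file; tomllib.loads(s) parses a string.
cfg = tomllib.loads('title = "demo"\nport = 8080\n')
print(cfg)  # {'title': 'demo', 'port': 8080}

ini_path.unlink()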

Streaming vs read-all

`f.readline()` reads exactly one line (including the trailing newline). `f.readlines()` reads the whole file into a list of lines, at the same memory cost as `read()`. Iterating the file directly is the usual idiom: `for line in f:`, with `line.rstrip()` to drop the newline as you go.
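A quick sketch of the difference between the two calls, using a throwaway three-line file:

from pathlib import Path
from tempfile import gettempdir

p = Path(gettempdir()) / "three.txt"
p.write_text("a\nb\nc\n", encoding="utf-8")

with open(p, "r", encoding="utf-8") as f:
    first = f.readline()   # 'a\n': one line, trailing newline included
    rest = f.readlines()   # ['b\n', 'c\n']: everything left, as a list
print(repr(first), rest)

p.unlink()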

`f.read(size)` reads at most `size` bytes (or characters in text mode) and returns an empty string at end-of-file. It is handy for binary data or for parsing fixed-size records.
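For instance, a minimal fixed-size-record sketch in binary mode; the 4-byte record layout and file name are invented for the demo:

from pathlib import Path
from tempfile import gettempdir

p = Path(gettempdir()) / "records.bin"
p.write_bytes(b"AAAABBBBCCCC")   # three 4-byte records

with open(p, "rb") as f:         # binary mode: read() returns bytes
    while True:
        record = f.read(4)       # at most 4 bytes per call
        if not record:           # b"" signals end-of-file
            break
        print(record)            # b'AAAA', b'BBBB', b'CCCC'

p.unlink()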

Structured readers

`csv.reader(f)` yields one list per row, handling quoted fields and embedded commas. `csv.DictReader(f)` uses the first row as headers and yields a dict per row, which is much easier to read six months later. For JSON, `json.load(f)` reads a whole document; for large JSONL files (one object per line), iterate over the file and call `json.loads(line)` on each line.
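A minimal JSONL sketch, with an invented file name and two invented records:

import json
from pathlib import Path
from tempfile import gettempdir

p = Path(gettempdir()) / "events.jsonl"
p.write_text('{"id": 1}\n{"id": 2}\n', encoding="utf-8")

with open(p, "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():              # skip blank lines defensively
            event = json.loads(line)  # one JSON document per line
            print(event["id"])        # 1, then 2

p.unlink()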

Always open CSV and JSON files with `encoding="utf-8"` and, for CSV, `newline=""`. The explicit encoding avoids Windows codepage surprises, and `newline=""` lets the `csv` module handle line endings itself, preventing the classic extra-blank-row and mangled-quoted-field bugs.

The reading tools and patterns you will use constantly:

| Tool | Kind | Purpose |
| --- | --- | --- |
| `Path.read_text()` | method | Reads an entire small text file. |
| `file.readlines()` | method | Reads every line into a list. |
| `for line in f` | pattern | Streams lines one at a time. |
| `csv.DictReader` | class | Yields one dict per CSV row. |
| `json.load(f)` | function | Parses a JSON document from a file. |
| `configparser` | module | Reads INI-style configuration files. |
| `tomllib` | module (3.11+) | Reads TOML files, returning a dict. |
| `open(p, 'rb')` | built-in | Opens a file in binary mode for bytes. |

Reading Data from Files code example

The script below writes a tiny CSV and a tiny JSON file into a temp folder, then reads them back using the core patterns from the table.

# Lesson: Reading Data from Files
import csv
import json
from pathlib import Path
from tempfile import gettempdir

root = Path(gettempdir())
csv_path = root / "people.csv"
json_path = root / "meta.json"

csv_path.write_text("name,age\nana,30\nben,25\n", encoding="utf-8")
json_path.write_text('{"version": 1, "active": true}', encoding="utf-8")

# Read-all
print("all text:", csv_path.read_text(encoding="utf-8").splitlines())

# Streaming
with open(csv_path, "r", encoding="utf-8", newline="") as f:
    for i, line in enumerate(f, start=1):
        print(f"  raw L{i}: {line.rstrip()}")

# CSV DictReader
with open(csv_path, "r", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f):
        print(f"  dict: {row}")

# JSON whole-file
with open(json_path, "r", encoding="utf-8") as f:
    meta = json.load(f)
print("meta:  ", meta)

# Partial reads via read(n)
with open(csv_path, "r", encoding="utf-8") as f:
    head = f.read(10)
    tail = f.read()
print("head:", repr(head))
print("tail:", repr(tail))

csv_path.unlink()
json_path.unlink()

Compare each reading style against the others:

1) `read_text()` loads everything in memory; great for small files.
2) Streaming with `for line in f` keeps memory constant.
3) `csv.DictReader` turns rows into dicts using the header row.
4) `json.load(f)` reads a whole document; use `json.loads` for line-by-line JSONL.

Practice reading a file in two shapes.

from pathlib import Path
from tempfile import gettempdir

p = Path(gettempdir()) / "nums.txt"
p.write_text("1\n2\n3\n4\n", encoding="utf-8")

# Example A: sum of integers from a file, streaming
with open(p, "r", encoding="utf-8") as f:
    total = sum(int(line) for line in f if line.strip())
print("total:", total)

# Example B: read into a list (small file only)
nums = [int(x) for x in p.read_text(encoding="utf-8").split()]
print("nums: ", nums)

p.unlink()

Assertions you can run without touching the disk.

import json
assert json.loads('[1,2,3]') == [1, 2, 3]
assert "a,b,c".split(",") == ["a", "b", "c"]
assert "\n".join(["a", "b"]).splitlines() == ["a", "b"]
assert [int(x) for x in "1 2 3".split()] == [1, 2, 3]

The script prints roughly:

all text: ['name,age', 'ana,30', 'ben,25']
  raw L1: name,age
  raw L2: ana,30
  raw L3: ben,25
  dict: {'name': 'ana', 'age': '30'}
  dict: {'name': 'ben', 'age': '25'}
meta:   {'version': 1, 'active': True}
head: 'name,age\na'
tail: 'na,30\nben,25\n'