Reading a file is where many programs start. The challenge is rarely the single call to `open()` but choosing the right shape for the data: one big string, a list of lines, or a stream of records. Each style has a specific cost profile, and picking the wrong one is a common cause of memory spikes on large files.
For small files (a few MB) the read-all style is the simplest: `text = path.read_text(encoding="utf-8")`. Split it with `text.splitlines()` and process the list. It is fast, easy to reason about, and fine for configuration files, user-written text, and small exports.
For files that might be large or unbounded (logs, CSV exports, streamed data), prefer the streaming style: `with open(p, "r", encoding="utf-8") as f:` followed by `for line in f: ...`. The file object is an iterator that yields one line at a time, so memory usage stays constant regardless of file size. This is the default you should reach for when in doubt.
For structured data, Python ships domain-specific readers: csv for spreadsheets, json for JSON, configparser for INI, and tomllib (Python 3.11+) for TOML. Each one wraps a text file and gives you back native Python objects (lists, dicts). Using them is almost always preferable to parsing by hand.
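The script later in this lesson exercises the `csv` and `json` readers, but not `configparser` or `tomllib`, so here is a minimal sketch of those two; the section and key names (`server`, `host`, `port`) are invented for illustration:

```python
import configparser

# Parse an INI document; read_string avoids touching the disk.
cp = configparser.ConfigParser()
cp.read_string("[server]\nhost = localhost\nport = 8080\n")
print(cp["server"]["host"])          # INI values come back as strings
print(cp["server"].getint("port"))   # typed accessors do the conversion

# On Python 3.11+, TOML parses straight to native types:
# import tomllib
# with open("config.toml", "rb") as f:   # tomllib requires binary mode
#     cfg = tomllib.load(f)
```

Note the asymmetry: `configparser` hands back strings you convert yourself, while `tomllib` returns ints, bools, and lists directly.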
Streaming vs read-all
`f.readline()` reads exactly one line (including the trailing newline). `f.readlines()` reads the whole file into a list of lines, at the same memory cost as `read()`. Iterating the file directly is the usual idiom: `for line in f:`, calling `line.rstrip()` to drop the newline as you go.
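The difference is easy to see on an in-memory stream; `io.StringIO` stands in for a real file here:

```python
import io

buf = io.StringIO("a\nb\nc\n")

# readline() returns one line with the newline kept.
print(repr(buf.readline()))  # 'a\n'

# Iteration picks up where readline() left off.
for line in buf:
    print(line.rstrip())
```

Because `readline()` and iteration share the same file position, mixing them like this is safe: the loop simply consumes the remaining lines.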
`f.read(size)` reads at most `size` bytes (or characters in text mode) and returns an empty string at end-of-file. Handy for binary data or when you need to parse fixed-size records.
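For fixed-size binary records, `read(size)` pairs naturally with the `struct` module. A sketch, using an invented record layout (one unsigned short plus one float) and `io.BytesIO` in place of a real binary file:

```python
import io
import struct

REC = struct.Struct("<Hf")  # little-endian: unsigned short + float

# Build two fake records in memory; a real program would open(path, "rb").
raw = REC.pack(1, 0.5) + REC.pack(2, 1.5)
f = io.BytesIO(raw)

# Read exactly one record's worth of bytes per iteration;
# read() returns b"" at end-of-file, which ends the loop.
while chunk := f.read(REC.size):
    num, val = REC.unpack(chunk)
    print(num, val)
```

Reading in `REC.size` chunks keeps memory flat no matter how many records the file holds, just like line-by-line iteration does for text.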
Structured readers
`csv.reader(f)` yields one list per row, handling quoted fields and embedded commas. `csv.DictReader(f)` uses the first row as headers and yields a dict per row, which is much easier to read six months later. For JSON, `json.load(f)` reads a whole document; for large JSONL (one object per line), iterate and call `json.loads(line)` per line.
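The JSONL pattern looks like this; the two records are made up, and `io.StringIO` stands in for a large `.jsonl` file opened with `encoding="utf-8"`:

```python
import io
import json

# Two JSON objects, one per line, as a JSONL stream.
f = io.StringIO('{"id": 1}\n{"id": 2}\n')

# Only one parsed object lives in memory at a time.
for line in f:
    record = json.loads(line)
    print(record["id"])
```

This combines the streaming style with the structured reader: the file object does the line splitting, `json.loads` does the parsing.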
Always open CSV and JSON files with `encoding="utf-8"` and, for CSV, `newline=""`. Those two arguments avoid the well-known round-trip bugs on Windows caused by mismatched line endings and legacy codepages.
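A round-trip sketch; without `newline=""` on Windows, `csv.writer` would emit `\r\r\n` line endings. The scratch filename is hypothetical:

```python
import csv
from pathlib import Path
from tempfile import gettempdir

p = Path(gettempdir()) / "roundtrip.csv"  # hypothetical scratch file

# newline="" on both sides lets the csv module manage line endings itself.
with open(p, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows([["name", "note"], ["ana", "likes, commas"]])

with open(p, "r", encoding="utf-8", newline="") as f:
    print(list(csv.reader(f)))  # the embedded comma survives via quoting

p.unlink()
```

The writer quotes `"likes, commas"` automatically, and the reader undoes the quoting, so the data comes back exactly as it went in.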
The reading patterns and tools you will use constantly:
| Tool | Purpose |
|---|---|
| `Path.read_text()` (method) | Reads an entire small text file. |
| `file.readlines()` (method) | Reads every line into a list. |
| `for line in f` (pattern) | Streams lines one at a time. |
| `csv.DictReader` (class) | Yields one dict per CSV row. |
| `json.load(f)` (function) | Parses a JSON document from a file. |
| `configparser` (module) | Reads INI-style configuration files. |
| `tomllib` (module, 3.11+) | Reads TOML files, returning a dict. |
| `open(p, 'rb')` (built-in) | Opens a file in binary mode for bytes. |
Reading Data from Files code example
The script below writes a tiny CSV and a tiny JSON file into a temp folder, then reads them back using the main patterns from the table.
```python
# Lesson: Reading Data from Files
import csv
import json
from pathlib import Path
from tempfile import gettempdir

root = Path(gettempdir())
csv_path = root / "people.csv"
json_path = root / "meta.json"
csv_path.write_text("name,age\nana,30\nben,25\n", encoding="utf-8")
json_path.write_text('{"version": 1, "active": true}', encoding="utf-8")

# Read-all
print("all text:", csv_path.read_text(encoding="utf-8").splitlines())

# Streaming
with open(csv_path, "r", encoding="utf-8", newline="") as f:
    for i, line in enumerate(f, start=1):
        print(f" raw L{i}: {line.rstrip()}")

# CSV DictReader
with open(csv_path, "r", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f):
        print(f" dict: {row}")

# JSON whole-file
with open(json_path, "r", encoding="utf-8") as f:
    meta = json.load(f)
print("meta:", meta)

# Partial reads via read(n)
with open(csv_path, "r", encoding="utf-8") as f:
    head = f.read(9)   # the first 9 characters: "name,age\n"
    tail = f.read()
print("head:", repr(head))
print("tail:", repr(tail))

csv_path.unlink()
json_path.unlink()
```
Compare each reading style against the others:
1) `read_text()` loads everything in memory; great for small files.
2) Streaming with `for line in f` keeps memory constant.
3) `csv.DictReader` turns rows into dicts using the header row.
4) `json.load(f)` reads a whole document; use `json.loads` for line-by-line JSONL.
Practice reading a file in two shapes.
```python
from pathlib import Path
from tempfile import gettempdir

p = Path(gettempdir()) / "nums.txt"
p.write_text("1\n2\n3\n4\n", encoding="utf-8")

# Example A: sum of integers from a file, streaming
with open(p, "r", encoding="utf-8") as f:
    total = sum(int(line) for line in f if line.strip())
print("total:", total)

# Example B: read into a list (small file only)
nums = [int(x) for x in p.read_text(encoding="utf-8").split()]
print("nums: ", nums)

p.unlink()
```
Assertions you can run without touching the disk.
```python
import json

assert json.loads('[1,2,3]') == [1, 2, 3]
assert "a,b,c".split(",") == ["a", "b", "c"]
assert "\n".join(["a", "b"]).splitlines() == ["a", "b"]
assert [int(x) for x in "1 2 3".split()] == [1, 2, 3]
```
The script prints roughly:
```text
all text: ['name,age', 'ana,30', 'ben,25']
 raw L1: name,age
 raw L2: ana,30
 raw L3: ben,25
 dict: {'name': 'ana', 'age': '30'}
 dict: {'name': 'ben', 'age': '25'}
meta: {'version': 1, 'active': True}
head: 'name,age\n'
tail: 'ana,30\nben,25\n'
```