Tabular data — rows and named columns — is the most common shape for real-world information: spreadsheet exports, database results, log files, survey responses. Python's csv module handles the common cases out of the box, and pandas handles the rest when you outgrow it. For most scripting, csv is fast, dependency-free, and good enough.
The two main entry points are csv.reader (yields each row as a list) and csv.DictReader (uses the first row as headers and yields a dict per row). DictReader is almost always more readable: row["name"] beats row[0] by a wide margin, especially six months after you wrote the code.
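A quick side-by-side makes the difference concrete. This sketch uses an in-memory `StringIO` in place of a real file, with made-up sample data:

```python
import csv
from io import StringIO

data = "name,age\nana,30\nben,25\n"

# csv.reader: every row, including the header, is a plain list of strings
list_rows = list(csv.reader(StringIO(data)))
print(list_rows)  # [['name', 'age'], ['ana', '30'], ['ben', '25']]

# csv.DictReader: the first row becomes the keys of each dict
dict_rows = list(csv.DictReader(StringIO(data)))
print(dict_rows[0]["name"], dict_rows[0]["age"])  # ana 30
```

Note that `csv.reader` hands you the header as just another row; `DictReader` consumes it for you.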
Writing works symmetrically with csv.writer / csv.DictWriter. Call writeheader() once, then writerow(dict) or writerows(list_of_dicts). Always open the underlying file with newline="" so the csv module can control line endings itself; failing to do so is the classic source of blank-line-every-other-row bugs on Windows.
CSV has no schema. Columns are strings; numbers, dates and booleans round-trip as text. If you need type awareness, parse explicitly (int(row["age"]), date.fromisoformat(row["dob"])) on the way in. For anything heavier than a few thousand rows or with serious mixed-type handling, use pandas.read_csv and its dtype parameter.
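A minimal sketch of coercing on the way in (the field names and sample rows here are invented for illustration):

```python
import csv
from datetime import date
from io import StringIO

raw = "name,age,dob\nana,30,1995-03-14\nben,25,2000-07-01\n"

# Every field arrives as a string; convert each one explicitly.
people = [
    {"name": r["name"], "age": int(r["age"]), "dob": date.fromisoformat(r["dob"])}
    for r in csv.DictReader(StringIO(raw))
]
print(people[0]["age"] + 1)    # arithmetic works on a real int: 31
print(people[0]["dob"].year)   # a real date object: 1995
```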
Reading with DictReader
Always open with newline="" and an explicit encoding such as encoding="utf-8". Iterate the reader once — it is a streaming iterator. If the file is small and you need random access, materialize it: rows = list(csv.DictReader(f)).
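The one-pass behaviour is easy to demonstrate with a small in-memory sketch:

```python
import csv
from io import StringIO

f = StringIO("a,b\n1,2\n3,4\n")
reader = csv.DictReader(f)

first_pass = list(reader)   # consumes the underlying stream
second_pass = list(reader)  # the iterator is already exhausted: empty

print(len(first_pass), len(second_pass))  # 2 0
```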
DictReader accepts a fieldnames= argument if the file has no header. Pass a list; it becomes the keys for every row.
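For example, given a headerless file (sample data invented for illustration):

```python
import csv
from io import StringIO

# No header row in the data, so supply the keys yourself.
headerless = "ana,30\nben,25\n"
rows = list(csv.DictReader(StringIO(headerless), fieldnames=["name", "age"]))
print(rows)  # [{'name': 'ana', 'age': '30'}, {'name': 'ben', 'age': '25'}]
```

Without `fieldnames=`, DictReader would have eaten "ana,30" as the header.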
Writing and dialects
csv.DictWriter(f, fieldnames=...) needs the list of fieldnames up front. Call writer.writeheader() first; then writerow/writerows. Unknown fields raise unless you pass extrasaction="ignore".
csv.excel is the default dialect (comma separator, \r\n lines, quoted on need). For tab-separated data, pass delimiter="\t" or dialect="excel-tab". For tricky cases, register a custom dialect with csv.register_dialect.
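A sketch of both options, writing to `StringIO` so the raw bytes are visible (the "semi" dialect name is made up here):

```python
import csv
from io import StringIO

rows = [["name", "age"], ["ana", "30"]]

# Tab-separated output via the built-in excel-tab dialect
buf = StringIO()
csv.writer(buf, dialect="excel-tab").writerows(rows)
print(repr(buf.getvalue()))   # 'name\tage\r\nana\t30\r\n'

# A custom dialect: semicolon-delimited, every field quoted
csv.register_dialect("semi", delimiter=";", quoting=csv.QUOTE_ALL)
buf2 = StringIO()
csv.writer(buf2, dialect="semi").writerows(rows)
print(repr(buf2.getvalue()))  # '"name";"age"\r\n"ana";"30"\r\n'
```

Note the \r\n line endings in both cases — that is the excel default the dialect inherits, and why newline="" matters when the target is a real file.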
Tabular I/O tools.
| Tool | Purpose |
|---|---|
| `csv.reader` (function) | Yields each row as a list of strings. |
| `csv.DictReader` (class) | Yields each row as a dict keyed by headers. |
| `csv.writer` (function) | Writes rows from iterables. |
| `csv.DictWriter` (class) | Writes dict rows with validated fieldnames. |
| `csv.register_dialect` (function) | Defines custom delimiter/quoting rules. |
| `csv.Sniffer` (class) | Guesses delimiter and header presence. |
| `pandas.read_csv` (function) | Rich typed CSV reader with schema control. |
| `open(p, newline="")` (built-in) | Always pass `newline=""` for CSV files. |
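`csv.Sniffer` from the table deserves a quick sketch. Given a sample of the file's text (here an invented semicolon-delimited snippet), it guesses the dialect:

```python
import csv

sample = "name;age\nana;30\nben;25\n"

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)    # inspects the sample, guesses the delimiter
print(dialect.delimiter)           # ;
print(sniffer.has_header(sample))  # heuristic guess that row one is a header
```

Sniffing is a heuristic — for files whose format you control, hardcoding the dialect is more robust.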
Managing Data in Tabular Format: code example
The script below writes a small CSV, reads it back as dicts with explicit type conversion, and aggregates the results.
```python
# Lesson: Managing Data in Tabular Format
import csv
from collections import Counter
from pathlib import Path
from tempfile import gettempdir

path = Path(gettempdir()) / "people.csv"

rows = [
    {"name": "ana", "age": 30, "city": "oslo"},
    {"name": "ben", "age": 25, "city": "oslo"},
    {"name": "cai", "age": 40, "city": "rome"},
]

# Write
with open(path, "w", encoding="utf-8", newline="") as f:
    w = csv.DictWriter(f, fieldnames=["name", "age", "city"])
    w.writeheader()
    w.writerows(rows)

print("raw file:")
print(path.read_text(encoding="utf-8"))

# Read
with open(path, "r", encoding="utf-8", newline="") as f:
    parsed = [
        {"name": r["name"], "age": int(r["age"]), "city": r["city"]}
        for r in csv.DictReader(f)
    ]

avg_age = sum(r["age"] for r in parsed) / len(parsed)
by_city = Counter(r["city"] for r in parsed)
over_30 = [r for r in parsed if r["age"] >= 30]

print("parsed rows:", parsed)
print("avg age :", round(avg_age, 1))
print("by city :", by_city.most_common())
print("over 30 :", [r["name"] for r in over_30])

path.unlink()
```
Read the script noting:
1) `newline=''` and `encoding='utf-8'` on both sides prevent the usual bugs.
2) `DictWriter.writeheader()` is required; it doesn't happen automatically.
3) Type conversion happens explicitly when reading; CSV is all strings otherwise.
4) Aggregations read naturally when each row is already a dict.
Practice schema coercion and filtering.

```python
import csv
from io import StringIO

data = "item,qty,price\napple,3,1.20\nbread,1,3.50\n"
reader = csv.DictReader(StringIO(data))
lines = [
    {"item": r["item"], "qty": int(r["qty"]), "price": float(r["price"])}
    for r in reader
]
total = sum(r["qty"] * r["price"] for r in lines)
print(lines)
print("total:", round(total, 2))
```
Sanity checks for parsing.

```python
import csv
from io import StringIO

rows = list(csv.DictReader(StringIO("a,b\n1,2\n3,4\n")))
assert rows == [{"a": "1", "b": "2"}, {"a": "3", "b": "4"}]
assert [int(r["a"]) for r in rows] == [1, 3]
```
Running prints:

```
raw file:
name,age,city
ana,30,oslo
ben,25,oslo
cai,40,rome

parsed rows: [{'name': 'ana', 'age': 30, 'city': 'oslo'}, {'name': 'ben', 'age': 25, 'city': 'oslo'}, {'name': 'cai', 'age': 40, 'city': 'rome'}]
avg age : 31.7
by city : [('oslo', 2), ('rome', 1)]
over 30 : ['ana', 'cai']
```

(The blank line after the raw file comes from `print` adding a newline after the file's own trailing one.)