Managing Data in Tabular Format

Tabular data — rows and named columns — is the most common shape for real-world information: spreadsheet exports, database results, log files, survey responses. Python's csv module handles the common cases out of the box, and pandas handles the rest when you outgrow it. For most scripting, csv is fast, dependency-free, and good enough.

The two main entry points are csv.reader (yields each row as a list) and csv.DictReader (uses the first row as headers and yields a dict per row). DictReader is almost always more readable: row["name"] beats row[0] by a wide margin, especially six months after you wrote the code.
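The contrast is easy to see on a tiny in-memory sample (StringIO stands in for a real file here):

```python
# csv.reader vs csv.DictReader on the same two-row sample.
import csv
from io import StringIO

data = "name,age\nana,30\nben,25\n"

# csv.reader: every row is a plain list, header row included.
rows = list(csv.reader(StringIO(data)))
print(rows[1][0])           # positional access -> "ana"

# csv.DictReader: the header row becomes the keys of each dict.
records = list(csv.DictReader(StringIO(data)))
print(records[0]["name"])   # named access -> "ana"
```

Note that `records[0]["age"]` is still the string `"30"` — DictReader names the columns but does not type them.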

Writing works symmetrically with csv.writer / csv.DictWriter. Call writeheader() once, then writerow(dict) or writerows(list_of_dicts). Always open the underlying file with newline="" so the csv module can control line endings itself; failing to do so is the classic source of blank-line-every-other-row bugs on Windows.

CSV has no schema. Columns are strings; numbers, dates and booleans round-trip as text. If you need type awareness, parse explicitly (int(row["age"]), date.fromisoformat(row["dob"])) on the way in. For anything heavier than a few thousand rows or with serious mixed-type handling, use pandas.read_csv and its dtype parameter.
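The explicit-parsing pattern looks like this (the column names are illustrative):

```python
# CSV hands back strings; coerce types explicitly on the way in.
import csv
from datetime import date
from io import StringIO

data = "name,age,dob\nana,30,1995-04-01\n"
row = next(csv.DictReader(StringIO(data)))

assert row["age"] == "30"             # still text at this point
age = int(row["age"])                 # explicit int conversion
dob = date.fromisoformat(row["dob"])  # explicit date conversion
print(age, dob.year)                  # 30 1995
```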

Reading with DictReader

Always open with newline="" and an explicit encoding="utf-8". Iterate the reader only once — it is a streaming iterator. If the file is small and you need random access, materialize it: rows = list(csv.DictReader(f)).
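The one-pass behavior is easy to demonstrate — a second pass over the same reader yields nothing:

```python
# DictReader is a streaming iterator: it can only be consumed once.
import csv
from io import StringIO

f = StringIO("a,b\n1,2\n3,4\n")
reader = csv.DictReader(f)

first_pass = list(reader)
second_pass = list(reader)   # already exhausted

print(len(first_pass), len(second_pass))   # 2 0
```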

DictReader accepts a fieldnames= argument if the file has no header. Pass a list; it becomes the keys for every row.
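For a headerless file the pattern is (field names here are illustrative):

```python
# Supply fieldnames= so DictReader does not consume row one as a header.
import csv
from io import StringIO

data = "ana,30\nben,25\n"   # no header row
reader = csv.DictReader(StringIO(data), fieldnames=["name", "age"])
rows = list(reader)
print(rows[0])   # {'name': 'ana', 'age': '30'}
```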

Writing and dialects

csv.DictWriter(f, fieldnames=...) needs the list of fieldnames up front. Call writer.writeheader() first; then writerow/writerows. Unknown fields raise unless you pass extrasaction="ignore".
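Both behaviors in a small sketch — the extra "city" key would raise ValueError without extrasaction="ignore":

```python
# extrasaction="ignore" silently drops dict keys not in fieldnames.
import csv
from io import StringIO

out = StringIO()
w = csv.DictWriter(out, fieldnames=["name", "age"], extrasaction="ignore")
w.writeheader()
w.writerow({"name": "ana", "age": 30, "city": "oslo"})  # "city" is dropped

print(out.getvalue())   # name,age / ana,30 (with \r\n line endings)

# The default (extrasaction="raise") rejects unknown keys:
strict = csv.DictWriter(StringIO(), fieldnames=["name"])
try:
    strict.writerow({"name": "ben", "city": "oslo"})
except ValueError as e:
    print("rejected:", e)
```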

csv.excel is the default dialect (comma separator, \r\n lines, quoted on need). For tab-separated data, pass delimiter="\t" or dialect="excel-tab". For tricky cases, register a custom dialect with csv.register_dialect.
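All three routes side by side (the "pipes" dialect name is made up for the example):

```python
# Tab separation via delimiter=, via the excel-tab dialect, and a custom dialect.
import csv
from io import StringIO

out = StringIO()
csv.writer(out, delimiter="\t").writerow(["a", "b"])      # one-off override
csv.writer(out, dialect="excel-tab").writerow(["c", "d"]) # built-in dialect

# Register a reusable custom dialect, then refer to it by name.
csv.register_dialect("pipes", delimiter="|", quoting=csv.QUOTE_MINIMAL)
csv.writer(out, dialect="pipes").writerow(["e", "f"])

print(out.getvalue())   # a\tb / c\td / e|f, each with \r\n endings
```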

Tabular I/O tools:

Tool                  | Kind     | Purpose
csv.reader            | function | Yields each row as a list of strings.
csv.DictReader        | class    | Yields each row as a dict keyed by headers.
csv.writer            | function | Writes rows from iterables.
csv.DictWriter        | class    | Writes dict rows with validated fieldnames.
csv.register_dialect  | function | Defines custom delimiter/quoting rules.
csv.Sniffer           | class    | Guesses delimiter and header presence.
pandas.read_csv       | function | Rich typed CSV reader with schema control.
open(p, newline="")   | built-in | Always pass newline="" for CSV files.
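csv.Sniffer from the table above deserves a quick sketch — it inspects a sample of the text and guesses the dialect (both results are heuristics, so verify them on unfamiliar data):

```python
# Sniffer guesses the delimiter and whether a header row is present.
import csv
from io import StringIO

sample = "name;age\nana;30\nben;25\n"
sniffer = csv.Sniffer()

dialect = sniffer.sniff(sample)
print(dialect.delimiter)            # ';'
print(sniffer.has_header(sample))   # True (heuristic; can misfire)

# Feed the sniffed dialect straight back into a reader.
rows = list(csv.reader(StringIO(sample), dialect))
print(rows[1])   # ['ana', '30']
```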

Managing Data in Tabular Format code example

The script writes a small CSV, reads it back as dicts, aggregates, and converts types explicitly.

# Lesson: Managing Data in Tabular Format
import csv
from collections import Counter
from pathlib import Path
from tempfile import gettempdir


path = Path(gettempdir()) / "people.csv"

rows = [
    {"name": "ana", "age": 30, "city": "oslo"},
    {"name": "ben", "age": 25, "city": "oslo"},
    {"name": "cai", "age": 40, "city": "rome"},
]

# Write
with open(path, "w", encoding="utf-8", newline="") as f:
    w = csv.DictWriter(f, fieldnames=["name", "age", "city"])
    w.writeheader()
    w.writerows(rows)

print("raw file:")
print(path.read_text(encoding="utf-8"))

# Read
with open(path, "r", encoding="utf-8", newline="") as f:
    parsed = [
        {"name": r["name"], "age": int(r["age"]), "city": r["city"]}
        for r in csv.DictReader(f)
    ]

avg_age = sum(r["age"] for r in parsed) / len(parsed)
by_city = Counter(r["city"] for r in parsed)
over_30 = [r for r in parsed if r["age"] >= 30]

print("parsed rows:", parsed)
print("avg age   :", round(avg_age, 1))
print("by city   :", by_city.most_common())
print("over 30   :", [r["name"] for r in over_30])

path.unlink()

Read the script noting:

1) `newline=''` and `encoding='utf-8'` on both sides prevent the usual bugs.
2) `DictWriter.writeheader()` is required; it doesn't happen automatically.
3) Type conversion happens explicitly when reading; CSV is all strings otherwise.
4) Aggregations read naturally when each row is already a dict.

Practice schema coercion and filtering.

import csv
from io import StringIO

data = "item,qty,price\napple,3,1.20\nbread,1,3.50\n"
reader = csv.DictReader(StringIO(data))

lines = [
    {"item": r["item"], "qty": int(r["qty"]), "price": float(r["price"])}
    for r in reader
]
total = sum(r["qty"] * r["price"] for r in lines)
print(lines)
print("total:", round(total, 2))

Sanity checks for parsing.

import csv
from io import StringIO
rows = list(csv.DictReader(StringIO("a,b\n1,2\n3,4\n")))
assert rows == [{"a": "1", "b": "2"}, {"a": "3", "b": "4"}]
assert [int(r["a"]) for r in rows] == [1, 3]

Running prints:

raw file:
name,age,city
ana,30,oslo
ben,25,oslo
cai,40,rome

parsed rows: [{'name': 'ana', 'age': 30, 'city': 'oslo'}, {'name': 'ben', 'age': 25, 'city': 'oslo'}, {'name': 'cai', 'age': 40, 'city': 'rome'}]
avg age   : 31.7
by city   : [('oslo', 2), ('rome', 1)]
over 30   : ['ana', 'cai']