Replacing Text Based on Patterns

Finding patterns is only half the job; often you want to transform them. re.sub(pattern, replacement, text) replaces every non-overlapping match with either a replacement string (which may refer back to capture groups) or the result of a function called with each match. It is the regex version of find-and-replace, with the full power of pattern matching for the “find” side.

The replacement string can reference capture groups with \1, \2, … (or \g<name> for named groups). That makes simple transforms like reformatting dates or redacting fields trivial: re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text) flips ISO dates to day/month/year.
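As a small sketch of the named-group form, the snippet below repeats the date flip with \g<name> references and shows why \g<1> is sometimes needed instead of \1. The sample text and the group names year, month, and day are made up for illustration.

```python
import re

text = "Released 2026-04-21, patched 2026-04-22."

# Named groups make the replacement template self-describing.
iso_date = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
flipped = re.sub(iso_date, r"\g<day>/\g<month>/\g<year>", text)
print(flipped)  # Released 21/04/2026, patched 22/04/2026.

# \g<1> is the unambiguous spelling of \1. Here a literal "0" follows
# group 1; r"\10" would be read as a reference to group 10 instead.
padded = re.sub(r"(\d)x", r"\g<1>0", "5x 7x")
print(padded)  # 50 70
```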

For anything beyond a literal with back-references, pass a function as the replacement. re.sub(pat, lambda m: m.group(0).upper(), text) uppercases every match; re.sub(pat, my_func, text) lets my_func see the full Match object and decide what to return. That is the idiomatic way to do complex rewrites without chaining several sub calls.
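One way to picture the "single pass instead of chained calls" idea is a lookup table driven by a callable. This is only a sketch: the EXPANSIONS table and the expand helper are hypothetical names invented for the example.

```python
import re

# Hypothetical abbreviation table, invented for this illustration.
EXPANSIONS = {
    "btw": "by the way",
    "imo": "in my opinion",
    "afaik": "as far as I know",
}

# One alternation covers every key; re.escape keeps the keys literal.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, EXPANSIONS)) + r")\b",
    re.IGNORECASE,
)

def expand(m: re.Match) -> str:
    word = m.group(0)
    full = EXPANSIONS[word.lower()]
    # Keep a leading capital if the abbreviation had one.
    return full[0].upper() + full[1:] if word[0].isupper() else full

result = pattern.sub(expand, "Btw, the patch is fine imo.")
print(result)  # By the way, the patch is fine in my opinion.
```

A single call with a callable visits each match once, so one expansion can never be rewritten again by a later substitution, which can happen when several sub calls are chained.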

Two cousins are handy: re.subn returns a (new_text, n_replacements) tuple, useful when you want to know whether anything changed. re.split splits on a pattern, with optional captures preserved in the output. Combined with sub, they cover almost every structured text transformation.
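A brief sketch of both cousins, using made-up input strings: the count from subn tells you whether anything changed, and a capture group in the split pattern keeps the separators in the result.

```python
import re

# subn: a count of 0 means the text came back unchanged.
new_text, n = re.subn(r"\t", "    ", "no tabs here")
print(n == 0)  # True

# split without a capture group drops the separators...
plain = re.split(r"\s*[+*-]\s*", "3 + 4*5 - 6")
print(plain)   # ['3', '4', '5', '6']

# ...while a capture group preserves them in the output list.
tokens = re.split(r"\s*([+*-])\s*", "3 + 4*5 - 6")
print(tokens)  # ['3', '+', '4', '*', '5', '-', '6']
```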

Replacement strings and groups

Inside the replacement, \1 and \2 refer to the first and second capture groups, and \g<name> refers to a named group. Raw strings (r"...") are essential here: in an ordinary string literal Python would turn "\1" into the control character \x01 before re ever sees it, so you would otherwise have to double every backslash.

If you need a literal backslash in the replacement, write r"\\". The raw string hands both backslashes to re unchanged, and the replacement template then reads the pair as one literal backslash.
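The backslash rule can be checked with a short sketch; the path below is an invented example. It also shows that a callable's return value is inserted as-is, with no template processing.

```python
import re

path = "C:/Users/ana/notes.txt"

# r"\\" is two characters; the replacement template collapses them to one.
windows_path = re.sub(r"/", r"\\", path)
print(windows_path)  # C:\Users\ana\notes.txt

# A function's return value is used literally, so one backslash is enough.
same = re.sub(r"/", lambda m: "\\", path)
print(same == windows_path)  # True
```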

Function replacements and count

When you pass a callable as the replacement, it receives the Match and must return a string. Inside you can look up groups, apply business logic, or even call another regex. This is the cleanest way to implement mini-rewriters like URL canonicalizers.
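A minimal sketch of such a mini-rewriter follows. The canonicalization rules here (lowercase the scheme and host, collapse repeated slashes, default the path to "/") are assumptions chosen for the example, and URL and canonicalize are hypothetical names; the pattern is deliberately simplistic and not a full URL parser.

```python
import re

# Simplified URL pattern for illustration only.
URL = re.compile(
    r"(?P<scheme>https?)://(?P<host>[\w.-]+)(?P<path>/[^\s?#]*)?",
    re.IGNORECASE,
)

def canonicalize(m: re.Match) -> str:
    scheme = m.group("scheme").lower()
    host = m.group("host").lower()
    path = m.group("path") or "/"        # the path group is optional
    path = re.sub(r"/{2,}", "/", path)   # a nested regex collapses "//"
    return f"{scheme}://{host}{path}"

text = "See HTTP://Example.COM//docs//intro and https://x.org today"
clean = URL.sub(canonicalize, text)
print(clean)  # See http://example.com/docs/intro and https://x.org/ today
```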

Pass count=N to re.sub to limit the number of replacements, for example count=1 for “replace only the first occurrence” requirements. The default, count=0, replaces every match.
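A quick sketch of count on an invented log line. Passing it by keyword also avoids the classic mistake of putting a flag such as re.IGNORECASE in the fourth positional slot, where it would be read as a count.

```python
import re

log = "error: disk full; error: retrying; error: gave up"

# Replace only the first occurrence.
first_only = re.sub(r"error", "warning", log, count=1)
print(first_only)  # warning: disk full; error: retrying; error: gave up

# Replace the first two; matches are processed left to right.
first_two = re.sub(r"error", "warning", log, count=2)
print(first_two)   # warning: disk full; warning: retrying; error: gave up
```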

Pattern-based text transformation.

| Tool | Kind | Purpose |
| --- | --- | --- |
| re.sub(p, repl, s) | function | Replace every match with repl. |
| re.subn(p, repl, s) | function | Same as sub, returns (new, count). |
| re.split(p, s) | function | Split on a pattern. |
| pattern.sub(repl, s) | method | Method form on a compiled pattern. |
| Match.group(name) | method | Access a named or numbered group. |
| Match.expand(template) | method | Expand a template using the match's groups. |
| re.escape(s) | function | Escape characters for literal matching. |
| re.IGNORECASE | flag | Case-insensitive matching. |
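Several entries in the table (the compiled-pattern method, Match.expand, re.escape, and re.IGNORECASE) can be seen together in a short sketch. All sample strings are invented for illustration.

```python
import re

# re.escape: treat the search text as a literal, so "." and "(" lose
# their regex meaning. re.IGNORECASE makes the match case-insensitive.
needle = "1.5 (beta)"
pattern = re.compile(re.escape(needle), re.IGNORECASE)

# pattern.sub: the method form on a compiled pattern.
replaced = pattern.sub("1.5", "Upgrade from 1.5 (BETA) to 2.0")
print(replaced)  # Upgrade from 1.5 to 2.0

# Match.expand: fill a replacement-style template from one match.
m = re.search(r"(?P<key>\w+)=(?P<value>\w+)", "mode=fast")
expanded = m.expand(r"\g<value> is the \g<key>")
print(expanded)  # fast is the mode
```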

Replacing Text Based on Patterns code example

The script redacts sensitive fields, reformats dates, and counts replacements using both string and callable replacements.

# Lesson: Replacing Text Based on Patterns
import re


text = (
    "Contact ana at a@example.com on 2026-04-21, then cai at c@x.org on 2026-04-22."
)

# Redact email addresses
redacted = re.sub(r"[\w.+-]+@[\w.-]+", "[redacted]", text)
print("redacted:", redacted)

# Reformat ISO dates (YYYY-MM-DD) to DD/MM/YYYY
flipped = re.sub(
    r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text
)
print("flipped :", flipped)

# Uppercase every word that starts with a capital
def upper_match(m: re.Match) -> str:
    return m.group(0).upper()

loud = re.sub(r"\b[A-Z]\w*", upper_match, text)
print("loud    :", loud)

# Redact with a function that keeps the domain
def mask_email(m: re.Match) -> str:
    addr = m.group(0)
    local, _, domain = addr.partition("@")
    return "***@" + domain

masked = re.sub(r"[\w.+-]+@[\w.-]+", mask_email, text)
print("masked  :", masked)

# subn: count how many replacements happened
cleaned, n = re.subn(r"\s{2,}", " ", "hello    world   !")
print("cleaned :", repr(cleaned), "| replacements:", n)

# split: turn structured text into fields
rows = re.split(r"\s*,\s*", "apple ,  banana,cherry   ,date")
print("split   :", rows)

Keep these four tricks in mind:

1) Use `\1`, `\2` in the replacement for capture-group references.
2) Pass a function when the replacement needs logic or conditionals.
3) `subn` returns the new text together with the number of replacements, so you can tell whether anything changed.
4) `split` with a pattern is the cleanest way to handle flexible separators.

Use sub to normalize phone numbers.

import re

text = "Call 1-800-555-0100 or (415) 555-0123 or +44 20 7946 0018."

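def digits_only(m: re.Match) -> str:
    # Strip everything that is not a digit from the matched number
    return re.sub(r"\D", "", m.group(0))

# A candidate may open with "+" or "(", and starts and ends on a digit,
# so surrounding spaces are not swallowed into the match
phone = r"[+(]?\d[\d()\- ]{5,}\d"
print(re.sub(phone, digits_only, text))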

Tiny checks.

import re
assert re.sub(r"\s+", " ", "a  b   c") == "a b c"
assert re.subn(r"\d", "X", "a1b2c3") == ("aXbXcX", 3)
assert re.split(r",\s*", "a, b,  c") == ["a", "b", "c"]

Running the main script prints:

redacted: Contact ana at [redacted] on 2026-04-21, then cai at [redacted] on 2026-04-22.
flipped : Contact ana at a@example.com on 21/04/2026, then cai at c@x.org on 22/04/2026.
loud    : CONTACT ana at a@example.com on 2026-04-21, then cai at c@x.org on 2026-04-22.
masked  : Contact ana at ***@example.com on 2026-04-21, then cai at ***@x.org on 2026-04-22.
cleaned : 'hello world !' | replacements: 2
split   : ['apple', 'banana', 'cherry', 'date']