String Performance Tips
Deep dive · part of Python Strings
Strings are immutable, so s += chunk in a loop may allocate a new string and copy the old contents on every iteration. For thousands of concatenations, collect the fragments in a list and join once, or write into io.StringIO when assembling very large text. Hot paths in logging and template assembly hit this often, so avoid += on strings inside hot loops.
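Performance work belongs after profiling: many scripts never concatenate enough to matter. When they do, the join idiom is usually faster than += in micro-benchmarks, but the size of the gap depends on the interpreter. CPython can sometimes extend a string in place when no other reference to it exists, so measure on your own workload before rewriting.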
Production code combines this topic with logging, tests, and clear module boundaries so refactors stay safe when requirements grow.
''.join(parts) is O(total length) with one allocation when parts is a list.
List comprehension collecting str(i) beats repeated += in loops.
StringIO.getvalue() builds the final str from a buffer for many small writes.
f-strings are compiled into direct formatting operations; % and .format() in tight loops usually cost more because the format string is parsed at run time.
Decode bytes once at the boundary; avoid encode/decode in inner loops.
re.sub with a callback on huge texts may need chunking; compile the pattern once with re.compile outside the loop.
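The regex point can be sketched as follows. This is a minimal illustration, not a benchmark: the names NUMBER, double, and double_numbers are hypothetical, and splitting the input into lines stands in for whatever chunking suits your data.

```python
import re

# Compile the pattern once, outside any loop, and reuse the compiled object.
NUMBER = re.compile(r"\d+")


def double(match):
    """Callback: replace each matched integer with twice its value."""
    return str(int(match.group()) * 2)


def double_numbers(lines):
    """Apply the substitution chunk by chunk instead of on one huge string."""
    return [NUMBER.sub(double, line) for line in lines]


result = double_numbers(["a1 b22", "no digits", "333"])
print(result)  # ['a2 b44', 'no digits', '666']
```

Chunking by line only works when no match can span a chunk boundary, which holds here because the pattern never matches a newline.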
For building HTML or SQL, still prefer template engines or parameterized queries over manual join: speed is worthless if you introduce injection bugs. bytes concatenation has the same immutability issue; for binary protocols, call b"".join() on a list of bytes objects or accumulate into a bytearray.
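For the binary case, a minimal sketch of both approaches might look like this. The framing format (a 2-byte big-endian length before each payload) and the function names are assumptions made up for illustration, not part of any real protocol.

```python
def frame_with_join(payloads):
    """Collect immutable bytes chunks in a list and join them once."""
    chunks = []
    for payload in payloads:
        chunks.append(len(payload).to_bytes(2, "big"))  # hypothetical length prefix
        chunks.append(payload)
    return b"".join(chunks)


def frame_with_bytearray(payloads):
    """Append into a mutable bytearray, then freeze it into bytes at the end."""
    buf = bytearray()
    for payload in payloads:
        buf += len(payload).to_bytes(2, "big")
        buf += payload
    return bytes(buf)


payloads = [b"hi", b"", b"abc"]
framed = frame_with_join(payloads)
print(framed)  # b'\x00\x02hi\x00\x00\x00\x03abc'
print(frame_with_bytearray(payloads) == framed)  # True
```

Either version avoids rebuilding the whole message on each append; which one is faster depends on chunk sizes, so time both if it matters.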
International text: normalize once (NFC) before using as dict keys, not per comparison in loops.
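A small sketch of normalizing once at insertion time, using the standard unicodedata module. The helper nfc and the prices dictionary are hypothetical names chosen for the example.

```python
import unicodedata


def nfc(text):
    """Return text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


composed = "\u00e9clair"     # "é" as a single code point
decomposed = "e\u0301clair"  # "e" followed by a combining acute accent

# Normalize once when the key enters the dictionary.
prices = {nfc(composed): 3.5}

print(composed == decomposed)     # False: different code point sequences
print(decomposed in prices)       # False: the raw key does not match
print(nfc(decomposed) in prices)  # True: normalized input finds the entry
```

Incoming lookups still need the same normalization, but it happens once per input at the boundary rather than inside every comparison.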
Logging with %-style arguments defers the expensive str() call until the record passes the level check, so filtered DEBUG lines skip it entirely. Never build an f-string from a huge object unconditionally in a debug line.
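The difference is easy to observe with a counter. In this sketch the Expensive class and the logger name perf_demo are hypothetical; the logger is set to INFO so that DEBUG records are filtered.

```python
import logging


class Expensive:
    """Hypothetical object whose string conversion is costly."""

    str_calls = 0  # counts how often __str__ actually runs

    def __str__(self):
        Expensive.str_calls += 1
        return "x" * 1000


logger = logging.getLogger("perf_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)            # DEBUG records are filtered out
logger.addHandler(logging.NullHandler())
logger.propagate = False

obj = Expensive()

# %-style args: the level check fails first, so __str__ never runs.
logger.debug("state: %s", obj)
calls_after_lazy = Expensive.str_calls

# f-string: the message is built before debug() is even called.
logger.debug(f"state: {obj}")
calls_after_fstring = Expensive.str_calls

print(calls_after_lazy, calls_after_fstring)  # 0 1
```

When even computing the argument is expensive, guarding the call with logger.isEnabledFor(logging.DEBUG) skips that work as well.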
Read the parent tutorial on pythondeck.com for runnable snippets, then reproduce them locally in a virtual environment with pinned dependency versions matching your deployment target.
When pairing with teammates, agree on one idiomatic pattern per concern—mixed styles in one repo slow reviews and invite subtle integration bugs during merges.
Using += on str in loops over millions of rows.
Calling str() on each object to build a join list when a single f-string would do.
Re-decoding UTF-8 bytes inside inner loops instead of keeping str.
Premature StringIO when join on a list comprehension is simpler.
Profile with timeit on realistic N before rewriting working code.
Preallocate list size only when measurable (often unnecessary).
Keep human-readable f-strings unless profiling proves otherwise.
Separate text assembly from I/O—buffer writes to files in chunks.
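One way to follow the profiling advice is a small timeit harness like the one below. The function names and the value of N are assumptions; substitute a size and input shape that match your real workload before drawing conclusions.

```python
import timeit


def concat_plus(n):
    """Build the string with repeated +=."""
    s = ""
    for i in range(n):
        s += str(i)
    return s


def concat_join(n):
    """Build the same string with a single join."""
    return "".join([str(i) for i in range(n)])


N = 10_000   # hypothetical size; use a realistic one
RUNS = 10    # calls per timing sample

# Both variants must produce the same result before timing means anything.
assert concat_plus(N) == concat_join(N)

for fn in (concat_plus, concat_join):
    # Take the best of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(lambda: fn(N), number=RUNS, repeat=3))
    print(f"{fn.__name__}: {best / RUNS * 1e6:.1f} microseconds per call")
```

The numbers vary by machine and interpreter version, so treat them as a guide for your environment rather than a general ranking.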
Re-read the examples below with these ideas in mind; change variable names and inputs to match your own project.
The program below contrasts repeated += with a single join. Run the code, then change N to see how the timings shift.
# Example: Bad vs good concat
# Run in the REPL or save as a .py file and execute with python.
import time

N = 50_000

# Slow pattern: repeated += may copy the growing string on each pass.
t0 = time.perf_counter()
s = ""
for i in range(N):
    s += str(i)
print("+= :", round(time.perf_counter() - t0, 3), "s")

# Preferred pattern: collect the pieces, then join once.
t0 = time.perf_counter()
parts = [str(i) for i in range(N)]
s = "".join(parts)
print("join:", round(time.perf_counter() - t0, 3), "s")
This sample walks through StringIO in a small, runnable script. Paste it into the REPL or save it as a .py file before you continue to the next block.
# Example: StringIO
# Run in the REPL or save as a .py file and execute with python.
from io import StringIO
buf = StringIO()
for i in range(1000):
    buf.write(f"line {i}\n")
text = buf.getvalue()
print(len(text), "chars")
Here is a hands-on illustration of decoding once at the boundary. Follow the inline comments first; only then execute the snippet and compare the result with what you expected.
# Example: Avoid re-encoding
# Run in the REPL or save as a .py file and execute with python.
data = b"\xc3\xa9clair"
text = data.decode("utf-8") # decode ONCE
# downstream code keeps using `text`, not `data`
The program below demonstrates using join instead of +=. Read the comments on each line, run the code, then change names or values to see how the output shifts.
# Repeated += on strings allocates many temporaries
import time # timing
N = 5000 # loop size (smaller for demo speed)
t0 = time.perf_counter() # start
s = "" # empty
for i in range(N): # bad pattern
    s += str(i) # realloc each time
print("plus", round(time.perf_counter() - t0, 4)) # slow
t0 = time.perf_counter() # restart
parts = [str(i) for i in range(N)] # list of pieces
s2 = "".join(parts) # one allocation
print("join", round(time.perf_counter() - t0, 4), len(s2)) # faster
This sample builds text with io.StringIO in a small, runnable script. Paste it into the REPL or save it as a .py file before you continue to the next block.
# io.StringIO accumulates large text like a file buffer
from io import StringIO # in-memory text buffer
buf = StringIO() # empty buffer
for i in range(1000): # many lines
    buf.write(f"line {i}\n") # append without reallocating whole str
text = buf.getvalue() # materialize final string
print(len(text), text.splitlines()[0]) # size + first line
print(text.endswith("\n")) # last line ends with newline