Python NumPy Basics

Tutorial 46 of 65 · pythondeck.com Python course

NumPy provides the ndarray, a fast n-dimensional array, plus vectorized math, broadcasting, and linear algebra. Array operations are often one to two orders of magnitude faster than equivalent pure-Python loops, though the exact speedup depends on the operation and the array size.

NumPy is the foundation of Python’s numeric stack: contiguous arrays, vectorized operations, and broadcasting replace slow Python loops. Almost every data science and ML library builds on ndarray semantics.

Grasping dtypes, shapes, and views versus copies prevents subtle bugs when you scale from toy arrays to gigabyte matrices.

ndarray — homogeneous n-dimensional data; shape, strides, and dtype define memory layout.

Vectorization — ufuncs apply element-wise work in C; prefer arr * 2 over Python loops.

Broadcasting — align trailing dimensions for arithmetic between different shapes without explicit tiling.

Indexing — slices often return views; fancy indexing returns copies—know which you mutate.

Linear algebra — np.dot, @, linalg.solve for small systems; use specialized libs for huge sparse problems.

Random and statistics — np.random.Generator (modern API) for reproducible draws.
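As a rough sketch of the concepts listed above, the following shows how shape, strides, and dtype describe an array, how a basic slice shares memory with its parent, and how np.linalg.solve and a seeded Generator are typically used. The variable names and the 2x2 system are illustrative assumptions, not part of the original lesson.

```python
import numpy as np

# shape, strides, and dtype together describe the memory layout.
grid = np.arange(12, dtype=np.int64).reshape(3, 4)
layout = (grid.shape, grid.strides, grid.dtype)   # strides are counted in bytes

# A basic slice is a view: writing through it changes the parent array.
first_row = grid[0, :]
first_row[0] = 99

# Solve the small dense system A @ x = b (values chosen for illustration).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Two Generators built from the same seed produce identical draws.
draws_a = np.random.default_rng(42).normal(size=3)
draws_b = np.random.default_rng(42).normal(size=3)

print(layout)
print(grid[0])
print(x)
print(np.array_equal(draws_a, draws_b))
```

With int64 elements, each step along a row moves 8 bytes and each step down a column moves 32 bytes, which is what the strides tuple reports.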

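Performance comes from locality: operations that touch contiguous memory win. Reshaping with reshape(-1, n) or ravel is cheap when it can return a view instead of a copy; calling np.append repeatedly in a loop copies the whole array on every call. For missing data, np.nan propagates through ufuncs and plain reductions, so filter with np.isnan masks, use NaN-aware functions such as np.nanmean, or hand the data to pandas.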
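The points in this paragraph can be sketched as follows. The array sizes and names are arbitrary choices for illustration; the snippet only aims to show a reshape that returns a view, a preallocated buffer filled in place, and NaN handling with a mask.

```python
import numpy as np

# reshape(-1, n) lets NumPy infer the other dimension; on a contiguous
# array it returns a view, so no data is copied.
flat = np.arange(12)
grid = flat.reshape(-1, 4)
is_view = np.shares_memory(flat, grid)

# Preallocate once and fill in place instead of growing with np.append.
n = 5
squares = np.empty(n, dtype=np.int64)
for i in range(n):
    squares[i] = i * i

# NaN propagates through ordinary reductions.
readings = np.array([1.0, np.nan, 3.0])
plain_mean = readings.mean()              # nan
valid = ~np.isnan(readings)               # boolean mask of usable values
masked_mean = readings[valid].mean()      # mean of the non-NaN entries
nan_aware_mean = np.nanmean(readings)     # same result without an explicit mask

print(grid.shape, is_view)
print(squares)
print(plain_mean, masked_mean, nan_aware_mean)
```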

Interoperability matters: NumPy arrays exchange memory with C, Rust, and CUDA stacks when layouts match. Document expected dtype (float64 vs float32) in public APIs to avoid silent precision loss.
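One way to make these points concrete is the sketch below. The normalize function is a hypothetical helper invented for this example; it fixes its dtype at the API boundary. The rest shows where float32 drops precision and how a non-contiguous view can be packed before it is passed to code that expects row-major memory.

```python
import numpy as np

def normalize(values):
    """Hypothetical helper: scale values to [0, 1]; documented to return float64."""
    arr = np.asarray(values, dtype=np.float64)   # explicit dtype at the API boundary
    low, high = arr.min(), arr.max()
    return (arr - low) / (high - low)            # assumes high != low

# float32 has a 24-bit significand, so 2**24 + 1 is not representable.
big = 2**24 + 1
lost_precision = int(np.float32(big)) != big     # True: the value was rounded
kept_precision = int(np.float64(big)) == big     # True: float64 holds it exactly

# A transpose is a strided view and is not C-contiguous; copy it into
# row-major order before handing the buffer to code that requires that layout.
m = np.arange(6, dtype=np.float64).reshape(2, 3)
transposed = m.T
packed = np.ascontiguousarray(transposed)

print(normalize([2, 4, 6]))
print(lost_precision, kept_precision)
print(transposed.flags["C_CONTIGUOUS"], packed.flags["C_CONTIGUOUS"])
```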

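Common mistakes to avoid:

Using Python lists of lists for large numeric grids instead of preallocated arrays.

Assuming every indexing result is a view: advanced (fancy or boolean) indexing returns a copy, so writes to that result never reach the original array.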

Comparing floats with == without tolerance after chained operations.

Ignoring dtype overflow (e.g., uint8 addition wrapping).
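The pitfalls above are easy to reproduce. The following is a small sketch with made-up values, showing a write that lands on a temporary copy, a float comparison that needs a tolerance, and uint8 addition wrapping around.

```python
import numpy as np

# Fancy indexing returns a copy, so this write never reaches `values`.
values = np.arange(5)
values[[0, 1]][0] = 100            # modifies a temporary copy only
untouched = values.copy()          # still 0..4 at this point
# Assigning through the index expression itself does update the array.
values[[0, 1]] = 100

# Compare floats with a tolerance rather than ==.
total = np.float64(0.1) + np.float64(0.2)
exact_match = bool(total == 0.3)             # False
close_match = bool(np.isclose(total, 0.3))   # True

# uint8 arithmetic wraps modulo 256; widen the dtype when sums can exceed 255.
pixels = np.array([200, 250], dtype=np.uint8)
wrapped = pixels + pixels                     # stays uint8 and wraps
widened = pixels.astype(np.uint16) + pixels   # uint16 holds the true sums

print(untouched, values)
print(exact_match, close_match)
print(wrapped, widened)
```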

Practical tips:

Preallocate output buffers with out= in hot loops when profiling shows allocation cost.

Fix random seeds in tests; use Generator per process in parallel jobs.

Name axes in higher dimensions mentally (rows, cols, channels) before broadcasting.

Fall back to pandas only when you need labeled columns, not for every small table.
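A minimal sketch of three of these tips follows: reusing a buffer through out=, deriving one seeded Generator per worker, and naming axes before broadcasting. The seed, shapes, and variable names are assumptions chosen for illustration, and np.random.SeedSequence.spawn is one of several ways to create independent streams.

```python
import numpy as np

# Reuse one output buffer instead of allocating a new array per iteration.
a = np.arange(4, dtype=np.float64)
b = np.ones(4)
buf = np.empty_like(a)
for _ in range(3):
    np.multiply(a, b, out=buf)   # result is written into buf
    np.add(buf, 1.0, out=buf)    # in-place update of the same buffer

# One independent, reproducible Generator per parallel worker.
root = np.random.SeedSequence(1234)
rngs = [np.random.default_rng(seed) for seed in root.spawn(3)]
samples = [rng.integers(0, 100, size=2) for rng in rngs]

# Name the axes first: image is (rows, cols, channels), gain is (channels,).
image = np.ones((2, 3, 3))
gain = np.array([1.0, 0.5, 2.0])
scaled = image * gain            # gain broadcasts along the trailing channel axis

print(buf)
print(samples)
print(scaled.shape, scaled[0, 0])
```

In a real parallel job, each spawned seed would be passed to its worker process so that every worker builds its own Generator.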

Read the examples below with these ideas in mind; change variable names and inputs to match your own project.

The program below demonstrates array creation. Read the comments on each line, run the code, then change names or values to see how the output shifts.

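# Example: Array creation
# Run in the REPL or save as a .py file and execute with python.
import numpy as np
a = np.array([1, 2, 3, 4])  # 1-D integer array built from a list
b = np.zeros((2, 3))  # 2x3 array of float64 zeros
c = np.arange(0, 1, 0.1)  # float step; np.linspace is safer when the exact length matters
print(a, a.dtype)  # values plus the default integer dtype (typically int64)
print(b)  # two rows of three zeros
print(c)  # 0.0 up to, but not including, 1.0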

This sample walks through vectorised math in a small, runnable script. Paste it into the REPL or save it as a .py file before you continue to the next block.

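# Example: Vectorised math
# Run in the REPL or save as a .py file and execute with python.
import numpy as np
x = np.linspace(0, 2*np.pi, 5)  # 5 evenly spaced points from 0 to 2*pi, endpoints included
print(np.sin(x))  # ufunc applied element-wise; zeros may print as tiny floats
print((x + 1) ** 2)  # scalar add, then element-wise square, with no Python loop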

Here is a hands-on illustration of broadcasting. Follow the inline comments first; only then execute the snippet and compare the result with what you expected.

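# Example: Broadcasting
# Run in the REPL or save as a .py file and execute with python.
import numpy as np
m = np.arange(12).reshape(3, 4)  # shape (3, 4)
row_mean = m.mean(axis=1, keepdims=True)  # shape (3, 1): one mean per row
print(m - row_mean)  # (3, 4) minus (3, 1) broadcasts across the columns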

The next program revisits array creation with a comment on every line and adds shape, dtype, and reshape. Run the code, then change names or values to see how the output shifts.

# NumPy ndarray stores homogeneous numeric data efficiently
import numpy as np  # ndarray library
a = np.array([1, 2, 3])  # 1-D from sequence
b = np.zeros((2, 3))  # 2x3 float zeros
print(a.shape, b.shape)  # dimensions
print(a.dtype, b.dtype)  # data types
print(a + 10)  # vectorized scalar add
print(b.mean())  # aggregate on array
c = np.arange(6).reshape(2, 3)  # reshape 1-D -> 2-D
print(c)  # [[0 1 2],[3 4 5]]

This sample walks through boolean indexing in a small, runnable script. Paste it into the REPL or save it as a .py file before you continue to the next block.

# Boolean masks select elements without explicit loops
import numpy as np  # numpy
data = np.array([3, -1, 0, 7, -4])  # signed ints
mask = data > 0  # True where positive
print(data[mask])  # filtered copy (boolean indexing never returns a view)
data[mask] *= 2  # multiply positives in-place
print(data)  # mutated array
idx = np.where(data < 0)  # indices of negatives
print(idx, data[idx])  # tuple of index array
print(np.clip(data, 0, 10))  # bound values
