Neural Networks
From a single perceptron to deep multilayer networks — the building blocks of modern deep learning.
The Perceptron
A perceptron is the simplest neural unit. Given an input vector x, weights w, and a bias b, it computes:
y = φ(w · x + b)
where φ is a non-linear activation function. Stack many of these and you get a network capable of approximating arbitrarily complex functions.
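As a rough illustration of the formula, here is a minimal sketch of a single unit in plain Python. The step activation and the hand-picked weights (`w_and`, `b_and`) are assumptions chosen for the example, not something the text prescribes; they make the unit behave like a logical AND.

```python
def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(x, w, b, phi=step):
    """Compute y = phi(w . x + b) for a single unit."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return phi(z)


# Hand-picked (not learned) parameters: the weighted sum only
# reaches zero or more when both inputs are 1.
w_and = [1.0, 1.0]
b_and = -1.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, w_and, b_and))
```

Swapping `phi` for another non-linearity (for instance `math.tanh`) changes the output range but not the structure of the computation.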
Multilayer Perceptrons (MLPs)
An MLP chains layers of perceptrons. Each layer transforms its input into a new representation:
h₁ = φ(W₁ x + b₁)
h₂ = φ(W₂ h₁ + b₂)
ŷ = W₃ h₂ + b₃
The depth gives hierarchical features; the width gives capacity.
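The three equations translate almost line for line into code. The sketch below uses plain Python lists so it runs without dependencies; ReLU as φ and the toy parameter values in `params` are assumptions made only for illustration.

```python
def relu(v):
    """Element-wise ReLU, used here as the activation phi."""
    return [max(0.0, vi) for vi in v]


def affine(W, x, b):
    """Return W x + b, with W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]


def mlp_forward(x, params):
    """Two hidden layers with activation, then a linear output layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(affine(W1, x, b1))   # h1 = phi(W1 x + b1)
    h2 = relu(affine(W2, h1, b2))  # h2 = phi(W2 h1 + b2)
    return affine(W3, h2, b3)      # y_hat = W3 h2 + b3 (no activation)


# Toy, hand-written parameters: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
params = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[1.0, 1.0], [-1.0, 2.0]], [0.0, 0.5]),
    ([[2.0, -1.0]], [0.1]),
]

y_hat = mlp_forward([1.0, 2.0], params)
print(y_hat)
```

For this input the first unit of the first layer has a negative pre-activation, so ReLU zeroes it out before the second layer sees it. That clipping is what keeps the stack from collapsing into a single linear map.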
Universal Approximation
A feedforward network with a single hidden layer and a non-polynomial activation can approximate any continuous function on a compact subset of ℝⁿ to arbitrary accuracy — given enough neurons.
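In practice, depth is often far more efficient than width: for some families of functions, a shallow network needs exponentially more neurons than a deep one to reach the same accuracy. That efficiency is one reason "deep" learning works.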
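To make "given enough neurons" concrete, the sketch below hand-constructs a single-hidden-layer ReLU network that interpolates a one-dimensional target on [0, 1], then measures how the worst-case error shrinks as hidden units are added. It is an illustration under narrow assumptions (one input, ReLU activation, weights set by construction rather than training), not a proof of the theorem. The helper names are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def build_relu_interpolant(f, n):
    """Build a one-hidden-layer ReLU network with n hidden units that
    matches f at n + 1 evenly spaced knots on [0, 1].

    The network is g(x) = offset + sum_k coeffs[k] * relu(x - knots[k]).
    """
    knots = [k / n for k in range(n)]            # unit k switches on at knots[k]
    values = [f(k / n) for k in range(n + 1)]
    slopes = [(values[k + 1] - values[k]) * n for k in range(n)]
    # Each output weight is the change in slope at that knot.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    return values[0], knots, coeffs


def evaluate(net, x):
    offset, knots, coeffs = net
    return offset + sum(c * relu(x - t) for c, t in zip(coeffs, knots))


def max_error(f, n, samples=1001):
    """Largest absolute error over an evenly spaced grid on [0, 1]."""
    net = build_relu_interpolant(f, n)
    xs = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - evaluate(net, x)) for x in xs)


def target(x):
    return x * x


for n in (2, 4, 8, 16):
    print(n, max_error(target, n))
```

Each doubling of the hidden width cuts the error for this target by roughly a factor of four, which is the one-dimensional flavor of the theorem: more units, tighter fit.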
A Minimal Example
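import torch.nn as nn

# 784 inputs (a flattened 28x28 image) -> 256 -> 128 -> 10 class logits
model = nn.Sequential(
    nn.Linear(784, 256),  # first hidden layer
    nn.ReLU(),
    nn.Linear(256, 128),  # second hidden layer
    nn.ReLU(),
    nn.Linear(128, 10),   # output layer: one logit per digit
)
That's a 3-layer MLP for MNIST: about 235k parameters, and typically good for ~98% test accuracy once trained.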
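The parameter figure can be checked by hand: each linear layer holds one weight per input-output pair plus one bias per output. A small dependency-free sketch of that arithmetic, where `layer_sizes` and `count_parameters` are names introduced only for this example:

```python
# Layer widths of the model above: input, two hidden layers, output.
layer_sizes = [784, 256, 128, 10]


def count_parameters(sizes):
    """Sum weights (n_in * n_out) and biases (n_out) over consecutive layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


total = count_parameters(layer_sizes)
print(total)  # 235146
```

The breakdown is 200,960 + 32,896 + 1,290 = 235,146, so nearly all of the parameters sit in the first layer. With PyTorch installed, `sum(p.numel() for p in model.parameters())` should report the same total.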