Neural Networks

From a single perceptron to deep multilayer networks — the building blocks of modern deep learning.

The Perceptron

A perceptron is the simplest neural unit. Given an input vector x, a weight vector w, and a bias b, it computes:

y = φ(w · x + b)

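where φ is a non-linear activation function. The original perceptron used a step function; modern networks typically use ReLU, sigmoid, or tanh. Stack many of these units in layers and you get a network that can approximate a very broad class of functions.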
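As a minimal sketch of the formula above, here is a single perceptron in plain Python, assuming a step function for φ. The weights w = (1, 1) and bias b = -1.5 are hand-picked for illustration (not learned) so that the unit computes logical AND.

```python
def step(z):
    # Heaviside step activation, the classic perceptron's φ.
    return 1 if z >= 0 else 0

def perceptron(x, w, b, phi=step):
    # y = φ(w · x + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return phi(z)

# Hypothetical hand-picked weights that make the unit compute logical AND.
w, b = [1.0, 1.0], -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, b))
```

Only the input (1, 1) pushes w · x + b above zero. Swapping `phi` for a smooth function such as a sigmoid gives the differentiable kind of unit that gradient-based training needs.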

Multilayer Perceptrons (MLPs)

An MLP chains layers of perceptrons. Each layer transforms its input into a new representation:

h₁ = φ(W₁ x + b₁)
h₂ = φ(W₂ h₁ + b₂)
ŷ  = W₃ h₂ + b₃
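The layer equations above translate almost directly into code. The sketch below is plain Python, assuming ReLU for φ; `dense` and `mlp_forward` are hypothetical helper names. Every hidden layer applies h = φ(W h + b), and the final layer is a plain affine map with no activation, matching ŷ = W₃ h₂ + b₃. To keep the example small it uses one hidden layer with hand-set weights (not learned) that compute XOR, a function no single perceptron can represent because its classes are not linearly separable.

```python
def relu(z):
    return max(0.0, z)

def dense(W, b, x):
    # Affine map W x + b, with W given as a list of rows.
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp_forward(layers, x):
    """Apply h = φ(W h + b) for each hidden layer, then an affine output layer with no φ."""
    h = x
    for W, b in layers[:-1]:
        h = [relu(z) for z in dense(W, b, h)]
    W_out, b_out = layers[-1]
    return dense(W_out, b_out, h)

# Hypothetical hand-set weights for XOR.
xor_net = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # hidden: relu(x1 + x2), relu(x1 + x2 - 1)
    ([[1.0, -2.0]], [0.0]),                    # output: y = h1 - 2 * h2
]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mlp_forward(xor_net, x))
```

Passing three (W, b) pairs instead of two gives exactly the two-hidden-layer network written out above.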

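Depth lets each layer build features out of the previous layer's features, giving a hierarchy; width (the number of units per layer) sets how much each layer can represent.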

Universal Approximation

A feedforward network with a single hidden layer and a non-polynomial activation can approximate any continuous function on a compact subset of ℝⁿ to arbitrary accuracy — given enough neurons.
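The theorem itself is an existence result, but the one-dimensional case is easy to see directly. The sketch below assumes ReLU as the non-polynomial activation. With n hidden ReLU units, a single-hidden-layer network can reproduce exactly the piecewise-linear interpolant of a continuous f at n + 1 evenly spaced knots. For a smooth target like sin, the worst-case error then shrinks roughly as 1/n². This illustrates the statement; it is not the theorem's proof. `fit_relu_interpolant` and `max_error` are hypothetical helper names.

```python
import math

def relu(z):
    return max(0.0, z)

def fit_relu_interpolant(f, a, b, n_hidden):
    """Build a one-hidden-layer ReLU network that linearly interpolates f
    at n_hidden + 1 evenly spaced knots on [a, b]."""
    knots = [a + (b - a) * i / n_hidden for i in range(n_hidden + 1)]
    values = [f(t) for t in knots]
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(n_hidden)]
    # Hidden unit i computes relu(1 * x - knots[i]), so it switches on at knot i.
    hidden_bias = [-t for t in knots[:-1]]
    # Output weight i is the change in slope at knot i; the output bias is f(a).
    out_weight = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    out_bias = values[0]

    def net(x):
        return out_bias + sum(c * relu(x + hb) for c, hb in zip(out_weight, hidden_bias))
    return net

def max_error(f, g, a, b, samples=1000):
    # Worst-case |f - g| over an evenly spaced grid on [a, b].
    grid = (a + (b - a) * k / samples for k in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in grid)

for n in (2, 8, 32, 128):
    net = fit_relu_interpolant(math.sin, 0.0, math.pi, n)
    print(n, max_error(math.sin, net, 0.0, math.pi))
```

Adding hidden units only adds knots, which is the sense in which "given enough neurons" the error can be driven as low as you like on a bounded interval.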

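Depth also buys efficiency: for some functions, a deep network needs only a modest number of units, while a network with a single hidden layer needs exponentially many to match it. This is one of the standard explanations for why "deep" learning works so well.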
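A standard depth-separation construction makes this concrete. The tent map t(x) = 2·ReLU(x) - 4·ReLU(x - 0.5) uses two ReLU units. Composing it k times gives a network with k hidden layers and 2k units whose output on [0, 1] has 2^k linear pieces. A single-hidden-layer ReLU network on a scalar input with m units has at most m + 1 linear pieces, so it needs at least 2^k - 1 units to match. The plain-Python sketch below counts the pieces numerically. `tent`, `deep_sawtooth`, and `count_linear_pieces` are hypothetical helper names.

```python
def relu(z):
    return max(0.0, z)

def tent(x):
    # Two ReLU units: rises on [0, 0.5], falls on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, depth):
    # Composing the tent map `depth` times is a network with `depth` hidden layers of 2 units.
    for _ in range(depth):
        x = tent(x)
    return x

def count_linear_pieces(f, resolution):
    """Count linear pieces of a piecewise-linear f on [0, 1], assuming every
    breakpoint falls on the sampling grid i / resolution."""
    xs = [i / resolution for i in range(resolution + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * resolution for i in range(resolution)]
    return 1 + sum(1 for s1, s2 in zip(slopes, slopes[1:]) if s1 != s2)

for depth in range(1, 7):
    pieces = count_linear_pieces(lambda x: deep_sawtooth(x, depth), 4 * 2 ** depth)
    print(f"depth={depth}: {2 * depth} ReLU units, {pieces} linear pieces")
```

The grid uses power-of-two spacing so the breakpoints (multiples of 1/2^k) land exactly on sample points and the floating-point slopes compare exactly. The piece count doubles with each added layer, while the unit count grows by only two.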

A Minimal Example

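import torch.nn as nn

# Input: a batch of flattened 28×28 MNIST images, shape (N, 784).
# Output: 10 raw logits per image (no softmax here; nn.CrossEntropyLoss
# expects logits and applies log-softmax internally).
model = nn.Sequential(
    nn.Linear(784, 256),  # hidden layer 1: 784*256 + 256 = 200,960 params
    nn.ReLU(),
    nn.Linear(256, 128),  # hidden layer 2: 256*128 + 128 = 32,896 params
    nn.ReLU(),
    nn.Linear(128, 10),   # output layer: 128*10 + 10 = 1,290 params
)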

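That's a 3-layer MLP for MNIST (two hidden layers and an output layer) with about 235k parameters; with standard training, a network like this typically reaches around 98% test accuracy.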
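The parameter figure is straightforward arithmetic. Each `nn.Linear(n_in, n_out)` holds an n_out × n_in weight matrix plus n_out biases, and `nn.ReLU` has no parameters. A minimal plain-Python check, where `count_mlp_params` is a hypothetical helper:

```python
# Layer widths of the nn.Sequential model above: input, two hidden, output.
layer_sizes = [784, 256, 128, 10]

def count_mlp_params(sizes):
    # Weights (n_in * n_out) plus biases (n_out) for each consecutive pair of widths.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(count_mlp_params(layer_sizes))  # 235146
```

With PyTorch installed, `sum(p.numel() for p in model.parameters())` on the model above should report the same total. Almost 85% of it sits in the first layer, which is why the input width dominates the size of small MLPs like this one.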