Advanced·8 min read
Generative Models
Learning to sample from data distributions — VAEs, GANs, and diffusion.
Variational Autoencoders
An encoder maps inputs to a distribution q(z|x) over a latent space; a decoder reconstructs x from z. Trained with the ELBO:
ELBO = E[log p(x|z)] − KL(q(z|x) ‖ p(z))Reconstruction term + a KL regularizer that keeps the latent space well-structured. Sampling is cheap; quality is modest.
Generative Adversarial Networks
Two networks play a minimax game:
- A generator
Gmaps noise to fake samples. - A discriminator
Dtries to distinguish real from fake.
min_G max_D E[log D(x)] + E[log(1 − D(G(z)))]Beautiful samples when training works. Notoriously hard to train.
Diffusion Models
Define a forward process that gradually adds Gaussian noise to data over T steps. Train a network to reverse it — predict the noise that was added at each step:
L = E[ ‖ ε − ε_θ(xₜ, t) ‖² ]Sample by starting from noise and iteratively denoising. Powers Stable Diffusion, DALL·E, Sora, and most modern image/video generators.
Diffusion won because it's stable to train, scales gracefully, and unifies easily with conditioning (text, images, depth).