
Generative Models

Learning to sample from data distributions — VAEs, GANs, and diffusion.

Variational Autoencoders

An encoder maps inputs to a distribution q(z|x) over a latent space; a decoder reconstructs x from z. Trained with the ELBO:

ELBO = E[log p(x|z)] − KL(q(z|x) ‖ p(z))

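The first term rewards accurate reconstruction; the KL term pulls q(z|x) toward the prior p(z), which keeps the latent space well-structured. Sampling is cheap (draw z from p(z) and run the decoder once), but samples tend to be blurrier than those from GANs or diffusion models.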
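
To make the two ELBO terms concrete, here is a minimal sketch in plain Python. It assumes choices the text above does not specify: a diagonal-Gaussian q(z|x), a standard-normal prior p(z), and a unit-variance Gaussian decoder. The function names and the identity "decoder" in the usage lines are hypothetical and for illustration only.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def gaussian_log_likelihood(x, x_hat):
    """log p(x|z) for a unit-variance Gaussian decoder with mean x_hat."""
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return -0.5 * sq_err - 0.5 * len(x) * math.log(2.0 * math.pi)


def elbo(x, mu, log_var, decode, rng):
    """Single-sample Monte Carlo estimate of the ELBO for one input x."""
    z = reparameterize(mu, log_var, rng)
    return gaussian_log_likelihood(x, decode(z)) - gaussian_kl(mu, log_var)


# Toy usage: mu and log_var would normally come from an encoder network;
# the identity "decoder" is a hypothetical stand-in.
rng = random.Random(0)
x = [0.5, -1.0]
estimate = elbo(x, mu=[0.4, -0.9], log_var=[-2.0, -2.0],
                decode=lambda z: z, rng=rng)
```

Writing the sample as mu + sigma * eps (the reparameterization trick) keeps the randomness in eps, so in a real implementation gradients can flow through mu and log_var. The KL term is zero exactly when q(z|x) equals the prior.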

Generative Adversarial Networks

Two networks play a minimax game:

  • A generator G maps noise to fake samples.
  • A discriminator D tries to distinguish real from fake.

min_G max_D  E[log D(x)] + E[log(1 − D(G(z)))]

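When training works, samples are sharp and realistic. Training is notoriously unstable, though: the two networks must stay balanced, and the generator can collapse onto a few modes of the data.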
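
The minimax objective can be evaluated directly from discriminator outputs. The sketch below assumes D returns probabilities in (0, 1) and estimates both expectations by batch averages; the function names are illustrative, not taken from any library.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: D's outputs on real samples, each in (0, 1).
    d_fake: D's outputs on generated samples G(z), each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real, d_fake):
    """D maximizes the value, so it minimizes the negative."""
    return -gan_value(d_real, d_fake)


def generator_loss(d_fake):
    """G minimizes the only term that depends on it: E[log(1 - D(G(z)))]."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# A confident, correct discriminator pushes the value toward 0 ...
confident = gan_value(d_real=[0.99, 0.98], d_fake=[0.01, 0.02])
# ... while one that cannot tell real from fake sits at -log 4.
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The value -log 4 is what the objective evaluates to when D outputs 0.5 everywhere, which is the equilibrium the generator is pushing toward.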

Diffusion Models

Define a forward process that gradually adds Gaussian noise to data over T steps. Train a network to reverse it — predict the noise that was added at each step:

L = E[ ‖ ε − ε_θ(xₜ, t) ‖² ]
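
As a rough illustration of how one training example for this loss could be built, the sketch below assumes a DDPM-style linear variance schedule and the closed-form jump xₜ = √ᾱₜ·x₀ + √(1 − ᾱₜ)·ε. Neither the schedule nor its endpoints (1e-4, 0.02) come from the text above; they are common defaults used here as assumptions, and the all-zeros prediction is a stand-in for an untrained ε_θ.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (assumed linear schedule)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s), the signal variance kept at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Jump straight to step t: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * e for v, e in zip(x0, eps)]
    return xt, eps


def noise_prediction_loss(eps, eps_pred):
    """Squared error || eps - eps_pred ||^2 for one sample."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred))


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(T=1000))
t = rng.randrange(len(alpha_bars))  # uniform random timestep (0-indexed)
xt, eps = q_sample([1.0, -2.0, 0.5], t, alpha_bars, rng)
loss = noise_prediction_loss(eps, eps_pred=[0.0, 0.0, 0.0])  # untrained stand-in
```

Because xₜ can be sampled in one shot for any t, training does not need to simulate the forward chain step by step: each example picks a random t, noises x₀ once, and regresses on the ε that was used.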

Sample by starting from pure noise and iteratively denoising. This approach powers Stable Diffusion, DALL·E 2 and 3, Sora, and most modern image/video generators.
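
The iterative denoising loop can be sketched for scalar data as follows. This assumes DDPM-style ancestral sampling with step noise variance βₜ, which is one common choice rather than something the text above prescribes. The `oracle_eps` function is a hypothetical stand-in for a trained ε_θ: it pretends every data point equals `TARGET`, so the true noise can be solved for exactly. A real network would take only (xₜ, t); passing ᾱₜ is a convenience for the oracle.

```python
import math
import random


def ddpm_sample(eps_model, betas, rng):
    """Ancestral sampling for scalar data: start at x_T ~ N(0, 1), denoise to x_0."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(len(betas))):
        beta, ab = betas[t], alpha_bars[t]
        eps_hat = eps_model(x, t, ab)
        # Posterior mean: remove the predicted noise, then undo the step's scaling.
        mean = (x - beta / math.sqrt(1.0 - ab) * eps_hat) / math.sqrt(1.0 - beta)
        # Re-inject noise on every step except the last one.
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(beta) * noise
    return x


# Hypothetical stand-in for a trained network: if every data point equals
# TARGET, the noise contained in x_t can be solved for exactly.
TARGET = 3.0


def oracle_eps(x, t, alpha_bar):
    return (x - math.sqrt(alpha_bar) * TARGET) / math.sqrt(1.0 - alpha_bar)


T = 200
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
sample = ddpm_sample(oracle_eps, betas, random.Random(0))
```

With this oracle the final, noise-free step recovers `TARGET` up to floating-point error. With a learned ε_θ, the same loop would instead produce approximate draws from the data distribution.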

Diffusion became the dominant approach because it is stable to train, scales gracefully, and combines easily with conditioning signals (text, images, depth).