Project · 60–90 min build

Image Classifier with Transfer Learning

Fine-tune a pretrained ResNet-50 on a custom dataset, add proper augmentation and evaluation, and ship it behind a small inference API.

What you'll build

An end-to-end image classification pipeline: dataset loading with augmentation, transfer learning from a pretrained backbone, mixed-precision training, evaluation with a confusion matrix, and a minimal FastAPI inference endpoint.

Prerequisites

Make sure you're comfortable with these before starting:

  • CNNs — convolutions, pooling, receptive fields, why ResNet uses skip connections
  • Cross-entropy loss and softmax — and why we usually pass raw logits to CrossEntropyLoss instead of computing softmax probabilities first and feeding their log to NLLLoss
  • Backpropagation and the role of .requires_grad / .zero_grad()
  • Transfer learning intuition — what early vs late conv layers learn
  • Regularization — weight decay, label smoothing, data augmentation
  • LR schedules — warmup + cosine decay, discriminative learning rates
  • PyTorch basics — Dataset, DataLoader, nn.Module, training loop, autocast + GradScaler
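
The cross-entropy prerequisite is easier to appreciate with numbers. The sketch below is plain Python, not PyTorch's actual implementation: it compares a log-sum-exp formulation (the idea behind computing the loss directly from logits) with a naive softmax-then-log version. The function names and sample logits are illustrative assumptions.

```python
import math

def cross_entropy_from_logits(logits, target):
    """Stable cross-entropy for one sample: log-sum-exp minus the target logit."""
    m = max(logits)  # subtracting the max keeps every exponent <= 0
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def naive_cross_entropy(logits, target):
    """Softmax first, then log: exp() overflows once logits get large."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))

# On moderate logits the two formulations agree.
small = [2.0, 1.0, 0.1]
print(cross_entropy_from_logits(small, 0), naive_cross_entropy(small, 0))

# On extreme logits only the stable version survives.
large = [1000.0, 0.0, -1000.0]
print(cross_entropy_from_logits(large, 0))
try:
    naive_cross_entropy(large, 0)
except OverflowError as err:
    print("naive version failed:", err)
```

Logits this extreme are rare in a healthy training run, but under float16 autocast the safe range is much narrower, which is one more reason to hand the loss function raw logits.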

Warm-up exercises

  1. Train a 3-layer CNN on CIFAR-10 from scratch and reach >70% test accuracy.
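  2. Take a pretrained ResNet-50 (or a deeper bottleneck variant), freeze it, replace the final FC layer with an identity, and extract features for 100 images. Verify the output shape is (100, 2048); ResNet-18 and ResNet-34 produce 512 features instead.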
  3. Write a 20-line training loop with mixed precision and gradient accumulation, and confirm the loss decreases on a toy dataset.
  4. Build a confusion matrix by hand from raw predictions (no sklearn) for a 3-class problem.

Difficulty

Intermediate. Assumes you understand CNNs, cross-entropy, and PyTorch basics.

Stack

  • PyTorch + torchvision (ResNet-50 weights)
  • Albumentations for augmentation
  • Weights & Biases for experiment tracking
  • FastAPI for serving

Milestones

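  1. Data pipeline. Build a Dataset with a stratified train/val/test split. Apply augmentations to the training split only: random crop, horizontal flip, color jitter, RandAugment (available as torchvision.transforms.RandAugment). Validation and test images get only a deterministic resize and normalization.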
  2. Model. Load resnet50(weights="IMAGENET1K_V2"), replace the final FC layer with nn.Linear(2048, num_classes).
  3. Two-phase training. Freeze the backbone and train the head for 3 epochs; then unfreeze and fine-tune with discriminative learning rates (1e-4 backbone, 1e-3 head).
  4. Regularization. Label smoothing 0.1, weight decay 1e-4, cosine LR schedule with warmup.
  5. Evaluation. Top-1/top-5 accuracy (top-5 is only informative when the dataset has well over five classes), per-class F1, confusion matrix, Grad-CAM visualizations for misclassified samples.
  6. Serve. Export to TorchScript, wrap in FastAPI with a /predict endpoint that accepts an uploaded image.
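
As one way to approach the stratified split in milestone 1, here is a minimal standard-library sketch. The function name `stratified_split`, the 15% validation and test fractions, and the toy labels are assumptions for illustration; the project brief does not prescribe them.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Return (train_idx, val_idx, test_idx), splitting each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Rounding means very small classes can end up with few or no val/test samples.
        n_val = round(len(idxs) * val_frac)
        n_test = round(len(idxs) * test_frac)
        val.extend(idxs[:n_val])
        test.extend(idxs[n_val:n_val + n_test])
        train.extend(idxs[n_val + n_test:])
    return train, val, test

# Toy, imbalanced label list standing in for a real dataset.
labels = ["cat"] * 60 + ["dog"] * 30 + ["bird"] * 10
train_idx, val_idx, test_idx = stratified_split(labels)
for name, idxs in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    print(name, dict(Counter(labels[i] for i in idxs)))
```

The returned index lists can be passed to torch.utils.data.Subset. Because a Subset shares its parent dataset's transform, a common pattern is to create two dataset instances over the same files, one with training augmentations and one without, and index each with the matching split.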

Key code

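import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

num_classes = 10  # placeholder: set to the number of classes in your dataset
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

# Load the pretrained backbone and freeze every parameter.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False
# The new head is created after freezing, so its parameters stay trainable.
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.to(device)

# Phase 1: only the head is optimized.
opt = torch.optim.AdamW([
    {"params": model.fc.parameters(), "lr": 1e-3},
], weight_decay=1e-4)

# Recent PyTorch releases expose AMP under torch.amp; older ones use torch.cuda.amp.GradScaler().
scaler = torch.amp.GradScaler(device, enabled=use_amp)
model.train()  # BatchNorm running statistics still update, even with frozen weights
for x, y in loader:  # loader: the training DataLoader from milestone 1
    x, y = x.to(device), y.to(device)
    opt.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = nn.functional.cross_entropy(model(x), y, label_smoothing=0.1)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()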
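
The loop above trains the head at a fixed learning rate. For the fine-tuning phase, milestones 3 and 4 call for discriminative learning rates under a warmup + cosine schedule. The sketch below shows one possible form of that schedule in plain Python. The name `warmup_cosine_factor` and the step counts (100 warmup, 1000 total) are illustrative assumptions; the base rates come from milestone 3.

```python
import math

def warmup_cosine_factor(step, warmup_steps, total_steps):
    """LR multiplier in [0, 1]: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

# Base rates from milestone 3; the step counts are assumptions for this demo.
base_lrs = {"backbone": 1e-4, "head": 1e-3}
warmup_steps, total_steps = 100, 1000

for step in (0, 50, 99, 100, 550, 999):
    factor = warmup_cosine_factor(step, warmup_steps, total_steps)
    lrs = "  ".join(f"{name}={lr * factor:.2e}" for name, lr in base_lrs.items())
    print(f"step {step:4d}  factor {factor:.3f}  {lrs}")
```

In PyTorch, a function like this can be passed as `lr_lambda` to torch.optim.lr_scheduler.LambdaLR, which multiplies each parameter group's initial learning rate by the returned factor. One schedule therefore preserves the 10x gap between backbone and head. Under that setup, the phase 2 optimizer would hold two parameter groups (backbone at 1e-4, head at 1e-3) after resetting requires_grad to True on the backbone, and the scheduler would be stepped once per optimizer step.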

Stretch goals

  • Test-time augmentation (TTA), which often adds a point or two of accuracy at the cost of extra forward passes per image
  • Knowledge distillation into a MobileNetV3 for edge deployment
  • ONNX export and quantization to INT8
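
To make the TTA stretch goal concrete, the sketch below averages class probabilities over an image and its horizontal mirror. It is deliberately framework-free: `toy_predict_proba` is a hypothetical stand-in for a trained model, and images are nested lists rather than tensors, so only the averaging logic carries over to the real pipeline.

```python
def hflip(image):
    """Mirror an image stored as a list of pixel rows (H x W)."""
    return [row[::-1] for row in image]

def identity(image):
    return image

def tta_predict(predict_proba, image, transforms):
    """Average class probabilities over several augmented views of one image."""
    probs = [predict_proba(t(image)) for t in transforms]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

def toy_predict_proba(image):
    """Hypothetical two-class model that over-weights the left column.

    Returns [P(dark), P(bright)] for pixel values in [0, 1] and 3-pixel rows.
    """
    weights = [0.7, 0.2, 0.1]
    row_scores = [sum(w * v for w, v in zip(weights, row)) for row in image]
    p_bright = sum(row_scores) / len(row_scores)
    return [1.0 - p_bright, p_bright]

image = [[0.9, 0.5, 0.1],
         [0.8, 0.6, 0.2]]
single = toy_predict_proba(image)
averaged = tta_predict(toy_predict_proba, image, [identity, hflip])
print("single view:", single)
print("TTA average:", averaged)
```

With a batched NCHW tensor, the mirrored view would come from torch.flip(x, dims=[3]), and the average would be taken over softmax outputs with the model in eval mode. Only use views that preserve the label: horizontal flips are usually safe for natural images but not for text or digits.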