Project · 60–90 min build
Image Classifier with Transfer Learning
Fine-tune a pretrained ResNet-50 on a custom dataset, add proper augmentation and evaluation, and ship it behind a small inference API.
What you'll build
An end-to-end image classification pipeline: dataset loading with augmentation, transfer learning from a pretrained backbone, mixed-precision training, evaluation with a confusion matrix, and a minimal FastAPI inference endpoint.
Prerequisites
Make sure you're comfortable with these before starting:
- CNNs: convolutions, pooling, receptive fields, why ResNet uses skip connections
- Cross-entropy loss and softmax, and why we usually use logits + CrossEntropyLoss instead of an explicit log-softmax + NLLLoss
- Backpropagation and the role of .requires_grad / .zero_grad()
- Transfer learning intuition: what early vs late conv layers learn
- Regularization: weight decay, label smoothing, data augmentation
- LR schedules: warmup + cosine decay, discriminative learning rates
- PyTorch basics: Dataset, DataLoader, nn.Module, training loop, autocast + GradScaler
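To make the logits point concrete, here is a minimal pure-Python sketch comparing a naive "softmax, then negative log" loss with the log-sum-exp form computed directly from logits. The function names are hypothetical and chosen only for illustration. nn.CrossEntropyLoss takes raw logits and applies log-softmax internally, which is the same idea as the second function.

```python
import math


def naive_cross_entropy(logits, target):
    """Softmax first, then negative log. Breaks for extreme logits."""
    exps = [math.exp(z) for z in logits]  # math.exp overflows for large z
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[target])


def cross_entropy_from_logits(logits, target):
    """Log-sum-exp form: subtract the max logit so every exponent is <= 0."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# On moderate logits the two forms agree.
small = [2.0, 1.0, 0.1]
agree = abs(naive_cross_entropy(small, 0) - cross_entropy_from_logits(small, 0)) < 1e-12

# On extreme logits only the log-sum-exp form survives.
large = [1000.0, 0.0]
try:
    naive_cross_entropy(large, 0)
    naive_failed = False
except OverflowError:
    naive_failed = True
stable_loss = cross_entropy_from_logits(large, 0)
```

With these inputs the naive version raises OverflowError on the large logits, while the log-sum-exp version returns a finite loss.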
Warm-up exercises
- Train a 3-layer CNN on CIFAR-10 from scratch and reach >70% test accuracy.
- Take a pretrained ResNet-50, freeze it, and extract pooled features (the input to the final FC layer) for 100 images; verify the output shape is (100, 2048).
- Write a 20-line training loop with mixed precision and gradient accumulation, and confirm the loss decreases on a toy dataset.
- Build a confusion matrix by hand from raw predictions (no sklearn) for a 3-class problem.
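For the last warm-up, one possible shape of a hand-built confusion matrix is sketched below, together with the per-class F1 that the evaluation milestone asks for. The row/column convention (rows are true classes, columns are predicted classes) is an assumption, and the labels are made-up toy data.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_f1(matrix):
    """F1 = 2*TP / (2*TP + FP + FN), computed per class from the matrix."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, but wrong
        fn = sum(matrix[c]) - tp                       # true c, but missed
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Toy 3-class example (illustrative labels, not real results).
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
# cm == [[1, 1, 0], [0, 2, 0], [1, 0, 2]]
f1 = per_class_f1(cm)
# f1 == [0.5, 0.8, 0.8]
```

The diagonal holds the correct predictions, so overall accuracy is the trace divided by the total count (5/7 here).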
Difficulty
Intermediate. Assumes you understand CNNs, cross-entropy, and PyTorch basics.
Stack
- PyTorch + torchvision (ResNet-50 weights)
- Albumentations for augmentation
- Weights & Biases for experiment tracking
- FastAPI for serving
Milestones
- Data pipeline. Build a Dataset with a stratified train/val/test split. Add augmentations: random crop, horizontal flip, color jitter, RandAugment.
- Model. Load resnet50(weights="IMAGENET1K_V2") and replace the final FC layer with nn.Linear(2048, num_classes).
- Two-phase training. Freeze the backbone and train the head for 3 epochs; then unfreeze and fine-tune with discriminative learning rates (1e-4 backbone, 1e-3 head).
- Regularization. Label smoothing 0.1, weight decay 1e-4, cosine LR schedule with warmup.
- Evaluation. Top-1/top-5 accuracy, per-class F1, confusion matrix, Grad-CAM visualizations for misclassified samples.
- Serve. Export to TorchScript and wrap it in FastAPI with a /predict endpoint that accepts an uploaded image.
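The warmup + cosine schedule and the discriminative learning rates from the milestones can be sketched as a single multiplier applied to each parameter group's base rate. This is a minimal pure-Python sketch: the function names, the step counts (100 warmup, 1000 total), and the linear warmup shape are assumptions for illustration, not values from the project.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps):
    """Multiplier in [0, 1] applied to every group's base learning rate."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Linear warmup: ramps from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Cosine decay from 1.0 down to 0.0 over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


# Base rates from the two-phase training milestone.
base_lrs = {"backbone": 1e-4, "head": 1e-3}


def lrs_at(step, warmup_steps=100, total_steps=1000):
    """Learning rate of each group at a given optimizer step."""
    factor = warmup_cosine_factor(step, warmup_steps, total_steps)
    return {name: lr * factor for name, lr in base_lrs.items()}
```

Because the function returns a multiplier rather than an absolute rate, the 10x ratio between head and backbone is preserved at every step. In PyTorch, a function of this shape can be passed as lr_lambda to torch.optim.lr_scheduler.LambdaLR, with the two base rates set per parameter group in the optimizer.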
Key code
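import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # mixed precision only when a GPU is available
num_classes = 10  # placeholder: set to the number of classes in your dataset

# Phase 1: freeze the pretrained backbone, then attach a new trainable head.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new layer requires grad
model.to(device)

opt = torch.optim.AdamW([
    {"params": model.fc.parameters(), "lr": 1e-3},
], weight_decay=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for x, y in loader:  # loader: the training DataLoader from the data pipeline milestone
    x, y = x.to(device), y.to(device)
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = nn.functional.cross_entropy(model(x), y, label_smoothing=0.1)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
    opt.zero_grad(set_to_none=True)
Stretch goals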
- Test-time augmentation (TTA) for +1–2% accuracy
- Knowledge distillation into a MobileNetV3 for edge deployment
- ONNX export and quantization to INT8
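As a starting point for the TTA stretch goal, the sketch below averages softmax probabilities over the original batch and a horizontally flipped copy. The helper name predict_with_tta, the choice of a single flip as the only extra view, and the tiny stand-in model are assumptions for illustration; in the project you would pass the fine-tuned ResNet-50 instead. Any accuracy gain depends on the dataset and is not guaranteed.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict_with_tta(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Average class probabilities over the original and flipped views."""
    model.eval()
    # For NCHW batches, dim 3 is width, so this is a horizontal flip.
    views = [images, torch.flip(images, dims=[3])]
    probs = [torch.softmax(model(v), dim=1) for v in views]
    return torch.stack(probs).mean(dim=0)


# Tiny stand-in classifier so the sketch runs without downloading weights.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))
batch = torch.randn(4, 3, 8, 8)
tta_probs = predict_with_tta(toy_model, batch)  # shape (4, 5), rows sum to 1
```

Averaging probabilities rather than logits keeps each row a valid distribution, and it makes the prediction invariant to whether the input arrives flipped.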