ViT ablation lab

Skip attention or MLP sub-blocks of WinKawaks/vit-small-patch16-224 and watch the prediction shift.
Ablation control
Click any cell to skip that sub-block: its output is not added, so the residual stream passes through unchanged.
All blocks active
Grid: one column per block (L0 to L11), with one row for Attn and one row for MLP.
Legend: active | skipped (identity passthrough)
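To make "identity passthrough" concrete, here is a minimal sketch of a residual stack with per-layer skip flags. It is a toy, not this demo's code: the stream holds one scalar per token, layer norm is left out, and `toy_attn`, `toy_mlp`, `residual_step`, and `run_layers` are hypothetical names invented for illustration.

```python
from typing import Callable, List, Sequence, Set, Tuple

Tokens = List[float]  # toy residual stream: one scalar per token (assumption)
SubBlock = Callable[[Tokens], Tokens]


def residual_step(x: Tokens, sub_block: SubBlock, skip: bool) -> Tokens:
    """Return x + sub_block(x), or a copy of x when the sub-block is skipped."""
    if skip:
        return list(x)  # identity passthrough: the update is never added
    update = sub_block(x)
    return [xi + ui for xi, ui in zip(x, update)]


def run_layers(
    x: Tokens,
    layers: Sequence[Tuple[SubBlock, SubBlock]],
    skip_attn: Set[int],
    skip_mlp: Set[int],
) -> Tokens:
    """Run (attention, MLP) pairs in order, honoring the two skip sets."""
    for index, (attn, mlp) in enumerate(layers):
        x = residual_step(x, attn, index in skip_attn)
        x = residual_step(x, mlp, index in skip_mlp)
    return x


def toy_attn(x: Tokens) -> Tokens:
    """Stand-in for attention: every token receives the mean over all tokens."""
    mean = sum(x) / len(x)
    return [mean for _ in x]


def toy_mlp(x: Tokens) -> Tokens:
    """Stand-in for the MLP: a per-token nonlinearity, no mixing across tokens."""
    return [0.5 * max(0.0, xi) for xi in x]


layers = [(toy_attn, toy_mlp), (toy_attn, toy_mlp)]
tokens = [1.0, -1.0, 3.0]

full = run_layers(tokens, layers, skip_attn=set(), skip_mlp=set())
no_attn = run_layers(tokens, layers, skip_attn={0, 1}, skip_mlp=set())
untouched = run_layers(tokens, layers, skip_attn={0, 1}, skip_mlp={0, 1})
print(full, no_attn, untouched)
```

In this toy, skipping every attention step means each token's final value depends only on its own input, because nothing else mixes information across tokens. Skipping everything returns the input unchanged.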
Set up your ablation, pick a sample, then hit Run.
The full-model prediction is computed automatically and shown alongside the ablated one, so you can see how much the skipped blocks contribute to that prediction.
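One way to read the side-by-side result is to compare the two sets of logits directly. The sketch below assumes you already have logits from the full and the ablated run; the `compare_predictions` helper, the label names, and the numbers are all made up for illustration and are not output of the demo.

```python
import math
from typing import Dict, List, Sequence, Union


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def compare_predictions(
    full_logits: Sequence[float],
    ablated_logits: Sequence[float],
    labels: Sequence[str],
) -> Dict[str, Union[str, bool, float]]:
    """Summarize how the ablation moved the top-1 class and its probability."""
    full_probs = softmax(full_logits)
    ablated_probs = softmax(ablated_logits)
    full_top = max(range(len(labels)), key=full_probs.__getitem__)
    ablated_top = max(range(len(labels)), key=ablated_probs.__getitem__)
    return {
        "full_label": labels[full_top],
        "ablated_label": labels[ablated_top],
        "top1_changed": full_top != ablated_top,
        # Change in probability of the class the full model picked.
        "confidence_shift": ablated_probs[full_top] - full_probs[full_top],
    }


# Made-up labels and logits, purely for illustration.
labels = ["tabby cat", "golden retriever", "school bus"]
report = compare_predictions([4.0, 1.0, 0.5], [1.5, 2.0, 0.5], labels)
print(report)
```

A negative `confidence_shift` means the ablated model is less sure about the class the full model chose, even when the top-1 label itself has not flipped.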
Experiments to try
  • Skip just the attention of one mid layer (e.g. L5). This often barely affects the prediction, which suggests that neighboring layers do partly redundant work.
  • Skip all attention but keep every MLP. Tells you how much the patch-mixing actually matters vs the per-token computation.
  • Skip the last 2–3 layers entirely. Watch the model break — late layers do the heavy lifting for classification.
  • Skip layers 0 and 1. Often surprisingly resilient — patch embeddings + later layers can recover.
  • Skip only MLPs. The model becomes a pure attention-only network, and accuracy usually drops a lot more than in the mirror experiment (all attention skipped, every MLP kept).
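The experiments above can be written down as pairs of skip sets over the 12 blocks in the grid. This is only a sketch of one possible encoding: `EXPERIMENTS`, `render_grid`, and the choice of three layers for "the last 2–3" are assumptions for illustration, not part of the demo.

```python
from typing import Dict, Set, Tuple

NUM_LAYERS = 12  # blocks L0 to L11, as in the grid
SkipSets = Tuple[Set[int], Set[int]]  # (skipped attention, skipped MLP)

all_layers = set(range(NUM_LAYERS))
last_three = set(range(NUM_LAYERS - 3, NUM_LAYERS))  # assumes "2-3" means 3

EXPERIMENTS: Dict[str, SkipSets] = {
    "one mid attention": ({5}, set()),
    "all attention": (set(all_layers), set()),
    "last three layers": (set(last_three), set(last_three)),
    "first two layers": ({0, 1}, {0, 1}),
    "all MLPs": (set(), set(all_layers)),
}


def render_grid(skip_attn: Set[int], skip_mlp: Set[int]) -> str:
    """Draw the two grid rows as text: '.' is active, 'x' is skipped."""

    def row(name: str, skipped: Set[int]) -> str:
        cells = " ".join("x" if i in skipped else "." for i in range(NUM_LAYERS))
        return f"{name:<5}{cells}"

    return "\n".join([row("Attn", skip_attn), row("MLP", skip_mlp)])


for name, (skip_attn, skip_mlp) in EXPERIMENTS.items():
    print(name)
    print(render_grid(skip_attn, skip_mlp))
```

Keeping attention and MLP skips in separate sets mirrors the two rows of the grid, so each preset corresponds directly to a pattern of clicked cells.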