ViT ablation lab

Ablation control

Click any cell to skip that sub-block — the residual stream passes through it unchanged.
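As a rough illustration of what "skip" means here, the sketch below assumes a standard residual block in which each sub-block adds its output to the stream. The names `block_forward`, `toy_attn`, and `toy_mlp` are hypothetical stand-ins, not the lab's actual code, and layer norm is omitted for brevity.

```python
def add(a, b):
    """Element-wise sum of two lists of token vectors."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def block_forward(x, attn, mlp, skip_attn=False, skip_mlp=False):
    """Apply one block to the residual stream x (a list of token vectors).

    A skipped sub-block adds nothing, so x passes through it unchanged.
    """
    if not skip_attn:
        x = add(x, attn(x))  # residual update from token mixing
    if not skip_mlp:
        x = add(x, mlp(x))   # residual update from per-token computation
    return x

def toy_attn(x):
    # Hypothetical stand-in for attention: every token receives the mean token.
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]
    return [mean[:] for _ in x]

def toy_mlp(x):
    # Hypothetical stand-in for the MLP: each token is transformed on its own.
    return [[max(0.0, v) * 0.5 for v in tok] for tok in x]

tokens = [[1.0, -2.0], [3.0, 4.0]]
unchanged = block_forward(tokens, toy_attn, toy_mlp, skip_attn=True, skip_mlp=True)
attn_only = block_forward(tokens, toy_attn, toy_mlp, skip_mlp=True)
```

With both flags set, the block reduces to the identity function, which is the passthrough behavior the grid describes.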

All blocks active

block	L0	L1	L2	L3	L4	L5	L6	L7	L8	L9	L10	L11
Attn
MLP

Legend: active | skipped (identity passthrough)

Set up your ablation, pick a sample, then hit Run.

The full-model prediction is computed automatically and shown alongside the ablated one, so you can see exactly how much each block contributes.
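One way to quantify "how much each block contributes" is to compare the two output distributions directly. The sketch below is an assumption about how such a comparison could be done, not a description of what the lab computes; `compare_predictions` is a hypothetical helper and the logits are made-up illustrative numbers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def compare_predictions(full_logits, ablated_logits):
    """Summarize how the ablated prediction differs from the full model's."""
    lp_full = log_softmax(full_logits)
    lp_abl = log_softmax(ablated_logits)
    top_full = max(range(len(lp_full)), key=lp_full.__getitem__)
    top_abl = max(range(len(lp_abl)), key=lp_abl.__getitem__)
    return {
        "top1_full": top_full,
        "top1_ablated": top_abl,
        "top1_changed": top_full != top_abl,
        # Probability mass the ablation removes from the full model's top class.
        "prob_drop": math.exp(lp_full[top_full]) - math.exp(lp_abl[top_full]),
        # KL(full || ablated): 0 when the two distributions are identical.
        "kl": sum(math.exp(p) * (p - q) for p, q in zip(lp_full, lp_abl)),
    }

# Made-up logits for a 3-class toy case, purely for illustration.
report = compare_predictions([2.0, 0.5, 0.1], [0.3, 1.9, 0.2])
same = compare_predictions([2.0, 0.5, 0.1], [2.0, 0.5, 0.1])
```

A top-1 flip is the coarsest signal; the probability drop and the KL divergence also register ablations that shift confidence without changing the predicted class.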

Experiments to try

Skip just the attention of one middle layer (e.g. L5). This often barely affects the prediction, which suggests that neighboring layers do partly redundant work.
Skip all attention but keep every MLP. Tells you how much the patch-mixing actually matters vs the per-token computation.
Skip the last 2–3 layers entirely. Watch the model break — late layers do the heavy lifting for classification.
Skip layers 0 and 1. The model is often surprisingly resilient: the patch embeddings plus the later layers can recover.
Skip only the MLPs. The model becomes a pure attention-only network, and accuracy usually drops far more than in the mirror-image experiment (skipping all attention while keeping every MLP).
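The five experiments above can each be written as a skip mask over the 2 x 12 control grid. This is a minimal sketch under that assumption; `make_mask` and the `experiments` dictionary are hypothetical names, and "last 3 layers" picks the upper end of the 2–3 range.

```python
NUM_LAYERS = 12  # L0..L11, matching the control grid

def make_mask(skip_attn=(), skip_mlp=()):
    """Build a skip mask; True means that sub-block is skipped."""
    for layer in list(skip_attn) + list(skip_mlp):
        if not 0 <= layer < NUM_LAYERS:
            raise ValueError(f"layer {layer} is outside L0..L{NUM_LAYERS - 1}")
    return {
        "attn": [i in skip_attn for i in range(NUM_LAYERS)],
        "mlp": [i in skip_mlp for i in range(NUM_LAYERS)],
    }

all_layers = range(NUM_LAYERS)
experiments = {
    "one mid-layer attention": make_mask(skip_attn=[5]),
    "all attention, keep MLPs": make_mask(skip_attn=all_layers),
    "last 3 layers": make_mask(skip_attn=[9, 10, 11], skip_mlp=[9, 10, 11]),
    "layers 0 and 1": make_mask(skip_attn=[0, 1], skip_mlp=[0, 1]),
    "all MLPs, keep attention": make_mask(skip_mlp=all_layers),
}

# Count how many of the 24 sub-blocks each experiment skips.
skipped_counts = {
    name: sum(mask["attn"]) + sum(mask["mlp"]) for name, mask in experiments.items()
}
```

Keeping the mask as plain booleans per row mirrors the grid, so a mask can be read off the screen or applied layer by layer in a forward pass.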