Ablation control
Click any cell to skip that sub-block (the attention or MLP half of a layer); the residual stream then passes through it unchanged.
All blocks active
| block | L0 | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 | L10 | L11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Attn |  |  |  |  |  |  |  |  |  |  |  |  |
| MLP |  |  |  |  |  |  |  |  |  |  |  |  |
Legend: active / skipped (identity passthrough)
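To make "identity passthrough" concrete, here is a minimal sketch of how skipping a sub-block could work in a residual network. It is not this page's actual implementation: `run_blocks`, `toy_attn`, and `toy_mlp` are hypothetical names, the toy sub-blocks operate on a single small vector rather than a sequence of patch tokens, and layer normalization is assumed to live inside each sub-block.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
SubBlock = Callable[[Vector], Vector]


def run_blocks(
    x: Vector,
    layers: Sequence[Tuple[SubBlock, SubBlock]],
    skip_attn: Sequence[bool],
    skip_mlp: Sequence[bool],
) -> Vector:
    """Apply residual sub-blocks, leaving the stream untouched where skipped."""
    for i, (attn, mlp) in enumerate(layers):
        if not skip_attn[i]:
            x = [a + b for a, b in zip(x, attn(x))]  # x + Attn(x)
        if not skip_mlp[i]:
            x = [a + b for a, b in zip(x, mlp(x))]  # x + MLP(x)
        # A skipped sub-block adds nothing, so x passes through unchanged.
    return x


# Toy stand-ins for real sub-blocks (hypothetical, for illustration only).
def toy_attn(x: Vector) -> Vector:
    mean = sum(x) / len(x)  # crude "mixing": every entry sees the average
    return [mean for _ in x]


def toy_mlp(x: Vector) -> Vector:
    return [max(0.0, v) * 0.5 for v in x]  # per-entry ReLU then scaling


layers = [(toy_attn, toy_mlp)] * 2
x0 = [1.0, -2.0, 4.0]

full = run_blocks(x0, layers, [False, False], [False, False])
all_skipped = run_blocks(x0, layers, [True, True], [True, True])
only_first_mlp = run_blocks(x0, layers, [True, True], [False, True])
```

With every cell skipped the output equals the input, which is what the grid's "skipped" state means for the residual stream.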
Set up your ablation, pick a sample, then hit Run.
The full-model prediction is computed automatically and shown alongside the ablated one, so you can see exactly how much each block contributes.
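The page does not say how the two predictions are compared, so the following is only one plausible way to quantify a block's contribution: convert both sets of logits to probabilities and measure how much confidence in the full model's top class is lost. The function name `compare_predictions` and the returned keys are assumptions made for this sketch.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def compare_predictions(
    full_logits: List[float], ablated_logits: List[float]
) -> Dict[str, object]:
    """Compare the full model's prediction with the ablated model's."""
    p_full = softmax(full_logits)
    p_abl = softmax(ablated_logits)
    top_full = max(range(len(p_full)), key=p_full.__getitem__)
    top_abl = max(range(len(p_abl)), key=p_abl.__getitem__)
    return {
        "full_top": top_full,
        "ablated_top": top_abl,
        "prediction_changed": top_full != top_abl,
        # Probability mass the ablation removed from the original top class.
        "confidence_drop": p_full[top_full] - p_abl[top_full],
    }


# Example with made-up logits for a three-class problem.
report = compare_predictions([2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
print(report)
```

A confidence drop near zero with an unchanged top class would indicate that the skipped blocks contributed little for that sample.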
Experiments to try
- Skip just the attention of one middle layer (e.g. L5). This often barely affects the prediction, a sign that many layers do redundant work.
- Skip all attention but keep every MLP. This shows how much the mixing of information across patches matters relative to the per-token computation.
- Skip the last 2–3 layers entirely. Watch the model break: late layers do the heavy lifting for classification.
- Skip layers 0 and 1. The model is often surprisingly resilient, since the patch embeddings plus the later layers can recover.
- Skip only the MLPs. The model becomes a pure attention-only network, and accuracy usually drops much more than in the reverse experiment (all attention skipped, every MLP kept).
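The five experiments above can be written down as skip masks over the 12-layer grid. This is a sketch under assumptions: `make_mask` and the preset names are hypothetical, `True` is taken to mean "skipped", and "the last 2–3 layers" is fixed at three layers (L9 to L11) for concreteness.

```python
from typing import Dict, Iterable, List

NUM_LAYERS = 12  # columns L0..L11 in the grid


def make_mask(
    skip_attn: Iterable[int] = (), skip_mlp: Iterable[int] = ()
) -> Dict[str, List[bool]]:
    """Return per-layer skip flags; True means the sub-block is skipped."""
    attn_set, mlp_set = set(skip_attn), set(skip_mlp)
    return {
        "attn": [i in attn_set for i in range(NUM_LAYERS)],
        "mlp": [i in mlp_set for i in range(NUM_LAYERS)],
    }


all_layers = range(NUM_LAYERS)
last_three = [9, 10, 11]  # assumption: "last 2-3 layers" taken as three

presets = {
    "one_mid_attention": make_mask(skip_attn=[5]),
    "no_attention": make_mask(skip_attn=all_layers),
    "no_last_layers": make_mask(skip_attn=last_three, skip_mlp=last_three),
    "no_first_two_layers": make_mask(skip_attn=[0, 1], skip_mlp=[0, 1]),
    "no_mlp": make_mask(skip_mlp=all_layers),
}

for name, mask in presets.items():
    skipped = sum(mask["attn"]) + sum(mask["mlp"])
    print(f"{name}: {skipped} of {2 * NUM_LAYERS} sub-blocks skipped")
```

Keeping the attention and MLP flags in separate lists mirrors the two rows of the grid, so each preset maps directly onto a pattern of clicked cells.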