Layer-Aligned Distillation
Cross-source consensus on Layer-Aligned Distillation, drawn from 1 source and 5 claims.
Highlighted claims
- Layer-aligned distillation trains each student layer to match a mapped teacher layer (a loss sketch follows this list). — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- Layer-aligned distillation suppresses the cross-layer redundancy that convergence-based early exit relies on (see the early-exit sketch after this list). — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- Models trained with standard layer-aligned distillation tend to keep transforming representations across layers rather than stabilizing them before the final layer. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- The MiniLM-L12 baseline produced no early exits by layer 7 despite a higher STS-B score than LEAP in the controlled comparison. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- The article attributes early-exit incompatibility specifically to intermediate-layer alignment rather than to distillation in general. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
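To make the first claim concrete, here is a minimal sketch of a layer-aligned distillation loss. It assumes a uniform "skip" mapping (student layer i is matched to teacher layer i·k for a depth ratio k) and an MSE objective on hidden states; the exact mapping and loss used in the LEAP paper may differ, and the tensors below are random stand-ins for real activations.

```python
import torch
import torch.nn.functional as F

def layer_aligned_loss(student_hiddens, teacher_hiddens):
    """MSE between each student layer and its mapped teacher layer.

    Both arguments are lists of [batch, seq, dim] tensors, one per layer.
    Assumes a uniform mapping: student layer i matches teacher layer i * ratio.
    """
    ratio = len(teacher_hiddens) // len(student_hiddens)
    loss = torch.tensor(0.0)
    for i, h_s in enumerate(student_hiddens, start=1):
        h_t = teacher_hiddens[i * ratio - 1].detach()  # teacher is frozen
        loss = loss + F.mse_loss(h_s, h_t)
    return loss / len(student_hiddens)

# Toy usage: 12-layer teacher hidden states distilled into a 6-layer student.
teacher = [torch.randn(2, 8, 64) for _ in range(12)]
student = [torch.randn(2, 8, 64, requires_grad=True) for _ in range(6)]
print(layer_aligned_loss(student, teacher))
```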
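For contrast, the second and third claims concern convergence-based early exit, which fires when consecutive layer outputs stop changing. The sketch below assumes a cosine-similarity stability test with an illustrative threshold; it is not the paper's exact criterion. On a layer-aligned student, whose representations keep transforming from layer to layer, such a test rarely triggers before the final layer.

```python
import torch
import torch.nn.functional as F

def exit_layer(hiddens, threshold=0.999):
    """Return the first layer whose output has converged, else the last.

    hiddens: list of [batch, seq, dim] tensors, one per layer.
    """
    for i in range(1, len(hiddens)):
        # Mean cosine similarity between this layer's output and the previous one.
        sim = F.cosine_similarity(
            hiddens[i].flatten(1), hiddens[i - 1].flatten(1), dim=-1
        ).mean()
        if sim >= threshold:  # representation has stabilized: exit here
            return i
    return len(hiddens) - 1

# On random (non-stabilizing) hidden states, no early exit fires:
hiddens = [torch.randn(2, 8, 64) for _ in range(6)]
print(exit_layer(hiddens))  # -> 5, the final layer
```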