Layer-Aligned Distillation
Cross-source consensus on Layer-Aligned Distillation, drawn from 1 source and 5 claims.
Highlighted claims
- Layer-aligned distillation trains each student layer to match a mapped teacher layer (a loss sketch follows this list). — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- Layer-aligned distillation suppresses the cross-layer redundancy that convergence-based early exit relies on (see the early-exit sketch after this list). — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- Models trained with standard layer-aligned distillation tend to keep transforming representations across layers rather than stabilizing them before the final layer. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- The MiniLM-L12 baseline produced no early exits by layer 7 despite a higher STS-B score than LEAP in the controlled comparison. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- The article attributes early-exit incompatibility specifically to intermediate-layer alignment rather than to distillation in general. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
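To make the first claim concrete, here is a minimal sketch of a layer-aligned distillation loss. It assumes a uniform "skip" mapping (student layer i is matched to teacher layer i·k for a depth ratio k) and an MSE objective on hidden states; the exact mapping and loss used in the LEAP paper may differ, and the tensors below are random stand-ins for real activations.

```python
import torch
import torch.nn.functional as F

def layer_aligned_loss(student_hiddens, teacher_hiddens):
    """MSE between each student layer and its mapped teacher layer.

    Both arguments are lists of [batch, seq, dim] tensors, one per layer.
    Assumes a uniform mapping: student layer i matches teacher layer i * ratio.
    """
    ratio = len(teacher_hiddens) // len(student_hiddens)
    loss = torch.tensor(0.0)
    for i, h_s in enumerate(student_hiddens, start=1):
        h_t = teacher_hiddens[i * ratio - 1].detach()  # teacher is frozen
        loss = loss + F.mse_loss(h_s, h_t)
    return loss / len(student_hiddens)

# Toy usage: 12-layer teacher hidden states distilled into a 6-layer student.
teacher = [torch.randn(2, 8, 64) for _ in range(12)]
student = [torch.randn(2, 8, 64, requires_grad=True) for _ in range(6)]
print(layer_aligned_loss(student, teacher))
```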
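For contrast, the second and third claims concern convergence-based early exit, which fires when consecutive layer outputs stop changing. The sketch below assumes a cosine-similarity stability test with an illustrative threshold; it is not the paper's exact criterion. On a layer-aligned student, whose representations keep transforming from layer to layer, such a test rarely triggers before the final layer.

```python
import torch
import torch.nn.functional as F

def exit_layer(hiddens, threshold=0.999):
    """Return the first layer whose output has converged, else the last.

    hiddens: list of [batch, seq, dim] tensors, one per layer.
    """
    for i in range(1, len(hiddens)):
        # Mean cosine similarity between this layer's output and the previous one.
        sim = F.cosine_similarity(
            hiddens[i].flatten(1), hiddens[i - 1].flatten(1), dim=-1
        ).mean()
        if sim >= threshold:  # representation has stabilized: exit here
            return i
    return len(hiddens) - 1

# On random (non-stabilizing) hidden states, no early exit fires:
hiddens = [torch.randn(2, 8, 64) for _ in range(6)]
print(exit_layer(hiddens))  # -> 5, the final layer
```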