Convergence-Based Early Exit
Cross-source consensus on Convergence-Based Early Exit, drawn from 1 source and 5 claims.
Highlighted claims
- The evaluated inference procedure exits after a minimum layer when the current pooled representation is sufficiently similar to an earlier pooled representation. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- Convergence-based early exit terminates inference when intermediate representations are sufficiently stable. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- Early exit is viable when layer-to-layer representation changes diminish consistently. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- The paper recommends an inference threshold of 0.95 for balancing quality and latency. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
- Batch inference limits realized savings because total batch compute depends on the latest-exiting sample. — LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
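The claims above describe a simple control loop: run layers in order, mean-pool the hidden states, and stop once the pooled representation after a layer is sufficiently similar to the one before it, provided a minimum depth has been reached. A minimal sketch of that loop is below; the layer callables, pooling choice, and similarity metric (cosine) are illustrative assumptions, while the 0.95 threshold and minimum-layer gate come from the source claims.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def early_exit_forward(layers, x, min_layer=4, threshold=0.95):
    """Run `layers` sequentially over `x` (seq_len, hidden), exiting once
    the mean-pooled hidden state stops changing between consecutive layers.

    `layers` is a list of callables standing in for transformer blocks
    (a hypothetical interface, not the paper's implementation).
    Returns the final hidden states and the number of layers executed.
    """
    prev_pooled = None
    for i, layer in enumerate(layers):
        x = layer(x)
        pooled = x.mean(axis=0)  # mean pooling over the token axis
        # Only consider exiting after the minimum layer, and only once a
        # previous pooled representation exists to compare against.
        if prev_pooled is not None and i + 1 >= min_layer:
            if cosine_similarity(pooled, prev_pooled) >= threshold:
                return x, i + 1  # representation has converged: exit early
        prev_pooled = pooled
    return x, len(layers)  # no convergence: full-depth forward pass
```

Note that in batched inference this loop would run until the *slowest-converging* sample in the batch exits, which is why, as the last claim states, realized compute savings shrink with batch size.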