Reduced KL Loss
Cross-source consensus on Reduced KL Loss from 1 source and 5 claims.
1 source · 5 claims
Highlighted claims
- Reduced KL selects the top-K target-probability tokens and computes cross-entropy only on that support (see the sketch after this list). — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Reduced KL Loss is the central training objective for CATS adapters. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Reduced KL concentrates adapter capacity on tokens likely to be accepted during speculation. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Full-vocabulary distillation wastes supervision on low-probability tokens that rarely affect speculative acceptance. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Focusing on top-K target tokens improves compact draft and shallow-verifier adapters under memory-limited constraints. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
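The first claim describes the mechanism concretely enough to sketch in code. Below is a minimal PyTorch illustration of a top-K reduced cross-entropy objective, assuming the adapter (student) and the target model (teacher) emit logits over the same vocabulary. The function name `reduced_kl_loss`, the placeholder value of `k`, and the renormalization of teacher mass on the reduced support are assumptions for illustration, not details taken from the CATS paper.

```python
import torch
import torch.nn.functional as F

def reduced_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    k: int = 64) -> torch.Tensor:
    """Cross-entropy restricted to the teacher's top-K tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab) tensors.
    k: number of top teacher-probability tokens kept per position
       (placeholder; the value used by CATS is not stated here).
    """
    # Teacher distribution over the full vocabulary.
    teacher_probs = F.softmax(teacher_logits, dim=-1)

    # Keep only the K most probable target tokens per position.
    topk_probs, topk_idx = teacher_probs.topk(k, dim=-1)

    # Renormalize teacher mass on the reduced support (an assumption;
    # the objective could equally use the unnormalized probabilities).
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

    # Student log-probabilities gathered on the same reduced support.
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    student_topk = student_logprobs.gather(-1, topk_idx)

    # Cross-entropy on the reduced support, averaged over positions.
    return -(topk_probs * student_topk).sum(dim=-1).mean()
```

In a training loop, such a loss would stand in for full-vocabulary KL distillation when fitting the draft and shallow-verifier adapters, so the gradient signal concentrates on the tokens that can plausibly be proposed and accepted during speculation rather than on the long tail of low-probability vocabulary entries.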