Reduced KL Loss
Cross-source consensus on Reduced KL Loss from 1 source and 5 claims.
1 source · 5 claims
Highlighted claims
- Reduced KL selects the top-K target-probability tokens and computes cross-entropy only on that support (see the sketch after this list). — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Reduced KL Loss is the central training objective for CATS adapters. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Reduced KL concentrates adapter capacity on tokens likely to be accepted during speculation. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Full-vocabulary distillation wastes supervision on low-probability tokens that rarely affect speculative acceptance. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Focusing on top-K target tokens improves compact draft and shallow-verifier adapters under memory-limited constraints. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
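The first claim describes the mechanism concretely enough to sketch in code. Below is a minimal PyTorch illustration of a top-K reduced cross-entropy objective, assuming the adapter (student) and the target model (teacher) emit logits over the same vocabulary. The function name `reduced_kl_loss`, the placeholder value of `k`, and the renormalization of teacher mass on the reduced support are assumptions for illustration, not details taken from the CATS paper.

```python
import torch
import torch.nn.functional as F

def reduced_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    k: int = 64) -> torch.Tensor:
    """Cross-entropy restricted to the teacher's top-K tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab) tensors.
    k: number of top teacher-probability tokens kept per position
       (placeholder; the value used by CATS is not stated here).
    """
    # Teacher distribution over the full vocabulary.
    teacher_probs = F.softmax(teacher_logits, dim=-1)

    # Keep only the K most probable target tokens per position.
    topk_probs, topk_idx = teacher_probs.topk(k, dim=-1)

    # Renormalize teacher mass on the reduced support (an assumption;
    # the objective could equally use the unnormalized probabilities).
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

    # Student log-probabilities gathered on the same reduced support.
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    student_topk = student_logprobs.gather(-1, topk_idx)

    # Cross-entropy on the reduced support, averaged over positions.
    return -(topk_probs * student_topk).sum(dim=-1).mean()
```

In a training loop, such a loss would stand in for full-vocabulary KL distillation when fitting the draft and shallow-verifier adapters, so the gradient signal concentrates on the tokens that can plausibly be proposed and accepted during speculation rather than on the long tail of low-probability vocabulary entries.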