Memory Budget Ablations
Cross-source consensus on Memory Budget Ablations from 1 source and 5 claims.
Highlighted claims
- CATS adapts LSV to available DRAM and improves speed at 2 GB, 6 GB, and 8 GB budgets. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- At a 2 GB budget, CATS with LDM=3 and LSV=5 reached 0.329 tokens/s and 2.82x speedup. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Deepening Kangaroo's draft model improves acceptance but worsens bytes per token and speed. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- CATS achieved similar or better acceptance than deep Kangaroo while reducing bytes per token and improving wall-clock speed. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- The default drafting horizon of gamma-bar = 5 captures most acceptance benefit before memory traffic and loop latency rise. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
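
The 2 GB figures in the claims above can be turned into an implied baseline throughput. The sketch below assumes that the reported 2.82x speedup is a ratio of tokens/s against a baseline measured at the same 2 GB budget; the claim does not state which baseline was used, so the derived numbers are illustrative only. The function name `implied_baseline` is hypothetical.

```python
def implied_baseline(throughput_tps: float, speedup: float) -> float:
    """Baseline throughput implied by a reported throughput and speedup ratio.

    Assumes speedup = throughput / baseline throughput (an assumption,
    not something the source claim spells out).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return throughput_tps / speedup


# Reported in the claim: CATS at a 2 GB budget with LDM=3 and LSV=5.
cats_tps = 0.329
speedup = 2.82

baseline_tps = implied_baseline(cats_tps, speedup)

print(f"implied baseline: {baseline_tps:.3f} tokens/s")
print(f"seconds per token: CATS {1 / cats_tps:.2f}, baseline {1 / baseline_tps:.2f}")
```

Under that assumption, the implied baseline is roughly 0.117 tokens/s, so one token takes about 3.0 s with CATS versus about 8.6 s without it.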