Memory Budget Ablations

Cross-source consensus on Memory Budget Ablations from 1 source and 5 claims.

Benefits

CATS adapts LSV to available DRAM and improves speed at 2 GB, 6 GB, and 8 GB budgets. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
At a 2 GB budget, CATS with LDM=3 and LSV=5 reached 0.329 tokens/s and 2.82x speedup. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
Deepening Kangaroo's draft model improves acceptance but worsens bytes per token and speed. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
CATS achieved similar or better acceptance than deep Kangaroo while reducing bytes per token and improving wall-clock speed. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
The default drafting horizon of gamma-bar = 5 captures most acceptance benefit before memory traffic and loop latency rise. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
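Two of the claims above lend themselves to a quick back-of-the-envelope check. The sketch below is illustrative only and is not taken from the CATS paper. It assumes (1) that the reported speedup is a plain ratio of throughputs against a baseline, which would put the 2 GB baseline at roughly 0.117 tokens/s, and (2) the standard speculative-decoding model in which each drafted token is accepted independently with probability alpha, so a horizon of gamma yields (1 - alpha^(gamma+1)) / (1 - alpha) expected tokens per draft-verify cycle. The function names and the value alpha = 0.7 are hypothetical choices for illustration, not measurements.

```python
def implied_baseline_tps(tokens_per_s: float, speedup: float) -> float:
    """Baseline throughput implied by a reported throughput and speedup.

    Assumes speedup = accelerated throughput / baseline throughput.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return tokens_per_s / speedup


def expected_tokens_per_cycle(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per draft-verify cycle.

    Assumes each drafted token is accepted independently with
    probability alpha (a simplification, not the CATS acceptance model).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft is accepted, plus the token from the verifier pass.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Figures reported for the 2 GB budget in the claims above.
    baseline = implied_baseline_tps(0.329, 2.82)
    print(f"implied 2 GB baseline: {baseline:.3f} tokens/s")

    alpha = 0.7  # hypothetical per-token acceptance rate
    previous = expected_tokens_per_cycle(alpha, 0)
    for gamma in range(1, 11):
        current = expected_tokens_per_cycle(alpha, gamma)
        # The marginal gain of one extra drafted token is alpha ** gamma.
        print(f"gamma={gamma:2d}  expected={current:.3f}  gain={current - previous:.3f}")
        previous = current
```

Under this simplified model the gain from each additional drafted token shrinks geometrically, while the cost of drafting it (memory traffic and loop latency, per the claim above) does not. That is consistent with a moderate horizon capturing most of the acceptance benefit, though the model says nothing about where the actual optimum lies for CATS.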