Performance Evaluation
Cross-source consensus on Performance Evaluation, drawn from 1 source and 5 claims.
Benefits
Highlighted claims
- On Vicuna-7B with greedy chain decoding, CATS reached an average accepted length of 3.0598 tokens and a 3.18x speedup. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- On Vicuna-7B with greedy decoding and EAGLE tree decoding, CATS reached an average accepted length of 3.7050 tokens and a 3.71x speedup. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- On LLaMA2-7B with greedy chain decoding, CATS achieved an average accepted length of 4.6491 tokens and a 4.65x speedup. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- CATS retains its speedup on larger models and across different model families. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
- Across the evaluated benchmarks and decoding settings, CATS delivered up to a 5.08x wall-clock speedup with no observed degradation in generation quality. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
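The claims above rely on two metrics, accepted tokens and speedup. The sketch below shows how they are commonly computed. It assumes the usual speculative-decoding definitions: average accepted length is the number of tokens committed per target-model verification step, and wall-clock speedup is plain autoregressive generation time divided by speculative generation time. The CATS paper's exact measurement protocol may differ. `DecodeRun`, `average_accepted_length`, and `wall_clock_speedup` are hypothetical names used only for illustration, and the numbers in the usage example are toy values, not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecodeRun:
    """Hypothetical log of one speculative-decoding run (illustrative names only)."""
    accepted_per_step: Sequence[int]  # tokens committed at each target-model verification step
    wall_clock_s: float               # total generation time in seconds


def average_accepted_length(run: DecodeRun) -> float:
    """Mean number of tokens committed per verification step (assumed definition)."""
    if not run.accepted_per_step:
        raise ValueError("run has no decoding steps")
    return sum(run.accepted_per_step) / len(run.accepted_per_step)


def wall_clock_speedup(baseline_s: float, method_s: float) -> float:
    """Baseline (plain autoregressive) time divided by speculative-decoding time."""
    if method_s <= 0:
        raise ValueError("method time must be positive")
    return baseline_s / method_s


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the CATS paper.
    run = DecodeRun(accepted_per_step=[3, 4, 2, 3], wall_clock_s=2.0)
    print(average_accepted_length(run))               # 3.0
    print(wall_clock_speedup(6.0, run.wall_clock_s))  # 3.0
```

The two metrics are related but not interchangeable. Accepted length counts tokens per target-model step. Wall-clock speedup also reflects the time spent drafting and verifying, which is why the reported pairs of numbers are close but not identical.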