Performance Evaluation

Cross-source consensus on Performance Evaluation from 1 source and 5 claims.

Benefits

On Vicuna-7B greedy chain decoding, CATS reached 3.0598 accepted tokens and 3.18x speedup. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
CATS with EAGLE tree decoding reached 3.7050 accepted tokens and 3.71x speedup on Vicuna-7B greedy decoding. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
On LLaMA2-7B greedy chain decoding, CATS achieved 4.6491 accepted tokens and 4.65x speedup. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
CATS retains its speed advantage on larger models and on other model families. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
Under the evaluated benchmarks and decoding settings, CATS is reported to reach up to a 5.08x wall-clock speedup with no observed degradation in generation quality. — CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
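
To compare the three per-model claims side by side, the reported figures can be tabulated and a simple ratio of speedup to accepted tokens computed. This is a minimal sketch: the numbers are copied from the claims above, while the tuple layout, the function name `speedup_per_accepted_token`, and the ratio itself are illustrative assumptions, not a metric defined by the source.

```python
# Reported figures from the claims above (all greedy decoding).
# Layout: (model, decoding mode, accepted tokens, wall-clock speedup).
# The tuple layout and names are illustrative, not taken from the source.
reported = [
    ("Vicuna-7B", "chain", 3.0598, 3.18),
    ("Vicuna-7B", "EAGLE tree", 3.7050, 3.71),
    ("LLaMA2-7B", "chain", 4.6491, 4.65),
]


def speedup_per_accepted_token(accepted: float, speedup: float) -> float:
    """Ratio of reported speedup to reported accepted tokens.

    A descriptive convenience only (assumption): the source does not
    define or report this quantity.
    """
    return speedup / accepted


# Attach the ratio to each reported row.
rows = [
    (model, mode, accepted, speedup, speedup_per_accepted_token(accepted, speedup))
    for model, mode, accepted, speedup in reported
]

for model, mode, accepted, speedup, ratio in rows:
    print(
        f"{model:<10} {mode:<11} "
        f"accepted={accepted:.4f} speedup={speedup:.2f}x ratio={ratio:.3f}"
    )
```

The table only restates the three configurations listed above; the 5.08x figure is an upper bound across the evaluated settings and is not tied to any of these rows.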