KV Cache
Cross-source consensus on KV Cache from 1 source and 4 claims.
Highlighted claims
- KV cache eviction bounds memory at the cost of irreversibly discarding past context. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
- ARL2 reduces per-frame memory for hybrid layers by 40%, from 293 MB to 175 MB. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
- Sparse attention, KV cache quantization, and KV cache eviction each have fundamental shortcomings and none simultaneously solves linear memory growth and proper streaming context management. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
- Unlike auxiliary linear-complexity modules that preserve the primary softmax path and its O(N) memory, ARL2 fully replaces inter-frame attention with constant-memory recurrence. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
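The memory trade-offs named in these claims can be illustrated with a small sketch. It is not the ARL2 implementation from the cited paper; it is a generic, assumed toy model that compares three ways of storing past context: a full KV cache (grows with every step), a sliding-window cache (eviction, bounded but lossy), and an unnormalized linear-attention state with an identity feature map (constant size). All class and function names (`FullKVCache`, `SlidingWindowKVCache`, `LinearState`, `run_demo`) are hypothetical.

```python
from collections import deque


class FullKVCache:
    """Keeps every (key, value) pair, so memory grows linearly with steps."""

    def __init__(self):
        self.entries = []

    def append(self, key, value):
        self.entries.append((key, value))

    def num_floats(self):
        return sum(len(k) + len(v) for k, v in self.entries)


class SlidingWindowKVCache:
    """Eviction: keeps only the newest `window` pairs; older ones are lost."""

    def __init__(self, window):
        # A deque with maxlen silently drops the oldest item when full.
        self.entries = deque(maxlen=window)

    def append(self, key, value):
        self.entries.append((key, value))

    def num_floats(self):
        return sum(len(k) + len(v) for k, v in self.entries)


class LinearState:
    """Constant-size recurrent state S += outer(key, value).

    Assumption: unnormalized linear attention with an identity feature map,
    used here only to show that the state size does not depend on history.
    """

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]

    def append(self, key, value):
        for i, k_i in enumerate(key):
            for j, v_j in enumerate(value):
                self.S[i][j] += k_i * v_j

    def read(self, query):
        # Computes query^T S, a summary of all past values weighted by key match.
        d_v = len(self.S[0])
        return [sum(q_i * row[j] for q_i, row in zip(query, self.S))
                for j in range(d_v)]

    def num_floats(self):
        return len(self.S) * len(self.S[0])


def run_demo(steps=10, window=4):
    """Feeds the same toy stream of 2-d keys and values into all three stores."""
    full = FullKVCache()
    windowed = SlidingWindowKVCache(window)
    state = LinearState(2, 2)
    for t in range(steps):
        key = [1.0, 0.0] if t % 2 == 0 else [0.0, 1.0]
        value = [float(t), 1.0]
        for store in (full, windowed, state):
            store.append(key, value)
    return full, windowed, state


if __name__ == "__main__":
    full, windowed, state = run_demo()
    print("full cache floats:    ", full.num_floats())
    print("windowed cache floats:", windowed.num_floats())
    print("linear state floats:  ", state.num_floats())
```

In this toy setup the full cache keeps growing with the number of steps, the windowed cache stays bounded but no longer holds the earliest entries, and the recurrent state keeps a fixed size while still carrying a compressed summary of every step. How closely this mirrors the paper's hybrid layers is not established by the claims above.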