KV Cache

Cross-source consensus on KV Cache from 1 source and 4 claims.

1 source · 4 claims

Benefits

KV cache eviction bounds memory at the cost of irreversibly discarding past context. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
ARL2 reduces per-frame memory for hybrid layers by 40%, from 293 MB to 175 MB. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
Sparse attention, KV cache quantization, and KV cache eviction each have fundamental shortcomings and none simultaneously solves linear memory growth and proper streaming context management. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
Unlike auxiliary linear-complexity modules that preserve the primary softmax path and its O(N) memory, ARL2 fully replaces inter-frame attention with constant-memory recurrence. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
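
The contrast drawn in the claims above, between an eviction-bounded KV cache and a constant-memory recurrence, can be illustrated with a small sketch. This is not ARL2's implementation: it uses a generic textbook linear-attention recurrence (state S += k vᵀ, normalizer z += k) and assumes the keys and queries have already passed through a nonnegative feature map. All class and function names (SlidingWindowKVCache, LinearAttentionState, demo) are hypothetical and chosen only for illustration.

```python
from collections import deque


class SlidingWindowKVCache:
    """Eviction-based cache: keeps only the most recent `window` (key, value) pairs."""

    def __init__(self, window):
        # A deque with maxlen silently drops the oldest entry once it is full,
        # so memory is bounded but the evicted context cannot be recovered.
        self.entries = deque(maxlen=window)

    def append(self, key, value):
        self.entries.append((key, value))

    def size(self):
        return len(self.entries)


class LinearAttentionState:
    """Generic linear-attention recurrence (an assumption, not the paper's exact form).

    Keeps a d_k x d_v matrix S and a d_k vector z, whatever the number of steps.
    """

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def update(self, key, value):
        # S += outer(key, value); z += key
        for i, k_i in enumerate(key):
            self.z[i] += k_i
            for j, v_j in enumerate(value):
                self.S[i][j] += k_i * v_j

    def read(self, query):
        # Output = (query^T S) / (query^T z): a weighted average of all past values.
        denom = sum(q * z for q, z in zip(query, self.z))
        d_v = len(self.S[0])
        return [
            sum(query[i] * self.S[i][j] for i in range(len(query))) / denom
            for j in range(d_v)
        ]

    def size(self):
        # Number of stored floats; independent of how many steps were absorbed.
        return len(self.S) * len(self.S[0]) + len(self.z)


def demo(num_steps=100, window=8, d_k=4, d_v=4):
    """Feed the same synthetic stream to both memories and return them."""
    cache = SlidingWindowKVCache(window)
    state = LinearAttentionState(d_k, d_v)
    for t in range(num_steps):
        key = [1.0 + (t + i) % 3 for i in range(d_k)]  # strictly positive keys
        value = [float(t)] * d_v
        cache.append(key, value)
        state.update(key, value)
    return cache, state
```

After demo() runs, the sliding-window cache holds only the last `window` steps, so everything earlier is gone for good, while the recurrent state has a fixed size yet still folds every past value into its read-out. The trade-off is that the recurrence stores a compressed summary, not the individual key/value pairs.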