ARL2
Cross-source consensus on ARL2 (Attend Locally, Remember Linearly) from 1 source and 7 claims.
Highlighted claims
Source for all claims: Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
- ARL2 decomposes self-attention into two parallel branches: softmax attention for intra-frame spatial structure, and linear attention with a fixed-size recurrent state for inter-frame temporal memory.
- ARL2 is the first work to convert a pretrained autoregressive video diffusion model into a hybrid linear attention architecture.
- At 1,005 frames, ARL2 completes generation using 43 GB of memory, while Causal Forcing runs out of memory at over 91 GB.
- With 50% of layers replaced, ARL2 achieves the highest Quality Average (87.17) among all distilled models evaluated. It also improves temporal flickering by +0.99 and motion smoothness by +1.35 over its teacher.
- ARL2 has 1.3B parameters and matches the quality of MAGI-1, which has 4.5B parameters.
- ARL2 underperforms its teacher on spatial relationships and human actions. This gap is attributed to the limited capacity of a fixed-size recurrent state.
- ARL2's gains in temporal flickering and motion smoothness are attributed to the recurrent state. The state provides more stable long-range context than a KV cache that grows linearly with frame count.
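The following is a minimal NumPy sketch of the two-branch idea in the first claim. Frames are processed one at a time. Softmax attention runs among the tokens of the current frame. A linear-attention state carries context from earlier frames.

Several details are assumptions, not taken from the source:
- the elu+1 feature map,
- single-head random projections with no output projection,
- the memory branch reading only from previous frames,
- fusing the two branches with a plain sum.

The class name `HybridFrameAttention` is hypothetical. This illustrates the general technique, not ARL2's actual implementation.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: a common positive feature map for linear attention.
    # Assumption: ARL2's actual feature map is not stated in the source.
    return np.where(x > 0, x + 1.0, np.exp(x))


class HybridFrameAttention:
    """Hypothetical hybrid layer: softmax attention within a frame plus a
    fixed-size linear-attention state carried across frames."""

    def __init__(self, d_model, d_head, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        # Random projections stand in for pretrained weights (illustration only).
        self.wq = rng.normal(0.0, scale, (d_model, d_head))
        self.wk = rng.normal(0.0, scale, (d_model, d_head))
        self.wv = rng.normal(0.0, scale, (d_model, d_head))
        self.d_head = d_head
        # Recurrent memory: its size depends on d_head only, never on frame count.
        self.state = np.zeros((d_head, d_head))  # sum of phi(k)^T v over past frames
        self.norm = np.zeros(d_head)             # sum of phi(k) over past frames

    def step(self, frame_tokens):
        """Process one frame of shape (tokens, d_model); return (tokens, d_head)."""
        q = frame_tokens @ self.wq
        k = frame_tokens @ self.wk
        v = frame_tokens @ self.wv

        # Local branch: full softmax attention among the current frame's tokens.
        logits = q @ k.T / np.sqrt(self.d_head)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        local_out = weights @ v

        # Memory branch: linear-attention read from previous frames' state.
        # On the first frame the state is zero, so this branch contributes zero.
        phi_q = feature_map(q)
        denom = phi_q @ self.norm + 1e-6
        memory_out = (phi_q @ self.state) / denom[:, None]

        # Fold this frame's keys and values into the fixed-size state.
        phi_k = feature_map(k)
        self.state += phi_k.T @ v
        self.norm += phi_k.sum(axis=0)

        # Assumed fusion rule: a plain sum of the two branches.
        return local_out + memory_out


rng = np.random.default_rng(1)
layer = HybridFrameAttention(d_model=16, d_head=8)
for _ in range(5):
    out = layer.step(rng.normal(size=(4, 16)))  # 4 tokens per frame
print(out.shape, layer.state.shape)  # (4, 8) (8, 8)
```

The property the sketch is meant to highlight is the memory shape. However many frames are processed, the cross-frame memory stays a `(d_head, d_head)` matrix plus a length-`d_head` normalizer. A causal KV cache, by contrast, would gain a key and a value for every token of every new frame.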
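The memory claims rest on a simple scaling argument. A causal KV cache grows linearly with the number of generated frames, while a linear-attention state has a fixed size. The back-of-the-envelope estimate below makes that arithmetic concrete, counting attention memory only.

Everything in it is hypothetical:
- the layer count, tokens per frame, and head sizes,
- 2-byte elements,
- the assumption that, under partial layer replacement, the non-replaced layers keep a full causal KV cache while hybrid layers keep only the current frame's keys and values plus the fixed state.

The function names are illustrative. The output is not meant to reproduce the 43 GB and 91 GB figures reported for ARL2 and Causal Forcing.

```python
def kv_cache_elems(frames, tokens_per_frame, d_model):
    """Elements in one layer's causal KV cache: a key and a value per cached token."""
    return 2 * frames * tokens_per_frame * d_model


def recurrent_state_elems(n_heads, head_dim):
    """Elements in one layer's linear-attention memory: a head_dim x head_dim
    matrix plus a length-head_dim normalizer for each head."""
    return n_heads * (head_dim * head_dim + head_dim)


def attention_memory_gb(frames, n_layers, replaced_fraction, tokens_per_frame,
                        n_heads, head_dim, bytes_per_elem=2):
    """Hypothetical attention-memory estimate; weights and activations are ignored.

    Assumption: full-attention layers cache K/V for every generated frame, while
    hybrid layers keep only the current frame's K/V plus a fixed-size state.
    """
    d_model = n_heads * head_dim
    n_hybrid = round(n_layers * replaced_fraction)
    n_full = n_layers - n_hybrid
    full = n_full * kv_cache_elems(frames, tokens_per_frame, d_model)
    hybrid = n_hybrid * (kv_cache_elems(1, tokens_per_frame, d_model)
                         + recurrent_state_elems(n_heads, head_dim))
    return (full + hybrid) * bytes_per_elem / 1e9


# Hypothetical configuration, chosen only for illustration.
config = dict(n_layers=30, tokens_per_frame=1000, n_heads=12, head_dim=128)
for frames in (100, 500, 1000):
    row = [attention_memory_gb(frames, replaced_fraction=f, **config)
           for f in (0.0, 0.5, 1.0)]
    print(frames, ["%.1f GB" % gb for gb in row])
```

Under these assumptions:
- Fully replacing the layers makes attention memory independent of frame count.
- Replacing half of them still grows linearly, but at roughly half the rate of a pure causal KV cache.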