Gated Delta Network
Cross-source consensus on Gated Delta Network from 1 sources and 4 claims.
1 sources · 4 claims
How it works
Evidence quality
Highlighted claims
- Without restricting state updates to clean frames only, noisy denoising intermediates written into the recurrent state corrupt accumulated memory in a way that compounds with video length. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
- Without frame-level state access, tokens within the same frame observe different versions of accumulated history depending on processing order, causing an asymmetry that has no counterpart in language modeling. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
- The Gated Delta Network is selected for the inter-frame branch because its forget gate enables selective decay of stale context and its delta rule enables error-correction updates. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
- Frame-level state access matches full-softmax teacher performance, whereas token-level access degrades average quality. — Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion