Top-k Pruning
Cross-source consensus on Top-k Pruning from 1 source and 4 claims.
Highlighted claims
- Inverse-based operators degraded sharply under train-and-test top-k masking. — Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
- Raw one-hop attention can remain accurate with very small top-k masks in the tested setting. — Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
- Pruning before the inverse can remove low-weight bridge edges that later matter after powers of the attention matrix densify. — Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
- The blockwise method avoids naive row-pruning failure by preserving exact local inverse computation and using a reduced off-block system. — Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
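
The claims above can be illustrated with a toy sketch. It is not the cited paper's setup: the 4-token matrix `A`, the helper names `topk_row_mask` and `resolvent`, and the choice of `(I - alpha*A)^{-1}` as the "inverse-based operator" are all assumptions made for illustration. The sketch shows that row-wise top-k masking can leave the one-hop argmax intact while cutting a low-weight bridge edge, so the inverse computed afterwards loses all cross-cluster reachability.

```python
import numpy as np

def topk_row_mask(A, k):
    """Keep the k largest entries in each row of A and zero the rest."""
    # argpartition places the indices of the k largest entries in the last k slots
    idx = np.argpartition(A, -k, axis=1)[:, -k:]
    rows = np.arange(A.shape[0])[:, None]
    pruned = np.zeros_like(A)
    pruned[rows, idx] = A[rows, idx]
    return pruned

def resolvent(A, alpha=0.5):
    """Assumed inverse-based operator: (I - alpha*A)^{-1}, i.e. the sum of (alpha*A)^n."""
    return np.linalg.inv(np.eye(A.shape[0]) - alpha * A)

# Hypothetical row-stochastic attention over two clusters, {0, 1} and {2, 3},
# joined only by low-weight bridge edges 0 -> 2 and 1 -> 3.
A = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

pruned = topk_row_mask(A, k=1)

# One-hop view: the strongest target of every token survives the mask.
same_argmax = bool((A.argmax(axis=1) == pruned.argmax(axis=1)).all())

# Multi-hop view: influence of token 3 on token 0 through the inverse.
full_reach = resolvent(A)[0, 3]         # nonzero: paths 0 -> 2 -> 3 and 0 -> 1 -> 3
pruned_reach = resolvent(pruned)[0, 3]  # zero: both bridge edges were masked out

print("one-hop argmax preserved:", same_argmax)
print("reach 0 -> 3, prune after inverse input intact:", full_reach)
print("reach 0 -> 3, prune before inverse:", pruned_reach)
```

In this toy case the pruned matrix is block diagonal, so its inverse is block diagonal too, and no power of the pruned matrix can recover the cross-cluster path. Whether the same mechanism explains the degradation reported in the source depends on the operator and masks actually used there.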