Top-k Pruning

Cross-source consensus on Top-k Pruning from 1 source and 4 claims.

1 source · 4 claims

How it works

Inverse-based operators degraded sharply under train-and-test top-k masking. — Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
Raw one-hop attention can remain accurate with very small top-k masks in the tested setting. — Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
Pruning before the inverse can remove low-weight bridge edges that later matter after powers of the attention matrix densify. — Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
The blockwise method avoids naive row-pruning failure by preserving exact local inverse computation and using a reduced off-block system. — Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
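
The bridge-edge effect described in the claims above can be illustrated with a toy example. This is a minimal sketch, not the paper's code: the 4-token attention matrix, the helper names `topk_row_mask` and `resolvent`, the damping factor `alpha`, and the use of the resolvent (I - alpha*A)^{-1} as a stand-in for an "inverse-based operator" are all assumptions made for illustration.

```python
import numpy as np

def topk_row_mask(A, k):
    """Keep the k largest entries in each row of A and zero out the rest."""
    idx = np.argsort(A, axis=1)[:, -k:]          # column indices of the top-k per row
    mask = np.zeros_like(A, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask, A, 0.0)

def resolvent(A, alpha=0.9):
    """(I - alpha*A)^{-1}, which sums alpha^n * A^n over all path lengths n.

    Hypothetical stand-in for an inverse-based operator; the source does not
    specify this exact form.
    """
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - alpha * A)

# Toy causal attention: each token attends mostly to itself (0.8) and weakly
# (0.2) to its predecessor. The weak entries are the "bridge" edges that chain
# token 3 back to token 0 over three hops.
A = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.2, 0.8],
])

A_pruned = topk_row_mask(A, k=1)                 # top-1 mask drops every bridge edge

# One-hop view: the strongest target of each row survives pruning.
same_one_hop = np.array_equal(A.argmax(axis=1), A_pruned.argmax(axis=1))

# Multi-hop view: invert after pruning vs. invert the full matrix.
R_full = resolvent(A)
R_pruned = resolvent(A_pruned)

print("one-hop argmax unchanged:", same_one_hop)
print("token 3 -> token 0, full inverse:  ", R_full[3, 0])
print("token 3 -> token 0, pruned inverse:", R_pruned[3, 0])
```

In this toy setting the top-1 mask leaves each row's strongest one-hop target unchanged, yet the pruned inverse assigns zero weight from token 3 to token 0, while the full inverse assigns a positive weight because the low-weight edges compound across powers of the matrix. Pruning the inverse itself, rather than its input, would be a different design and is not shown here.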