Structural Gradient Asymmetry
Cross-source consensus on Structural Gradient Asymmetry, drawn from 1 source and 4 claims.
Highlighted claims
- The measured structural constant stays below 0.1 at the median and below 0.5 at the 95th percentile across measured models. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- Theorem 1 bounds intermediate-layer gradient magnitude relative to lm_head gradient magnitude using an architectural constant. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- The structural constant remains stable during RL training, including at the point where training collapses under naive sample reuse. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- The paper argues structural gradient asymmetry may be a general architectural property of pre-norm Transformers rather than a model-specific artifact. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
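The claims above describe a ratio of gradient magnitudes that is summarized across models and tracked over training. The sketch below shows how such a summary could be computed, assuming a hypothetical proxy for the structural constant: each intermediate layer's gradient norm divided by the lm_head gradient norm, pooled across layers and models. The paper's exact definition may differ. The function names (structural_ratios, summarize, within_reported_bounds, max_relative_drift) and the drift criterion are illustrative assumptions, not taken from the source; only the 0.1 median and 0.5 95th-percentile thresholds come from the claims.

```python
import statistics
from typing import Dict, List


def structural_ratios(layer_grad_norms: List[float], lm_head_grad_norm: float) -> List[float]:
    """Hypothetical proxy for the structural constant: each intermediate
    layer's gradient norm divided by the lm_head gradient norm.
    This is an assumption, not the paper's exact definition."""
    if lm_head_grad_norm <= 0:
        raise ValueError("lm_head gradient norm must be positive")
    return [g / lm_head_grad_norm for g in layer_grad_norms]


def summarize(ratios: List[float]) -> Dict[str, float]:
    """Median and 95th percentile of pooled ratios.
    The 95th percentile is the 19th of 20 cut points, using inclusive
    linear interpolation; needs at least two values."""
    p95 = statistics.quantiles(ratios, n=20, method="inclusive")[18]
    return {"median": statistics.median(ratios), "p95": p95}


def within_reported_bounds(summary: Dict[str, float],
                           median_bound: float = 0.1,
                           p95_bound: float = 0.5) -> bool:
    """Check a summary against the thresholds stated in the claims
    (below 0.1 at the median, below 0.5 at the 95th percentile)."""
    return summary["median"] < median_bound and summary["p95"] < p95_bound


def max_relative_drift(values_over_training: List[float]) -> float:
    """Hypothetical stability measure: largest relative deviation of a
    tracked value (e.g. the median ratio) from its first checkpoint."""
    if not values_over_training:
        raise ValueError("need at least one checkpoint")
    baseline = values_over_training[0]
    if baseline == 0:
        raise ValueError("baseline value must be nonzero")
    return max(abs(v - baseline) / abs(baseline) for v in values_over_training)
```

For example, synthetic layer norms [0.04, 0.06, 0.08, 0.10, 0.50] with an lm_head norm of 2.0 (illustrative numbers, not measurements from the paper) give ratios whose median is 0.04 and whose inclusive 95th percentile is 0.21, so both fall under the reported thresholds. Pooling layers and models into one list is a simplifying choice; a per-model summary would be an equally reasonable reading of "across measured models".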