lm_head Gradient Norm
Cross-source consensus on lm_head Gradient Norm from 1 source and 4 claims.
Highlighted claims
- Among evaluated signals, only the lm_head gradient norm produced a sharp spike synchronized with collapse onset. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- The lm_head gradient norm lower-bounds empirical Pearson chi-squared divergence at the batch level. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- The lm_head gradient receives the raw error signal directly, while intermediate layer gradients are filtered through a Jacobian. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- A surge in lm_head gradient norm is interpreted as a certified early-warning indicator rather than a heuristic proxy. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
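The claim that the lm_head gradient receives the raw error signal directly can be illustrated with a small sketch. For logits computed as `hidden @ W`, the chain rule gives the gradient with respect to `W` as `hidden^T @ dLoss/dlogits`, with no intermediate Jacobian in between. The code below is only an illustration under stated assumptions: it uses softmax cross-entropy as a stand-in loss (the source concerns RLVR, whose objective differs), and every function name here is hypothetical rather than taken from the paper.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_logit_grad(logits, targets):
    # Stand-in error signal (assumption): (softmax(logits) - onehot) / B.
    b = len(logits)
    out = []
    for row, t in zip(logits, targets):
        p = softmax(row)
        p[t] -= 1.0
        out.append([x / b for x in p])
    return out

def lm_head_grad(hidden, logit_grad):
    # Gradient w.r.t. W for logits = hidden @ W, i.e. hidden^T @ logit_grad.
    # hidden is B x d, logit_grad is B x V, the result is d x V.
    d, v = len(hidden[0]), len(logit_grad[0])
    grad = [[0.0] * v for _ in range(d)]
    for h_row, e_row in zip(hidden, logit_grad):
        for i in range(d):
            for j in range(v):
                grad[i][j] += h_row[i] * e_row[j]
    return grad

def frobenius_norm(matrix):
    # Square root of the sum of squared entries.
    return math.sqrt(sum(x * x for row in matrix for x in row))

# Toy batch: 2 examples, hidden size 2, vocabulary size 3.
hidden = [[1.0, 0.0], [0.0, 2.0]]
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # zero weights give a uniform softmax
logits = [[sum(h[i] * W[i][j] for i in range(2)) for j in range(3)]
          for h in hidden]

err = cross_entropy_logit_grad(logits, targets=[0, 2])
grad = lm_head_grad(hidden, err)
norm = frobenius_norm(grad)
print(f"lm_head gradient norm: {norm:.4f}")
```

In a real training loop the same quantity would normally be read from the framework's stored gradient for the output projection after the backward pass; the explicit loops here only make the outer-product structure visible.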