Stability Diagnostics
Cross-source consensus on Stability Diagnostics from 1 source and 6 claims.
Highlighted claims
- GXPO maintains a rolling buffer of corrective-gradient norms and computes the current norm's z-score against that buffer before adding the norm to it. — Gradient Extrapolation-Based Policy Optimization
- If the corrective-gradient norm is unusually large, GXPO treats the lookahead signal as unreliable and falls back to GRPO. — Gradient Extrapolation-Based Policy Optimization
- KL and clipping diagnostics indicated that GXPO repositioning did not substantially increase clipping. — Gradient Extrapolation-Based Policy Optimization
- Surrogate diagnostics showed that extrapolated-point absolute error increased with K but remained small. — Gradient Extrapolation-Based Policy Optimization
- Median active cosine similarity between the initial and corrective gradients stayed approximately stable across K values. — Gradient Extrapolation-Based Policy Optimization
- Mean KL penalties were higher for GXPO, especially at larger K, but remained small in absolute terms. — Gradient Extrapolation-Based Policy Optimization
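
The rolling-buffer check and GRPO fallback described in the first two claims can be sketched roughly as follows. This is a minimal illustration, not the source's implementation: the class name `CorrectiveNormGate`, the window size, the z-score threshold, and the minimum history length are all assumptions, since the claims above do not state them. The sketch also assumes a one-sided test, because the claims only mention unusually large norms.

```python
from collections import deque
import math


class CorrectiveNormGate:
    """Rolling z-score gate over corrective-gradient norms (illustrative only)."""

    def __init__(self, window=50, z_threshold=3.0, min_history=5):
        # window, z_threshold, and min_history are assumed values,
        # not taken from the source.
        self.buffer = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_history = min_history

    def z_score(self, norm):
        """z-score of `norm` against the buffered history, or None if history is too short."""
        n = len(self.buffer)
        if n < self.min_history:
            return None
        mean = sum(self.buffer) / n
        var = sum((x - mean) ** 2 for x in self.buffer) / n
        return (norm - mean) / (math.sqrt(var) + 1e-8)

    def use_lookahead(self, norm):
        """Return True to keep the lookahead update, False to fall back to GRPO."""
        z = self.z_score(norm)    # scored before the norm enters the buffer
        self.buffer.append(norm)  # then recorded for future steps
        # One-sided test: only unusually large norms trigger the fallback.
        return z is None or z <= self.z_threshold


# Usage: stable norms keep the lookahead; a sudden spike triggers the fallback.
gate = CorrectiveNormGate(window=20, z_threshold=3.0, min_history=5)
for norm in [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 25.0]:
    mode = "lookahead" if gate.use_lookahead(norm) else "GRPO fallback"
    print(norm, mode)
```

Scoring the norm before appending it keeps an outlier from inflating the mean and standard deviation it is judged against, which is one plausible reason for the ordering the first claim describes.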