Theoretical Analysis
Cross-source consensus on the theoretical analysis, drawn from 1 source and 5 claims.
Highlighted claims
- The theoretical analysis uses a plain gradient descent (GD) surrogate to explain the geometry of gradient extrapolation; it does not model AdamW optimizer-state dynamics. — Gradient Extrapolation-Based Policy Optimization
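- With a diagonal Hessian $H = \operatorname{diag}(h_1, \dots, h_d)$ and step size $\eta$, each gradient coordinate decays exactly geometrically: $g_{k,i} = (1 - \eta h_i)^k\, g_{0,i}$. — Gradient Extrapolation-Based Policy Optimization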
- In a diagonal-quadratic sanity check, idealized GXPO reaches the same point as $K+1$ plain-GD steps while using only three backward passes. — Gradient Extrapolation-Based Policy Optimization
- The surrogate theory explains why extrapolation is exact for diagonal quadratic losses and identifies four failure modes: coupling between coordinates, error in the estimated decay ratio, inactive coordinates, and the Taylor remainder. — Gradient Extrapolation-Based Policy Optimization
- Under a local quadratic model with fixed Hessian $H$, plain gradient descent with step size $\eta$ produces gradients that evolve by repeated multiplication by $I - \eta H$: $g_{k+1} = (I - \eta H)\, g_k$, hence $g_k = (I - \eta H)^k g_0$. — Gradient Extrapolation-Based Policy Optimization
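The geometric-decay and exactness claims follow from a short derivation under the surrogate's assumptions: a fixed local Hessian $H$, step size $\eta$, gradient $g_k$ at iterate $\theta_k$, and $K+1$ plain-GD steps. The indexing and symbols here are this summary's illustration, not notation confirmed by the source.

```latex
% Local quadratic model with fixed H: \nabla L(\theta') = \nabla L(\theta) + H(\theta' - \theta)
\begin{aligned}
\theta_{k+1} &= \theta_k - \eta\, g_k, \qquad g_k = \nabla L(\theta_k) \\
g_{k+1} &= g_k + H(\theta_{k+1} - \theta_k) = (I - \eta H)\, g_k
  \;\Longrightarrow\; g_k = (I - \eta H)^k g_0 \\
H = \operatorname{diag}(h_1, \dots, h_d)
  &\;\Longrightarrow\; g_{k,i} = r_i^{\,k}\, g_{0,i}, \qquad r_i = 1 - \eta h_i \\
\theta_{K+1,i} &= \theta_{0,i} - \eta \sum_{k=0}^{K} g_{k,i}
  = \theta_{0,i} - \eta\, g_{0,i}\, \frac{1 - r_i^{\,K+1}}{1 - r_i}
  \qquad (r_i \neq 1)
\end{aligned}
```

If every ratio $r_i$ is known, the last line gives the $(K+1)$-step point in closed form, which is the sense in which extrapolation can be exact on a diagonal quadratic. For a non-diagonal $H$, the power $(I - \eta H)^k$ mixes coordinates, so no per-coordinate ratio describes the trajectory; this corresponds to the coupling failure mode listed above.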
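The exactness claim can also be checked numerically. This is a minimal sketch, assuming a hypothetical diagonal quadratic loss $L(\theta) = \tfrac12 \sum_i h_i(\theta_i - \theta^\star_i)^2$; the helpers `grad`, `plain_gd`, and `extrapolate` are illustrative names, not part of GXPO. The claims do not say how GXPO spends its three backward passes, so this sketch uses only two gradient evaluations (one for $g_0$, one after a single GD step to estimate each ratio $r_i = g_{1,i}/g_{0,i}$) and then sums the geometric series in closed form. It models neither AdamW state nor the policy-optimization objective.

```python
# Hypothetical sketch, not GXPO's implementation.
# Assumed loss: L(theta) = 0.5 * sum_i h[i] * (theta[i] - opt[i])**2,
# so grad_i = h[i] * (theta[i] - opt[i]) and the Hessian is diag(h).


def grad(theta, h, opt):
    """Exact gradient of the assumed diagonal quadratic (one 'backward pass')."""
    return [hi * (t - o) for t, hi, o in zip(theta, h, opt)]


def plain_gd(theta, h, opt, eta, steps):
    """Reference trajectory: `steps` plain gradient-descent updates."""
    theta = list(theta)
    for _ in range(steps):
        g = grad(theta, h, opt)
        theta = [t - eta * gi for t, gi in zip(theta, g)]
    return theta


def extrapolate(theta0, h, opt, eta, steps):
    """Jump to the `steps`-step GD point using two gradient evaluations.

    Estimates r_i = g1_i / g0_i (equal to 1 - eta * h_i on this loss) and
    applies theta_N = theta_0 - eta * g0 * (1 - r**N) / (1 - r) per coordinate.
    """
    g0 = grad(theta0, h, opt)
    theta1 = [t - eta * gi for t, gi in zip(theta0, g0)]
    g1 = grad(theta1, h, opt)
    out = []
    for t, a, b in zip(theta0, g0, g1):
        if a == 0.0:
            # Zero initial gradient: on this loss it stays zero, so no movement.
            out.append(t)
            continue
        r = b / a  # estimated per-coordinate decay ratio
        if r == 1.0:
            total = steps * a  # constant gradient: the sum is N * g0
        else:
            total = a * (1.0 - r ** steps) / (1.0 - r)  # geometric series sum
        out.append(t - eta * total)
    return out


# Usage: compare against K + 1 plain-GD steps (K = 9 here).
h = [1.0, 0.5, 2.0, 3.0]
opt = [0.0, 3.0, -1.0, 5.0]
theta0 = [2.0, 0.0, 1.0, 5.0]  # last coordinate starts at its optimum
eta, K = 0.1, 9

gd = plain_gd(theta0, h, opt, eta, K + 1)
ex = extrapolate(theta0, h, opt, eta, K + 1)
print(max(abs(x - y) for x, y in zip(gd, ex)))  # expected at rounding-error level
```

On this loss the two points agree up to floating-point rounding. Adding an off-diagonal Hessian term or a non-quadratic loss would make the single-pair ratio estimate inaccurate and the extrapolated point drift from plain GD. Treating a zero initial gradient as an "inactive coordinate" is an assumption of this sketch, not the source's definition.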