Empirical Evaluation
Cross-source consensus on Empirical Evaluation from 1 source and 6 claims.
Highlighted claims
- Evaluation used Math-500, AMC23, GSM8K, Minerva Math, and OlympiadBench. — Gradient Extrapolation-Based Policy Optimization
- All evaluated methods used identical prompts, rewards, decoding, KL penalty, and GRPO loss, with only the policy-update rule changed. — Gradient Extrapolation-Based Policy Optimization
- GXPO consistently improved average sampled pass@1 over GRPO in the reported experiments. — Gradient Extrapolation-Based Policy Optimization
- Across reported configurations, GXPO improved average results over GRPO by 1.65 to 5.00 points depending on the model group. — Gradient Extrapolation-Based Policy Optimization
- Compute-controlled ablations showed GXPO outperforming GRPO and SFPO at matched backward-pass budgets on three math benchmarks. — Gradient Extrapolation-Based Policy Optimization
- At matched wall-clock checkpoints, GXPO averaged higher than GRPO and SFPO in the reported Llama3.2-3B ablation. — Gradient Extrapolation-Based Policy Optimization
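The claims above report results in terms of average sampled pass@1. As a point of reference, a minimal sketch of the standard sampled pass@1 metric is below: for each problem, take the fraction of sampled completions that are correct, then average across problems. This is an illustration of the conventional metric, not the paper's exact evaluation harness; the function name and input format are assumptions for this sketch.

```python
def sampled_pass_at_1(per_problem_results):
    """Average sampled pass@1 over a benchmark.

    per_problem_results: list where each element is a list of booleans,
    one per sampled completion for that problem (True = correct answer).
    Returns the mean, over problems, of the per-problem fraction correct.
    """
    if not per_problem_results:
        raise ValueError("need at least one problem")
    per_problem_rates = [
        sum(samples) / len(samples) for samples in per_problem_results
    ]
    return sum(per_problem_rates) / len(per_problem_rates)


# Example: two problems, each with 4 sampled completions.
scores = sampled_pass_at_1([
    [True, True, False, False],   # problem 1: 2/4 correct
    [True, True, True, False],    # problem 2: 3/4 correct
])
print(scores)  # 0.625
```

Reported improvements such as "1.65 to 5.00 points" would then correspond to differences in this quantity, expressed on a 0-100 scale.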