Empirical Evaluation
Cross-source consensus on Empirical Evaluation from 1 source and 6 claims.
Highlighted claims
- Evaluation used Math-500, AMC23, GSM8K, Minerva Math, and OlympiadBench. — Gradient Extrapolation-Based Policy Optimization
- All evaluated methods used identical prompts, rewards, decoding, KL penalty, and GRPO loss, with only the policy-update rule changed. — Gradient Extrapolation-Based Policy Optimization
- GXPO consistently improved average sampled pass@1 over GRPO in the reported experiments. — Gradient Extrapolation-Based Policy Optimization
- Across reported configurations, GXPO improved average results over GRPO by 1.65 to 5.00 points depending on the model group. — Gradient Extrapolation-Based Policy Optimization
- Compute-controlled ablations showed GXPO outperforming GRPO and SFPO at matched backward-pass budgets on three math benchmarks. — Gradient Extrapolation-Based Policy Optimization
- At matched wall-clock checkpoints, GXPO averaged higher than GRPO and SFPO in the reported Llama3.2-3B ablation. — Gradient Extrapolation-Based Policy Optimization
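The claims above report results in terms of average sampled pass@1. As a point of reference, a minimal sketch of the standard sampled pass@1 metric is below: for each problem, take the fraction of sampled completions that are correct, then average across problems. This is an illustration of the conventional metric, not the paper's exact evaluation harness; the function name and input format are assumptions for this sketch.

```python
def sampled_pass_at_1(per_problem_results):
    """Average sampled pass@1 over a benchmark.

    per_problem_results: list where each element is a list of booleans,
    one per sampled completion for that problem (True = correct answer).
    Returns the mean, over problems, of the per-problem fraction correct.
    """
    if not per_problem_results:
        raise ValueError("need at least one problem")
    per_problem_rates = [
        sum(samples) / len(samples) for samples in per_problem_results
    ]
    return sum(per_problem_rates) / len(per_problem_rates)


# Example: two problems, each with 4 sampled completions.
scores = sampled_pass_at_1([
    [True, True, False, False],   # problem 1: 2/4 correct
    [True, True, True, False],    # problem 2: 3/4 correct
])
print(scores)  # 0.625
```

Reported improvements such as "1.65 to 5.00 points" would then correspond to differences in this quantity, expressed on a 0-100 scale.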