Mathematical Reasoning

Cross-source consensus on Mathematical Reasoning from 1 source and 6 claims.

Benefits

Math results are limited to the locally curated AceReason-Math-Subset rather than unfiltered AceReason-Math. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
The mathematical-reasoning experiments trained Qwen3-4B-Instruct-2507 and Qwen3-8B without thinking mode and evaluated on AIME 2025. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
The most detailed mechanism audits came from the 4B mathematical-reasoning run. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
The AceReason-Math-Subset was constructed by excluding problems with an empirical pass rate of at least 75%. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
On AIME 2025, the 4B model reached the baseline's peak level earlier, with a same-step gain of 7.3 percentage points. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
On AIME 2025, the 8B model reached the baseline's peak level earlier, with a same-step gain of 10.8 percentage points. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
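
The subset-construction claim above (excluding problems with an empirical pass rate of at least 75%) can be illustrated with a minimal sketch. This is not the source's pipeline: the function names, the toy data, and the assumption that pass rate is the fraction of sampled rollouts graded correct are all hypothetical and serve only to show what the filter rule looks like.

```python
from typing import Dict, List


def empirical_pass_rate(outcomes: List[bool]) -> float:
    """Fraction of sampled rollouts graded correct (assumed definition)."""
    if not outcomes:
        raise ValueError("need at least one rollout outcome")
    return sum(outcomes) / len(outcomes)


def filter_by_pass_rate(
    rollouts: Dict[str, List[bool]], threshold: float = 0.75
) -> List[str]:
    """Return problem ids whose pass rate is strictly below the threshold.

    Problems at or above the threshold are excluded, matching the
    "at least 75%" wording of the claim.
    """
    return [
        problem_id
        for problem_id, outcomes in rollouts.items()
        if empirical_pass_rate(outcomes) < threshold
    ]


# Hypothetical toy data: problem id -> per-rollout correctness.
rollouts = {
    "p1": [True, True, True, True],      # 1.00, excluded
    "p2": [True, True, True, False],     # 0.75, excluded (at least 75%)
    "p3": [True, False, False, False],   # 0.25, kept
    "p4": [False, False, False, False],  # 0.00, kept
}

kept = filter_by_pass_rate(rollouts)
print(kept)  # ['p3', 'p4']
```

The comparison is strict (`<`) so that a problem sitting exactly at 75% is dropped. How many rollouts per problem were sampled, and with which model, is not stated in the claims above.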