Mathematical Reasoning
Cross-source consensus on Mathematical Reasoning, drawn from 1 source and 6 claims.
Highlighted claims
- Math results are limited to the locally curated AceReason-Math-Subset rather than unfiltered AceReason-Math. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- The mathematical-reasoning experiments trained Qwen3-4B-Instruct-2507 and Qwen3-8B with thinking mode disabled and evaluated them on AIME 2025. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- The most detailed mechanism audits came from the 4B mathematical-reasoning run. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- The AceReason-Math-Subset was constructed by excluding problems with an empirical pass rate of at least 75%. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- On AIME 2025, the 4B model reached the baseline's peak level earlier, with a 7.3-percentage-point gain over the baseline at the same training step. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- On AIME 2025, the 8B model reached the baseline's peak level earlier, with a 10.8-percentage-point gain over the baseline at the same training step. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
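The subset construction claim describes a pass-rate filter. The sketch below shows what such a filter could look like, under stated assumptions: each problem has a list of binary rollout rewards (1 = correct, 0 = incorrect), the empirical pass rate is their mean, and problems at or above 0.75 are dropped. The function names, the dictionary layout, the four rollouts per problem, and the example data are all hypothetical; the source does not say how many rollouts or which model produced the pass rates.

```python
from typing import Dict, List

# Exclusion threshold from the claim: drop problems with pass rate >= 75%.
PASS_RATE_CUTOFF = 0.75


def empirical_pass_rate(rewards: List[int]) -> float:
    """Fraction of binary rollout rewards (0 or 1) that equal 1."""
    if not rewards:
        raise ValueError("need at least one rollout per problem")
    return sum(rewards) / len(rewards)


def build_subset(rollout_rewards: Dict[str, List[int]],
                 cutoff: float = PASS_RATE_CUTOFF) -> List[str]:
    """Hypothetical helper: keep problem IDs whose pass rate is strictly below cutoff."""
    return [pid for pid, rewards in rollout_rewards.items()
            if empirical_pass_rate(rewards) < cutoff]


if __name__ == "__main__":
    # Hypothetical outcomes for 4 sampled attempts per problem.
    rewards = {
        "p1": [1, 1, 1, 1],  # pass rate 1.00 -> excluded
        "p2": [1, 1, 1, 0],  # pass rate 0.75 -> excluded ("at least 75%")
        "p3": [1, 0, 1, 0],  # pass rate 0.50 -> kept
        "p4": [0, 0, 0, 0],  # pass rate 0.00 -> kept
    }
    print(build_subset(rewards))  # ['p3', 'p4']
```

Note that the boundary case (exactly 75%) is excluded, matching "at least 75%". This sketch keeps problems that no rollout solved, since the claim mentions only an upper cutoff.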
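The AIME 2025 claims combine two measurements: how early a run matches the baseline's peak, and the gap between the two runs at that step. One plausible reading, which is an assumption here and not a definition confirmed by the source, is sketched below. Both runs are evaluated at the same checkpoints, scores are in percent, and the "same-step gain" is the treated score minus the baseline score at the first checkpoint where the treated run reaches the baseline peak. The function names and the example curves are hypothetical and are not results from the paper.

```python
from typing import Optional, Sequence, Tuple


def first_step_reaching(scores: Sequence[float], target: float) -> Optional[int]:
    """Index of the first checkpoint whose score is >= target, or None."""
    for i, s in enumerate(scores):
        if s >= target:
            return i
    return None


def same_step_gain(baseline: Sequence[float],
                   treated: Sequence[float]) -> Tuple[Optional[int], Optional[float]]:
    """Assumed metric: (checkpoint where treated first reaches baseline peak,
    treated minus baseline score at that same checkpoint, in points)."""
    if len(baseline) != len(treated):
        raise ValueError("curves must share the same evaluation checkpoints")
    peak = max(baseline)
    step = first_step_reaching(treated, peak)
    if step is None:
        return None, None
    return step, treated[step] - baseline[step]


if __name__ == "__main__":
    # Hypothetical accuracy curves (percent) at five shared checkpoints.
    baseline = [10.0, 15.0, 20.0, 25.0, 24.0]  # peak 25.0 at index 3
    treated = [12.0, 20.0, 26.0, 27.0, 28.0]   # first >= 25.0 at index 2
    print(same_step_gain(baseline, treated))   # (2, 6.0)
```

Under this reading, reaching the peak "earlier" means the returned index is smaller than the baseline's own peak index, and the gain is measured in percentage points rather than as a relative percentage.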