SWE-bench Verified
Cross-source consensus on SWE-bench Verified, drawn from 1 source and 5 claims.
Highlighted claims
- The SWE-bench experiments trained Qwen3-14B and Qwen3-32B in thinking mode on R2E-Gym-Subset and evaluated on SWE-bench Verified. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- SWE-bench-style agentic reinforcement learning uses long, stateful, interactive, and expensive rollouts. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- On SWE-bench Verified, Prefix Sampling improved the 14B model's peak Pass@1 over the same-step baseline by 4.7 percentage points. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- On SWE-bench Verified, Prefix Sampling improved the 32B model's peak Pass@1 over the same-step baseline by 5.4 percentage points. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Agentic Prefix Sampling can reduce wall-clock cost by replaying prefix execution rather than regenerating prefix text. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime