SWE-bench Verified

Cross-source consensus on SWE-bench Verified from 1 source and 5 claims.

1 source · 5 claims

Benefits

The SWE-bench experiments trained Qwen3-14B and Qwen3-32B in thinking mode on R2E-Gym-Subset and evaluated on SWE-bench Verified. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
SWE-bench-style agentic reinforcement learning uses long, stateful, interactive, and expensive rollouts. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
On SWE-bench Verified, Prefix Sampling improved the 14B model's peak Pass@1 over the same-step baseline by 4.7 percentage points. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
On SWE-bench Verified, Prefix Sampling improved the 32B model's peak Pass@1 over the same-step baseline by 5.4 percentage points. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
Agentic Prefix Sampling can reduce wall-clock cost by replaying prefix execution rather than regenerating prefix text. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
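
The last claim, replaying prefix execution instead of regenerating prefix text, can be illustrated with a toy sketch. This is not the source paper's implementation: `ToyEnv`, `rollout_from_prefix`, and the policy callable are hypothetical names, and the environment is reduced to a list of executed actions. The only point it shows is that a recorded prefix can be re-executed to restore environment state without any model calls, so generation cost is paid only for the remaining steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyEnv:
    """Hypothetical stateful environment: state is the list of executed actions."""
    state: List[str] = field(default_factory=list)

    def step(self, action: str) -> None:
        self.state.append(action)


def rollout_from_prefix(
    recorded_actions: List[str],
    prefix_len: int,
    policy: Callable[[List[str]], str],
    max_steps: int,
) -> ToyEnv:
    """Replay a recorded prefix, then let the policy generate the rest."""
    env = ToyEnv()
    # Replay phase: re-execute recorded actions; the policy is never called here.
    for action in recorded_actions[:prefix_len]:
        env.step(action)
    # Continuation phase: the policy generates only the remaining steps.
    for _ in range(max_steps - prefix_len):
        env.step(policy(list(env.state)))
    return env


def make_counting_policy(counter: List[int]) -> Callable[[List[str]], str]:
    """Toy policy that counts how often it is called (a stand-in for model generation)."""
    def policy(state: List[str]) -> str:
        counter[0] += 1
        return f"gen-{len(state)}"
    return policy


calls = [0]
env = rollout_from_prefix(
    recorded_actions=["open", "edit", "test"],
    prefix_len=2,
    policy=make_counting_policy(calls),
    max_steps=4,
)
print(env.state, calls[0])
```

In this toy run, two of the four steps come from replay, so the policy is invoked only twice. In a real agentic setting the saving would depend on how expensive re-executing the prefix actions is relative to generating them, which this sketch does not model.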