Adaptive Control Ablations
Cross-source consensus on Adaptive Control Ablations from 1 source and 6 claims.
Highlighted claims
- PS-ada produced bucket-level rerollout pass rates near 0.5 across the 1/8, 2/8, 6/8, and 7/8 buckets. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Having more valid groups per step did not by itself determine performance. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Adaptive control kept rerollout pass rates closer to the 50% target. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Replay produced the largest raw reduction in convergence steps in the 4B math ablations. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Fixed prefix replay left the 1/8 bucket too hard and the 7/8 bucket too easy. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Hard-only adaptation controlled hard buckets but ignored easy buckets. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
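
The claims above treat a 50% pass rate as the target regime for binary-reward training. A minimal sketch of why that target is natural: a binary reward with pass rate p has variance p(1 - p), which peaks at p = 0.5. The sketch also computes the chance that a group of rollouts contains both a pass and a fail. Reading "valid group" as a mixed-outcome group, using 8 rollouts per group (suggested by the k/8 bucket labels), and the function names are all assumptions for illustration, not details confirmed by the source.

```python
GROUP_SIZE = 8  # assumed rollouts per prompt, inferred from the k/8 bucket labels


def reward_variance(p: float) -> float:
    """Variance of a Bernoulli(p) binary reward."""
    return p * (1.0 - p)


def mixed_group_probability(p: float, group_size: int = GROUP_SIZE) -> float:
    """Probability that a group has at least one pass and at least one fail.

    Assumes independent rollouts with the same pass rate p.
    """
    return 1.0 - p ** group_size - (1.0 - p) ** group_size


def summarize(pass_rates):
    """Return (pass rate, reward variance, mixed-group probability) per bucket."""
    return [(p, reward_variance(p), mixed_group_probability(p)) for p in pass_rates]


if __name__ == "__main__":
    # Buckets named in the claims, plus the 4/8 target for comparison.
    for p, var, mixed in summarize([1 / 8, 2 / 8, 4 / 8, 6 / 8, 7 / 8]):
        print(f"pass rate {p:.3f}: variance {var:.4f}, P(mixed group) {mixed:.4f}")
```

Under these assumptions the 1/8 and 7/8 buckets carry the same reward variance, well below the value at 4/8, which is consistent with the claims that an uncontrolled 1/8 bucket stays too hard and a 7/8 bucket too easy. The mixed-group probability is already high away from the extremes, one possible reading of why more valid groups per step did not by itself determine performance.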