Binary-Reward RLVR
Cross-source consensus on Binary-Reward RLVR from 1 source and 5 claims.
Highlighted claims
- Claims are limited to binary-reward RLVR with grouped rollouts. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- The paper addresses inefficiency in reinforcement learning with verifiable binary rewards. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- The paper positions pass-rate control as a practical efficiency objective for binary-reward RLVR. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- The learning signal in binary-reward reinforcement learning is strongest when a rollout group contains substantial success-failure contrast, that is, a mix of passing and failing rollouts. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Prefix Sampling increases update-bearing groups, reward entropy, contrastive structure, and RLOO advantage energy under the same rollout budget. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
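
The claim about success-failure contrast can be illustrated with a small sketch. It is not the paper's code. It assumes the standard leave-one-out (RLOO) baseline, where each rollout's advantage is its reward minus the mean reward of the other rollouts in the group. It also assumes that "advantage energy" means the sum of squared advantages and that "reward entropy" means the binary entropy of the group pass rate. The function names are hypothetical.

```python
import math


def rloo_advantages(rewards):
    """Leave-one-out advantages: reward minus the mean of the other rewards."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two rollouts per group")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def advantage_energy(rewards):
    """Assumed definition: sum of squared RLOO advantages in the group."""
    return sum(a * a for a in rloo_advantages(rewards))


def reward_entropy(rewards):
    """Assumed definition: binary entropy (in bits) of the group pass rate."""
    p = sum(rewards) / len(rewards)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


if __name__ == "__main__":
    n = 8  # rollouts per group (illustrative value)
    for k in range(n + 1):  # k = number of successful rollouts
        group = [1.0] * k + [0.0] * (n - k)
        print(k, round(reward_entropy(group), 3), round(advantage_energy(group), 3))
```

Under these assumptions, a group with k successes out of n has advantage energy n·k·(n − k)/(n − 1)². This is zero when every rollout passes or every rollout fails, so such a group carries no update, and it peaks when the pass rate is near one half. This matches the claim that steering the pass rate toward mixed groups makes more of a fixed rollout budget update-bearing.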