Binary-Reward RLVR

Cross-source consensus on Binary-Reward RLVR (reinforcement learning with verifiable binary rewards), drawn from 1 source and 5 claims.


Uses

Claims are limited to binary-reward RLVR with grouped rollouts. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
The paper addresses inefficiency in reinforcement learning with verifiable binary rewards. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
The paper positions pass-rate control as a practical efficiency objective for binary-reward RLVR. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
The binary-reward reinforcement learning signal is strongest when a rollout group contains substantial success-failure contrast. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
Prefix Sampling increases the number of update-bearing groups, reward entropy, contrastive structure, and RLOO advantage energy under the same rollout budget. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
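A short derivation helps explain why success-failure contrast carries the signal. It assumes that "RLOO advantage energy" means the sum of squared leave-one-out advantages in a group of G rollouts with k successes. That reading is an interpretation, not a definition taken from the source. Under this assumption, three results follow. The energy is proportional to the k(G-k) success-failure pairs. It vanishes for all-pass and all-fail groups. If each rollout passes independently with probability p, the expected energy peaks at p = 1/2.

```latex
\begin{aligned}
A_i &= r_i - \frac{1}{G-1}\sum_{j \neq i} r_j
     = \begin{cases}
         \dfrac{G-k}{G-1}, & r_i = 1,\\[6pt]
         -\dfrac{k}{G-1}, & r_i = 0,
       \end{cases}\\[8pt]
\mathcal{E}(k) &= \sum_{i=1}^{G} A_i^2
  = k\,\frac{(G-k)^2}{(G-1)^2} + (G-k)\,\frac{k^2}{(G-1)^2}
  = \frac{G\,k\,(G-k)}{(G-1)^2},\\[8pt]
\mathbb{E}_{k \sim \mathrm{Bin}(G,p)}\big[\mathcal{E}(k)\big]
  &= \frac{G}{(G-1)^2}\,G(G-1)\,p(1-p)
   = \frac{G^2}{G-1}\,p(1-p),\\[8pt]
\Pr[\mathcal{E}(k) = 0] &= \Pr[k \in \{0, G\}] = p^G + (1-p)^G.
\end{aligned}
```

The last line gives the expected share of groups that contribute no gradient under this baseline. For example, with G = 8 and p = 0.9, that share is about 0.43. This kind of waste motivates steering the pass rate away from the extremes.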
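This summary does not define the four diagnostics named in the Prefix Sampling claim. The sketch below scores batches of grouped binary rewards using simple per-group stand-ins:

- A group is "update-bearing" if it mixes passes and failures.
- "Reward entropy" is the binary entropy of the group pass rate.
- "Contrastive structure" is the count of success-failure pairs.
- "Advantage energy" is the sum of squared RLOO advantages.

These definitions and the helper names (`rloo_advantages`, `group_metrics`, `batch_summary`) are hypothetical. They are not the paper's code, and the sketch does not implement Prefix Sampling itself.

```python
import math
from typing import Sequence


def rloo_advantages(rewards: Sequence[int]) -> list[float]:
    """Leave-one-out advantages: each reward minus the mean of the other G-1 rewards."""
    g = len(rewards)
    if g < 2:
        raise ValueError("RLOO needs at least two rollouts per group")
    total = sum(rewards)
    return [r - (total - r) / (g - 1) for r in rewards]


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) reward; zero for all-pass or all-fail groups."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def group_metrics(rewards: Sequence[int]) -> dict:
    """Hypothetical diagnostics for one group of binary (0/1) rewards."""
    g, k = len(rewards), sum(rewards)
    adv = rloo_advantages(rewards)
    return {
        "update_bearing": 0 < k < g,  # mixed outcomes give some nonzero advantage
        "reward_entropy": binary_entropy(k / g),
        "contrast_pairs": k * (g - k),  # number of (success, failure) pairs
        "advantage_energy": sum(a * a for a in adv),
    }


def batch_summary(groups: Sequence[Sequence[int]]) -> dict:
    """Average the per-group diagnostics over a batch of rollout groups."""
    ms = [group_metrics(grp) for grp in groups]
    n = len(ms)
    return {
        "frac_update_bearing": sum(m["update_bearing"] for m in ms) / n,
        "mean_reward_entropy": sum(m["reward_entropy"] for m in ms) / n,
        "mean_contrast_pairs": sum(m["contrast_pairs"] for m in ms) / n,
        "mean_advantage_energy": sum(m["advantage_energy"] for m in ms) / n,
    }


if __name__ == "__main__":
    # Same rollout budget: 4 groups x 4 rollouts in each batch.
    skewed = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
    balanced = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
    print(batch_summary(skewed))
    print(batch_summary(balanced))
```

Both example batches spend the same 16 rollouts. In the balanced batch, every group is update-bearing. In the skewed batch, three of the four groups are all-pass or all-fail, so their RLOO advantages are all zero.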