Rollout Pass-Rate Control
Cross-source consensus on Rollout Pass-Rate Control from 1 source and 5 claims.
Highlighted claims
- For a group of N = 8 rollouts, exactly four successes produce the maximum number of success-failure contrastive pairs (4 × 4 = 16). — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Bernoulli reward entropy is maximized at a pass rate of 0.5. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Rollout Pass-Rate Control defines the pass rate as the fraction of successful rollouts in a fixed task and rollout group. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- The probability that a rollout group survives all-fail or all-pass filtering is maximized at a pass rate of 0.5. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Binary-reward learning is most informative when the rollout pass rate is 50%. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
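
The claims above can be checked numerically. The following is a minimal sketch, not the source's implementation: the function names (`pass_rate`, `contrastive_pairs`, `bernoulli_entropy`, `survival_probability`) are hypothetical, and it assumes rollouts in a group are independent with a shared success probability p, so that a group of N rollouts is discarded by all-fail or all-pass filtering with probability p^N + (1 - p)^N.

```python
import math


def pass_rate(rewards):
    """Fraction of successful rollouts in one fixed task's rollout group."""
    return sum(rewards) / len(rewards)


def contrastive_pairs(n, k):
    """Number of (success, failure) pairs when k of n rollouts succeed."""
    return k * (n - k)


def bernoulli_entropy(p):
    """Entropy in bits of a binary reward with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def survival_probability(p, n):
    """Probability that a group of n independent rollouts is neither
    all-fail nor all-pass (assumption: i.i.d. Bernoulli(p) rewards)."""
    return 1.0 - p ** n - (1.0 - p) ** n


if __name__ == "__main__":
    n = 8
    # Success count that maximizes success-failure pairs for n = 8.
    best_k = max(range(n + 1), key=lambda k: contrastive_pairs(n, k))
    print(best_k, contrastive_pairs(n, best_k))  # 4 16

    # Pass rate on a 0.01 grid that maximizes each quantity.
    grid = [i / 100 for i in range(101)]
    print(max(grid, key=bernoulli_entropy))                     # 0.5
    print(max(grid, key=lambda p: survival_probability(p, n)))  # 0.5

    # Observed pass rate of a group with four successes out of eight.
    print(pass_rate([1, 0, 1, 1, 0, 0, 1, 0]))  # 0.5
```

Under these assumptions all three quantities peak at the same point: a pass rate of 0.5, or four successes out of eight.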