Prefix Sampling
Cross-source consensus on Prefix Sampling from 1 source and 6 claims.
How it works
Highlighted claims
- Prefix Sampling is a bidirectional controller for rollout groups of size eight. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Prefix tokens are masked out of the reinforcement-learning loss so replayed off-policy actions are not credited by the new rollout. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Prefix Sampling filters degenerate groups, trains directly on balanced groups, and replays saved successful or failing prefixes for skewed groups. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Prefix length is controlled separately for the 1/8, 2/8, 6/8, and 7/8 buckets. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- The adaptive controller maintains an exponential moving average of rerollout pass rate for each bucket. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Hard prefixes are intended to raise rerollout pass rates, while easy prefixes are intended to lower them. — Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
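
The claims above can be read as a small control loop. The following is a minimal sketch under stated assumptions, not the source's implementation. The claims do not spell out the bucket boundaries, so the sketch assumes that 0/8 and 8/8 groups are the degenerate ones and that 3/8 to 5/8 groups are the balanced ones. It also assumes that low-pass buckets replay successful prefixes while high-pass buckets replay failing ones. The names `route_group`, `loss_mask`, and `BucketController`, together with the `decay`, `target`, and `step` values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

GROUP_SIZE = 8  # rollouts per prompt, as stated in the claims


def route_group(num_passed: int) -> str:
    """Decide what to do with a rollout group given its pass count."""
    if not 0 <= num_passed <= GROUP_SIZE:
        raise ValueError("num_passed must be between 0 and GROUP_SIZE")
    if num_passed in (0, GROUP_SIZE):
        return "filter"  # assumed degenerate: all rollouts fail or all pass
    if num_passed in (1, 2):
        return "replay_success_prefix"  # assumed: help a hard prompt
    if num_passed in (6, 7):
        return "replay_failure_prefix"  # assumed: handicap an easy prompt
    return "train"  # assumed balanced range 3/8..5/8: train directly


def loss_mask(prefix_len: int, total_len: int) -> list[int]:
    """Return 0 for replayed prefix tokens and 1 for newly sampled tokens."""
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must be between 0 and total_len")
    return [0] * prefix_len + [1] * (total_len - prefix_len)


@dataclass
class BucketController:
    """Per-bucket prefix-length controller driven by an EMA of pass rate."""

    raises_pass_rate: bool = True  # True when the bucket replays successes
    prefix_frac: float = 0.5  # hypothetical: fraction of saved rollout replayed
    ema: float = 0.5  # EMA of the rerollout pass rate
    decay: float = 0.9  # hypothetical smoothing factor
    target: float = 0.5  # hypothetical target pass rate
    step: float = 0.05  # hypothetical adjustment per update

    def update(self, rerollout_pass_rate: float) -> float:
        """Fold in a new observation and nudge the prefix fraction."""
        self.ema = self.decay * self.ema + (1 - self.decay) * rerollout_pass_rate
        error = self.target - self.ema  # positive means pass rate is too low
        sign = 1.0 if error > 0 else -1.0 if error < 0 else 0.0
        # A longer success prefix should raise the pass rate, while a longer
        # failure prefix should lower it, so the two directions are mirrored.
        direction = 1.0 if self.raises_pass_rate else -1.0
        new_frac = self.prefix_frac + direction * sign * self.step
        self.prefix_frac = min(1.0, max(0.0, new_frac))
        return self.prefix_frac


# One independent controller per skewed bucket (1/8, 2/8, 6/8, 7/8).
controllers = {k: BucketController(raises_pass_rate=(k <= 2)) for k in (1, 2, 6, 7)}
```

In this sketch a group with 2 of 8 passes would be routed to `replay_success_prefix`, the rerollout would be scored only on tokens where `loss_mask` is 1, and the observed rerollout pass rate would be fed to `controllers[2].update(...)` to adjust the next prefix length for that bucket. Keeping one controller per bucket mirrors the claim that prefix length is controlled separately for each of the four skewed buckets.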