Selective Rollout

Cross-source consensus on Selective Rollout from 1 source and 5 claims.


How it works

In the main online experiments, the gate used K = 10 and dL = 0.12. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
When the gate fires, running trajectories are stopped at step K and the group is excluded from GRPO loss. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
Selective rollout uses a one-parameter mid-rollout gate to decide whether to stop a group early. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
The gate saves both post-K rollout generation and training compute for groups it removes. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
The gate has negligible computational overhead because it uses a small number of short Levenshtein computations. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
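
The claims above can be illustrated with a minimal Python sketch. The source states only that the gate has one parameter, runs at step K, and relies on a few short Levenshtein computations. It does not give the decision rule, so the rule below is an assumption: the gate fires when the mean normalized pairwise Levenshtein distance between the first K actions of a group's trajectories is below dL. The names `levenshtein`, `gate_fires`, and `select_groups`, and the representation of a trajectory as a list of action strings, are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two sequences (classic two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def gate_fires(prefixes: List[Sequence[str]], d_l: float = 0.12) -> bool:
    """ASSUMED rule, not confirmed by the source: fire when the mean
    normalized pairwise distance between K-step prefixes is below d_l."""
    pairs = list(combinations(prefixes, 2))
    if not pairs:
        return False  # a single trajectory gives nothing to compare
    total = 0.0
    for a, b in pairs:
        longest = max(len(a), len(b))
        total += levenshtein(a, b) / longest if longest else 0.0
    return total / len(pairs) < d_l


def select_groups(groups: List[List[Sequence[str]]],
                  k: int = 10, d_l: float = 0.12) -> List[int]:
    """Return indices of groups that continue past step k and stay in the loss.

    Groups for which the gate fires are stopped at step k and left out,
    so neither post-k generation nor training compute is spent on them.
    """
    kept = []
    for idx, trajectories in enumerate(groups):
        prefixes = [t[:k] for t in trajectories]  # only the first k actions
        if not gate_fires(prefixes, d_l):
            kept.append(idx)
    return kept
```

With the defaults K = 10 and dL = 0.12 taken from the reported experiments, a group whose trajectories share identical 10-step prefixes would be dropped under this assumed rule, while a group with clearly diverging prefixes would be kept. Each comparison touches at most K actions per trajectory, which is consistent with the claim that the gate's overhead is negligible.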