Selective Rollout
Cross-source consensus on Selective Rollout from 1 source and 5 claims.
Highlighted claims
- In the main online experiments, the gate used K = 10 and dL = 0.12. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
- When the gate fires, running trajectories are stopped at step K and the group is excluded from GRPO loss. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
- Selective rollout uses a one-parameter mid-rollout gate to decide whether to stop a group early. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
- The gate saves both post-K rollout generation and training compute for groups it removes. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
- The gate has negligible computational overhead because it uses a small number of short Levenshtein computations. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
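The claims above can be tied together with a minimal sketch. The source gives only the values K = 10 and dL = 0.12 and says the gate relies on a few short Levenshtein computations; it does not state the exact firing rule. The sketch therefore assumes, purely for illustration, that each trajectory is a sequence of discrete actions and that the gate fires when the mean pairwise normalized Levenshtein distance between the first K steps of a group falls below dL. The function names `levenshtein`, `normalized_distance`, `gate_fires`, and `groups_for_loss` are hypothetical and not taken from the paper.

```python
from itertools import combinations
from typing import List, Sequence

K = 10      # gate step reported in the source
D_L = 0.12  # gate threshold dL reported in the source


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance divided by the longer length, so the result is in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def gate_fires(group: List[Sequence], k: int = K, d_l: float = D_L) -> bool:
    """ASSUMED rule: fire when the first-k prefixes of a group are nearly identical.

    The source does not give the criterion; mean pairwise normalized
    distance below d_l is an illustrative stand-in.
    """
    prefixes = [trajectory[:k] for trajectory in group]
    pairs = list(combinations(prefixes, 2))
    if not pairs:
        return False  # a single trajectory gives nothing to compare
    mean = sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)
    return mean < d_l


def groups_for_loss(groups: List[List[Sequence]]) -> List[List[Sequence]]:
    """Keep only groups whose gate did not fire; the rest are excluded from the loss."""
    return [group for group in groups if not gate_fires(group)]
```

Under these assumptions, a group of G trajectories costs G(G-1)/2 comparisons of sequences no longer than K, which is consistent with the claim that the gate's overhead is negligible. A group whose gate fires would be stopped at step K and dropped before the GRPO loss is computed, which is where the savings in post-K generation and training compute come from.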