Zero-Variance Groups
Cross-source consensus on Zero-Variance Groups, drawn from 1 source and 5 claims.
Highlighted claims
- A zero-variance group occurs when all trajectories in a group receive the same terminal reward. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
- Zero-variance groups add no policy-gradient signal while still consuming rollout and training compute. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
- In the Qwen2.5-7B ALFWorld setting, 39% of an offline 100-group sample was zero-variance. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
- The on-policy run averaged around 40% zero-variance groups. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
- Removing zero-advantage groups is proposed to reduce batch dilution and increase effective update magnitude. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
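The claims above can be illustrated with a minimal sketch. It assumes a GRPO-style group-relative advantage, (reward minus group mean) divided by group standard deviation; the source refers to "zero-advantage groups" but the exact normalization is an assumption here. The function names (`group_advantages`, `is_zero_variance`, `filter_groups`) and the toy rewards are hypothetical and are not taken from the source.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages, assuming a (r - mean) / (std + eps) form."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_zero_variance(rewards):
    """True when every trajectory in the group received the same terminal reward."""
    return len(set(rewards)) <= 1


def filter_groups(groups):
    """Drop zero-variance groups; return the kept groups and the dropped fraction."""
    kept = [g for g in groups if not is_zero_variance(g)]
    dropped = 1 - len(kept) / len(groups) if groups else 0.0
    return kept, dropped


# Toy terminal rewards (illustrative only, not data from the source).
groups = [
    [1.0, 1.0, 1.0, 1.0],  # every trajectory succeeds: zero variance
    [0.0, 0.0, 0.0, 0.0],  # every trajectory fails: zero variance
    [1.0, 0.0, 0.0, 1.0],  # mixed outcomes: carries gradient signal
    [0.0, 1.0, 0.0, 0.0],  # mixed outcomes: carries gradient signal
]

# In a zero-variance group every advantage is exactly zero, so the group
# contributes nothing to the policy gradient even though it was rolled out.
zero_group_adv = group_advantages(groups[0])

kept, dropped = filter_groups(groups)
print(zero_group_adv, len(kept), dropped)
```

Under this assumption, a batch that keeps the two all-zero groups averages the same gradient over twice as many groups, which is one way to read the batch-dilution claim: filtering leaves only groups with nonzero advantages in the update.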