Zero-Variance Groups

Cross-source consensus on Zero-Variance Groups from 1 source and 5 claims.

1 source · 5 claims

How it works

A zero-variance group occurs when all trajectories in a group receive the same terminal reward. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
Zero-variance groups add no policy-gradient signal while still consuming rollout and training compute. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
In the Qwen2.5-7B ALFWorld setting, 39% of an offline 100-group sample was zero-variance. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
The on-policy run averaged around 40% zero-variance groups. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
Removing zero-advantage groups is proposed to reduce batch dilution and increase effective update magnitude. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
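
The claims above can be illustrated with a minimal sketch. It assumes a group-normalized advantage of the form (reward - group mean) / (group std + eps), which is a common choice in group-based policy-gradient methods but is not confirmed by the source as the exact formula used. The function names, the `eps` and `tol` parameters, and the toy rewards are all hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev

def group_advantages(group, eps=1e-8):
    """Group-normalized advantages (assumed form, not taken from the source)."""
    mean = fmean(group)
    std = pstdev(group)  # population std; 0.0 when all rewards are equal
    return [(r - mean) / (std + eps) for r in group]

def is_zero_variance(group, tol=1e-12):
    """True when every trajectory in the group has the same terminal reward."""
    return max(group) - min(group) <= tol

def drop_zero_variance(groups):
    """Keep only groups that can contribute a nonzero advantage."""
    return [g for g in groups if not is_zero_variance(g)]

# Toy terminal rewards: 5 groups of 4 trajectories each (made-up values).
groups = [
    [1.0, 1.0, 1.0, 1.0],  # all succeed -> zero variance
    [0.0, 0.0, 0.0, 0.0],  # all fail    -> zero variance
    [1.0, 0.0, 0.0, 1.0],  # mixed       -> informative
    [0.0, 1.0, 0.0, 0.0],  # mixed       -> informative
    [1.0, 1.0, 1.0, 1.0],  # all succeed -> zero variance
]

zero_var_fraction = sum(is_zero_variance(g) for g in groups) / len(groups)
kept = drop_zero_variance(groups)

print(group_advantages(groups[0]))  # every advantage is 0.0
print(zero_var_fraction)
print(len(kept), "of", len(groups), "groups kept")
```

Under this assumed formula, a uniform-reward group yields an advantage of exactly zero for every trajectory, so it contributes nothing to the gradient even though it was rolled out and included in the batch. Filtering such groups before the update means the batch average is taken only over groups that carry signal, which is one way to read the "batch dilution" claim.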