GRESO

Cross-source consensus on GRESO from 1 source and 4 claims.


Uses

On math tasks, GRESO uses cross-epoch reward consistency to predict which prompts will be uninformative before rollout begins. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
GRESO cannot use information revealed during a multi-step agent trajectory. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
GRESO was developed around math-style reasoning settings without a multi-step environment trajectory to inspect. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
The article lists evaluation of a combined GRESO and DAPO pipeline as future work. — Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
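
The pre-rollout prediction described in the claims above can be illustrated with a minimal sketch. This is not GRESO's actual implementation: the class name `RewardConsistencyFilter`, the `window` parameter, and the deterministic skip rule are all hypothetical, and the sketch assumes that "uninformative" means every sampled reward for a prompt was identical within an epoch (so a group-relative advantage would be zero). It only shows the general idea of deciding from past epochs, before any rollout is run.

```python
from collections import defaultdict, deque


class RewardConsistencyFilter:
    """Hypothetical pre-rollout filter; an illustration, not GRESO itself."""

    def __init__(self, window: int = 2):
        # Assumed rule: skip a prompt once it has been uninformative
        # for `window` consecutive recorded epochs.
        self.window = window
        self._history = defaultdict(lambda: deque(maxlen=window))

    def record(self, prompt_id, rewards):
        """Store whether this epoch's sampled rewards were all identical."""
        if not rewards:
            return  # nothing was sampled, so there is nothing to record
        uninformative = len(set(rewards)) == 1
        self._history[prompt_id].append(uninformative)

    def should_skip(self, prompt_id) -> bool:
        """Decide before rollout, using only rewards from earlier epochs."""
        history = self._history.get(prompt_id)
        if history is None or len(history) < self.window:
            return False  # not enough evidence yet, so roll out
        return all(history)
```

In use, `record` would be called after each epoch with the rewards sampled for a prompt, and `should_skip` would be checked before the next rollout. Because the decision relies only on earlier epochs, a filter of this kind sees nothing that happens inside a trajectory, which is consistent with the limitation noted above for multi-step agent settings.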