DUET
Cross-source consensus on DUET from 1 source and 5 claims.
Highlighted claims
- DUET frames RLVR efficiency as jointly deciding rollout counts and stopping points under one shared token budget. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- DUET is implemented as a three-phase layer over GRPO covering allocation, generation, and update. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- DUET improved both accuracy and wall-clock efficiency in the reported experiments. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- DUET coordinates rollout count and rollout length simultaneously, unlike compared baselines that address only part of the cost problem. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- DUET is presented as improving efficiency and learning-signal quality rather than simply trading accuracy for speed. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
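
To make the shared-budget idea in the claims above concrete, here is a minimal sketch of an allocation step that fixes both a rollout count and a per-rollout length cap for each prompt while keeping the total planned tokens under one budget. The claims do not state DUET's actual allocation rule, so everything here is an assumption for illustration: the names `Plan`, `allocate`, and `planned_tokens`, the per-prompt `weights` and `length_caps` inputs, and the proportional split itself are hypothetical. The generation and update phases are not shown.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    rollouts: int    # how many rollouts to sample for this prompt
    max_tokens: int  # per-rollout stopping point (length cap)


def allocate(weights, length_caps, budget, min_rollouts=1):
    """Toy allocation under one shared token budget (not DUET's actual rule).

    weights      hypothetical per-prompt priority scores
    length_caps  hypothetical per-prompt length caps, in tokens
    budget       total tokens available across all prompts
    """
    if len(weights) != len(length_caps):
        raise ValueError("weights and length_caps must have the same length")

    # Every prompt first receives the minimum number of rollouts.
    base_cost = sum(min_rollouts * cap for cap in length_caps)
    if base_cost > budget:
        raise ValueError("budget too small for the minimum allocation")

    plans = [Plan(min_rollouts, cap) for cap in length_caps]
    remaining = budget - base_cost
    total_weight = sum(weights)

    # Split the leftover tokens in proportion to the weights, then convert
    # each share into whole extra rollouts at that prompt's length cap.
    for plan, weight in zip(plans, weights):
        share = remaining * weight / total_weight if total_weight > 0 else 0
        plan.rollouts += int(share // plan.max_tokens)
    return plans


def planned_tokens(plans):
    """Worst-case token cost if every rollout runs to its length cap."""
    return sum(p.rollouts * p.max_tokens for p in plans)


plans = allocate(weights=[1.0, 3.0], length_caps=[100, 200], budget=2000)
print([(p.rollouts, p.max_tokens) for p in plans], planned_tokens(plans))
```

Because each prompt's extra rollouts are rounded down within its own share, the worst-case planned cost never exceeds the budget. A real system would presumably derive the counts and stopping points from training signals rather than fixed inputs; that part is outside what the claims describe.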