DUET

Cross-source consensus on DUET from 1 source and 5 claims.

How it works

DUET frames RLVR efficiency as jointly deciding rollout counts and stopping points under one shared token budget. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
DUET is implemented as a three-phase layer over GRPO covering allocation, generation, and update. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
DUET improved both accuracy and wall-clock efficiency in the reported experiments. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
DUET coordinates rollout count and rollout length simultaneously, unlike compared baselines that address only part of the cost problem. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
DUET is presented as improving efficiency and learning-signal quality rather than simply trading accuracy for speed. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
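
The claims above describe a three-phase loop (allocation, generation, update) in which rollout counts and stopping points share one token budget. The sketch below is only a toy illustration of that structure, not the paper's method. The allocation rule (weighting prompts by p(1-p) of an estimated pass rate), the simulated rollouts, and every name in it (`allocate`, `generate`, `group_advantages`, `pass_rate`, `expected_len`, `BUDGET`) are assumptions made for illustration. The update phase is reduced to computing GRPO-style group-normalized advantages.

```python
import random
import statistics


def allocate(prompts, token_budget, min_rollouts=2, max_len=512):
    """Phase 1 (allocation): choose a rollout count and a length cap per prompt.

    Hypothetical rule: weight each prompt by p * (1 - p), the variance of a
    Bernoulli reward with estimated pass rate p, so prompts with uncertain
    outcomes receive a larger share of the token budget.
    Assumes every expected_len is a positive integer.
    """
    weights = [p["pass_rate"] * (1.0 - p["pass_rate"]) + 1e-3 for p in prompts]
    total_weight = sum(weights)
    plan = []
    for p, w in zip(prompts, weights):
        share = int(token_budget * w / total_weight)  # tokens for this prompt
        base_cap = min(max_len, p["expected_len"])
        n = max(min_rollouts, share // base_cap)      # rollout count
        cap = min(base_cap, share // n)               # stopping point
        plan.append({"n": n, "cap": cap})             # n * cap <= share
    return plan


def generate(prompts, plan, rng):
    """Phase 2 (generation): simulated rollouts, truncated at the length cap.

    A rollout that hits the cap is treated as unfinished and gets reward 0.
    """
    groups = []
    for p, slot in zip(prompts, plan):
        group = []
        for _ in range(slot["n"]):
            mean_len = p["expected_len"]
            wanted = max(1, int(rng.gauss(mean_len, 0.2 * mean_len)))
            finished = wanted <= slot["cap"]
            solved = finished and rng.random() < p["pass_rate"]
            group.append({"tokens": min(wanted, slot["cap"]),
                          "reward": 1.0 if solved else 0.0})
        groups.append(group)
    return groups


def group_advantages(group, eps=1e-6):
    """Phase 3 (update): GRPO-style advantages, normalized within one group."""
    rewards = [r["reward"] for r in group]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy inputs: estimated pass rate and typical response length per prompt.
prompts = [
    {"pass_rate": 0.5, "expected_len": 200},
    {"pass_rate": 0.9, "expected_len": 100},
    {"pass_rate": 0.1, "expected_len": 400},
]
BUDGET = 8000

plan = allocate(prompts, BUDGET)
groups = generate(prompts, plan, random.Random(0))
advantages = [group_advantages(g) for g in groups]

planned = sum(slot["n"] * slot["cap"] for slot in plan)
used = sum(r["tokens"] for g in groups for r in g)
print(f"planned={planned} used={used} budget={BUDGET}")
```

Because each prompt's plan satisfies `n * cap <= share` and the shares sum to at most the budget, the planned token count never exceeds `BUDGET`. In this toy, that constraint is what ties the rollout count to the stopping point: under a fixed share, more rollouts means a shorter cap.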