Baselines
Cross-source consensus on Baselines from 1 source and 5 claims.
Comparisons
Highlighted claims
- Prior efficient-RLVR methods typically control prompt/rollout selection or within-rollout pruning in isolation, rather than jointly optimizing rollout count and length. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- DAPO post-filters rollouts after they have been generated, so it does not reduce the main generation cost. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- Fixed length caps are treated as suboptimal because they can penalize coherent reasoning that is merely unfinished and discard useful long reasoning chains. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- VIP adapts rollout allocation but treats rollout length as exogenous. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- ARRoL saved only a small fraction of generated tokens in the math setting because few rollouts reached its inspection point. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
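To make the DAPO claim above concrete: in a post-filtering scheme, every rollout is generated first, and uninformative prompt groups (e.g. where all rollouts are correct, or all are incorrect, so the group carries no advantage signal) are discarded only afterwards. The generation compute is therefore paid in full regardless of how many groups survive. A minimal sketch of this idea, assuming 0/1 verifiable rewards; the function and variable names below are illustrative, not DAPO's actual implementation:

```python
def post_filter(groups):
    """Keep only prompt groups whose rollout rewards are mixed.

    `groups` maps a prompt id to a list of 0/1 rewards, one per rollout.
    All rollouts already exist by the time this runs, so filtering here
    saves gradient-update compute but no generation compute.
    """
    kept = {}
    for prompt, rewards in groups.items():
        # Mixed outcomes (some correct, some not) give nonzero advantage.
        if 0 < sum(rewards) < len(rewards):
            kept[prompt] = rewards
    return kept


groups = {
    "p1": [1, 1, 1, 1],  # all correct: zero-variance group, dropped
    "p2": [1, 0, 1, 0],  # mixed outcomes: kept
    "p3": [0, 0, 0, 0],  # all incorrect: dropped
}
generated = sum(len(r) for r in groups.values())  # rollouts already paid for
kept = post_filter(groups)
# All 12 rollouts were generated, but only the 4 in "p2" feed updates.
```

This is exactly the asymmetry the claim points at: the filter decides what enters the update, not what gets generated, which is why joint control of rollout count and length is framed as the missing piece.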