Cost-Weighted Neyman Allocation

Cross-source consensus on Cost-Weighted Neyman Allocation, drawn from 1 source and 5 claims.


How it works

The allocation rule minimizes the trace variance (the trace of the covariance matrix) of the policy-gradient estimator subject to a token budget. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
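One common way to write this problem is sketched below. It is an assumption, not DUET's exact statement: σ_i² is the trace of the per-rollout gradient covariance for prompt i, n_i is its number of rollouts, L_i is its expected rollout length in tokens, and B is the token budget shared by K prompts. DUET's objective may carry extra weights or terms.

```latex
\min_{n_1,\dots,n_K > 0} \; \sum_{i=1}^{K} \frac{\sigma_i^2}{n_i}
\qquad \text{subject to} \qquad \sum_{i=1}^{K} n_i L_i = B
```

Each term σ_i²/n_i is the trace variance of a mean over n_i independent rollouts. The budget holds with equality at the optimum because the objective decreases in every n_i.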
The cost-weighted Neyman optimum assigns more rollouts to prompts with higher within-prompt gradient variance and, through the expected-length cost term, favors prompts with shorter rollouts. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
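The sketch below computes the continuous allocation under the assumed objective Σ σ_i²/n_i with budget Σ n_i L_i = B. That objective gives n_i proportional to σ_i / √L_i, rescaled so the token budget is spent exactly. The function name cost_weighted_neyman is hypothetical and does not come from DUET's code.

```python
import math
from typing import Sequence


def cost_weighted_neyman(sigmas: Sequence[float], lengths: Sequence[float],
                         budget: float) -> list[float]:
    """Continuous cost-weighted Neyman allocation (illustrative sketch).

    Assumed objective: minimize sum(sigma_i**2 / n_i) subject to
    sum(n_i * L_i) == budget, whose solution is n_i ~ sigma_i / sqrt(L_i).
    """
    if len(sigmas) != len(lengths):
        raise ValueError("sigmas and lengths must have the same length")
    if budget <= 0 or any(L <= 0 for L in lengths) or any(s < 0 for s in sigmas):
        raise ValueError("need budget > 0, lengths > 0 and sigmas >= 0")
    # Normalizer: sum_j sigma_j * sqrt(L_j), chosen so the token cost equals the budget.
    norm = sum(s * math.sqrt(L) for s, L in zip(sigmas, lengths))
    if norm == 0:
        raise ValueError("at least one prompt needs nonzero variance")
    # Higher sigma -> more rollouts; longer expected length -> fewer rollouts.
    return [budget * s / (math.sqrt(L) * norm) for s, L in zip(sigmas, lengths)]


print(cost_weighted_neyman([1.0, 1.0, 2.0], [100.0, 400.0, 100.0], budget=1200.0))
# prints [2.4, 1.2, 4.8]
```

In this example, doubling a prompt's σ doubles its share (4.8 versus 2.4 rollouts). Quadrupling its expected length halves its share (1.2 versus 2.4). The result is a continuous relaxation, so a real sampler would still need to round these counts to integers.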
DUET uses a running per-prompt surrogate because the true per-prompt variance is unknown before rollouts are generated. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
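The exact surrogate DUET uses is not described here, so the following is a hypothetical sketch. It assumes the surrogate is an exponential moving average (EMA) of the sample variance of each prompt's past rewards, standing in for within-prompt gradient variance. It also tracks an EMA of rollout length and falls back to fixed priors for prompts that have not been seen yet. The class name RunningPromptStats, the decay of 0.9, and both priors are illustrative choices, not DUET's.

```python
from dataclasses import dataclass, field
from statistics import pvariance
from typing import Sequence


@dataclass
class RunningPromptStats:
    """Hypothetical running surrogate for per-prompt variance and expected length.

    Assumption: the variance of past rewards stands in for within-prompt
    gradient variance; DUET's actual surrogate may use a different statistic.
    """
    decay: float = 0.9          # EMA weight on the previous estimate (illustrative)
    prior_var: float = 0.25     # unseen prompts: max of p * (1 - p) for 0/1 rewards
    prior_len: float = 512.0    # unseen prompts: assumed expected length in tokens
    var: dict[str, float] = field(default_factory=dict)
    length: dict[str, float] = field(default_factory=dict)

    def update(self, prompt_id: str, rewards: Sequence[float],
               lengths: Sequence[int]) -> None:
        """Fold one batch of rollouts for prompt_id into the running estimates."""
        batch_var = pvariance(rewards)           # population variance of this batch's rewards
        batch_len = sum(lengths) / len(lengths)  # mean rollout length in tokens
        if prompt_id not in self.var:
            # First observation: take the batch statistics directly.
            self.var[prompt_id] = batch_var
            self.length[prompt_id] = batch_len
            return
        d = self.decay
        self.var[prompt_id] = d * self.var[prompt_id] + (1 - d) * batch_var
        self.length[prompt_id] = d * self.length[prompt_id] + (1 - d) * batch_len

    def estimate(self, prompt_id: str) -> tuple[float, float]:
        """Return (variance surrogate, expected length), falling back to the priors."""
        return (self.var.get(prompt_id, self.prior_var),
                self.length.get(prompt_id, self.prior_len))
```

To allocate, take the square root of the variance surrogate as σ_i and the length estimate as L_i. The decay controls how quickly stale estimates are forgotten as the policy changes.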
The cost-weighted Neyman allocator is interpreted as the quality lever because it concentrates rollouts on high-variance, lower-cost prompts. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
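To see why concentrating rollouts helps, the sketch below compares the assumed trace-variance objective Σ σ_i²/n_i under two allocations that spend the same token budget. One is a uniform allocation and the other is the cost-weighted Neyman allocation. The function names and the three-prompt numbers are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def trace_variance(sigmas: Sequence[float], alloc: Sequence[float]) -> float:
    """Assumed objective: sum of sigma_i**2 / n_i (zero-variance prompts contribute 0)."""
    return sum(s * s / n for s, n in zip(sigmas, alloc) if s > 0)


def compare(sigmas: Sequence[float], lengths: Sequence[float], budget: float):
    """Compare uniform and cost-weighted Neyman allocations at the same token budget."""
    # Uniform: every prompt gets the same n, chosen so sum(n * L_i) == budget.
    uniform = [budget / sum(lengths)] * len(sigmas)
    # Cost-weighted Neyman: n_i ~ sigma_i / sqrt(L_i), rescaled to the budget.
    norm = sum(s * math.sqrt(L) for s, L in zip(sigmas, lengths))
    neyman = [budget * s / (math.sqrt(L) * norm) for s, L in zip(sigmas, lengths)]
    return trace_variance(sigmas, uniform), trace_variance(sigmas, neyman), neyman


v_uni, v_ney, alloc = compare([1.0, 1.0, 2.0], [100.0, 400.0, 100.0], budget=1200.0)
print(f"uniform: {v_uni:.3f}, neyman: {v_ney:.3f}")
# prints uniform: 3.000, neyman: 2.083
```

With the same 1200 tokens, shifting rollouts toward the high-variance, short prompt and away from the long one lowers the assumed objective by roughly 30%.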
The formal theorem gives a unique variance-minimizing continuous allocation under a token budget. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
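The derivation below is a sketch under the same assumed objective, not DUET's proof. Assume the objective is Σ σ_i²/n_i with budget Σ n_i L_i = B and every σ_i > 0. The objective is then strictly convex on n_i > 0 because its Hessian is diag(2σ_i²/n_i³), and the constraint is linear. A stationary point of the Lagrangian is therefore the unique minimizer:

```latex
\begin{aligned}
\mathcal{L}(n,\lambda) &= \sum_{i=1}^{K} \frac{\sigma_i^2}{n_i}
  + \lambda\Big(\sum_{i=1}^{K} n_i L_i - B\Big) \\
\frac{\partial \mathcal{L}}{\partial n_i} &= -\frac{\sigma_i^2}{n_i^2} + \lambda L_i = 0
  \;\Rightarrow\; n_i = \frac{\sigma_i}{\sqrt{\lambda L_i}} \\
\sum_{i=1}^{K} n_i L_i = B &\;\Rightarrow\; \sqrt{\lambda} = \frac{1}{B}\sum_{j=1}^{K} \sigma_j \sqrt{L_j} \\
n_i^{\ast} &= \frac{B\,\sigma_i/\sqrt{L_i}}{\sum_{j=1}^{K} \sigma_j \sqrt{L_j}},
\qquad
\sum_{i=1}^{K} \frac{\sigma_i^2}{n_i^{\ast}} = \frac{1}{B}\Big(\sum_{j=1}^{K} \sigma_j \sqrt{L_j}\Big)^2
\end{aligned}
```

The allocation is "continuous" because n_i* is real-valued. Integer rollout counts would require a rounding step on top of this optimum.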