Cost-Weighted Neyman Allocation
Cross-source consensus on Cost-Weighted Neyman Allocation from 1 source and 5 claims.
Highlighted claims
- The allocation rule minimizes trace variance of the policy-gradient estimator under a token budget. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- The cost-weighted Neyman optimum assigns more rollouts to prompts with higher within-prompt gradient variance and favors shorter prompts through expected length. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- DUET uses a running per-prompt surrogate because true prompt variance is unavailable before generation. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- The cost-weighted Neyman allocator is interpreted as the quality lever because it concentrates rollouts on high-variance, lower-cost prompts. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- The formal theorem gives a unique variance-minimizing continuous allocation under a token budget. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
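
The claims above describe the shape of the optimum but not its formula. As a minimal sketch, assume the textbook cost-weighted Neyman setup: choose continuous rollout counts n_i to minimize sum_i sigma_i^2 / n_i subject to sum_i c_i * n_i = B, where sigma_i is a per-prompt standard-deviation estimate, c_i is the expected token cost of one rollout, and B is the token budget. Under that assumption the minimizer is n_i = B * (sigma_i / sqrt(c_i)) / sum_j sigma_j * sqrt(c_j). The function names, inputs, and numbers below are hypothetical and are not taken from DUET; the source may define its objective and surrogate differently.

```python
import math


def cost_weighted_neyman(sigmas, costs, budget):
    """Continuous allocation minimizing sum(sigma_i**2 / n_i)
    subject to sum(cost_i * n_i) == budget.

    Assumption: textbook cost-weighted Neyman form, not DUET's exact rule.
    sigmas: per-prompt std-dev estimates (e.g. from a running surrogate).
    costs:  expected tokens per rollout for each prompt.
    """
    if len(sigmas) != len(costs):
        raise ValueError("sigmas and costs must have the same length")
    if budget <= 0:
        raise ValueError("budget must be positive")
    if any(c <= 0 for c in costs) or any(s < 0 for s in sigmas):
        raise ValueError("costs must be > 0 and sigmas must be >= 0")

    # Normalizer that makes the allocation spend exactly the budget.
    denom = sum(s * math.sqrt(c) for s, c in zip(sigmas, costs))
    if denom == 0:
        raise ValueError("all variance estimates are zero")

    # n_i is proportional to sigma_i / sqrt(cost_i).
    return [budget * (s / math.sqrt(c)) / denom for s, c in zip(sigmas, costs)]


def summed_variance(sigmas, alloc):
    """Objective value sum(sigma_i**2 / n_i); zero-variance prompts are skipped."""
    return sum(s ** 2 / n for s, n in zip(sigmas, alloc) if s > 0)


# Hypothetical example: three prompts, 6000-token budget.
sigmas = [2.0, 1.0, 1.0]        # prompt 0 has the noisiest gradient
costs = [100.0, 100.0, 400.0]   # prompt 2 produces the longest rollouts
budget = 6000.0

alloc = cost_weighted_neyman(sigmas, costs, budget)
spent = sum(c * n for c, n in zip(costs, alloc))

# Baseline: the same number of rollouts per prompt under the same budget.
uniform = [budget / sum(costs)] * len(costs)

print("allocation:", alloc)
print("tokens spent:", spent)
print("variance (Neyman): ", summed_variance(sigmas, alloc))
print("variance (uniform):", summed_variance(sigmas, uniform))
```

With these made-up inputs the allocation comes out to roughly 24, 12, and 6 rollouts: the high-variance prompt receives the most, and the long prompt receives the fewest even though its variance matches the second prompt. The result is continuous, so a practical allocator would still need a rounding step and a minimum rollout count per prompt, neither of which is specified in the claims above.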