Rollout Informativeness under a Fixed Budget
Cross-source consensus on Rollout Informativeness under a Fixed Budget (RIFB), drawn from 1 source and 4 claims.
Highlighted claims
- RIFB is defined as the expected squared norm of the GRPO gradient mass contributed by a rollout set. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
- The paper argues that rollout selection should optimize gradient informativeness rather than be treated as a budget-agnostic sampling detail. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
- The tree objective F correlated strongly with measured RIFB across 500 prompts. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
- InfoTree with ABA sustained higher RIFB over 300 steps while flat GRPO RIFB declined. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
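
The definition in the first claim can be illustrated with a small sketch. It is not the paper's implementation. It assumes the standard GRPO group-normalized advantage, (reward - group mean) / (group std + eps), and it assumes that "gradient mass" means the advantage-weighted mean of per-rollout score gradients. The function names (`grpo_advantages`, `gradient_mass_sq_norm`, `estimate_rifb`) and the toy gradient vectors are hypothetical.

```python
from statistics import fmean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps). Assumed GRPO form."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]


def gradient_mass_sq_norm(rewards, score_grads, eps=1e-8):
    """Squared L2 norm of the advantage-weighted mean of per-rollout gradients.

    score_grads[i] stands in for the gradient of log pi(rollout_i); here it is
    a plain list of floats (a hypothetical, low-dimensional placeholder).
    """
    adv = grpo_advantages(rewards, eps)
    n = len(rewards)
    dim = len(score_grads[0])
    g = [0.0] * dim
    for a, grad in zip(adv, score_grads):
        for k in range(dim):
            g[k] += a * grad[k] / n
    return sum(x * x for x in g)


def estimate_rifb(groups, eps=1e-8):
    """Monte Carlo estimate: average squared norm over sampled rollout sets."""
    return fmean([gradient_mass_sq_norm(r, g, eps) for r, g in groups])


# Two toy rollout sets with the same placeholder gradients.
grads = [[1.0, 0.0], [0.0, 1.0]]
uniform = ([1.0, 1.0], grads)  # identical rewards: advantages are all zero
mixed = ([1.0, 0.0], grads)    # differing rewards: non-zero gradient mass

print(gradient_mass_sq_norm(*uniform))
print(gradient_mass_sq_norm(*mixed))
print(estimate_rifb([uniform, mixed]))
```

Under these assumptions, a rollout set whose rewards are all equal contributes no gradient mass, which is one way to read the claim that rollout selection under a fixed budget affects how informative each update is.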