Rollout Informativeness under a Fixed Budget

Cross-source consensus on Rollout Informativeness under a Fixed Budget from 1 source and 4 claims.

Uses

RIFB is defined as the expected squared norm of the GRPO gradient mass contributed by a rollout set. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
The paper argues that rollout selection should optimize gradient informativeness rather than be treated as a budget-agnostic sampling detail. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
The tree objective F correlated strongly with measured RIFB across 500 prompts. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
InfoTree with ABA sustained higher RIFB over 300 steps, while the RIFB of flat GRPO declined. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
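
The first claim above describes RIFB as an expected squared gradient norm. The following is a minimal sketch of that idea, not the paper's implementation: it assumes the standard GRPO group-relative advantage (reward minus group mean, divided by group standard deviation) and approximates each rollout's gradient contribution as its advantage times a per-rollout score-function gradient vector. The function names `group_relative_advantages` and `gradient_mass_sq_norm`, the `eps` stabilizer, and the toy gradient vectors are all hypothetical. The value computed here is for a single rollout set; the "expected" part of the definition would correspond to averaging it over many sampled sets.

```python
import math
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: (reward - group mean) / (group std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def gradient_mass_sq_norm(
    rewards: Sequence[float],
    score_grads: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> float:
    """Squared L2 norm of the advantage-weighted mean gradient for one rollout set.

    score_grads[i] stands in for the score-function gradient of rollout i
    (an assumption made for illustration; real gradients come from the policy).
    """
    advantages = group_relative_advantages(rewards, eps)
    dim = len(score_grads[0])
    total = [0.0] * dim
    for a, g in zip(advantages, score_grads):
        for k in range(dim):
            total[k] += a * g[k]
    n = len(rewards)
    return sum((t / n) ** 2 for t in total)


# Toy per-rollout gradient directions (hypothetical values).
toy_grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

# Every rollout earns the same reward: all advantages are zero.
uniform = gradient_mass_sq_norm([1.0, 1.0, 1.0, 1.0], toy_grads)

# Rewards differ within the group: advantages are nonzero.
mixed = gradient_mass_sq_norm([1.0, 0.0, 1.0, 0.0], toy_grads)
```

Under these assumptions, a rollout set whose rewards are all equal contributes exactly zero gradient mass, while a set with differing rewards contributes a positive amount. This is one way to read the claim that rollout selection under a fixed budget affects how informative the resulting update is.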