Hierarchical Advantage
Cross-source consensus on Hierarchical Advantage from 1 source and 4 claims.
Highlighted claims
- Hierarchical advantage assigns sibling contrast at the node level using normalized backed-up values. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
- For a trajectory, hierarchical advantage sums discounted sibling contrasts along the path with alpha equal to 0.7. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
- The final advantage combines GRPO advantage and hierarchical advantage with lambda equal to 0.5. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
- The InfoTree workflow computes GRPO plus hierarchical advantage before the policy update. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
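The claims above can be sketched in code. This is a minimal illustration, not the paper's implementation: the claims give alpha = 0.7 and lambda = 0.5 but not the exact formulas. The sketch assumes that "normalized" means a z-score of each child's backed-up value among its siblings, that the discount is alpha raised to the depth counted from the root, and that the final advantage is the GRPO advantage plus lambda times the hierarchical advantage. The `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev

ALPHA = 0.7   # path discount stated in the source claims
LAMBDA = 0.5  # mixing weight stated in the source claims


@dataclass
class Node:
    """Hypothetical tree node holding a backed-up value."""
    value: float
    children: list = field(default_factory=list)
    contrast: float = 0.0  # sibling contrast, filled in below


def assign_sibling_contrast(node, eps=1e-8):
    """Assumption: contrast = z-score of a child's value among its siblings."""
    if not node.children:
        return
    values = [child.value for child in node.children]
    mu, sigma = mean(values), pstdev(values)
    for child in node.children:
        child.contrast = (child.value - mu) / (sigma + eps)
        assign_sibling_contrast(child, eps)


def hierarchical_advantage(path, alpha=ALPHA):
    """Sum discounted sibling contrasts along a root-to-leaf path.

    `path` lists the nodes below the root, shallowest first.
    Assumption: the node at index d is weighted by alpha ** d.
    """
    return sum(alpha ** d * node.contrast for d, node in enumerate(path))


def grpo_advantage(rewards, eps=1e-8):
    """Group-normalized reward: (r - group mean) / group std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def final_advantage(a_grpo, a_hier, lam=LAMBDA):
    """Assumption: additive combination a_grpo + lam * a_hier."""
    return a_grpo + lam * a_hier


if __name__ == "__main__":
    # Toy tree: the root has children a and b; a has leaves a1 and a2.
    a1, a2 = Node(1.0), Node(0.0)
    a, b = Node(1.0, [a1, a2]), Node(0.0)
    root = Node(0.5, [a, b])
    assign_sibling_contrast(root)

    paths = [[a, a1], [a, a2], [b]]
    a_grpo = grpo_advantage([leaf_path[-1].value for leaf_path in paths])
    for adv, leaf_path in zip(a_grpo, paths):
        print(final_advantage(adv, hierarchical_advantage(leaf_path)))
```

In this toy tree, the two trajectories through `a` share the same first-level contrast and differ only in the discounted second-level term. That per-branch credit is what the hierarchical term adds on top of the trajectory-level GRPO signal. If the paper instead uses a convex combination or a different normalization, only `assign_sibling_contrast` and `final_advantage` would need to change.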