Hierarchical Advantage

Cross-source consensus on Hierarchical Advantage from 1 source and 4 claims.


Uses

Hierarchical advantage assigns each node a sibling contrast computed from normalized backed-up values. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
For a trajectory, the hierarchical advantage is the sum of the sibling contrasts along its path, discounted with α = 0.7. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
The final advantage combines the GRPO advantage with the hierarchical advantage using λ = 0.5. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
The InfoTree workflow computes the combined GRPO and hierarchical advantage before the policy update. — Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
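The four claims can be tied together in a minimal Python sketch. It is not the InfoTree implementation. Only α = 0.7 and λ = 0.5 come from the source. Everything else is a hypothetical choice made for illustration: the `Node` tree type and all function names, taking the backed-up value as the mean leaf reward of a subtree, normalizing by z-scoring within each sibling group, weighting path step k below the root by α^k, and the additive form A = A_GRPO + λ·A_hier. The GRPO term is the usual group-relative z-score of terminal rewards, with the whole tree treated as one group.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical rollout-tree node; only leaves carry a terminal reward."""
    name: str
    reward: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def leaf_rewards(node: Node) -> List[float]:
    """Collect the terminal rewards of all leaves under `node`."""
    if not node.children:
        return [node.reward]
    return [r for child in node.children for r in leaf_rewards(child)]


def backed_up_value(node: Node) -> float:
    # ASSUMPTION: backed-up value = mean terminal reward of the subtree's leaves.
    return mean(leaf_rewards(node))


def sibling_contrasts(root: Node, eps: float = 1e-6) -> Dict[str, float]:
    """Contrast of every non-root node against its siblings.

    ASSUMPTION: "normalized" = z-score of backed-up values within one sibling
    group. An only child therefore gets contrast 0.
    """
    contrasts: Dict[str, float] = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        if not parent.children:
            continue
        values = [backed_up_value(c) for c in parent.children]
        mu, sigma = mean(values), pstdev(values)
        for child, v in zip(parent.children, values):
            contrasts[child.name] = (v - mu) / (sigma + eps)
            stack.append(child)
    return contrasts


def leaf_paths(node: Node, prefix: Tuple[Node, ...] = ()) -> Iterator[List[Node]]:
    """Yield each root-to-leaf path, excluding the root itself."""
    if not node.children:
        yield list(prefix)
        return
    for child in node.children:
        yield from leaf_paths(child, prefix + (child,))


def combined_advantages(root: Node, alpha: float = 0.7, lam: float = 0.5,
                        eps: float = 1e-6) -> Dict[str, float]:
    """Per-trajectory advantage, keyed by leaf name (root must have children)."""
    contrasts = sibling_contrasts(root, eps)
    paths = list(leaf_paths(root))
    # GRPO term: z-score each trajectory's terminal reward within the group.
    rewards = [p[-1].reward for p in paths]
    mu, sigma = mean(rewards), pstdev(rewards)
    result: Dict[str, float] = {}
    for path, r in zip(paths, rewards):
        grpo = (r - mu) / (sigma + eps)
        # ASSUMPTION: the contrast at step k below the root is weighted by alpha**k.
        hier = sum(alpha ** k * contrasts[n.name] for k, n in enumerate(path))
        # ASSUMPTION: additive combination A = A_GRPO + lam * A_hier.
        result[path[-1].name] = grpo + lam * hier
    return result


# Toy tree: branch "a" has one success and one failure, branch "b" one failure.
tree = Node("root", children=[
    Node("a", children=[Node("a1", 1.0), Node("a2", 0.0)]),
    Node("b", children=[Node("b1", 0.0)]),
])
adv = combined_advantages(tree)
print({k: round(v, 3) for k, v in adv.items()})
# prints {'a1': 2.264, 'a2': -0.557, 'b1': -1.207}
```

Under these assumptions, a2 and b1 both end with reward 0 and so receive the same GRPO advantage, but a2 sits under the stronger branch a and ends up with a higher combined advantage. That node-level credit is what the hierarchical term adds on top of the trajectory-level GRPO signal.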