Reinforcement Learning
Cross-source consensus on Reinforcement Learning from 1 source and 5 claims.
Highlighted claims
- RAO trains all recursive nodes jointly using a local reward. — Recursive Agent Optimization
- The node reward combines local task success with a delegation bonus based on immediate-child success. — Recursive Agent Optimization
- Using success rate rather than the number of successful children is intended to avoid rewarding indiscriminate spawning. — Recursive Agent Optimization
- Depth-level inverse-frequency weighting reduces domination by depths with many trajectories. — Recursive Agent Optimization
- The leave-one-out root-group baseline is described as unbiased because it is independent of the scored rollout tree. — Recursive Agent Optimization
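The claims above name three computations: a node reward, a depth-level weighting, and a leave-one-out baseline. The following is a minimal Python sketch of how each might look. It is not the source's implementation. The additive form of the reward, the `bonus_weight` coefficient, the zero bonus for childless nodes, the exact `1 / count` weighting, and all function names are assumptions made for illustration, since the claims do not give formulas.

```python
from collections import Counter
from typing import List, Sequence


def node_reward(task_success: float,
                child_successes: Sequence[float],
                bonus_weight: float = 0.5) -> float:
    """Local task success plus a delegation bonus.

    The bonus uses the success RATE of immediate children, not the
    count of successful children, so spawning more children does not
    raise the reward by itself.
    Assumptions: additive combination, hypothetical `bonus_weight`,
    and no bonus when the node spawned no children.
    """
    if child_successes:
        success_rate = sum(child_successes) / len(child_successes)
    else:
        success_rate = 0.0
    return task_success + bonus_weight * success_rate


def depth_weights(depths: Sequence[int]) -> List[float]:
    """Inverse-frequency weight for each trajectory, by recursion depth.

    Each trajectory is weighted by 1 / (number of trajectories at its
    depth), so every depth contributes the same total weight.
    Assumption: plain 1 / count, with no further normalization.
    """
    counts = Counter(depths)
    return [1.0 / counts[d] for d in depths]


def leave_one_out_baselines(root_rewards: Sequence[float]) -> List[float]:
    """Baseline for rollout i: mean reward of the OTHER rollouts in the group.

    The scored rollout's own reward is excluded, so its baseline does
    not depend on that rollout.
    """
    n = len(root_rewards)
    if n < 2:
        raise ValueError("leave-one-out needs at least two rollouts")
    total = sum(root_rewards)
    return [(total - r) / (n - 1) for r in root_rewards]
```

Under these assumptions, a node that succeeds with one successful child out of one scores higher than a node that succeeds with two successful children out of eight, even though the second node has more successful children. That is the behavior the success-rate claim describes.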