Reinforcement Learning

Cross-source consensus on Reinforcement Learning from 1 source and 5 claims.


How it works

Recursive Agent Optimization (RAO) trains all recursive nodes jointly using a local reward. — Recursive Agent Optimization
The node reward combines local task success with a delegation bonus based on immediate-child success. — Recursive Agent Optimization
Using success rate rather than the number of successful children is intended to avoid rewarding indiscriminate spawning. — Recursive Agent Optimization
Depth-level inverse-frequency weighting reduces domination by depths with many trajectories. — Recursive Agent Optimization
The leave-one-out root-group baseline is described as unbiased because it is independent of the scored rollout tree. — Recursive Agent Optimization
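
The claims above can be illustrated with a minimal Python sketch. It is not the RAO implementation: the `Node` structure, the `bonus_weight` parameter, the additive form of the reward, the unnormalized `1 / count` depth weights, and the reading of the leave-one-out baseline as the mean root return of the other rollouts in the same group are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """One recursive agent call in a rollout tree (hypothetical structure)."""
    success: bool  # whether this node solved its own local task
    depth: int = 0
    children: List["Node"] = field(default_factory=list)


def node_reward(node: Node, bonus_weight: float = 0.5) -> float:
    """Local success plus a bonus scaled by the immediate-child success RATE.

    Assumed additive form. Using the rate (not the count) means spawning
    extra children does not raise the reward by itself.
    """
    local = 1.0 if node.success else 0.0
    if not node.children:
        return local
    rate = sum(c.success for c in node.children) / len(node.children)
    return local + bonus_weight * rate


def iter_nodes(root: Node) -> Iterator[Node]:
    """Yield every node in the tree rooted at `root`."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def depth_weights(nodes: List[Node]) -> Dict[int, float]:
    """Inverse-frequency weight per depth (assumed to be 1 / node count).

    Each depth then contributes the same total weight, however many
    trajectories it contains.
    """
    counts = Counter(n.depth for n in nodes)
    return {depth: 1.0 / count for depth, count in counts.items()}


def leave_one_out_baselines(root_returns: List[float]) -> List[float]:
    """Baseline for rollout i: mean root return of the OTHER rollouts.

    The scored rollout never enters its own baseline.
    """
    k = len(root_returns)
    if k < 2:
        raise ValueError("need at least two rollouts in the group")
    total = sum(root_returns)
    return [(total - r) / (k - 1) for r in root_returns]
```

Under these assumptions, a successful parent with two successful children out of four gets a reward of 1.25. Adding failing children lowers the success rate and therefore the bonus, whereas a count-based bonus would be unchanged by failing children and would grow with every additional child that happened to succeed.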