Slow-Fast Policy Optimization
Cross-source consensus on Slow-Fast Policy Optimization from 1 source and 5 claims.
Highlighted claims
- SFPO costs K + 1 backward passes per update. — Gradient Extrapolation-Based Policy Optimization
- SFPO uses K fast inner steps and then applies a slow correction. — Gradient Extrapolation-Based Policy Optimization
- GXPO generally matched or exceeded SFPO while using fewer active-phase backward passes at larger K. — Gradient Extrapolation-Based Policy Optimization
- In the model-group comparisons, GXPO improved over the strongest SFPO setting by 0.14 to 1.28 average pass@1 points. — Gradient Extrapolation-Based Policy Optimization
- The paper treats SFPO as an optimizer-side lookahead method, close to the setting it studies. — Gradient Extrapolation-Based Policy Optimization
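The claims above can be sketched as a toy update loop. This is a minimal illustration, not the paper's method: the claims only state that SFPO runs K fast inner steps, applies a slow correction, and costs K + 1 backward passes per update. The specific correction rule used here (one extra gradient evaluated at the fast iterate, applied from the slow weights) is an assumption in the spirit of lookahead-style optimizers, and the quadratic toy loss, function names, and step sizes are all illustrative.

```python
def grad(w):
    # Toy loss L(w) = 0.5 * w**2, so the gradient is just w.
    # Stands in for one backward pass on the real objective.
    return w

def slow_fast_step(w_slow, K=4, fast_lr=0.1, slow_lr=0.4):
    """One slow-fast update: K fast inner steps, then a slow correction.

    Costs K + 1 gradient (backward-pass) evaluations per call, matching
    the per-update cost stated in the claims. The exact form of the slow
    correction is an assumption, not taken from the source.
    """
    w_fast = w_slow
    for _ in range(K):
        w_fast = w_fast - fast_lr * grad(w_fast)  # K fast backward passes
    # Slow correction: one extra backward pass at the fast iterate,
    # applied from the slow weights (the "+1" in K + 1).
    return w_slow - slow_lr * grad(w_fast)

w = 1.0
for _ in range(20):
    w = slow_fast_step(w)
# After 20 outer updates the iterate has contracted toward the minimizer at 0.
print(w)
```

On this toy quadratic each outer update contracts the slow iterate by a constant factor, so the loop converges; in the RL setting described by the source, each `grad` call would instead be a backward pass through the policy.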