Sample Reuse
Cross-source consensus on Sample Reuse from 1 source and 4 claims.
Highlighted claims
- Sample reuse means applying multiple gradient updates to each newly generated rollout batch. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- Naive sample reuse in LLM RLVR is said to cause catastrophic training collapse. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- Under naive sample reuse, training first converges faster than single-use training and then degrades severely. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
- A fixed reuse count faces a stability-efficiency trade-off: low reuse is stable but less sample-efficient, while high reuse can collapse. — When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
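
To make the definition in the claims above concrete, here is a minimal sketch of the loop structure that sample reuse implies: one rollout batch is generated, then `reuse` gradient updates are applied to it before new rollouts are sampled. Everything here is an illustrative assumption rather than the cited paper's setup: the two-armed bandit, the sigmoid policy, and the names `sample_rollouts`, `policy_gradient`, and `train` are hypothetical. The toy shows only where the reuse count enters the loop. It does not reproduce the collapse behaviour described for LLM RLVR, and it is not the dynamic gradient gating method.

```python
import math
import random


def sample_rollouts(theta, batch_size, rng):
    """Sample actions from a sigmoid policy on a toy two-armed bandit.

    Hypothetical environment: arm 1 pays reward 1.0, arm 0 pays 0.0.
    """
    p1 = 1.0 / (1.0 + math.exp(-theta))
    actions = [1 if rng.random() < p1 else 0 for _ in range(batch_size)]
    rewards = [float(a) for a in actions]
    return actions, rewards


def policy_gradient(theta, actions, rewards):
    """REINFORCE-style gradient estimate with a mean-reward baseline."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    baseline = sum(rewards) / len(rewards)
    # For a Bernoulli(sigmoid(theta)) policy, d/dtheta log pi(a) = a - p1.
    terms = [(r - baseline) * (a - p1) for a, r in zip(actions, rewards)]
    return sum(terms) / len(terms)


def train(num_batches, reuse, lr=0.5, batch_size=32, seed=0):
    """Naive fixed sample reuse: `reuse` updates per fresh rollout batch."""
    rng = random.Random(seed)
    theta = 0.0
    n_rollout_batches = 0
    n_updates = 0
    for _ in range(num_batches):
        # Rollouts are generated once per outer iteration.
        actions, rewards = sample_rollouts(theta, batch_size, rng)
        n_rollout_batches += 1
        # The same batch is reused for several gradient steps,
        # with no off-policy correction (hence "naive").
        for _ in range(reuse):
            theta += lr * policy_gradient(theta, actions, rewards)
            n_updates += 1
    return theta, n_rollout_batches, n_updates


if __name__ == "__main__":
    for k in (1, 4):
        theta, batches, updates = train(num_batches=10, reuse=k)
        print(f"reuse={k}: batches={batches}, updates={updates}, theta={theta:.3f}")
```

With `reuse=1` this reduces to single-use training. Larger values take more gradient steps per generated batch, which is the source of both the efficiency gain and the instability risk that the claims describe for a fixed reuse count.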