Gradient Computation Cost
Cross-source consensus on Gradient Computation Cost from 1 source and 4 claims.
Highlighted claims
- In simulator-based GRPO training of a 7B VLA model, gradient computation accounts for approximately 78% of wall-clock time per training step while rollout collection accounts for only approximately 21%. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
- Prior efficiency work implicitly assumed rollout collection was the dominant cost, but this paper challenges that assumption with direct measurement. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
- The finding that gradient computation dominates training cost inverts the assumption behind most prior VLA RL efficiency work and motivates treating gradient allocation as an explicit design axis. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
- An earlier approach of branching at decision-critical timesteps was abandoned because it added exponential rollout overhead rather than reducing gradient computation. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
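The practical weight of the 78% / 21% split can be illustrated with a short Amdahl's-law calculation. This is a minimal sketch, not taken from the source: the function name `overall_speedup` is hypothetical, and it assumes that the two reported fractions are non-overlapping shares of one training step and that accelerating one component leaves the other unchanged. The percentages are the approximate figures quoted in the claims above.

```python
import math

# Approximate per-step wall-clock shares quoted in the claims above.
GRADIENT_FRACTION = 0.78
ROLLOUT_FRACTION = 0.21


def overall_speedup(fraction: float, component_speedup: float) -> float:
    """Amdahl's law: whole-step speedup when a component that takes
    `fraction` of the step time runs `component_speedup` times faster.

    Assumption (not from the source): components do not overlap in time.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if component_speedup <= 0.0:
        raise ValueError("component_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


# Upper bound if rollout collection became free (infinite speedup).
rollout_ceiling = overall_speedup(ROLLOUT_FRACTION, math.inf)

# Upper bound if gradient computation became free.
gradient_ceiling = overall_speedup(GRADIENT_FRACTION, math.inf)

# Effect of merely halving the gradient computation time.
gradient_halved = overall_speedup(GRADIENT_FRACTION, 2.0)

print(f"rollout free:    {rollout_ceiling:.2f}x")
print(f"gradient free:   {gradient_ceiling:.2f}x")
print(f"gradient halved: {gradient_halved:.2f}x")
```

Under these assumptions, eliminating rollout cost entirely yields at most about a 1.27x step speedup, whereas halving gradient computation alone yields about 1.64x, and the ceiling for gradient-side savings is about 4.5x. This is consistent with the claim that gradient allocation, rather than rollout collection, is the more promising design axis.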