Probabilistic Chunk Masking

Cross-source consensus on Probabilistic Chunk Masking (PCM), drawn from 1 source and 7 claims.

Highlighted claims

Probabilistic Chunk Masking is a drop-in modification to GRPO that selects a fixed budget of trajectory chunks per update and physically removes the rest before the forward and backward pass. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
PCM reaches the 98% success rate threshold 2.38 times faster than vanilla GRPO on LIBERO-Object. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
PCM's wall-clock gain is entirely attributable to per-step compute savings because its per-step learning curves match those of vanilla GRPO. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
A budget of B=12 chunks (19% of the trajectory) is selected as the PCM default and applied without task-specific tuning across all benchmarks. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
PCM relies on three distinct mechanisms — concentration, exploration, and online adaptation — all of which are necessary for its effectiveness. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
The masked biased estimator used by PCM is preferred over an unbiased importance-weighted alternative because it yields lower gradient estimator variance at the same chunk budget. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
PCM operates at the trajectory-phase level rather than the token level and uses an outcome-grounded signal rather than policy-internal uncertainty proxies such as entropy. — Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
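
The first claim describes selecting a fixed budget of chunks per update and physically removing the rest before the forward and backward passes. The following is a minimal Python sketch, assuming each trajectory is already split into chunks and each chunk carries a nonnegative selection weight. The names `select_chunks` and `compact_batch`, the weights, and the use of Efraimidis-Spirakis weighted sampling without replacement are illustrative assumptions, not the paper's selection rule. Only the default budget of 12 comes from the claims above.

```python
import random
from typing import Any, List, Optional, Sequence

DEFAULT_BUDGET = 12  # B = 12 chunks, the PCM default reported above


def select_chunks(weights: Sequence[float], budget: int = DEFAULT_BUDGET,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Return up to `budget` distinct chunk indices, favouring larger weights.

    Hypothetical rule: Efraimidis-Spirakis weighted sampling without
    replacement. Each positive-weight chunk draws key = u ** (1 / w) with
    u ~ Uniform[0, 1); the `budget` largest keys are kept.
    """
    if rng is None:
        rng = random.Random()
    keyed = []
    for idx, w in enumerate(weights):
        if w <= 0:
            continue  # zero-weight chunks can never be selected
        keyed.append((rng.random() ** (1.0 / w), idx))
    keyed.sort(reverse=True)  # largest keys first
    return sorted(idx for _, idx in keyed[:budget])


def compact_batch(chunks: Sequence[Any], keep: Sequence[int]) -> List[Any]:
    """Physically drop unselected chunks so the update only processes `keep`."""
    return [chunks[i] for i in keep]


if __name__ == "__main__":
    n_chunks = 64  # hypothetical number of chunks in one trajectory
    keep = select_chunks([1.0] * n_chunks, rng=random.Random(0))
    batch = compact_batch([f"chunk-{i}" for i in range(n_chunks)], keep)
    print(len(batch), keep)
```

With uniform weights this reduces to drawing 12 chunks uniformly at random. Nonuniform weights, produced by whatever scoring rule is in use, concentrate the fixed budget on higher-scoring chunks. Because `compact_batch` builds a new, shorter list, the later passes never touch the dropped chunks, which is the source of the per-step compute saving.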
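
The second and third claims fit together through a short calculation. Wall-clock time to a success threshold equals the number of update steps needed multiplied by the time per step. If the per-step learning curves of PCM and vanilla GRPO match, the step counts cancel, so the speedup reduces to the ratio of per-step times. The sketch below uses made-up timings. Only the 2.38x figure comes from the source.

```python
def time_to_threshold(steps: int, seconds_per_step: float) -> float:
    """Wall-clock seconds needed to reach a success-rate threshold."""
    return steps * seconds_per_step


def wall_clock_speedup(steps_base: int, sec_base: float,
                       steps_pcm: int, sec_pcm: float) -> float:
    """Ratio of baseline to PCM wall-clock time to the same threshold."""
    return time_to_threshold(steps_base, sec_base) / time_to_threshold(steps_pcm, sec_pcm)


# Hypothetical numbers: equal step counts (matching per-step curves),
# so the speedup equals the per-step time ratio 3.0 / 1.5.
print(wall_clock_speedup(400, 3.0, 400, 1.5))  # 2.0
```

If PCM needed more steps than the baseline to reach the threshold, the ratio would fall below the per-step time ratio. Matching per-step curves are what allow the whole wall-clock gain to be attributed to cheaper steps.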
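
The fourth claim names three mechanisms (concentration, exploration, and online adaptation) but does not specify their form. The sketch below shows one hypothetical way to express all three in a chunk-selection distribution: a temperature-controlled softmax over per-chunk scores for concentration, a uniform mixture for exploration, and an exponential moving average of newly observed per-chunk signals for online adaptation. The function names, the mechanisms' concrete forms, and every parameter value are assumptions rather than the paper's design.

```python
import math
from typing import List, Sequence


def selection_probs(scores: Sequence[float], temperature: float = 0.5,
                    epsilon: float = 0.1) -> List[float]:
    """Distribution over chunk positions built from per-chunk scores.

    Concentration: a softmax at `temperature` puts most mass on high scores.
    Exploration: a fraction `epsilon` of the mass is spread uniformly.
    """
    n = len(scores)
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]  # shifted for stability
    z = sum(exps)
    return [(1.0 - epsilon) * e / z + epsilon / n for e in exps]


def update_scores(scores: Sequence[float], observed: Sequence[float],
                  rate: float = 0.2) -> List[float]:
    """Online adaptation: move each score toward the newest observed signal (EMA)."""
    return [(1.0 - rate) * s + rate * o for s, o in zip(scores, observed)]


scores = [0.0, 0.0, 2.0, 0.0]  # hypothetical per-chunk scores
probs = selection_probs(scores)
scores = update_scores(scores, [1.5, 0.0, 0.0, 0.0])  # new signal arrives for chunk 0
```

Each mechanism can be switched off in isolation. Setting `epsilon=0` removes exploration, a very large `temperature` removes concentration, and `rate=0` freezes the scores. This is one way to read the claim that all three are necessary. Drawing B distinct chunks from `probs` would then require sampling without replacement.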
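
The fifth claim contrasts PCM's masked (biased) gradient estimator with an importance-weighted (unbiased) one. The sketch below compares them under a simplifying assumption. Each chunk is kept independently with probability `pi_i` (Poisson sampling, so the budget is fixed only in expectation, unlike PCM's fixed budget), and each chunk contributes a scalar gradient term `g_i`. The masked estimator sums the kept terms. The inverse-probability (Horvitz-Thompson) estimator divides each kept term by `pi_i`. Their exact variances are sum g_i^2 pi_i (1 - pi_i) and sum g_i^2 (1 - pi_i) / pi_i. Since pi_i ≤ 1, the masked estimator's variance is never larger, which agrees with the direction of the claim, at the cost of a bias of sum (pi_i - 1) g_i. The functions and numbers are hypothetical.

```python
from typing import Sequence, Tuple


def masked_estimator_stats(g: Sequence[float], pi: Sequence[float]) -> Tuple[float, float]:
    """(bias, variance) of sum_i Z_i * g_i with independent Z_i ~ Bernoulli(pi_i)."""
    bias = sum((p - 1.0) * x for x, p in zip(g, pi))
    var = sum(x * x * p * (1.0 - p) for x, p in zip(g, pi))
    return bias, var


def importance_weighted_stats(g: Sequence[float], pi: Sequence[float]) -> Tuple[float, float]:
    """(bias, variance) of sum_i Z_i * g_i / pi_i; unbiased whenever every pi_i > 0."""
    var = sum(x * x * (1.0 - p) / p for x, p in zip(g, pi))
    return 0.0, var


g = [1.0, -2.0, 0.5, 3.0]   # hypothetical per-chunk gradient terms
pi = [0.5, 0.25, 0.5, 1.0]  # hypothetical inclusion probabilities
print(masked_estimator_stats(g, pi))     # (0.75, 1.0625)
print(importance_weighted_stats(g, pi))  # (0.0, 13.25)
```

The gap is largest for rarely selected chunks. The chunk with `pi_i = 0.25` alone contributes 12.0 to the importance-weighted variance but only 0.75 to the masked one.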
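
The last claim separates PCM's granularity (trajectory phases rather than tokens) from its signal (grounded in outcomes rather than in entropy). The following is a hypothetical illustration only. It averages a per-step scalar feature over fixed-length chunks, then scores each chunk position by the gap between the mean feature of successful and failed rollouts in the same group, so high scores mark where outcomes diverge. The chunk length, the feature, the difference-of-means score, and the function names are all assumptions. An entropy-based proxy, by contrast, would be computed from the policy's own action distribution without consulting success labels.

```python
from statistics import mean
from typing import List, Sequence


def to_chunks(per_step: Sequence[float], chunk_len: int) -> List[float]:
    """Average a per-step feature over consecutive chunks of `chunk_len` steps."""
    return [mean(per_step[i:i + chunk_len]) for i in range(0, len(per_step), chunk_len)]


def divergence_scores(rollouts: Sequence[Sequence[float]], success: Sequence[bool],
                      chunk_len: int) -> List[float]:
    """Per chunk position: |mean over successes - mean over failures|.

    Returns zeros when the group contains only successes or only failures,
    since there is then no outcome contrast to localise.
    """
    chunked = [to_chunks(r, chunk_len) for r in rollouts]
    n_chunks = min(len(c) for c in chunked)  # align on the shortest rollout
    wins = [c for c, ok in zip(chunked, success) if ok]
    fails = [c for c, ok in zip(chunked, success) if not ok]
    if not wins or not fails:
        return [0.0] * n_chunks
    return [abs(mean(w[k] for w in wins) - mean(f[k] for f in fails))
            for k in range(n_chunks)]


# Hypothetical group of two rollouts with 4 steps each, chunked in pairs.
group = [[0.0, 0.0, 1.0, 1.0],   # succeeded
         [0.0, 0.0, 0.0, 0.0]]   # failed
print(divergence_scores(group, [True, False], chunk_len=2))  # [0.0, 1.0]
```

Here the two rollouts agree during the first phase and differ only in the second, so only the second chunk position receives a nonzero score.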