Marker-Gated Abort
Cross-source consensus on Marker-Gated Abort from 1 source and 5 claims.
Highlighted claims
- Generation uses domain-specific answer markers including boxed answers for math, closing code fences for code, and answer spans for short-form QA. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- When a marker appears, DUET keeps the rollout and stops shortly after the marker. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- When no marker appears past the abort horizon plus grace window, the rollout is usually aborted but sometimes continued to EOS with importance weighting. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- The default abort keep probability is 0.05, with sensitivity tested across 0.01, 0.05, 0.10, and 0.20. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
- The marker-gated abort is described as the wall-clock lever because it cuts marker-less tails and trims successful rollouts after answers appear. — DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
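
The claims above can be read as a small decision rule. The following is a minimal sketch of that rule, not DUET's actual implementation. The names `marker_gate`, `Decision`, `tail`, and `draw` are hypothetical, and the `tail` length of 16 tokens is an arbitrary stand-in for "shortly after the marker". The sketch also assumes that "importance weighting" means inverse-probability weighting (weight `1 / keep_prob`) for the rollouts that survive the abort; the claims do not spell out the exact form.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    action: str    # "continue", "stop", "abort", or "run_to_eos"
    weight: float  # importance weight for the rollout's training signal


def marker_gate(
    n_tokens: int,                # tokens generated so far
    marker_pos: Optional[int],    # index where an answer marker appeared, if any
    abort_horizon: int,
    grace: int,
    tail: int = 16,               # assumed length of "shortly after the marker"
    keep_prob: float = 0.05,      # default keep probability from the claims
    draw: Callable[[], float] = random.random,
) -> Decision:
    """Decide what to do with one rollout at its current length.

    The random draw is meant to happen once, when the deadline is first
    reached; a caller would cache the result instead of re-drawing per token.
    """
    if marker_pos is not None:
        # Marker seen: keep the rollout and stop shortly after the marker.
        if n_tokens >= marker_pos + tail:
            return Decision("stop", 1.0)
        return Decision("continue", 1.0)

    if n_tokens < abort_horizon + grace:
        # Still inside the abort horizon plus grace window.
        return Decision("continue", 1.0)

    # No marker past the deadline: usually abort, sometimes run to EOS.
    if draw() < keep_prob:
        # Assumption: inverse-probability weight for the surviving rollout.
        return Decision("run_to_eos", 1.0 / keep_prob)
    return Decision("abort", 0.0)
```

Under the inverse-probability assumption, a marker-less rollout carries an expected weight of `keep_prob * (1 / keep_prob) = 1`, so the rare rollouts that run to EOS stand in for the ones that were aborted. With the default `keep_prob = 0.05`, each survivor is weighted by 20.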