Learning-Zone Energy (LZE)
Cross-source consensus on Learning-Zone Energy (LZE) from 1 source and 7 claims.
Highlighted claims
- The pass-rate momentum factor amplifies prompts where the policy is actively improving and is neutral for stagnating prompts. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
- Gumbel perturbations are added to Energy Scores before ranking to provide stochastic exploration in prompt selection. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
- The forward pruner re-evaluates a fraction of pruned prompts each epoch and restores those that no longer achieve full correctness, serving as a safety mechanism against catastrophic forgetting. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
- LZE is a dual-stage data selection framework with a per-step backward selector that concentrates gradient budget and an epoch-level forward pruner that eliminates rollout generation for persistently solved prompts. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
- The outcome uncertainty term 4p(1-p) is symmetric, simultaneously down-weighting both all-correct and all-incorrect regimes, unlike Focal Loss, which is asymmetric. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
- The difficulty anchor is computed from a single forward pass at initialization and never updated thereafter, encoding intrinsic prompt hardness as a fixed prior. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
- The Energy Score can be interpreted as a sample-level attention mechanism where the difficulty anchor is the key, the EMA is the query, and 4p(1-p)(1+αm) is the normalized attention score. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
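
The claims above can be tied together with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation. The function names (`energy_factor`, `gumbel_top_k`, `forward_prune`), the parameters `scale`, `patience`, and `recheck_frac`, and the `evaluate` callback are all hypothetical. The momentum is assumed here to be m = max(0, p - EMA), chosen only because it is zero for stagnating prompts and positive for improving ones. The difficulty anchor is left out because the claims do not say how it enters the score.

```python
import math
import random


def energy_factor(p, ema, alpha=1.0):
    """Return 4p(1-p)(1 + alpha*m) for pass rate p.

    Assumption: m = max(0, p - ema), i.e. momentum is the positive gap
    between the current pass rate and its exponential moving average.
    """
    m = max(0.0, p - ema)
    return 4.0 * p * (1.0 - p) * (1.0 + alpha * m)


def gumbel_top_k(scores, k, rng, scale=1.0):
    """Rank prompts by score + Gumbel noise and return the top-k indices.

    `scale` is a hypothetical knob; scale=0 gives a deterministic top-k.
    """
    def gumbel():
        # Clamp away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        return -math.log(-math.log(u))

    noisy = [s + scale * gumbel() for s in scores]
    order = sorted(range(len(scores)), key=lambda i: noisy[i], reverse=True)
    return order[:k]


def forward_prune(pass_rates, streak, pruned, evaluate, rng,
                  patience=2, recheck_frac=0.25):
    """Epoch-level update of the pruned set (mutated in place and returned).

    pass_rates: {prompt_id: pass rate observed this epoch} for active prompts.
    streak:     {prompt_id: consecutive epochs at full correctness}.
    evaluate:   hypothetical callback, prompt_id -> freshly measured pass rate.
    """
    # Safety check: re-evaluate a random fraction of already-pruned prompts
    # and restore any that no longer achieve full correctness.
    n_check = math.ceil(recheck_frac * len(pruned))
    for pid in rng.sample(sorted(pruned), n_check):
        if evaluate(pid) < 1.0:
            pruned.discard(pid)
            streak[pid] = 0

    # Prune prompts that stayed fully solved for `patience` epochs in a row
    # (the patience rule is an assumption standing in for "persistently solved").
    for pid, p in pass_rates.items():
        streak[pid] = streak.get(pid, 0) + 1 if p >= 1.0 else 0
        if streak[pid] >= patience:
            pruned.add(pid)
    return pruned
```

In this sketch, `energy_factor` is zero at p = 0 and p = 1 and peaks at p = 0.5, which is the symmetric down-weighting described in the claims. A per-step selector would call `gumbel_top_k` on the scores of the current batch, and `forward_prune` would run once per epoch to decide which prompts skip rollout generation.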