Out-of-Distribution Generalization

Cross-source consensus on Out-of-Distribution Generalization from 1 source and 5 claims.

1 source · 5 claims

How it works

Out-of-distribution (OOD) gains are the most pronounced results of Learning-Zone Energy (LZE) data selection, with the largest single-benchmark improvement being +45.9% on AIME25 for Qwen2.5-Math-1.5B trained on MATH. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
LZE improvements are consistent across model scales from 1.5B to 8B parameters and across both the Qwen2.5 and Qwen3 model families, suggesting broad applicability. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
OOD improvements are attributed to avoiding overfitting to the in-distribution difficulty profile: suppressing trivially solved and completely failed prompts keeps the policy from memorizing the training set's difficulty distribution. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
Smaller models (1.5B, 1.7B) show especially large OOD gains because their learning zone is narrower, which is where the Energy Score's discriminative power is highest. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
Learning-zone targeting may be an important inductive bias for generalization in mathematical reasoning, distinct from and complementary to architectural choices. — Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
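The selection rule described in these claims, suppressing prompts the policy always solves or always fails, can be illustrated as an online filter over per-prompt rollout rewards. This is a minimal sketch under stated assumptions, not the paper's method: the Energy Score is not defined in this section, so the hypothetical `learning_zone_score` below substitutes the Bernoulli variance p(1 - p) of a prompt's empirical pass rate, which is zero for all-solved and all-failed prompts and peaks at p = 0.5. The names `PromptStats`, `learning_zone_score`, and `select_learning_zone`, and the binary-reward setup, are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PromptStats:
    # Hypothetical record: binary verifier rewards (1.0 = correct) from G rollouts.
    prompt_id: str
    rewards: list[float]


def pass_rate(rewards: list[float]) -> float:
    if not rewards:
        raise ValueError("need at least one rollout reward")
    return sum(rewards) / len(rewards)


def learning_zone_score(rewards: list[float]) -> float:
    # ASSUMPTION: stand-in for LZE's Energy Score (not given in this section).
    # Bernoulli variance p * (1 - p): zero when every rollout succeeds or
    # every rollout fails, maximal at a 50% pass rate.
    p = pass_rate(rewards)
    return p * (1.0 - p)


def select_learning_zone(batch: list[PromptStats], keep: int) -> list[str]:
    scored = [(learning_zone_score(s.rewards), s.prompt_id) for s in batch]
    # Suppress trivially solved and completely failed prompts (score == 0).
    scored = [(score, pid) for score, pid in scored if score > 0.0]
    # Keep the highest-scoring prompts for the next policy update.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pid for _, pid in scored[:keep]]


batch = [
    PromptStats("easy", [1.0, 1.0, 1.0, 1.0]),    # always solved
    PromptStats("hard", [0.0, 0.0, 0.0, 0.0]),    # always failed
    PromptStats("mid", [1.0, 0.0, 1.0, 0.0]),     # p = 0.5
    PromptStats("mostly", [1.0, 1.0, 1.0, 0.0]),  # p = 0.75
]
print(select_learning_zone(batch, keep=2))  # → ['mid', 'mostly']
```

Reward variance is a natural proxy here because, in group-normalized policy-gradient methods, a prompt whose rollouts all receive the same reward yields zero advantage and therefore no learning signal; whether LZE's Energy Score shares this shape is not stated in this section.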