Pretraining Performance
Cross-source consensus on Pretraining Performance from 1 source and 5 claims.
Highlighted claims
- In the main 1.92T-token experiment, EMO reached a final pretraining loss of 1.017. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO remained behind the fixed E=128 baseline by 0.023 in absolute loss, a 2.3 percent relative gap (see the arithmetic sketch after this list). — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO reduced GPU hours by 10 percent while approaching the quality of the fixed E=128 baseline. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- Expansion-induced loss spikes recovered within roughly 10,000 steps. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- The final EMO model did not fully match the fixed E=128 baseline across all pretraining and downstream results. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
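
As a quick check on the gap claim above, the sketch below reproduces the relationship between the absolute and relative loss figures. The fixed E=128 baseline loss is not quoted in this section; it is assumed here to equal EMO's final loss minus the stated absolute gap.

```python
# Minimal sketch of the absolute vs. relative loss-gap arithmetic.
# Assumption: the fixed E=128 baseline loss is derived from EMO's final
# loss and the stated absolute gap, since it is not quoted directly here.

emo_final_loss = 1.017   # EMO final pretraining loss (from the claims above)
absolute_gap = 0.023     # EMO loss minus baseline loss (stated absolute gap)

baseline_loss = emo_final_loss - absolute_gap   # implied fixed E=128 baseline loss
relative_gap = absolute_gap / baseline_loss     # gap relative to the baseline

print(f"Implied fixed E=128 baseline loss: {baseline_loss:.3f}")  # ~0.994
print(f"Relative loss gap: {relative_gap:.1%}")                   # ~2.3%, matching the claim
```

Under this assumption the implied baseline loss is about 0.994, and the 0.023 absolute gap works out to roughly 2.3 percent relative, consistent with the highlighted claim.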