EMO
Cross-source consensus on EMO from 1 source and 5 claims.
Highlighted claims
- EMO is a progressive MoE pretraining framework that grows the number of experts during training. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO uses five stages that double the expert pool from 8 to 128 experts (a schedule sketched in the code after this list). — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO delays expensive large-expert configurations until later in training. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO treats expert capacity as expandable memory rather than requiring the full expert pool from the start. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO does not require new routing architectures, nonstandard expert layers, or extra objectives beyond the existing load-balancing objective. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
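To make the claims concrete, here is a minimal sketch of what progressive expert growth could look like, assuming PyTorch, plain top-k softmax routing, and a Switch-style load-balancing auxiliary loss. The names (`GrowableMoE`, `grow_experts`) and the initialization of new experts and router rows are illustrative assumptions of this sketch, not details taken from the EMO paper; only the five-stage 8→128 doubling schedule comes from the claims above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GrowableMoE(nn.Module):
    """Top-k MoE layer whose expert pool can be extended between stages."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.d_model, self.d_hidden, self.top_k = d_model, d_hidden, top_k
        self.experts = nn.ModuleList([self._make_expert() for _ in range(num_experts)])
        self.router = nn.Linear(d_model, num_experts, bias=False)

    def _make_expert(self) -> nn.Module:
        return nn.Sequential(
            nn.Linear(self.d_model, self.d_hidden),
            nn.GELU(),
            nn.Linear(self.d_hidden, self.d_model),
        )

    @torch.no_grad()
    def grow_experts(self, new_total: int) -> None:
        """Extend the pool to `new_total` experts, keeping trained weights."""
        old_total = len(self.experts)
        assert new_total >= old_total
        for _ in range(new_total - old_total):
            self.experts.append(self._make_expert())
        # Widen the router and copy the old rows so routing to existing
        # experts is undisturbed at the stage boundary. (How EMO actually
        # initializes new experts/router rows is an assumption here.)
        new_router = nn.Linear(self.d_model, new_total, bias=False).to(
            self.router.weight.device
        )
        new_router.weight[:old_total] = self.router.weight
        self.router = new_router
        # Note: the optimizer must be rebuilt (or the new parameters added
        # to it) after each growth step.

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model). Plain top-k softmax routing, no special layers.
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        # Standard Switch-style load-balancing auxiliary loss: pushes per-expert
        # token load toward uniform. This is the only auxiliary objective used.
        load = F.one_hot(idx[:, 0], len(self.experts)).float().mean(0)
        importance = probs.mean(0)
        aux_loss = len(self.experts) * (load * importance).sum()
        return out, aux_loss


if __name__ == "__main__":
    # Five stages that double the pool from 8 to 128 experts, per the claims.
    schedule = [8, 16, 32, 64, 128]
    moe = GrowableMoE(d_model=64, d_hidden=256, num_experts=schedule[0])
    x = torch.randn(32, 64)
    for n in schedule[1:]:
        out, aux = moe(x)  # ... train for this stage's budget, then grow ...
        moe.grow_experts(n)
    print(len(moe.experts))  # 128
```

Deferring growth this way keeps early stages cheap, since most training tokens are processed while the pool is small, which matches the claim that expensive large-expert configurations are delayed until later in training.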