EMO
Cross-source consensus on EMO from 1 source and 5 claims.
Highlighted claims
- EMO is a progressive MoE pretraining framework that grows the number of experts during training. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO uses five stages that double the expert pool from 8 to 128 experts (a schedule sketched in the code after this list). — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO delays expensive large-expert configurations until later in training. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO treats expert capacity as expandable memory rather than requiring the full expert pool from the start. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO does not require new routing architectures, nonstandard expert layers, or extra objectives beyond the existing load-balancing objective. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
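To make the claims concrete, here is a minimal sketch of what progressive expert growth could look like, assuming PyTorch, plain top-k softmax routing, and a Switch-style load-balancing auxiliary loss. The names (`GrowableMoE`, `grow_experts`) and the initialization of new experts and router rows are illustrative assumptions of this sketch, not details taken from the EMO paper; only the five-stage 8→128 doubling schedule comes from the claims above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GrowableMoE(nn.Module):
    """Top-k MoE layer whose expert pool can be extended between stages."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.d_model, self.d_hidden, self.top_k = d_model, d_hidden, top_k
        self.experts = nn.ModuleList([self._make_expert() for _ in range(num_experts)])
        self.router = nn.Linear(d_model, num_experts, bias=False)

    def _make_expert(self) -> nn.Module:
        return nn.Sequential(
            nn.Linear(self.d_model, self.d_hidden),
            nn.GELU(),
            nn.Linear(self.d_hidden, self.d_model),
        )

    @torch.no_grad()
    def grow_experts(self, new_total: int) -> None:
        """Extend the pool to `new_total` experts, keeping trained weights."""
        old_total = len(self.experts)
        assert new_total >= old_total
        for _ in range(new_total - old_total):
            self.experts.append(self._make_expert())
        # Widen the router and copy the old rows so routing to existing
        # experts is undisturbed at the stage boundary. (How EMO actually
        # initializes new experts/router rows is an assumption here.)
        new_router = nn.Linear(self.d_model, new_total, bias=False).to(
            self.router.weight.device
        )
        new_router.weight[:old_total] = self.router.weight
        self.router = new_router
        # Note: the optimizer must be rebuilt (or the new parameters added
        # to it) after each growth step.

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model). Plain top-k softmax routing, no special layers.
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        # Standard Switch-style load-balancing auxiliary loss: pushes per-expert
        # token load toward uniform. This is the only auxiliary objective used.
        load = F.one_hot(idx[:, 0], len(self.experts)).float().mean(0)
        importance = probs.mean(0)
        aux_loss = len(self.experts) * (load * importance).sum()
        return out, aux_loss


if __name__ == "__main__":
    # Five stages that double the pool from 8 to 128 experts, per the claims.
    schedule = [8, 16, 32, 64, 128]
    moe = GrowableMoE(d_model=64, d_hidden=256, num_experts=schedule[0])
    x = torch.randn(32, 64)
    for n in schedule[1:]:
        out, aux = moe(x)  # ... train for this stage's budget, then grow ...
        moe.grow_experts(n)
    print(len(moe.experts))  # 128
```

Deferring growth this way keeps early stages cheap, since most training tokens are processed while the pool is small, which matches the claim that expensive large-expert configurations are delayed until later in training.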