Downstream Evaluation

Cross-source consensus on Downstream Evaluation from 1 source and 5 claims.

Benefits

The fixed E=128 baseline retained a larger advantage over EMO on GSM8K than on most other tasks. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
EMO Stage 5 was much stronger than fixed E=16 and generally stronger than fixed E=32 on downstream tasks. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
Knowledge and commonsense tasks broadly benefited from larger expert pools and progressive expansion. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
The paper leaves open whether reasoning-heavy benchmarks need earlier exposure to the full expert pool than knowledge or commonsense tasks. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
Reasoning-heavy behavior may require more training time with the full expert pool. — EMO: Frustratingly Easy Progressive Training of Extendable MoE