Downstream Evaluation
Cross-source consensus on Downstream Evaluation from 1 source and 5 claims.
Highlighted claims
- The fixed E=128 baseline retained a larger advantage on GSM8K than on most other tasks. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- EMO Stage 5 was much stronger than fixed E=16 and generally stronger than fixed E=32 on downstream tasks. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- Knowledge and commonsense tasks broadly benefited from larger expert pools and progressive expansion. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- The paper leaves open whether reasoning-heavy benchmarks need earlier exposure to the full expert pool than knowledge or commonsense tasks. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- Reasoning-heavy behavior may require more training time with the full expert pool. — EMO: Frustratingly Easy Progressive Training of Extendable MoE