Expert Utilization
Cross-source consensus on Expert Utilization from 1 source and 4 claims.
Highlighted claims
- EMO had a higher Gini coefficient for expert utilization than the fixed baseline, meaning its routed load was spread somewhat less evenly across experts. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- No EMO layer exhibited routing collapse in the utilization analysis. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- The fixed baseline (E = 128 experts) and EMO showed similar per-layer and per-expert routing distributions. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
- The paper reports slightly higher expert-utilization imbalance for EMO despite observing no collapse. — EMO: Frustratingly Easy Progressive Training of Extendable MoE
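The claims above treat two things separately: imbalance, measured by a Gini coefficient, and collapse. A higher Gini can coexist with no collapse. The sketch below is a minimal illustration of that point. It assumes the Gini is computed over per-expert token counts within a single layer. The source does not state its exact collapse criterion, so `is_collapsed` and its `max_dead_fraction` threshold are hypothetical. The function names and the toy counts are illustrative only. They are not the paper's code or data.

```python
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient of per-expert token counts in one layer.

    0.0 means perfectly uniform load; (n - 1) / n means all tokens go to one expert.
    """
    values = sorted(float(c) for c in counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on ascending-sorted values with 1-based rank i:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(values))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def is_collapsed(counts: Sequence[int], max_dead_fraction: float = 0.5) -> bool:
    """Hypothetical collapse heuristic (not from the paper).

    Flags a layer when more than `max_dead_fraction` of its experts receive zero tokens.
    """
    n = len(counts)
    if n == 0:
        return False
    dead = sum(1 for c in counts if c == 0)
    return dead / n > max_dead_fraction


if __name__ == "__main__":
    # Toy per-layer token counts for 4 experts (illustrative, not measured data).
    layer_counts = {
        "uniform": [250, 250, 250, 250],
        "skewed_but_alive": [400, 300, 200, 100],
        "collapsed": [1000, 0, 0, 0],
    }
    for name, counts in layer_counts.items():
        print(f"{name}: gini={gini(counts):.3f} collapsed={is_collapsed(counts)}")
```

In this toy case, the "skewed_but_alive" layer has a clearly nonzero Gini (0.25), yet every expert still receives tokens. This mirrors the reported pattern of slightly higher imbalance without routing collapse.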