PhaseAwareRMSNorm
Cross-source consensus on PhaseAwareRMSNorm from 1 source and 4 claims.
Highlighted claims
- PhaseAwareRMSNorm replaces global RMSNorm with independent normalization over each phase at pre-attention, pre-FFN, and final norm sites. — Three-Phase Transformer
- PhaseAwareRMSNorm does not change parameter count because its per-phase scale vectors concatenate to the model dimension. — Three-Phase Transformer
- Long-horizon TinyStories runs found PhaseAwareRMSNorm became a small late-training win. — Three-Phase Transformer
- In cumulative refinement experiments, PhaseAwareRMSNorm eventually helped despite hurting short-horizon performance in the full grid. — Three-Phase Transformer
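
The first two claims can be illustrated with a minimal pure-Python sketch. It is not the Three-Phase Transformer implementation: the function names, the choice of three equal-width contiguous phase slices, and the `eps` value are all assumptions made for illustration. It shows each phase being normalized by its own RMS rather than by one global RMS, and the per-phase scale vectors adding up to the same number of parameters as a single model-dimension scale vector.

```python
import math


def rms_norm(x, scale, eps=1e-6):
    """Plain RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * s for v, s in zip(x, scale)]


def phase_aware_rms_norm(x, phase_scales, eps=1e-6):
    """Normalize each phase slice of x independently, then concatenate.

    ASSUMPTION: phases are equal-width contiguous slices of the feature
    vector; the source does not specify the layout.
    """
    n_phases = len(phase_scales)
    if len(x) % n_phases != 0:
        raise ValueError("model dimension must be divisible by the phase count")
    width = len(x) // n_phases
    out = []
    for p, scale in enumerate(phase_scales):
        if len(scale) != width:
            raise ValueError("each phase scale must match the phase width")
        out.extend(rms_norm(x[p * width:(p + 1) * width], scale, eps))
    return out


# Toy input whose three phases differ in magnitude by factors of 10.
d_model = 6
x = [1.0, 2.0, 10.0, 20.0, 100.0, 200.0]

global_scale = [1.0] * d_model                  # one scale vector of size d_model
phase_scales = [[1.0] * 2 for _ in range(3)]    # three vectors of size d_model / 3

y_global = rms_norm(x, global_scale)
y_phase = phase_aware_rms_norm(x, phase_scales)

# Both variants carry the same number of learnable scale parameters.
n_global_params = len(global_scale)
n_phase_params = sum(len(s) for s in phase_scales)
```

In this toy input the global norm lets the largest phase dominate the shared RMS, so the small phases are squashed toward zero, while the phase-aware variant brings every phase to roughly unit RMS. Whether that property explains the late-training behavior reported above is not something this sketch can show.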