Phase-Aligned GQA
Cross-source consensus on Phase-Aligned GQA from 1 source and 4 claims.
Highlighted claims
- Phase-aligned GQA requires the number of query and key-value heads to be divisible by the number of phases. — Three-Phase Transformer
- Phase-aligned GQA is only a layout constraint; attention softmax and output projection still mix heads globally. — Three-Phase Transformer
- Cumulative refinement found phase-aligned heads helpful. — Three-Phase Transformer
- The article argues that 3PT's gain comes partly from coordinating phase-compatible head layout with other phase-aware operations. — Three-Phase Transformer
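The divisibility requirement in the first claim can be illustrated with a small layout check. This is a minimal sketch, not the Three-Phase Transformer implementation: the function name `phase_aligned_gqa_layout`, the contiguous block assignment of heads to phases, and the default of three phases are all assumptions made for illustration. It shows that when both head counts are divisible by the number of phases, the standard GQA mapping (query head `q` shares key-value head `q // group_size`) keeps every query head in the same phase as its key-value head.

```python
def phase_aligned_gqa_layout(n_q_heads, n_kv_heads, n_phases=3):
    """Return (query_head, kv_head, phase) triples for a phase-aligned layout.

    Hypothetical helper: assumes heads are assigned to phases in
    contiguous blocks, which the source does not specify.
    """
    # Standard GQA requirement: query heads split evenly over KV heads.
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    # Phase-alignment requirement from the claims above.
    if n_q_heads % n_phases != 0 or n_kv_heads % n_phases != 0:
        raise ValueError("head counts must be divisible by n_phases")

    group_size = n_q_heads // n_kv_heads    # query heads per KV head
    q_per_phase = n_q_heads // n_phases     # query heads per phase
    kv_per_phase = n_kv_heads // n_phases   # KV heads per phase

    layout = []
    for q in range(n_q_heads):
        kv = q // group_size                # usual GQA sharing rule
        q_phase = q // q_per_phase
        kv_phase = kv // kv_per_phase
        # With contiguous blocks, a query head and its KV head
        # always fall in the same phase.
        assert q_phase == kv_phase
        layout.append((q, kv, q_phase))
    return layout


if __name__ == "__main__":
    for q, kv, phase in phase_aligned_gqa_layout(12, 6, 3):
        print(f"query head {q:2d} -> kv head {kv} (phase {phase})")
```

The sketch only constrains which heads are grouped together. Consistent with the second claim, nothing here changes the attention softmax or the output projection, which would still mix information across all heads.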