Sign-Based Optimizers
Cross-source consensus on Sign-Based Optimizers from 1 source and 5 claims.
How it works
Sign-based optimizers discard gradient magnitudes and step in the direction given by the sign of a gradient-derived quantity. signSGD applies the sign of the raw gradient directly, while Lion maintains a single momentum buffer (an exponential moving average of past gradients), takes the sign of an interpolation between the current gradient and that buffer, and applies decoupled weight decay; a sketch of both update rules follows.
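A minimal sketch of the two update rules in NumPy, following the standard formulations; the hyperparameter defaults are illustrative, not values from the source.

```python
import numpy as np

def signsgd_step(w, g, lr=1e-3):
    """signSGD: step in the sign of the gradient, ignoring magnitudes."""
    return w - lr * np.sign(g)

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.1):
    """Lion (sketch): one momentum buffer m, a sign-normalized update,
    and decoupled weight decay applied directly to the weights."""
    direction = np.sign(beta1 * m + (1 - beta1) * g)  # blend momentum with the current gradient, keep only the sign
    w = w - lr * (direction + wd * w)                 # decoupled weight decay, as in AdamW
    m = beta2 * m + (1 - beta2) * g                   # EMA update of the single momentum state
    return w, m
```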
Benefits
Dropping AdamW's second-moment vector leaves Lion with a single momentum state, roughly halving optimizer memory; signSGD's one-bit-per-coordinate updates also cut distributed-communication cost and admit majority-vote aggregation, sketched below.
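A sketch of server-side majority-vote aggregation for signSGD, assuming each worker transmits only the sign of its local gradient; the function name is illustrative.

```python
import numpy as np

def majority_vote(sign_grads):
    """Aggregate one-bit worker gradients by per-coordinate majority vote.

    sign_grads: list of arrays with entries in {-1, +1}, one per worker.
    Ties tally to 0, meaning no update for that coordinate.
    """
    return np.sign(np.sum(sign_grads, axis=0))
```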
Risks & contraindications
Because sign normalization gives every coordinate the same update magnitude, the learning rate alone sets the step size, so Lion is unusually sensitive to learning-rate and weight-decay choices and generally needs retuning relative to AdamW.
Highlighted claims
- Lion maintains a momentum buffer as an exponential moving average of gradients and updates parameters with the sign of an interpolation between the current gradient and that buffer. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Sign-based methods challenge whether LLM optimizers need full gradient magnitudes. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- signSGD updates using the sign of the gradient and can support one-bit communication or majority-vote aggregation. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Lion can drop AdamW's second-moment vector, keeping a single momentum state plus decoupled weight decay (see the state-footprint sketch after this list). — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Lion's sign-normalized updates make learning-rate and weight-decay tuning especially important. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
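To make the memory claim concrete, a minimal sketch of the per-parameter optimizer state each method carries; the helper names are hypothetical, and the buffer layout follows the standard AdamW and Lion formulations.

```python
import numpy as np

def adamw_state(shape):
    """AdamW keeps two buffers per parameter: first and second moment."""
    return {"m": np.zeros(shape), "v": np.zeros(shape)}

def lion_state(shape):
    """Lion keeps one buffer per parameter, the momentum EMA, so
    optimizer memory is roughly halved relative to AdamW."""
    return {"m": np.zeros(shape)}
```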