Sign-Based Optimizers
Cross-source consensus on Sign-Based Optimizers from 1 source and 5 claims.
How it works
Sign-based optimizers discard gradient magnitudes and step in the direction given by the sign of a gradient-derived quantity. signSGD applies the sign of the raw gradient directly, while Lion maintains a single momentum buffer (an exponential moving average of past gradients), takes the sign of an interpolation between the current gradient and that buffer, and applies decoupled weight decay; a sketch of both update rules follows.
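A minimal sketch of the two update rules in NumPy, following the standard formulations; the hyperparameter defaults are illustrative, not values from the source.

```python
import numpy as np

def signsgd_step(w, g, lr=1e-3):
    """signSGD: step in the sign of the gradient, ignoring magnitudes."""
    return w - lr * np.sign(g)

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.1):
    """Lion (sketch): one momentum buffer m, a sign-normalized update,
    and decoupled weight decay applied directly to the weights."""
    direction = np.sign(beta1 * m + (1 - beta1) * g)  # blend momentum with the current gradient, keep only the sign
    w = w - lr * (direction + wd * w)                 # decoupled weight decay, as in AdamW
    m = beta2 * m + (1 - beta2) * g                   # EMA update of the single momentum state
    return w, m
```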
Benefits
Dropping AdamW's second-moment vector leaves Lion with a single momentum state, roughly halving optimizer memory; signSGD's one-bit-per-coordinate updates also cut distributed-communication cost and admit majority-vote aggregation, sketched below.
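A sketch of server-side majority-vote aggregation for signSGD, assuming each worker transmits only the sign of its local gradient; the function name is illustrative.

```python
import numpy as np

def majority_vote(sign_grads):
    """Aggregate one-bit worker gradients by per-coordinate majority vote.

    sign_grads: list of arrays with entries in {-1, +1}, one per worker.
    Ties tally to 0, meaning no update for that coordinate.
    """
    return np.sign(np.sum(sign_grads, axis=0))
```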
Risks & contraindications
Because sign normalization gives every coordinate the same update magnitude, the learning rate alone sets the step size, so Lion is unusually sensitive to learning-rate and weight-decay choices and generally needs retuning relative to AdamW.
Highlighted claims
- Lion maintains a momentum buffer as an exponential moving average of gradients and updates parameters with the sign of an interpolation between the current gradient and that buffer. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Sign-based methods challenge whether LLM optimizers need full gradient magnitudes. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- signSGD updates using the sign of the gradient and can support one-bit communication or majority-vote aggregation. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Lion can drop AdamW's second-moment vector, keeping a single momentum state plus decoupled weight decay (see the state-footprint sketch after this list). — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Lion's sign-normalized updates make learning-rate and weight-decay tuning especially important. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
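To make the memory claim concrete, a minimal sketch of the per-parameter optimizer state each method carries; the helper names are hypothetical, and the buffer layout follows the standard AdamW and Lion formulations.

```python
import numpy as np

def adamw_state(shape):
    """AdamW keeps two buffers per parameter: first and second moment."""
    return {"m": np.zeros(shape), "v": np.zeros(shape)}

def lion_state(shape):
    """Lion keeps one buffer per parameter, the momentum EMA, so
    optimizer memory is roughly halved relative to AdamW."""
    return {"m": np.zeros(shape)}
```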