Low-Rank Optimizers
Cross-source consensus on Low-Rank Optimizers, drawn from 1 source and 5 claims.
Highlighted claims
- GaLore is an optimizer-memory method for full-parameter learning rather than a LoRA-style parameter-efficient fine-tuning method. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Low-rank projection methods trade memory savings against approximation error and projection-refresh overhead. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- GaLore performs Adam-style updates in a reduced space and then projects updates back to full parameters (see the sketch after this list). — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Projection-based optimizers maintain optimizer states in a projected space and map updates back to the full parameter space. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Low-rank methods exploit the observation that gradients or updates for large Transformer matrices can have low effective rank. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
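To make the projected-update flow concrete, below is a minimal sketch of a GaLore-style step under simplifying assumptions: a single 2-D weight matrix, an SVD-based projection of fixed rank, and naive handling of the Adam moments across projection refreshes. The names `projected_adam_step` and `refresh_projection` are illustrative, not taken from any released GaLore implementation.

```python
import numpy as np

def refresh_projection(grad, rank):
    """Recompute the low-rank projection P from the current gradient via SVD
    (this periodic recomputation is the projection-refresh overhead)."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]                      # (m, r) orthonormal basis

def projected_adam_step(w, grad, state, rank=4, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8, refresh_every=200):
    """One Adam-style update whose moments live only in the r-dimensional projected space."""
    state["step"] += 1
    if state["P"] is None or state["step"] % refresh_every == 1:
        # Simplification: moments are kept as-is across refreshes.
        state["P"] = refresh_projection(grad, rank)
    P = state["P"]                          # (m, r)

    g_low = P.T @ grad                      # projected gradient: (r, n) instead of (m, n)
    # Optimizer states are stored in the reduced space -> optimizer-memory savings.
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_low
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_low**2
    m_hat = state["m"] / (1 - beta1 ** state["step"])
    v_hat = state["v"] / (1 - beta2 ** state["step"])
    update_low = m_hat / (np.sqrt(v_hat) + eps)

    # Map the update back to the full parameter space: all weights are trained,
    # which is what distinguishes this from LoRA-style parameter-efficient tuning.
    w -= lr * (P @ update_low)
    return w

# Usage: full-parameter training of a single matrix with low-rank optimizer states.
m, n, r = 64, 32, 4
w = np.random.randn(m, n) * 0.02
state = {"step": 0, "P": None, "m": np.zeros((r, n)), "v": np.zeros((r, n))}
for _ in range(5):
    grad = np.random.randn(m, n)            # stand-in for a real backprop gradient
    w = projected_adam_step(w, grad, state, rank=r)
```

The sketch also shows where the trade-offs named above enter: the rank `r` controls how much optimizer memory is saved versus how much of the gradient is approximated away, and `refresh_every` controls how often the projection is recomputed.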