Low-Rank Optimizers
Cross-source consensus on Low-Rank Optimizers, drawn from 1 source and 5 claims.
Highlighted claims
- GaLore is an optimizer-memory method for full-parameter learning rather than a LoRA-style parameter-efficient fine-tuning method. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Low-rank projection methods trade memory savings against approximation error and projection-refresh overhead. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- GaLore performs Adam-style updates in a reduced space and then projects updates back to full parameters (see the sketch after this list). — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Projection-based optimizers maintain optimizer states in a projected space and map updates back to the full parameter space. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Low-rank methods exploit the observation that gradients or updates for large Transformer matrices can have low effective rank. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
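To make the projected-update flow concrete, below is a minimal sketch of a GaLore-style step under simplifying assumptions: a single 2-D weight matrix, an SVD-based projection of fixed rank, and naive handling of the Adam moments across projection refreshes. The names `projected_adam_step` and `refresh_projection` are illustrative, not taken from any released GaLore implementation.

```python
import numpy as np

def refresh_projection(grad, rank):
    """Recompute the low-rank projection P from the current gradient via SVD
    (this periodic recomputation is the projection-refresh overhead)."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]                      # (m, r) orthonormal basis

def projected_adam_step(w, grad, state, rank=4, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8, refresh_every=200):
    """One Adam-style update whose moments live only in the r-dimensional projected space."""
    state["step"] += 1
    if state["P"] is None or state["step"] % refresh_every == 1:
        # Simplification: moments are kept as-is across refreshes.
        state["P"] = refresh_projection(grad, rank)
    P = state["P"]                          # (m, r)

    g_low = P.T @ grad                      # projected gradient: (r, n) instead of (m, n)
    # Optimizer states are stored in the reduced space -> optimizer-memory savings.
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_low
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_low**2
    m_hat = state["m"] / (1 - beta1 ** state["step"])
    v_hat = state["v"] / (1 - beta2 ** state["step"])
    update_low = m_hat / (np.sqrt(v_hat) + eps)

    # Map the update back to the full parameter space: all weights are trained,
    # which is what distinguishes this from LoRA-style parameter-efficient tuning.
    w -= lr * (P @ update_low)
    return w

# Usage: full-parameter training of a single matrix with low-rank optimizer states.
m, n, r = 64, 32, 4
w = np.random.randn(m, n) * 0.02
state = {"step": 0, "P": None, "m": np.zeros((r, n)), "v": np.zeros((r, n))}
for _ in range(5):
    grad = np.random.randn(m, n)            # stand-in for a real backprop gradient
    w = projected_adam_step(w, grad, state, rank=r)
```

The sketch also shows where the trade-offs named above enter: the rank `r` controls how much optimizer memory is saved versus how much of the gradient is approximated away, and `refresh_every` controls how often the projection is recomputed.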