Curvature-Aware Optimizers
Cross-source consensus on Curvature-Aware Optimizers from 1 source and 5 claims.
Highlighted claims
- Curvature methods must be evaluated on both token efficiency and wall-clock efficiency because their steps can be more expensive. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Curvature-aware optimizers attempt to improve update geometry by approximating second-order information with practical structure. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Shampoo uses matrix statistics and inverse matrix roots to provide richer preconditioning than diagonal adaptivity (see the first sketch after this list). — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Curvature-aware optimizers add computation and hyperparameters through inverse roots, Hessian estimates, damping, clipping, and schedules. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Sophia uses lightweight diagonal Hessian-like curvature estimates for language-model pretraining (see the second sketch after this list). — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
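To make the Shampoo claim concrete, here is a minimal single-matrix sketch, assuming a 2-D parameter, exponential moving averages for the left and right gradient statistics, and eigendecomposition-based inverse fourth roots. Function and variable names are illustrative; this is not the reference implementation, which adds blocking, grafting, and staleness controls.

```python
import numpy as np

def shampoo_step(G, L, R, lr=1e-3, beta=0.99, eps=1e-6):
    """One simplified Shampoo-style step for a 2-D parameter gradient G.

    L and R accumulate row- and column-space gradient statistics; the
    gradient is preconditioned by their inverse fourth roots, which is
    strictly richer than the diagonal scaling used by AdamW.
    """
    L = beta * L + (1 - beta) * (G @ G.T)  # left (row-space) statistics
    R = beta * R + (1 - beta) * (G.T @ G)  # right (column-space) statistics

    def inv_root(M, p):
        # Inverse p-th matrix root via eigendecomposition, damped by eps.
        w, V = np.linalg.eigh(M + eps * np.eye(M.shape[0]))
        return (V * w ** (-1.0 / p)) @ V.T

    update = inv_root(L, 4) @ G @ inv_root(R, 4)
    return -lr * update, L, R
```

For a parameter of shape (m, n), L is (m, m) and R is (n, n), and each step pays for two eigendecompositions: this is the extra per-step cost and memory that the token-efficiency vs. wall-clock claim above refers to.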
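Likewise, a minimal sketch of a Sophia-style update, assuming a gradient EMA m and a diagonal curvature EMA h that is refreshed elsewhere (e.g., from a Hessian estimator every few steps). The names gamma and the clipping range follow common presentations of the method and are assumptions here, not the paper's exact implementation.

```python
import numpy as np

def sophia_step(theta, grad, m, h, lr=1e-4, beta1=0.96, gamma=0.01, eps=1e-12):
    """One simplified Sophia-style step with a diagonal curvature estimate h.

    m is an EMA of gradients; h is an EMA of a diagonal Hessian-like
    estimate, assumed to be refreshed periodically by a separate estimator.
    Element-wise clipping bounds the step where curvature is small or noisy.
    """
    m = beta1 * m + (1 - beta1) * grad
    # Precondition by the damped diagonal curvature, then clip each
    # coordinate to [-1, 1] so low-curvature directions cannot blow up.
    step = np.clip(m / np.maximum(gamma * h, eps), -1.0, 1.0)
    theta = theta - lr * step
    return theta, m
```

The damping floor eps, the scale gamma, the clip, and the curvature-refresh schedule are exactly the kinds of added hyperparameters the cost-and-complexity claim above describes.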