Large-Batch Optimizers
Cross-source consensus on Large-Batch Optimizers from 1 source and 5 claims.
Highlighted claims
- LAMB combines Adam-style adaptivity with layer-wise trust ratios for very large-batch training (see the update sketch after this list). — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Increasing global batch size can improve accelerator utilization but eventually gives diminishing optimization returns. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Large-batch optimizers address the connection between optimization behavior and hardware throughput. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- Large-batch methods are most relevant when training is communication-limited or throughput improves meaningfully with larger batches. — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
- A larger batch can increase tokens per second while reducing token efficiency (a worked numerical example follows the list). — Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers
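
The LAMB claim above is concrete enough to sketch. Below is a minimal NumPy illustration of a LAMB-style update, assuming the published formulation (Adam-style bias-corrected moments, decoupled weight decay, and a per-layer trust ratio); the function name, hyperparameter defaults, and dict-of-arrays layout are illustrative and not taken from the cited source.

```python
import numpy as np

def lamb_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """Apply one LAMB-style update, in place, to a dict of per-layer arrays."""
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    for name, w in params.items():
        g = grads[name]
        m = state.setdefault("m_" + name, np.zeros_like(w))
        v = state.setdefault("v_" + name, np.zeros_like(w))

        # Adam-style first/second moment estimates with bias correction.
        m[:] = beta1 * m + (1.0 - beta1) * g
        v[:] = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)

        # Adaptive direction plus decoupled weight decay.
        update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w

        # Layer-wise trust ratio: rescale the step by ||w|| / ||update||,
        # so each layer's step size is matched to the scale of its weights.
        w_norm = np.linalg.norm(w)
        u_norm = np.linalg.norm(update)
        trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

        w -= lr * trust_ratio * update

# Tiny usage example with random data.
params = {"dense": np.random.randn(4, 4)}
grads = {"dense": np.random.randn(4, 4)}
state = {}
lamb_step(params, grads, state)
```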
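
To make the throughput vs. token-efficiency trade-off concrete, here is a small numerical sketch with entirely hypothetical numbers: if a larger batch raises tokens per second by 1.6x but requires 1.3x more tokens to reach the same target loss, wall-clock time only improves by about 1.23x.

```python
# All numbers below are hypothetical, chosen only to illustrate the trade-off.
tokens_per_sec_small = 1.0e6     # assumed throughput at the smaller batch size
tokens_per_sec_large = 1.6e6     # assumed throughput at the larger batch size
tokens_to_target_small = 1.0e12  # assumed tokens needed to reach a target loss
tokens_to_target_large = 1.3e12  # larger batch is less token-efficient

wall_clock_small = tokens_to_target_small / tokens_per_sec_small
wall_clock_large = tokens_to_target_large / tokens_per_sec_large

# Throughput improves 1.6x, token efficiency drops to ~0.77x,
# so the net wall-clock speedup is only ~1.23x.
print(wall_clock_small / wall_clock_large)
```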