Zeroth-Order Optimization
Cross-source consensus on Zeroth-Order Optimization from 1 source and 6 claims.
Highlighted claims
- Zeroth-order optimization avoids backpropagation by estimating gradients from forward-pass finite differences. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- Dense Gaussian perturbations force standard ZO methods to touch every parameter at each step. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- Standard zeroth-order optimization is slow and noisy for billion-parameter models. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- The basic symmetric ZO estimator evaluates the loss at positively and negatively perturbed parameters and scales the perturbation vector by the resulting finite difference. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- On OPT-6.7B, perturbation and update work account for nearly half of total MeZO step time. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- Full-parameter ZO perturbation is not necessarily more accurate because variance scales with dimension. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
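The symmetric estimator in the claims above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name and quadratic test loss are invented for the example. It draws a dense Gaussian direction `z` (touching every parameter, per the second claim), evaluates the loss at `theta + eps*z` and `theta - eps*z` with two forward passes and no backpropagation, and scales `z` by the finite difference:

```python
import numpy as np

def zo_gradient_estimate(loss_fn, theta, eps=1e-3, rng=None):
    """Symmetric two-point zeroth-order gradient estimate (illustrative sketch).

    Two loss evaluations at symmetrically perturbed parameters; the finite
    difference scales the dense Gaussian perturbation vector z.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)            # dense Gaussian direction
    diff = loss_fn(theta + eps * z) - loss_fn(theta - eps * z)
    return (diff / (2.0 * eps)) * z                 # estimator: [(L+ - L-) / 2eps] * z

# Toy usage on a quadratic loss L(t) = ||t||^2, whose true gradient is 2*theta.
theta = np.array([1.0, -2.0, 0.5])
loss = lambda t: float(np.sum(t ** 2))
g_hat = zo_gradient_estimate(loss, theta)
```

A single estimate is unbiased but noisy; averaging many draws recovers the true gradient, while the per-step variance grows with parameter dimension, which is the point of the last claim about full-parameter perturbation.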