AdaLeZO
Cross-source consensus on AdaLeZO from 1 source and 6 claims.
Highlighted claims
- AdaLeZO adaptively selects layers for zeroth-order perturbations by treating layers as arms in a non-stationary multi-armed bandit problem. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- AdaLeZO samples only a subset of layers at each step and generates Gaussian perturbations only for active layers. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- AdaLeZO concentrates a limited perturbation budget on layers estimated to be sensitive. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- AdaLeZO keeps peak memory at the same level as MeZO while improving throughput on OPT-6.7B. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- AdaLeZO reduces perturbation and update work from the dense, full-parameter cost to a cost approximately proportional to the sampling ratio. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- AdaLeZO can wrap several other zeroth-order optimizers because it changes spatial allocation rather than the underlying optimizer family. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
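
The claims above describe a loop: score the layers, sample a subset, perturb only that subset, then update. The following is a minimal NumPy sketch of that loop, not the paper's implementation. Everything named here (`sample_layers`, `zo_step`, `scores`, `temperature`, `decay`) is hypothetical. The softmax over exponentially decayed scores is an assumed stand-in for the non-stationary bandit, and the reward (the magnitude of the projected gradient) is a placeholder, since the claims do not specify either. For brevity the noise is held in a dictionary, whereas MeZO-style optimizers typically regenerate it from a random seed to save memory.

```python
import numpy as np

def sample_layers(scores, k, rng, temperature=1.0):
    """Pick k distinct layers, favouring layers with higher scores (assumed rule)."""
    logits = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), size=k, replace=False, p=probs)

def zo_step(params, loss_fn, scores, k, rng, eps=1e-3, lr=1e-2, decay=0.9):
    """One layer-sampled zeroth-order step; params (list of arrays) is updated in place."""
    active = sample_layers(scores, k, rng)
    # Gaussian perturbations are generated only for the active layers.
    noise = {int(i): rng.standard_normal(params[int(i)].shape) for i in active}

    for i, z in noise.items():
        params[i] += eps * z            # theta + eps * z
    loss_plus = loss_fn(params)
    for i, z in noise.items():
        params[i] -= 2 * eps * z        # theta - eps * z
    loss_minus = loss_fn(params)
    for i, z in noise.items():
        params[i] += eps * z            # restore theta

    # Two-point estimate of the directional derivative along z.
    g = (loss_plus - loss_minus) / (2 * eps)

    for i, z in noise.items():
        params[i] -= lr * g * z         # SGD-style update on active layers only
        # Placeholder reward: decayed average of |g| for layers that were active.
        scores[i] = decay * scores[i] + (1 - decay) * abs(g)
    return active, g
```

With `k` layers active out of `L`, the perturbation and update loops touch only those `k` layers, which is the sense in which the work scales with the sampling ratio `k / L`. A call might look like `zo_step(params, loss_fn, scores=np.zeros(len(params)), k=2, rng=np.random.default_rng(0))`, where `loss_fn` takes the list of layer arrays and returns a scalar. Because the sketch only decides which layers are perturbed, the SGD-style update could in principle be swapped for another zeroth-order update rule.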