AdaLeZO

Cross-source consensus on AdaLeZO from 1 source and 6 claims.


How it works

AdaLeZO adaptively selects layers for zeroth-order perturbations by treating layers as arms in a non-stationary multi-armed bandit problem. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
AdaLeZO samples only a subset of layers at each step and generates Gaussian perturbations only for active layers. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
AdaLeZO concentrates a limited perturbation budget on layers estimated to be sensitive. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
AdaLeZO keeps peak memory unchanged relative to MeZO while improving throughput on OPT-6.7B. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
AdaLeZO reduces perturbation and update work from a cost that scales with the full (dense) parameter count to one approximately proportional to the sampling ratio. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
AdaLeZO can wrap several other zeroth-order optimizers because it changes spatial allocation rather than the underlying optimizer family. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
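
Example sketch

The claims above describe layer-wise sampling in words. The following is a minimal NumPy sketch of the general idea, not the method from the source: it samples a subset of layers from per-layer scores, draws Gaussian perturbations only for those layers, applies a two-point zeroth-order update to them, and refreshes their scores with an exponentially discounted reward so that older evidence fades (a simple way to handle non-stationarity). The function names, the softmax sampling rule, the use of the projected-gradient magnitude as the reward, and all hyperparameters (k, eps, lr, decay) are assumptions made for illustration. The source's actual bandit rule and any importance weighting of the update are not reproduced here.

```python
import numpy as np

def sample_active_layers(scores, k, rng, temperature=1.0):
    """Sample k distinct layer indices from a softmax over scores (assumed rule)."""
    z = (scores - scores.max()) / temperature  # shift for numerical stability
    probs = np.exp(z)
    probs /= probs.sum()
    return rng.choice(len(scores), size=k, replace=False, p=probs)

def sparse_zo_step(params, loss_fn, scores, rng, k=2, eps=1e-3, lr=1e-2, decay=0.9):
    """One two-point zeroth-order step that perturbs only k sampled layers.

    params : list of float arrays, one per layer (modified in place)
    scores : float array of per-layer sensitivity estimates (modified in place)
    """
    active = [int(i) for i in sample_active_layers(scores, k, rng)]

    # Gaussian noise is generated only for the active layers.
    # (Stored here for simplicity; a memory-lean version would regenerate it from a seed.)
    noise = {i: rng.standard_normal(params[i].shape) for i in active}

    for i in active:
        params[i] += eps * noise[i]
    loss_plus = loss_fn(params)

    for i in active:
        params[i] -= 2.0 * eps * noise[i]
    loss_minus = loss_fn(params)

    for i in active:
        params[i] += eps * noise[i]  # restore the unperturbed weights

    # Scalar projected-gradient estimate along the sampled direction.
    g = (loss_plus - loss_minus) / (2.0 * eps)

    for i in active:
        params[i] -= lr * g * noise[i]  # update touches active layers only
        # Discounted reward: a crude, assumed sensitivity signal shared by active layers.
        scores[i] = decay * scores[i] + (1.0 - decay) * abs(g)

    return active, g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [np.ones(3) for _ in range(4)]   # toy "model" with 4 layers
    layer_scores = np.zeros(4)                # uniform sampling at the start
    toy_loss = lambda ps: sum(float((p ** 2).sum()) for p in ps)
    for _ in range(5):
        chosen, g_hat = sparse_zo_step(layers, toy_loss, layer_scores, rng, k=2)
        print(chosen, round(g_hat, 4))
```

In this sketch the noise generation and the update loops run over k layers rather than all of them, which is the sense in which per-step perturbation and update work scales with the sampling ratio. The two forward passes through loss_fn still cover the whole model.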