Layer Sensitivity
Cross-source consensus on Layer Sensitivity from 1 source and 6 claims.
Highlighted claims
- AdaLeZO tracks selected-layer rewards using an exponential moving average. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- AdaLeZO uses the absolute scalar finite-difference magnitude as a proxy for layer sensitivity. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- AdaLeZO learns layer sensitivity over time from noisy scalar feedback. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- Accumulated AdaLeZO sampling probabilities increasingly align with Adam gradient norms over training. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- The raw ZO magnitude proxy has positive empirical correlation with Adam gradient norms. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
- The multi-armed bandit acts as a temporal denoiser by aggregating noisy rewards into layer value estimates. — Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
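The claims above describe a loop: sample a layer from bandit-style probabilities, take a zeroth-order finite-difference step on it, and feed the absolute scalar difference back into an exponential moving average as that layer's reward. The following is a minimal sketch of that idea, not the paper's implementation: the quadratic `loss`, the softmax policy, and all hyperparameter names (`mu`, `beta`, `tau`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in objective: sum of squared parameters across "layers".
def loss(layers):
    return sum(float(np.sum(w ** 2)) for w in layers)

def loss_with_offset(layers, i, delta):
    # Evaluate loss with layer i temporarily shifted by delta.
    original = layers[i]
    layers[i] = original + delta
    val = loss(layers)
    layers[i] = original
    return val

def adaptive_layer_zo_step(layers, ema, lr=0.01, mu=1e-3, beta=0.9, tau=1.0):
    """One ZO step on a single sampled layer (AdaLeZO-style sketch).

    ema  : per-layer EMA of |finite-difference| rewards (bandit value estimates)
    mu   : perturbation scale, beta : EMA decay, tau : softmax temperature
    """
    # Sampling probabilities from the EMA rewards via softmax, standing in
    # for the paper's bandit policy (assumed form, not the actual method).
    scores = np.exp(np.asarray(ema) / tau)
    probs = scores / scores.sum()
    i = rng.choice(len(layers), p=probs)

    # Two-point finite difference along a random direction for layer i.
    u = rng.standard_normal(layers[i].shape)
    g_scalar = (loss_with_offset(layers, i, mu * u)
                - loss_with_offset(layers, i, -mu * u)) / (2 * mu)

    # Reward: absolute scalar finite-difference magnitude (the sensitivity
    # proxy), aggregated by an EMA that denoises the scalar feedback.
    ema[i] = beta * ema[i] + (1 - beta) * abs(g_scalar)

    # ZO gradient estimate and update for the sampled layer only.
    layers[i] = layers[i] - lr * g_scalar * u
    return layers, ema

layers = [rng.standard_normal(8) for _ in range(4)]
ema = [0.0] * len(layers)
initial_loss = loss(layers)
for _ in range(500):
    layers, ema = adaptive_layer_zo_step(layers, ema)
# After training, the loss should have decreased and layers with larger
# accumulated rewards are sampled more often.
```

Over many steps the EMA concentrates sampling probability on layers whose scalar finite differences are consistently large, which is the sense in which the bandit "learns" layer sensitivity from noisy scalar feedback.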