Inverse Preconditioning
Cross-source consensus on Inverse Preconditioning from 1 source and 5 claims.
Highlighted claims
- The practical LLQR relaxation replaces exact layer updates with a preconditioned gradient using a block-diagonal inverse preconditioner. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
- LLQR learns the inverse preconditioner under the LQR objective instead of inverting a pre-structured curvature matrix. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
- The overhead of LLQR is mainly in periodic refitting of the inverse preconditioner rather than in applying it. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
- LLQR periodically refits the learned inverse preconditioner and passes the preconditioned gradient to SGDM or AdamW. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
- The relaxed LLQR update can differ from exact LQR because the learned inverse preconditioner is constrained to the directions spanned by the current gradients. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
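To make the claims above concrete, here is a minimal PyTorch sketch of a learned block-diagonal inverse preconditioner in their spirit. It is not the paper's LLQR fit: as a stand-in for learning under the LQR objective, the per-layer block below simply whitens the gradient inside the subspace spanned by the last few gradients and acts as the identity elsewhere, which also illustrates why the learned operator is limited to gradient-spanned directions and why the cost concentrates in the periodic refit rather than in the per-step application. The class name `SubspacePreconditioner` and all hyperparameters are illustrative, not from the source.

```python
import torch


class SubspacePreconditioner:
    """One block of a block-diagonal inverse preconditioner for a single layer.

    Hypothetical stand-in for the LLQR fit: instead of the paper's LQR
    objective, it whitens the gradient inside the subspace spanned by the
    last few gradients and acts as the identity everywhere else.  Refitting
    (a thin QR plus simple statistics) is the expensive part; applying the
    block is just two thin matrix-vector products.
    """

    def __init__(self, history: int = 8, eps: float = 1e-8):
        self.history = history    # how many recent gradients span the subspace
        self.eps = eps
        self.grads = []           # buffer of recent flattened gradients
        self.basis = None         # (d, k) orthonormal basis of that subspace
        self.scale = None         # (k,) per-direction rescaling factors

    def observe(self, grad: torch.Tensor) -> None:
        """Record a flattened copy of the current gradient for the next refit."""
        self.grads.append(grad.detach().reshape(-1).clone())
        self.grads = self.grads[-self.history:]

    def refit(self) -> None:
        """Periodically relearn the block from the stored gradients."""
        if len(self.grads) < 2:
            return
        G = torch.stack(self.grads, dim=1)            # (d, m) gradient matrix
        Q, _ = torch.linalg.qr(G)                     # orthonormal basis, (d, k)
        coeffs = Q.T @ G                              # gradients in that basis
        second_moment = coeffs.pow(2).mean(dim=1)     # energy per direction
        scale = 1.0 / torch.sqrt(second_moment + self.eps)
        self.basis = Q
        self.scale = scale / scale.mean()             # keep the average scale at 1

    def apply(self, grad: torch.Tensor) -> torch.Tensor:
        """Cheap per-step application: rescale inside the subspace only."""
        if self.basis is None:
            return grad
        g = grad.reshape(-1)
        coeffs = self.basis.T @ g
        g = g + self.basis @ ((self.scale - 1.0) * coeffs)
        return g.reshape(grad.shape)
```

A toy integration then mirrors the remaining claims: one block per parameter tensor gives a block-diagonal operator overall, the preconditioned gradient is written back into `param.grad`, and a stock SGDM or AdamW step consumes it unchanged. The model, data, and the `refit_every` interval below are placeholders.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
blocks = {p: SubspacePreconditioner(history=8) for p in model.parameters()}
loss_fn = nn.CrossEntropyLoss()
refit_every = 50

for step in range(200):
    x = torch.randn(16, 32)                  # stand-in minibatch
    y = torch.randint(0, 10, (16,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    for p in model.parameters():
        if p.grad is None:
            continue
        blocks[p].observe(p.grad)
        if step % refit_every == 0:
            blocks[p].refit()                # periodic refit: the main extra cost
        p.grad = blocks[p].apply(p.grad)     # cheap per-step application

    optimizer.step()                         # AdamW sees only the rewritten gradient
```

Keeping the QR factorization off the per-step path and paying for it only every `refit_every` steps is what mirrors the claim that LLQR's overhead lies mainly in refitting the inverse preconditioner rather than in applying it.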