Geometry-Aware Optimization
Cross-source consensus on Geometry-Aware Optimization from 1 source and 5 claims.
Highlighted claims
- Dense principled curvature matrices are difficult to use directly because they couple parameters across layers through the chain rule. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
- Standard gradient descent is steepest descent under the Euclidean norm, while Newton, Gauss-Newton, and natural-gradient methods use curvature or divergence-induced metrics. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
- Scalable preconditioners often make computation tractable by imposing block-diagonal or factored structure early. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
- LLQR provides a common layerwise optimal-control objective within which optimizers interpretable as steepest descent under different norms can be compared. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
- Geometry-aware methods can affect both convergence speed and the implicit bias of training trajectories. — Layerwise LQR for Geometry-Aware Optimization of Deep Networks
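The steepest-descent-under-a-norm view in the claims above can be made concrete on a toy quadratic: a metric M turns the update into `w ← w − lr · M⁻¹ ∇f(w)`, with M = I recovering plain gradient descent and M equal to the curvature recovering Newton's method. A minimal sketch (the matrix `A`, the learning rates, and all variable names are illustrative choices, not taken from the paper):

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 * w^T A w with ill-conditioned curvature A.
# Steepest descent under a metric M takes the step  w <- w - lr * M^{-1} g.
#   M = I        -> plain gradient descent (Euclidean norm)
#   M = A        -> Newton's method (curvature-induced norm)
#   M = diag(A)  -> a cheap diagonal preconditioner between the two extremes
A = np.array([[100.0, 5.0],
              [5.0,   1.0]])       # positive definite, condition number ~ 100
w0 = np.array([1.0, 1.0])

def step(w, M, lr=1.0):
    g = A @ w                       # gradient of the quadratic at w
    return w - lr * np.linalg.solve(M, g)

w_gd     = step(w0, np.eye(2), lr=0.009)  # lr limited by the largest curvature
w_newton = step(w0, A)                    # one full step lands at the optimum

print(w_gd)      # small progress, dominated by the stiff direction
print(w_newton)  # ~ [0, 0]
```

The same update rule with different M also illustrates the implicit-bias claim: even when all these methods converge, they trace different trajectories through parameter space.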
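The block-diagonal structure mentioned above can likewise be sketched: a dense curvature matrix couples parameters across layers through its off-diagonal blocks, while a block-diagonal approximation solves one small system per layer. This is an illustrative sketch of the general idea, not the paper's LLQR construction; the function name, sizes, and damping value are assumptions:

```python
import numpy as np

# A dense curvature matrix couples parameters across layers via its
# off-diagonal blocks. A block-diagonal (layerwise) approximation discards
# that coupling so each layer's step is a small independent solve.
rng = np.random.default_rng(0)

def block_diag_precondition(grads, full_H, sizes, damping=1e-3):
    """Per layer, solve (H_block + damping*I) @ step = grad."""
    steps, i = [], 0
    for n, g in zip(sizes, grads):
        Hb = full_H[i:i + n, i:i + n] + damping * np.eye(n)  # within-layer block
        steps.append(np.linalg.solve(Hb, g))
        i += n
    return steps

sizes = [3, 2]                       # two "layers" with 3 and 2 parameters
J = rng.standard_normal((8, 5))
H = J.T @ J                          # Gauss-Newton-style curvature; its
                                     # off-diagonal blocks couple the layers
g = rng.standard_normal(5)
grads = [g[:3], g[3:]]

steps = block_diag_precondition(grads, H, sizes)
# Cost: two small solves (3x3 and 2x2) instead of one dense 5x5 system.
```

For deep networks the payoff is that per-layer blocks stay small even when the total parameter count is large, which is why scalable preconditioners impose this structure early.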