NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning
From MaRDI portal
Publication:6370245
arXiv: 2106.07454 · MaRDI QID: Q6370245
Author name not available
Publication date: 14 June 2021
Abstract: In this paper, a novel second-order method called NG+ is proposed. Following the rule "the shape of the gradient equals the shape of the parameter", we define a generalized Fisher information matrix (GFIM) using products of gradients in matrix form rather than the traditional vectorization. The generalized natural gradient direction is then simply the inverse of the GFIM multiplied by the gradient in matrix form. Moreover, the GFIM and its inverse remain unchanged for multiple steps, so the computational cost can be controlled and is comparable with that of first-order methods. Global convergence is established under mild conditions, and a regret bound is also given for the online learning setting. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, neural machine translation with Transformer, and recommendation systems with DLRM illustrate that NG+ is competitive with state-of-the-art methods.
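Based only on the abstract, the update can be sketched as follows: for a weight matrix, a GFIM is formed from matrix-form gradient products, and its inverse is reused for several steps before being refreshed. This is a minimal illustrative sketch, not the authors' implementation; the function names, the `G @ G.T` form of the gradient product, and the damping term are assumptions.

```python
import numpy as np

def make_gfim_inverse(grads, damping=1e-3):
    """Approximate a GFIM from matrix-form gradients G (each m x n):
    F = mean(G @ G.T) + damping * I, and return its inverse (m x m).
    The exact construction in NG+ may differ; this is an assumption."""
    m = grads[0].shape[0]
    F = sum(G @ G.T for G in grads) / len(grads) + damping * np.eye(m)
    return np.linalg.inv(F)

def ngplus_step(W, G, F_inv, lr=0.1):
    """Matrix-form natural-gradient step: W <- W - lr * F^{-1} @ G.
    Note F^{-1} @ G has the same shape as W, matching the rule
    "the shape of the gradient equals the shape of the parameter"."""
    return W - lr * F_inv @ G
```

In a training loop, `make_gfim_inverse` would be called only every T steps while `ngplus_step` runs every step, which is how the per-step cost stays comparable to a first-order method.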
Has companion code repository: https://github.com/yangorwell/NGPlus