Second-order step-size tuning of SGD for non-convex optimization
Publication: 6362208
arXiv: 2103.03570
MaRDI QID: Q6362208
Author name not available
Publication date: 5 March 2021
Abstract: Aiming at a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step sizes in the mini-batch case. To do so, curvature is estimated from a local quadratic model using only noisy gradient approximations. The result is a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set, and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach: for such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM.
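The abstract describes tuning the SGD step size with a Barzilai-Borwein-style curvature estimate built from successive noisy gradients. The snippet below is a minimal sketch of that general idea on a toy non-convex problem, not the authors' exact Step-Tuned SGD algorithm (see the companion repository below for that); the function names (`bb_sgd`, `noisy_grad`), the toy loss, and the clipping bounds are illustrative assumptions.

```python
# Sketch only: a Barzilai-Borwein-type step size computed from successive
# noisy gradients, used inside a plain SGD loop on a toy non-convex loss.
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(x, sigma=0.1):
    """Gradient of the (non-convex) Rosenbrock loss plus Gaussian noise,
    standing in for a mini-batch gradient estimate."""
    g = np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ])
    return g + sigma * rng.standard_normal(2)

def bb_sgd(x0, n_iter=500, gamma0=1e-3, gamma_min=1e-5, gamma_max=1e-1):
    """SGD whose step size is re-tuned from a BB-style curvature ratio
    |s^T y| / (y^T y), clipped to [gamma_min, gamma_max] for stability."""
    x = np.asarray(x0, dtype=float)
    g_prev = noisy_grad(x)
    x_prev = x.copy()
    gamma = gamma0
    x = x - gamma * g_prev
    for _ in range(n_iter):
        g = noisy_grad(x)
        s = x - x_prev            # displacement between iterates
        y = g - g_prev            # change in (noisy) gradients
        if abs(y @ y) > 1e-12:    # skip update if curvature signal is degenerate
            gamma = float(np.clip(abs(s @ y) / (y @ y), gamma_min, gamma_max))
        x_prev, g_prev = x.copy(), g
        x = x - gamma * g         # plain SGD step with the tuned step size
    return x

print(bb_sgd(np.array([-1.2, 1.0])))
```

The absolute value and the clipping interval are one common way to keep a BB-type ratio usable when the objective is non-convex and the gradients are noisy; the paper's actual update rule and safeguards are given in the publication and the repository linked below.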
Has companion code repository: https://github.com/Abdoulaye-Koroko/Second-order-step-size-tuning-of-SGD-for-non-convex-optimization