Momentum-Based Variance Reduction in Non-Convex SGD


arXiv: 1905.10018
MaRDI QID: Q6319287

Author name not available

Publication date: 23 May 2019

Abstract: Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $F$, STORM finds a point $x$ with $\mathbb{E}[\|\nabla F(x)\|] \le O(1/\sqrt{T} + \sigma^{1/3}/T^{1/3})$ in $T$ iterations with $\sigma^2$ variance in the gradients, matching the optimal rate but without requiring knowledge of $\sigma$.
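
For orientation, below is a minimal sketch of the estimator the abstract describes: each step reuses a single stochastic sample to evaluate the gradient at both the current and the previous iterate, and blends them through a momentum-style recursion with an adaptive step size, so no mega-batches are needed. The function name `storm`, the constants `k`, `w`, `c`, and the noisy-quadratic toy objective are illustrative assumptions, not the authors' reference implementation (see the companion repository linked below for that).

```python
import numpy as np

def storm(grad_fn, x0, T=1000, k=0.1, w=1.0, c=10.0, seed=0):
    """Sketch of a STORM-style loop: momentum-based variance reduction
    with an adaptive learning rate. Constants k, w, c are illustrative."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    d = None            # variance-reduced gradient estimate d_t
    sum_g2 = 0.0        # running sum of squared stochastic-gradient norms
    for t in range(T):
        xi = rng.standard_normal(x.shape)      # one noise sample, reused at both points
        g = grad_fn(x, xi)                     # stochastic gradient at the current iterate
        sum_g2 += float(g @ g)
        eta = k / (w + sum_g2) ** (1.0 / 3.0)  # adaptive step size
        a = min(1.0, c * eta ** 2)             # momentum weight, clamped to [0, 1]
        if d is None:
            d = g                              # first step: plain stochastic gradient
        else:
            g_prev = grad_fn(x_prev, xi)       # same sample, previous iterate
            d = g + (1.0 - a) * (d - g_prev)   # momentum recursion with correction term
        x_prev = x.copy()
        x = x - eta * d
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2 with additive Gaussian gradient noise.
grad_fn = lambda x, xi: x + 0.1 * xi
print(storm(grad_fn, x0=np.ones(5), T=2000))
```

The correction term `d - g_prev` is what distinguishes this from plain momentum: when the iterates move slowly, it cancels accumulated noise in the estimate, which is how the variance reduction arises without batching.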

Has companion code repository: https://github.com/duanzhiihao/PyTorch_OLoptim