MoMo: Momentum Models for Adaptive Learning Rates


arXiv: 2305.07583
MaRDI QID: Q6436405

Author name not available.

Publication date: 12 May 2023

Abstract: We present new adaptive learning rates that can be used with any momentum method. To showcase them, we develop MoMo and MoMo-Adam, which combine SGD with momentum (SGDM) and Adam, respectively, with our new adaptive learning rates. Our MoMo methods are motivated by model-based stochastic optimization: we use momentum estimates of the batch losses and gradients sampled at each iteration to build a model of the loss function. The model also exploits any known lower bound of the loss function through truncation; indeed, most losses are bounded below by zero. We then approximately minimize this model at each iteration to compute the next step. For losses with an unknown lower bound, we develop new on-the-fly estimates of the lower bound for use in the model. Numerical experiments show that our MoMo methods improve over SGDM and Adam in accuracy and robustness to hyperparameter tuning when training image classifiers on MNIST, CIFAR10, CIFAR100, and Imagenet32, a DLRM on the Criteo dataset, and a transformer model on the IWSLT14 translation task.
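The abstract describes the core recipe: keep momentum (moving-average) estimates of the batch losses and gradients, build a truncated linear model of the loss from them, and step to the approximate minimizer of that model, with the step capped at a user-chosen base learning rate. The following is a minimal Python sketch of one plausible reading of that recipe for the SGDM variant; the function names, the exponential-moving-average weighting, and the hyperparameters (alpha, beta, lb) are illustrative assumptions, not the authors' implementation, which lives in the companion repository linked below.

```python
# Minimal sketch (not the authors' implementation) of a MoMo-style step for
# SGD with momentum, based only on the abstract. Hyperparameter names
# (alpha, beta, lb) and the exponential-moving-average weighting are
# assumptions; see https://github.com/fabian-sp/MoMo for the real optimizer.
import numpy as np

def momo_sgd(x0, loss_and_grad, n_steps=100, alpha=1.0, beta=0.9, lb=0.0):
    """loss_and_grad(x) returns a stochastic loss value and gradient at x;
    lb is a known lower bound on the loss (often 0)."""
    x = np.asarray(x0, dtype=float).copy()
    f, g = loss_and_grad(x)
    d = g.copy()              # momentum estimate of the gradients
    f_bar = f                 # momentum estimate of the batch losses
    gamma = np.dot(g, x)      # momentum estimate of <g_k, x_k>
    for _ in range(n_steps):
        # Truncated linear model: m(y) = max(f_bar + <d, y> - gamma, lb).
        # Minimizing m along -d, with the step capped at alpha, gives tau.
        gap = max(f_bar + np.dot(d, x) - gamma - lb, 0.0)
        tau = min(alpha, gap / (np.dot(d, d) + 1e-12))
        x = x - tau * d
        f, g = loss_and_grad(x)
        d = beta * d + (1.0 - beta) * g
        f_bar = beta * f_bar + (1.0 - beta) * f
        gamma = beta * gamma + (1.0 - beta) * np.dot(g, x)
    return x

# Toy usage: stochastic least squares, whose loss is bounded below by 0.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)

def loss_and_grad(x):
    i = rng.integers(0, 50, size=8)          # random mini-batch of rows
    r = A[i] @ x - b[i]
    return 0.5 * np.mean(r ** 2), A[i].T @ r / len(i)

x_opt = momo_sgd(np.zeros(10), loss_and_grad, n_steps=500)
```

Capping the model-based step at alpha is what keeps the rule an adaptive learning rate for a standard momentum update rather than an unconstrained Polyak-type step; when the model predicts the loss is already at its lower bound, the step size shrinks toward zero.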




Companion code repository: https://github.com/fabian-sp/MoMo









