CADA: Communication-Adaptive Distributed Adam
arXiv: 2012.15469
MaRDI QID: Q6357273
Tianyi Chen, Yuejiao Sun, Ziye Guo, Wotao Yin
Publication date: 31 December 2020
Abstract: Stochastic gradient descent (SGD) has taken the stage as the primary workhorse for large-scale machine learning and is often used with adaptive variants such as AdaGrad, Adam, and AMSGrad. This paper proposes an adaptive stochastic gradient descent method for distributed machine learning that can be viewed as the communication-adaptive counterpart of the celebrated Adam method, justifying its name CADA. The key components of CADA are a set of new rules, tailored to adaptive stochastic gradients, that can be implemented to reduce upload communication. The new algorithms adaptively reuse stale Adam gradients, thereby saving communication, while retaining convergence rates comparable to those of the original Adam. In numerical experiments, CADA achieves strong empirical performance in terms of reducing the total number of communication rounds.
Has companion code repository: https://github.com/ChrisYZZ/CADA-master
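The abstract describes the mechanism only at a high level: workers skip uploading a fresh stochastic gradient when it has changed little since their last upload, and the server reuses the stale gradient inside a standard Adam update. The snippet below is a minimal sketch of that idea, not the implementation from the companion repository; the squared-difference test with a fixed `threshold`, and all parameter names, are assumptions standing in for the paper's adaptive skipping rules.

```python
import numpy as np

def communication_adaptive_adam_round(x, m, v, grads_new, grads_stale,
                                      lr=1e-3, beta1=0.9, beta2=0.999,
                                      eps=1e-8, threshold=1e-4):
    """Hedged sketch of one communication-adaptive Adam round.

    Each worker i compares its fresh stochastic gradient grads_new[i]
    with the copy the server already holds, grads_stale[i]. If the
    change is small (below `threshold`, a stand-in for the paper's
    adaptive rules), the upload is skipped and the stale gradient is
    reused; otherwise the fresh gradient is "communicated". The
    aggregate then drives an ordinary Adam update.
    """
    uploads = 0
    agg = np.zeros_like(x)
    for g_new, g_old in zip(grads_new, grads_stale):
        if np.sum((g_new - g_old) ** 2) > threshold:
            agg += g_new            # worker uploads the fresh gradient
            uploads += 1
        else:
            agg += g_old            # server reuses the stale gradient
    agg /= len(grads_new)

    # Standard Adam moment estimates and parameter update.
    m = beta1 * m + (1 - beta1) * agg
    v = beta2 * v + (1 - beta2) * agg ** 2
    x = x - lr * m / (np.sqrt(v) + eps)
    return x, m, v, uploads
```

In this toy version the number of uploads per round is returned so the communication savings can be tracked; the actual rules in the paper adapt the skipping condition to the adaptive stochastic gradients rather than using a fixed threshold.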