Variance Reduced Local SGD with Lower Communication Complexity
Publication: 6331995
arXiv: 1912.12844 · MaRDI QID: Q6331995
Author name not available
Publication date: 30 December 2019
Abstract: To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants have been widely adopted, using multiple workers in parallel to speed up training. Among them, Local SGD has gained much attention due to its lower communication cost. Nevertheless, when the data distribution across workers is non-identical, Local SGD requires $O(T^{3/4} N^{3/4})$ communications to maintain its \emph{linear iteration speedup} property, where $T$ is the total number of iterations and $N$ is the number of workers. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to further reduce the communication complexity. By eliminating the dependency on the gradient variance among workers, we theoretically prove that VRL-SGD achieves a \emph{linear iteration speedup} with a lower communication complexity $O(T^{1/2} N^{3/2})$ even if workers access non-identical datasets. We conduct experiments on three machine learning tasks, and the experimental results demonstrate that VRL-SGD performs significantly better than Local SGD when the data across workers are highly diverse.
Has companion code repository: https://github.com/zerolxf/VRL-SGD
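The mechanism the abstract describes (several local SGD steps per communication round, plus a per-worker correction term that cancels the gradient variance among workers) can be illustrated with a small simulation. The sketch below is an assumption-laden toy in Python/NumPy, not the authors' algorithm or the reference implementation in the repository linked above: the quadratic objective, hyper-parameters, and the exact correction update are simplifications in the spirit of variance-reduced/control-variate Local SGD.

```python
# Minimal sketch (assumptions only; see the companion repository for the real code).
# Simulates N workers serially on a heterogeneous least-squares problem and compares
# plain Local SGD with a variance-reduced variant that subtracts a per-worker correction.
import numpy as np

rng = np.random.default_rng(0)
N, D, K, ROUNDS, LR = 4, 10, 20, 50, 0.05  # workers, dimension, local steps, rounds, step size

# Heterogeneous data: each worker holds a different quadratic, so local gradients disagree.
A = [rng.normal(size=(30, D)) for _ in range(N)]
b = [A[i] @ rng.normal(size=D) + 2.0 * i for i in range(N)]

def grad(i, x):
    """Full local gradient of 0.5*||A_i x - b_i||^2 / m on worker i."""
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

def global_loss(x):
    return sum(0.5 * np.mean((A[i] @ x - b[i]) ** 2) for i in range(N)) / N

def run(variance_reduced):
    x = np.zeros(D)                        # shared model after each communication round
    c = [np.zeros(D) for _ in range(N)]    # per-worker correction terms (zero-mean)
    for _ in range(ROUNDS):
        local = []
        for i in range(N):
            xi = x.copy()
            for _ in range(K):             # K local steps between communications
                g = grad(i, xi)
                if variance_reduced:
                    g = g - c[i]           # remove the estimated local bias
                xi -= LR * g
            local.append(xi)
        x_bar = np.mean(local, axis=0)     # one communication: average the models
        if variance_reduced:
            for i in range(N):             # refresh corrections from the observed drift
                c[i] += (x_bar - local[i]) / (K * LR)
        x = x_bar
    return global_loss(x)

print("Local SGD loss:     ", run(False))
print("VRL-style SGD loss: ", run(True))
```

The correction update keeps the per-worker terms zero-mean, so the model average is unchanged; only each worker's local drift is steered toward the average gradient, which is the effect the abstract attributes to eliminating the inter-worker gradient variance.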