Adaptive Gradient Coding

From MaRDI portal
Publication:6342406

arXiv: 2006.04845, MaRDI QID: Q6342406

Xiaohu Tang, Qifa Yan, Guojun Han, Hankun Cao

Publication date: 8 June 2020

Abstract: This paper focuses on mitigating the impact of stragglers in distributed learning systems. Unlike existing results, which are designed for a fixed number of stragglers, we develop a new scheme called Adaptive Gradient Coding (AGC) that flexibly tolerates a varying number of stragglers. Our scheme achieves an optimal tradeoff between computation load, straggler tolerance, and communication cost. In particular, it allows the communication cost to be minimized according to the real-time number of stragglers in practical environments. Implementations on Amazon EC2 clusters, using Python with the mpi4py package, verify the flexibility in several situations.
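For readers unfamiliar with the setting, the following is a minimal sketch of the fixed-straggler baseline that the abstract contrasts with: the standard three-worker illustration of gradient coding, in which any two of three workers suffice to recover the full gradient. It is not the AGC scheme itself. The encoding matrix `B`, the table `DECODE`, and the helper names `encode` and `decode` are assumptions chosen for illustration, and the workers are simulated in-process rather than run over mpi4py.

```python
from itertools import combinations

# Hypothetical encoding matrix: row w lists the weights worker w applies to
# the partial gradients (g1, g2, g3). Each worker computes only two of them.
B = [
    [0.5, 1.0, 0.0],   # worker 0 sends g1/2 + g2
    [0.0, 1.0, -1.0],  # worker 1 sends g2 - g3
    [0.5, 0.0, 1.0],   # worker 2 sends g1/2 + g3
]

# Decoding weights for each pair of workers that responds in time.
DECODE = {
    (0, 1): (2.0, -1.0),
    (0, 2): (1.0, 1.0),
    (1, 2): (1.0, 2.0),
}

def encode(partials):
    """Return each worker's coded message, a linear mix of partial gradients."""
    dim = len(partials[0])
    return [
        [sum(B[w][k] * partials[k][d] for k in range(3)) for d in range(dim)]
        for w in range(3)
    ]

def decode(coded, survivors):
    """Recover g1 + g2 + g3 from the two workers that were not stragglers."""
    pair = tuple(sorted(survivors))
    weights = DECODE[pair]
    dim = len(coded[0])
    return [
        sum(a * coded[w][d] for a, w in zip(weights, pair))
        for d in range(dim)
    ]

# Toy partial gradients (made-up numbers, two coordinates each).
partials = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
full = [sum(p[d] for p in partials) for d in range(2)]

coded = encode(partials)
# Try every possible single straggler: the remaining pair always decodes.
results = {s: decode(coded, s) for s in combinations(range(3), 2)}
```

In this sketch the tolerance (one straggler) and the message size are fixed in advance; the abstract's point is that AGC instead adapts the communication cost to the number of stragglers actually observed.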











