Distributed Machine Learning for Computational Engineering using MPI

From MaRDI portal
Publication: Q6352860

arXiv: 2011.01349

Eric Darve, Kailai Xu, Weiqiang Zhu

Publication date: 2 November 2020

Abstract: We propose a framework for training neural networks that are coupled with partial differential equations (PDEs) in a parallel computing environment. Unlike most distributed computing frameworks for deep neural networks, our focus is to parallelize both numerical solvers and deep neural networks in forward and adjoint computations. Our parallel computing model views data communication as a node in the computational graph for numerical simulations. The advantage of our model is that data communication and computing are cleanly separated, which provides better flexibility, modularity, and testability. We demonstrate on various large-scale problems that we can achieve substantial acceleration by using parallel solvers for PDEs when training deep neural networks that are coupled with PDEs.
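The idea of treating communication as a graph node can be illustrated with a toy sketch. The snippet below is not from the ADCME.jl codebase: it simulates two "ranks" inside one process, and the class and function names (`Exchange`, `SumAll`, `run`, `grad`) are hypothetical. The point it demonstrates is that once a communication step (a permutation or a reduction) is a node in the graph, its adjoint is again a communication step, so reverse-mode differentiation traverses communication the same way it traverses computation. A real implementation would back these methods with MPI send/receive and allreduce calls.

```python
# Hypothetical sketch: data communication as a differentiable node in a
# computational graph. Two "ranks" are simulated in one process.

class Square:
    """Local compute node: elementwise square on each rank's data."""
    def forward(self, x):
        self.x = list(x)          # cache inputs for the adjoint pass
        return [v * v for v in x]
    def adjoint(self, g):
        # d(x^2)/dx = 2x, applied locally on each rank
        return [2 * v * gi for v, gi in zip(self.x, g)]

class Exchange:
    """Comm node: the two ranks swap their local values (a permutation).
    The adjoint of a permutation is its inverse, so the same swap also
    implements the reverse-mode communication."""
    def forward(self, x):
        return [x[1], x[0]]
    def adjoint(self, g):
        return [g[1], g[0]]

class SumAll:
    """Reduction node (allreduce-style): collapse rank values to a sum."""
    def forward(self, x):
        self.n = len(x)
        return sum(x)
    def adjoint(self, g):
        # adjoint of a sum broadcasts the incoming gradient to every rank
        return [g] * self.n

def run(graph, x):
    """Forward pass: apply each node in order."""
    for node in graph:
        x = node.forward(x)
    return x

def grad(graph, g):
    """Adjoint pass: apply each node's adjoint in reverse order."""
    for node in reversed(graph):
        g = node.adjoint(g)
    return g

graph = [Square(), Exchange(), SumAll()]
x = [3.0, 4.0]
y = run(graph, x)     # 3^2 + 4^2 = 25.0
g = grad(graph, 1.0)  # dy/dx = 2x = [6.0, 8.0]
```

Because the communication node exposes `forward` and `adjoint` like any compute node, it can be tested in isolation, which is the separation of communication and computing the abstract refers to.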




Has companion code repository: https://github.com/kailaix/ADCME.jl

