Layer-Parallel Training of Deep Residual Neural Networks
Publication:5027015
DOI: 10.1137/19M1247620, OpenAlex: W3004424267, Wikidata: Q114978694, Scholia: Q114978694, MaRDI QID: Q5027015
Lars Ruthotto, Jacob B. Schroder, Eric C. Cyr, Stefanie Günther, Nicolas R. Gauger
Publication date: 3 February 2022
Published in: SIAM Journal on Mathematics of Data Science
Full work available at URL: https://arxiv.org/abs/1812.04352
optimal control, supervised learning, simultaneous optimization, deep learning, parallel-in-time, residual networks, layer-parallelization
Artificial neural networks and deep learning (68T07); Parallel algorithms in computer science (68W10); Optimality conditions for problems involving ordinary differential equations (49K15)
Related Items
A Unified Analysis Framework for Iterative Parallel-in-Time Algorithms ⋮ Multilevel Objective-Function-Free Optimization with an Application to Neural Networks Training ⋮ Semi-implicit back propagation ⋮ Globally Convergent Multilevel Training of Deep Residual Networks ⋮ MGIC: Multigrid-in-Channels Neural Network Architectures ⋮ Long-time integration of parametric evolution equations with physics-informed DeepONets ⋮ Connections between numerical algorithms for PDEs and neural networks ⋮ Efficient multigrid reduction-in-time for method-of-lines discretizations of linear advection ⋮ Parareal with a learned coarse model for robotic manipulation ⋮ Applications of time parallelization ⋮ A space-time parallel algorithm with adaptive mesh refinement for computational fluid dynamics ⋮ Multigrid reduction in time with Richardson extrapolation ⋮ Quantized convolutional neural networks through the lens of partial differential equations ⋮ Structure-preserving deep learning ⋮ AutoMat: automatic differentiation for generalized standard materials on GPUs
Cites Work
- Adaptive sequencing of primal, dual, and design steps in simulation based optimization
- Multigrid methods with space-time concurrency
- A proposal on machine learning via dynamical systems
- 50 Years of Time Parallel Time Integration
- One-Shot Approaches to Design Optimization
- Optimal Design with Bounded Retardation for Problems with Non-separable Adjoints
- Adaptive Multilevel Inexact SQP Methods for PDE-Constrained Optimization
- Approximate Nullspace Iterations for KKT Systems
- Computational Optimization of Systems Governed by Partial Differential Equations
- Learning Deep Architectures for AI
- Evaluating Derivatives
- Minimal Repetition Dynamic Checkpointing Algorithm for Unsteady Adjoint Calculation
- Multi-Level Adaptive Solutions to Boundary-Value Problems
- A Multigrid Tutorial, Second Edition
- Stable architectures for deep neural networks
- A Nonlinear ParaExp Algorithm
- Parallel Time Integration with Multigrid
- An Efficient Parallel-in-Time Method for Optimization with Parabolic PDEs
- A non-intrusive parallel-in-time approach for simultaneous optimization with unsteady PDEs
- Multigrid Reduction in Time for Nonlinear Parabolic Problems: A Case Study
- Two-Level Convergence Theory for Multigrid Reduction in Time (MGRIT)
- Analysis of the Parareal Time‐Parallel Time‐Integration Method
- Parallel Lagrange–Newton–Krylov–Schur Methods for PDE-Constrained Optimization. Part I: The Krylov–Schur Solver
- An introduction to the adjoint approach to design
- A non-intrusive parallel-in-time adjoint solver with the XBraid library