A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
Publication: 2167333
DOI: 10.1007/s00033-022-01716-w
OpenAlex: W3141318533
Wikidata: Q113906263
Scholia: Q113906263
MaRDI QID: Q2167333
Authors: Arnulf Jentzen, Adrian Riekert
Publication date: 25 August 2022
Published in: Zeitschrift für angewandte Mathematik und Physik (ZAMP)
Full work available at URL: https://arxiv.org/abs/2104.00277
MSC classification:
- Asymptotic approximations, asymptotic expansions (steepest descent, etc.) (41A60)
- Artificial intelligence (68T99)
- Algorithms for approximation of functions (65D15)
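For orientation, the following is a minimal, hypothetical sketch of the training problem the title refers to: plain SGD on the empirical L2 risk of a shallow ReLU network whose target is a constant function. It is not the authors' code; the width, learning rate, batch size, and step count are arbitrary assumptions made purely for illustration.

```python
# Illustrative sketch (not the paper's code): SGD training of a shallow
# ReLU network f(x) = sum_j v_j * relu(w_j * x + b_j) toward the constant
# target xi(x) = c on [0, 1]. All hyperparameters below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

c = 2.0          # constant target value xi(x) = c
width = 16       # hidden width (assumption)
lr = 0.05        # constant learning rate (assumption)
steps = 2000
batch = 32

# Random initialization of the shallow network parameters.
w = rng.normal(size=width)          # input weights
b = rng.normal(size=width)          # biases
v = rng.normal(size=width) / width  # output weights

for step in range(steps):
    x = rng.uniform(0.0, 1.0, size=batch)   # i.i.d. training inputs
    pre = np.outer(x, w) + b                # (batch, width) pre-activations
    act = np.maximum(pre, 0.0)              # ReLU
    pred = act @ v                          # network outputs
    err = pred - c                          # residuals against the constant target
    # Gradients of the empirical L2 risk 0.5 * mean(err^2); the ReLU
    # "derivative" is taken as 1 on {pre > 0} and 0 elsewhere.
    mask = (pre > 0.0).astype(float)
    grad_v = act.T @ err / batch
    grad_w = ((err[:, None] * mask * v) * x[:, None]).sum(axis=0) / batch
    grad_b = (err[:, None] * mask * v).sum(axis=0) / batch
    v -= lr * grad_v
    w -= lr * grad_w
    b -= lr * grad_b

x_test = rng.uniform(0.0, 1.0, size=256)
risk = 0.5 * np.mean((np.maximum(np.outer(x_test, w) + b, 0.0) @ v - c) ** 2)
print("final empirical risk:", risk)
```

With such (assumed) settings the empirical risk typically decays toward zero in practice; the paper itself establishes convergence rigorously under its stated assumptions rather than numerically.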
Related Items (1)
Cites Work
- Introductory lectures on convex optimization. A basic course.
- General multilevel adaptations for stochastic approximation algorithms of Robbins-Monro and Polyak-Ruppert type
- Non-convergence of stochastic gradient descent in the training of deep neural networks
- Solving the Kolmogorov PDE by means of deep learning
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
- Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
- Gradient descent optimizes over-parameterized deep ReLU networks
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates
- First-order methods almost always avoid strict saddle points
- Gradient Convergence in Gradient Methods with Errors
- Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
- Optimization Methods for Large-Scale Machine Learning
- Strong error analysis for stochastic gradient descent optimization algorithms
- Full error analysis for the training of deep neural networks
- Dying ReLU and Initialization: Theory and Numerical Examples