Plateau Phenomenon in Gradient Descent Training of ReLU Networks: Explanation, Quantification, and Avoidance
DOI: 10.1137/20M1353010
zbMath: 1487.65070
arXiv: 2007.07213
OpenAlex: W3203168519
MaRDI QID: Q5157837
Publication date: 20 October 2021
Published in: SIAM Journal on Scientific Computing
Full work available at URL: https://arxiv.org/abs/2007.07213
Mathematics Subject Classification: Nonlinear programming (90C30); Numerical optimization and variational techniques (65K10); Dynamical systems in numerical analysis (37N30)
Related Items (1)
Cites Work
- Least squares approximation by splines with free knots
- Optimal approximation of piecewise smooth functions using deep ReLU neural networks
- Gradient descent optimizes over-parameterized deep ReLU networks
- Error bounds for approximations with deep ReLU networks
- Mean field analysis of neural networks: a central limit theorem
- Approximation to Data by Splines with Free Knots
- Universal approximation bounds for superpositions of a sigmoidal function
- A mean field view of the landscape of two-layer neural networks
- Dynamics of Learning Near Singularities in Layered Networks
- Wide neural networks of any depth evolve as linear models under gradient descent
- Approximation by superpositions of a sigmoidal function