How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part I: the 1-D Case of Two Layers with Random First Layer

arXiv1911.02903MaRDI QIDQ6328725

Author name not available (Why is that?)

Publication date: 7 November 2019

Abstract: Today, various forms of neural networks are trained to perform approximation tasks in many fields. However, the estimates obtained are not fully understood on function space. Empirical results suggest that typical training algorithms favor regularized solutions. These observations motivate us to analyze properties of the neural networks found by gradient descent initialized close to zero, that is frequently employed to perform the training task. As a starting point, we consider one dimensional (shallow) ReLU neural networks in which weights are chosen randomly and only the terminal layer is trained. First, we rigorously show that for such networks ridge regularized regression corresponds in function space to regularizing the estimate's second derivative for fairly general loss functionals. For least squares regression, we show that the trained network converges to the smooth spline interpolation of the training data as the number of hidden nodes tends to infinity. Moreover, we derive a correspondence between the early stopped gradient descent and the smoothing spline regression. Our analysis might give valuable insight on the properties of the solutions obtained using gradient descent methods in general settings.

Has companion code repository: https://github.com/JakobHeiss/NN_regularization1

Mathematics Subject Classification ID

No records found.

This page was built for publication: How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part I: the 1-D Case of Two Layers with Random First Layer

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6328725)