Limitations of Lazy Training of Two-layers Neural Networks

arXiv: 1906.08899, MaRDI QID: Q6320818

Author name not available

Publication date: 20 June 2019

Abstract: We study the supervised learning problem under either of the following two models: (1) Feature vectors $x_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*(x_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors $x_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and the $y_i$'s are the corresponding class labels. We use two-layers neural networks with quadratic activations, and compare three different learning regimes: the random features (RF) regime, in which we only train the second-layer weights; the neural tangent (NT) regime, in which we train a linearization of the neural network around its initialization; and the fully trained neural network (NN) regime, in which we train all the weights in the network. We prove that, even for the simple quadratic model of point (1), there is a potentially unbounded gap between the prediction risk achieved in these three training regimes when the number of neurons is smaller than the ambient dimension. When the number of neurons is larger than the number of dimensions, the problem is significantly easier and both NT and NN learning achieve zero risk.
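The two linear regimes in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper or its companion repository. It assumes the quadratic activation $\sigma(t) = t^2$, first-layer weights `W` drawn at random and then frozen, an intercept term, and plain least squares as the fitting procedure. The sizes `d`, `N`, `n` and the target matrix `B` are arbitrary illustrative choices. The RF model fits coefficients on the features $\sigma(\langle w_i, x\rangle)$, and the NT model fits coefficients on $\sigma'(\langle w_i, x\rangle)\, x_j$. The fully trained NN regime is not linear in its parameters and is omitted here.

```python
import numpy as np

def rf_features(X, W):
    # Random features: sigma(<w_i, x>) with sigma(t) = t^2. Shape (n, N).
    return (X @ W.T) ** 2

def nt_features(X, W):
    # Neural tangent features: sigma'(<w_i, x>) * x_j with sigma'(t) = 2t.
    # Shape (n, N * d), ordered as (neuron i, coordinate j).
    Z = 2.0 * (X @ W.T)                                   # (n, N)
    return (Z[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)

def fit_least_squares(Phi_train, y_train, Phi_test, y_test):
    # Ordinary least squares with an intercept (an assumption of this sketch).
    add_bias = lambda P: np.hstack([np.ones((P.shape[0], 1)), P])
    A_train, A_test = add_bias(Phi_train), add_bias(Phi_test)
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    train_mse = np.mean((A_train @ coef - y_train) ** 2)
    test_mse = np.mean((A_test @ coef - y_test) ** 2)
    return train_mse, test_mse

rng = np.random.default_rng(0)
d, N, n = 20, 5, 4000          # fewer neurons than dimensions (N < d)

# Hypothetical quadratic target f_*(x) = x^T B x with a random symmetric B.
B = rng.standard_normal((d, d))
B = (B + B.T) / 2.0
f_star = lambda X: np.einsum("ni,ij,nj->n", X, B, X)

X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n, d))
y_train, y_test = f_star(X_train), f_star(X_test)

# Frozen random first-layer weights (scaling chosen for illustration only).
W = rng.standard_normal((N, d)) / np.sqrt(d)

rf_train, rf_test = fit_least_squares(
    rf_features(X_train, W), y_train, rf_features(X_test, W), y_test)
nt_train, nt_test = fit_least_squares(
    nt_features(X_train, W), y_train, nt_features(X_test, W), y_test)

print(f"RF  train MSE {rf_train:.3f}  test MSE {rf_test:.3f}")
print(f"NT  train MSE {nt_train:.3f}  test MSE {nt_test:.3f}")
```

With this activation each RF feature is a linear combination of NT features, since $\langle w_i, x\rangle^2 = \tfrac{1}{2}\sum_j w_{ij}\,\bigl(2\langle w_i, x\rangle x_j\bigr)$. The NT fit therefore can never have a larger training error than the RF fit. How large the gap in prediction risk is, and how both compare to the fully trained network, is what the paper analyzes.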

Has companion code repository: https://github.com/bGhorbani/Lazy-Training-Neural-Nets

