Data Augmentation in the Underparameterized and Overparameterized Regimes
Publication:6391485
arXiv: 2202.09134
MaRDI QID: Q6391485
Author name not available
Publication date: 18 February 2022
Abstract: We provide results that exactly quantify how data augmentation affects the convergence rate and variance of estimates. They lead to some unexpected findings: Contrary to common intuition, data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables. The pathological behavior we identify is not a consequence of complex models, but can occur even in the simplest settings -- one of our examples is a ridge regressor with two parameters. On the other hand, our results also show that data augmentation can have real, quantifiable benefits.
Has companion code repository: https://github.com/kevhh/dataaug