Neural Stein critics with staged $L^2$-regularization
MaRDI QID: Q6404307
arXiv: 2207.03406
Author name not available
Publication date: 7 July 2022
Abstract: Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity in probability distributions, such as the Stein discrepancy, play an important role in high-dimensional statistical testing. In this paper, we investigate the role of regularization in training a neural network Stein critic so as to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Making a connection to the Neural Tangent Kernel (NTK) theory, we develop a novel staging procedure for the weight of regularization over training time, which leverages the advantages of highly-regularized training at early times. Theoretically, we prove the approximation of the training dynamic by the kernel optimization, namely the ``lazy training'', when the regularization weight is large, and that training on samples converges at a rate up to a log factor. The result guarantees learning the optimal critic assuming sufficient alignment with the leading eigen-modes of the zero-time NTK. The benefit of the staged regularization is demonstrated on simulated high-dimensional data and in an application to evaluating generative models of image data.
Has companion code repository: https://github.com/mrepasky3/staged_l2_neural_stein_critics
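The sketch below illustrates the kind of procedure the abstract describes: training a vector-valued neural Stein critic by maximizing the Stein discrepancy objective with an $L^2$ penalty whose weight is staged (large early, smaller later). It is not the authors' implementation (their repository is linked above); it assumes a standard-Gaussian nominal model $q$, so $\nabla \log q(x) = -x$, the Langevin Stein operator $T_q f = f \cdot \nabla \log q + \nabla \cdot f$, and an illustrative exponentially decaying weight. The names stein_operator and lambda_schedule are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 2  # data dimension

# Critic network f_theta: R^d -> R^d (vector-valued Stein critic).
critic = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, d))

def score_q(x):
    # Score of the nominal model q; here q = N(0, I), so grad log q(x) = -x.
    return -x

def stein_operator(f, x):
    # Langevin Stein operator: (T_q f)(x) = f(x) . grad log q(x) + div f(x).
    x = x.requires_grad_(True)
    fx = f(x)
    div = torch.zeros(x.shape[0])
    for i in range(x.shape[1]):
        grad_i = torch.autograd.grad(fx[:, i].sum(), x, create_graph=True)[0][:, i]
        div = div + grad_i
    return (fx * score_q(x)).sum(dim=1) + div

def lambda_schedule(step, n_steps, lam_hi=10.0, lam_lo=0.1):
    # Staged L2 weight: large early (near-kernel, "lazy" regime), smaller later.
    frac = step / max(n_steps - 1, 1)
    return lam_hi * (lam_lo / lam_hi) ** frac

# Samples from the unknown distribution p (here a shifted Gaussian, so p != q).
x_data = torch.randn(512, d) + 0.5

opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
n_steps = 200
for step in range(n_steps):
    lam = lambda_schedule(step, n_steps)
    Tf = stein_operator(critic, x_data)
    fx = critic(x_data)
    # Minimize  -E_p[T_q f] + (lam/2) E_p ||f||^2  (regularized Stein objective).
    loss = -Tf.mean() + 0.5 * lam * (fx ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under these assumptions, the large early weight keeps the optimization close to the kernel (lazy-training) regime discussed in the abstract, while the reduced late weight lets the critic fit finer structure of the discrepancy between $p$ and $q$.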