Neural Stein critics with staged $L^2$-regularization
MaRDI QID: Q6404307
arXiv: 2207.03406
Author name not available
Publication date: 7 July 2022
Abstract: Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity in probability distributions, such as the Stein discrepancy, play an important role in high-dimensional statistical testing. In this paper, we investigate the role of regularization in training a neural network Stein critic so as to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Making a connection to the Neural Tangent Kernel (NTK) theory, we develop a novel staging procedure for the weight of regularization over training time, which leverages the advantages of highly-regularized training at early times. Theoretically, we prove the approximation of the training dynamic by the kernel optimization, namely the ``lazy training'', when the regularization weight is large, and that training on samples converges at a rate up to a log factor. The result guarantees learning the optimal critic assuming sufficient alignment with the leading eigen-modes of the zero-time NTK. The benefit of the staged regularization is demonstrated on simulated high-dimensional data and in an application to evaluating generative models of image data.
Has companion code repository: https://github.com/mrepasky3/staged_l2_neural_stein_critics
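The sketch below illustrates the kind of procedure the abstract describes: training a vector-valued neural Stein critic by maximizing the Stein discrepancy objective with an $L^2$ penalty whose weight is staged (large early, smaller later). It is not the authors' implementation (their repository is linked above); it assumes a standard-Gaussian nominal model $q$, so $\nabla \log q(x) = -x$, the Langevin Stein operator $T_q f = f \cdot \nabla \log q + \nabla \cdot f$, and an illustrative exponentially decaying weight. The names stein_operator and lambda_schedule are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 2  # data dimension

# Critic network f_theta: R^d -> R^d (vector-valued Stein critic).
critic = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, d))

def score_q(x):
    # Score of the nominal model q; here q = N(0, I), so grad log q(x) = -x.
    return -x

def stein_operator(f, x):
    # Langevin Stein operator: (T_q f)(x) = f(x) . grad log q(x) + div f(x).
    x = x.requires_grad_(True)
    fx = f(x)
    div = torch.zeros(x.shape[0])
    for i in range(x.shape[1]):
        grad_i = torch.autograd.grad(fx[:, i].sum(), x, create_graph=True)[0][:, i]
        div = div + grad_i
    return (fx * score_q(x)).sum(dim=1) + div

def lambda_schedule(step, n_steps, lam_hi=10.0, lam_lo=0.1):
    # Staged L2 weight: large early (near-kernel, "lazy" regime), smaller later.
    frac = step / max(n_steps - 1, 1)
    return lam_hi * (lam_lo / lam_hi) ** frac

# Samples from the unknown distribution p (here a shifted Gaussian, so p != q).
x_data = torch.randn(512, d) + 0.5

opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
n_steps = 200
for step in range(n_steps):
    lam = lambda_schedule(step, n_steps)
    Tf = stein_operator(critic, x_data)
    fx = critic(x_data)
    # Minimize  -E_p[T_q f] + (lam/2) E_p ||f||^2  (regularized Stein objective).
    loss = -Tf.mean() + 0.5 * lam * (fx ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under these assumptions, the large early weight keeps the optimization close to the kernel (lazy-training) regime discussed in the abstract, while the reduced late weight lets the critic fit finer structure of the discrepancy between $p$ and $q$.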