Classification Logit Two-sample Testing by Neural Networks
From MaRDI portal
Publication: MaRDI QID Q6325933
arXiv: 1909.11298
Alexander Cloninger, Xiuyuan Cheng
Publication date: 25 September 2019
Abstract: The recent success of generative adversarial networks and variational learning suggests that training a classifier network may work well for the classical two-sample problem. Network-based tests have the computational advantage that the algorithm scales to large samples. This paper proposes a two-sample statistic which is the difference of the logit function, provided by a trained classification neural network, evaluated on the test-set split of the two datasets. Theoretically, we prove the testing power to differentiate two sub-exponential densities given that the network is sufficiently parameterized. When the two densities lie on or near low-dimensional manifolds embedded in a possibly high-dimensional space, the needed network complexity is reduced to scale only with the intrinsic dimensionality. Both the approximation and estimation error analyses are based on a new result on near-manifold integral approximation. In experiments, the proposed method demonstrates better performance than previous network-based tests that use classification accuracy as the two-sample statistic, and compares favorably to certain kernel maximum mean discrepancy tests on synthetic datasets and hand-written digit datasets.
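The abstract's core idea can be sketched in a few lines: train a binary classifier to distinguish the two samples on a training split, then compare the mean logits on the held-out split. The sketch below is an illustration only, not the authors' implementation (see the linked repository for that); it substitutes scikit-learn's `MLPClassifier` for the paper's network, and the helper name `logit_two_sample_stat` is hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def logit_two_sample_stat(X, Y, seed=0):
    """Classifier-logit two-sample statistic (illustrative sketch):
    train a small network to distinguish X from Y on one half of the
    data, then take the difference of mean logits on the held-out half."""
    rng = np.random.default_rng(seed)
    nx, ny = len(X), len(Y)
    ix, iy = rng.permutation(nx), rng.permutation(ny)
    Xtr, Xte = X[ix[: nx // 2]], X[ix[nx // 2 :]]
    Ytr, Yte = Y[iy[: ny // 2]], Y[iy[ny // 2 :]]
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=seed)
    clf.fit(np.vstack([Xtr, Ytr]),
            np.r_[np.zeros(len(Xtr)), np.ones(len(Ytr))])

    def logits(Z):
        # logit(p) = log(p / (1 - p)) from the predicted class-1 probability
        p = np.clip(clf.predict_proba(Z)[:, 1], 1e-12, 1 - 1e-12)
        return np.log(p / (1 - p))

    # statistic: difference of mean logits on the two test splits
    return logits(Yte).mean() - logits(Xte).mean()

# toy example: a mean-shifted Gaussian vs a sample from the same law
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(400, 2))
Y = rng.normal(1.0, 1.0, size=(400, 2))  # shifted: statistic should be large
Z = rng.normal(0.0, 1.0, size=(400, 2))  # same law: statistic near zero
print(logit_two_sample_stat(X, Y), logit_two_sample_stat(X, Z))
```

In practice the statistic is calibrated (e.g., by permutation or an asymptotic threshold) to obtain a test; the sketch only computes the raw statistic on the test split, which is the quantity the paper analyzes.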
Has companion code repository: https://github.com/xycheng/net_logit_test