Compress Then Test: Powerful Kernel Testing in Near-linear Time
From MaRDI portal
Publication:6423410
arXiv2301.05974MaRDI QIDQ6423410
Author name not available (Why is that?)
Publication date: 14 January 2023
Abstract: Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on sample points. However, existing kernel tests either run in time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximates an expensive test by compressing each point sample into a small but provably high-fidelity coreset. For standard kernels and subexponential distributions, CTT inherits the statistical behavior of a quadratic-time test -- recovering the same optimal detection boundary -- while running in near-linear time. We couple these advances with cheaper permutation testing, justified by new power analyses; improved time-vs.-quality guarantees for low-rank approximation; and a fast aggregation procedure for identifying especially discriminating kernels. In our experiments with real and simulated data, CTT and its extensions provide 20--200x speed-ups over state-of-the-art approximate MMD tests with no loss of power.
Has companion code repository: https://github.com/microsoft/goodpoints
This page was built for publication: Compress Then Test: Powerful Kernel Testing in Near-linear Time
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6423410)