Near-Optimal Procedures for Model Discrimination with Non-Disclosure Properties
From MaRDI portal
Publication:6355350
arXiv: 2012.02901 · MaRDI QID: Q6355350
Dmitrii M. Ostrovskii, Mohamed Ndaoud, Adel Javanmard, Meisam Razaviyayn
Publication date: 4 December 2020
Abstract: Let θ₀, θ₁ ∈ ℝ^d be the population risk minimizers associated to some loss ℓ: ℝ^d × 𝒵 → ℝ and two distributions P₀, P₁ on 𝒵. The models are unknown, and can be accessed by drawing i.i.d. samples from them. Our work is motivated by the following model discrimination question: "What sizes of the samples from P₀ and P₁ allow us to distinguish between the two hypotheses θ* = θ₀ and θ* = θ₁ for a given θ* ∈ {θ₀, θ₁}?" Making the first steps towards answering it in full generality, we first consider the case of a well-specified linear model with squared loss. Here we provide matching upper and lower bounds on the sample complexity, as given by min{1/Δ², √r/Δ} up to a constant factor; here Δ is a measure of separation between P₀ and P₁, and r is the rank of the design covariance matrix. We then extend this result in two directions: (i) for general parametric models in the asymptotic regime; (ii) for generalized linear models in small samples (n ≲ r) under weak moment assumptions. In both cases we derive sample complexity bounds of a similar form, while allowing for model misspecification. In fact, our testing procedures only access θ* via a certain functional of the empirical risk. In addition, the number of observations that allows us to reach statistical confidence does not allow us to "resolve" the two models, that is, to recover θ₀, θ₁ up to O(Δ) prediction accuracy. These two properties allow us to use our framework in applied tasks where one would like to identify a prediction model, which can be proprietary, while guaranteeing that the model cannot be actually recovered by the identifying agent.
Has companion code repository: https://github.com/ostrodmit/testing-without-recovery
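The discrimination idea described in the abstract — decide between two candidate models by comparing their empirical risks on fresh samples — can be sketched as follows for the well-specified linear model with squared loss. This is a simplified plug-in illustration, not the paper's actual near-optimal procedure (see the companion repository for that); all parameter values and function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 1000  # dimension and sample size (illustrative values)

# Two candidate parameter vectors theta0, theta1 (hypothetical, for illustration);
# delta controls the separation between the two models.
theta0 = rng.normal(size=d)
delta = 0.5 * rng.normal(size=d) / np.sqrt(d)
theta1 = theta0 + delta

def draw_samples(theta, n):
    """Draw n i.i.d. samples from a well-specified linear model y = <x, theta> + noise."""
    X = rng.normal(size=(n, d))
    y = X @ theta + rng.normal(size=n)
    return X, y

def discriminate(X, y, theta0, theta1):
    """Decide which candidate attains the smaller empirical squared-loss risk.

    Returns 0 if theta0 has the smaller empirical risk, 1 otherwise.
    """
    risk0 = np.mean((y - X @ theta0) ** 2)
    risk1 = np.mean((y - X @ theta1) ** 2)
    return 0 if risk0 <= risk1 else 1

# Samples actually drawn from the model with parameter theta1:
# the empirical-risk comparison should point to theta1.
X, y = draw_samples(theta1, n)
print(discriminate(X, y, theta0, theta1))
```

Note that this sketch only accesses the candidate parameters through their empirical risks, mirroring (in spirit) the non-disclosure property of the paper's procedures: the test statistic is a functional of the empirical risk rather than a parameter estimate.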