Near-Optimal Procedures for Model Discrimination with Non-Disclosure Properties

From MaRDI portal
Publication:6355350

arXiv: 2012.02901 | MaRDI QID: Q6355350

Dmitrii M. Ostrovskii, Mohamed Ndaoud, Adel Javanmard, Meisam Razaviyayn

Publication date: 4 December 2020

Abstract: Let $\theta_0, \theta_1 \in \mathbb{R}^d$ be the population risk minimizers associated to some loss $\ell: \mathbb{R}^d \times \mathcal{Z} \to \mathbb{R}$ and two distributions $\mathbb{P}_0, \mathbb{P}_1$ on $\mathcal{Z}$. The models $\theta_0, \theta_1$ are unknown, and $\mathbb{P}_0, \mathbb{P}_1$ can be accessed by drawing i.i.d. samples from them. Our work is motivated by the following model discrimination question: "What sizes of the samples from $\mathbb{P}_0$ and $\mathbb{P}_1$ allow one to distinguish between the two hypotheses $\theta^* = \theta_0$ and $\theta^* = \theta_1$ for given $\theta^* \in \{\theta_0, \theta_1\}$?" Making the first steps towards answering it in full generality, we first consider the case of a well-specified linear model with squared loss. Here we provide matching upper and lower bounds on the sample complexity as given by $\min\{1/\Delta^2, \sqrt{r}/\Delta\}$ up to a constant factor; here $\Delta$ is a measure of separation between $\mathbb{P}_0$ and $\mathbb{P}_1$ and $r$ is the rank of the design covariance matrix. We then extend this result in two directions: (i) for general parametric models in the asymptotic regime; (ii) for generalized linear models in small samples ($n \le r$) under weak moment assumptions. In both cases we derive sample complexity bounds of a similar form while allowing for model misspecification. In fact, our testing procedures only access $\theta^*$ via a certain functional of empirical risk. In addition, the number of observations that allows us to reach statistical confidence does not allow one to "resolve" the two models, that is, recover $\theta_0, \theta_1$ up to $O(\Delta)$ prediction accuracy. These two properties allow one to use our framework in applied tasks where one would like to \textit{identify} a prediction model, which can be proprietary, while guaranteeing that the model cannot be actually \textit{inferred} by the identifying agent.
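
Illustration (not part of the paper or its companion repository): the sample complexity rate stated in the abstract, $\min\{1/\Delta^2, \sqrt{r}/\Delta\}$, can be evaluated numerically as a minimal sketch. The function names `sample_complexity_rate` and `active_regime` are hypothetical, and the unspecified constant factor is omitted, so the values indicate scaling only, not actual sample sizes.

```python
import math


def sample_complexity_rate(delta: float, r: int) -> float:
    """Evaluate min{1/delta^2, sqrt(r)/delta}, ignoring the constant factor.

    delta: separation between the two distributions (assumed > 0).
    r: rank of the design covariance matrix (assumed >= 1).
    """
    if delta <= 0 or r < 1:
        raise ValueError("delta must be positive and r must be at least 1")
    return min(1.0 / delta**2, math.sqrt(r) / delta)


def active_regime(delta: float, r: int) -> str:
    """Report which term attains the minimum.

    1/delta^2 <= sqrt(r)/delta exactly when delta * sqrt(r) >= 1.
    """
    return "1/Delta^2" if delta * math.sqrt(r) >= 1 else "sqrt(r)/Delta"


if __name__ == "__main__":
    for delta, r in [(0.5, 100), (0.01, 4)]:
        rate = sample_complexity_rate(delta, r)
        print(f"Delta={delta}, r={r}: rate ~ {rate:.1f} ({active_regime(delta, r)})")
```

The two terms coincide at $\Delta = 1/\sqrt{r}$: for larger separations the $1/\Delta^2$ term is the smaller one, and for smaller separations the $\sqrt{r}/\Delta$ term is.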




Has companion code repository: https://github.com/ostrodmit/testing-without-recovery








