Near-Optimal Procedures for Model Discrimination with Non-Disclosure Properties
From MaRDI portal
Publication:6355350
arXiv: 2012.02901 · MaRDI QID: Q6355350
Dmitrii M. Ostrovskii, Mohamed Ndaoud, Adel Javanmard, Meisam Razaviyayn
Publication date: 4 December 2020
Abstract: Let θ₀, θ₁ ∈ ℝ^d be the population risk minimizers associated to some loss ℓ: ℝ^d × 𝒵 → ℝ and two distributions P₀, P₁ on 𝒵. The models are unknown, and can be accessed by drawing i.i.d. samples from them. Our work is motivated by the following model discrimination question: "What sizes of the samples from P₀ and P₁ allow us to distinguish between the two hypotheses θ* = θ₀ and θ* = θ₁ for a given θ* ∈ {θ₀, θ₁}?" Making the first steps towards answering it in full generality, we first consider the case of a well-specified linear model with squared loss. Here we provide matching upper and lower bounds on the sample complexity, as given by min{1/Δ², √r/Δ} up to a constant factor; here Δ is a measure of separation between P₀ and P₁, and r is the rank of the design covariance matrix. We then extend this result in two directions: (i) for general parametric models in the asymptotic regime; (ii) for generalized linear models in small samples (n ≲ r) under weak moment assumptions. In both cases we derive sample complexity bounds of a similar form, while allowing for model misspecification. In fact, our testing procedures only access θ* via a certain functional of the empirical risk. In addition, the number of observations that allows us to reach statistical confidence does not allow us to "resolve" the two models, that is, to recover θ₀, θ₁ up to O(Δ) prediction accuracy. These two properties allow us to use our framework in applied tasks where one would like to identify a prediction model, which can be proprietary, while guaranteeing that the model cannot be actually recovered by the identifying agent.
Has companion code repository: https://github.com/ostrodmit/testing-without-recovery
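The discrimination idea described in the abstract — decide between two candidate models by comparing their empirical risks on fresh samples — can be sketched as follows for the well-specified linear model with squared loss. This is a simplified plug-in illustration, not the paper's actual near-optimal procedure (see the companion repository for that); all parameter values and function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 1000  # dimension and sample size (illustrative values)

# Two candidate parameter vectors theta0, theta1 (hypothetical, for illustration);
# delta controls the separation between the two models.
theta0 = rng.normal(size=d)
delta = 0.5 * rng.normal(size=d) / np.sqrt(d)
theta1 = theta0 + delta

def draw_samples(theta, n):
    """Draw n i.i.d. samples from a well-specified linear model y = <x, theta> + noise."""
    X = rng.normal(size=(n, d))
    y = X @ theta + rng.normal(size=n)
    return X, y

def discriminate(X, y, theta0, theta1):
    """Decide which candidate attains the smaller empirical squared-loss risk.

    Returns 0 if theta0 has the smaller empirical risk, 1 otherwise.
    """
    risk0 = np.mean((y - X @ theta0) ** 2)
    risk1 = np.mean((y - X @ theta1) ** 2)
    return 0 if risk0 <= risk1 else 1

# Samples actually drawn from the model with parameter theta1:
# the empirical-risk comparison should point to theta1.
X, y = draw_samples(theta1, n)
print(discriminate(X, y, theta0, theta1))
```

Note that this sketch only accesses the candidate parameters through their empirical risks, mirroring (in spirit) the non-disclosure property of the paper's procedures: the test statistic is a functional of the empirical risk rather than a parameter estimate.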