False membership rate control in mixture models
Publication:6392844
arXiv: 2203.02597, MaRDI QID: Q6392844
Publication date: 4 March 2022
Abstract: The clustering task consists in partitioning elements of a sample into homogeneous groups. Most datasets contain individuals that are ambiguous and intrinsically difficult to attribute to one or another cluster. However, in practical applications, misclassifying individuals is potentially disastrous and should be avoided. To keep the misclassification rate small, one can decide to classify only a part of the sample. In the supervised setting, this approach is well known and referred to as classification with an abstention option. In this paper the approach is revisited in an unsupervised mixture model framework and the purpose is to develop a method that comes with the guarantee that the false clustering rate (FCR) does not exceed a pre-defined nominal level α. A new procedure is proposed and shown to be optimal up to a remainder term in the sense that the FCR is controlled and at the same time the number of classified items is maximized. Bootstrap versions of the procedure are shown to improve the performance in numerical experiments. An application to breast cancer data illustrates the benefits of the new approach from a practical viewpoint.
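The idea of classifying only part of the sample can be illustrated with a minimal sketch. It is not the procedure from the paper or from the companion repository. It is a generic plug-in thresholding rule, and it assumes that posterior cluster membership probabilities are already available, for example from a fitted mixture model. The function name `select_with_abstention` and its arguments are hypothetical. Each item gets the label with the highest posterior, and items are then admitted from most to least confident for as long as the average estimated error (one minus the maximum posterior) stays at or below the nominal level `alpha`.

```python
from typing import List, Sequence, Tuple


def select_with_abstention(
    posteriors: Sequence[Sequence[float]], alpha: float
) -> Tuple[List[int], List[int]]:
    """Plug-in sketch (hypothetical helper, not the paper's procedure).

    posteriors[i][k] is an assumed estimate of the probability that item i
    belongs to cluster k. Returns the indices of the classified items and
    their labels; every other item is left unclassified (abstention).
    """
    scored = []
    for i, row in enumerate(posteriors):
        # Label each item by its most probable cluster.
        label = max(range(len(row)), key=lambda k: row[k])
        # Estimated probability that this label is wrong.
        scored.append((1.0 - row[label], i, label))
    scored.sort()  # most confident items first

    selected: List[int] = []
    labels: List[int] = []
    running_error = 0.0
    for uncertainty, i, label in scored:
        # The running average is non-decreasing along the sorted order,
        # so the first violation marks the largest admissible selection.
        if (running_error + uncertainty) / (len(selected) + 1) > alpha:
            break
        running_error += uncertainty
        selected.append(i)
        labels.append(label)
    return selected, labels


# Toy posteriors for four items and two clusters (made-up numbers).
toy = [[0.99, 0.01], [0.55, 0.45], [0.05, 0.95], [0.80, 0.20]]
selected, labels = select_with_abstention(toy, alpha=0.1)
print(selected, labels)
```

In this toy input the ambiguous second item (posteriors 0.55 and 0.45) is the one left unclassified. The sketch only controls the estimated error rate if the posteriors themselves are accurate. The abstract indicates that the paper addresses this estimation step through its optimality result and bootstrap variants, which the sketch does not attempt.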
Has companion code repository: https://github.com/arianemarandon/fmrcontrol