Minimum sample size to identify nonzero coefficients in normal regression (Q1905124)
From MaRDI portal
scientific article; zbMATH DE number 830592
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Minimum sample size to identify nonzero coefficients in normal regression | scientific article; zbMATH DE number 830592 | |
Statements
Minimum sample size to identify nonzero coefficients in normal regression (English)
1 September 1996
Ordinary least squares (OLS) estimation of the coefficient vector in \(n\)-variable linear normal regression requires an experimental design with \(O(n)\) points. The number of design points can be substantially reduced, however, if it is known that only \(k\) of the \(n\) coefficients are nonzero. For instance, the method proposed in the author's paper, Cybern. Syst. Anal. 29, No. 5, 716-726 (1993); translation from Kibern. Sist. Anal. 1993, No. 5, 104-115 (1993; Zbl 0814.62036), requires only \(O(\ln n)\) points as \(n \to \infty\) with \(k = \text{const}\). The method consists of two stages. In the first stage, a list of the indices of the nonzero regression coefficients is created, which requires \(O(\ln n)\) points. In the second stage, the columns with these indices are extracted from the full observation matrix to form a reduced observation matrix, and the nonzero regression coefficients are estimated by applying OLS to the reduced matrix. For OLS estimation in the second stage to work, it is sufficient that the reduced observation matrix contains information about \(O(k)\) experimental points. For \(n \to \infty\) and \(k = \text{const}\), the \(O(\ln n)\) requirement of the first stage dominates the \(O(k)\) requirement of the second, so the size of the experimental design needed for the entire problem is determined by the first stage, which is the main consumer of experimental information. Are there algorithms for which \(o(\ln n)\) design points suffice to identify the nonzero coefficients of normal regression in the first stage? We prove that no such algorithms exist among passive probabilistic algorithms. The proof is presented for the case of a regression function in the form of a generalized polynomial.
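The second stage described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. The first stage is not reproduced here: the index list `support` is simply assumed to be known. The function name `reduced_ols`, the Gaussian random design, the noise level, and the sizes `n`, `k`, `m` are all hypothetical choices made for illustration.

```python
import numpy as np


def reduced_ols(X, y, support):
    """Stage two: OLS on the columns of X listed in `support` (hypothetical helper)."""
    X_reduced = X[:, support]  # reduced observation matrix
    coef, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)
    beta = np.zeros(X.shape[1])
    beta[support] = coef  # all other coefficients stay exactly zero
    return beta


# Illustrative setup (assumed values): n variables, k nonzero, m < n design points.
rng = np.random.default_rng(0)
n, k, m = 200, 3, 40
support = np.array([5, 17, 123])  # assumed output of stage one
beta_true = np.zeros(n)
beta_true[support] = [2.0, -1.5, 0.7]

X = rng.standard_normal((m, n))  # full observation matrix
y = X @ beta_true + 0.1 * rng.standard_normal(m)  # normal regression model

beta_hat = reduced_ols(X, y, support)
print(beta_hat[support])
```

With `m = 40` design points, OLS on all `n = 200` columns is underdetermined, while the reduced problem has only `k = 3` unknowns and is well posed.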
ordinary least squares estimation
linear normal regression
reduced observation matrix