Cross-validation and the smoothing of orthogonal series density estimators (Q1089698)
From MaRDI portal
scientific article; zbMATH DE number 4005345
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Cross-validation and the smoothing of orthogonal series density estimators | scientific article; zbMATH DE number 4005345 | |
Statements
Cross-validation and the smoothing of orthogonal series density estimators (English)
1987
Let \(X_1,\dots,X_n\) be independent, identically distributed random variables with probability density function \(f\) supported on a region \(\mathcal{R}\), and let \(\Phi_i\), \(i=0,1,\dots\), be orthogonal functions on \(\mathcal{R}\), normalized so that \(\int \Phi_i^2=1\), \(i\geq 0\), where unspecified integrals extend over all of \(\mathcal{R}\). Suppose \(f\) has an \(L^2\)-convergent expansion in terms of the \(\Phi_i\), \(f\sim \sum_{i\geq 0}c_i\Phi_i\), where \(c_i=\int \Phi_i f\), and each \(c_i\) may be unbiasedly estimated by \(\hat c_i=n^{-1}\sum_{j=1}^{n}\Phi_i(X_j)\), \(i\geq 0\). The naive estimator \(\hat f_{\infty}=\sum_{i\geq 0}\hat c_i\Phi_i\) is usually not well defined, since the series need not converge, not even in the \(L^2\) sense. This problem may be resolved by using weights \(b_i\), so that \(\hat f(x\mid \underset{\sim}{b})=\sum_{i\geq 0}b_i\hat c_i\Phi_i(x)\) is well defined, where \(\underset{\sim}{b}=(b_0,b_1,\dots)\) is what the author calls the (smoothing) policy involved. The condition \(\sum_{i\geq 0}|b_i|<\infty\) usually implies pointwise convergence of \(\hat f(\cdot\mid \underset{\sim}{b})\). The naive estimator \(\hat f_{\infty}\) is (formally) unbiased but has infinite variance. With an appropriate choice of \(\underset{\sim}{b}\), the shrunken estimator \(\hat f(\cdot\mid \underset{\sim}{b})\) has finite variance but is biased; the shrinkage is designed to reduce both variance and mean square error at the expense of bias. Write \(\mu_n(\underset{\sim}{b})\) for the mean integrated square error of \(\hat f(\cdot\mid \underset{\sim}{b})\) corresponding to the policy \(\underset{\sim}{b}\); that is, \(\mu_n(\underset{\sim}{b})=\int E[f(\cdot)-\hat f(\cdot\mid \underset{\sim}{b})]^2\).
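As an illustration only, the weighted estimator \(\hat f(\cdot\mid b)\) described above can be sketched in Python for the cosine system on \((0,\pi)\). The normalization \(\Phi_0=\pi^{-1/2}\), \(\Phi_i=(2/\pi)^{1/2}\cos ix\) is the standard orthonormal one and is assumed here; the function names and the Beta-distributed test sample are hypothetical choices, not taken from the paper.

```python
import math
import random


def cosine_basis(i, x):
    """Orthonormal cosine system on (0, pi) (assumed normalization)."""
    if i == 0:
        return 1.0 / math.sqrt(math.pi)
    return math.sqrt(2.0 / math.pi) * math.cos(i * x)


def coefficient_estimates(sample, num_terms):
    """Unbiased estimates c_hat_i = n^{-1} sum_j Phi_i(X_j), i = 0..num_terms-1."""
    n = len(sample)
    return [sum(cosine_basis(i, x) for x in sample) / n for i in range(num_terms)]


def density_estimate(x, c_hat, weights):
    """Weighted series estimate f_hat(x | b) = sum_i b_i c_hat_i Phi_i(x)."""
    return sum(b * c * cosine_basis(i, x)
               for i, (b, c) in enumerate(zip(weights, c_hat)))


# Hypothetical usage: a sample on (0, pi) and a policy keeping the first 4 terms.
demo_rng = random.Random(0)
demo_sample = [math.pi * demo_rng.betavariate(2, 3) for _ in range(200)]
demo_c_hat = coefficient_estimates(demo_sample, 6)
demo_weights = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
demo_value = density_estimate(1.0, demo_c_hat, demo_weights)
```

Because \(\Phi_0\) is constant, \(\hat c_0=\pi^{-1/2}\) for every sample, so any policy with \(b_0=1\) yields an estimate that integrates to 1 over \((0,\pi)\), although it may take negative values.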
On the basis of consistency and efficiency considerations, attention is restricted to policies which are completely determined by a finite number of parameters. For the most part, two specific classes of policies are considered: the traditional class \(\mathcal{B}_0\), where \(\underset{\sim}{b}_m\in \mathcal{B}_0\) if \(\underset{\sim}{b}_m\) is the (infinite) vector with precisely the first \(m+1\) components equal to 1 and the rest equal to 0, \(m\geq 0\); and the class \(\mathcal{B}_w\) suggested by \textit{G. Wahba}, Ann. Stat. 9, 146-156 (1981; Zbl 0463.62034), where \(\underset{\sim}{b}\in \mathcal{B}_w\) if \(\underset{\sim}{b}=(b_0,b_1,\dots)\) with \(b_i=(1+ci^d)^{-1}\), \(i\geq 0\), \(0<c\leq c_0<\infty\), \(1<d_0\leq d<\infty\). The author also confines himself, for the most part, to the cosine series \(\Phi_0(x)=\pi^{-1/2}\), \(\Phi_i(x)=(2/\pi)^{1/2}\cos ix\), \(i\geq 1\), with \(\mathcal{R}=(0,\pi)\), and the Hermite series \(\Phi_i(x)=(\pi^{1/2}i!\,2^i)^{-1/2}H_i(x)e^{-x^2/2}\), \(i\geq 0\), with \(\mathcal{R}=(-\infty,\infty)\). Consider the quantity \[ J_n(\underset{\sim}{b})=\sum_{i\geq 0}b_i^2\hat c_i^2-2n^{-1}(n-1)^{-1}\sum_{i\geq 0}\sum_{j\neq k}b_i\Phi_i(X_j)\Phi_i(X_k), \] which is unbiased for \(\mu_n(\underset{\sim}{b})-\int f^2\) for all policies \(\underset{\sim}{b}\) for which \(\mu_n(\underset{\sim}{b})\) is well defined. \(J_n(\underset{\sim}{b})\) may also be viewed as an estimable version of \(I_n(\underset{\sim}{b})-\int f^2\) (here \(I_n(\underset{\sim}{b})\) evidently denotes the integrated square error \(\int[f(\cdot)-\hat f(\cdot\mid \underset{\sim}{b})]^2\), whose expectation is \(\mu_n(\underset{\sim}{b})\)). Given a class \(\mathcal{B}\) of (smoothing) policies, the cross-validatory policy \(\underset{\sim}{\hat b}\) is that vector which minimizes \(J_n(\underset{\sim}{b})\) over all \(\underset{\sim}{b}\in \mathcal{B}\).
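The criterion \(J_n\) can be evaluated without a double loop over the sample, because \(\sum_{j\neq k}\Phi_i(X_j)\Phi_i(X_k)=\bigl(\sum_j\Phi_i(X_j)\bigr)^2-\sum_j\Phi_i(X_j)^2\). The following Python sketch uses that identity to minimize \(J_n\) over truncation policies \(b_m\), \(0\leq m\leq M\), for the cosine system. The cap \(M\) (`max_m`), the helper names, and the standard orthonormal cosine normalization are assumptions made for illustration; `wahba_weights` shows the two-parameter form \(b_i=(1+ci^d)^{-1}\) and can be passed to the same criterion.

```python
import math
import random


def cosine_basis(i, x):
    """Orthonormal cosine system on (0, pi) (assumed normalization)."""
    if i == 0:
        return 1.0 / math.sqrt(math.pi)
    return math.sqrt(2.0 / math.pi) * math.cos(i * x)


def cv_terms(sample, max_index):
    """For each i, return (c_hat_i^2, U_i), where
    U_i = [n(n-1)]^{-1} sum_{j != k} Phi_i(X_j) Phi_i(X_k)."""
    n = len(sample)
    terms = []
    for i in range(max_index + 1):
        values = [cosine_basis(i, x) for x in sample]
        total = sum(values)
        sum_sq = sum(v * v for v in values)
        c_hat_sq = (total / n) ** 2
        # Off-diagonal sum via (sum)^2 - sum of squares.
        u_stat = (total * total - sum_sq) / (n * (n - 1))
        terms.append((c_hat_sq, u_stat))
    return terms


def cv_criterion(weights, terms):
    """J_n(b) = sum_i b_i^2 c_hat_i^2 - 2 sum_i b_i U_i (finite truncation)."""
    return sum(b * b * c2 - 2.0 * b * u for b, (c2, u) in zip(weights, terms))


def truncation_weights(m, length):
    """Policy b_m: first m+1 components equal to 1, the rest 0."""
    return [1.0 if i <= m else 0.0 for i in range(length)]


def wahba_weights(c, d, length):
    """Two-parameter policy b_i = (1 + c * i^d)^{-1}, truncated to `length` terms."""
    return [1.0 / (1.0 + c * i ** d) for i in range(length)]


def select_truncation(sample, max_m):
    """Return the m in 0..max_m minimizing J_n(b_m), with all scores."""
    terms = cv_terms(sample, max_m)
    scores = [cv_criterion(truncation_weights(m, max_m + 1), terms)
              for m in range(max_m + 1)]
    best_m = min(range(max_m + 1), key=scores.__getitem__)
    return best_m, scores
```

A Wahba-type policy is scored the same way, for example `cv_criterion(wahba_weights(0.1, 2.0, len(terms)), terms)`, and a finite grid of \((c,d)\) pairs can then be searched for the minimizer.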
The policy \(\underset{\sim}{\hat b}\) is said to be asymptotically optimal within a class \(\mathcal{B}\) if \(I_n(\underset{\sim}{\hat b})/\inf[I_n(\underset{\sim}{b});\underset{\sim}{b}\in \mathcal{B}]\) tends to 1 as \(n\to \infty\). The main objective of the paper is to prove that the cross-validatory policy is asymptotically optimal. To this end, the main result of the paper (Theorem 2.1) is established, and from it the following conclusions are drawn: \(I_n(\underset{\sim}{\hat b})/\inf[I_n(\underset{\sim}{b}_m);0\leq m\leq n^{\gamma}]\) tends to 1, almost surely, as \(n\to \infty\) \((\gamma>0)\). Also, \(I_n(\underset{\sim}{\hat b})/\inf[I_n(\underset{\sim}{b});\underset{\sim}{b}\in \mathcal{B}(n)]\) tends to 1, almost surely, as \(n\to \infty\), where \(\mathcal{B}(n)\) is any subset of \(\mathcal{B}_w\) containing \(O(n^{\gamma})\) policies, for some \(\gamma>0\).
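The ratio appearing in the definition of asymptotic optimality can be examined numerically when the true density is known. The Python sketch below is a finite-sample illustration, not a reproduction of the paper's argument. It assumes the test density \(f(x)=(1+\cos x)/\pi\) on \((0,\pi)\), whose only non-zero cosine coefficients are \(c_0=\pi^{-1/2}\) and \(c_1=(2\pi)^{-1/2}\), reads \(I_n\) as the integrated square error, and computes it through Parseval's identity as \(\sum_i(b_i\hat c_i-c_i)^2\). All function names and the cap `max_m` are hypothetical.

```python
import math
import random


def cosine_basis(i, x):
    """Orthonormal cosine system on (0, pi) (assumed normalization)."""
    if i == 0:
        return 1.0 / math.sqrt(math.pi)
    return math.sqrt(2.0 / math.pi) * math.cos(i * x)


def draw_sample(n, rng):
    """Rejection sampling from the test density f(x) = (1 + cos x) / pi."""
    sample = []
    while len(sample) < n:
        x = rng.uniform(0.0, math.pi)
        if rng.random() <= (1.0 + math.cos(x)) / 2.0:
            sample.append(x)
    return sample


def true_coefficient(i):
    """Cosine coefficients of the test density: only c_0 and c_1 are non-zero."""
    if i == 0:
        return 1.0 / math.sqrt(math.pi)
    if i == 1:
        return 1.0 / math.sqrt(2.0 * math.pi)
    return 0.0


def optimality_ratio(sample, max_m):
    """Return (I_n(b_mhat) / min_m I_n(b_m), mhat) over truncations 0..max_m."""
    n = len(sample)
    c_hat, u_stat = [], []
    for i in range(max_m + 1):
        values = [cosine_basis(i, x) for x in sample]
        total = sum(values)
        c_hat.append(total / n)
        u_stat.append((total * total - sum(v * v for v in values)) / (n * (n - 1)))
    # J_n(b_m) = sum_{i <= m} (c_hat_i^2 - 2 U_i), accumulated over m.
    j_scores, running = [], 0.0
    for i in range(max_m + 1):
        running += c_hat[i] ** 2 - 2.0 * u_stat[i]
        j_scores.append(running)
    # I_n(b_m) = sum_{i <= m} (c_hat_i - c_i)^2 + sum_{i > m} c_i^2 (Parseval).
    energy = true_coefficient(0) ** 2 + true_coefficient(1) ** 2
    ise = []
    for m in range(max_m + 1):
        fitted = sum((c_hat[i] - true_coefficient(i)) ** 2 for i in range(m + 1))
        kept = sum(true_coefficient(i) ** 2 for i in range(m + 1))
        ise.append(fitted + max(energy - kept, 0.0))
    m_hat = min(range(max_m + 1), key=j_scores.__getitem__)
    return ise[m_hat] / min(ise), m_hat
```

The ratio is at least 1 by construction; calling `optimality_ratio(draw_sample(n, random.Random(seed)), max_m)` for increasing `n` gives an informal view of how closely the cross-validatory truncation tracks the best one for this particular density.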
smoothed orthogonal series density estimates
sequential-series
two-parameter smoothing
least-squares cross-validation
smoothing policy
unbiasedness
orthogonal functions
mean square error
mean integrated square error
consistency
efficiency
cosine series
Hermite series
cross-validatory policy
asymptotically optimal