On some significance tests in cluster analysis (Q1072282)
From MaRDI portal
scientific article; zbMATH DE number 3942733
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | On some significance tests in cluster analysis | scientific article; zbMATH DE number 3942733 | |
Statements
On some significance tests in cluster analysis (English)
1985
The author investigates the properties of several significance tests for distinguishing between the hypothesis H of a "homogeneous" population and an alternative A involving "clustering" or "heterogeneity", with emphasis on the case of multidimensional observations \(x_1,\dots,x_n\in\mathbb{R}^p\). Four types of test statistics are considered: the (s-th) largest gap between observations, their mean distance (or similarity), the minimum within-cluster sum of squares resulting from a k-means algorithm, and the resulting maximum F statistic. If, for a given significance level (error probability) \(a\), such a test statistic exceeds the corresponding critical value \(c=c(a)\), the hypothesis H of homogeneity is rejected (e.g., in favor of a clustering structure A). The asymptotic distributions under H are given for \(n\to\infty\), and the asymptotic power of the tests is derived for neighboring alternatives \(A=A_n\) approaching H. In particular, the asymptotic distribution of the maximum F statistic is obtained. Moreover, the asymptotic power of the gap test is characterized by a speed factor \((\log n)^{-1}\) (for \(A_n\) converging to H), and by a factor \(n^{-1/4}\) for tests based on the mean similarity.
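The four statistics named in the review can be illustrated with a minimal sketch for one-dimensional data and two clusters. Everything below is an assumption made for illustration, not the paper's definitions: the function names are hypothetical, the normalizations may differ from those used by the author, and an exhaustive scan over split points of the sorted sample stands in for the k-means algorithm (in one dimension an optimal two-cluster partition is always such a split).

```python
from itertools import combinations


def largest_gap(xs, s=1):
    """s-th largest gap between consecutive sorted observations (1-D)."""
    ordered = sorted(xs)
    gaps = sorted((b - a for a, b in zip(ordered, ordered[1:])), reverse=True)
    return gaps[s - 1]


def mean_distance(xs):
    """Average absolute distance over all unordered pairs of observations."""
    pairs = list(combinations(xs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def sum_of_squares(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def min_within_ss_two_clusters(xs):
    """Minimum within-cluster sum of squares over all two-cluster splits.

    Assumption: 1-D data, so scanning the n-1 split points of the sorted
    sample is exact and replaces an iterative k-means run.
    """
    ordered = sorted(xs)
    return min(
        sum_of_squares(ordered[:i]) + sum_of_squares(ordered[i:])
        for i in range(1, len(ordered))
    )


def max_f_statistic_two_clusters(xs):
    """F-type ratio at the best split: (between / (k-1)) / (within / (n-k)).

    Assumption: k = 2 and the usual ANOVA normalization; raises
    ZeroDivisionError if the best split has zero within-cluster variation.
    """
    k, n = 2, len(xs)
    within = min_within_ss_two_clusters(xs)
    between = sum_of_squares(xs) - within
    return (between / (k - 1)) / (within / (n - k))
```

For a sample with two well-separated groups, such as `[0.1, 0.3, 0.2, 5.0, 5.2, 5.1]`, the largest gap and the F-type ratio are large while the minimum within-cluster sum of squares is small. Turning any of these values into a test decision still requires the critical value \(c(a)\) under H, which this sketch does not compute.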
cluster analysis
asymptotic normality
classification
significance tests
clustering
heterogeneity
mean distance
similarity
minimum within-cluster sum of squares
k-means algorithm
maximum F statistic
homogeneity
neighboring alternatives
asymptotic power
gap test