On sparse linear discriminant analysis algorithm for high-dimensional data classification. (Q2889381)
scientific article; zbMATH DE number 6043434
Statements
7 June 2012
weighting
small sample size problem
On sparse linear discriminant analysis algorithm for high-dimensional data classification. (English)
This paper proposes a sparse linear discriminant analysis (LDA) algorithm for high-dimensional data in which the number of samples is much smaller than the data dimension. This situation arises, for example, in text classification, where a document is represented as a sample vector whose entries measure particular terms or words, e.g., the frequency of each term in the document. The number of terms tends to be much larger than the number of documents. Groups of documents of different types are characterized by different subsets of terms, and the terms characterizing one group may not occur in the samples of other groups. Therefore, in high-dimensional data, groups of objects often exist in subspaces rather than in the entire space.

In the proposed algorithm, the LDA calculates a weight for each dimension and uses the weights to identify the subsets of important dimensions in the discriminant vectors that separate the groups. This is achieved by adding a weight-sparsity term, which is minimized, to the LDA objective function. The LDA objective function is based on the ratio of the between-class and within-class scatters. To avoid singularity of the within-class scatter matrix, it is shifted by a small perturbation. An iterative algorithm is developed for computing the sparse and orthogonal discriminant vectors of the modified objective function. Experiments on real data sets show that the new algorithm produces better classification results and identifies the relevant dimensions.
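The ingredients described above (between-class and within-class scatters, a small perturbation of the within-class scatter to avoid singularity, and sparsity in the discriminant vectors) can be sketched in NumPy. This is a minimal illustration, not the paper's exact iterative algorithm: the weight-sparsity term is approximated here by a simple soft-thresholding step, and the function name and parameters (`eps`, `threshold`) are assumptions for illustration.

```python
import numpy as np

def sparse_lda(X, y, eps=1e-3, threshold=0.05):
    """Illustrative sparse-LDA sketch (not the paper's exact iteration).

    X : (n_samples, n_features) data matrix, n_samples << n_features allowed.
    y : class labels.
    eps : perturbation added to the within-class scatter so that it is
          invertible in the small-sample-size setting.
    threshold : soft-threshold level, a stand-in for the weight-sparsity
          term in the modified objective function.
    """
    classes = np.unique(y)
    n, d = X.shape
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # shift the within-class scatter by a small perturbation
    Sw_reg = Sw + eps * np.eye(d)
    # solve the generalized eigenproblem via the symmetric form
    # Sw^{-1/2} Sb Sw^{-1/2}
    evals, evecs = np.linalg.eigh(Sw_reg)
    Sw_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    M = Sw_inv_sqrt @ Sb @ Sw_inv_sqrt
    w, V = np.linalg.eigh(M)
    # keep the top (#classes - 1) discriminant directions
    W = Sw_inv_sqrt @ V[:, ::-1][:, :len(classes) - 1]
    # soft-threshold entries to induce sparsity, then renormalize columns
    W = np.sign(W) * np.maximum(np.abs(W) - threshold, 0.0)
    norms = np.linalg.norm(W, axis=0)
    return W / np.where(norms > 0, norms, 1.0)
```

Dimensions whose entries survive the thresholding are the "important" ones for separating the groups; in a text-classification setting these would correspond to the terms that characterize a particular group of documents.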