On sparse linear discriminant analysis algorithm for high-dimensional data classification. (Q2889381)

From MaRDI portal





scientific article; zbMATH DE number 6043434

    Statements

    7 June 2012
    weighting
    small sample size problem
    On sparse linear discriminant analysis algorithm for high-dimensional data classification. (English)
    This paper proposes a sparse linear discriminant analysis (LDA) algorithm for high-dimensional objects where the number of samples is much smaller than the data dimension. This situation arises, for example, in text data classification, where a text document is viewed as a sample vector in which every entry gives a measure of a particular term or word, e.g., the frequency of the term in the document. The number of terms tends to be much larger than the number of documents. Groups of documents of different types are characterized by different subsets of terms; the terms characterizing one group may not occur in the samples of other groups. Therefore, in high-dimensional data, groups of objects often exist in subspaces rather than in the entire space.

    The proposed algorithm is an LDA variant that calculates a weight for each dimension and uses the weights to identify the subsets of important dimensions in the discriminant vectors that categorize different groups. This is achieved by including a weight sparsity term that is minimized in the LDA objective function. The LDA objective function is based on the ratio of the between-class and within-class scatters. To avoid singularity of the within-class scatter matrix, it is shifted by a small perturbation. An iterative algorithm is developed for computing the sparse and orthogonal vectors related to the modified objective function. Experiments on real data sets show that the new algorithm can generate better classification results and identify relevant dimensions.
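
The perturbation step mentioned above can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: it omits the weight sparsity term, the orthogonality constraint and the iterative solver, and only shows how shifting the within-class scatter by a small multiple of the identity keeps the scatter-ratio problem solvable when there are fewer samples than dimensions. The function name `regularized_lda_direction` and the parameter `eps` are assumptions made for this illustration.

```python
import numpy as np


def regularized_lda_direction(X, y, eps=1e-3):
    """Leading discriminant vector of (S_w + eps * I)^{-1} S_b.

    Hypothetical helper for illustration only; `eps` is the assumed
    size of the perturbation added to the within-class scatter.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)

    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)

    # With fewer samples than dimensions S_w is singular, so shift it
    # by eps * I before inverting.
    S_w_reg = S_w + eps * np.eye(d)

    # Eigenvectors of S_w_reg^{-1} S_b maximise the scatter ratio.
    M = np.linalg.solve(S_w_reg, S_b)
    eigvals, eigvecs = np.linalg.eig(M)
    top = np.argmax(eigvals.real)
    w = eigvecs[:, top].real
    return w / np.linalg.norm(w)


# Toy data: 6 samples in 20 dimensions, classes differ in dimension 0.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(6, 20))
y = np.array([0, 0, 0, 1, 1, 1])
X[y == 1, 0] += 5.0
w = regularized_lda_direction(X, y)
projections = X @ w  # one value per sample along the discriminant vector
```

Without the `eps` shift, `np.linalg.solve` would be applied to a singular matrix in this setting, which is the small sample size problem the review refers to.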
