Outliers -- finding and classifying which genuine and which spurious (Q1584167)

From MaRDI portal





scientific article; zbMATH DE number 1524249

    Statements

    Outliers -- finding and classifying which genuine and which spurious (English)
    1 November 2000
    Identifying outliers hidden in data is important, and several analytical methods can reliably detect and flag data vectors suspected of being atypical. However, these methods rely, to a greater or lesser degree, on significance tests that in principle assume normality of the data. They also yield only a global index (characterization) of the outlyingness of the identified data points and do not explain specifically why those points were flagged as ``atypical''. The grand tour is a method for finding interesting \(2\)-dimensional projections of high-dimensional data. Combined with a dynamic graphics environment, it makes it possible to recognize visually the shape of a multivariate data cloud and the degree of outlyingness or closeness of selected data vectors. A further problem with high-dimensional multivariate data is that, once a point has been flagged as an outlier, one is expected to say more precisely why it was flagged. In practice this is difficult to do by looking directly at the values in the data table, so an automated procedure would help. The aim of this paper is to present such a procedure, based on clustering the outliers found. The clustering, based on angular similarities of the suspected data vectors, introduces a kind of hierarchy among them and allows them to be classified into several subgroups. By taking the data vectors of each subgroup separately and tracking them once more with the grand tour, it becomes easier to establish their common features and to verify to what extent they are outlying compared with the bulk of the data.
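    The idea of grouping suspected outliers by angular similarity can be illustrated with a minimal sketch. It is not the paper's procedure: the function name cluster_outliers_by_angle, the median centring, the average-linkage rule, and the n_groups parameter are all assumptions made for illustration, and the data are synthetic. Each suspected vector is expressed relative to the centre of the data, pairwise angles between those directions are computed, and a hierarchical clustering of the angles splits the suspects into subgroups that leave the bulk in similar directions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_outliers_by_angle(X, suspects, n_groups):
    """Group suspected outliers by the direction in which they leave the bulk.

    X        : (n, p) data matrix
    suspects : indices of rows already flagged as possible outliers
    n_groups : number of subgroups to cut the hierarchy into (assumed known)
    """
    # Assumption: the coordinate-wise median serves as a robust centre.
    center = np.median(X, axis=0)
    directions = X[suspects] - center  # a suspect equal to the centre has no direction

    # pdist(metric="cosine") returns 1 - cos(angle); convert to angles in radians.
    cos_dist = pdist(directions, metric="cosine")
    angles = np.arccos(np.clip(1.0 - cos_dist, -1.0, 1.0))

    # Assumption: average linkage; other linkage rules are equally plausible.
    tree = linkage(angles, method="average")
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    return labels, tree


# Synthetic example: a 5-dimensional Gaussian bulk plus two groups of
# outliers, one shifted along variable 0 and one along variable 3.
rng = np.random.default_rng(0)
bulk = rng.normal(size=(200, 5))
group_a = np.array([10.0, 0, 0, 0, 0]) + 0.3 * rng.normal(size=(3, 5))
group_b = np.array([0, 0, 0, 10.0, 0]) + 0.3 * rng.normal(size=(3, 5))
X = np.vstack([bulk, group_a, group_b])

suspects = np.arange(200, 206)  # indices of the six planted outliers
labels, tree = cluster_outliers_by_angle(X, suspects, n_groups=2)
print(labels)
```

    In this sketch the returned tree plays the role of the hierarchy among the suspected vectors, and each label identifies a subgroup that could then be inspected separately, for instance in a sequence of \(2\)-dimensional projections.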
    exploratory data analysis
    the grand tour
    suspected outliers
    clustering by angular similarities
    dynamic graphics
    multivariate data
