Speech

OpenML dataset with id 40910

Author name not available

Full work available at URL: https://api.openml.org/data/v1/download/16787473/Speech.arff
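
The file at that URL is in ARFF (Attribute-Relation File Format): a header of `@attribute` declarations followed by comma-separated rows after `@data`. The hypothetical helper `read_dense_arff` below is a minimal standard-library sketch that splits such a file into features and labels. It assumes the file is dense, that every attribute except the target is numeric, and that the target attribute is named `Target` as the description states. Check the actual header, since these are assumptions. For real use, `scipy.io.arff.loadarff` or scikit-learn's `fetch_openml(data_id=40910)` are more robust options.

```python
import csv
import shlex


def read_dense_arff(lines, target="Target"):
    """Minimal sketch of a dense-ARFF reader (numeric features + one target).

    Assumptions (not verified against the real file): rows are dense, every
    non-target attribute is numeric, and there are no missing values ('?'),
    which matches the dataset characteristics reported on this page.
    Returns (X, y, feature_names) as plain Python lists.
    """
    names = []
    X, y = [], []
    t = None  # index of the target column, fixed once @data is reached
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        if t is None:
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@attribute":
                # shlex strips quotes from attribute names such as 'V1'
                names.append(shlex.split(line)[1])
            elif keyword == "@data":
                t = names.index(target)  # ValueError if the target is absent
            continue  # @relation and other header lines are ignored
        # csv handles comma separation and single-quoted nominal values
        row = next(csv.reader([line], quotechar="'", skipinitialspace=True))
        y.append(row[t])
        X.append([float(v) for i, v in enumerate(row) if i != t])
    if t is None:
        raise ValueError("no @data section found")
    features = [n for i, n in enumerate(names) if i != t]
    return X, y, features
```

For example, after downloading the file to a local path such as `Speech.arff` (a hypothetical name), `with open("Speech.arff") as fh: X, y, features = read_dense_arff(fh)` should yield 3,686 rows of 400 values each, provided the assumptions above hold.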

Upload date: 22 September 2017

Dataset Characteristics

Number of classes: 2
Number of features: 401 (numeric: 400, symbolic: 1; binary in total: 1)
Number of instances: 3,686
Number of instances with missing values: 0
Number of missing values: 0

Description

"The speech dataset was also provided by (see citation request) and contains real world data from recorded English language. The normal class contains data from persons having an American accent whereas the outliers are represented from seven other speakers, having a different accent. The feature vector is the i-vector of the speech segment, which is a state-of-the-art feature in speaker recognition. The dataset has 400 dimensions and is thus the task in our evaluation with the largest number of dimensions. It has 3,686 instances including 1.65% anomalies." (Quoted from Goldstein, Markus, and Seiichi Uchida. "A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data." PLoS ONE 11.4 (2016): e0152173.) This is not the original dataset: the target variable "Target" has been relabeled into the classes "Normal" and "Anomaly".
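
In the unsupervised setting the description refers to, a detector scores instances without seeing the labels, and the "Normal"/"Anomaly" labels are used only for evaluation. The following is a minimal sketch of that workflow, assuming NumPy is available. It uses a k-nearest-neighbor distance score (chosen here for illustration; `k=10` is an arbitrary assumption, not a value taken from the paper) and ROC AUC computed from the rank-sum (Mann-Whitney) statistic, with "Anomaly" as the positive class. The function names are hypothetical.

```python
import numpy as np


def knn_scores(X, k=10):
    """Anomaly score = mean Euclidean distance to the k nearest other points."""
    X = np.asarray(X, dtype=np.float64)
    if not 1 <= k < len(X):
        raise ValueError("k must satisfy 1 <= k < number of instances")
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)   # clip tiny negatives caused by rounding
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    # The first k columns after partitioning hold the k smallest distances
    nearest = np.partition(d2, k - 1, axis=1)[:, :k]
    return np.sqrt(nearest).mean(axis=1)


def roc_auc(labels, scores, positive="Anomaly"):
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    y = np.asarray(labels) == positive
    s = np.asarray(scores, dtype=np.float64)
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # 1-based average rank of each distinct score value
    _, inverse, counts = np.unique(s, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

This version materializes the full pairwise distance matrix, which for 3,686 instances is about 109 MB per float64 copy; a chunked or tree-based neighbor search would scale better. With 1.65% anomalies, roughly 61 of the 3,686 instances carry the "Anomaly" label.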