Satellite
OpenML dataset with id 40900
Author name not available
Full work available at URL: https://api.openml.org/data/v1/download/16787463/Satellite.arff
Upload date: 22 September 2017
Dataset Characteristics
Number of classes: 2
Number of features: 37 (numeric: 36, symbolic: 1; binary in total: 1)
Number of instances: 5,100
Number of instances with missing values: 0
Number of missing values: 0
Author: Markus Goldstein
Source: Harvard Dataverse, https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF
The satellite dataset comprises features extracted from satellite observations. Each image was taken at four different light wavelengths: two in the visible spectrum (green and red) and two in the infrared. The task of the original dataset is to classify each image into the soil category of the observed region.
Classes
We defined the soil classes “red soil”, “gray soil”, “damp gray soil” and “very damp gray soil” as the normal class. Anomalies are sampled from the semantically different classes “cotton crop” and “soil with vegetation stubble”.
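After merging the original training and test sets into a single dataset, the resulting dataset contains 5,025 normal instances and 75 randomly sampled anomalies, each with 36 dimensions. The 75 anomalies amount to 1.49% relative to the number of normal instances (75 / 5,025), or about 1.47% of all 5,100 instances.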
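The construction described above (merge the splits, keep the four soil classes as normal, randomly sample anomalies from the two semantically different classes) can be sketched as follows. This is a minimal illustration, not the author's actual script: the class label strings, the `(features, label)` record layout, the function name `build_anomaly_dataset` and the fixed `seed` are all hypothetical assumptions.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical (features, class_name) record; the real source encoding may differ.
Record = Tuple[List[float], str]

# Assumed label strings, taken from the class names in the description.
NORMAL_CLASSES = {"red soil", "gray soil", "damp gray soil", "very damp gray soil"}
ANOMALY_SOURCE_CLASSES = {"cotton crop", "soil with vegetation stubble"}


def build_anomaly_dataset(
    train: Sequence[Record],
    test: Sequence[Record],
    n_anomalies: int,
    seed: int = 0,
) -> List[Tuple[List[float], str]]:
    """Merge both splits, keep every normal instance, and sample n_anomalies
    instances from the anomaly source classes (hypothetical helper)."""
    merged = list(train) + list(test)

    # Every instance of the four soil classes becomes 'Normal'.
    normal = [(x, "Normal") for x, c in merged if c in NORMAL_CLASSES]

    # Candidates from the semantically different classes.
    candidates = [x for x, c in merged if c in ANOMALY_SOURCE_CLASSES]

    # Sample without replacement; raises ValueError if too few candidates.
    rng = random.Random(seed)
    sampled = rng.sample(candidates, n_anomalies)

    # Unsampled candidates are dropped entirely.
    return normal + [(x, "Anomaly") for x in sampled]
```

Applied to the original training and test splits with `n_anomalies=75`, this procedure would yield a dataset shaped like the one described here; the random draw itself cannot be reproduced without the original seed.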
Relevant Papers
Goldstein, Markus, and Seiichi Uchida. "A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data." PLoS ONE 11.4 (2016): e0152173.
This dataset is not the original dataset: the target variable 'Target' has been relabeled as 'Normal' and 'Anomaly'.
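Because the target only takes the values 'Normal' and 'Anomaly', it maps directly onto binary labels for scoring an unsupervised detector. Detectors on benchmarks like this one are commonly scored with the area under the ROC curve (AUC). The sketch below computes AUC by pairwise comparison (the Mann–Whitney formulation, counting ties as one half) and the anomaly fraction of a label list. The helper names are hypothetical, and the detector scores are assumed to be higher for more anomalous instances.

```python
from typing import List, Sequence

# Assumed encoding: 'Anomaly' is the positive class.
LABEL_MAP = {"Normal": 0, "Anomaly": 1}


def to_binary_labels(labels: Sequence[str]) -> List[int]:
    """Map 'Normal'/'Anomaly' strings to 0/1 (KeyError on unknown labels)."""
    return [LABEL_MAP[label] for label in labels]


def anomaly_fraction(y: Sequence[int]) -> float:
    """Share of positive (anomalous) instances in the label list."""
    return sum(y) / len(y)


def roc_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random anomaly scores higher than a
    random normal instance; tied pairs count as 0.5."""
    pos = [s for label, s in zip(y, scores) if label == 1]
    neg = [s for label, s in zip(y, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one normal and one anomalous instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The quadratic pairwise loop is chosen for clarity. With 75 anomalies and 5,025 normal instances it performs about 377,000 comparisons, which is still fast; a rank-based version would scale better on larger data.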