Epsilon
OpenML dataset with id 45575
Author name not available
Full work available at URL: https://api.openml.org/data/v1/download/22116559/Epsilon.arff
Upload date: 15 June 2023
Dataset Characteristics
Number of classes: 2
Number of features: 2,001 (numeric: 2,000; symbolic: 1; binary in total: 1)
Number of instances: 500,000
Number of instances with missing values: 0
Number of missing values: 0
Data from the PASCAL Challenge 2008 as available on the LibSVM repository
Description
Notes by the LibSVM dataset website
Preprocessing: The raw data set (epsilon_train) is scaled instance-wise to unit length and split into two parts: 4/5 for training and 1/5 for testing. The training part is normalized feature-wise to zero mean and unit variance and then scaled instance-wise to unit length again. The testing part is processed in the same way, using the scaling factors of the training part. These training and testing data sets are used in [GXY11a].
[GXY11a] Guo-Xun Yuan, Chia-Hua Ho, and Chih-Jen Lin. An improved GLMNET for l1-regularized logistic regression. Journal of Machine Learning Research, 13:1999-2030, 2012.
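The preprocessing steps described in the LibSVM notes can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the script used to produce the dataset: "unit length" is taken to mean Euclidean (L2) norm 1, the 4/5 and 1/5 split is taken in row order (the notes do not say how rows were assigned), the variance is the population variance, and the function names scale_rows_to_unit_length and preprocess are hypothetical.

```python
import numpy as np


def scale_rows_to_unit_length(X):
    """Scale each instance (row) to Euclidean norm 1; all-zero rows stay unchanged."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    return X / norms


def preprocess(raw):
    """Sketch of the described pipeline; returns (train, test)."""
    # Step 1: instance-wise scaling of the raw data to unit length.
    raw = scale_rows_to_unit_length(np.asarray(raw, dtype=float))

    # Step 2: split 4/5 for training and 1/5 for testing.
    # ASSUMPTION: the split follows row order.
    n_train = (4 * raw.shape[0]) // 5
    train, test = raw[:n_train], raw[n_train:]

    # Step 3: feature-wise scaling factors computed on the training part only.
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled

    # Step 4: standardize, then rescale each instance to unit length.
    train = scale_rows_to_unit_length((train - mean) / std)
    # Step 5: the testing part reuses the training scaling factors.
    test = scale_rows_to_unit_length((test - mean) / std)
    return train, test


# Small synthetic example (random data, not the Epsilon dataset itself).
rng = np.random.default_rng(0)
demo_train, demo_test = preprocess(rng.normal(size=(50, 6)))
```

Because the final step rescales each instance, the feature means and variances of the returned training part are no longer exactly zero and one; only the row norms are exactly one.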
Notes by Uploader to OpenML
- This dataset contains both the train and test split.