Epsilon

OpenML dataset with id 45575

Author name not available

Full work available at URL: https://api.openml.org/data/v1/download/22116559/Epsilon.arff

Upload date: 15 June 2023

Dataset Characteristics

Number of classes: 2
Number of features: 2,001 (numeric: 2,000; symbolic: 1; binary: 1)
Number of instances: 500,000
Number of instances with missing values: 0
Number of missing values: 0

Description

Data from the PASCAL Challenge 2008, as available on the LibSVM repository.

Notes from the LibSVM dataset website

Preprocessing: Each instance of the raw data set (epsilon_train) is scaled to unit length, and the data are split into two parts: 4/5 for training and 1/5 for testing. The training part is normalized feature-wise to zero mean and unit variance, and each instance is then scaled to unit length again. The test part is processed the same way, using the scaling factors computed on the training part. These training and test sets are the ones used in [GXY11a].
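The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration, not the LibSVM authors' script: the function names are hypothetical, and the shuffled split, the population standard deviation (ddof=0), and the handling of zero-norm rows and constant features are assumptions, since the notes do not specify them. The key point it shows is that the feature-wise mean and standard deviation are computed on the training part only and then reused for the test part.

```python
import numpy as np

def unit_scale_rows(X):
    """Scale each instance (row) to unit L2 length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # assumption: avoid division by zero for empty rows
    return X / norms

def preprocess_epsilon_like(X_raw, train_fraction=0.8, seed=0):
    """Hypothetical re-creation of the described pipeline; returns (X_train, X_test)."""
    # Step 1: instance-wise scaling of the raw data to unit length.
    X = unit_scale_rows(np.asarray(X_raw, dtype=float))

    # Step 2: split 4/5 for training, 1/5 for testing.
    # Assumption: a random permutation; the notes do not say how rows were assigned.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    X_train, X_test = X[idx[:n_train]], X[idx[n_train:]]

    # Step 3: feature-wise standardization with statistics from the training part only.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # assumption: population std (ddof=0)
    std[std == 0] = 1.0        # assumption: leave constant features unscaled
    X_train = (X_train - mean) / std
    X_test = (X_test - mean) / std  # test part reuses the training scaling factors

    # Step 4: instance-wise scaling to unit length again.
    return unit_scale_rows(X_train), unit_scale_rows(X_test)
```

Reusing the training statistics for the test part keeps test information from leaking into the preprocessing, which is why the test rows are not standardized with their own mean and variance.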

[GXY11a] Guo-Xun Yuan, Chia-Hua Ho, and Chih-Jen Lin. An improved GLMNET for L1-regularized logistic regression. Journal of Machine Learning Research, 13:1999-2030, 2012.

Notes by the uploader to OpenML

This dataset contains both the training and test splits.
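Since the 4/5 to 1/5 ratio is stated in the LibSVM notes, the split sizes for the 500,000 combined instances work out to 400,000 training and 100,000 test rows. The sketch below separates them, but it rests on an unverified assumption: that the training rows are stored first in the combined file. The helper name `split_combined` is hypothetical, so check the row order against the original LibSVM files before relying on it.

```python
import numpy as np

N_TOTAL = 500_000        # number of instances reported for this dataset
TRAIN_FRACTION = 4 / 5   # from the LibSVM preprocessing notes

def split_combined(X, y, train_fraction=TRAIN_FRACTION):
    """Split a combined train+test array.

    ASSUMPTION (unverified): training rows come first, followed by test rows.
    """
    n_train = int(round(train_fraction * len(X)))
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

n_train = int(round(TRAIN_FRACTION * N_TOTAL))
print(n_train, N_TOTAL - n_train)  # 400000 100000
```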
