poker

OpenML dataset with id 43955

Author name not available

Full work available at URL: https://api.openml.org/data/v1/download/22103041/poker.arff

Upload date: 15 June 2022

Dataset Characteristics

Number of classes: 0
Number of features: 6 (numeric: 6, symbolic: 0, binary: 0)
Number of instances: 1,022,616
Number of instances with missing values: 0
Number of missing values: 0

Description

Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark. Original description:

Author: UCI
Source: original
Please cite:

This is the poker dataset, retrieved 2013-11-14 from the LibSVM site. In addition to the preprocessing done there (see the LibSVM site for details), this dataset was created as follows:

- Join the test and train datasets (non-scaled versions).
- Relabel the classes: 0 = positive class; 1, 2, ..., 9 = negative class.
- Normalize each file columnwise according to the following rules:
  - If a column contains only one value (constant feature), it is set to zero and thus removed by sparsity.
  - If a column contains two values (binary feature), the value occurring more often is set to zero, the other to one.
  - If a column contains more than two values (multinary/real feature), the column is divided by its standard deviation.
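The relabeling and columnwise normalization rules above can be sketched in a few lines of Python. This is an illustration only, not the script used to build the dataset. The function names (`relabel`, `normalize_column`, `normalize_columns`), the +1/-1 label encoding, the use of the population standard deviation, and the tie-breaking for binary columns are all assumptions; the description does not specify them.

```python
from collections import Counter
from statistics import pstdev


def relabel(labels):
    """Map class 0 to the positive class and classes 1..9 to the negative class.

    Assumption: positive is encoded as +1 and negative as -1.
    """
    return [1 if label == 0 else -1 for label in labels]


def normalize_column(col):
    """Apply the three columnwise rules to a single column (a list of numbers)."""
    counts = Counter(col)
    if len(counts) == 1:
        # Constant feature: set every entry to zero.
        return [0.0] * len(col)
    if len(counts) == 2:
        # Binary feature: the more frequent value becomes 0, the other 1.
        # Assumption: on a tie, the value seen first counts as the majority.
        majority = counts.most_common(1)[0][0]
        return [0.0 if v == majority else 1.0 for v in col]
    # Multinary/real feature: divide by the standard deviation (no centering).
    # Assumption: population standard deviation; the source does not say which.
    sd = pstdev(col)
    return [v / sd for v in col]


def normalize_columns(rows):
    """Normalize a row-major table (list of equal-length rows) column by column."""
    columns = [list(c) for c in zip(*rows)]
    normalized = [normalize_column(c) for c in columns]
    return [list(r) for r in zip(*normalized)]


if __name__ == "__main__":
    toy = [[5, 0, 1], [5, 0, 2], [5, 1, 3], [5, 0, 4]]
    for row in normalize_columns(toy):
        print(row)
```

Because the real-valued columns are only rescaled and not centered, their means remain nonzero after this step.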

NOTE: Please keep in mind that poker has mild redundancy: roughly 0.2% of the data points within each file (train, test) are duplicates. These duplicated points have not been removed!
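A reader who wants to check the level of redundancy on their own copy could count repeated rows along the following lines. This is a minimal sketch: the helper name `duplicate_fraction` is hypothetical, and it assumes "duplicated" means an exact repeat of an earlier row, which the note does not define precisely.

```python
def duplicate_fraction(rows):
    """Return the fraction of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)  # rows must be hashable to be stored in a set
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows) if rows else 0.0


if __name__ == "__main__":
    toy = [[1, 2], [1, 2], [3, 4], [5, 6]]
    print(duplicate_fraction(toy))  # one of the four rows is a repeat
```

Exact matching on floating-point features only detects bitwise-identical rows, so values that differ by rounding would not be counted.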
