gina_agnostic - MaRDI portal


gina_agnostic

From MaRDI portal
Dataset:6033761



OpenML: 1038; MaRDI QID: Q6033761

OpenML dataset with id 1038

Author name not available

Full work available at URL: https://api.openml.org/data/v1/download/53921/gina_agnostic.arff

Upload date: 6 October 2014



Dataset Characteristics

Number of classes: 2
Number of features: 971 (numeric: 970, symbolic: 1; binary in total: 1)
Number of instances: 3,468
Number of instances with missing values: 0
Number of missing values: 0
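
The characteristics above can be re-derived from the ARFF file itself. The following is a minimal sketch, assuming a dense ARFF file with unquoted attribute names and no commas inside values; the function name summarize_arff and the three-row SAMPLE are invented for illustration and are not taken from the dataset.

```python
def summarize_arff(text):
    """Count attributes, instances, and missing values in dense ARFF text."""
    attributes = []  # list of (name, type_string) pairs from the header
    rows = []        # one list of raw string values per data line
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                _, name, type_str = line.split(None, 2)
                attributes.append((name, type_str))
            elif lower.startswith("@data"):
                in_data = True
        else:
            rows.append([value.strip() for value in line.split(",")])

    numeric_types = ("numeric", "real", "integer")
    return {
        "features": len(attributes),
        "numeric": sum(1 for _, t in attributes if t.lower() in numeric_types),
        "symbolic": sum(1 for _, t in attributes if t.startswith("{")),
        "instances": len(rows),
        "instances_with_missing": sum(1 for r in rows if "?" in r),
        "missing_values": sum(r.count("?") for r in rows),
    }


# Toy stand-in for the real file (hypothetical values, not from gina_agnostic).
SAMPLE = """\
@relation toy
@attribute pixel_1 numeric
@attribute pixel_2 numeric
@attribute label {-1,1}
@data
0,255,1
12,0,-1
0,?,1
"""

summary = summarize_arff(SAMPLE)
print(summary)
```

Applied to the downloaded gina_agnostic.arff, a summary of this kind would be expected to report 971 features (970 numeric, 1 symbolic), 3,468 instances, and no missing values, as listed above.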

Author: Isabelle Guyon (isabelle@clopinet.com)
Source: Agnostic Learning vs. Prior Knowledge Challenge
Please cite: None


Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of 5 different datasets (SYLVA, GINA, NOVA, HIVA, ADA). The purpose of the challenge was to check whether the performance of domain-specific feature engineering (prior knowledge) can be matched by algorithms trained on data without any domain-specific knowledge (agnostic). For the latter, the data were anonymised and preprocessed in a way that makes them uninterpretable.

Modified by TunedIT (converted to ARFF format)


Topic

The task of GINA is handwritten digit recognition. This is the agnostic version of a subset of the MNIST data set. We chose the problem of separating the odd numbers from even numbers. We use 2-digit numbers. Only the unit digit is informative for that task, therefore at least ½ of the features are distracters. This is a two-class classification problem with sparse continuous input variables, in which each class is composed of several clusters. It is a problem with heterogeneous classes.
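
The labelling rule described above can be illustrated with a short sketch. The +1/-1 encoding and the choice of odd as the positive class are assumptions made for illustration; the page does not say which class is positive, and parity_label is a hypothetical helper.

```python
def parity_label(tens, units):
    """Return +1 for an odd two-digit number and -1 for an even one (assumed encoding)."""
    # `tens` is accepted but deliberately unused: parity depends on the unit digit only.
    return 1 if units % 2 == 1 else -1


# For each unit digit, collect the labels obtained over all ten possible tens digits.
labels_by_units = {
    units: {parity_label(tens, units) for tens in range(10)}
    for units in range(10)
}

# A single label per unit digit means the tens digit never changes the target.
tens_is_irrelevant = all(len(labels) == 1 for labels in labels_by_units.values())

# Each class groups five different unit digits, which is why the classes are heterogeneous.
clusters = {
    "odd": [u for u in range(10) if parity_label(0, u) == 1],
    "even": [u for u in range(10) if parity_label(0, u) == -1],
}
print(tens_is_irrelevant, clusters)
```

Each class therefore covers five visually distinct digit shapes, and every pixel belonging to the tens digit carries no information about the target.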


Source

The data set was constructed from the MNIST data that is made available by Yann LeCun of the NEC Research Institute at http://yann.lecun.com/exdb/mnist/. The digits have been size-normalized and centered in a fixed-size image of dimension 28x28. Examples are shown in the documentation in chapter 3.


Description

To construct the “agnostic” dataset, we performed the following steps:
- We removed the pixels that were white 99% of the time. This reduced the original feature set of 784 pixels to 485.
- The original resolution (256 gray levels) was kept.
- Although the data are rather sparse (about 30% of the values are non-zero), we saved the data as a dense matrix because we found that it compresses better this way (to 19 MB).
- The feature names are the (i,j) matrix coordinates of the pixels (in a 28x28 matrix).
- We created 2-digit numbers by dividing the dataset into two parts and pairing the digits at random.
- The task is to separate odd from even numbers. Because the tens digit is not informative, the features of that digit act as distracters.

To construct the “prior” dataset, we went back to the original data and fetched the “informative” digit in its original representation. Therefore, this data representation consists of a vector formed by concatenating the rows of a 28x28 pixel map.
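
The construction steps for the agnostic dataset can be sketched roughly as follows. This is not the organisers' code: it assumes that a "white" pixel has value 0, that a pixel is dropped when it is white in at least 99% of the images, that coordinates are 0-based and row-major, and that the tens-digit pixels precede the unit-digit pixels in the paired vector. All function names and the toy images are hypothetical.

```python
import random

SIDE = 28  # images are 28x28, i.e. 784 pixels per digit


def informative_pixels(images, white_fraction=0.99):
    """Indices of pixels that are white (value 0) in fewer than white_fraction of the images."""
    n = len(images)
    keep = []
    for p in range(len(images[0])):
        white = sum(1 for img in images if img[p] == 0)
        if white / n < white_fraction:
            keep.append(p)
    return keep


def pixel_name(p):
    """(i, j) matrix coordinates of flat pixel index p (0-based, row-major)."""
    return divmod(p, SIDE)


def pair_digits(images, digits, rng):
    """Split the data into two halves and pair tens and unit digits at random."""
    order = list(range(len(images)))
    rng.shuffle(order)
    half = len(order) // 2
    tens, units = order[:half], order[half:2 * half]
    paired = []
    for t, u in zip(tens, units):
        features = images[t] + images[u]           # tens-digit pixels, then unit-digit pixels
        target = 1 if digits[u] % 2 == 1 else -1   # parity of the unit digit only
        paired.append((features, target))
    return paired


# Deterministic toy images: odd-indexed pixels are always 0, even-indexed ones vary.
images = [
    [255 if (p % 2 == 0 and (p + k) % 3 == 0) else 0 for p in range(SIDE * SIDE)]
    for k in range(20)
]
digits = [k % 10 for k in range(20)]

keep = informative_pixels(images)
reduced = [[img[p] for p in keep] for img in images]
paired = pair_digits(reduced, digits, random.Random(0))
print(len(keep), len(paired), len(paired[0][0]), pixel_name(keep[1]))
```

On the real data the same filtering leaves 485 of the 784 pixels, so each paired example has 2 x 485 = 970 features.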

Data type: non-sparse
Number of features: 970
Number of examples and check-sums:

        Pos_ex  Neg_ex  Tot_ex  Check_sum
Train   1550    1603    3153    164947945.00
Valid   155     160     315     16688946.00


This dataset contains samples from both training and validation datasets.
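
The figures quoted on this page are mutually consistent, which the following arithmetic check makes explicit. It uses only the numbers given above; reading the single symbolic attribute as the class label is an assumption.

```python
# Per-split example counts as quoted on this page.
splits = {
    "Train": {"pos": 1550, "neg": 1603, "total": 3153},
    "Valid": {"pos": 155, "neg": 160, "total": 315},
}

# Positive plus negative examples should equal the quoted total of each split.
split_totals_ok = all(s["pos"] + s["neg"] == s["total"] for s in splits.values())

# Training and validation samples together give the number of instances.
instances = sum(s["total"] for s in splits.values())

# Two digits of 485 retained pixels each, plus one class attribute (assumed).
pixels_per_digit = 485
features = 2 * pixels_per_digit
attributes = features + 1

# Share of positive examples over both splits.
positive_share = sum(s["pos"] for s in splits.values()) / instances

print(split_totals_ok, instances, features, attributes, round(positive_share, 3))
```

The combined total of 3,468 matches the number of instances listed under Dataset Characteristics, and 970 pixel features plus the class attribute give the 971 features reported there.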





