sylva_agnostic
OpenML dataset with id 40992
Full work available at URL: https://api.openml.org/data/v1/download/18154656/sylva_agnostic.arff
Upload date: 5 December 2017
Dataset Characteristics
Number of features: 217 (numeric: 40, symbolic: 177, binary in total: 173)
Number of instances: 14,395
Number of instances with missing values: 0
Number of missing values: 0
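The feature counts above can be checked against the header of the downloaded ARFF file. The following is a minimal sketch, assuming attribute names contain no whitespace and that a nominal attribute with exactly two values counts as binary. The function name `summarize_arff_header` and the attribute names in the sample header (`V1`, `V2`, `label`) are hypothetical and are not taken from the actual file.

```python
from collections import Counter


def summarize_arff_header(lines):
    """Count numeric, symbolic and binary attributes declared in an ARFF header."""
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        lower = line.lower()
        if lower.startswith("@data"):
            break  # the header ends where the data section begins
        if not lower.startswith("@attribute"):
            continue  # skip @relation, comments and blank lines
        # Assumption: attribute names contain no whitespace.
        _, _name, attr_type = line.split(None, 2)
        counts["total"] += 1
        if attr_type.startswith("{"):
            # Nominal attribute, declared as {value1,value2,...}
            counts["symbolic"] += 1
            values = [v.strip() for v in attr_type.strip("{} ").split(",")]
            if len(values) == 2:
                counts["binary"] += 1
        elif attr_type.lower() in ("numeric", "real", "integer"):
            counts["numeric"] += 1
    return dict(counts)


# Hypothetical three-attribute header, for illustration only.
sample_header = [
    "@relation sylva_agnostic",
    "@attribute V1 numeric",
    "@attribute V2 {0,1}",
    "@attribute label {-1,1}",
    "@data",
]
summary = summarize_arff_header(sample_header)
```

To run it on the real file, pass an open file handle instead of `sample_header`, for example `summarize_arff_header(open("sylva_agnostic.arff"))` after downloading the file from the URL above.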
Author: Isabelle Guyon (isabelle@clopinet.com)
Source: Agnostic Learning vs. Prior Knowledge Challenge
Please cite: None
__Major changes w.r.t. version 1: changed binary features to data type factor.__
Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of 5 datasets (SYLVA, GINA, NOVA, HIVA, ADA). The purpose of the challenge was to check whether the performance achieved with domain-specific feature engineering (prior knowledge) can be matched by algorithms trained on data without any domain-specific knowledge (agnostic). For the latter, the data were anonymised and preprocessed in a way that makes them uninterpretable.
This dataset contains the agnostic (smashed) version of a data set from the Remote Sensing and GIS Program of Colorado State University for the time span June 2005 - September 2006. A similar raw, non-agnostic data set is known as the __Covertype Dataset__ and can be found in the UCI Machine Learning Repository.
Modified by TunedIT (converted to ARFF format)
Topic
The task of SYLVA is to classify forest cover types. The forest cover type for 30 x 30 meter cells is obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. We reduced it to a two-class classification problem (classifying Ponderosa pine vs. everything else). The “agnostic data” consists of 216 input variables. Each pattern is composed of 4 records: 2 true records matching the target and 2 records picked at random. Thus ½ of the features are distracters. The “prior knowledge data” is identical to the “agnostic data”, except that the distracters are removed and the identity of the features is revealed.
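The pattern layout described above can be illustrated with a small sketch. It rests on several assumptions: that all 4 records have equal length (216 / 4 = 54 features each), that "matching the target" means records of the same class as the pattern's label, and that the records are simply concatenated. It does not attempt to reproduce the challenge's actual preprocessing, and `build_pattern` and the synthetic records are hypothetical.

```python
import random


def build_pattern(records, labels, target, rng):
    """Concatenate 2 records of class `target` with 2 randomly drawn records."""
    matching = [r for r, y in zip(records, labels) if y == target]
    true_part = rng.sample(matching, 2)   # records that agree with the target
    distracters = rng.sample(records, 2)  # records drawn regardless of class
    pattern = []
    for rec in true_part + distracters:
        pattern.extend(rec)
    return pattern


rng = random.Random(0)
n_features = 54  # assumption: 216 inputs / 4 records per pattern

# Synthetic records, for illustration only (not real SYLVA data).
records = [[rng.random() for _ in range(n_features)] for _ in range(20)]
labels = [1 if i % 5 == 0 else -1 for i in range(20)]

pattern = build_pattern(records, labels, target=1, rng=rng)
n_distracter_features = 2 * n_features  # half of the 216 inputs
```

Under these assumptions each pattern has 216 inputs, of which 108 come from the two randomly picked records.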
Description
Data type: non-sparse
Number of features: 216
Number of examples and check-sums:
        Pos_ex   Neg_ex   Tot_ex   Check_sum
Train      805    12281    13086   238271607.00
Valid       81     1228     1309    23817234.00
This dataset contains samples from both training and validation datasets.
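As a quick consistency check, the split sizes listed in the description can be added up and compared with the 14,395 instances reported for this dataset. The sketch below uses only the counts given on this page. The variable names are illustrative.

```python
# Counts copied from the description: (positive, negative, total) per split.
splits = {
    "train": (805, 12281, 13086),
    "valid": (81, 1228, 1309),
}

# Each split's total should equal its positives plus negatives.
for name, (pos, neg, tot) in splits.items():
    assert pos + neg == tot, name

total_instances = sum(tot for _, _, tot in splits.values())
total_positive = sum(pos for pos, _, _ in splits.values())
positive_rate = total_positive / total_instances

print(f"instances: {total_instances}, positives: {total_positive}, "
      f"positive rate: {positive_rate:.4f}")
```

The combined total matches the reported instance count, and positives make up roughly 6% of the examples, so the two classes are strongly imbalanced.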
Source
Original owners:
Remote Sensing and GIS Program
Department of Forest Sciences
College of Natural Resources
Colorado State University
Fort Collins, CO 80523
(contact Jock A. Blackard, jblackard/wo_ftcol@fs.fed.us
or Dr. Denis J. Dean, denis@cnr.colostate.edu)
Jock A. Blackard
USDA Forest Service
3825 E. Mulberry
Fort Collins, CO 80524 USA
jblackard/wo_ftcol@fs.fed.us