hiva_agnostic

OpenML dataset with id 1039

Author name not available

Download URL (ARFF): https://api.openml.org/data/v1/download/53922/hiva_agnostic.arff

Upload date: 6 October 2014

Dataset Characteristics

Number of classes: 2
Number of features: 1,618 (numeric: 1,617; symbolic: 1; binary in total: 1)
Number of instances: 4,229
Number of instances with missing values: 0
Number of missing values: 0
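
These counts can be recomputed from the data as a sanity check. The sketch below is a minimal illustration, assuming the attributes and rows are already held as plain Python lists (each attribute kind is "numeric" or a list of nominal values, and None marks a missing cell) and that the last attribute is the class. The helper `summarize` is hypothetical, not an OpenML function. It counts the class among the features, which is how the total of 1,618 above arises (1,617 numeric plus 1 symbolic).

```python
def summarize(attributes, rows):
    """Recompute summary counts from parsed ARFF-style data (illustrative sketch).

    attributes: list of (name, kind), kind is "numeric" or a list of nominal values.
    rows: list of lists; None marks a missing value.
    Assumes the last attribute is the class.
    """
    numeric = sum(1 for _, kind in attributes if kind == "numeric")
    class_values = {row[-1] for row in rows if row[-1] is not None}
    missing_cells = sum(1 for row in rows for value in row if value is None)
    rows_with_missing = sum(1 for row in rows if any(v is None for v in row))
    return {
        "classes": len(class_values),
        "features": len(attributes),  # includes the class attribute
        "numeric": numeric,
        "symbolic": len(attributes) - numeric,
        "instances": len(rows),
        "instances_with_missing": rows_with_missing,
        "missing_values": missing_cells,
    }
```

For this dataset the result should agree with the figures listed above if the file matches the page.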

Description

Author: not given. Source: unknown. Date: unknown. Please cite: no citation given.

Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch)

Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php

Modified by TunedIT (converted to ARFF format)
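
Because the file is plain ARFF, it can be read without third-party packages. The sketch below is a minimal reader written for illustration only; it assumes a dense file (consistent with "Data type: non-sparse"), attribute names without spaces, and only numeric or nominal attribute types. The names `parse_dense_arff` and `load_hiva` are hypothetical and not part of OpenML or TunedIT tooling; `scipy.io.arff.loadarff` is a fuller existing reader.

```python
import urllib.request

# Download URL listed at the top of this page.
HIVA_URL = "https://api.openml.org/data/v1/download/53922/hiva_agnostic.arff"

NUMERIC_TYPES = {"numeric", "real", "integer"}


def parse_dense_arff(lines):
    """Minimal reader for a dense ARFF file (illustrative sketch).

    Returns (attributes, rows). Each attribute is (name, kind), where kind is
    "numeric" or the list of nominal values. Numeric fields become floats,
    nominal fields stay strings, and "?" becomes None (missing).
    """
    attributes, rows = [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        if not in_data:
            lowered = line.lower()
            if lowered.startswith("@attribute"):
                # Assumes "@attribute <name> <type>" with no spaces in <name>.
                _, name, spec = line.split(None, 2)
                name = name.strip("'\"")
                spec = spec.strip()
                if spec.startswith("{"):
                    values = [v.strip().strip("'\"") for v in spec.strip("{}").split(",")]
                    attributes.append((name, values))
                elif spec.lower() in NUMERIC_TYPES:
                    attributes.append((name, "numeric"))
                else:
                    raise ValueError(f"unsupported attribute type: {spec}")
            elif lowered.startswith("@data"):
                in_data = True
            continue  # @relation and other header lines are ignored
        fields = [f.strip().strip("'\"") for f in line.split(",")]
        if len(fields) != len(attributes):
            raise ValueError(f"expected {len(attributes)} fields, got {len(fields)}")
        row = []
        for (_, kind), value in zip(attributes, fields):
            if value == "?":
                row.append(None)
            elif kind == "numeric":
                row.append(float(value))
            else:
                row.append(value)
        rows.append(row)
    return attributes, rows


def load_hiva(url=HIVA_URL):
    """Download the ARFF file and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_dense_arff(text.splitlines())
```

Calling `attributes, rows = load_hiva()` downloads the whole file; if it matches this page, `len(rows)` should equal the 4,229 instances listed above.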

HIVA is the HIV infection database.

The task of HIVA is to predict which compounds are active against HIV, the virus that causes AIDS. The original data has 3 classes (active, moderately active, and inactive). We reduced it to a two-class classification problem (active vs. inactive). We represented the data as 2000 sparse binary input variables. The variables represent properties of the molecule inferred from its structure. The problem is therefore to relate structure to activity (a QSAR, or quantitative structure-activity relationship, problem) in order to screen new compounds before actually testing them (an HTS, or high-throughput screening, problem).

Data type: non-sparse
Number of features: 1617
Number of examples and check-sum:

         Pos_ex  Neg_ex  Tot_ex  Check_sum
Train       135    3710    3845  564954.00
Valid        14     370     384   56056.00
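
The table can be checked with a little arithmetic: each row should satisfy Pos_ex + Neg_ex = Tot_ex, and the two splits together should give the 4,229 instances listed above. The sketch below also computes the fraction of positive examples per split, which shows how imbalanced the task is. The counts are copied from the table; `positive_rates` is a hypothetical helper. The Check_sum column is left out because its exact definition is not given here.

```python
# (Pos_ex, Neg_ex, Tot_ex) per split, copied from the table above.
SPLITS = {
    "Train": (135, 3710, 3845),
    "Valid": (14, 370, 384),
}


def positive_rates(splits):
    """Check Pos_ex + Neg_ex == Tot_ex per split and return (total, rates).

    rates maps each split name, plus "Combined", to its fraction of positives.
    """
    total_pos = 0
    total = 0
    rates = {}
    for name, (pos, neg, tot) in splits.items():
        if pos + neg != tot:
            raise ValueError(f"{name}: {pos} + {neg} != {tot}")
        rates[name] = pos / tot
        total_pos += pos
        total += tot
    rates["Combined"] = total_pos / total
    return total, rates


if __name__ == "__main__":
    total, rates = positive_rates(SPLITS)
    print(f"total instances: {total}")
    for name, rate in rates.items():
        print(f"{name}: {rate:.2%} positive")
```

With these counts the combined total is 4,229, and positive examples make up roughly 3.5% of each split (about 3.51% for Train and 3.65% for Valid).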

This dataset combines the samples of the challenge's training and validation sets (3,845 + 384 = 4,229 instances).
