micro-mass
OpenML dataset with id 1515
Full work available at URL: https://api.openml.org/data/v1/download/1593707/micro-mass.arff
Upload date: 1 June 2015
Dataset Characteristics
Number of classes: 20
Number of features: 1,301 (numeric: 1,300; symbolic: 1; binary: 0)
Number of instances: 571
Number of instances with missing values: 0
Number of missing values: 0
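The download URL above ends in `.arff`, so counts like these can in principle be re-derived from the file's header. The following is a minimal sketch of that idea using only the standard library. The header in `TOY_ARFF` is invented for illustration and is not the real micro-mass file, and `summarize_arff` is a hypothetical helper name, not part of any OpenML tooling.

```python
# Toy ARFF text, made up for this example; NOT the actual micro-mass data.
TOY_ARFF = """\
@relation toy-spectra
@attribute peak_1 numeric
@attribute peak_2 numeric
@attribute peak_3 real
@attribute Class {1,2,3}
@data
0.0,1.5,0.2,1
0.3,0.0,0.0,2
"""


def summarize_arff(text):
    """Count numeric attributes, nominal (symbolic) attributes and data rows.

    Assumption: only numeric and nominal attributes occur; string or date
    attributes would be ignored by this simplified sketch.
    """
    numeric = symbolic = rows = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows += 1  # every non-empty line after @data is one instance
        elif lower.startswith("@attribute"):
            if line.endswith("}"):  # nominal type is written as {a,b,c}
                symbolic += 1
            elif lower.split()[-1] in ("numeric", "real", "integer"):
                numeric += 1
        elif lower.startswith("@data"):
            in_data = True
    return {
        "numeric": numeric,
        "symbolic": symbolic,
        "features": numeric + symbolic,
        "instances": rows,
    }


summary = summarize_arff(TOY_ARFF)
print(summary)
```

Applied to the text of the real file, the same function would be expected to reproduce the figures listed above, although that has not been verified here.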
Author: Pierre Mahé, Jean-Baptiste Veyrieras
Source: UCI - 2014
Please cite:
Description
MicroMass (pure spectra version) is a dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data.
Source
```
Pierre Mahé, pierre.mahe '@' biomerieux.com, bioMérieux
Jean-Baptiste Veyrieras, jean-baptiste.veyrieras '@' biomerieux.com, bioMérieux
```
Data Set Information
This MALDI-TOF dataset consists of:
a) A reference panel of 20 Gram-positive and Gram-negative bacterial species covering 9 genera, among which several species are known to be hard to discriminate by mass spectrometry (MALDI-TOF). Each species was represented by 11 to 60 mass spectra obtained from 7 to 20 bacterial strains, constituting altogether a dataset of 571 spectra obtained from 213 strains. The spectra were obtained according to the standard culture-based workflow used in clinical routine: the microorganism was first grown on an agar plate for 24 to 48 hours, a portion of the colony was then picked and spotted on a MALDI slide, and a mass spectrum was acquired.
b) Based on this reference panel, a dedicated in vitro mock-up mixture dataset was constituted. For that purpose we considered 10 pairs of species of various taxonomic proximity:
- 4 mixtures, labeled A, B, C and D, involved species that belong to the same genus,
- 2 mixtures, labeled E and F, involved species that belong to distinct genera, but to the same Gram type,
- 4 mixtures, labeled G, H, I and J, involved species that belong to distinct Gram types.
Each mixture was represented by 2 pairs of strains, which were mixed according to the following 9 concentration ratios: 1:0, 10:1, 5:1, 2:1, 1:1, 1:2, 1:5, 1:10, 0:1. Two replicate spectra were acquired for each concentration ratio and each pair of strains, giving 10 × 2 × 9 × 2 = 360 spectra in total, of which 80 (those at ratios 1:0 and 0:1) are actually pure-sample spectra.
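The mixture design described above can be enumerated to check the stated totals. This is a small sketch that assumes the design is fully crossed (every mixture, strain pair, ratio and replicate combination occurs exactly once). The variable names and the integer indices for strain pairs and replicates are illustrative and not taken from the dataset files.

```python
from itertools import product

# Mixture labels grouped by taxonomic proximity, as listed in the description.
groups = {
    "same genus": "ABCD",
    "distinct genera, same Gram type": "EF",
    "distinct Gram types": "GHIJ",
}
mixtures = "".join(groups.values())  # 10 mixtures, A to J

strain_pairs = (1, 2)  # 2 pairs of strains per mixture (indices are illustrative)
ratios = ["1:0", "10:1", "5:1", "2:1", "1:1", "1:2", "1:5", "1:10", "0:1"]
replicates = (1, 2)    # 2 replicate spectra per condition

# One tuple per acquired spectrum: (mixture, strain pair, ratio, replicate).
spectra = list(product(mixtures, strain_pairs, ratios, replicates))

# A spectrum is "pure" when one of the two species is absent from the mix.
pure = [s for s in spectra if s[2] in ("1:0", "0:1")]

print(len(spectra), len(pure))  # 360 80
```

The two counts agree with the 360 spectra and 80 pure-sample spectra reported in the description.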
Relevant Papers
Mahé et al. (2014). Automatic identification of mixed bacterial species fingerprints in a MALDI-TOF mass-spectrum. Bioinformatics.
Vervier et al. A benchmark of support vector machines strategies for microbial identification by mass-spectrometry data. Submitted.