Brain-cancer-gene-expression---CuMiDa

OpenML dataset with id 43657

Author name not available

Full work available at URL: https://api.openml.org/data/v1/download/22102482/Brain-cancer-gene-expression---CuMiDa.arff

Upload date: 24 March 2022

Dataset Characteristics

Number of features: 54,676 (numeric: 54,675; symbolic: 0; binary: 0)
Number of instances: 130
Number of instances with missing values: 0
Number of missing values: 0
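
The counts listed above can be re-derived from the ARFF file itself. The following is a minimal sketch, assuming a dense ARFF file whose data rows contain no quoted commas. `summarize_arff` is a hypothetical helper name (not part of OpenML or any library), and the toy input is made up for illustration.

```python
# Attribute types treated as numeric in this sketch.
NUMERIC_TYPES = {"numeric", "real", "integer"}


def summarize_arff(lines):
    """Count attributes, rows, and '?' missing markers in a dense ARFF file.

    Assumption: data rows are plain comma-separated values without quoted commas.
    """
    summary = {"features": 0, "numeric": 0, "instances": 0,
               "instances_with_missing": 0, "missing_values": 0}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            # ARFF marks a missing value with a bare question mark.
            missing = sum(1 for field in line.split(",") if field.strip() == "?")
            summary["instances"] += 1
            summary["missing_values"] += missing
            summary["instances_with_missing"] += 1 if missing else 0
        elif line.lower().startswith("@attribute"):
            summary["features"] += 1
            # The declared type is the last whitespace-separated token.
            if line.split()[-1].lower() in NUMERIC_TYPES:
                summary["numeric"] += 1
        elif line.lower().startswith("@data"):
            in_data = True
    return summary


# Toy ARFF content (hypothetical, not taken from the real dataset).
toy = """@relation toy
@attribute samples numeric
@attribute type {a,b}
@attribute g1 numeric
@data
1,a,0.5
2,b,?
"""
toy_summary = summarize_arff(toy.splitlines())
print(toy_summary)
```

For the real file, the same function can be called on an open file handle for a locally downloaded copy of the ARFF.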

Description

Dataset GSE50161 on brain cancer gene expression from CuMiDa

5 classes, 54,676 genes, 130 samples

About

Here we present the Curated Microarray Database (CuMiDa), a repository containing 78 handpicked cancer microarray datasets, extensively curated from 30,000 studies from the Gene Expression Omnibus (GEO), solely for machine learning. The aim of CuMiDa is to offer homogeneous and state-of-the-art biological preprocessing of these datasets, together with numerous 3-fold cross-validation benchmark results to propel machine learning studies focused on cancer research. The database makes various download options available for use by other programs, as well as PCA and t-SNE results. CuMiDa differs from existing databases by offering newer datasets that are manually and carefully curated, from sample quality and unwanted probes to background correction and normalization, to create a more reliable source of data for computational research.

http://sbcb.inf.ufrgs.br/cumida
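
The description mentions 3-fold cross-validation benchmarks. As a rough illustration of how such folds might be built for a small, multi-class dataset, here is a minimal sketch of a stratified split in plain Python. Stratification is an assumption on our part; the description does not say how CuMiDa constructs its folds. `stratified_kfold` and the example labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=3, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar.

    Indices of each class are shuffled, then dealt round-robin across folds.
    The round-robin position carries over between classes, so fold sizes
    differ by at most one sample.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    next_fold = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[next_fold].append(idx)
            next_fold = (next_fold + 1) % k
    return [sorted(fold) for fold in folds]


# Hypothetical labels: 7 samples of class "A" and 5 of class "B".
example_labels = ["A"] * 7 + ["B"] * 5
example_folds = stratified_kfold(example_labels, k=3)

for test_idx in example_folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(example_labels)) if i not in held_out]
    print(len(train_idx), len(test_idx))
```

Each fold serves once as the test set while the remaining two folds form the training set.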

References

[1] Feltes, B. C.; Chandelier, E. B.; Grisci, B. I.; Dorn, M. (2019). CuMiDa: An Extensively Curated Microarray Database for Benchmarking and Testing of Machine Learning Approaches in Cancer Research. Journal of Computational Biology, 26(4), 376-386.

[2] Grisci, B. I.; Feltes, B. C.; Dorn, M. (2019). Neuroevolution as a tool for microarray gene expression pattern identification in cancer research. Journal of Biomedical Informatics, 89, 122-133.

Inspiration

How to deal with class imbalance for classification? How to identify the most important genes for the classification of each cancer subtype? Is it possible to discover subtypes? How to beat the classification and clustering benchmarks for this dataset listed on the CuMiDa website?
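
One common starting point for the class imbalance question is to weight each class inversely to its frequency. The sketch below is only an illustration of that idea, not a method proposed by the dataset authors. It uses the formula n_samples / (n_classes * class_count), the same heuristic scikit-learn applies for `class_weight="balanced"`. `balanced_class_weights` and the example labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to class frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical labels: 6 samples of class "A" and 2 of class "B".
example_weights = balanced_class_weights(["A"] * 6 + ["B"] * 2)
print(example_weights)
```

Rarer classes receive larger weights, which a classifier that supports per-class or per-sample weights can use to penalize their misclassification more heavily.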
