Dataset of histopathological image crops from GTEx project
DOI: 10.5281/zenodo.13330659; Zenodo: 13330659; MaRDI QID: Q6723411
Dataset published in the Zenodo repository.
Author name not available
Publication date: 16 August 2024
Copyright license: No records found.
This is a dataset of histological slides from the GTEx project, balanced for 3 major factors (organ, sex, and age bracket), that may be useful for training models in supervised or self-supervised modes. Four datasets are available:

- gtex_histology_balanced_3_slides_200_tiles.tar.gz: for each group defined by the 3 factors, 3 slides were selected, and 200 tiles were selected at random per slide from tissue-segmented areas.
- gtex_histology_balanced_3_slides_2000_tiles.tar.gz: for each group defined by the 3 factors, 3 slides were selected, and 2000 tiles were selected at random per slide from tissue-segmented areas.
- gtex_histology_balanced_10_slides_100_tiles.tar.gz: for each group defined by the 3 factors, 10 slides were selected (when possible), and 100 tiles were selected at random per slide from tissue-segmented areas. This dataset closely matches "gtex_histology_balanced_3_slides_200_tiles.tar.gz" in total number of tiles.
- gtex_histology_balanced_10_slides_800_tiles.tar.gz: for each group defined by the 3 factors, 10 slides were selected (when possible), and 800 tiles were selected at random per slide from tissue-segmented areas. This dataset closely matches "gtex_histology_balanced_3_slides_2000_tiles.tar.gz" in total number of tiles.

Each archive file contains the following:

- slide_annotation.csv: a slide-level annotation of the slides (see below)
- train: a directory of image tiles to be used to train a model
- valid: a directory of image tiles to be used to validate a model

The slide_annotation file contains publicly available information on the slides, in addition to 3 columns:

- "Tissue_simple": the organ of the slide
- "split": whether the slide was assigned to the 'train' or 'valid' split. Validation slides have one tenth as many tiles per slide as training slides (e.g., 20 instead of 200 in the example below).
- "n_tiles": the number of image tiles in the dataset for each slide

Example:

| Tissue Sample ID | Tissue | Subject ID | Sex | Age Bracket | Hardy Scale | Pathology Categories | Pathology Notes | Tissue_simple | split | n_tiles |
|---|---|---|---|---|---|---|---|---|---|---|
| GTEX-1128S-1426 | Esophagus - Mucosa | GTEX-1128S | female | 60-69 | Fast death - natural causes | | 6 pieces, near-total autolysis/mucosa completely sloughed | Esophagus | train | 200 |
| GTEX-113JC-1226 | Stomach | GTEX-113JC | female | 50-59 | Fast death - natural causes | | 6 pieces, well dissected mucosa; some areas are severely autolyzed | Stomach | valid | 20 |
| GTEX-1192W-2526 | Muscle - Skeletal | GTEX-1192W | male | 60-69 | Fast death - natural causes | | 2 pieces, ~10-20% interstitial fat, rep foci delineated | Muscle | train | 200 |
| GTEX-1192X-0426 | Muscle - Skeletal | GTEX-1192X | male | 50-59 | Slow death | | 2 pieces, 5-10% interstitial fat, rep. foci delineated | Muscle | valid | 20 |
| GTEX-11DXX-1326 | Stomach | GTEX-11DXX | female | 60-69 | Ventilator case | gastritis | 6 pieces, mild chronic active gastritis | Stomach | train | 200 |

Inside train and valid are JPEG files named with the following convention:

Tissue Sample ID.Tissue_simple.Sex.Age Bracket.Y position.X position.jpg

so that the origin of each crop can be traced and the file name can serve directly as a class label if desired. Examples: "GTEX-ZYT6-1326.Pancreas.male.30-39.47492.16064.jpg", "GTEX-WWYW-2726.Ovary.female.50-59.5024.15008.jpg".
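Because each tile's file name encodes its slide and the three balancing factors, labels can be recovered without opening slide_annotation.csv. The Python sketch below splits a tile name according to the naming convention described for the train and valid directories. The names TileInfo, parse_tile_name and class_label are hypothetical helpers, not part of the dataset. The sketch assumes no field contains a '.', which holds for the published examples. The units of the Y and X positions are not documented here, so they are returned as plain integers.

```python
from pathlib import PurePath
from typing import NamedTuple, Sequence


class TileInfo(NamedTuple):
    """Fields encoded in a tile file name (hypothetical helper type)."""
    sample_id: str    # Tissue Sample ID, e.g. GTEX-ZYT6-1326
    tissue: str       # Tissue_simple, e.g. Pancreas
    sex: str          # e.g. male / female
    age_bracket: str  # e.g. 30-39
    y: int            # Y position of the crop (units not stated in the docs)
    x: int            # X position of the crop (units not stated in the docs)


def parse_tile_name(path: str) -> TileInfo:
    """Parse '<ID>.<Tissue_simple>.<Sex>.<Age Bracket>.<Y>.<X>.jpg'.

    Assumption: no field contains a '.', as in the published examples.
    """
    name = PurePath(path).name  # drop any 'train/' or 'valid/' prefix
    if not name.endswith(".jpg"):
        raise ValueError(f"not a .jpg tile: {name!r}")
    parts = name[: -len(".jpg")].split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated fields in {name!r}")
    sample_id, tissue, sex, age_bracket, y, x = parts
    return TileInfo(sample_id, tissue, sex, age_bracket, int(y), int(x))


def class_label(path: str, fields: Sequence[str] = ("tissue",)) -> str:
    """Join the chosen TileInfo fields into a single class label string."""
    info = parse_tile_name(path)
    return ".".join(str(getattr(info, f)) for f in fields)
```

For example, class_label("train/GTEX-WWYW-2726.Ovary.female.50-59.5024.15008.jpg", ("tissue", "sex")) yields "Ovary.female", which could serve as a joint organ and sex label for a classifier.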
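The slide-level annotation can be summarized with the Python standard library alone. This sketch assumes slide_annotation.csv is comma-separated, with a header row that includes the "split", "Tissue_simple" and "n_tiles" columns described above, and with free-text fields such as the pathology notes quoted where they contain commas. summarize_annotation and tiles_per_split are hypothetical helpers. They total the tiles per split and organ, which gives a quick check of how the balancing and the train/valid split came out in a given archive.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_annotation(lines: Iterable[str]) -> Counter:
    """Total the n_tiles column per (split, Tissue_simple) pair.

    Assumption: comma-separated rows with a header naming the
    'split', 'Tissue_simple' and 'n_tiles' columns.
    """
    totals: Counter = Counter()
    for row in csv.DictReader(lines):  # handles quoted fields with commas
        key: Tuple[str, str] = (row["split"], row["Tissue_simple"])
        totals[key] += int(row["n_tiles"])
    return totals


def tiles_per_split(totals: Counter) -> Dict[str, int]:
    """Collapse the per-organ totals into one tile count per split."""
    per_split: Dict[str, int] = {}
    for (split, _tissue), n in totals.items():
        per_split[split] = per_split.get(split, 0) + n
    return per_split
```

In practice the function would be given an open file handle, for instance one opened with open("slide_annotation.csv", newline=""), since csv.DictReader accepts any iterable of text lines.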