Large-scale and fine-grained phenological stage annotation of herbarium specimens datasets




DOI: 10.5281/zenodo.2548630 | Zenodo: 2548630 | MaRDI QID: Q6710101

Dataset published in the Zenodo repository.

Author name not available

Publication date: 25 January 2019

Copyright license: No records found.



This upload consists of four datasets of specimens from American herbaria, covering different levels of information precision and different floras, from temperate to equatorial. Three of these datasets consist of selected specimens from herbaria located in different geographic and environmental regions. Each specimen in these three datasets was annotated with the following fields: family, genus, species name, fertile / non-fertile, presence / absence of flower(s), and presence / absence of fruit(s). Together, these three datasets comprise 163,233 herbarium specimens belonging to 7,782 species, 1,906 genera, and 236 families. Specimens were annotated as fertile if any reproductive structures were present, such as sporangia (ferns), cones (gymnosperms), or flowers or fruits (angiosperms). Non-fertile specimens were those that lacked any reproductive structures.

The fourth dataset consists of 20,371 herbarium specimens from 11 genera in the sunflower family (Asteraceae). Unlike the other three, it is annotated with fine-grained phenophase scores rather than presence/absence attributes.

Each of these datasets is described below:

- NEVP: this dataset of New England vascular plant (NEVP) specimens was produced by members of the Consortium of Northeastern Herbaria. It comprises 42,658 digitized specimens that belong to 1,375 species and come from several North American institutions. Most of the specimens are from the north-temperate region of the northeastern United States.

- FSU: this dataset was produced by Florida State University's Robert K. Godfrey Herbarium (FSU), a collection that focuses on northern Florida and the U.S. Southeast Coastal Plain, one of North America's biodiversity hotspots. It contains 54,263 digitized herbarium specimen records that belong to 3,870 species, making it the taxonomically richest dataset in this study. Most species in this dataset grow under subtropical or warm temperate conditions in the southeastern United States.

- CAY: this dataset comes from the IRD's Herbarium of French Guiana (CAY), which is dedicated to the Guayana Shield flora, with a strong focus on tropical tree species. It is composed of 66,312 herbarium specimens that belong to 3,024 species. All digitized specimens of this herbarium are accessible online. Most specimens were collected in the tropical rainforests of French Guiana, with the remaining specimens coming mostly from Suriname and Guyana.

- PHENO: this dataset includes 20,371 herbarium specimens of 139 species in the Asteraceae, produced in a study of phenological trends in the U.S. Southeast Coastal Plain. It is composed of specimen records from 57 herbaria. Each recorded specimen was annotated for quartile percentages (0, 25, 50, 75, or 100%) of (i) closed buds, (ii) buds transformed into flowers, and (iii) fruits. A phenophase code was then computed for each specimen from the distribution of these three categories.

Datasets format

These datasets are grouped into three tasks:

- fertility detection
- flower and/or fruit detection
- phenophase classification

The first two tasks are carried out on the first three datasets (NEVP, FSU, and CAY) and are therefore based on the same set of images. The third task has its own disjoint set of images. This is why the data is split into two separate archives, one for each set of images.

Fertility detection and flower/fruit detection

These tasks are contained in the herbarium_fertility_annotations.zip archive. It consists of three files:

- metadata.csv: general information about all the herbarium specimens for these tasks
    - id: specimen identifier
    - collection: which of NEVP, FSU, or CAY the specimen comes from
    - herbarium: institution of origin of the specimen (mainly relevant for the NEVP collection)
    - clade, family, genus, species: classification of the specimen
    - URL: URL of the scan

- fertility_task.csv: information specific to the fertility detection task
    - id: specimen identifier
    - is_fertile: True if the specimen shows any expression of fertility, False otherwise
    - train_test_set: subset the specimen belongs to; possible values are train, random_test, species_test, and herbarium_test

- flower_fruit_task.csv: information specific to the flower/fruit detection task
    - id: specimen identifier; note that not all the specimens described in metadata.csv are included in this task
    - has_flower: True if the specimen has at least one flower, False otherwise
    - has_fruit: True if the specimen has at least one fruit, False otherwise
    - train_test_set: subset the specimen belongs to; possible values are train, random_test, species_test, and herbarium_test

Phenophase classification

This task is contained in the herbarium_asteraceae_phenophase_annotations.zip archive. It consists of a single file:

- annotations.csv:
    - id: specimen identifier
    - URL: URL of the scan
    - genus: genus of the specimen
    - phenophase: integer from 1 to 9 describing the phenophase of the specimen
    - train_test_set: subset the specimen belongs to; possible values are train and test

Additional resources

More information can be found in the related paper:

Lorieul, T., K. D. Pearson, E. R. Ellwood, H. Goëau, J.-F. Molino, P. W. Sweeney, J. M. Yost, J. Sachs, E. Mata-Montero, G. Nelson, P. S. Soltis, P. Bonnet, and A. Joly. 2019. Toward a large-scale and deep phenological stage annotation of herbarium specimens: Case studies from temperate, tropical, and equatorial floras. Applications in Plant Sciences 7(3): e1233.

For an example of usage of these datasets as well as a baseline, see: http://doi.org/10.5281/zenodo.2549996
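
As an illustration of how the per-task files relate to metadata.csv, the following minimal Python sketch joins task rows to metadata rows on the id column and groups them by train_test_set. It is not the authors' baseline code. It assumes the CSV files are comma-delimited with a header row matching the column names listed above, and that booleans are written as the text True / False. The helper names (read_rows, join_on_id, split_by_subset, fertile_fraction) and the sample rows are invented for this example.

```python
import csv
import io
from collections import defaultdict

# Synthetic placeholder rows (NOT real records from the archive); only the
# column names follow the dataset description.
SAMPLE_METADATA = """id,collection,herbarium,clade,family,genus,species,URL
1,NEVP,HERB_A,clade_a,family_a,genus_a,species_a,http://example.org/1
2,FSU,HERB_B,clade_b,family_b,genus_b,species_b,http://example.org/2
3,CAY,HERB_C,clade_c,family_c,genus_c,species_c,http://example.org/3
"""

SAMPLE_FERTILITY = """id,is_fertile,train_test_set
1,True,train
2,False,train
3,True,random_test
"""


def read_rows(handle):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def parse_bool(text):
    """Interpret 'True'/'False' text (assumed serialization) as a bool."""
    return text.strip().lower() == "true"


def join_on_id(metadata_rows, task_rows):
    """Attach metadata to each task row; task rows without metadata are skipped."""
    meta_by_id = {row["id"]: row for row in metadata_rows}
    joined = []
    for task in task_rows:
        meta = meta_by_id.get(task["id"])
        if meta is not None:
            joined.append({**meta, **task})
    return joined


def split_by_subset(rows):
    """Group rows by their train_test_set value (train, random_test, ...)."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row["train_test_set"]].append(row)
    return dict(subsets)


def fertile_fraction(rows):
    """Fraction of rows with is_fertile set to True, or None for an empty list."""
    if not rows:
        return None
    return sum(parse_bool(row["is_fertile"]) for row in rows) / len(rows)


if __name__ == "__main__":
    # With the real files, pass open(path, newline="", encoding="utf-8")
    # instead of io.StringIO (the encoding is an assumption).
    metadata = read_rows(io.StringIO(SAMPLE_METADATA))
    fertility = read_rows(io.StringIO(SAMPLE_FERTILITY))
    for name, rows in split_by_subset(join_on_id(metadata, fertility)).items():
        print(name, len(rows), fertile_fraction(rows))
```

The join iterates over the task rows rather than the metadata rows because flower_fruit_task.csv covers only part of the specimens in metadata.csv. The same helpers should apply to that file if is_fertile is replaced with has_flower or has_fruit.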





