DECIMER V2 Benchmark Datasets - MaRDI portal



From MaRDI portal



DOI: 10.5281/zenodo.8139328
Zenodo: 8139328
MaRDI QID: Q6710330

Dataset published at Zenodo repository.

Author name not available

Publication date: 12 July 2023

Copyright license: No records found.



A comprehensive benchmark of the DECIMER Image Transformer was conducted using all publicly available OCSR (optical chemical structure recognition) benchmark datasets and the DECIMER test datasets:

USPTO: A set of 5,719 images of chemical structures and the corresponding MOL files (US Patent Office), obtained from the OSRA online presence.

UOB: A dataset of 5,740 images and MOL files of chemical structures, developed by the University of Birmingham, United Kingdom, and published alongside MolRec.

CLEF: The Conference and Labs of the Evaluation Forum test set of 992 images and MOL files, published in 2012.

JPO: A subset (450 images and MOL files) of a dataset based on data from the Japanese Patent Office, obtained from the OSRA online presence. This dataset contains many labels (sometimes with Japanese characters) and irregular features, such as variations in line thickness. Some images are also of poor quality and contain a lot of noise.

RanDepict250k: A set of 250,000 chemical structure depictions generated with RanDepict (1.0.8), using RanDepict's depiction feature fingerprints to ensure diverse depiction parameters. None of the depicted molecules is present in the DECIMER training data. All images are 299 x 299 pixels in size.

RanDepict250k_augmented: The same 250,000 images as in the RanDepict250k dataset, with additional augmentations applied using RanDepict (examples: mild rotation, shearing, insertion of labels and reaction arrows around the structures, insertion of curved arrows in the structure). All images are 299 x 299 pixels in size.

DECIMER hand-drawn: A set of 5,088 chemical structure depictions drawn manually by a group of 24 volunteers. The drawn molecules were picked from all molecules in PubChem using the MaxMin algorithm, so that the set covers a large part of chemical space.

Indigo: 50,000 images generated by Staker et al. using Indigo, collected from the supplementary information. All images have a resolution of 224 x 224 pixels.

USPTO_big: 50,000 images from the USPTO from Staker et al., collected from the supplementary information. All images have a resolution of 224 x 224 pixels.

Img2Mol test set: A set of 25,000 chemical structure depictions used by Clevert et al. for testing. All images have a resolution of 224 x 224 pixels.
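The description of the hand-drawn set says its molecules were picked with the MaxMin algorithm. As a rough illustration of that idea only, the sketch below greedily selects the item that lies farthest from everything picked so far. The record does not state which fingerprints, distance measure, or seed molecule were used, so the toy feature sets, the Tanimoto distance, and the function names `tanimoto_distance` and `maxmin_pick` are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_pick(
    items: Sequence[T],
    n_pick: int,
    distance: Callable[[T, T], float],
    seed_index: int = 0,  # assumption: the starting item is chosen by the caller
) -> List[int]:
    """Greedy MaxMin selection; returns indices of the picked items in pick order."""
    if not 0 < n_pick <= len(items):
        raise ValueError("n_pick must be between 1 and len(items)")
    if not 0 <= seed_index < len(items):
        raise ValueError("seed_index out of range")

    picked = [seed_index]
    picked_set = {seed_index}
    # min_dist[i] is the distance from item i to its nearest picked item.
    min_dist = [distance(items[seed_index], x) for x in items]

    while len(picked) < n_pick:
        # Pick the unpicked item whose nearest picked neighbour is farthest away.
        best = max(
            (i for i in range(len(items)) if i not in picked_set),
            key=lambda i: min_dist[i],
        )
        picked.append(best)
        picked_set.add(best)
        # Only the newly picked item can shrink the nearest-neighbour distances.
        for i, x in enumerate(items):
            d = distance(items[best], x)
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


# Toy "fingerprints" (hypothetical feature sets, not real molecular data).
toy_fingerprints = [
    frozenset({1, 2, 3}),
    frozenset({1, 4, 5}),
    frozenset({7, 8, 9}),
    frozenset({1, 2, 3, 4}),
    frozenset({3, 7, 8}),
]
selection = maxmin_pick(toy_fingerprints, 3, tanimoto_distance)
```

This version evaluates the distance function on the order of `n_pick * len(items)` times, which is fine for a toy example. A selection over all of PubChem would in practice rely on an optimized cheminformatics implementation rather than a loop like this one.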





