SDDF Energy Dataset - MaRDI portal


SDDF Energy Dataset

From MaRDI portal



DOI: 10.5281/zenodo.14008357 · Zenodo: 14008357 · MaRDI QID: Q6701004

Dataset published at Zenodo repository.

Author name not available.

Publication date: 29 October 2024

Copyright license: No records found.



This conformational energy dataset, developed as part of the Smart Distributed Data Factory (SDDF) project, contains over 2.17 million molecular conformations of drug-like molecules sourced from the ENAMINE database. Energies were calculated using DFT with the ωB97X density functional and the 6-31G(d) basis set. The conformations were generated from SMILES using RDKit, MMFF94 optimization, and molecular dynamics (MD) simulations, providing a diverse set of molecular structures and energy states:

RDKit conformations: 535,338
RDKit + MMFF94 optimized: 1,151,936
MD-generated: 483,279

This dataset serves as a benchmark for energy prediction models, with training (638,617 examples), validation (134,732 examples), and test (24,890 examples) subsets created using a strict scaffold-based split that ensures no overlap and less than 70% similarity between the training and test sets.

Dataset contents:

data.tar.gz: the conformations in Structure Data File (SDF) format, grouped into separate folders by molecule ID.
INDEX.smi: the molecule IDs and their corresponding SMILES.
SOURCES.csv: the conformation generation method for each conformation.
SDDF_train.tsv, SDDF_validation.tsv, and SDDF_test.tsv: the molecule IDs and conformations for each benchmark subset.

A detailed description is provided in the accompanying paper.
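To illustrate what a scaffold-based split means in practice, here is a minimal sketch. It is not the SDDF authors' procedure, which is described in the accompanying paper. It assumes that each molecule has already been mapped to a scaffold key (for example a Bemis-Murcko scaffold SMILES computed with RDKit). The function name `scaffold_split`, the split fractions, and the greedy "largest scaffold group first" heuristic are all hypothetical choices. The sketch only guarantees that no scaffold appears in more than one subset; it does not enforce the 70% similarity threshold mentioned above.

```python
import random
from collections import defaultdict


def scaffold_split(mol_scaffolds, frac_train=0.8, frac_valid=0.1, seed=0):
    """Hypothetical sketch: assign whole scaffold groups to train/valid/test.

    mol_scaffolds: dict mapping molecule ID -> precomputed scaffold key
    (assumed input; e.g. a Murcko scaffold SMILES). Because every molecule
    sharing a scaffold lands in the same subset, no scaffold is split
    across train, validation, and test.
    """
    # Group molecule IDs by their scaffold key.
    groups = defaultdict(list)
    for mol_id, scaffold in mol_scaffolds.items():
        groups[scaffold].append(mol_id)

    # Seeded shuffle, then a stable sort by group size (largest first),
    # so ties between equal-sized groups are broken reproducibly at random.
    keys = list(groups)
    random.Random(seed).shuffle(keys)
    keys.sort(key=lambda k: len(groups[k]), reverse=True)

    n = len(mol_scaffolds)
    train, valid, test = [], [], []
    for key in keys:
        members = groups[key]
        # Fill train first, then validation; everything else goes to test.
        if len(train) + len(members) <= frac_train * n:
            train.extend(members)
        elif len(valid) + len(members) <= frac_valid * n:
            valid.extend(members)
        else:
            test.extend(members)
    return train, valid, test
```

For example, with toy IDs `{"m1": "c1ccccc1", "m2": "c1ccccc1", "m3": "C1CCNCC1", "m4": "c1ccncc1", "m5": "C1CCOC1"}`, molecules `m1` and `m2` share a scaffold and are always placed in the same subset. Assigning whole groups, instead of individual molecules, is what keeps test-set scaffolds unseen during training.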






This page was built for dataset: SDDF Energy Dataset