Datasets for Out-of-KB Mention Discovery with Entity Linking - MaRDI portal



From MaRDI portal



DOI: 10.5281/zenodo.8228371
Zenodo: 8228371
MaRDI QID: Q6718562

Dataset published at Zenodo repository.

Author name not available

Publication date: 9 August 2023

Copyright license: No records found.



The repository contains datasets for out-of-KB mention discovery from texts, documented in the paper "Reveal the Unknown: Out-of-Knowledge-Base Mention Discovery with Entity Linking" (CIKM 2023), available on arXiv: https://arxiv.org/abs/2302.07189

Each data setting (a sub-folder) contains train, valid, and test files, and also 100 random sample files for each data split for debugging. Data folders whose names end in syn_full contain the synonym-augmented data for that setting (each synonym treated as an entity). Each ontology .jsonl file comes in two versions: the syn_attr setting treats synonyms as attributes, and the syn_full setting treats synonyms as entities.

Data scripts are available at https://github.com/KRR-Oxford/BLINKout#data-scripts

Acknowledgement of the data sources:

ShARe/CLEF 2013 dataset: https://physionet.org/content/shareclefehealth2013/1.0/
MedMentions dataset: https://github.com/chanzuckerberg/MedMentions
UMLS (versions 2012AB, 2014AB, 2017AA): https://www.nlm.nih.gov/research/umls/index.html
SNOMED CT (corresponding versions): https://www.nlm.nih.gov/healthit/snomedct/index.html
NILK dataset: https://zenodo.org/record/6607514
WikiData 2017 dump: https://archive.org/download/enwiki-20170220/enwiki-20170220-pages-articles.xml.bz2
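For orientation after downloading the archive, the sub-folder layout and the ontology .jsonl files could be inspected along the following lines. This is a minimal sketch, not part of the dataset's own tooling: it assumes the .jsonl files follow the usual JSON Lines convention (one JSON object per line), and the helper names `read_jsonl` and `list_settings` are hypothetical. The record fields are not documented on this page, so the sketch makes no assumption about them.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file.

    Assumption: each line of the file holds exactly one JSON object.
    """
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def list_settings(root: Path) -> dict[str, list[str]]:
    """Map each data-setting sub-folder under `root` to its sorted file names."""
    return {
        folder.name: sorted(f.name for f in folder.iterdir() if f.is_file())
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }
```

Calling `list_settings(Path("path/to/extracted/data"))` (a placeholder path) would show which sub-folders exist and which of them end in syn_full, and `next(read_jsonl(some_file))` would reveal the fields of the first ontology record before any further processing is written.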





