Gitome: A curated dataset for GitHub README-related tasks

From MaRDI portal



DOI: 10.5281/zenodo.10311456
Zenodo: 10311456
MaRDI QID: Q6706838

Dataset published at Zenodo repository.

Author name not available

Publication date: 8 December 2023

Copyright license: No records found.



About

This repository contains the source code implementation used to replicate the experimental results reported in the paper "Gitome: A curated dataset for GitHub README-related tasks", submitted to the 21st International Conference on Mining Software Repositories (MSR 2024).

Authored by: Claudio Di Sipio, Juri Di Rocco, Riccardo Rubei, Phuong Than Nguyen, and Davide Di Ruscio, Università degli Studi dell'Aquila, Italy

Data description

The dataset is structured as follows:

- emf_metamodel.zip: contains the Ecore project with the Gitome data model
- existing_dumps.zip: contains the existing datasets used to build Gitome
- lang_aggr_stats.csv: contains the language data used to compute the statistics presented in the paper
- langs.csv: contains all the languages and their frequency
- output_dataset.zip: contains the benchmarking dataset obtained by parsing the README files
- repository_lists.zip: contains the list of repositories for each considered dataset (with possible duplicates)
- topics.csv: contains all the topics and their frequency
- topics_aggr_stats.csv: contains the topics data used to compute the statistics presented in the paper
- gitome_repo.txt: contains the list of the URLs of the considered GitHub repositories

How to collect Gitome

To collect all the data stored in this archive, please refer to the supporting GitHub repository: https://github.com/MDEGroup/Gitome-MSR2024.
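
As an illustration of how a URL list such as gitome_repo.txt might be consumed, the following is a minimal sketch, not part of the published tooling. It assumes the file holds one GitHub repository URL per line, which this record does not state. The function name parse_repo_urls and the sample data are hypothetical.

```python
from urllib.parse import urlparse


def parse_repo_urls(lines):
    """Return unique (owner, repo) pairs from an iterable of URL strings.

    Assumption: each line holds a single GitHub repository URL.
    Blank lines, non-GitHub hosts, and repeated repositories are skipped.
    """
    seen = set()
    repos = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parsed = urlparse(url)
        if parsed.netloc.lower() != "github.com":
            continue  # skip anything that is not hosted on github.com
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) < 2:
            continue  # need both an owner and a repository name
        owner, repo = parts[0], parts[1]
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]  # normalize clone-style URLs
        key = (owner, repo)
        if key not in seen:
            seen.add(key)
            repos.append(key)
    return repos


# Hypothetical sample standing in for the contents of gitome_repo.txt.
sample_lines = [
    "https://github.com/MDEGroup/Gitome-MSR2024\n",
    "\n",
    "https://github.com/MDEGroup/Gitome-MSR2024.git\n",
    "https://example.org/not/github\n",
]
repos = parse_repo_urls(sample_lines)
```

With a real copy of the file, the same function could be called as parse_repo_urls(open("gitome_repo.txt", encoding="utf-8")), provided the one-URL-per-line assumption holds.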





