Dataset Reuse Indicators Datasets - MaRDI portal



From MaRDI portal



DOI: 10.5281/zenodo.4015955
Zenodo: 4015955
MaRDI QID: Q6718835

Dataset published at Zenodo repository.

Author name not available

Publication date: 5 September 2020

Copyright license: No records found.



This dataset contains two files.

1) A Python pickle file (github_dataset.zip) that contains GitHub repositories with datasets. Specifically, we used Google's public dataset copy of GitHub and the BigQuery service to build a list of repositories that have a CSV, XLSX, or XLS file. We then used the GitHub API to collect information about each repository in this list. The resulting dataset consists of 87,936 repositories that contain at least one CSV, XLSX, or XLS file, along with information about their features (e.g. number of open and closed issues and license) from GitHub. This corpus had more than two million data files. We then excluded those files with fewer than ten rows, which was the case for 65,537 repositories with a total of 1,467,240 data files.

2) A Python pickle file (processed_dataset.zip) containing the feature information necessary to train a machine learning model to predict reuse on these GitHub datasets.

Source code can be found at: https://github.com/laurakoesten/Dataset-Reuse-Indicators

For a full description of the content see: Koesten, Laura and Vougiouklis, Pavlos and Simperl, Elena and Groth, Paul, Dataset Reuse: Translating Principles to Practice. Available at SSRN: https://ssrn.com/abstract=3589836 or http://dx.doi.org/10.2139/ssrn.3589836
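Since both files are distributed as zipped Python pickles, the following is a minimal sketch of how such an archive might be read. It assumes that every member of the archive is a pickle file; the names of the files inside the archives and the structure of the pickled objects are not stated in this record, so the helper `load_pickles_from_zip` is purely illustrative and not part of the authors' code.

```python
import pickle
import zipfile


def load_pickles_from_zip(zip_path):
    """Return {member_name: unpickled_object} for each file in the archive.

    Assumption: every non-directory member of the archive is a pickle.
    Unpickling can execute arbitrary code, so only use this on archives
    obtained from a source you trust.
    """
    objects = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                # Skip directory entries; they carry no data.
                continue
            # Read the member fully into memory, then unpickle the bytes.
            objects[name] = pickle.loads(archive.read(name))
    return objects
```

After downloading the archive from Zenodo, calling `load_pickles_from_zip("github_dataset.zip")` would return a dictionary keyed by the names of the files inside the archive. If the pickled objects depend on third-party libraries, those libraries must be installed before unpickling.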





