Deprecated: $wgMWOAuthSharedUserIDs=false is deprecated, set $wgMWOAuthSharedUserIDs=true, $wgMWOAuthSharedUserSource='local' instead [Called from MediaWiki\HookContainer\HookContainer::run in /var/www/html/w/includes/HookContainer/HookContainer.php at line 135] in /var/www/html/w/includes/Debug/MWDebug.php on line 372
A Large-scale Dataset of (Open Source) License Text Variants - MaRDI portal

Deprecated: Use of MediaWiki\Skin\SkinTemplate::injectLegacyMenusIntoPersonalTools was deprecated in Please make sure Skin option menus contains `user-menu` (and possibly `notifications`, `user-interface-preferences`, `user-page`) 1.46. [Called from MediaWiki\Skin\SkinTemplate::getPortletsTemplateData in /var/www/html/w/includes/Skin/SkinTemplate.php at line 691] in /var/www/html/w/includes/Debug/MWDebug.php on line 372

Deprecated: Use of MediaWiki\Skin\BaseTemplate::getPersonalTools was deprecated in 1.46 Call $this->getSkin()->getPersonalToolsForMakeListItem instead (T422975). [Called from Skins\Chameleon\Components\NavbarHorizontal\PersonalTools::getHtml in /var/www/html/w/skins/chameleon/src/Components/NavbarHorizontal/PersonalTools.php at line 66] in /var/www/html/w/includes/Debug/MWDebug.php on line 372

Deprecated: Use of QuickTemplate::(get/html/text/haveData) with parameter `personal_urls` was deprecated in MediaWiki Use content_navigation instead. [Called from MediaWiki\Skin\QuickTemplate::get in /var/www/html/w/includes/Skin/QuickTemplate.php at line 131] in /var/www/html/w/includes/Debug/MWDebug.php on line 372

A Large-scale Dataset of (Open Source) License Text Variants (Q6716134)

From MaRDI portal





Dataset published at Zenodo repository.
Language Label Description Also known as
English
A Large-scale Dataset of (Open Source) License Text Variants
Dataset published at Zenodo repository.

    Statements

    0 references
    We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archivethe largest publicly available archive of FOSS source code with accompanying development historyall versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums. For more details see the includedREADME file and companion paper: Stefano Zacchiroli.A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022. If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
    0 references
    23 March 2022
    0 references
    0 references
    2021-03-23
    0 references

    Identifiers

    0 references