WikiLinkGraphs: A complete, longitudinal and multilanguage dataset of the Wikipedia link networks (Q6704199)
From MaRDI portal
Dataset published at Zenodo repository.
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | WikiLinkGraphs: A complete, longitudinal and multilanguage dataset of the Wikipedia link networks | Dataset published at Zenodo repository. | |
Statements
This dataset contains yearly snapshots of Wikipedia's internal link network for the 9 largest language editions (de, en, es, fr, it, nl, pl, ru, sv). The dataset spans 17 years, from the creation of Wikipedia in 2001 to March 2018. The snapshots are taken on March 1st of every year. The graphs include the links extracted from the wikitext of each page (i.e., in the form [[wikilink]]). Links transcluded from templates are not included. Redirects are resolved to their target page. More detailed information and supporting datasets are available at: http://disi.unitn.it/~consonni/datasets/.

IMPORTANT NOTICE: Gzipped files are compressed twice by Zenodo; the MD5 provided by Zenodo and the SHA512 sums provided in the `.sha512sums.txt` files match the files compressed once. In other words, when you download a `.gz` file, save it as `.gz.gz` and uncompress it once; the result should match both the MD5 provided by Zenodo and the SHA512 sum provided by us. We have opened a bug report for this behavior on Zenodo's repository at: https://github.com/zenodo/zenodo/issues/1705
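The download-and-verify procedure in the notice can be sketched in Python as follows. This is a minimal illustration, not part of the dataset's tooling: the function names (`gunzip_once`, `file_digest`, `verify_download`) are hypothetical, and the expected checksum is assumed to have been copied by hand from the relevant `.sha512sums.txt` file or from the Zenodo record.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def gunzip_once(src: Path, dst: Path) -> None:
    """Remove exactly one layer of gzip compression from src, writing to dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def file_digest(path: Path, algorithm: str) -> str:
    """Return the hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(downloaded: Path, expected_sha512: str) -> Path:
    """Uncompress a file saved as `*.gz.gz` once and check its SHA512 sum.

    Returns the path of the once-compressed `*.gz` file on success.
    """
    once = downloaded.with_suffix("")  # drops only the trailing ".gz"
    gunzip_once(downloaded, once)
    actual = file_digest(once, "sha512")
    if actual != expected_sha512.strip().lower():
        raise ValueError(f"SHA512 mismatch for {once.name}: got {actual}")
    return once
```

The returned file is still gzip-compressed once, so it can be read directly with `gzip.open`. The same `file_digest` helper can be called with `"md5"` to compare against the MD5 shown by Zenodo.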
14 January 2019
1.0