Dutch-News-Articles
OpenML dataset with id 43370
Author name not available
Full work available at URL: https://api.openml.org/data/v1/download/22102195/Dutch-News-Articles.arff
Upload date: 23 March 2022
Copyright license: No records found.
Dataset Characteristics
Number of features: 5 (numeric: 0, symbolic: 0, binary: 0)
Number of instances: 237,861
Number of instances with missing values: 0
Number of missing values: 0
Dutch News Articles
This dataset contains all articles published by the NOS from 1 January 2010 onward. The data was obtained by scraping the NOS website. The NOS is one of the largest (online) news organizations in the Netherlands.
Features:
datetime: date and time of publication of the article.
title: the title of the news article.
content: the content of the news article.
category: the category under which the NOS filed the article.
url: link to the original article.
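To make the five features concrete, here is a minimal sketch of how one row might be represented in Python. The `Article` class, the `from_row` helper, and the example row are hypothetical illustrations, not part of the dataset. The sketch also assumes that the datetime feature is stored as an ISO 8601 style string, which the description does not state.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Article:
    """One row of the dataset, using the five feature names from the description."""
    datetime: dt.datetime
    title: str
    content: str
    category: str
    url: str

    @classmethod
    def from_row(cls, row: Mapping[str, str]) -> "Article":
        # Assumption: the datetime string is ISO 8601 style
        # ("YYYY-MM-DD HH:MM:SS"); adjust the parsing if the file differs.
        return cls(
            datetime=dt.datetime.fromisoformat(row["datetime"]),
            title=row["title"],
            content=row["content"],
            category=row["category"],
            url=row["url"],
        )


# Made-up example row for illustration only; not taken from the dataset.
EXAMPLE_ROW = {
    "datetime": "2010-01-01 00:49:00",
    "title": "Voorbeeldtitel",
    "content": "Voorbeeldtekst van een artikel.",
    "category": "Binnenland",
    "url": "https://example.org/artikel/1",
}

article = Article.from_row(EXAMPLE_ROW)
```

Parsing the datetime feature once, at load time, makes it straightforward to filter or group articles by publication date afterwards.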
About the data
The title and content features are somewhat cleaned: extra whitespace and newlines have been removed. Furthermore, these features are normalized (NFKD). The NOS also publishes liveblogs. The posts in these liveblogs are not part of this dataset.
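The cleaning code itself is not given here, so the following is only a sketch of what the described steps (NFKD normalization plus removal of extra whitespace and newlines) could look like using Python's standard library. The function name `clean_text` is a hypothetical choice.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Apply NFKD normalization, then collapse runs of whitespace and newlines."""
    # NFKD decomposes characters: "é" becomes "e" followed by U+0301
    # (combining acute accent), and a non-breaking space becomes a plain space.
    normalized = unicodedata.normalize("NFKD", text)
    # str.split() with no arguments splits on any whitespace run,
    # which also drops leading and trailing whitespace.
    return " ".join(normalized.split())


raw = "  Caf\u00e9 \n\n nieuws\u00a0vandaag "
cleaned = clean_text(raw)

# Recompose accented characters if the text must be compared with NFC input.
recomposed = unicodedata.normalize("NFC", cleaned)
```

One practical consequence of NFKD is that accented letters occupy two code points, so string lengths and exact matches can differ from text in the more common NFC form. Normalizing both sides to the same form before comparing avoids such mismatches.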
Example
I used this dataset in a recent blog post.