Multipurpose-World-News-Dataset
OpenML dataset with id 43522
Author name not available
Full work available at URL: https://api.openml.org/data/v1/download/22102347/Multipurpose-World-News-Dataset.arff
Upload date: 23 March 2022
Dataset Characteristics
Number of features: 4 (numeric: 0, symbolic: 0, binary: 0)
Number of instances: 193,279
Number of instances with missing values: 29,954
Number of missing values: 29,954
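The counts above can be turned into proportions with a few lines of arithmetic. The sketch below uses only the figures listed on this page; the variable names are illustrative. Because the number of instances with missing values equals the number of missing values, each affected instance must be missing exactly one of its four fields.

```python
# Figures copied from the dataset characteristics listed on this page.
n_instances = 193_279
n_features = 4
rows_with_missing = 29_954
missing_values = 29_954

# Equal counts mean every affected row is missing exactly one value.
missing_per_affected_row = missing_values / rows_with_missing

# Share of rows that contain a missing value, and share of all cells that are empty.
row_share = rows_with_missing / n_instances
cell_share = missing_values / (n_instances * n_features)

# Rows that would remain after dropping every row with a missing value.
complete_rows = n_instances - rows_with_missing

print(f"missing values per affected row: {missing_per_affected_row:.0f}")
print(f"rows with a missing value: {row_share:.1%}")
print(f"empty cells overall: {cell_share:.2%}")
print(f"complete rows: {complete_rows}")
```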
Content
This is a dataset I started building for my future personal projects, since this kind of data is quite hard to acquire for free and in a short time. I started collecting data on March 21st, 2020 and intend to keep doing so continuously. It contains news extracted from the following sources:
Foxbusiness.com, Youtube.com, Cnet.com, The Verge, Nytimes.com, Rawstory.com, Investors.com, Wreg.com, Reuters, Koin.com, Inc.com, CNBC, Nj.com, Wmtw.com, Nbcdfw.com, Bloomberg, Wowt.com, Bbc.com
Every 20 minutes, a script checks these sources for new headlines and adds them to a database. This CSV file is generated from that database.
I intend to update this dataset every day if I can (and if the machine I run this script on is up).
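The collection process described above (poll the sources, keep only headlines not seen before, export the database to CSV) can be sketched roughly as follows. This is not the author's script. The table layout, the column names `source`, `title` and `url`, and the `fetch` callable are all assumptions made for illustration, and SQLite stands in for whatever database is actually used.

```python
import csv
import sqlite3
import time
from typing import Callable, Iterable, Tuple

# Hypothetical record shape: (source, title, url). The real columns may differ.
Headline = Tuple[str, str, str]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the headlines table; the UNIQUE constraint is what de-duplicates."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS headlines (
               source TEXT NOT NULL,
               title  TEXT NOT NULL,
               url    TEXT,
               UNIQUE (source, title)
           )"""
    )
    conn.commit()


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM headlines").fetchone()[0]


def store_new(conn: sqlite3.Connection, headlines: Iterable[Headline]) -> int:
    """Insert headlines, silently skipping ones already stored. Returns how many were new."""
    before = count_rows(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO headlines (source, title, url) VALUES (?, ?, ?)",
        list(headlines),
    )
    conn.commit()
    return count_rows(conn) - before


def export_csv(conn: sqlite3.Connection, out) -> None:
    """Write every stored headline to a file-like object as CSV, header first."""
    writer = csv.writer(out)
    writer.writerow(["source", "title", "url"])
    writer.writerows(
        conn.execute("SELECT source, title, url FROM headlines ORDER BY rowid")
    )


def poll_forever(
    conn: sqlite3.Connection,
    fetch: Callable[[], Iterable[Headline]],
    interval_seconds: int = 20 * 60,  # the 20-minute interval mentioned above
) -> None:
    """Call the (hypothetical) fetch function on a fixed interval and store new headlines."""
    while True:
        store_new(conn, fetch())
        time.sleep(interval_seconds)
```

With this layout, a headline that is still on a source's front page at the next check is ignored rather than stored twice, so the exported CSV grows only when something new appears.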