Capitol-Riot-Tweets
OpenML dataset with id 43353
No author found.
Full work available at URL: https://api.openml.org/data/v1/download/22102178/Capitol-Riot-Tweets.arff
Upload date: 23 March 2022
Dataset Characteristics
Number of features: 14 (numeric: 8, symbolic: 0, binary: 0)
Number of instances: 82,309
Number of instances with missing values: 82,296
Number of missing values: 392,323
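The two missing-value figures above measure different things: the first counts rows that have at least one empty field, the second counts empty cells across the whole table. A minimal sketch of the distinction follows, using a tiny made-up table. The column names and values are hypothetical and are not taken from the dataset.

```python
# Toy rows; None marks a missing value. Column names are illustrative only.
rows = [
    {"text": "example tweet", "coordinates": None, "user_location": None},
    {"text": "another tweet", "coordinates": None, "user_location": "Ohio"},
    {"text": "third tweet", "coordinates": "(38.9, -77.0)", "user_location": "DC"},
]


def missing_summary(rows):
    """Return (rows with at least one missing value, total missing cells)."""
    per_row = [sum(value is None for value in row.values()) for row in rows]
    rows_with_missing = sum(count > 0 for count in per_row)
    return rows_with_missing, sum(per_row)


rows_with_missing, missing_cells = missing_summary(rows)
print(rows_with_missing, missing_cells)  # 2 3
```

Read against the figures above, 82,309 instances minus 82,296 with missing values leaves only 13 fully populated rows, and 392,323 missing values averages to roughly 4.8 empty fields per row out of 14.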
A CSV file with 80,000+ tweets from January 6th, 2021, the day of the Capitol Hill riots. Made using the Twitter Developer API and Tweepy. Nowhere close to the size of the Parler data dumps, but anyone with NLP experience might be able to find something useful here.
Tweets have mentions, hyperlinks, emojis, and punctuation removed. All text is converted to lowercase.
Some tweets have coordinates (if users had geotagging enabled).
Verified users have their usernames included.
"user location" is the user's self-reported location in their profile. It is blank if it doesn't correspond to a US state (or DC).
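The preprocessing described above (stripping mentions, hyperlinks, emojis, and punctuation, lowercasing, and blanking non-US locations) could be approximated as follows. This is only a sketch under stated assumptions, not the uploader's actual code: the regular expressions, the ASCII-only emoji filter, the exact-match location lookup, and the names `clean_tweet` and `normalize_location` are all hypothetical.

```python
import re
import string

MENTION = re.compile(r"@\w+")        # assumed pattern for @mentions
URL = re.compile(r"https?://\S+")    # assumed pattern for hyperlinks
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

# Abbreviated lookup for illustration; a full version would cover
# all 50 states plus DC and their common spellings.
US_STATES = {
    "ohio": "Ohio",
    "texas": "Texas",
    "dc": "DC",
    "washington dc": "DC",
}


def clean_tweet(text):
    """Remove links, mentions, emojis, and punctuation, then lowercase."""
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    # Crude emoji removal (assumption): drop every non-ASCII character.
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(PUNCT_TABLE)
    # Lowercase and collapse the whitespace left behind by the removals.
    return " ".join(text.lower().split())


def normalize_location(raw):
    """Return a state name (or DC) on an exact match, otherwise blank."""
    key = raw.strip().lower().replace(".", "").replace(",", "")
    return US_STATES.get(key, "")


print(clean_tweet("Hello @user, LOOK at this!! https://t.co/abc \U0001F600"))
# hello look at this
print(normalize_location("Washington, D.C."))  # DC
print(repr(normalize_location("London")))      # ''
```

Dropping all non-ASCII characters also discards accented letters and non-Latin scripts, so a real pipeline would likely want a narrower emoji filter. Because `#` counts as punctuation here, hashtags lose their marker but keep their word.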