Tweets-with-keyword-lockdown-in-April-July-2020
OpenML dataset with id 43794
Author name not available
Full work available at URL: https://api.openml.org/data/v1/download/22102619/Tweets-with-keyword-lockdown-in-April-July-2020.arff
Upload date: 24 March 2022
Dataset Characteristics
Number of features: 7 (numeric: 3, symbolic: 0, binary: 0)
Number of instances: 95,488
Number of instances with missing values: 90,899
Number of missing values: 160,244
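The three counts above can be recomputed from a loaded table. The following is a minimal sketch, assuming the data is already in a pandas DataFrame; the helper name `missing_value_summary` and the toy rows are hypothetical and only illustrate the calculation, they are not taken from the dataset.

```python
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> dict:
    """Count instances, instances with any missing value, and missing cells."""
    missing = df.isna()
    return {
        "instances": len(df),
        # A row counts once if at least one of its cells is missing.
        "instances_with_missing": int(missing.any(axis=1).sum()),
        # Every missing cell counts individually.
        "missing_values": int(missing.sum().sum()),
    }


# Toy frame standing in for the real data (made-up values, for illustration only).
toy = pd.DataFrame(
    {
        "Text": ["stay home", "lockdown day 3", "hello", "quiet streets"],
        "Mentions": [None, "@someone", None, "@a"],
        "HashTags": [None, None, "#lockdown", "#b"],
    }
)
summary = missing_value_summary(toy)
print(summary)
```

A row with several empty cells adds one to the instance count but several to the cell count, which is why the number of missing values can exceed the number of instances with missing values.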
Context
This data was collected for an academic project of mine on sentiment analysis of tweets during lockdown.
Content
I used the GetOldTweets3 (https://pypi.org/project/GetOldTweets3/) Python 3 library to pull the tweets off Twitter. The tweets range from 1 April 2020 to 1 August 2020, which was the peak lockdown period in India. I removed tweets with duplicate text and NaN values; that was the only cleaning I did on the data.
Total rows of tweets: 95488
Columns:
Index (be sure to use df = pandas.read_csv("tweets_lockdown.csv", index_col=0))
Text - The text of the tweet
Date - Date and time of tweet in datetime format
Retweets - Number of retweets for the tweet
Favorites - Favorites on the tweet
Mentions - Usernames mentioned in the tweets in format
HashTags - Hashtags present in the tweet in format
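The loading step recommended above can be sketched as follows. This is only an illustration: the column headers are assumed to match the names in the column list, the in-memory sample rows are invented, and `load_tweets` is a hypothetical helper rather than part of the dataset.

```python
import io

import pandas as pd


def load_tweets(source) -> pd.DataFrame:
    """Read the tweets CSV, using the first column as the index."""
    df = pd.read_csv(source, index_col=0)
    # Assumption: the date column is named "Date"; unparseable values become NaT.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    return df


# Invented two-row sample with the assumed header, standing in for the real file.
sample_csv = io.StringIO(
    ",Text,Date,Retweets,Favorites,Mentions,HashTags\n"
    "0,Stay safe everyone,2020-04-01 10:15:00,3,12,,#lockdown\n"
    "1,Day 40 at home,2020-05-10 18:30:00,0,1,@friend,\n"
)
tweets = load_tweets(sample_csv)
print(tweets.dtypes)
```

With the real file, the call would be `load_tweets("tweets_lockdown.csv")`. Empty Mentions or HashTags cells are read as NaN.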
"Top Tweets" attribute was turned off while scraping.
Inspiration
Twitter data gives us a lot of scope for data cleaning, text preprocessing, association rule mining, sentiment analysis and so on.
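As one possible starting point for the text preprocessing mentioned above, here is a minimal sketch using only the Python standard library. The regular expressions and the `preprocess_tweet` helper are illustrative assumptions, not the cleaning applied by the dataset author.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess_tweet(text: str) -> dict:
    """Strip URLs and mentions, collect hashtags, and return lowercase tokens."""
    # Remove URLs first so that URL fragments are not mistaken for hashtags.
    cleaned = URL_RE.sub(" ", text)
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(cleaned)]
    cleaned = MENTION_RE.sub(" ", cleaned)
    # Keep the hashtag word itself as a token, dropping only the "#" sign.
    cleaned = cleaned.replace("#", "")
    # Replace anything that is not a lowercase letter, digit, or whitespace.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", cleaned.lower())
    return {"tokens": cleaned.split(), "hashtags": hashtags}


example = preprocess_tweet(
    "Day 21 of #Lockdown with @friend! https://example.com/x #StayHome"
)
print(example)
```

The token lists could then feed a sentiment model, and the per-tweet hashtag lists could serve as transactions for association rule mining.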