Tweets-with-keyword-lockdown-in-April-July-2020

OpenML dataset with id 43794

Author name not available

Full work available at URL: https://api.openml.org/data/v1/download/22102619/Tweets-with-keyword-lockdown-in-April-July-2020.arff

Upload date: 24 March 2022

Dataset Characteristics

Number of features: 7 (numeric: 3, symbolic: 0, binary: 0)
Number of instances: 95,488
Number of instances with missing values: 90,899
Number of missing values: 160,244
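The last two counts measure different things: 90,899 is the number of rows with at least one empty cell, while 160,244 counts every empty cell across all columns. A minimal pandas sketch of how counts like these can be computed, assuming the data has already been loaded into a DataFrame. The function name missing_value_summary and the toy rows (including the placeholder values "user_a" and "tag_a", which do not reflect the real Mentions/HashTags format) are hypothetical and are not part of the dataset or of OpenML's own code.

```python
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> dict:
    """Compute row-level and cell-level missing-value counts (hypothetical helper)."""
    return {
        "instances": len(df),
        # rows containing at least one missing cell
        "instances_with_missing": int(df.isna().any(axis=1).sum()),
        # total number of missing cells across all columns
        "missing_values": int(df.isna().sum().sum()),
    }


# Made-up toy rows; placeholder strings stand in for real values.
toy = pd.DataFrame({
    "Text": ["stay home", "lockdown day 5", "masks on"],
    "Mentions": [None, "user_a", None],
    "HashTags": [None, "tag_a", "tag_b"],
})

print(missing_value_summary(toy))
# → {'instances': 3, 'instances_with_missing': 2, 'missing_values': 3}
```

A row counts once toward the first figure no matter how many of its cells are empty, which is why the second figure can be much larger than the first.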

Description

Context

This data was collected for an academic project of mine on sentiment analysis of tweets during the lockdown.

Content

I used the GetOldTweets3 (https://pypi.org/project/GetOldTweets3/) Python 3 library to pull the tweets off Twitter. The tweets range from 1 April 2020 to 1 August 2020, the peak lockdown period in India. I removed tweets with duplicate text and NaN values; that was the only cleaning I did on the data.

Total rows of tweets: 95,488

Columns:

Index - Row index (load the file with df = pandas.read_csv("tweets_lockdown.csv", index_col=0))
Text - The text of the tweet
Date - Date and time of the tweet, in datetime format
Retweets - Number of retweets of the tweet
Favorites - Number of favorites (likes) on the tweet
Mentions - Usernames mentioned in the tweet, in format
HashTags - Hashtags present in the tweet, in format
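The index_col=0 hint matters because the first CSV column holds the row index; without it, pandas loads that column as an extra "Unnamed: 0" column. Note that tweets_lockdown.csv refers to the author's CSV release, whereas the OpenML download linked above is an ARFF file. The sketch below assumes a header with an empty first field followed by the six named columns, and a Date format such as 2020-04-01 09:30:00. SAMPLE_CSV and load_tweets are hypothetical, and the rows are made up.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt in the column layout described above:
# an unnamed index column, then the six named columns. Values are made up,
# and Mentions/HashTags are left empty here.
SAMPLE_CSV = """,Text,Date,Retweets,Favorites,Mentions,HashTags
0,stay home stay safe,2020-04-01 09:30:00,12,40,,
1,day 20 of lockdown,2020-04-21 18:05:00,0,3,,
"""


def load_tweets(source) -> pd.DataFrame:
    # index_col=0 makes the first column the DataFrame index instead of
    # loading it as an extra "Unnamed: 0" column; parse_dates converts
    # the Date strings to datetime64 values.
    return pd.read_csv(source, index_col=0, parse_dates=["Date"])


df = load_tweets(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

For comparison, pd.read_csv(io.StringIO(SAMPLE_CSV)) without index_col returns seven columns, the first named "Unnamed: 0".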

The "Top Tweets" option was turned off while scraping.

Inspiration

Twitter data offers plenty of scope for data cleaning, text preprocessing, association rule mining, sentiment analysis, and more.
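As a starting point for the data cleaning mentioned above, the cleaning step described earlier (removing tweets with duplicate text and NaN values) can be sketched in pandas. This is an assumption about how it might be done, not the author's code. In particular, the sketch drops missing values only in the Text column, since the published data still has missing values in most rows. clean_tweets and the toy rows are hypothetical.

```python
import pandas as pd


def clean_tweets(df: pd.DataFrame) -> pd.DataFrame:
    """Drop tweets with missing text, then tweets whose text is a duplicate.

    Assumption: NaN removal applies to the Text column only; other columns
    (e.g. Mentions, HashTags) may legitimately stay empty.
    """
    cleaned = df.dropna(subset=["Text"])
    # Keep the first occurrence of each distinct tweet text.
    return cleaned.drop_duplicates(subset=["Text"], keep="first")


# Made-up toy rows: one duplicate text and one missing text.
raw = pd.DataFrame({
    "Text": ["lockdown extended", "lockdown extended", None, "day 10 indoors"],
    "Retweets": [3, 1, 0, 7],
})

print(clean_tweets(raw))
```

Dropping missing text before deduplicating keeps all empty-text rows out of the comparison, so they are never treated as duplicates of one another.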
