DutchTwitterDataset
OpenML dataset with id 45108
Nicky van der Linden
Full work available at URL: https://api.openml.org/data/v1/download/22115756/DutchTwitterDataset.arff
Upload date: 12 April 2023
Copyright license: CC0
Dataset Characteristics
Number of classes: 0
Number of features: 20 (numeric: 19, symbolic: 0, binary: 0)
Number of instances: 451,200
Number of instances with missing values: 0
Number of missing values: 0
Context
A collection of Dutch-language tweets and associated features, gathered in April 2022 using the Twitter API.
A small portion of the tweets are annotated by volunteer annotators.
The main task is to identify which of the tweets are rumours, based on the features and the labelled examples in the dataset.
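Because only part of the data is annotated, a first step for this task is usually to separate the labelled tweets from the unlabelled ones. The following is a minimal sketch of that split, assuming the rows have already been loaded as dictionaries keyed by column name and that `label` uses the encoding listed under Content (1, 0, -1). The function name `split_by_annotation` and the toy rows are hypothetical and not part of the dataset.

```python
# Label encoding as described under Content:
# 1 = rumour, 0 = non-rumour, -1 = unannotated.
RUMOUR, NON_RUMOUR, UNANNOTATED = 1, 0, -1


def split_by_annotation(rows):
    """Return (annotated, unannotated) lists from an iterable of row dicts."""
    annotated, unannotated = [], []
    for row in rows:
        # Go through float first so values such as 1.0 or "1.0" are accepted.
        label = int(float(row["label"]))
        if label == UNANNOTATED:
            unannotated.append(row)
        elif label in (RUMOUR, NON_RUMOUR):
            annotated.append(row)
        else:
            raise ValueError(f"unexpected label: {label!r}")
    return annotated, unannotated


# Toy rows invented for illustration; they are not taken from the dataset.
toy_rows = [
    {"text": 1, "label": 1},
    {"text": 2, "label": -1},
    {"text": 3, "label": 0},
    {"text": 4, "label": -1},
]

annotated, unannotated = split_by_annotation(toy_rows)
print(len(annotated), "annotated,", len(unannotated), "unannotated")
```

The annotated subset can then be used to train or evaluate a classifier, while the unannotated subset remains available for prediction or semi-supervised approaches.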
Content
'followers_count' : Number of users following the account.
'tweet_count' : Number of tweets by the account.
'question_marks' : Presence of question marks, 0 or 1.
'verified' : Whether the account is verified or not.
'accountlife' : How long the account has existed at the time of posting.
'followers_ratio' : Ratio of the number of users following the account to the number of users followed by the account.
'exclamation_marks' : Presence of exclamation marks, 0 or 1.
'capital letters' : Ratio of capital to lowercase letters.
'retweet_count' : Number of retweets on the tweet.
'hashtags' : Presence of the hashtag symbol, 0 or 1.
'following' : Number of users the account follows.
'text length' : Length of the text.
'listed_count' : Number of lists the account is in.
'emoticons' : Presence of emoticons, 0 or 1.
'like_count' : Number of likes on the tweet.
'time_after_posting' : How long the account existed before posting the tweet.
'activity' : How active the account is.
'text' : tweet_id.
'hashtag' : Which Twitter hashtag the tweet was from. One of three: #jinek, #vleestaks, or #inflatie.
'upsample_group' : A feature that allows each combination of hashtag and label to be sampled in equal amounts.
'label' : 1 for Rumour, 0 for Non-Rumour, -1 for unannotated.
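The 'upsample_group' feature is meant for drawing equal amounts from each combination of hashtag and label. A minimal sketch of such balanced sampling follows, under some assumptions: the rows are available as dictionaries keyed by column name, and groups smaller than the requested size are sampled with replacement. The function `balanced_sample`, its parameters, and the integer group codes in the toy rows are hypothetical; the source does not state how 'upsample_group' is encoded.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, per_group, seed=0):
    """Draw `per_group` rows from every upsample_group.

    Groups with fewer than `per_group` rows are sampled with replacement
    (upsampled); larger groups are sampled without replacement.
    """
    rng = random.Random(seed)

    # Bucket the rows by their upsample_group value.
    groups = defaultdict(list)
    for row in rows:
        groups[row["upsample_group"]].append(row)

    sample = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) >= per_group:
            sample.extend(rng.sample(members, per_group))
        else:
            sample.extend(rng.choices(members, k=per_group))
    return sample


# Toy rows invented for illustration; the group codes 0 and 1 are assumed.
toy_rows = [
    {"text": 1, "upsample_group": 0},
    {"text": 2, "upsample_group": 0},
    {"text": 3, "upsample_group": 0},
    {"text": 4, "upsample_group": 1},
]

sample = balanced_sample(toy_rows, per_group=2)
counts = Counter(row["upsample_group"] for row in sample)
print(dict(counts))
```

Fixing the seed keeps the sample reproducible. Sampling with replacement duplicates rows in small groups, so any train/test split should be made before upsampling to avoid the same tweet appearing on both sides.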
Acknowledgements:
Dr. Peter van der Putten
Dr. Jan N. van Rijn