DutchTwitterDataset - MaRDI portal

Dataset:6037930



OpenML ID: 45108
MaRDI QID: Q6037930

OpenML dataset with id 45108

-, Nicky van der Linden

Full work available at URL: https://api.openml.org/data/v1/download/22115756/DutchTwitterDataset.arff

Upload date: 12 April 2023
Copyright license: CC0



Dataset Characteristics

Number of classes: 0
Number of features: 20 (numeric: 19, symbolic: 0, binary in total: 0)
Number of instances: 451,200
Number of instances with missing values: 0
Number of missing values: 0

Context

A collection of tweets (in Dutch) and features, gathered in April 2022 using the Twitter API.

A small portion of the tweets are annotated by volunteer annotators.

The main task is to identify which of the tweets are rumours, based on the features and the labelled examples in the dataset.
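
As a minimal sketch of how the labelled examples might be separated from the unannotated tweets before any modelling, the snippet below assumes the label encoding listed under Content (1 for Rumour, 0 for Non-Rumour, -1 for unannotated). The helper name split_by_annotation and the toy rows are hypothetical and are not part of the dataset.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_annotation(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate annotated rows (label 0 or 1) from unannotated rows (label -1)."""
    labelled = [row for row in rows if row["label"] in (0, 1)]
    unlabelled = [row for row in rows if row["label"] == -1]
    return labelled, unlabelled


# Hypothetical toy rows; real rows would be read from the ARFF file.
rows = [
    {"hashtag": "#jinek", "like_count": 4, "label": 1},
    {"hashtag": "#inflatie", "like_count": 0, "label": 0},
    {"hashtag": "#vleestaks", "like_count": 2, "label": -1},
    {"hashtag": "#jinek", "like_count": 7, "label": -1},
]

labelled, unlabelled = split_by_annotation(rows)
print(len(labelled), "annotated rows,", len(unlabelled), "unannotated rows")
```

Only the annotated rows carry a usable target. The unannotated rows could serve as the pool to be classified, or as input to a semi-supervised method.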

Content

'followers_count': number of users following the account.
'tweet_count': number of tweets by the account.
'question_marks': presence of question marks, 0 or 1.
'verified': whether the account is verified.
'accountlife': how long the account had existed at the time of posting.
'followers_ratio': ratio of the number of users following the account to the number of users the account follows.
'exclamation_marks': presence of exclamation marks, 0 or 1.
'capital letters': ratio of capital to lowercase letters.
'retweet_count': number of retweets of the tweet.
'hashtags': presence of the hashtag symbol, 0 or 1.
'following': number of users the account follows.
'text length': length of the text.
'listed_count': number of lists the account is in.
'emoticons': presence of emoticons, 0 or 1.
'like_count': number of likes on the tweet.
'time_after_posting': how long the account existed before posting the tweet.
'activity': how active the account is.
'text': tweet_id.
'hashtag': which Twitter hashtag the tweet was from. One of three: #jinek, #vleestaks, or #inflatie.
'upsample_group': a feature that allows sampling each combination of hashtag and label in equal amounts.
'label': 1 for Rumour, 0 for Non-Rumour, -1 for unannotated.
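
The 'upsample_group' feature is described as a way to sample each combination of hashtag and label in equal amounts. One way this could be done is sketched below, under stated assumptions: each row is a dictionary, 'upsample_group' holds one value per hashtag/label combination (the integer codes in the toy data are invented), and sampling is done with replacement so that small groups can be upsampled. The function name sample_equal_per_group is hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List

Row = Dict[str, object]


def sample_equal_per_group(
    rows: List[Row], per_group: int, key: str = "upsample_group", seed: int = 0
) -> List[Row]:
    """Draw `per_group` rows, with replacement, from every group found under `key`."""
    rng = random.Random(seed)  # seeded for reproducibility
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample: List[Row] = []
    for group_key in sorted(groups):
        # choices() samples with replacement, so a group smaller than
        # `per_group` is upsampled rather than raising an error.
        sample.extend(rng.choices(groups[group_key], k=per_group))
    return sample


# Hypothetical toy rows with invented group codes 0, 1 and 2.
toy_rows = [
    {"upsample_group": 0, "label": 1},
    {"upsample_group": 0, "label": 1},
    {"upsample_group": 1, "label": 0},
    {"upsample_group": 2, "label": 0},
    {"upsample_group": 2, "label": 0},
    {"upsample_group": 2, "label": 0},
]

balanced = sample_equal_per_group(toy_rows, per_group=3)
counts = Counter(row["upsample_group"] for row in balanced)
print(dict(counts))
```

Sampling with replacement is a design choice made here for simplicity. Drawing without replacement, capped at the size of the smallest group, would avoid duplicated rows at the cost of a smaller sample.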

Acknowledgements: Dr. Peter van der Putten, Dr. Jan N. van Rijn





