Quora_Insincere_Questions_2018

From MaRDI portal
Dataset:6036448



OpenML: 43345, MaRDI QID: Q6036448

OpenML dataset with id 43345

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/22102170/Quora_Insincere_Questions_2018.arff

Upload date: 23 March 2022



Dataset Characteristics

Number of features: 4 (numeric: 2, symbolic: 0, binary in total: 0)
Number of instances: 1,306,122
Number of instances with missing values: 1
Number of missing values: 1

Context

This is the preprocessed training data from the 2018 Quora Insincere Questions competition. The original training data was preprocessed to remove stop words, numbers, punctuation, and common words, and was converted to lower case. The resulting data set was then lemmatised and stemmed with the scikit-learn/NLTK library.

Content

It contains approximately 1.3 million rows of Quora questions, with target=0 for sincere questions and target=1 for insincere questions.

Acknowledgements

Thanks to the Co-learning Lounge mentors for helping me work on this problem.

Inspiration

The data set is handy for building ML models in NLP.
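The preprocessing described under Context (lower-casing, removing numbers, punctuation and stop words, then stemming) might look roughly like the following minimal sketch. It is not the author's actual pipeline: the stop-word list and the `naive_stem` helper are hypothetical stand-ins chosen so the example runs with the Python standard library alone, whereas the original reportedly used scikit-learn/NLTK. The "common words" filter and lemmatisation step are omitted.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would use a fuller list from NLTK or scikit-learn.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does",
              "how", "what", "why", "to", "of", "in", "i"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def naive_stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(question):
    """Lower-case, drop digits, punctuation and stop words, then stem."""
    text = question.lower()
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = text.translate(PUNCT_TO_SPACE)  # remove punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(naive_stem(t) for t in tokens)


if __name__ == "__main__":
    print(preprocess("How do 3 planets orbit the Sun?"))
```

Swapping `naive_stem` for an established stemmer or lemmatiser would bring the sketch closer to the description above, although the exact tools and settings used to build this data set are not stated.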



