wine-reviews
OpenML dataset with id 41275
Author name not available
Full work available at URL: https://api.openml.org/data/v1/download/20649219/wine-reviews.arff
Upload date: 12 November 2018
Copyright license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
Dataset Characteristics
Number of classes: 0
Number of features: 13 (numeric: 2, symbolic: 6, binary: 0)
Number of instances: 129,971
Number of instances with missing values: 107,584
Number of missing values: 204,752
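The counts above can be turned into rough missingness rates. The following is a minimal sketch that uses only the figures listed on this page; the variable names are illustrative and not part of any OpenML API.

```python
# Figures copied from the dataset characteristics listed above.
n_instances = 129_971
n_instances_with_missing = 107_584
n_missing_values = 204_752

# Share of rows that contain at least one missing value.
share_with_missing = n_instances_with_missing / n_instances

# Average number of missing values per row, over all rows.
mean_missing_per_instance = n_missing_values / n_instances

print(f"rows with a missing value: {share_with_missing:.1%}")
print(f"missing values per row:    {mean_missing_per_instance:.3f}")
```

Roughly four out of five rows have at least one missing value, so dropping incomplete rows would discard most of the data.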
130k wine reviews with variety, location, winery, price, and description. Downloaded from Kaggle [1] on 29.10.2018. The original data was scraped from the WineEnthusiast homepage [2]. The second version of the dataset, scraped on 22.11.2017, was used. The Kaggle dataset was licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) [3]. The variable 'points' (the number of points WineEnthusiast awarded the wine on a scale of 1-100) was selected as the target variable. For a description of all variables, see the Kaggle dataset repo. The variable 'region_2' is ignored by default because it contains a large proportion of missing values. The variable 'designation' is not used by default because its number of factor labels is extremely high relative to the number of observations. The dataset also includes the text-based variables 'description', 'taster_twitter_handle', and 'title' (ignored by default), which could be used to construct additional features. Special characters in text features were removed to allow the upload to the platform. The ID variable from the Kaggle version was removed from the dataset. The factor labels of all nominal features were recoded as integers, because the platform would not accept nominal features with too many or too long labels.
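The default setup described above (target 'points', five variables ignored or unused) can be sketched in plain Python. This is an illustration under assumptions, not the uploader's code: the helper `split_features_target` is hypothetical, the sample rows are made up, and the feature column names 'price', 'variety', and 'winery' are assumed from the description.

```python
# Variables the description says are ignored or not used by default.
IGNORED_BY_DEFAULT = {
    "region_2",
    "designation",
    "description",
    "taster_twitter_handle",
    "title",
}
TARGET = "points"


def split_features_target(rows, target=TARGET, ignored=IGNORED_BY_DEFAULT):
    """Hypothetical helper: split row dicts into feature dicts and target values."""
    features, targets = [], []
    for row in rows:
        targets.append(row[target])
        # Keep every column that is neither the target nor ignored.
        features.append(
            {k: v for k, v in row.items() if k != target and k not in ignored}
        )
    return features, targets


# Made-up rows for illustration only; nominal values are integer codes,
# as in the uploaded dataset. None marks a missing value.
sample_rows = [
    {"points": 87, "price": 15.0, "variety": 3, "winery": 12,
     "region_2": None, "designation": 7, "description": "text",
     "taster_twitter_handle": "handle", "title": "text"},
    {"points": 90, "price": None, "variety": 5, "winery": 40,
     "region_2": 2, "designation": None, "description": "text",
     "taster_twitter_handle": None, "title": "text"},
]

X, y = split_features_target(sample_rows)
print(sorted(X[0]))  # remaining feature names
print(y)             # target values
```

To work with the real data, one option is scikit-learn's `fetch_openml(data_id=41275)`, which requires network access; the same split could then be applied to the downloaded table.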