porto-seguro - MaRDI portal

Dataset:6035887



OpenML: 42206, MaRDI QID: Q6035887

OpenML dataset with id 42206

Author name not available

Full work available at URL: https://api.openml.org/data/v1/download/21770028/porto-seguro.arff

Upload date: 4 December 2019



Dataset Characteristics

Number of classes: 2
Number of features: 38 (numeric: 12, symbolic: 26; binary in total: 18)
Number of instances: 595,212
Number of instances with missing values: 470,281
Number of missing values: 846,458

Training dataset of the 'Porto Seguro's Safe Driver Prediction' Kaggle challenge [1]. The goal was to predict whether a driver will file an insurance claim in the next year. The official rules of the challenge explicitly state that the data may be used for 'academic research and education, and other non-commercial purposes' [2]. For a description of all variables, see the Kaggle dataset repository [3]. It states that numeric features with integer values whose variable names do not contain 'bin' or 'cat' are in fact ordinal features, which could be treated as ordinal factors in R. For further information on effective preprocessing and feature engineering, see the 'Kernels' section of the Kaggle challenge website [4]. For this version we removed all 'calc' variables, as the Kaggle forum indicates that they do not carry much information.
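
The naming rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names and values below are made-up examples in the style of the Kaggle data, not rows from the dataset; the helper names (`classify_feature`, `drop_calc`) are hypothetical; and missing entries are represented as `None`, which may differ from how a given loader encodes them.

```python
def is_integer_valued(values):
    """Return True if every non-missing value is a whole number."""
    present = [v for v in values if v is not None]
    return bool(present) and all(float(v).is_integer() for v in present)


def classify_feature(name, values):
    """Classify one column using the naming rule from the description.

    'bin' in the name -> binary, 'cat' in the name -> categorical,
    otherwise integer-valued columns are treated as ordinal and the
    rest as continuous.
    """
    if "bin" in name:
        return "binary"
    if "cat" in name:
        return "categorical"
    if is_integer_valued(values):
        return "ordinal"
    return "continuous"


def drop_calc(names):
    """Drop the 'calc' variables, mirroring what was done for this version."""
    return [n for n in names if "calc" not in n]


# Illustrative columns only (assumed names and values, not real data).
columns = {
    "ps_ind_01": [2, 1, 5, 0],
    "ps_ind_04_cat": [1, 0, None, 1],
    "ps_ind_06_bin": [0, 1, 0, 0],
    "ps_reg_03": [0.718, 0.766, None, 0.581],
    "ps_calc_01": [0.6, 0.3, 0.5, 0.6],
}

kept = drop_calc(columns)
kinds = {name: classify_feature(name, columns[name]) for name in kept}
```

With these example columns, `kinds` maps `ps_ind_01` to `"ordinal"`, `ps_ind_04_cat` to `"categorical"`, `ps_ind_06_bin` to `"binary"`, and `ps_reg_03` to `"continuous"`, while `ps_calc_01` is dropped. The substring check follows the wording of the description literally; a stricter match on the name suffix would also be reasonable.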





