porto-seguro
OpenML dataset with id 42742
Author name not available
Full work available at URL: https://api.openml.org/data/v1/download/22044816/porto-seguro.arff
Upload date: 3 December 2020
Dataset Characteristics
Number of classes: 2
Number of features: 58 (numeric: 26, symbolic: 32; binary in total: 24)
Number of instances: 595,212
Number of instances with missing values: 470,281
Number of missing values: 846,458
Training dataset of the 'Porto Seguro's Safe Driver Prediction' Kaggle challenge [1]. The goal was to predict whether a driver will file an insurance claim next year. The official rules of the challenge explicitly state that the data may be used for 'academic research and education, and other non-commercial purposes' [2]. For a description of all variables, check out the Kaggle dataset repository [3]. It states that numeric features with integer values whose variable names contain neither 'bin' nor 'cat' are in fact ordinal features, which could be treated as ordinal factors in R. For further information on effective preprocessing and feature engineering, check out the 'Kernels' section of the Kaggle challenge website [4]. Note that many Kagglers removed all 'calc' variables, as they do not seem to carry much information.
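The naming convention described above can be turned into a small column-classification helper. The following is a minimal plain-Python sketch, not part of the dataset or its tooling. It assumes that column names carry underscore-separated 'bin', 'cat' and 'calc' tokens as in the Kaggle data (e.g. ps_ind_06_bin), and that missing entries show up as None or NaN once loaded. The helper names feature_kind and select_features are hypothetical, and the toy values are made up for illustration.

```python
import math
from typing import Dict, Iterable, List, Optional

def feature_kind(name: str, values: Iterable[Optional[float]]) -> str:
    """Classify a column using the naming convention (hypothetical helper)."""
    # Assumption: 'bin'/'cat' appear as underscore-separated tokens in the name.
    tokens = name.split("_")
    if "bin" in tokens:
        return "binary"
    if "cat" in tokens:
        return "categorical"
    # Ignore missing entries (None or NaN) when inspecting the values.
    observed = [v for v in values if v is not None and not math.isnan(v)]
    # Integer-valued numeric columns without 'bin'/'cat' are treated as ordinal.
    if observed and all(float(v).is_integer() for v in observed):
        return "ordinal"
    # Non-integer (or entirely missing) columns fall back to continuous.
    return "continuous"

def select_features(columns: Dict[str, List[Optional[float]]],
                    drop_calc: bool = True) -> Dict[str, str]:
    """Map column name -> kind, optionally dropping all 'calc' variables."""
    return {
        name: feature_kind(name, vals)
        for name, vals in columns.items()
        if not (drop_calc and "calc" in name.split("_"))
    }

# Toy columns with made-up values, named after the Kaggle convention.
columns = {
    "ps_ind_01": [2, 1, 5],
    "ps_ind_02_cat": [2, 1, None],
    "ps_ind_06_bin": [0, 1, 0],
    "ps_reg_03": [0.72, float("nan"), 0.62],
    "ps_calc_01": [0.6, 0.3, 0.5],
}

print(select_features(columns))
# {'ps_ind_01': 'ordinal', 'ps_ind_02_cat': 'categorical', 'ps_ind_06_bin': 'binary', 'ps_reg_03': 'continuous'}
```

Checking the name tokens before the values means a column such as ps_calc_15_bin is still labelled binary when 'calc' variables are kept (drop_calc=False); whether to drop them is left as a switch because the source only reports that many Kagglers did so.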