sf-police-incidents

From MaRDI portal
Dataset:6035910



OpenML: 42344, MaRDI QID: Q6035910

OpenML dataset with id 42344

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/21801023/sf-police-incidents.arff

Upload date: 3 April 2020



Dataset Characteristics

Number of classes: 2
Number of features: 7 (numeric: 1, symbolic: 6, binary in total: 1)
Number of instances: 538,638
Number of instances with missing values: 0
Number of missing values: 0

Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018 from [1]. For a description of all variables, check out the homepage of the data provider. The original data was published under the ODC Public Domain Dedication and Licence (PDDL) [2].

The binary variable 'ViolentCrime' was created as the target. A 'ViolentCrime' was defined as 'Category' %in% c('ASSAULT', 'ROBBERY', 'SEX OFFENSES, FORCIBLE', 'KIDNAPPING') | 'Descript' %in% c('GRAND THEFT PURSESNATCH', 'ATTEMPTED GRAND THEFT PURSESNATCH'). Additional date and time features 'Hour', 'DayOfWeek', 'Month', and 'Year' were created. The original variables 'Category', 'Descript', 'Date', 'Time', 'Resolution', 'Location', and 'PdId' were removed from the dataset. One record, which contained the only missing value in the variable 'PdDistrict', was removed from the dataset. Using this dataset for machine learning was inspired by Nina Zumel's blog post [3].

Note that an incident consists of multiple rows in the dataset when the crime belongs to more than one 'Category'; this is indicated by the ID variable 'IncidntNum' (ignored by default). For this version, the majority class was downsampled to achieve a balanced classification task. Unused factor levels were dropped. The numeric features 'X' and 'Y' were removed to increase the importance of the high-cardinality factor features.
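The target definition and preprocessing described above can be sketched in Python. This is a minimal illustration, not the code used to build the dataset: the function names are hypothetical, the raw date and time formats ("MM/DD/YYYY" and "HH:MM") are assumptions, and the encoding of 'DayOfWeek' and 'Month' in the published dataset may differ. The random downsampling shown is one plausible way to balance the classes; the description does not state which method was used.

```python
import random
from datetime import datetime

# Category and description values taken from the dataset description.
VIOLENT_CATEGORIES = {"ASSAULT", "ROBBERY", "SEX OFFENSES, FORCIBLE", "KIDNAPPING"}
VIOLENT_DESCRIPTIONS = {"GRAND THEFT PURSESNATCH", "ATTEMPTED GRAND THEFT PURSESNATCH"}

# datetime.weekday() returns 0 for Monday and 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def is_violent_crime(category, descript):
    """Python equivalent of the R-style rule: Category %in% ... | Descript %in% ..."""
    return category in VIOLENT_CATEGORIES or descript in VIOLENT_DESCRIPTIONS


def time_features(date_str, time_str):
    """Derive 'Hour', 'DayOfWeek', 'Month', and 'Year' from raw date and time strings.

    Assumption: dates look like "MM/DD/YYYY" and times like "HH:MM".
    """
    ts = datetime.strptime(date_str + " " + time_str, "%m/%d/%Y %H:%M")
    return {
        "Hour": ts.hour,
        "DayOfWeek": DAY_NAMES[ts.weekday()],
        "Month": ts.month,
        "Year": ts.year,
    }


def downsample_majority(rows, label_key="ViolentCrime", seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumption: simple random sampling without replacement.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key]]
    negatives = [r for r in rows if not r[label_key]]
    # The shorter list is the minority class and is kept in full.
    minority, majority = sorted((positives, negatives), key=len)
    return minority + rng.sample(majority, len(minority))
```

Each raw record would be labelled with is_violent_crime, extended with time_features, and the resulting list of dictionaries passed to downsample_majority to obtain a balanced sample.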



