gender-by-name
OpenML dataset with id 42996
Full work available at URL: https://api.openml.org/data/v1/download/22045668/gender-by-name.arff
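The file at that URL is in ARFF format. The reader below is a minimal sketch that uses only the standard library. It collects attribute names from the `@ATTRIBUTE` lines and splits the rows that follow `@DATA`. It assumes that attribute names contain no spaces, that string values are single-quoted without backslash escapes, and that rows are dense, not sparse. `read_arff_rows` is a hypothetical helper. The embedded sample uses made-up values and assumed attribute types, not the real file header.

```python
import csv


def read_arff_rows(text: str) -> tuple[list[str], list[list[str]]]:
    """Return (attribute names, data rows) from a simple dense ARFF document.

    Sketch only: assumes attribute names contain no spaces, string values
    are single-quoted without backslash escapes, and rows are not sparse.
    """
    names: list[str] = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("@attribute"):
            # "@ATTRIBUTE Name STRING" -> "Name" (surrounding quotes removed)
            names.append(stripped.split()[1].strip("'\""))
        elif lowered.startswith("@data"):
            # Keep non-blank, non-comment ("%") lines after @DATA.
            body = [ln for ln in lines[i + 1:] if ln.strip() and not ln.lstrip().startswith("%")]
            rows = list(csv.reader(body, quotechar="'", skipinitialspace=True))
            return names, rows
    return names, []  # no @DATA section found


# Made-up rows; the attribute types in this header are illustrative only.
SAMPLE = """@RELATION gender-by-name
@ATTRIBUTE Name STRING
@ATTRIBUTE Gender STRING
@ATTRIBUTE Count NUMERIC
@ATTRIBUTE Probability NUMERIC
@DATA
'Alex',M,10,0.5
'Alex',F,10,0.5
"""

names, rows = read_arff_rows(SAMPLE)
print(names)    # ['Name', 'Gender', 'Count', 'Probability']
print(rows[0])  # ['Alex', 'M', '10', '0.5']
```

All values come back as strings, so numeric columns still need converting before use.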
Upload date: 28 May 2021
Dataset Characteristics
Number of features: 4 (numeric: 2, symbolic: 0, binary: 0)
Number of instances: 147,269
Number of instances with missing values: 0
Number of missing values: 0
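The characteristics above can be recomputed from parsed rows as a sanity check. This sketch assumes that each row is a list of strings in attribute order. It treats ARFF's `?` marker as missing and, as an additional assumption, empty strings as well. A column counts as numeric when every non-missing value parses as a float. The function names and the sample rows are hypothetical.

```python
from typing import Sequence

# ARFF writes a missing value as "?"; counting "" as missing is an assumption.
MISSING = {"?", ""}


def is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def characteristics(rows: Sequence[Sequence[str]]) -> dict[str, int]:
    """Recompute OpenML-style summary counts from string rows (hypothetical helper)."""
    missing_per_row = [sum(1 for v in row if v.strip() in MISSING) for row in rows]
    # A column is numeric if every non-missing value in it parses as a float.
    numeric = sum(
        1
        for column in zip(*rows)
        if all(is_number(v) for v in column if v.strip() not in MISSING)
    )
    return {
        "features": len(rows[0]) if rows else 0,
        "numeric_features": numeric,
        "instances": len(rows),
        "instances_with_missing": sum(1 for m in missing_per_row if m > 0),
        "missing_values": sum(missing_per_row),
    }


# Made-up rows; the second one has a missing Gender value.
sample = [["Alex", "M", "10", "0.5"], ["Sam", "?", "3", "0.2"]]
print(characteristics(sample))
```

On the real data, the expected results from the listing above are 147,269 instances and zero missing values.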
Author: Arun Rao
Source: UCI - 2020
Please cite: UCI
Gender by Name Data Set
This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
This dataset combines the raw counts of first (given) names of male and female babies over the time periods listed below, and then calculates a probability for each name from the aggregate counts. The source datasets come from government authorities:
- US: Baby Names from Social Security Card Applications - National Data, 1880 to 2019
- UK: Baby names in England and Wales Statistical bulletins, 2011 to 2018
- Canada: British Columbia 100 Years of Popular Baby names, 1918 to 2018
- Australia: Popular Baby Names, Attorney-General's Department, 1944 to 2019
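The combination step can be sketched as follows. This sketch assumes that each source yields `(name, gender, count)` tuples and that names are matched as exact strings. It also assumes that a row's probability is its aggregate count divided by the grand total across all rows. The description above does not spell out the probability formula, so that denominator is an assumption. If the column is meant as P(gender | name), the denominator would instead be the total count for that name. `combine_sources` and the sample counts are made up for illustration.

```python
from collections import Counter
from typing import Iterable

Record = tuple[str, str, int]  # (name, gender "M"/"F", count) from one source


def combine_sources(
    sources: dict[str, Iterable[Record]],
) -> list[tuple[str, str, int, float]]:
    """Sum counts per (name, gender) across sources and attach a probability.

    ASSUMPTION: probability = aggregate count / grand total of all counts.
    Names are matched as exact strings (no case or spelling normalization).
    """
    totals: Counter = Counter()
    for records in sources.values():
        for name, gender, count in records:
            totals[(name, gender)] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    rows = [
        (name, gender, count, count / grand_total)
        for (name, gender), count in totals.items()
    ]
    # Most frequent first; ties broken by name, then gender.
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


# Made-up counts for illustration only.
combined = combine_sources({
    "US": [("Alex", "M", 6), ("Alex", "F", 2)],
    "UK": [("Alex", "M", 2), ("Robin", "F", 10)],
})
for row in combined:
    print(row)
```

Under this assumption the probabilities across all rows sum to 1. A `Counter` keyed by `(name, gender)` keeps the merge to a single pass over every source.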
Attribute information
- Name: String
- Gender: M/F (category/string)
- Count: Integer
- Probability: Float
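The attribute list maps naturally onto a typed record. This sketch parses one row of string fields into a hypothetical `NameRecord` class. It then derives the female share of a name from the `Count` column as the F count divided by the combined M and F counts. That share is a derived quantity and is not claimed to be what the `Probability` column stores. The sample rows are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NameRecord:
    """One row of the dataset, typed per the attribute list (hypothetical class)."""

    name: str
    gender: str  # "M" or "F"
    count: int
    probability: float

    @classmethod
    def from_fields(cls, fields: list[str]) -> "NameRecord":
        name, gender, count, probability = fields
        if gender not in ("M", "F"):
            raise ValueError(f"unexpected gender value: {gender!r}")
        # float() first so that a numeric field written as "10.0" still parses.
        return cls(name, gender, int(float(count)), float(probability))


def share_female(records: Iterable[NameRecord], name: str) -> Optional[float]:
    """F count / (M count + F count) for one name.

    Returns None if the name is absent or its total count is zero.
    Derived from the Count column; not necessarily what Probability stores.
    """
    female = total = 0
    for r in records:
        if r.name == name:
            total += r.count
            if r.gender == "F":
                female += r.count
    return female / total if total else None


# Made-up rows for illustration only.
records = [
    NameRecord.from_fields(row)
    for row in (["Alex", "M", "8", "0.4"], ["Alex", "F", "2", "0.1"], ["Robin", "F", "10", "0.5"])
]
print(share_female(records, "Alex"))   # 0.2
print(share_female(records, "Robin"))  # 1.0
```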