Govt.-of-India-Census-2001-District-Wise

OpenML dataset with id 43707

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/22102532/Govt.-of-India-Census-2001-District-Wise.arff

Upload date: 24 March 2022

Dataset Characteristics

Number of features: 82 (numeric: 47, symbolic: 0, binary: 0)
Number of instances: 590
Number of instances with missing values: 582
Number of missing values: 3,219
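
As a rough illustration of how the instance and missing-value counts above can be derived from an ARFF file, here is a minimal sketch using only the Python standard library. It assumes dense (non-sparse) rows, single-quoted strings, and `?` as the missing-value marker, which is the ARFF convention. The function name `missing_value_summary` and the toy rows are hypothetical and are not taken from the real dataset.

```python
import csv
import io


def missing_value_summary(arff_text: str) -> dict:
    """Count instances and missing values ('?') in the @data section of ARFF text.

    Simplified reader: assumes dense rows and single-quoted strings.
    """
    lines = arff_text.splitlines()
    # Locate the @data marker (case-insensitive); row data follows it.
    start = next(i for i, line in enumerate(lines) if line.strip().lower() == "@data")
    # Skip blank lines and ARFF comment lines, which start with '%'.
    data_lines = [
        line
        for line in lines[start + 1:]
        if line.strip() and not line.lstrip().startswith("%")
    ]
    reader = csv.reader(
        io.StringIO("\n".join(data_lines)), quotechar="'", skipinitialspace=True
    )
    rows = list(reader)
    # Number of '?' cells in each row.
    per_row = [sum(1 for cell in row if cell.strip() == "?") for row in rows]
    return {
        "instances": len(rows),
        "instances_with_missing": sum(1 for n in per_row if n > 0),
        "missing_values": sum(per_row),
    }


# Toy ARFF text for illustration only; the values are made up.
sample = """@relation toy
@attribute State string
@attribute District string
@attribute Persons numeric
@data
'AN','District A',?
'TN','District B',10
'UP',?,?
"""

summary = missing_value_summary(sample)
print(summary)
```

Run against the downloaded ARFF file, a reader like this should give figures comparable to those listed above, provided the file follows the assumptions stated in the lead-in.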

Description

Context

The Census of India is a rich database that can tell the stories of over a billion Indians. It is important not only from a research point of view but also commercially, for organizations that want to understand India's complex yet tightly knit heterogeneity. However, no single database on the web combines the district-wise information for all the variables (most include no more than 4-5 of the more than 50 variables). Extracting and using data from the Census of India 2001 is laborious because the data is published as scattered PDFs, one per district. Individual PDFs can be downloaded from http://www.censusindia.gov.in/(S(ogvuk1y2e5sueoyc5eyc0g55))/Tables_Published/Basic_Data_Sheet.aspx.

Content

This database was extracted from the Census of 2001 and covers 590 districts, with around 80 variables each. If the meaning of a variable is unclear, refer to the following PDF: http://censusindia.gov.in/Dist_File/datasheet-2923.pdf

All the extraction work can be found at https://github.com/preetskhalsa97/census2001auto. The final CSV can be found at finalCSV/all.csv.

The trick that automated most of the extraction was that the URLs of all the PDFs are identical except for four digits, which are the respective state and district codes.

A few abbreviations used for states:

AN: Andaman and Nicobar
CG: Chhattisgarh
DD: Daman and Diu
DN_H: Dadra and Nagar Haveli
JK: Jammu and Kashmir
MP: Madhya Pradesh
TN: Tamil Nadu
UP: Uttar Pradesh
WB: West Bengal

A few variables, for clarification:

Growth..19912001: population growth from 1991 to 2001
X0..4 years: people in the age group 0 to 4 years
SC1: Scheduled Caste with the highest population

Inspiration

This is a large dataset that can be used to explore the interplay between education, caste, development, gender and much more. It can explain a lot about India and propel data-driven research.
Happy Number Crunching!
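
The URL pattern mentioned in the description (all PDF URLs identical except for four digits) can be sketched as follows. This is a minimal illustration, not the author's extraction code: the helper name `datasheet_url` is hypothetical, and it assumes the four digits are a two-digit state code followed by a two-digit district code, each zero-padded. That assumption is consistent with the example URL ending in `datasheet-2923.pdf`, but the source does not confirm it.

```python
# URL template taken from the example link in the description.
BASE_URL = "http://censusindia.gov.in/Dist_File/datasheet-{code}.pdf"


def datasheet_url(state_code: int, district_code: int) -> str:
    """Build a district data sheet URL from state and district codes.

    Assumption: the code is the two-digit state code followed by the
    two-digit district code, both zero-padded.
    """
    if not (0 <= state_code <= 99 and 0 <= district_code <= 99):
        raise ValueError("state and district codes must each fit in two digits")
    return BASE_URL.format(code=f"{state_code:02d}{district_code:02d}")


# Reproduces the example URL from the description.
print(datasheet_url(29, 23))
```

Iterating over the valid state and district code pairs would then give the full list of PDFs to download. Which pairs are valid is not stated in the description.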
