U.S.-Pollution-Data - MaRDI portal

U.S.-Pollution-Data

From MaRDI portal
Dataset:6036683



OpenML ID: 43586, MaRDI QID: Q6036683

OpenML dataset with id 43586

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/22102411/U.S.-Pollution-Data.arff

Upload date: 24 March 2022



Dataset Characteristics

Number of features: 29 (numeric: 20, symbolic: 0, binary: 0)
Number of instances: 1,746,661
Number of instances with missing values: 1,309,785
Number of missing values: 1,746,230
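
The last two figures measure different things: one counts rows that contain at least one missing entry, the other counts the missing entries themselves, so a single instance can contribute more than one missing value. The sketch below shows how both could be computed from dense ARFF text, where a missing entry is written as `?`. It is a minimal illustration, not the portal's or OpenML's own procedure; the function name `missing_value_stats` and the attribute names in the toy sample are hypothetical, and sparse ARFF rows and quoted `?` strings are not handled.

```python
import csv


def missing_value_stats(arff_text):
    """Return (instances, instances_with_missing, missing_values).

    Assumes dense ARFF rows, one instance per line, with '?' marking
    a missing entry.
    """
    lines = arff_text.splitlines()
    # The data rows start right after the '@data' marker (case-insensitive).
    start = next(i for i, line in enumerate(lines)
                 if line.strip().lower() == "@data") + 1
    # Skip blank lines and '%' comment lines inside the data section.
    rows = [ln for ln in lines[start:]
            if ln.strip() and not ln.lstrip().startswith("%")]

    n_instances = n_rows_missing = n_missing = 0
    for fields in csv.reader(rows, quotechar="'", skipinitialspace=True):
        missing = sum(1 for f in fields if f.strip() == "?")
        n_instances += 1
        n_missing += missing
        if missing:
            n_rows_missing += 1
    return n_instances, n_rows_missing, n_missing


# Toy sample with made-up attribute names, only to exercise the function.
SAMPLE = """@relation toy
@attribute no2_mean numeric
@attribute so2_aqi numeric
@attribute co_aqi numeric
@data
19.0,?,?
22.5,13,?
17.3,4,6
"""

print(missing_value_stats(SAMPLE))  # (3, 2, 3)
```

In the toy sample, two of the three rows have gaps but three entries are missing in total, which mirrors the relationship between the two counts reported above.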

Context

This dataset deals with pollution in the U.S. Pollution in the U.S. has been well documented by the U.S. EPA, but it is a pain to download all the data and arrange it in a format that interests data scientists. Hence I gathered four major pollutants (nitrogen dioxide, sulphur dioxide, carbon monoxide and ozone) for every day from 2000 to 2016 and placed them neatly in a CSV file.

Content

There are 28 fields in total. The four pollutants (NO2, O3, SO2 and CO) each have 5 specific columns. Observations total over 1.4 million. This kernel provides a good introduction to this dataset! For observations on specific columns, visit the Column Metadata on the Data tab.

Acknowledgements

All the data is scraped from the database of the U.S. EPA: https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html

Inspiration

I did a related project with some of my friends in college, and decided to open source our dataset so that data scientists don't need to re-scrape the U.S. EPA site for historical pollution data.




