Tracking ground-level pollution is important for both the environment and public health. One pollutant of concern is ozone, which at high concentrations can cause respiratory and cardiovascular problems. The National Center for Atmospheric Research (NCAR) has published ozone data collected by ground-based sensors at selected locations. Because it is infeasible to measure ozone levels everywhere at all times, it is valuable to predict the spatiotemporal distribution of ozone concentration from existing data. Such predictions help reveal patterns and trends in the data and support better decisions for reducing pollution. Motivated by this, the objective of this paper was to build predictive models that capture the spatiotemporal structure of this large ozone data set. The training data comprised ozone measurements from 513 locations in the eastern United States spanning five years. We used a machine-learning method called Gaussian process regression (GPR), with a covariance function that describes the spatiotemporal relationship between data points. With this method, we were able to observe the trends and dynamics of ozone formation, and we created maps to visualize how ozone concentrations vary over space and time. The results demonstrate that GPR with the Matérn covariance function gave a reliable estimate of both the mean ozone concentration and its uncertainty at various locations and times, which helps us better understand the dynamics of ozone formation.
"Machine Learning of Big Data: A Gaussian Regression Model to Predict the Spatiotemporal Distribution of Ground Ozone," The Journal of Purdue Undergraduate Research: Vol. 13, Article 5.
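To make the method named in the abstract concrete, the following is a minimal sketch of Gaussian process regression with a Matérn covariance over space-time inputs. It is not the authors' implementation. The smoothness ν = 3/2, the hand-picked length scales, variance, and noise level, the function names `matern32` and `gp_predict`, and the synthetic "ozone" values are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def matern32(A, B, length_scales, variance=1.0):
    """Matern (nu = 3/2) covariance between the rows of A and B.

    Each row is a hypothetical (longitude, latitude, day) input. Every
    dimension is divided by its own length scale before distances are taken.
    """
    A = np.asarray(A, dtype=float) / length_scales
    B = np.asarray(B, dtype=float) / length_scales
    r = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    s = np.sqrt(3.0) * r
    return variance * (1.0 + s) * np.exp(-s)

def gp_predict(X_train, y_train, X_test, length_scales, variance=1.0, noise=1e-2):
    """Posterior mean and standard deviation of the latent function at X_test."""
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()  # constant mean; the GP models the residual
    K = matern32(X_train, X_train, length_scales, variance)
    K += noise * np.eye(len(y_train))  # observation-noise variance on the diagonal
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = matern32(X_train, X_test, length_scales, variance)
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - (v ** 2).sum(axis=0)  # prior variance minus explained part
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic stand-in data (NOT the NCAR measurements): 40 space-time points.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(-90.0, -70.0, 40),  # longitude, degrees
    rng.uniform(30.0, 45.0, 40),    # latitude, degrees
    rng.uniform(0.0, 30.0, 40),     # time, days
])
y = 40.0 + 10.0 * np.sin(X[:, 2] / 5.0) + rng.normal(0.0, 1.0, 40)

# Assumed hyperparameters, fixed by hand rather than fitted to the data.
length_scales = np.array([5.0, 5.0, 7.0])
X_test = np.vstack([X[0], [0.0, 0.0, 1000.0]])  # one observed point, one far away
mean, std = gp_predict(X, y, X_test, length_scales, variance=100.0, noise=1.0)
print(mean, std)
```

The second test point lies far from every observation in both space and time, so its predictive standard deviation reverts to the prior value, while the point that coincides with an observation has a much smaller one. This is the sense in which GPR reports uncertainty alongside the mean. In practice the hyperparameters would be estimated from the data, for example by maximizing the marginal likelihood, rather than set by hand.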