Abstract
Exposure to ambient air pollution is a global health burden, and assessing its relationship to health effects requires predicting ambient pollutant concentrations over time and space. We propose a spatiotemporal penalized regression model that provides high predictive accuracy and faster computation than competing approaches. The model uses a penalty against overfitting and a time-smoothing penalty, which together provide accurate predictions even when much of the data is missing over time. In simulations comparing it with spatial-only and spatiotemporal universal kriging models, our model performs similarly under most conditions and can outperform the others when temporal missingness is high. As the number of spatial locations in a data set increases, the computation time of our penalized regression model scales better than that of either comparison method. We demonstrate our model using total particulate matter mass (\(\mathrm{PM}_{2.5}\) and \(\mathrm{PM}_{10}\)) and using concentrations of two components, sulfate and silicon. For total mass, our model has lower cross-validated RMSE than the spatial-only universal kriging model, but not lower than the spatiotemporal version. For the component concentrations, which are observed less frequently, our model outperforms both other approaches, with improvements of 15% for sulfate and 13% for silicon over the spatiotemporal universal kriging model. The computational speed of our model also makes it practical to use the nonparametric bootstrap for measurement error correction, a valuable tool in two-stage health effects models. Supplementary materials accompanying this paper appear online.
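The abstract describes the model only at a high level. As a hedged illustration of how a penalty against overfitting and a time-smoothing penalty can work together when some time points are unobserved, the sketch below fits one covariate coefficient vector per time point by penalized least squares. This is not the authors' model: the function names `fit_time_smoothed_ridge` and `predict`, the ridge-type shrinkage penalty, the first-difference smoothing penalty, the assumption of time-invariant site covariates, and the penalty weights are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_time_smoothed_ridge(X, Y, lam_ridge, lam_time):
    """Hypothetical sketch: one coefficient vector per time point.

    X : (n_sites, p) array of site covariates (assumed fixed over time).
    Y : (n_sites, n_times) array of observations; np.nan marks missing values.

    Minimizes, over beta_0, ..., beta_{T-1}:
        sum over observed (s, t) of (Y[s, t] - X[s] @ beta_t) ** 2
        + lam_ridge * sum_t ||beta_t||^2               # shrinkage against overfitting
        + lam_time  * sum_t ||beta_{t+1} - beta_t||^2  # smoothness over time
    Requires lam_ridge > 0 so the system stays solvable when a time point
    has no observations at all.
    """
    n_sites, p = X.shape
    n_times = Y.shape[1]
    dim = p * n_times
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    # Data-fit term: only observed entries contribute, block by block.
    for t in range(n_times):
        obs = ~np.isnan(Y[:, t])
        Xt = X[obs]
        blk = slice(t * p, (t + 1) * p)
        A[blk, blk] += Xt.T @ Xt
        b[blk] += Xt.T @ Y[obs, t]
    # Ridge penalty on every coefficient.
    A += lam_ridge * np.eye(dim)
    # First-difference operator on the stacked coefficients:
    # row block t of D @ beta equals beta_{t+1} - beta_t.
    D = np.kron(np.diff(np.eye(n_times), axis=0), np.eye(p))
    A += lam_time * D.T @ D
    # Normal equations of the penalized objective.
    beta = np.linalg.solve(A, b)
    return beta.reshape(n_times, p)


def predict(X_new, beta):
    """Predict at new sites for every time point: returns (n_new, n_times)."""
    return X_new @ beta.T


# Usage on simulated data with one time point entirely unobserved.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
true_beta = np.cumsum(rng.normal(scale=0.1, size=(6, 3)), axis=0) + 1.0
Y = X @ true_beta.T + rng.normal(scale=0.1, size=(40, 6))
Y[:, 3] = np.nan  # time 3 is completely missing

beta_ridge = fit_time_smoothed_ridge(X, Y, lam_ridge=1.0, lam_time=0.0)
beta_smooth = fit_time_smoothed_ridge(X, Y, lam_ridge=1.0, lam_time=50.0)
print(np.round(beta_ridge[3], 3))   # ridge alone shrinks the missing time to zero
print(np.round(beta_smooth[3], 3))  # smoothing borrows strength from times 2 and 4
```

With the ridge penalty alone, the coefficients at the fully missing time point carry no information and collapse to zero; adding the time-smoothing penalty pulls them toward the neighboring time points instead. The dense closed-form solve keeps the example short; for many sites and time points, a sparse or iterative solver would likely be the more practical choice.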
Acknowledgements
This work utilized resources from the University of Colorado Boulder Research Computing Group, which is supported by the National Science Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder, and Colorado State University.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ryder, N.A., Keller, J.P. Spatiotemporal Exposure Prediction with Penalized Regression. JABES 28, 260–278 (2023). https://doi.org/10.1007/s13253-022-00523-0