Spatiotemporal Exposure Prediction with Penalized Regression

Journal of Agricultural, Biological and Environmental Statistics

Abstract

Exposure to ambient air pollution is a global health burden, and assessing its relationship to health effects requires predicting pollutant concentrations over space and time. We propose a spatiotemporal penalized regression model that provides high predictive accuracy and greater computational speed than competing approaches. The model combines a penalty that guards against overfitting with a time-smoothing penalty, which allows accurate predictions even when the data have large amounts of temporal missingness. In simulations, our model performs similarly to spatial-only and spatiotemporal universal kriging models under most conditions and can outperform them when temporal missingness is high. As the number of spatial locations in a data set increases, the computation time of our penalized regression model scales better than that of either comparison method. We demonstrate our model using total particulate matter mass (\(\mathrm{PM}_{2.5}\) and \(\mathrm{PM}_{10}\)) and sulfate and silicon component concentrations. For total mass, our model has lower cross-validated RMSE than the spatial-only universal kriging method, but not the spatiotemporal version. For the component concentrations, which are observed less frequently, our model outperforms both of the other approaches, showing 15% and 13% improvements over the spatiotemporal universal kriging method for sulfate and silicon, respectively. The computational speed of our model also allows the use of the nonparametric bootstrap for measurement error correction, a valuable tool in two-stage health effects models. Supplementary materials accompanying this paper appear online.
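To make the idea of combining an overfitting penalty with a time-smoothing penalty concrete, the following is a minimal illustrative sketch and not the model from the paper, whose exact form is not given in the abstract. It assumes time-varying regression coefficients, a ridge penalty standing in for the overfitting penalty, and a squared first-difference penalty across adjacent time points. The function name `fit_time_smoothed_ridge` and the parameters `lam_ridge` and `lam_time` are hypothetical. Missing observations are coded as NaN and simply dropped from the data-fit term, so time points with few or no observations borrow strength from their neighbours.

```python
import numpy as np


def fit_time_smoothed_ridge(X, Y, lam_ridge, lam_time):
    """Illustrative penalized least squares with time-varying coefficients.

    X: (n_sites, p) spatial covariates, shared across time.
    Y: (n_sites, n_times) observations, NaN where missing.
    Minimizes, over coefficient vectors beta_1..beta_T:
        sum over observed (s, t) of (Y[s, t] - X[s] @ beta_t)^2
        + lam_ridge * sum_t ||beta_t||^2            (overfitting penalty, assumed ridge)
        + lam_time  * sum_t ||beta_t - beta_{t-1}||^2  (time-smoothing penalty)
    Returns B with shape (n_times, p).
    """
    n_sites, p = X.shape
    n_times = Y.shape[1]

    # First-difference operator across time, shape (n_times - 1, n_times).
    D = np.diff(np.eye(n_times), axis=0)

    # Penalty matrix for the stacked (time-major) coefficient vector.
    A = lam_time * np.kron(D.T @ D, np.eye(p)) + lam_ridge * np.eye(n_times * p)
    b = np.zeros(n_times * p)

    # Add the data-fit normal equations for each time point,
    # using only the sites observed at that time.
    for t in range(n_times):
        obs = ~np.isnan(Y[:, t])
        Xt = X[obs]
        block = slice(t * p, (t + 1) * p)
        A[block, block] += Xt.T @ Xt
        b[block] += Xt.T @ Y[obs, t]

    # With lam_ridge > 0 the system is positive definite, even when
    # some time points have no observations at all.
    return np.linalg.solve(A, b).reshape(n_times, p)
```

Predictions at a new location with covariate vector `x_new` at time `t` would then be `x_new @ B[t]`. Because the objective is quadratic, the fit reduces to a single linear solve, which is one reason penalized regression of this general kind can be fast; the paper's own computational approach may differ.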

Acknowledgements

This work utilized resources from the University of Colorado Boulder Research Computing Group, which is supported by the National Science Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder, and Colorado State University.

Author information

Corresponding author

Correspondence to Nathan A. Ryder.

About this article

Cite this article

Ryder, N.A., Keller, J.P. Spatiotemporal Exposure Prediction with Penalized Regression. JABES 28, 260–278 (2023). https://doi.org/10.1007/s13253-022-00523-0

  • DOI: https://doi.org/10.1007/s13253-022-00523-0
