A flexible spatio-temporal model for air pollution with spatial and spatio-temporal covariates

Environmental and Ecological Statistics

Abstract

The development of models that provide accurate spatio-temporal predictions of ambient air pollution at small spatial scales is of great importance for the assessment of potential health effects of air pollution. Here we present a spatio-temporal framework that predicts ambient air pollution by combining data from several different monitoring networks and deterministic air pollution model(s) with geographic information system covariates. The model presented in this paper has been implemented in an R package, SpatioTemporal, available on CRAN. The model is used by the EPA-funded Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) to produce estimates of ambient air pollution; MESA Air uses the estimates to investigate the relationship between chronic exposure to air pollution and cardiovascular disease. In this paper we use the model to predict long-term average concentrations of \(\text {NO}_{x}\) in the Los Angeles area during a 10-year period. Predictions are based on measurements from the EPA Air Quality System, MESA Air-specific monitoring, and output from a source dispersion model for traffic-related air pollution (Caline3QHCR). Accuracy in predicting long-term average concentrations is evaluated using an elaborate cross-validation setup that accounts for a sparse spatio-temporal sampling pattern in the data, and adjusts for temporal effects. The predictive ability of the model is good, with a cross-validated \(R^2\) of approximately \(0.7\) at subject sites. Replacing four geographic covariate indicators of traffic density with the Caline3QHCR dispersion model output resulted in very similar prediction accuracy from a more parsimonious and more interpretable model. Adding traffic-related geographic covariates to the model that included Caline3QHCR did not further improve the prediction accuracy.

References

  • Appel KW, Bhave PV, Gilliland AB, Sarwar G, Roselle SJ (2008) Evaluation of the community multiscale air quality (CMAQ) model version 4.5: sensitivities impacting model performance; part II—particulate matter. Atmos Environ 42(24):6057–6066

  • Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC, London

  • Banerjee S, Gelfand AE, Finley AO, Sang H (2008) Gaussian predictive process models for large spatial data sets. J R Stat Soc Ser B 70:825–848

  • Basu R, Woodruff TJ, Parker JD, Saulnier L, Schoendorf KC (2000) Particulate air pollution and mortality: findings from 20 U.S. cities. N Engl J Med 343(24):1742–1749

  • Berrocal VJ, Gelfand AE, Holland DM (2010) A spatio-temporal downscaler for output from numerical models. J Agric Bio Environ Stat 15(2):176–197

  • Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, Greenland P, Jacobs DR Jr, Kronmal R, Liu K, Nelson JC, O’Leary D, Saad MF, Shea S, Szklo M, Tracy RP (2002) Multi-ethnic study of atherosclerosis: objectives and design. Am J Epidemiol 156(9):871–881

  • Brauer M, Hoek G, van Vliet P, Meliefste K, Fischer P, Gehring U, Heinrich J, Cyrys J, Bellander T, Lewne M, Brunekreef B (2003) Estimating long-term average particulate air pollution concentrations: application of traffic indicators and geographic information systems. Epidemiology 14(2):228–239

  • Byrd R, Lu P, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16:1190–1208

  • Calder CA (2008) A dynamic process convolution approach to modeling ambient particulate matter concentrations. Environmetrics 19(1):39–48

  • Cameletti M, Lindgren F, Simpson D, Rue H (2013) Spatio-temporal modeling of particulate matter concentration through the SPDE approach. AStA Adv Stat Anal 97(2):109–131

  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective, 2nd edn. Chapman and Hall, CRC, London

  • Cohen MA, Adar SD, Allen RW, Avol E, Curl CL, Gould T, Hardie D, Ho A, Kinney P, Larson TV, Sampson PD, Sheppard L, Stukovsky KD, Swan SS, Liu LJS, Kaufman JD (2009) Approach to estimating participant pollutant exposures in the multi-ethnic study of atherosclerosis and air pollution (MESA air). Environ Sci Technol 43(13):4687–4693

  • Cressie N (1993) Statistics for spatial data, revised edition. Wiley, London

  • Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. J R Stat Soc Ser B 70(1):209–226

  • Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, London

  • De Iaco S, Posa D (2012) Predicting spatio-temporal random fields: some computational aspects. Comput Geosci 41:12–24

  • Dockery DW, Pope CA, Xu X, Spangler JD, Ware JH, Fay ME, Ferris BG, Speizer FE (1993) An association between air pollution and mortality in six cities. N Engl J Med 329(24):1753–1759

  • Eckhoff P, Braverman T (1995) Addendum to the user’s guide to CAL3QHC version 2.0 (CAL3QHCR user’s guide). Technical report, US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC, USA

  • Fanshawe TR, Diggle PJ, Rushton S, Sanderson R, Lurz PWW, Glinianaia SV, Pearce MS, Parker L, Charlton M, Pless-Mulloli T (2008) Modelling spatio-temporal variation in exposure to particulate matter: a two-stage approach. Environmetrics 19(6):549–566

  • Finley AO, Banerjee S, Gelfand AE (2012) Bayesian dynamic modeling for large space-time datasets using gaussian predictive processes. J Geogr Syst 14(1):29–47

  • Fuentes M, Raftery AE (2005) Model evaluation and spatial interpolation by Bayesian combination of observations with outputs from numerical models. Biometrics 61(1):34–45

  • Fuentes M, Guttorp P, Sampson PD (2006) Using transforms to analyze space-time processes. In: Finkenstädt B, Held L, Isham V (eds) Statistical methods for spatio-temporal systems. Chapman & Hall/CRC, London, pp 77–150

  • Furrer R, Genton MG, Nychka D (2006) Covariance tapering for interpolation of large spatial datasets. J Comput Graph Stat 15(3):502–523

  • Gamerman D (2010) Dynamic spatial models including spatial time series. In: Gelfand AE, Diggle P, Guttorp P, Fuentes M (eds) Handbook of spatial statistics. Chapman & Hall/CRC, London, pp 437–448

  • Gneiting T, Guttorp P (2010) Continuous parameter spatio-temporal processes. In: Gelfand AE, Diggle P, Guttorp P, Fuentes M (eds) Handbook of Spatial Statistics. Chapman & Hall/CRC, London, pp 427–436

  • Gryparis A, Paciorek CJ, Zeka A, Schwartz J, Coull BA (2009) Measurement error caused by spatial misalignment in environmental epidemiology. Biostatistics 10(2):258–274

  • Harville DA (1997) Matrix algebra from a statistician’s perspective, 1st edn. Springer, Berlin

  • Hoek G, Beelen R, de Hoogh K, Vienneau D, Gulliver J, Fischer P, Briggs D (2008) A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos Environ 42(33):7561–7578

  • Hogrefe C, Porter P, Gego E, Gilliland A, Gilliam R, Swall J, Irwin J, Rao S (2006) Temporal features in observed and simulated meteorology and air quality over the Eastern United States. Atmos Environ 40(26):5041–5055

  • Jerrett M, Arain A, Kanaroglou P, Beckerman B, Potoglou D, Sahsuvaroglu T, Morrison J, Giovis C (2005) A review and evaluation of intraurban air pollution exposure models. J Expo Anal Environ Epidemiol 15:185–204

  • Kang EL, Cressie N, Shi T (2010) Using temporal variability to improve spatial mapping with application to satellite data. Can J Stat 38(2):271–289

  • Kaufman JD, Adar SD, Allen RW, Barr RG, Budoff MJ, Burke GL, Casillas AM, Cohen MA, Curl CL, Daviglus ML, Roux AVD, Jacobs DR, Kronmal RA, Larson TV, Liu SLJ, Lumley T, Navas-Acien A, O’Leary DH, Rotter JI, Sampson PD, Sheppard L, Siscovick DS, Stein JH, Szpiro AA, Tracy RP (2012) Prospective study of particulate air pollution exposures, subclinical atherosclerosis, and clinical cardiovascular disease: the multi-ethnic study of atherosclerosis and air pollution (MESA air). Am J Epidemiol 176(9):825–837

  • Lindgren F, Rue H, Lindström J (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J R Stat Soc Ser B 73(4):423–498

  • Lindström J, Szpiro AA, Sampson PD, Sheppard L, Oron A, Richards M, Larson T (2011) Incorporating output from source dispersion models into the spatio-temporal modelling of outdoor pollutant concentrations. Technical report. Working paper 370, UW Biostatistics working paper series. http://www.bepress.com/uwbiostat/paper370

  • Mercer LD, Szpiro AA, Sheppard L, Lindström J, Adar SD, Allen RW, Avol EL, Oron AP, Larson T, Liu LJS, Kaufman JD (2011) Comparing universal kriging and land-use regression for predicting concentrations of gaseous oxides of nitrogen (\(\text {NO}_{x}\)) for the multi-ethnic study of atherosclerosis and air pollution (MESA air). Atmos Environ 45(26):4412–4420

  • Miller KA, Siscovick DS, Sheppard L, Shepherd K, Sullivan JH, Anderson GL, Kaufman JD (2007) Long-term exposure to air pollution and incidence of cardiovascular events in women. N Engl J Med 356(5):447–458

  • Paciorek CJ (2010) The importance of scale for spatial-confounding bias and precision of spatial regression estimators. Stat Sci 25(1):107–125

  • Paciorek CJ, Yanosky JD, Puett RC, Laden F, Suh HH (2009) Practical large-scale spatio-temporal modeling of particulate matter concentrations. Ann Appl Stat 3(1):370–397

  • Pinheiro J, Bates D (2009) Mixed-effects models in S and S-PLUS. Statistics and computing. Springer, Berlin

  • Pope CA, Thun MJ, Namboodiri MM, Dockery DW, Evans JS, Speizer FE, Heath CW Jr (1995) Particulate air pollution as a predictor of mortality in a prospective study of U.S. adults. Am J Respir Crit Care Med 151:669–674

  • Pope CA, Burnett RT, Thun MJ, Calle EE, Krewski D, Ito K, Thurston GD (2002) Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. J Am Med Assoc 287(9):1132–1141

  • Puett RC, Hart JE, Yanosky JD, Paciorek CJ, Schwartz J, Suh H, Speizer FE, Laden F (2009) Chronic fine and coarse particulate exposure, mortality and coronary heart disease in the nurses’ health study. Environ Health Perspect 117:1697–1701

  • R Development Core Team (2008) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org. ISBN 3-900051-07-0

  • Sahu SK, Gelfand AE, Holland D (2006) Spatio-temporal modeling of fine particulate matter. J Agric Bio Environ Stat 11(1):61–86

  • Sampson PD, Szpiro AA, Sheppard L, Lindström J, Kaufman JD (2011) Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data. Atmos Environ 45(36):6593–6606

  • Sheppard L, Burnett RT, Szpiro AA, Kim SY, Jerrett M, Pope CA III, Brunekreef B (2012) Confounding and exposure measurement error in air pollution epidemiology. Air Qual Atmos Health 5(2):203–216

  • Smith RL, Kolenikov S, Cox LH (2003) Spatio-temporal modeling of PM2.5 data with missing values. J Geophys Res 108(D24):9004

  • Stein ML, Chi Z, Welty LJ (2004) Approximating likelihoods for large spatial data sets. J R Stat Soc Ser B 66(2):275–296

  • Szpiro AA, Sampson PD, Sheppard L, Lumley T, Adar S, Kaufman J (2010) Predicting intra-urban variation in air pollution concentrations with complex spatio-temporal dependencies. Environmetrics 21(6):606–631

  • Szpiro AA, Paciorek C, Sheppard L (2011a) Does more accurate exposure prediction necessarily improve health effect estimates? Epidemiology 22(5):680–685

  • Szpiro AA, Sheppard L, Lumley T (2011b) Efficient measurement error correction with spatially misaligned data. Biostatistics 12(4):610–623

  • US Census Bureau (2002) UA census 2000 TIGER/line files technical documentation. Technical report, US Census Bureau, Washington, DC. https://www.census.gov/geo/www/tiger/tigerua/ua2ktgr.pdf

  • Wilton D, Szpiro AA, Gould T, Larson T (2010) Improving spatial concentration estimates for nitrogen oxides using a hybrid meteorological dispersion/land use regression model in Los Angeles, CA and Seattle, WA. Sci Total Environ 408(5):1120–1130

Acknowledgments

Although the research described in this article has been funded wholly or in part by the United States Environmental Protection Agency through assistance agreement CR-834077101-0 and grant RD831697 to the University of Washington, it has not been subjected to the Agency’s required peer and policy review and therefore does not necessarily reflect the views of the Agency and no official endorsement should be inferred. Travel for Johan Lindström has been paid by STINT (The Swedish Foundation for International Cooperation in Research and Higher Education) Grant IG2005-2047 and the Royal Physiographic Society in Lund.

Author information

Corresponding author

Correspondence to Johan Lindström.

Additional information

Handling Editor: Pierre Dutilleul.

Appendix: Proof of equivalence for the simplified likelihood

To prove the equivalence of the two likelihood forms (9) and (10) we need the following:

Lemma 1

If \(\varSigma _1\) and \(\varSigma _2\) are two nonsingular matrices of size \(n_1\)-by-\(n_1\) and \(n_2\)-by-\(n_2\) respectively, and \(A\) is an \(n_2\)-by-\(n_1\) matrix, then:

$$\begin{aligned} \left|A\varSigma _1 A^\top + \varSigma _2\right| = \left|\varSigma _1\right|\left|\varSigma _2\right|\left|\varSigma _1^{-1} + A^\top \varSigma _2^{-1} A\right|. \end{aligned}$$

(Harville 1997, Thm. 18.1.1)
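
Lemma 1 is easy to check numerically. The following sketch is not part of the original paper; it uses NumPy, and the matrix sizes, the random seed, and the helper `random_spd` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 5  # illustrative sizes (assumption)


def random_spd(n):
    """Hypothetical helper: random symmetric positive definite n-by-n matrix."""
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)


Sigma1 = random_spd(n1)            # n1-by-n1, nonsingular
Sigma2 = random_spd(n2)            # n2-by-n2, nonsingular
A = rng.standard_normal((n2, n1))  # n2-by-n1

# Left-hand side: determinant of an n2-by-n2 matrix.
lhs = np.linalg.det(A @ Sigma1 @ A.T + Sigma2)

# Right-hand side: the last factor is only an n1-by-n1 determinant.
rhs = (
    np.linalg.det(Sigma1)
    * np.linalg.det(Sigma2)
    * np.linalg.det(np.linalg.inv(Sigma1) + A.T @ np.linalg.inv(Sigma2) @ A)
)

print(np.isclose(lhs, rhs))
```

The practical appeal is that when \(n_1\) is much smaller than \(n_2\) and \(\varSigma _2\) has simple structure, the right-hand side avoids a dense \(n_2\)-by-\(n_2\) determinant.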

Lemma 2

The Woodbury identity (Harville 1997, Thm. 18.2.8): If \(A\) and \(B\) are two invertible matrices of size \(n\)-by-\(n\) and \(p\)-by-\(p\) respectively, and \(C\) is an arbitrary \(n\)-by-\(p\) matrix, then

$$\begin{aligned} \left( A+CBC^\top \right) ^{-1} = A^{-1} - A^{-1} C\left( B^{-1}+C^\top A^{-1}C\right) ^{-1}C^\top A^{-1}. \end{aligned}$$

Rearranging the terms and multiplying by \(A\) from both the left and the right, Lemma 2 becomes

$$\begin{aligned} C\left( B^{-1}+C^\top A^{-1}C\right) ^{-1}C^\top = A - A \left( A+CBC^\top \right) ^{-1} A. \end{aligned}$$
(13)
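
As a sanity check, both the Woodbury identity and its rearranged form (13) can be verified on random matrices. This is a minimal NumPy sketch under assumed sizes; the names `random_spd`, `core`, and the seed are illustrative and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2  # illustrative sizes (assumption)
inv = np.linalg.inv


def random_spd(k):
    """Hypothetical helper: random symmetric positive definite k-by-k matrix."""
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)


A = random_spd(n)                 # n-by-n, invertible
B = random_spd(p)                 # p-by-p, invertible
C = rng.standard_normal((n, p))   # arbitrary n-by-p

# The only inverse that involves both A and B is p-by-p.
core = inv(inv(B) + C.T @ inv(A) @ C)

# Lemma 2 (Woodbury identity).
woodbury_lhs = inv(A + C @ B @ C.T)
woodbury_rhs = inv(A) - inv(A) @ C @ core @ C.T @ inv(A)

# Rearranged form (13).
eq13_lhs = C @ core @ C.T
eq13_rhs = A - A @ woodbury_lhs @ A

woodbury_ok = np.allclose(woodbury_lhs, woodbury_rhs)
eq13_ok = np.allclose(eq13_lhs, eq13_rhs)
print(woodbury_ok, eq13_ok)
```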

Lemma 3

The Searle identity (Harville 1997, Thm. 18.2.3): If \(A, B\) are matrices of size \(p\)-by-\(n\) and \(n\)-by-\(p\) respectively, \(\mathbf I \) denotes identity matrices of appropriate size, and \((\mathbf I + AB)\) is nonsingular, then

$$\begin{aligned} \left( \mathbf I + AB\right) ^{-1}A = A \left( \mathbf I + BA\right) ^{-1}. \end{aligned}$$
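
A brief sketch of why Lemma 3 holds (our own remark, not taken from Harville): the product \(A + ABA\) can be factored in two ways,

$$\begin{aligned} \left( \mathbf I + AB\right) A = A + ABA = A \left( \mathbf I + BA\right) , \end{aligned}$$

and multiplying by \(\left( \mathbf I + AB\right) ^{-1}\) from the left and by \(\left( \mathbf I + BA\right) ^{-1}\) from the right gives the stated identity. The second inverse exists because \(\left|\mathbf I + BA\right| = \left|\mathbf I + AB\right| \ne 0\) (Sylvester's determinant identity).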

Lemma 4

Blockwise inversion (Harville 1997, Thm. 8.5.11): Let \(A, B, C\), and \(D\) be the blocks of a partitioned matrix, with \(A\) and \((D - C A^{-1} B)\) nonsingular; then

$$\begin{aligned} \left[ \begin{array}{ll} A &{} B \\ C &{} D \end{array}\right] ^{-1} = \left[ \begin{array}{ll} A^{-1} + A^{-1} B \bigl (D - C A^{-1} B \bigr )^{-1} C A^{-1} &{} -A^{-1} B \bigl (D - C A^{-1} B \bigr )^{-1} \\ -\bigl (D - C A^{-1} B \bigr )^{-1} C A^{-1} &{} \bigl (D - C A^{-1} B \bigr )^{-1} \end{array}\right] . \end{aligned}$$
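
The blockwise inversion formula can also be checked numerically. The sketch below is illustrative only: the block sizes, the seed, and the choice of a symmetric positive definite test matrix (which guarantees that \(A\) and the Schur complement are nonsingular) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
k, m = 3, 2  # illustrative block sizes (assumption)
inv = np.linalg.inv

# A positive definite matrix has a nonsingular leading block A
# and a nonsingular Schur complement D - C A^{-1} B.
G = rng.standard_normal((k + m, k + m))
full = G @ G.T + (k + m) * np.eye(k + m)

A, B = full[:k, :k], full[:k, k:]
C, D = full[k:, :k], full[k:, k:]

S_inv = inv(D - C @ inv(A) @ B)  # inverse of the Schur complement of A

# Assemble the inverse block by block, following Lemma 4.
blockwise = np.block([
    [inv(A) + inv(A) @ B @ S_inv @ C @ inv(A), -inv(A) @ B @ S_inv],
    [-S_inv @ C @ inv(A), S_inv],
])

blockwise_ok = np.allclose(blockwise, inv(full))
print(blockwise_ok)
```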

To make the notation clearer we suppress the dependence on \(\varPsi \). Superscripts above equality signs denote the identities used in each step.

For the determinant in (9) we have

$$ \begin{aligned} \left|\widetilde{\varSigma }\right| \overset{\text {(7)}}{=} \left|\varSigma _\nu + F \varSigma _B F^\top \right| \overset{\text {Lem. 1} \& \text {(11a)}}{=} \left|\varSigma _\nu \right|\left|\varSigma _B\right| \left|\varSigma _{B\vert Y}^{-1}\right|, \end{aligned}$$

proving equality with the determinants in (10).
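
Spelled out, the second equality applies Lemma 1 with \(A = F\), \(\varSigma _1 = \varSigma _B\), and \(\varSigma _2 = \varSigma _\nu \). The step below assumes that (11a), which is not reproduced in this appendix, defines \(\varSigma _{B\vert Y}^{-1} = \varSigma _B^{-1} + F^\top \varSigma _\nu ^{-1} F\); this is the form implied by (14a).

$$\begin{aligned} \left|F \varSigma _B F^\top + \varSigma _\nu \right| = \left|\varSigma _B\right|\left|\varSigma _\nu \right|\left|\varSigma _B^{-1} + F^\top \varSigma _\nu ^{-1} F\right| = \left|\varSigma _\nu \right|\left|\varSigma _B\right|\left|\varSigma _{B\vert Y}^{-1}\right|. \end{aligned}$$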

For the quadratic form in (9) we first note that

$$\begin{aligned}&\widetilde{\varSigma }^{-1} \overset{\text {Lem. 2}}{=} \varSigma _\nu ^{-1} - \varSigma _\nu ^{-1} F \varSigma _{B\vert Y} F^\top \varSigma _\nu ^{-1}, \end{aligned}$$
(14a)
$$\begin{aligned}&F^\top \widetilde{\varSigma }^{-1} F \overset{\text {(13)}}{=} \varSigma _B^{-1} - \varSigma _B^{-1} \varSigma _{B\vert Y} \varSigma _B^{-1}, \end{aligned}$$
(14b)
$$\begin{aligned}&\widetilde{\varSigma }^{-1} F \overset{\text {(7)}}{=} \left( {\varvec{I}} + \varSigma _\nu ^{-1} F \varSigma _B F^\top \right) ^{-1} \varSigma _\nu ^{-1} F&\overset{\text {Lem. 3}}{=} \varSigma _\nu ^{-1} F \varSigma _{B\vert Y} \varSigma _B^{-1}. \end{aligned}$$
(14c)
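
The last step of (14c) compresses two moves. As a sketch, again assuming \(\varSigma _{B\vert Y}^{-1} = \varSigma _B^{-1} + F^\top \varSigma _\nu ^{-1} F\) (the form implied by (14a)), Lemma 3 is applied with \(A = \varSigma _\nu ^{-1} F\) and \(B = \varSigma _B F^\top \), after which \(\varSigma _B\) is factored out of the remaining inverse:

$$\begin{aligned} \left( \mathbf I + \varSigma _\nu ^{-1} F \varSigma _B F^\top \right) ^{-1} \varSigma _\nu ^{-1} F&= \varSigma _\nu ^{-1} F \left( \mathbf I + \varSigma _B F^\top \varSigma _\nu ^{-1} F\right) ^{-1} \\&= \varSigma _\nu ^{-1} F \left[ \varSigma _B \left( \varSigma _B^{-1} + F^\top \varSigma _\nu ^{-1} F\right) \right] ^{-1} = \varSigma _\nu ^{-1} F \varSigma _{B\vert Y} \varSigma _B^{-1}. \end{aligned}$$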

Using (14) we have that

$$ \begin{aligned} \varSigma _{\alpha \vert Y}^{-1}&\overset{\text {(11b)} \& \text { (14b)}}{=}&X^\top F^\top \widetilde{\varSigma }^{-1} F X \end{aligned}$$
(15a)
$$ \begin{aligned} \widehat{\varSigma }&\overset{\text {(11c)} \& \text {(14a)} \& \text {(14c)}}{=}&- \widetilde{\varSigma }^{-1} F X \varSigma _{\alpha \vert Y} X^\top F^\top \widetilde{\varSigma }^{-1} + \widetilde{\varSigma }^{-1}. \end{aligned}$$
(15b)

For the quadratic form in (9) we have

$$ \begin{aligned}&Y^\top \widetilde{\varSigma }^{-1} \widetilde{X} \Bigl (\widetilde{X}^\top \widetilde{\varSigma }^{-1} \widetilde{X} \Bigr )^{-1} \widetilde{X}^\top \widetilde{\varSigma }^{-1} Y - Y^\top \widetilde{\varSigma }^{-1} Y\\
&\quad \overset{ (7) \& (15\text {a})}{=} Y^\top \widetilde{\varSigma }^{-1} \widetilde{X} \left[ \begin{array}{cc} \varSigma _{\alpha \vert Y}^{-1} &{} X^\top F^\top \widetilde{\varSigma }^{-1} \mathcal M \\ \mathcal M ^\top \widetilde{\varSigma }^{-1} F X &{} \mathcal M ^\top \widetilde{\varSigma }^{-1} \mathcal M \end{array}\right] ^{-1} \widetilde{X}^\top \widetilde{\varSigma }^{-1} Y- Y^\top \widetilde{\varSigma }^{-1} Y\\
&\quad \overset{ \text {Lem. 4} \& \text {(15b)}}{=} Y^\top \widetilde{\varSigma }^{-1} \biggl [ F X \varSigma _{\alpha \vert Y} X^\top F^\top + \Bigl ( \mathbf I - F X \varSigma _{\alpha \vert Y} X^\top F^\top \widetilde{\varSigma }^{-1}\Bigr )\\
&\qquad \mathcal M \bigl ( \mathcal M ^\top \widehat{\varSigma } \mathcal M \bigr )^{-1} \mathcal M ^\top \Bigl ( \mathbf I - \widetilde{\varSigma }^{-1} F X \varSigma _{\alpha \vert Y} X^\top F^\top \Bigr ) \biggr ]\widetilde{\varSigma }^{-1} Y- Y^\top \widetilde{\varSigma }^{-1} Y\\
&\quad =Y^\top \biggl [\Bigl ( \widetilde{\varSigma }^{-1} -\widetilde{\varSigma }^{-1} F X \varSigma _{\alpha \vert Y} X^\top F^\top \widetilde{\varSigma }^{-1} \Bigr )\mathcal M \bigl ( \mathcal M ^\top \widehat{\varSigma } \mathcal M \bigr )^{-1} \mathcal M ^\top \\
&\qquad \Bigl ( \widetilde{\varSigma }^{-1} - \widetilde{\varSigma }^{-1} F X \varSigma _{\alpha \vert Y} X^\top F^\top \widetilde{\varSigma }^{-1} \Bigr )\\
&\qquad - \Bigl ( \widetilde{\varSigma }^{-1} -\widetilde{\varSigma }^{-1} F X \varSigma _{\alpha \vert Y} X^\top F^\top \widetilde{\varSigma }^{-1} \Bigr )\biggr ] Y\\
&\quad \overset{\text {(15b)}}{=} Y^\top \widehat{\varSigma } \mathcal M \Bigl ( \mathcal M ^\top \widehat{\varSigma } \mathcal M \Bigr )^{-1} \mathcal M ^\top \widehat{\varSigma } Y- Y^\top \widehat{\varSigma } Y, \end{aligned}$$

showing that the quadratic forms in (9) and (10) are equal.
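
The chain of equalities can be checked end to end on simulated inputs. The sketch below is not from the paper or from the SpatioTemporal package: it is a NumPy illustration that takes (15a) and (15b) as the working definitions of \(\varSigma _{\alpha \vert Y}\) and \(\widehat{\varSigma }\), assumes the column ordering \(\widetilde{X} = [F X \;\; \mathcal M ]\), and uses arbitrary dimensions and random matrices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, p_alpha, p_gamma = 12, 5, 2, 3  # illustrative dimensions (assumption)
inv = np.linalg.inv


def random_spd(k):
    """Hypothetical helper: random symmetric positive definite k-by-k matrix."""
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)


Sigma_nu, Sigma_B = random_spd(n), random_spd(q)
F = rng.standard_normal((n, q))
X = rng.standard_normal((q, p_alpha))
M = rng.standard_normal((n, p_gamma))   # plays the role of script-M
Y = rng.standard_normal(n)

# Inverse of Sigma-tilde = Sigma_nu + F Sigma_B F^T.
St_inv = inv(Sigma_nu + F @ Sigma_B @ F.T)
X_t = np.hstack([F @ X, M])             # assumed ordering of X-tilde

# First line of the derivation: quadratic form written with Sigma-tilde.
lhs = (
    Y @ St_inv @ X_t @ inv(X_t.T @ St_inv @ X_t) @ X_t.T @ St_inv @ Y
    - Y @ St_inv @ Y
)

# (15a) and (15b), used here as definitions.
Sigma_alpha = inv(X.T @ F.T @ St_inv @ F @ X)
Sigma_hat = St_inv - St_inv @ F @ X @ Sigma_alpha @ X.T @ F.T @ St_inv

# Last line of the derivation: quadratic form written with Sigma-hat.
rhs = (
    Y @ Sigma_hat @ M @ inv(M.T @ Sigma_hat @ M) @ M.T @ Sigma_hat @ Y
    - Y @ Sigma_hat @ Y
)

print(np.isclose(lhs, rhs))
```

Note that the second form only inverts a matrix whose size is the number of columns of \(\mathcal M \), once \(\widehat{\varSigma }\) is available.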

About this article

Cite this article

Lindström, J., Szpiro, A.A., Sampson, P.D. et al. A flexible spatio-temporal model for air pollution with spatial and spatio-temporal covariates. Environ Ecol Stat 21, 411–433 (2014). https://doi.org/10.1007/s10651-013-0261-4

  • DOI: https://doi.org/10.1007/s10651-013-0261-4
