Abstract
Distributed lag models (DLMs) are regression models that include multiple lagged exposure variables as covariates. They are frequently used to model the relationship between daily mortality and short-term air pollution exposures. Specifying a maximum lag is only one of the difficulties in using a DLM for environmental epidemiology. We propose an easily extensible ensemble post-processing approach. The resulting estimates are both more parsimonious, approaching zero with increasing lag, and more efficient. These benefits are shown to be robust under various simulation scenarios and are illustrated with data from the National Morbidity, Mortality and Air Pollution Study.
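To make the opening definition concrete, the following is a minimal sketch of an unconstrained DLM of the form y_t = a + sum over l of b_l * x_(t-l) + e_t, fitted by ordinary least squares on synthetic data. It is not the ensemble post-processing approach proposed in the article. The function names (`lag_matrix`, `fit_unconstrained_dlm`), the simulated exposure series, and the Gaussian outcome are all assumptions made for illustration; mortality counts would more typically be modelled with Poisson regression.

```python
import numpy as np


def lag_matrix(x, max_lag):
    """Build a matrix whose column l holds x lagged by l steps (l = 0..max_lag).

    The first max_lag time points are dropped so every row is fully observed.
    (Hypothetical helper, not taken from the article.)
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.column_stack([x[max_lag - l: n - l] for l in range(max_lag + 1)])


def fit_unconstrained_dlm(y, x, max_lag):
    """Fit y_t = a + sum_l b_l * x_(t-l) by least squares.

    Returns the intercept and the vector of lag coefficients b_0..b_max_lag.
    (Hypothetical helper; assumes a Gaussian outcome for simplicity.)
    """
    lags = lag_matrix(x, max_lag)
    design = np.column_stack([np.ones(len(lags)), lags])
    target = np.asarray(y, dtype=float)[max_lag:]  # align outcome with lagged rows
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0], coef[1:]


# Synthetic example: the true lag effects decay to zero with increasing lag.
rng = np.random.default_rng(0)
n, max_lag = 500, 3
true_lag_effects = np.array([0.5, 0.3, 0.1, 0.0])
x = rng.normal(size=n)
y = np.full(n, np.nan)  # the first max_lag outcomes have no complete lag history
y[max_lag:] = (
    1.0
    + lag_matrix(x, max_lag) @ true_lag_effects
    + rng.normal(scale=0.1, size=n - max_lag)
)

intercept, lag_effects = fit_unconstrained_dlm(y, x, max_lag)
print("intercept:", intercept)
print("lag effects:", lag_effects)
```

In this sketch `max_lag` must be chosen in advance, which is the specification difficulty the abstract mentions: too small a value omits real delayed effects, while too large a value adds noisy coefficients at long lags.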
Acknowledgments
This study was supported by a grant from the Brussels Institute for Research and Innovation (Innoviris). In addition, we would like to thank the reviewer who pointed out the link between the BDLM hyper-prior and the ensemble approach.
Supplementary material
We refer to Tables 4–15 in the supplementary materials for more simulation results.
Cite this article
Simons, K., De Smedt, T., Van Nieuwenhuyse, A. et al. Ensemble post-processing is a promising method to obtain flexible distributed lag models. Air Qual Atmos Health 9, 835–846 (2016). https://doi.org/10.1007/s11869-015-0388-6
DOI: https://doi.org/10.1007/s11869-015-0388-6