A descriptive review of variable selection methods in four epidemiologic journals: there is still room for improvement

  • METHODS
European Journal of Epidemiology

Abstract

A review of epidemiological papers conducted in 2009 concluded that several studies employed variable selection methods prone to introducing bias and yielding inadequate inferences. Many new confounder selection methods have been developed since then. The goal of this study was to provide an updated descriptive portrait of the variable selection methods epidemiologists use when analyzing observational data. Studies published in four major epidemiological journals in 2015 were reviewed. Only articles that pursued a predictive or explicative objective and reported on the analysis of individual data were included. The method(s) employed for selecting variables were extracted from the retained articles. A total of 975 articles were retrieved and 299 met the eligibility criteria, 292 of which pursued an explicative objective. Among those, 146 studies (50%) reported using prior knowledge or causal graphs for selecting variables, 34 (12%) used change-in-effect-estimate methods, 26 (9%) used stepwise approaches, 16 (5%) employed univariable analyses, 5 (2%) used various other methods and 107 (37%) did not provide sufficient details to allow classification (more than one method could be employed in a single article). Although less frequent than in the previous review, stepwise and univariable analyses, which are prone to introducing bias and producing inadequate inferences, were still prevalent. Moreover, 37% of studies did not provide sufficient details to assess how variables were selected. We thus believe there is still room for improvement in the variable selection methods used by epidemiologists and in their reporting.
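
As a reading aid, the percentages reported in the abstract can be recomputed from the stated counts. The sketch below is illustrative only and is not part of the authors' analysis: it assumes that each percentage uses the 292 explicative studies as its denominator and is rounded to the nearest integer, and the dictionary labels and function name are hypothetical.

```python
# Counts reported in the abstract for the 292 studies with an explicative objective.
# Assumption: every percentage uses 292 as its denominator.
TOTAL_EXPLICATIVE = 292

counts = {
    "prior knowledge or causal graphs": 146,
    "change in effect estimate": 34,
    "stepwise approaches": 26,
    "univariable analyses": 16,
    "other methods": 5,
    "insufficient details": 107,
}


def rounded_percentages(method_counts, total):
    """Return each count as a percentage of `total`, rounded to the nearest integer."""
    return {method: round(100 * n / total) for method, n in method_counts.items()}


pct = rounded_percentages(counts, TOTAL_EXPLICATIVE)
for method, value in pct.items():
    print(f"{method}: {counts[method]}/{TOTAL_EXPLICATIVE} = {value}%")

# The category counts add up to more than 292 because a single article
# could report more than one selection method.
excess = sum(counts.values()) - TOTAL_EXPLICATIVE
print(f"Category counts exceed the number of studies by {excess}")
```

Under these assumptions the recomputed values match the reported 50%, 12%, 9%, 5%, 2% and 37%, and the category counts exceed 292, which is consistent with the note that one article could employ several methods.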

Funding

This work was supported by a start-up Grant from the Fondation du CHU de Québec—Université Laval [#2710 to DT]. DT is a Fonds de Recherche du Québec—Santé Chercheur-Boursier.

Author information

Corresponding author

Correspondence to Denis Talbot.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 14 kb)

Supplementary material 2 (XLSX 156 kb)

About this article

Cite this article

Talbot, D., Massamba, V.K. A descriptive review of variable selection methods in four epidemiologic journals: there is still room for improvement. Eur J Epidemiol 34, 725–730 (2019). https://doi.org/10.1007/s10654-019-00529-y

  • DOI: https://doi.org/10.1007/s10654-019-00529-y
