Abstract
A review of epidemiological papers conducted in 2009 concluded that several studies employed variable selection methods prone to introducing bias and yielding inadequate inferences. Many new confounder selection methods have been developed since then. The goal of this study was to provide an updated descriptive portrait of the variable selection methods epidemiologists use when analyzing observational data. Studies published in four major epidemiological journals in 2015 were reviewed. Only articles that pursued a predictive or explicative objective and reported on the analysis of individual data were included. The method(s) employed for selecting variables were extracted from the retained articles. A total of 975 articles were retrieved and 299 met the eligibility criteria, 292 of which pursued an explicative objective. Among those 292, 146 studies (50%) reported using prior knowledge or causal graphs for selecting variables, 34 (12%) used change-in-effect-estimate methods, 26 (9%) used stepwise approaches, 16 (5%) employed univariable analyses, 5 (2%) used various other methods and 107 (37%) did not provide sufficient details to allow classification (more than one method could be employed in a single article). Although less frequent than in the previous review, stepwise and univariable analyses, which are prone to introducing bias and producing inadequate inferences, were still prevalent. Moreover, 37% of studies did not provide sufficient details to assess how variables were selected. We thus believe there is still room for improvement in the variable selection methods used by epidemiologists and in their reporting.
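As a quick arithmetic check, the percentages above can be reproduced from the reported counts. The sketch below is illustrative only and is not part of the authors' analysis; it assumes the denominator is the 292 explicative studies and that percentages were rounded to the nearest whole number. The category labels and the names `counts`, `percentages` and `total_mentions` are hypothetical.

```python
# Counts reported in the abstract for the 292 explicative studies.
# Assumption: percentages use 292 as the denominator and are rounded
# to the nearest whole number.
N_EXPLICATIVE = 292

counts = {
    "prior knowledge or causal graphs": 146,
    "change in effect estimate": 34,
    "stepwise": 26,
    "univariable analyses": 16,
    "other methods": 5,
    "insufficient detail": 107,
}


def percentages(counts, denominator):
    """Return each count as a whole-number percentage of the denominator."""
    return {name: round(100 * n / denominator) for name, n in counts.items()}


pcts = percentages(counts, N_EXPLICATIVE)

# A single article could report more than one method, so the number of
# method mentions can exceed the number of studies.
total_mentions = sum(counts.values())

for name, pct in pcts.items():
    print(f"{name}: {counts[name]} ({pct}%)")
print(f"method mentions: {total_mentions} across {N_EXPLICATIVE} studies")
```

The counts sum to more than 292, which is consistent with the abstract's note that more than one method could be employed in a single article.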
Funding
This work was supported by a start-up Grant from the Fondation du CHU de Québec—Université Laval [#2710 to DT]. DT is a Fonds de Recherche du Québec—Santé Chercheur-Boursier.
About this article
Cite this article
Talbot, D., Massamba, V.K. A descriptive review of variable selection methods in four epidemiologic journals: there is still room for improvement. Eur J Epidemiol 34, 725–730 (2019). https://doi.org/10.1007/s10654-019-00529-y
DOI: https://doi.org/10.1007/s10654-019-00529-y