A descriptive review of variable selection methods in four epidemiologic journals: there is still room for improvement

Talbot, Denis; Massamba, Victoria Kubuta

doi:10.1007/s10654-019-00529-y

METHODS
Published: 03 June 2019

European Journal of Epidemiology, volume 34, pages 725–730 (2019)

Abstract

A review of epidemiological papers conducted in 2009 concluded that several studies employed variable selection methods prone to introducing bias and yielding inadequate inferences. Many new confounder selection methods have been developed since then. The goal of this study was to provide an updated descriptive portrait of the variable selection methods epidemiologists use to analyze observational data. Studies published in four major epidemiological journals in 2015 were reviewed. Only articles with a predictive or explicative objective that reported on the analysis of individual data were included. The method(s) used to select variables were extracted from the retained articles. A total of 975 articles were retrieved and 299 met the eligibility criteria, 292 of which pursued an explicative objective. Among those 292, 146 studies (50%) reported using prior knowledge or causal graphs to select variables, 34 (12%) used change-in-effect-estimate methods, 26 (9%) used stepwise approaches, 16 (5%) employed univariable analyses, 5 (2%) used various other methods, and 107 (37%) did not provide sufficient details to allow classification. Because a single article could employ more than one method, these categories are not mutually exclusive. Although less frequent than in the previous review, stepwise and univariable analyses, which are prone to introducing bias and producing inadequate inferences, were still prevalent. Moreover, 37% of studies did not provide sufficient details to assess how variables were selected. We thus believe there is still room for improvement in the variable selection methods used by epidemiologists and in their reporting.
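
The percentages reported in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not the authors' analysis code: the counts are copied from the abstract, while the variable and function names are assumptions made for this example. It takes the 292 articles with an explicative objective as the denominator and rounds to whole percentages.

```python
# Counts as reported in the abstract. The denominator is the 292 articles
# that pursued an explicative objective. All names here are hypothetical.
TOTAL_EXPLICATIVE = 292

method_counts = {
    "prior knowledge or causal graphs": 146,
    "change in effect estimate": 34,
    "stepwise approaches": 26,
    "univariate (univariable) analyses": 16,
    "various other methods": 5,
    "insufficient detail to classify": 107,
}


def percentages(counts, total):
    """Return each count as a whole-number percentage of total."""
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percentages(method_counts, TOTAL_EXPLICATIVE)
for name, pct in shares.items():
    print(f"{name}: {method_counts[name]} ({pct}%)")

# An article could use more than one method, so the categories overlap
# and the counts add up to more than the number of articles.
total_mentions = sum(method_counts.values())
print(f"{total_mentions} classifications across {TOTAL_EXPLICATIVE} articles")
```

The counts sum to more than 292, which is consistent with the abstract's note that more than one method could be employed in a single article.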

Funding

This work was supported by a start-up grant from the Fondation du CHU de Québec—Université Laval [#2710 to DT]. DT is a Fonds de Recherche du Québec—Santé Chercheur-Boursier.

Author information

Authors and Affiliations

Département de médecine sociale et préventive, Faculté de médecine, Université Laval, 1050, avenue de la Médecine, Pavillon Ferdinand-Vandry, room 2454, Quebec, QC, G1V 0A6, Canada
Denis Talbot & Victoria Kubuta Massamba
Unité santé des populations et pratiques optimales en santé, CHU de Québec – Université Laval Research Center, Quebec, QC, Canada
Denis Talbot & Victoria Kubuta Massamba

Corresponding author

Correspondence to Denis Talbot.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 14 kb)

Supplementary material 2 (XLSX 156 kb)

About this article

Cite this article

Talbot, D., Massamba, V.K. A descriptive review of variable selection methods in four epidemiologic journals: there is still room for improvement. Eur J Epidemiol 34, 725–730 (2019). https://doi.org/10.1007/s10654-019-00529-y

Received: 09 October 2018
Accepted: 24 May 2019
Published: 03 June 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10654-019-00529-y
