The Replication Crisis in Epidemiology: Snowball, Snow Job, or Winter Solstice?
Purpose of Review
Like a snowball rolling down a steep hill, the most recent crisis over the perceived lack of reproducibility of scientific results has outpaced the evidence of crisis. It has led to new actions and new guidelines that have been rushed to market without plans for evaluation, metrics for success, or due consideration of the potential for unintended consequences.
The perception of the crisis is at least partly a snow job, heavily influenced by a small number of centers lavishly funded by a single foundation, with undue and unsupported attention to preregistration as a solution to the perceived crisis. At the same time, the perception of crisis provides an opportunity for introspection. Two studies’ estimates of association may differ because of undue attention to null hypothesis statistical testing, because of differences in the distribution of effect modifiers, because of differential susceptibility to threats to validity, or for other reasons. Perhaps the expectation of what reproducible epidemiology ought to look like is more misguided than the practice of epidemiology. We advocate for the idea of “replication and advancement.” Studies should not only replicate earlier work, but also improve on it by enhancing the design or analysis.
Abandoning blind reliance on null hypothesis significance testing for statistical inference, finding consensus on when preregistration of non-randomized study protocols has merit, and focusing on replication and advancement are the most certain ways to emerge from this solstice for the better.
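The point that two studies can appear to conflict only because of null hypothesis significance testing can be illustrated with a small calculation. The sketch below is not taken from the article: the two risk ratios and their standard errors are hypothetical numbers chosen for illustration, the function names are ours, and the comparison assumes independent studies with approximately normal log risk ratio estimates. Under those assumptions, one estimate falls below the conventional 0.05 threshold and the other does not, yet the difference between the two estimates is itself well within what chance alone would produce.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided p-value for H0: estimate == 0, using a normal approximation."""
    z = estimate / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def p_for_difference(est_a, se_a, est_b, se_b):
    """Two-sided p-value for H0: est_a == est_b, assuming independent studies."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return two_sided_p(diff, se_diff)


# Hypothetical studies (illustrative values only), on the log risk ratio scale.
log_rr_a, se_a = math.log(1.5), 0.18  # study A: RR = 1.5
log_rr_b, se_b = math.log(1.3), 0.22  # study B: RR = 1.3

p_a = two_sided_p(log_rr_a, se_a)
p_b = two_sided_p(log_rr_b, se_b)
p_diff = p_for_difference(log_rr_a, se_a, log_rr_b, se_b)

print(f"Study A: p = {p_a:.3f}")
print(f"Study B: p = {p_b:.3f}")
print(f"A vs. B: p = {p_diff:.3f}")
```

Read through a significance-testing lens, study B would be labeled a failure to replicate study A. Comparing the estimates directly suggests the two results are compatible, which is the kind of apparent irreproducibility that abandoning blind reliance on significance testing would avoid.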
Keywords: Epidemiologic methods; Reproducibility of results
Richard MacLehose reviewed and served as section editor for this manuscript. He has previously consulted with the Nutritional Science Initiative which received funds from the Arnold Foundation.
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they have no conflicts of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.