The Replication Crisis in Epidemiology: Snowball, Snow Job, or Winter Solstice?

  • Epidemiologic Methods (R MacLehose, Section Editor)
Current Epidemiology Reports

Abstract

Purpose of Review

Like a snowball rolling down a steep hill, the most recent crisis over the perceived lack of reproducibility of scientific results has outpaced the evidence of crisis. It has led to new actions and new guidelines that have been rushed to market without plans for evaluation, metrics for success, or due consideration of the potential for unintended consequences.

Recent Findings

The perception of the crisis is at least partly a snow job, heavily influenced by a small number of centers lavishly funded by a single foundation, with undue and unsupported attention to preregistration as a solution to the perceived crisis. At the same time, the perception of crisis provides an opportunity for introspection. Two studies’ estimates of association may differ, or merely appear to differ, because of undue attention to null hypothesis significance testing, because of differences in the distribution of effect modifiers, because of differential susceptibility to threats to validity, or for other reasons. Perhaps the expectation of what reproducible epidemiology ought to look like is more misguided than the practice of epidemiology. We advocate for the idea of “replication and advancement.” Studies should not only replicate earlier work, but also improve on it by enhancing the design or analysis.
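The point that null hypothesis significance testing alone can make two studies look contradictory (the argument of Gelman and Stern, reference 49) can be illustrated with a short Python sketch. The risk ratios and confidence intervals are hypothetical numbers chosen for illustration, not results from any cited study. The sketch assumes Wald-type 95% intervals computed on the log scale, a normal approximation, and independent studies. The helper names `log_rr_and_se` and `two_sided_p` are hypothetical.

```python
import math


def log_rr_and_se(rr, lower, upper, z_crit=1.96):
    """Return the log risk ratio and its standard error, recovered from a 95% CI.

    Assumes the CI was computed on the log scale (Wald-type interval).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return math.log(rr), se


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical studies: illustrative numbers only, not taken from any cited paper.
est_a, se_a = log_rr_and_se(1.5, 1.1, 2.05)   # Study A
est_b, se_b = log_rr_and_se(1.4, 0.9, 2.2)    # Study B

# Each study tested separately against the null (RR = 1).
p_a = two_sided_p(est_a / se_a)
p_b = two_sided_p(est_b / se_b)

# Direct comparison of the two estimates, assuming the studies are independent.
diff = est_a - est_b
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
p_diff = two_sided_p(diff / se_diff)

print(f"Study A: p = {p_a:.3f}")
print(f"Study B: p = {p_b:.3f}")
print(f"Ratio of risk ratios (A/B) = {math.exp(diff):.2f}, p = {p_diff:.3f}")
```

Under conventional dichotomization, Study A would be labelled "significant" and Study B "not significant." A direct comparison of the two estimates, however, gives a ratio of risk ratios of about 1.07 with a p-value far above 0.05. The two results are therefore compatible, and the apparent failure to replicate comes from the testing convention rather than from the data.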

Summary

Abandoning blind reliance on null hypothesis significance testing for statistical inference, finding consensus on when preregistration of non-randomized study protocols has merit, and focusing on replication and advancement are the most certain ways to emerge from this solstice for the better.

References

Papers of particular interest, published recently, have been highlighted as: • Of importance •• Of major importance

  1. Ioannidis JP. How to make more published research true. PLoS Med. 2014;11(10):e1001747. https://doi.org/10.1371/journal.pmed.1001747.

  2. Unreliable research: trouble at the lab. Economist. 19 October 2013.

  3. Collins FS, Tabak LA. Policy: NIH plans to enhance reproducibility. Nature. 2014;505(7485):612–3.

  4. Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, et al. SCIENTIFIC STANDARDS. Promoting an open research culture. Science. 2015;348(6242):1422–5. https://doi.org/10.1126/science.aab2374.

  5. Journals unite for reproducibility. Nature. 2014;515(7525):7. https://doi.org/10.1038/515007a.

  6. US National Institutes of Health. Rigor and Reproducibility. 2016. http://grants.nih.gov/reproducibility/index.htm#guidance. Accessed 6 July 2016.

  7. Benjamin D, Berger J, Johannesson M, et al. Redefine Statistical Significance. Unpublished Manuscript. 2017.

  8. •• Lash TL. The harm done to reproducibility by the culture of null hypothesis significance testing. Am J Epidemiol. 2017;186(6):627–35. https://doi.org/10.1093/aje/kwx261. Demonstrates that null hypothesis significance testing leads to the appearance of poor reproducibility by at least four mechanisms, yet few proposed interventions to improve reproducibility have suggested change to the culture of null hypothesis significance testing.

  9. Matthews R, Wasserstein R, Spiegelhalter D. The ASA’s p-value statement, one year on. Significance. 2017;14(2):38–41. https://doi.org/10.1111/j.1740-9713.2017.01021.x.

  10. McShane B, Gal D, Gelman A, Robert C, Tackett J. Abandon statistical significance. Unpublished Manuscript. 2017.

  11. Trafimow D, Amrhein V, Areshenkoff C, et al. Manipulating the alpha level cannot cure significance testing—comments on “Redefine statistical significance”. Unpublished Manuscript. 2017.

  12. Lash TL. Declining the transparency and openness promotion guidelines. Epidemiology. 2015;26(6):779–80. https://doi.org/10.1097/ede.0000000000000382.

  13. Lash TL. Lash responds to “is reproducibility thwarted by hypothesis testing?” and “the need for cognitive science in methodology”. Am J Epidemiol. 2017;186(6):646–7. https://doi.org/10.1093/aje/kwx260.

  14. Crane H. Why “redefining statistical significance” will not improve reproducibility and could make the replication crisis worse. Unpublished Manuscript. 2017.

  15. Feinstein AR. Scientific standards in epidemiologic studies of the menace of daily life. Science. 1988;242(4883):1257–63.

  16. Taubes G. Epidemiology faces its limits. Science. 1995;269(5221):164–9.

  17. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2(8):e124.

  18. • Blair A, Saracci R, Vineis P, Cocco P, Forastiere F, Grandjean P, et al. Epidemiology, public health, and the rhetoric of false positives. Environ Health Perspect. 2009;117(12):1809–13. https://doi.org/10.1289/ehp.0901194. One of several papers emphasizing the importance of false-positive associations without due consideration to the importance of false-negative associations.

  19. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008;19(5):640–8. https://doi.org/10.1097/EDE.0b013e31818131e7.

  20. Ioannidis JP, Tarone R, McLaughlin JK. The false-positive to false-negative ratio in epidemiologic studies. Epidemiology. 2011;22(4):450–6. https://doi.org/10.1097/EDE.0b013e31821b506e.

  21. McLaughlin JK, Tarone RE. False positives in cancer epidemiology. Cancer Epidemiol Biomark Prev. 2013;22(1):11–5. https://doi.org/10.1158/1055-9965.EPI-12-0995.

  22. • Mayes LC, Horwitz RI, Feinstein AR. A collection of 56 topics with contradictory results in case-control research. Int J Epidemiol. 1988;17(3):680–5. Demonstrates long-standing concerns about the reproducibility of epidemiologic research.

  23. Goodman S, Greenland S. Why most published research findings are false: problems in the analysis. PLoS Med. 2007;4(4):e168. https://doi.org/10.1371/journal.pmed.0040168.

  24. European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC). ECETOC workshop report no. 18. 2009.

  25. • Lash TL, Vandenbroucke JP. Commentary: should preregistration of epidemiologic study protocols become compulsory?: reflections and a counterproposal. Epidemiology. 2012;23(2):184–8. https://doi.org/10.1097/EDE.0b013e318245c05b. Review of advantages and disadvantages of compulsory preregistration of nonrandomized epidemiologic research.

  26. Boccia S, Rothman KJ, Panic N, Flacco ME, Rosso A, Pastorino R, et al. Registration practices for observational studies on ClinicalTrials.gov indicated low adherence. J Clin Epidemiol. 2016;70:176–82. https://doi.org/10.1016/j.jclinepi.2015.09.009.

  27. De Angelis C, Drazen JM, Frizelle FAP, Haug C, Hoey J, Horton R, et al. Clinical trial registration: a statement from the International Committee of Medical Journal Editors. N Engl J Med. 2004;351(12):1250–1. https://doi.org/10.1056/NEJMe048225.

  28. Krleza-Jeric K, Chan AW, Dickersin K, Sim I, Grimshaw J, Gluud C. Principles for international registration of protocol information and results from human trials of health related interventions: Ottawa statement (part 1). BMJ. 2005;330(7497):956–8. https://doi.org/10.1136/bmj.330.7497.956.

  29. Williams RJ, Tse T, Harlan WR, Zarin DA. Registration of observational studies: is it time? CMAJ. 2010;182(15):1638–42. https://doi.org/10.1503/cmaj.092252.

  30. Bracken MB. Preregistration of epidemiology protocols: a commentary in support. Epidemiology. 2011;22(2):135–7. https://doi.org/10.1097/EDE.0b013e318207fc7c.

  31. Loder E, Groves T, MacAuley D. Registration of observational studies. BMJ. 2010;340:c950. https://doi.org/10.1136/bmj.c950.

  32. Center for Open Science. Our Sponsors. https://cos.io/about/our-sponsors/.

  33. Buck S. Solving reproducibility. Science. 2015;348(6242):1403. https://doi.org/10.1126/science.aac8041.

  34. Laura and John Arnold Foundation. Grants. http://www.arnoldfoundation.org/grants/

  35. Begley CG, Ioannidis JP. Reproducibility in science: improving the standard for basic and preclinical research. Circ Res. 2015;116(1):116–26. https://doi.org/10.1161/CIRCRESAHA.114.303819.

  36. Iqbal SA, Wallach JD, Khoury MJ, Schully SD, Ioannidis JP. Reproducible research practices and transparency across the biomedical literature. PLoS Biol. 2016;14(1):e1002333. https://doi.org/10.1371/journal.pbio.1002333.

  37. Stodden V, McNutt M, Bailey DH, Deelman E, Gil Y, Hanson B, et al. Enhancing reproducibility for computational methods. Science. 2016;354(6317):1240–1. https://doi.org/10.1126/science.aah6168.

  38. Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, et al. A manifesto for reproducible science. Nat Hum Behav. 2017;1:0021. https://doi.org/10.1038/s41562-016-0021.

  39. Apple S. John Arnold made a fortune at Enron. Now he’s declared war on bad science. Wired. 2017.

  40. Dreber A, Pfeiffer T, Almenberg J, Isaksson S, Wilson B, Chen Y, et al. Using prediction markets to estimate the reproducibility of scientific research. PNAS. 2015;112(50):15343–7.

  41. Hill AB. The environment and disease: association or causation? Proc Royal Soc Med. 1965;58:295–300.

  42. Lemen RA. Chrysotile asbestos as a cause of mesothelioma: application of the Hill Causation Model. Int J Occup Environ Health. 2004;10(2):233–9. https://doi.org/10.1179/oeh.2004.10.2.233.

  43. Degelman ML, Herman KM. Smoking and multiple sclerosis: a systematic review and meta-analysis using the Bradford Hill criteria for causation. Mult Scler Relat Disord. 2017;17:207–16. https://doi.org/10.1016/j.msard.2017.07.020.

  44. Weed DL. Epidemiologic evidence and causal inference. Hematol Oncol Clin North Am. 2000;14(4):797–807. viii

  45. Holman CD, Arnold-Reed DE, de Klerk N, McComb C, English DR. A psychometric experiment in causal inference to estimate evidential weights used by epidemiologists. 2001. p. 246–255.

  46. Rothman KJ. Causes. Am J Epidemiol. 1976;104(6):587–92.

  47. Rothman KJ, Greenland S, Poole C, Lash TL. Causation and causal inference. In: Rothman KJ, Greenland S, Lash TL, editors. Modern epidemiology. Philadelphia: Lippincott Williams & Wilkins; 2008. p. 5–31.

  48. Open Science Collaboration. PSYCHOLOGY. Estimating the reproducibility of psychological science. Science. 2015;349(6251):aac4716. https://doi.org/10.1126/science.aac4716.

  49. •• Gelman A, Stern H. The difference between “significant” and “not significant” is not itself statistically significant. Am Stat. 2006;60(4):328–31. https://doi.org/10.1198/000313006X152649. Two results, one statistically significant and the other not, are not necessarily different.

  50. •• Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337–50. https://doi.org/10.1007/s10654-016-0149-3. Comprehensive review of all the ways that null hypothesis significance testing is misused and misunderstood.

  51. Rothman KJ, Lanes S, Robins J. Casual inference. Epidemiology. 1993;4(6):555–6.

  52. Seliger C, Meier CR, Becker C, Jick SS, Bogdahn U, Hau P, et al. Statin use and risk of glioma: population-based case–control analysis. Eur J Epidemiol. 2016;31(9):947–52. https://doi.org/10.1007/s10654-016-0145-7.

  53. Brown HK, Ray JG, Wilton AS, Lunsky Y, Gomes T, Vigod SN. Association between serotonergic antidepressant use during pregnancy and autism spectrum disorder in children. JAMA. 2017;317(15):1544–52. https://doi.org/10.1001/jama.2017.3415.

  54. Utts J. Replication and meta-analysis in parapsychology. Stat Sci. 1991;6(4):363–78.

  55. Rothman KJ, Poole C. A strengthening programme for weak associations. Int J Epidemiol. 1988;17(4):955–9.

  56. Cole SR, Stuart EA. Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial. Am J Epidemiol. 2010;172(1):107–15. https://doi.org/10.1093/aje/kwq084.

  57. Lesko CR, Buchanan AL, Westreich D, Edwards JK, Hudgens MG, Cole SR. Generalizing study results: a potential outcomes perspective. Epidemiology. 2017;28(4):553–61. https://doi.org/10.1097/EDE.0000000000000664.

  58. Westreich D, Edwards JK, Lesko CR, Stuart E, Cole SR. Transportability of trial results using inverse odds of sampling weights. Am J Epidemiol. 2017;186(8):1010–4. https://doi.org/10.1093/aje/kwx164.

  59. Rothman KJ, Greenland S, Lash TL. Design strategies to improve study accuracy. In: Rothman KJ, Greenland S, Lash TL, editors. Modern epidemiology. Philadelphia: Lippincott Williams & Wilkins; 2008. p. 168–82.

  60. Greenland S, Lash TL. Bias Analysis. In: Rothman KJ, Greenland S, Lash TL, editors. Modern epidemiology. Philadelphia: Lippincott Williams & Wilkins; 2008. p. 345–80.

  61. Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43(6):1969–85. https://doi.org/10.1093/ije/dyu149.

  62. Hernan MA, Sauer BC, Hernandez-Diaz S, Platt R, Shrier I. Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. J Clin Epidemiol. 2016;79:70–5. https://doi.org/10.1016/j.jclinepi.2016.04.014.

  63. Maldonado G. Adjusting a relative-risk estimate for study imperfections. J Epidemiol Community Health. 2008;62(7):655–63.

  64. Fox MP, Lash TL. On the need for quantitative bias analysis in the peer-review process. Am J Epidemiol. 2017;185(10):865–8. https://doi.org/10.1093/aje/kwx057.

  65. Hunnicutt JN, Ulbricht CM, Chrysanthopoulou SA, Lapane KL. Probabilistic bias analysis in pharmacoepidemiology and comparative effectiveness research: a systematic review. Pharmacoepidemiol Drug Saf. 2016;25(12):1343–53. https://doi.org/10.1002/pds.4076.

  66. Greenland S. Invited commentary: the need for cognitive science in methodology. Am J Epidemiol. 2017;186(6):639–45. https://doi.org/10.1093/aje/kwx259.

  67. O’Boyle EH, Banks GC, Gonzalez-Mulé E. The Chrysalis effect: how ugly initial results metamorphosize into beautiful articles. J Manag. 2014. https://doi.org/10.1177/0149206314527133.

  68. Sterling TD. Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. J Am Stat Assoc. 1959;54(285):30–4. https://doi.org/10.2307/2282137.

  69. Begg CB. A measure to aid in the interpretation of published clinical trials. Stat Med. 1985;4(1):1–9.

  70. Motulsky HJ. Common misconceptions about data analysis and statistics. Pharmacol Res Perspect. 2015;3(1):e00093. https://doi.org/10.1002/prp2.93.

  71. Kerr NL. HARKing: hypothesizing After the Results are Known. Personal Soc Psychol Rev. 1998;2(3):196–217. https://doi.org/10.1207/s15327957pspr0203_4.

  72. Rothman KJ. Significance questing. Ann Intern Med. 1986;105(3):445–7.

  73. Announcement: transparency upgrade for Nature journals. Nature. 2017;543(7645):288. https://doi.org/10.1038/543288b.

  74. US National Institutes of Health. Rigor and reproducibility. https://www.nih.gov/research-training/rigor-reproducibility.

  75. Goldstein ND. Toward open-source epidemiology. Epidemiology. 2018;29(2):161–4. https://doi.org/10.1097/ede.0000000000000782.

  76. Khoury MJ. Planning for the future of epidemiology in the era of big data and precision medicine. Am J Epidemiol. 2015;182(12):977–9. https://doi.org/10.1093/aje/kwv228.

  77. Galea S. An argument for a consequentialist epidemiology. Am J Epidemiol. 2013;178(8):1185–91. https://doi.org/10.1093/aje/kwt172.

  78. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453–7.

  79. Lanes SF. Error and uncertainty in causal inference. In: Rothman KJ, editor. Causal Inference. Chestnut Hill: Epidemiology Resources Inc.; 1988.

  80. Lash TL. Advancing research through replication. Paediatr Perinat Epidemiol. 2015;29(1):82–3. https://doi.org/10.1111/ppe.12167.

  81. Munafo M, Davey Smith G. Robust research needs many lines of evidence. Nature. 2018;553:399–401.

  82. Rothman KJ, Greenland S, Lash TL. Precision and statistics in epidemiologic studies. In: Rothman KJ, Greenland S, Lash TL, editors. Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins; 2008. p. 148–67.

  83. Lash TL, Fox MP, Fink AK. Applying quantitative bias analysis to epidemiologic data. Statistics for biology and health. New York: Springer; 2009.

  84. Kieler H, Cnattingius S, Haglund B, Palmgren J, Axelsson O. Sinistrality—a side-effect of prenatal sonography: a comparative study of young men. Epidemiology. 2001;12(6):618–23.

  85. Salvesen KA. Ultrasound in pregnancy and non-right handedness: meta-analysis of randomized trials. Ultrasound Obstet Gynecol. 2011;38(3):267–71. https://doi.org/10.1002/uog.9055.

  86. The American College of Obstetricians and Gynecologists. Ultrasound Exams. 2017. https://www.acog.org/Patients/FAQs/Ultrasound-Exams.

  87. Grady D, Rubin SM, Petitti DB, Fox CS, Black D, Ettinger B, et al. Hormone therapy to prevent disease and prolong life in postmenopausal women. Ann Intern Med. 1992;117(12):1016–37.

  88. Stampfer MJ, Colditz GA. Estrogen replacement therapy and coronary heart disease: a quantitative assessment of the epidemiologic evidence. Prev Med. 1991;20(1):47–63.

  89. Petitti D. Hormone replacement therapy and coronary heart disease: results of randomized trials. Prog Cardiovasc Dis. 2003;46(3):231–8.

  90. Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, et al. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA. 2002;288(3):321–33.

  91. Lawlor DA, Davey Smith G, Ebrahim S. Commentary: the hormone replacement-coronary heart disease conundrum: is this the death of observational epidemiology? Int J Epidemiol. 2004;33(3):464–7. https://doi.org/10.1093/ije/dyh124.

  92. Hernan MA, Alonso A, Logan R, Grodstein F, Michels KB, Willett WC, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology. 2008;19(6):766–79. https://doi.org/10.1097/EDE.0b013e3181875e61.

  93. Gunn LJ, Chapeau-Blondeau F, McDonnell MD, Davis BR, Allison A, Abbott D. Too good to be true: when overwhelming evidence fails to convince. Proc Math Phys Eng Sci. 2016;472(2187):20150748. https://doi.org/10.1098/rspa.2015.0748.

Acknowledgments

Richard MacLehose reviewed and served as section editor for this manuscript. He has previously consulted with the Nutritional Science Initiative, which received funds from the Arnold Foundation.

Author information

Corresponding author

Correspondence to Timothy L. Lash.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflicts of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

This article is part of the Topical Collection on Epidemiologic Methods

About this article

Cite this article

Lash, T.L., Collin, L.J. & Van Dyke, M.E. The Replication Crisis in Epidemiology: Snowball, Snow Job, or Winter Solstice? Curr Epidemiol Rep 5, 175–183 (2018). https://doi.org/10.1007/s40471-018-0148-x

  • DOI: https://doi.org/10.1007/s40471-018-0148-x