Current Epidemiology Reports, Volume 5, Issue 2, pp 175–183

The Replication Crisis in Epidemiology: Snowball, Snow Job, or Winter Solstice?

  • Timothy L. Lash
  • Lindsay J. Collin
  • Miriam E. Van Dyke
Epidemiologic Methods (R MacLehose, Section Editor)
Part of the following topical collections:
  1. Topical Collection on Epidemiologic Methods


Purpose of Review

Like a snowball rolling down a steep hill, the most recent crisis over the perceived lack of reproducibility of scientific results has outpaced the evidence of crisis. It has led to new actions and new guidelines that have been rushed to market without plans for evaluation, metrics for success, or due consideration of the potential for unintended consequences.

Recent Findings

The perception of the crisis is at least partly a snow job, heavily influenced by a small number of centers lavishly funded by a single foundation, with undue and unsupported attention to preregistration as a solution to the perceived crisis. At the same time, the perception of crisis provides an opportunity for introspection. Two studies’ estimates of association may differ because of undue attention to null hypothesis significance testing, because of differences in the distribution of effect modifiers, because of differential susceptibility to threats to validity, or for other reasons. Perhaps the expectation of what reproducible epidemiology ought to look like is more misguided than the practice of epidemiology. We advocate for the idea of “replication and advancement.” Studies should not only replicate earlier work, but also improve on it by enhancing the design or analysis.


Summary

Abandoning blind reliance on null hypothesis significance testing for statistical inference, finding consensus on when preregistration of non-randomized study protocols has merit, and focusing on replication and advancement are the most certain ways to emerge from this solstice for the better.
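The abstract's point that two studies can appear to disagree merely because of null hypothesis significance testing can be illustrated with a small calculation. The sketch below is not taken from the article: the two risk ratios and their 95% confidence limits are made-up numbers, and the function names (`summarize`, `compare`) are hypothetical. It assumes each estimate is approximately normal on the log scale and that the two studies are independent, then compares the studies directly instead of comparing their significance labels.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # two-sided 95% critical value


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def summarize(rr, lower, upper):
    """Recover the log risk ratio, its standard error, and the p-value
    against the null (RR = 1) from a point estimate and 95% limits."""
    log_rr = math.log(rr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return {"log_rr": log_rr, "se": se, "p": two_sided_p(log_rr / se)}


def compare(a, b):
    """Ratio of two independent risk ratios with a 95% interval and a
    p-value for the hypothesis that the two estimates are equal."""
    diff = a["log_rr"] - b["log_rr"]
    se_diff = math.sqrt(a["se"] ** 2 + b["se"] ** 2)
    return {
        "ratio_of_rrs": math.exp(diff),
        "ci": (math.exp(diff - Z95 * se_diff), math.exp(diff + Z95 * se_diff)),
        "p": two_sided_p(diff / se_diff),
    }


# Hypothetical inputs, chosen only for illustration.
study_a = summarize(1.50, 1.05, 2.14)  # interval excludes 1
study_b = summarize(1.40, 0.90, 2.18)  # interval includes 1
comparison = compare(study_a, study_b)

print(f"Study A p-value: {study_a['p']:.3f}")
print(f"Study B p-value: {study_b['p']:.3f}")
print(f"Ratio of RRs: {comparison['ratio_of_rrs']:.2f}, "
      f"95% CI {comparison['ci'][0]:.2f} to {comparison['ci'][1]:.2f}, "
      f"p = {comparison['p']:.2f}")
```

With these illustrative inputs, one study would be labeled "statistically significant" and the other not, yet the interval for the ratio of the two estimates comfortably includes 1. Read as estimates, the second study is compatible with the first; read as significance labels, it looks like a failed replication.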


Keywords

Epidemiologic methods, Reproducibility of results



Richard MacLehose reviewed and served as section editor for this manuscript. He has previously consulted with the Nutritional Science Initiative, which received funds from the Arnold Foundation.

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflicts of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Timothy L. Lash (1, corresponding author)
  • Lindsay J. Collin (1)
  • Miriam E. Van Dyke (1)
  1. Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, USA
