Understanding and Mitigating the Replication Crisis, for Environmental Epidemiologists

  • Scott M. Bartell
Methods in Environmental Epidemiology (AZ Pollack and NJ Perkins, Section Editors)
Part of the following topical collections:
  1. Topical Collection on Methods in Environmental Epidemiology


Purpose of Review

In recent years, investigators in a variety of fields have reported that most published findings cannot be replicated. This review evaluates the factors contributing to lack of reproducibility, implications for environmental epidemiology, and strategies for mitigation.

Recent Findings

Although publication bias and other types of selective reporting may contribute substantially to irreproducible results, underpowered analyses and low prevalence of true associations likely explain most failures to replicate novel scientific results. Epidemiologists can counter these risks by ensuring that analyses are well-powered or precise, focusing on scientifically justified hypotheses, strictly controlling type I error rates, emphasizing estimation over statistical significance, avoiding practices that introduce bias, or employing bias analysis and triangulation. Avoidance of p values has no effect on reproducibility if confidence intervals excluding the null are emphasized in a similar manner.
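
The joint effect of power, type I error rate, and the prevalence of true associations can be illustrated with the positive predictive value (PPV) framework applied to hypothesis testing. The following is a minimal sketch, assuming no bias and a single fixed significance threshold; the function name `ppv` and the example inputs are illustrative assumptions, not values taken from this review.

```python
def ppv(prior, power, alpha=0.05):
    """Proportion of statistically significant findings that are true associations.

    prior: assumed proportion of tested hypotheses that are true associations
    power: probability of a significant result when the association is true
    alpha: type I error rate (probability of a significant result under the null)
    Simplification: ignores bias and selective reporting.
    """
    if not (0 <= prior <= 1 and 0 < power <= 1 and 0 < alpha <= 1):
        raise ValueError("prior must be in [0, 1]; power and alpha must be in (0, 1]")
    true_positives = power * prior          # expected share of tests that are true positives
    false_positives = alpha * (1 - prior)   # expected share of tests that are false positives
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios: (label, prior, power, alpha)
scenarios = [
    ("well-justified hypothesis, well-powered", 0.5, 0.8, 0.05),
    ("exploratory hypothesis, underpowered", 0.1, 0.2, 0.05),
    ("exploratory hypothesis, well-powered", 0.1, 0.8, 0.05),
    ("exploratory hypothesis, well-powered, alpha 0.005", 0.1, 0.8, 0.005),
]
for label, prior, power, alpha in scenarios:
    print(f"{label}: PPV = {ppv(prior, power, alpha):.2f}")
```

Under these assumed inputs, the PPV is about 0.94 in the first scenario and about 0.31 in the second, which shows how low power combined with a low prevalence of true associations can leave most significant findings false even without any selective reporting.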


Summary

Increased attention to exposure mixtures and susceptible subpopulations, and wider use of omics technologies, will likely decrease the proportion of investigated associations that are true associations, requiring greater caution in study design, analysis, and interpretation. Though well intentioned, these recent trends in environmental epidemiology will likely decrease reproducibility if no effective actions are taken to mitigate the risk of spurious findings.
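
When many exposures or omics features are screened at once, one way to limit spurious findings is to control the family-wise error rate or the false discovery rate. The sketch below compares a Bonferroni correction with the Benjamini-Hochberg step-up procedure. It is a minimal illustration that assumes independent tests; the function names and the p values are hypothetical and do not come from any study.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Family-wise error rate control: reject when p <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, q=0.05):
    """False discovery rate control via the Benjamini-Hochberg step-up rule.

    Finds the largest rank k with p_(k) <= k * q / m and rejects the
    k smallest p values. Results are returned in the input order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that meets the bound
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p values from screening ten exposures against one outcome
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

print("unadjusted p <= 0.05:", sum(p <= 0.05 for p in pvals))
print("Bonferroni:", sum(bonferroni_reject(pvals)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg_reject(pvals)))
```

With these hypothetical p values, five associations fall below 0.05 without adjustment, whereas Bonferroni retains one and Benjamini-Hochberg retains two. Neither correction raises power, so the choice of threshold still has to be weighed against study size.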


Keywords

Reliability; Reproducibility; False positive; Type I error; Family-wise error rate; False discovery rate; p value; Hypothesis testing


Compliance with Ethical Standards

Conflict of Interest

Scott M. Bartell declares that he has no conflict of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California Irvine, Irvine, USA
  2. Department of Statistics, Donald Bren School of Information and Computer Sciences, University of California Irvine, Irvine, USA
  3. Department of Epidemiology, School of Medicine, Susan and Henry Samueli College of Health Sciences, University of California Irvine, Irvine, USA
