Understanding and Mitigating the Replication Crisis, for Environmental Epidemiologists
Purpose of Review
In recent years, investigators in a variety of fields have reported that most published findings cannot be replicated. This review evaluates the factors contributing to poor reproducibility, the implications for environmental epidemiology, and strategies for mitigation.
Although publication bias and other forms of selective reporting may contribute substantially to irreproducible results, underpowered analyses and a low prevalence of true associations likely explain most failures to replicate novel scientific findings. Epidemiologists can counter these risks by ensuring that analyses are well powered or precise, focusing on scientifically justified hypotheses, strictly controlling type I error rates, emphasizing estimation over statistical significance, avoiding practices that introduce bias, or employing bias analysis and triangulation. Avoiding p values has no effect on reproducibility if confidence intervals that exclude the null are emphasized in the same manner.
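The link between power, the prevalence of true associations, and failures to replicate can be made concrete with the positive predictive value (PPV) framework discussed in the references below. A minimal sketch follows; the power, alpha, and prior values are illustrative choices, not figures from any particular study.

```python
# Positive predictive value (PPV) of a statistically significant finding:
# the probability that a "significant" result reflects a true association,
# given the test's power, its type I error rate (alpha), and the prior
# probability that the tested association is real. Values are illustrative.

def ppv(power, alpha, prior):
    """P(true association | significant result)."""
    true_positives = power * prior          # true associations detected
    false_positives = alpha * (1 - prior)   # null associations flagged anyway
    return true_positives / (true_positives + false_positives)

# A well-powered test of a scientifically plausible hypothesis:
print(round(ppv(power=0.80, alpha=0.05, prior=0.50), 3))  # 0.941

# An underpowered test of a long-shot hypothesis:
print(round(ppv(power=0.20, alpha=0.05, prior=0.05), 3))  # 0.174
```

Under the second scenario, fewer than one in five significant findings is a true association, even with no bias or selective reporting, which is why such results so often fail to replicate.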
Increased attention to exposure mixtures and susceptible subpopulations, and wider use of omics technologies, will likely decrease the proportion of investigated associations that are true, requiring greater caution in study design, analysis, and interpretation. Though well intentioned, these recent trends in environmental epidemiology will likely reduce reproducibility unless effective steps are taken to mitigate the risk of spurious findings.
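The multiple-comparison problem raised by omics-scale scans can be illustrated with simple expected-count arithmetic; the numbers of tests, the prior, and the power below are hypothetical, chosen only to show how a low prevalence of true associations inflates the false discovery proportion.

```python
# Expected outcomes when screening many candidate exposures at alpha = 0.05,
# with only a small fraction of tested associations being real.
# All inputs are hypothetical, for illustration only.

n_tests = 10_000   # e.g. an omics-scale exposure scan
prior = 0.01       # fraction of tested associations that are true
power = 0.50       # assumed power to detect each true association
alpha = 0.05       # per-test type I error rate

n_true = n_tests * prior            # 100 true associations
n_null = n_tests - n_true           # 9,900 null associations

expected_true_pos = power * n_true  # 50 true positives
expected_false_pos = alpha * n_null # 495 false positives

false_discovery_prop = expected_false_pos / (expected_true_pos + expected_false_pos)
print(round(false_discovery_prop, 3))  # 0.908
```

Here roughly 91% of "significant" hits are expected to be false positives, which is why such scans typically require family-wise error or false discovery rate control rather than an unadjusted 0.05 threshold.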
Keywords: Reliability · Reproducibility · False positive · Type I error · Family-wise error rate · False discovery rate · p value · Hypothesis testing
Compliance with Ethical Standards
Conflict of Interest
Scott M. Bartell declares that he has no conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
Papers of particular interest, published recently, have been highlighted as:
• Of importance
•• Of major importance
- 2. Begley CG, Ellis LM. Drug development: raise standards for preclinical cancer research. Nature. 2012. https://doi.org/10.1038/483531a.
- 4. • Dumas-Mallet E, Button K, Boraud T, Munafo M, Gonon F. Replication validity of initial association studies: a comparison between psychiatry, neurology and four somatic diseases. PLoS One. 2016;11(6):e0158064. This study assesses reproducibility by comparing 663 meta-analyses of risk factor associations to the initial studies reporting those associations.
- 6. •• Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124. https://doi.org/10.1371/journal.pmed.0020124. This study extends the PPV framework to account for bias, and provides example PPV calculations for various types of studies.
- 8. • Lash TL. The harm done to reproducibility by the culture of null hypothesis significance testing. Am J Epidemiol. 2017;186(6):627–35. This manuscript discusses the poor reproducibility of traditional hypothesis testing, and advocates a change in scientific culture to focus on estimation.
- 9. McDonald JH. Handbook of biological statistics. 3rd ed. Baltimore: Sparky House Publishing; 2014.
- 10. • Sterne JAC, Smith GD. Sifting the evidence—what's wrong with significance tests? Phys Ther. 2001;81(8):1464–9. This manuscript assesses the impact of power and type I error rate on the proportion of false positives, and advocates using p values as measures of evidence rather than as thresholds for declaring statistical significance.
- 13. •• Dumas-Mallet E, Button KS, Boraud T, Gonon F, Munafò MR. Low statistical power in biomedical science: a review of three human research domains. R Soc Open Sci. 2017;4(2):160254. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5367316/. This review documents low statistical power across three domains of human biomedical research.
- 18. • Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, et al. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect. 2016;124(12):1848–56. http://ehp.niehs.nih.gov/EHP172. This exposome simulation study assesses the false discovery proportion and sensitivity for a variety of common statistical methods addressing multiple comparisons.
- 26. • Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers E-J, Berk R, et al. Redefine statistical significance. Nat Hum Behav. 2018;2(1):6–10. This manuscript by 72 authors advocates the use of 0.005 instead of 0.05 as the standard threshold for statistical significance.
- 33. Kass PH. Modern epidemiological study designs. In: Handbook of epidemiology. New York: Springer; 2014. p. 325–63. https://link.springer.com/referenceworkentry/10.1007/978-0-387-09834-0_8.
- 36. Tukey JW. We need both exploratory and confirmatory. Am Stat. 1980;34(1):23–5.