
Error Rates, Decisive Outcomes and Publication Bias with Several Inferential Methods

An Erratum to this article was published on 08 April 2016



Statistical methods for inferring the true magnitude of an effect from a sample should have acceptable error rates when the true effect is trivial (type I rates) or substantial (type II rates).


The objective of this study was to quantify the error rates, rates of decisive (publishable) outcomes and publication bias of five inferential methods commonly used in sports medicine and science. The methods were conventional null-hypothesis significance testing [NHST] (significant and non-significant imply substantial and trivial true effects, respectively); conservative NHST (the observed magnitude is interpreted as the true magnitude only for significant effects); non-clinical magnitude-based inference [MBI] (the true magnitude is interpreted as the magnitude range of the 90 % confidence interval only for intervals not spanning substantial values of the opposite sign); clinical MBI (a possibly beneficial effect is recommended for implementation only if it is most unlikely to be harmful); and odds-ratio clinical MBI (implementation is also recommended when the odds of benefit outweigh the odds of harm, with an odds ratio >66).
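The five decision rules above can be sketched for a single trial, given an observed standardized effect and its standard error. This is an illustrative reading of the rules, not the authors' code. The smallest important effect of 0.2, the normal approximation, the 25 % cut-off for "possibly beneficial" and the 0.5 % cut-off for "most unlikely harmful" are assumptions not stated in this abstract; only the 90 % interval and the odds ratio of 66 come from the text. All function names are hypothetical.

```python
from statistics import NormalDist

NORMAL = NormalDist()
THRESHOLD = 0.2  # assumed smallest important standardized effect


def magnitude(value):
    """Label a standardized value as trivial or substantial."""
    if value >= THRESHOLD:
        return "substantial positive"
    if value <= -THRESHOLD:
        return "substantial negative"
    return "trivial"


def p_value(effect, se):
    """Two-sided p value from a normal approximation (an assumption)."""
    return 2 * (1 - NORMAL.cdf(abs(effect) / se))


def conventional_nhst(effect, se, alpha=0.05):
    """Significant implies substantial; non-significant implies trivial."""
    return "substantial" if p_value(effect, se) < alpha else "trivial"


def conservative_nhst(effect, se, alpha=0.05):
    """Interpret the observed magnitude only when the effect is significant."""
    return magnitude(effect) if p_value(effect, se) < alpha else "indecisive"


def nonclinical_mbi(effect, se, level=0.90):
    """Report the magnitude range of the interval unless it spans
    substantial values of both signs, in which case it is unclear."""
    z = NORMAL.inv_cdf(0.5 + level / 2)
    lower, upper = effect - z * se, effect + z * se
    if lower < -THRESHOLD and upper > THRESHOLD:
        return "unclear"
    low_label, high_label = magnitude(lower), magnitude(upper)
    return low_label if low_label == high_label else f"{low_label} to {high_label}"


def clinical_mbi(effect, se, use_odds_ratio=False):
    """Return True when implementation would be recommended."""
    benefit = 1 - NORMAL.cdf((THRESHOLD - effect) / se)  # chance of benefit
    harm = NORMAL.cdf((-THRESHOLD - effect) / se)        # risk of harm
    if benefit <= 0.25:   # assumed cut-off for "possibly beneficial"
        return False
    if harm < 0.005:      # assumed cut-off for "most unlikely harmful"
        return True
    if use_odds_ratio:
        # odds(benefit) / odds(harm) > 66, written without division
        return benefit * (1 - harm) > 66 * harm * (1 - benefit)
    return False
```

For example, `nonclinical_mbi(0.5, 0.2)` gives a 90 % interval of roughly 0.17 to 0.83 and returns `"trivial to substantial positive"`, whereas `conventional_nhst(0.5, 0.2)` returns `"substantial"`.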


Simulation was used to quantify standardized mean effects in 500,000 randomized controlled trials for each of several true standardized magnitudes, ranging from null through marginally moderate, at each of three sample sizes: suboptimal (10 + 10), optimal for MBI (50 + 50) and optimal for NHST (144 + 144).
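A simulation of this kind can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a simple post-only comparison of two groups with unit standard deviation, a smallest important effect of 0.2, a normal approximation for the test and interval, and far fewer trials than the 500,000 used in the study. The rates it produces should therefore not be expected to match the article's, and the sample sizes labelled optimal in the article need not be optimal under this simplified design. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance

THRESHOLD = 0.2                      # assumed smallest important effect
Z90 = NormalDist().inv_cdf(0.95)     # multiplier for a 90 % interval
Z95 = NormalDist().inv_cdf(0.975)    # critical value for p < 0.05


def simulate_trial(true_effect, n_per_group, rng):
    """Return (observed standardized effect, approximate standard error)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    pooled_sd = ((variance(control) + variance(treated)) / 2) ** 0.5
    effect = (mean(treated) - mean(control)) / pooled_sd
    se = (2 / n_per_group) ** 0.5    # large-sample approximation
    return effect, se


def decisive_rates(true_effect, n_per_group, n_trials=2000, seed=1):
    """Return the proportions of trials that are significant (NHST)
    and clear (non-clinical MBI)."""
    rng = random.Random(seed)
    significant = clear = 0
    for _ in range(n_trials):
        effect, se = simulate_trial(true_effect, n_per_group, rng)
        if abs(effect) / se > Z95:
            significant += 1
        lower, upper = effect - Z90 * se, effect + Z90 * se
        # Unclear only when the interval spans substantial values of both signs
        if not (lower < -THRESHOLD and upper > THRESHOLD):
            clear += 1
    return significant / n_trials, clear / n_trials


if __name__ == "__main__":
    for n in (10, 50, 144):
        sig, clr = decisive_rates(true_effect=0.0, n_per_group=n)
        print(f"n = {n} + {n}: significant {sig:.3f}, clear {clr:.3f}")
```

With a true effect of zero, the significant proportion is the type I rate of conservative NHST under these assumptions. Changing `true_effect` to a substantial value turns the non-significant proportion into the corresponding type II rate.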


Type I rates for non-clinical MBI were always lower than for NHST. When type I rates for clinical MBI were higher, most errors were debatable, given the probabilistic qualification of those inferences (unlikely or possibly beneficial). NHST often had unacceptable rates for either type II errors or decisive outcomes, and it had substantial publication bias with the smallest sample size, whereas MBI had no such problems.
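The mechanism behind publication bias under NHST can be illustrated with a short sketch. It assumes that only significant effects are published and measures bias as the mean published effect minus the true effect. That definition, the true effect of 0.3, the normal sampling distribution with standard error sqrt(2/n), and the function name are assumptions made for illustration, not details taken from the article.

```python
import random
from statistics import NormalDist, mean

Z95 = NormalDist().inv_cdf(0.975)  # critical value for p < 0.05


def significance_filter_bias(true_effect, n_per_group, n_trials=20000, seed=1):
    """Mean of significant observed effects minus the true effect.

    Observed effects are drawn directly from an assumed normal sampling
    distribution rather than from simulated raw data.
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5
    published = []
    for _ in range(n_trials):
        observed = rng.gauss(true_effect, se)
        if abs(observed) / se > Z95:   # only significant results are kept
            published.append(observed)
    if not published:
        return None                    # nothing passed the filter
    return mean(published) - true_effect


if __name__ == "__main__":
    for n in (10, 50, 144):
        bias = significance_filter_bias(true_effect=0.3, n_per_group=n)
        print(f"n = {n} + {n}: bias {bias:.2f}")
```

Under these assumptions the bias is largest at 10 + 10, because only heavily overestimated effects reach significance, and it shrinks as the sample size grows.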


MBI is a trustworthy, nuanced alternative to NHST, which it outperforms in terms of sample size, error rates, decision rates and publication bias.





The authors thank Kenneth Quarrie for his valuable feedback on drafts of this article.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Will G. Hopkins.

Ethics declarations

Conflict of interest

Will G. Hopkins and Alan M. Batterham have no conflicts of interest to declare with regard to this publication. No funding was received for the conduct of this study and/or the preparation of this manuscript.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 39 kb)

About this article

Cite this article

Hopkins, W.G., Batterham, A.M. Error Rates, Decisive Outcomes and Publication Bias with Several Inferential Methods. Sports Med 46, 1563–1573 (2016).



  • Error Rate
  • Publication Bias
  • True Effect
  • Substantial Bias
  • Decisive Effect