Error Rates, Decisive Outcomes and Publication Bias with Several Inferential Methods
Statistical methods for inferring the true magnitude of an effect from a sample should have acceptable error rates when the true effect is trivial (type I rates) or substantial (type II rates).
The objective of this study was to quantify the error rates, rates of decisive (publishable) outcomes and publication bias of five inferential methods commonly used in sports medicine and science. The methods were conventional null-hypothesis significance testing [NHST] (significant and non-significant imply substantial and trivial true effects, respectively); conservative NHST (the observed magnitude is interpreted as the true magnitude only for significant effects); non-clinical magnitude-based inference [MBI] (the true magnitude is interpreted as the magnitude range of the 90% confidence interval only for intervals not spanning substantial values of the opposite sign); clinical MBI (a possibly beneficial effect is recommended for implementation only if it is most unlikely to be harmful); and odds-ratio clinical MBI (implementation is also recommended when the odds of benefit outweigh the odds of harm, with an odds ratio > 66).
Simulation was used to quantify standardized mean effects in 500,000 randomized controlled trials for each of a range of true standardized magnitudes, from null through marginally moderate, with three sample sizes: suboptimal (10 + 10), optimal for MBI (50 + 50) and optimal for NHST (144 + 144).
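The general shape of such a simulation can be illustrated with a minimal sketch. It is not the authors' code and will not reproduce their rates. It assumes a simple post-only comparison of two groups with unit standard deviation, a smallest important standardized effect of 0.2, a normal approximation in place of the t distribution, and far fewer trials than the 500,000 used in the study. All function names are hypothetical. Only conventional NHST and the non-clinical MBI rule for a clear (decisive) outcome are shown.

```python
import math
import random
from statistics import NormalDist, mean, variance

# Assumption: standardized effects of 0.2 or more (either sign) count as substantial.
SMALLEST_IMPORTANT = 0.2


def simulate_trial(true_effect, n_per_group, rng):
    """Simulate one two-group trial; return (observed difference, standard error).

    Both groups have a population SD of 1, so the difference in means is
    already on a standardized scale (a simplifying assumption).
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    diff = mean(treated) - mean(control)
    pooled_var = (variance(control) + variance(treated)) / 2
    se = math.sqrt(2 * pooled_var / n_per_group)
    return diff, se


def nhst_significant(diff, se, alpha=0.05):
    """Two-sided test of the null effect, using a normal approximation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff) > z_crit * se


def mbi_nonclinical_clear(diff, se, level=0.90):
    """Clear unless the interval spans substantial values of both signs."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = diff - z_crit * se, diff + z_crit * se
    spans_both = lower < -SMALLEST_IMPORTANT and upper > SMALLEST_IMPORTANT
    return not spans_both


def outcome_rates(true_effect, n_per_group, n_trials=1000, seed=1):
    """Proportion of simulated trials that are significant (NHST) or clear (MBI)."""
    rng = random.Random(seed)
    significant = clear = 0
    for _ in range(n_trials):
        diff, se = simulate_trial(true_effect, n_per_group, rng)
        significant += nhst_significant(diff, se)
        clear += mbi_nonclinical_clear(diff, se)
    return {
        "nhst_significant": significant / n_trials,
        "mbi_clear": clear / n_trials,
    }
```

Calling `outcome_rates(0.0, 50)` estimates, for a null true effect with 50 participants per group, how often NHST returns a significant result (a type I error under the conventional interpretation) and how often the 90% interval yields a clear outcome under the non-clinical MBI rule.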
Type I rates for non-clinical MBI were always lower than for NHST. When type I rates for clinical MBI were higher, most errors were debatable, given the probabilistic qualification of those inferences (unlikely or possibly beneficial). NHST often had unacceptable rates for either type II errors or decisive outcomes, and it had substantial publication bias with the smallest sample size, whereas MBI had no such problems.
MBI is a trustworthy, nuanced alternative to NHST, which it outperforms in terms of sample size, error rates, decision rates and publication bias.