Evidence-based medicine frequently uses statistical hypothesis testing. In this paradigm, data can only disconfirm a research hypothesis’ competitors: One tests the negation of a statistical hypothesis that is supposed to correspond to the research hypothesis. In practice, these hypotheses are often misaligned. For instance, directional research hypotheses are often paired with non-directional statistical hypotheses. Prima facie, one cannot gain proper evidence for one’s research hypothesis by employing a misaligned statistical hypothesis. This paper sheds light on the nature of and the reasons for such misalignments, and it provides a thorough analysis of whether they pose a threat to evidence-based medicine. The upshots are that the misalignments are often hidden from clinicians and that, although some cases of misalignment can be partially counterbalanced, the overall threat is non-negligible. The counterbalances lead to methodological inadequacy (in addition to the misalignment) or to a loss of statistical power, or they involve a (potential) lack of information that could be crucial for decision making. This result casts doubt on various findings of medical studies, in addition to issues associated with under-powered studies or the replication crisis.
We briefly discuss the normative issue of whether directional research hypotheses should be used in evidence-based medicine at all in Sect. 4.4.
There are also cases where the null hypothesis is not the negation of the alternative hypothesis (e.g., when considering fixed point alternatives). However, since these cases are the exception rather than the rule, they are not the main focus of this article, except where they may serve as a potential remedy for avoiding what we call ‘magnitude misalignment’ (see below).
Note that it would be methodologically inadequate to exclusively consider effect sizes because by-chance variations cannot be accounted for.
As we indicated in footnote 4, in such cases \(H_1\) and \(H_0\) do not exhaust the parameter space.
In fact, Jeong and Yoo’s study also features direction misalignment. They conclude from two-sided tests that there is a significant improvement (cf., e.g., Jeong and Yoo 2015, p. 1952).
We owe this suggestion to Gerit Pfuhl.
There are proposals for solving this problem; see, e.g., Lehmann and Romano (2005, p. 229 ff.).
We owe this suggestion to a researcher in the audience of our talk at the 8th Philosophy of Medicine Roundtable.
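The direction misalignment mentioned in the notes above can be made concrete with a small numerical sketch. It is not taken from the article: it assumes a simple z-test and a hypothetical test statistic of z = 1.80, and the function name `z_test_p_values` is introduced here only for illustration. It shows how the p-value for the same data depends on whether the statistical hypothesis is directional.

```python
from statistics import NormalDist


def z_test_p_values(z: float) -> dict:
    """Return the p-values of a z statistic under three alternative hypotheses."""
    std_normal = NormalDist()  # standard normal distribution (mean 0, sd 1)
    upper = 1.0 - std_normal.cdf(z)      # directional H1: effect > 0
    lower = std_normal.cdf(z)            # directional H1: effect < 0
    two_sided = 2.0 * min(upper, lower)  # non-directional H1: effect != 0
    return {"greater": upper, "less": lower, "two_sided": two_sided}


# Hypothetical observed statistic, chosen only for illustration.
p = z_test_p_values(1.80)
print(f"one-sided (greater): {p['greater']:.4f}")
print(f"two-sided:           {p['two_sided']:.4f}")
```

Under these assumptions, the one-sided p-value falls below a conventional 0.05 threshold while the two-sided p-value does not, so pairing a directional research hypothesis with a non-directional test can cost power at a fixed significance level.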
Altman D (1991) Practical statistics for medical research, first edn. Chapman & Hall, London
Bariani G, de Celis Ferrari A, Precivale M, Arai R, Saad E, Riechelmann R (2015) Sample size calculation in oncology trials: quality of reporting and implications for clinical cancer research. Am J Clin Oncol 38(6):570–574
Benjamin D et al (2018) Redefine statistical significance. Nat Hum Behav 2:6–10
Berger J, Sellke T (1987) Testing a point null hypothesis: the irreconcilability of P values and evidence. J Am Stat Assoc 82(397):62–71
Bigirumurame T, Kasim AS (2017) Can testing clinical significance reduce false positive rates in randomized controlled trials? A snap review. BMC Res Notes 10:775
Bland M (2000) Introduction to medical statistics, third edn. Oxford University Press, Oxford
Braver S (1975) On splitting the tails unequally: a new perspective on one- versus two-tailed tests. Educ Psychol Meas 32:283–301
Casella G, Berger R (1987) Reconciling Bayesian and frequentist evidence in the one-sided testing problem. J Am Stat Assoc 82(397):106–111
Cho H-C, Abe S (2013) Is two-tailed testing for directional research hypotheses tests legitimate? J Bus Res 66:1261–1266
Cohen J (1994) The earth is round (P \(<\).05). Am Psychol 49(12):997–1003
Colquhoun D (2014) An investigation of the false discovery rate and the misinterpretation of P-values. R Soc Open Sci 1(140216):1–16
Cumming G (2012) Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. Routledge, London
Davis R et al (2014) Reproducibility project: cancer biology. https://elifesciences.org/collections/9b1e83d1/reproducibility-project-cancer-biology
Derakhshanrad N, Vosoughi F, Yekaninejad M, Moshayedi P, Saberi H (2015) Functional impact of multidisciplinary outpatient program on patients with chronic complete spinal cord injury. Spinal Cord 53:850–865
Djulbegovic B (2009) The paradox of equipoise: the principle that drives and limits therapeutic discoveries in clinical research. Cancer Control 16(4):342–347
Dubey S (1991) Some thoughts on the one-sided and two-sided tests. J Biopharm Stat 1(1):139–150
Dwan K, Altman D, Arnaiz JA, Bloom J, Chan A-W, Cronin E, Decullier E, Easterbrook P, Von Elm E, Gamble C, Ghersi D, Ioannidis J, Simes J, Williamson PR (2008) Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3(8):e3081
Everitt B (2006) Medical statistics from A to Z—a guide for clinicians and medical students, second edn. Cambridge University Press, Cambridge
Field A (2000) Discovering statistics using SPSS for windows. Sage Publications, London
Freedman B (1987) Equipoise and the ethics of clinical research. N Engl J Med 317(3):141–145
Gigerenzer G (2004) Mindless statistics. J Socio-Econ 33:587–606
Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336(7650):924–926
Hacking I (2001) An introduction to probability and inductive logic. Cambridge University Press, Cambridge
Howell D (2007) Statistical methods for psychology, 6th edn. Thomson Wadsworth, Belmont
International council for harmonisation of technical requirements for pharmaceuticals for human use (1998) ICH harmonized tripartite guideline: statistical principles for clinical trials E9
Ioannidis J (2005) Why most published research findings are false. PLoS Med 2(8):e124
Jeong J, Yoo W (2015) Effects of air stacking on pulmonary function and peak cough flow in patients with cervical spinal cord injury. J Phys Ther Sci 27(6):1951–1952
Kaiser H (1960) Directional statistical decisions. Psychol Rev 67:160–167
Kimmel H (1957) Three criteria for the use of one-tailed tests. Psychol Bull 54:351–353
Kirkwood B, Sterne J (2003) Essential medical statistics, second edn. Blackwell, Oxford
Lecoutre B, Poitevineau J (2014) The significance test controversy revisited. The fiducial Bayesian alternative. Springer, New York
Lehmann E, Romano J (2005) Testing statistical hypotheses, third edn. Springer, New York
Machin D, Campbell M, Walters S (2007) Medical statistics. A textbook for the health sciences, fourth edn. Wiley, New York
Marcus R, Peritz E, Gabriel K (1976) On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63(3):655–660
Meehl P (1978) Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. J Consult Clin Psychol 46:806–834
Nickerson R (2000) Null hypothesis significance testing: a review of an old and continuing controversy. Psychol Methods 5(2):241–301
Ruxton GD, Neuhäuser M (2010) When should we use one-tailed hypothesis testing? Methods Ecol Evol 1:114–117
Zimmermann G, Bolter L-M, Sluka R, Höller Y, Bathke AC, Thomschewski A, Leis S, Lattanzi S, Brigo F, Trinka E (2019) Sample sizes and statistical methods in interventional studies on individuals with spinal cord injury: a systematic review. J Evid-Based Med
We thank Arne Bathke, Robyn Bluhm, Charlotte Werndl, the audiences in Genoa, Paris, and Munich, the editors of the special issue Fabrizio Macagno and Carlo Martini, as well as two anonymous reviewers for their constructive criticisms and suggestions.
Insa Lawler gratefully acknowledges that part of her research for this article was funded by the OeAD through an Ernst Mach Scholarship and by an Emmy Noether Grant from the German Research Council (DFG), Reference No. BR 5210/1-1. Georg Zimmermann received research support (IT equipment and conference travel reimbursements) from Eisai Europe Ltd.
Conflict of interest
The authors declare that they have no conflict of interest.
Research Involving Human and Animal Rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Lawler, I., Zimmermann, G. Misalignment Between Research Hypotheses and Statistical Hypotheses: A Threat to Evidence-Based Medicine?. Topoi (2019). https://doi.org/10.1007/s11245-019-09667-0
- Research hypotheses
- Statistical hypothesis testing
- Null hypotheses
- Evidence-based medicine
- Clinical decision making