Misalignment Between Research Hypotheses and Statistical Hypotheses: A Threat to Evidence-Based Medicine?


Evidence-based medicine frequently uses statistical hypothesis testing. In this paradigm, data can only disconfirm a research hypothesis’s competitors: one tests the negation of a statistical hypothesis that is supposed to correspond to the research hypothesis. In practice, these hypotheses are often misaligned. For instance, directional research hypotheses are often paired with non-directional statistical hypotheses. Prima facie, one cannot gain proper evidence for one’s research hypothesis by employing a misaligned statistical hypothesis. This paper sheds light on the nature of and the reasons for such misalignments, and it provides a thorough analysis of whether they pose a threat to evidence-based medicine. The upshots are that the misalignments are often hidden from clinicians and that, although some cases of misalignment can be partially counterbalanced, the overall threat is non-negligible. The counterbalances lead to methodological inadequacy (in addition to the misalignment), lead to a loss of statistical power, or involve a (potential) lack of information that could be crucial for decision making. This result casts doubt on various findings of medical studies, in addition to the issues associated with under-powered studies and the replication crisis.
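
A minimal sketch may make the direction misalignment mentioned above concrete. The symbols \(\mu_T\) and \(\mu_C\) (mean outcome under treatment and under control), the test statistic \(z\), and the standard normal distribution function \(\Phi\) are assumptions introduced only for this illustration; they are not notation taken from the article. Suppose the research hypothesis is directional, namely that the treatment improves the mean outcome. An aligned pair of statistical hypotheses would then be

\[
H_0\colon \mu_T \le \mu_C \quad \text{versus} \quad H_1\colon \mu_T > \mu_C ,
\]

whereas the non-directional pair that is often used instead is

\[
H_0\colon \mu_T = \mu_C \quad \text{versus} \quad H_1\colon \mu_T \ne \mu_C .
\]

Rejecting the non-directional null hypothesis licenses only \(\mu_T \ne \mu_C\), which is also compatible with \(\mu_T < \mu_C\), so on its own it does not support the directional research hypothesis. Under the further assumption of a \(z\)-test with an observed value \(z > 0\), the two \(p\)-values are related by

\[
p_{\text{one-sided}} = 1 - \Phi(z), \qquad p_{\text{two-sided}} = 2\,\bigl(1 - \Phi(|z|)\bigr) = 2\, p_{\text{one-sided}} ,
\]

so at the same significance level the two-sided test requires stronger data to reject, which is one way in which a loss of statistical power can arise.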




  1. We briefly discuss the normative issue of whether directional research hypotheses should be used in evidence-based medicine at all in Sect. 4.4.

  2. Cho and Abe (2013) claim that this issue also prevails in business research, and given the reasons provided in Sect. 4, it is likely to be found in other disciplines as well, e.g., psychology.

  3. For an overview, see, e.g., Nickerson (2000), Gigerenzer (2004), and Lecoutre and Poitevineau (2014).

  4. There are also cases where the null hypothesis is not the negation of the alternative hypothesis (e.g., when considering fixed point alternatives). However, since these cases are the exception rather than the rule, they are not the main focus of this article, except where they may serve as a potential remedy for avoiding what we call ‘magnitude misalignment’ (see below).

  5. Note that it would be methodologically inadequate to consider effect sizes exclusively, because by-chance variations cannot be accounted for in that way.

  6. As we indicated in footnote 4, in such cases \(H_1\) and \(H_0\) do not exhaust the parameter space.

  7. In fact, Jeong and Yoo’s study also features direction misalignment. They conclude from two-sided tests that there is a significant improvement (cf., e.g., Jeong and Yoo 2015, p. 1952).

  8. We owe this suggestion to Gerit Pfuhl.

  9. Similar demands are voiced by, e.g., Berger and Sellke (1987), Nickerson (2000), and Colquhoun (2014).

  10. There are proposals for solving this problem; see, e.g., Lehmann and Romano (2005, p. 229 ff.).

  11. Statistical significance still seems to play a dominant role in medical research insofar as statistically insignificant results are less frequently published (cf., e.g., Altman 1991, Chaps. 8.5.4, 15.5.2; Dwan et al. 2008). This phenomenon is called ‘publication bias’.

  12. We owe this suggestion to a researcher in the audience of our talk at the 8th Philosophy of Medicine Roundtable.


  1. Altman D (1991) Practical statistics for medical research, 1st edn. Chapman & Hall, London

  2. Bariani G, de Celis Ferrari A, Precivale M, Arai R, Saad E, Riechelmann R (2015) Sample size calculation in oncology trials: quality of reporting and implications for clinical cancer research. Am J Clin Oncol 38(6):570–574

  3. Benjamin D et al (2018) Redefine statistical significance. Nat Hum Behav 2:6–10

  4. Berger J, Sellke T (1987) Testing a point null hypothesis: the irreconcilability of P values and evidence. J Am Stat Assoc 82(397):62–71

  5. Bigirumurame T, Kasim AS (2017) Can testing clinical significance reduce false positive rates in randomized controlled trials? A snap review. BMC Res Notes 10:775

  6. Bland M (2000) Introduction to medical statistics, 3rd edn. Oxford University Press, Oxford

  7. Braver S (1975) On splitting the tails unequally: a new perspective on one- versus two-tailed tests. Educ Psychol Meas 32:283–301

  8. Casella G, Berger R (1987) Reconciling Bayesian and frequentist evidence in the one-sided testing problem. J Am Stat Assoc 82(397):106–111

  9. Cho H-C, Abe S (2013) Is two-tailed testing for directional research hypotheses tests legitimate? J Bus Res 66:1261–1266

  10. Cohen J (1994) The earth is round (\(p < .05\)). Am Psychol 49(12):997–1003

  11. Colquhoun D (2014) An investigation of the false discovery rate and the misinterpretation of P-values. R Soc Open Sci 1(140216):1–16

  12. Cumming G (2012) Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. Routledge, London

  13. Davis R et al (2014) Reproducibility project: cancer biology. https://elifesciences.org/collections/9b1e83d1/reproducibility-project-cancer-biology

  14. Derakhshanrad N, Vosoughi F, Yekaninejad M, Moshayedi P, Saberi H (2015) Functional impact of multidisciplinary outpatient program on patients with chronic complete spinal cord injury. Spinal Cord 53:850–865

  15. Djulbegovic B (2009) The paradox of equipoise: the principle that drives and limits therapeutic discoveries in clinical research. Cancer Control 16(4):342–347

  16. Dubey S (1991) Some thoughts on the one-sided and two-sided tests. J Biopharm Stat 1(1):139–150

  17. Dwan K, Altman D, Arnaiz JA, Bloom J, Chan A-W, Cronin E, Decullier E, Easterbrook P, Von Elm E, Gamble C, Ghersi D, Ioannidis J, Simes J, Williamson PR (2008) Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3(8):e3081

  18. Everitt B (2006) Medical statistics from A to Z: a guide for clinicians and medical students, 2nd edn. Cambridge University Press, Cambridge

  19. Field A (2000) Discovering statistics using SPSS for Windows. Sage Publications, London

  20. Freedman B (1987) Equipoise and the ethics of clinical research. N Engl J Med 317(3):141–145

  21. Gigerenzer G (2004) Mindless statistics. J Socio-Econ 33:587–606

  22. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336(7650):924–926

  23. Hacking I (2001) An introduction to probability and inductive logic. Cambridge University Press, Cambridge

  24. Howell D (2007) Statistical methods for psychology, 6th edn. Thomson Wadsworth, Belmont

  25. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (1998) ICH harmonized tripartite guideline: statistical principles for clinical trials E9

  26. Ioannidis J (2005) Why most published research findings are false. PLoS Med 2(8):e124

  27. Jeong J, Yoo W (2015) Effects of air stacking on pulmonary function and peak cough flow in patients with cervical spinal cord injury. J Phys Ther Sci 27(6):1951–1952

  28. Kaiser H (1960) Directional statistical decisions. Psychol Rev 67:160–167

  29. Kimmel H (1957) Three criteria for the use of one-tailed tests. Psychol Bull 54:351–353

  30. Kirkwood B, Sterne J (2003) Essential medical statistics, 2nd edn. Blackwell, Oxford

  31. Lecoutre B, Poitevineau J (2014) The significance test controversy revisited. The fiducial Bayesian alternative. Springer, New York

  32. Lehmann E, Romano J (2005) Testing statistical hypotheses, 3rd edn. Springer, New York

  33. Machin D, Campbell M, Walters S (2007) Medical statistics. A textbook for the health sciences, 4th edn. Wiley, New York

  34. Marcus R, Peritz E, Gabriel K (1976) On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63(3):655–660

  35. Meehl P (1978) Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. J Consult Clin Psychol 46:806–834

  36. Nickerson R (2000) Null hypothesis significance testing: a review of an old and continuing controversy. Psychol Methods 5(2):241–301

  37. Ruxton GD, Neuhäuser M (2010) When should we use one-tailed hypothesis testing? Methods Ecol Evol 1:114–117

  38. Zimmermann G, Bolter L-M, Sluka R, Höller Y, Bathke AC, Thomschewski A, Leis S, Lattanzi S, Brigo F, Trinka E (2019) Sample sizes and statistical methods in interventional studies on individuals with spinal cord injury: a systematic review. J Evid-Based Med



 We thank Arne Bathke, Robyn Bluhm, Charlotte Werndl, the audiences in Genoa, Paris, and Munich, the editors of the special issue Fabrizio Macagno and Carlo Martini, as well as two anonymous reviewers for their constructive criticisms and suggestions.


Insa Lawler gratefully acknowledges that part of her research for this article was funded by the OeAD through an Ernst Mach Scholarship and by an Emmy Noether Grant from the German Research Council (DFG), Reference No. BR 5210/1-1. Georg Zimmermann received research support (IT equipment and conference travel reimbursements) from Eisai Europe Ltd.

Author information



Corresponding author

Correspondence to Insa Lawler.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Research Involving Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lawler, I., Zimmermann, G. Misalignment Between Research Hypotheses and Statistical Hypotheses: A Threat to Evidence-Based Medicine?. Topoi (2019). https://doi.org/10.1007/s11245-019-09667-0



Keywords

  • Research hypotheses
  • Statistical hypothesis testing
  • Null hypotheses
  • Evidence-based medicine
  • Clinical decision making