Misalignment Between Research Hypotheses and Statistical Hypotheses: A Threat to Evidence-Based Medicine?

Topoi

Abstract

Evidence-based medicine frequently uses statistical hypothesis testing. In this paradigm, data can only disconfirm a research hypothesis's competitors: one tests the negation of a statistical hypothesis that is supposed to correspond to the research hypothesis. In practice, these hypotheses are often misaligned. For instance, directional research hypotheses are often paired with non-directional statistical hypotheses. Prima facie, one cannot gain proper evidence for one's research hypothesis by employing a misaligned statistical hypothesis. This paper sheds light on the nature of and the reasons for such misalignments, and it provides a thorough analysis of whether they pose a threat to evidence-based medicine. The upshots are that the misalignments are often hidden from clinicians and that, although some cases of misalignment can be partially counterbalanced, the overall threat is non-negligible. The counterbalances lead to methodological inadequacy (in addition to the misalignment), to a loss of statistical power, or to a (potential) lack of information that could be crucial for decision making. This result casts doubt on various findings of medical studies, over and above the issues associated with under-powered studies or the replication crisis.
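The pairing of a directional research hypothesis with a non-directional statistical hypothesis can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a simple z-test on a mean difference with a known standard error, and the function name `z_test_p_values` and all numbers are hypothetical.

```python
from statistics import NormalDist


def z_test_p_values(mean_diff, std_error):
    """Return (two_sided_p, one_sided_p) for a z-test on a mean difference.

    two_sided_p tests H0: delta = 0 against H1: delta != 0 (non-directional).
    one_sided_p tests H0: delta <= 0 against H1: delta > 0 (directional).
    Both assume an approximately normal test statistic (an illustrative assumption).
    """
    z = mean_diff / std_error
    std_normal = NormalDist()  # standard normal distribution
    one_sided_p = 1.0 - std_normal.cdf(z)
    two_sided_p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return two_sided_p, one_sided_p


# Hypothetical improvement: the directional test rejects at the 0.05 level,
# while the non-directional test on the same data does not.
two_sided, one_sided = z_test_p_values(mean_diff=1.8, std_error=1.0)
print(f"improvement: two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.4f}")

# Hypothetical deterioration: the non-directional test rejects, although the
# data speak against the research hypothesis "the treatment improves the outcome".
two_sided_neg, one_sided_neg = z_test_p_values(mean_diff=-2.5, std_error=1.0)
print(f"deterioration: two-sided p = {two_sided_neg:.4f}, one-sided p = {one_sided_neg:.4f}")
```

Under these assumptions, the first case shows the loss of power that comes with a two-sided test, and the second shows why rejecting a non-directional null hypothesis does not by itself support a directional research hypothesis.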

Notes

  1. We briefly discuss the normative issue of whether directional research hypotheses should be used in evidence-based medicine at all in Sect. 4.4.

  2. Cho and Abe (2013) claim that this issue also prevails in business research, and given the reasons provided in Sect. 4, it is likely to be found in other disciplines as well, e.g., psychology.

  3. For an overview, see, e.g., Nickerson (2000), Gigerenzer (2004), and Lecoutre and Poitevineau (2014).

  4. There are also cases where the null hypothesis is not the negation of the alternative hypothesis (e.g., when considering fixed point alternatives). However, since these cases are the exception rather than the rule, they are not the main focus of this article, except where they may serve as a potential remedy for avoiding what we call ‘magnitude misalignment’ (see below).

  5. Note that it would be methodologically inadequate to consider effect sizes exclusively, because chance variation could not then be accounted for.

  6. As we indicated in footnote 4, in such cases \(H_1\) and \(H_0\) do not exhaust the parameter space.

  7. In fact, Jeong and Yoo’s study also features direction misalignment. They conclude from two-sided tests that there is a significant improvement (cf., e.g., Jeong and Yoo 2015, p. 1952).

  8. We owe this suggestion to Gerit Pfuhl.

  9. Similar demands are voiced by, e.g., Berger and Sellke (1987), Nickerson (2000), and Colquhoun (2014).

  10. There are proposals for solving this problem; see, e.g., Lehmann and Romano (2005, p. 229 ff.).

  11. Statistical significance still seems to play a dominant role in medical research, insofar as statistically non-significant results are published less frequently (cf., e.g., Altman 1991, Chaps. 8.5.4, 15.5.2; Dwan et al. 2008). This phenomenon is called ‘publication bias’.

  12. We owe this suggestion to a researcher in the audience of our talk at the 8th Philosophy of Medicine Roundtable.

References

  • Altman D (1991) Practical statistics for medical research, first edn. Chapman & Hall, London

  • Bariani G, de Celis Ferrari A, Precivale M, Arai R, Saad E, Riechelmann R (2015) Sample size calculation in oncology trials: quality of reporting and implications for clinical cancer research. Am J Clin Oncol 38(6):570–574

  • Benjamin D et al (2018) Redefine statistical significance. Nat Hum Behav 2:6–10

  • Berger J, Sellke T (1987) Testing a point null hypothesis: the irreconcilability of P values and evidence. J Am Stat Assoc 82(397):62–71

  • Bigirumurame T, Kasim AS (2017) Can testing clinical significance reduce false positive rates in randomized controlled trials? A snap review. BMC Res Notes 10:775

  • Bland M (2000) Introduction to medical statistics, third edn. Oxford University Press, Oxford

  • Braver S (1975) On splitting the tails unequally: a new perspective on one- versus two-tailed tests. Educ Psychol Meas 32:283–301

  • Casella G, Berger R (1987) Reconciling Bayesian and frequentist evidence in the one-sided testing problem. J Am Stat Assoc 82(397):106–111

  • Cho H-C, Abe S (2013) Is two-tailed testing for directional research hypotheses tests legitimate? J Bus Res 66:1261–1266

  • Cohen J (1994) The earth is round (\(P < .05\)). Am Psychol 49(12):997–1003

  • Colquhoun D (2014) An investigation of the false discovery rate and the misinterpretation of P-values. R Soc Open Sci 1(140216):1–16

  • Cumming G (2012) Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. Routledge, London

  • Davis R et al (2014) Reproducibility project: cancer biology. https://elifesciences.org/collections/9b1e83d1/reproducibility-project-cancer-biology

  • Derakhshanrad N, Vosoughi F, Yekaninejad M, Moshayedi P, Saberi H (2015) Functional impact of multidisciplinary outpatient program on patients with chronic complete spinal cord injury. Spinal Cord 53:850–865

  • Djulbegovic B (2009) The paradox of equipoise: the principle that drives and limits therapeutic discoveries in clinical research. Cancer Control 16(4):342–347

  • Dubey S (1991) Some thoughts on the one-sided and two-sided tests. J Biopharm Stat 1(1):139–150

  • Dwan K, Altman D, Arnaiz JA, Bloom J, Chan A-W, Cronin E, Decullier E, Easterbrook P, Von Elm E, Gamble C, Ghersi D, Ioannidis J, Simes J, Williamson PR (2008) Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3(8):e3081

  • Everitt B (2006) Medical statistics from A to Z: a guide for clinicians and medical students, second edn. Cambridge University Press, Cambridge

  • Field A (2000) Discovering statistics using SPSS for Windows. Sage Publications, London

  • Freedman B (1987) Equipoise and the ethics of clinical research. N Engl J Med 317(3):141–145

  • Gigerenzer G (2004) Mindless statistics. J Socio-Econ 33:587–606

  • Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336(7650):924–926

  • Hacking I (2001) An introduction to probability and inductive logic. Cambridge University Press, Cambridge

  • Howell D (2007) Statistical methods for psychology, 6th edn. Thomson Wadsworth, Belmont

  • International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (1998) ICH harmonized tripartite guideline: statistical principles for clinical trials E9

  • Ioannidis J (2005) Why most published research findings are false. PLoS Med 2(8):e124

  • Jeong J, Yoo W (2015) Effects of air stacking on pulmonary function and peak cough flow in patients with cervical spinal cord injury. J Phys Ther Sci 27(6):1951–1952

  • Kaiser H (1960) Directional statistical decisions. Psychol Rev 67:160–167

  • Kimmel H (1957) Three criteria for the use of one-tailed tests. Psychol Bull 54:351–353

  • Kirkwood B, Sterne J (2003) Essential medical statistics, second edn. Blackwell, Oxford

  • Lecoutre B, Poitevineau J (2014) The significance test controversy revisited: the fiducial Bayesian alternative. Springer, New York

  • Lehmann E, Romano J (2005) Testing statistical hypotheses, third edn. Springer, New York

  • Machin D, Campbell M, Walters S (2007) Medical statistics: a textbook for the health sciences, fourth edn. Wiley, New York

  • Marcus R, Peritz E, Gabriel K (1976) On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63(3):655–660

  • Meehl P (1978) Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. J Consult Clin Psychol 46:806–834

  • Nickerson R (2000) Null hypothesis significance testing: a review of an old and continuing controversy. Psychol Methods 5(2):241–301

  • Ruxton GD, Neuhäuser M (2010) When should we use one-tailed hypothesis testing? Methods Ecol Evol 1:114–117

  • Zimmermann G, Bolter L-M, Sluka R, Höller Y, Bathke AC, Thomschewski A, Leis S, Lattanzi S, Brigo F, Trinka E (2019) Sample sizes and statistical methods in interventional studies on individuals with spinal cord injury: a systematic review. J Evid-Based Med

Acknowledgements

 We thank Arne Bathke, Robyn Bluhm, Charlotte Werndl, the audiences in Genoa, Paris, and Munich, the editors of the special issue Fabrizio Macagno and Carlo Martini, as well as two anonymous reviewers for their constructive criticisms and suggestions.

Funding

Insa Lawler gratefully acknowledges that part of her research for this article was funded by the OeAD through an Ernst Mach Scholarship and by an Emmy Noether Grant from the German Research Council (DFG), Reference No. BR 5210/1-1. Georg Zimmermann received research support (IT equipment and conference travel reimbursements) from Eisai Europe Ltd.

Author information

Corresponding author

Correspondence to Insa Lawler.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Research Involving Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Lawler, I., Zimmermann, G. Misalignment Between Research Hypotheses and Statistical Hypotheses: A Threat to Evidence-Based Medicine?. Topoi 40, 307–318 (2021). https://doi.org/10.1007/s11245-019-09667-0

  • DOI: https://doi.org/10.1007/s11245-019-09667-0
