Topoi

pp 1–12

Misalignment Between Research Hypotheses and Statistical Hypotheses: A Threat to Evidence-Based Medicine?

  • Insa Lawler
  • Georg Zimmermann
Article

Abstract

Evidence-based medicine frequently uses statistical hypothesis testing. In this paradigm, data can only disconfirm a research hypothesis’ competitors: one tests the negation of a statistical hypothesis that is supposed to correspond to the research hypothesis. In practice, these hypotheses are often misaligned. For instance, directional research hypotheses are often paired with non-directional statistical hypotheses. Prima facie, one cannot gain proper evidence for one’s research hypothesis by employing a misaligned statistical hypothesis. This paper sheds light on the nature of and the reasons for such misalignments, and it provides a thorough analysis of whether they pose a threat to evidence-based medicine. The upshots are that the misalignments are often hidden from clinicians and that, although some cases of misalignment can be partially counterbalanced, the overall threat is non-negligible. The counterbalances either lead to methodological inadequacy (in addition to the misalignment) or loss of statistical power, or involve a (potential) lack of information that could be crucial for decision making. This result casts doubt on various findings of medical studies, in addition to the issues associated with under-powered studies or the replication crisis.
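
To make the directional versus non-directional contrast concrete, the following minimal sketch compares the p-values that a single test statistic yields under a two-sided and under two one-sided alternatives. It is an illustration under stated assumptions, not the authors' analysis: the function name `z_test_p_values`, the z statistic of 1.8, and the significance level of 0.05 are all hypothetical and chosen only for demonstration, and a simple z-test is assumed.

```python
from statistics import NormalDist


def z_test_p_values(z: float) -> dict:
    """Return p-values for three alternative hypotheses, given a z statistic.

    Hypothetical helper for illustration; assumes a standard normal
    null distribution for the test statistic.
    """
    phi = NormalDist().cdf(z)  # P(Z <= z) under the null hypothesis
    return {
        "two_sided": 2 * min(phi, 1 - phi),  # H1: effect != 0 (non-directional)
        "greater": 1 - phi,                  # H1: effect > 0 (directional)
        "less": phi,                         # H1: effect < 0 (directional)
    }


# Hypothetical z statistic and significance level, chosen only for illustration.
p_values = z_test_p_values(1.8)
alpha = 0.05
for alternative, p in p_values.items():
    print(f"{alternative}: p = {p:.4f}, reject H0 at alpha = {alpha}: {p < alpha}")
```

With these assumed inputs, the two-sided p-value is roughly 0.07 and the one-sided p-value for a positive effect is roughly 0.04, so the same data lead to different decisions at the 0.05 level depending on which statistical hypothesis is paired with the research hypothesis.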

Keywords

Research hypotheses; Statistical hypothesis testing; Null hypotheses; Evidence-based medicine; Clinical decision making

Notes

Acknowledgements

We thank Arne Bathke, Robyn Bluhm, Charlotte Werndl, the audiences in Genoa, Paris, and Munich, the editors of the special issue Fabrizio Macagno and Carlo Martini, as well as two anonymous reviewers for their constructive criticisms and suggestions.

Funding

Insa Lawler gratefully acknowledges that part of her research for this article was funded by the OeAD through an Ernst Mach Scholarship and by an Emmy Noether Grant from the German Research Council (DFG), Reference No. BR 5210/1-1. Georg Zimmermann received research support (IT equipment and conference travel reimbursements) from Eisai Europe Ltd.

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.

Research Involving Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Department of Philosophy, University of North Carolina at Greensboro, Greensboro, USA
  2. Department of Neurology, Christian Doppler Medical Centre, Paracelsus Medical University, Salzburg, Austria
  3. Department of Mathematics, Paris Lodron University of Salzburg, Salzburg, Austria
  4. Spinal Cord Injury and Tissue Regeneration Centre Salzburg, Paracelsus Medical University, Salzburg, Austria