NHST is still logically flawed

Scientometrics

Abstract

In this elaborate response to Wu (in Scientometrics, 2018), I maintain that null hypothesis significance testing (NHST) is logically flawed. Wu (2018) disagrees with this claim, which was presented in Schneider (in Scientometrics 102(1):411–432, 2015). In this response, I examine the claim in more depth and demonstrate that, since NHST is based on one conditional probability alone and framed in a probabilistic modus tollens framework of reasoning, it is by definition logically invalid. I also argue that disregarding this logical fallacy, as most researchers do, and treating the p value as a heuristic value for dichotomous decisions against the null hypothesis, is a risky business that often leads to false-positive claims.
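The gap between the single conditional probability that NHST relies on, P(data | H0), and the probability researchers usually want, P(H0 | data), can be illustrated with a small Bayes' theorem calculation. The sketch below is not taken from the article; the function name and the values for the significance level, statistical power, and prior probability of a real effect are assumptions chosen purely for illustration, in the spirit of the false discovery rate calculations discussed in the cited literature (e.g. Colquhoun 2014).

```python
def false_positive_risk(alpha: float, power: float, prior_h1: float) -> float:
    """Return P(H0 is true | result is significant) via Bayes' theorem.

    alpha    -- P(significant | H0 true), the significance level (assumed)
    power    -- P(significant | H1 true), the statistical power (assumed)
    prior_h1 -- prior probability that a real effect exists (assumed)
    """
    false_positives = alpha * (1.0 - prior_h1)  # significant although H0 is true
    true_positives = power * prior_h1           # significant and H1 is true
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Hypothetical scenarios: same alpha and power, different prior plausibility.
    for prior in (0.5, 0.1, 0.01):
        risk = false_positive_risk(alpha=0.05, power=0.8, prior_h1=prior)
        print(f"prior P(H1) = {prior:<4}  ->  P(H0 | significant) = {risk:.2f}")
```

Under these assumed inputs, the share of significant results that are false positives depends strongly on the prior plausibility of the effect, a quantity that P(data | H0) alone does not contain.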

Notes

  1. http://issi-society.org/blog/posts/2017/october/issi-paper-of-the-year-award-2017/.

  2. The latter in itself can be seen as a logical flaw, as a valid measure of strength of evidence should not include the probabilities of unobserved outcomes (Jeffreys 1939; Berger and Delampady 1987; Berger and Berry 1988a, b; Royall 1997; Goodman 1999), but this is not the main logical flaw of interest here.

References

  • Berger, J. O., & Berry, D. A. (1988a). The relevance of stopping rules in statistical inference (with discussion). In S. Gupta & J. O. Berger (Eds.), Statistical decision theory and related topics IV (Vol. 1, pp. 29–72). New York, NY: Springer.

  • Berger, J. O., & Berry, D. A. (1988b). Statistical analysis and the illusion of objectivity. American Scientist, 76(2), 159–165.

  • Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2(3), 317–352.

  • Berger, J. O., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p-values and evidence. Journal of the American Statistical Association, 82(397), 112–122.

  • Berkson, J. (1942). Tests of significance considered as evidence. Journal of the American Statistical Association, 37(219), 325–335.

  • Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.

  • Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1(3), 1–16. https://doi.org/10.1098/rsos.140216.

  • Colquhoun, D. (2017). The reproducibility of research and the misinterpretation of P values. bioRxiv. https://doi.org/10.1101/144337.

  • Edwards, A. W. F. (1972). Likelihood. Cambridge: Cambridge University Press.

  • Falk, R., & Greenbaum, C. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory and Psychology, 5, 75–98.

  • Fisher, R. A. (1956). Statistical methods and scientific inference. New York, NY: Hafner.

  • Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Erlbaum.

  • Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-economics, 33(5), 587–606.

  • Goodman, S. N. (1999). Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine, 130(12), 995–1004.

  • Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.

  • Hofmann, S. G. (2002). Fisher’s fallacy and NHST’s flawed logic. American Psychologist, 57(1), 69–70.

  • Hubbard, R., & Lindsay, R. M. (2008). Why P values are not a useful measure of evidence in statistical significance testing. Theory and Psychology, 18(1), 69–88.

  • Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), 696–701.

  • Ioannidis, J. P. A., Stanley, T. D., & Doucouliagos, H. (2017). The power of bias in economics research. The Economic Journal, 127(605), F236–F265.

  • Jeffreys, H. (1939). Theory of probability. Oxford: Clarendon Press.

  • Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method. American Psychologist, 56(1), 16–26.

  • Krueger, J. I., & Heck, P. R. (2017). The heuristic value of p in inductive statistical inference. Frontiers in Psychology, 8(908), 1–16. https://doi.org/10.3389/fpsyg.2017.00908.

  • Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1–2), 187–192.

  • Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.

  • Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.

  • Pollard, P., & Richardson, J. T. (1987). On the probability of making type I errors. Psychological Bulletin, 102(1), 159–163.

  • Royall, R. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.

  • Schneider, J. W. (2015). Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations. Scientometrics, 102(1), 411–432.

  • Sellke, T., Bayarri, M. J., & Berger, J. O. (2001). Calibration of p values for testing precise null hypotheses. The American Statistician, 55, 62–71.

  • Sober, E. (2008). Evidence and evolution. The logic behind science. Cambridge: Cambridge University Press.

  • Szucs, D., & Ioannidis, J. P. A. (2017). When null hypothesis significance testing is unsuitable for research: A reassessment. Frontiers in Human Neuroscience, 11(390), 1–21. https://doi.org/10.3389/fnhum.2017.00390.

  • Trafimow, D. (2003). Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes’s theorem. Psychological Review, 110(3), 526.

  • Trafimow, D., & Rice, S. (2009). A test of the null hypothesis significance testing procedure correlation argument. The Journal of General Psychology, 136(3), 261–270.

  • Wu, J. (2018). Is there an intrinsic logical error in null hypothesis significance tests? Commentary on: “Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations”. Scientometrics. https://doi.org/10.1007/s11192-018-2656-3.

Author information

Correspondence to Jesper W. Schneider.

About this article

Cite this article

Schneider, J.W. NHST is still logically flawed. Scientometrics 115, 627–635 (2018). https://doi.org/10.1007/s11192-018-2655-4

  • DOI: https://doi.org/10.1007/s11192-018-2655-4
