
Scientometrics, Volume 115, Issue 1, pp 627–635

NHST is still logically flawed

  • Jesper W. Schneider
Article

Abstract

In this elaborate response to Wu (in Scientometrics, 2018), I maintain that null hypothesis significance testing (NHST) is logically flawed. Wu (2018) disagrees with this claim as presented in Schneider (in Scientometrics 102(1):411–432, 2015). In this response, I examine the claim in more depth and demonstrate that because NHST is based on a single conditional probability and framed in a probabilistic modus tollens form of reasoning, it is by definition logically invalid. I also argue that disregarding this logical fallacy, as most researchers do, and treating the p value as a heuristic measure for dichotomous decisions against the null hypothesis is a risky practice that often leads to false-positive claims.
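
The link between the single conditional probability and false-positive claims can be illustrated with a short simulation. The sketch below is not taken from the article; the figures used (10% true effects, 80% power, alpha = 0.05) are illustrative assumptions. It shows that a p value only reports P(data at least this extreme | H0), whereas the quantity a dichotomous decision implicitly relies on, P(H0 | significant result), also depends on the prior share of true effects and on statistical power:

    # Minimal sketch (illustrative assumptions, not figures from the article):
    # among "significant" results, how many actually come from a true null?
    import numpy as np

    rng = np.random.default_rng(0)

    n_tests = 100_000      # hypothetical independent hypothesis tests
    prior_true = 0.10      # assumed share of tests where a real effect exists
    alpha = 0.05           # conventional significance threshold
    power = 0.80           # assumed probability of p < alpha when an effect is real

    # Which tests have a real effect?
    effect_real = rng.random(n_tests) < prior_true

    # Probability of a "significant" result: alpha under H0, power under H1.
    significant = np.where(effect_real,
                           rng.random(n_tests) < power,
                           rng.random(n_tests) < alpha)

    false_positives = np.sum(significant & ~effect_real)
    true_positives = np.sum(significant & effect_real)

    # Share of significant results that come from a true null hypothesis.
    fdr = false_positives / (false_positives + true_positives)
    print(f"False positives among significant results: {fdr:.2%}")
    # With these assumptions roughly a third of the "discoveries" are false,
    # even though every one of them satisfies p < 0.05: P(data | H0) alone
    # does not tell us P(H0 | data).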

Keywords

Null hypothesis significance test · Statistical inference · p values · Conditional probability · Inference logic · Modus tollens

References

  1. Berger, J. O., & Berry, D. A. (1988a). The relevance of stopping rules in statistical inference (with discussion). In S. Gupta & J. O. Berger (Eds.), Statistical decision theory and related topics IV (Vol. 1, pp. 29–72). New York, NY: Springer.Google Scholar
  2. Berger, J. O., & Berry, D. A. (1988b). Statistical analysis and the illusion of objectivity. American Scientist, 76(2), 159–165.Google Scholar
  3. Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistcial Science, 2(3), 317–352.MathSciNetCrossRefzbMATHGoogle Scholar
  4. Berger, J. O., & Sellke, T. (1987). Testing a point null hypothesis—The irreconcilability of p-values and evidence. Journal of the American Statistical Association, 82(397), 112–122.MathSciNetzbMATHGoogle Scholar
  5. Berkson, J. (1942). Tests of significance considered as evidence. Journal of the American Statistical Association, 37(219), 325–335.CrossRefGoogle Scholar
  6. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.CrossRefGoogle Scholar
  7. Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1(3), 1–16.  https://doi.org/10.1098/rsos.140216.CrossRefGoogle Scholar
  8. Colquhoun, D. (2017). The reproducibility of research and the misinterpretation of P values. bioRxiv.  https://doi.org/10.1101/144337.Google Scholar
  9. Edwards, A. W. F. (1972). Likelihood. Cambridge: Cambridge University Press.zbMATHGoogle Scholar
  10. Falk, R., & Greenbaum, C. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory Psychology, 5, 75–98.CrossRefGoogle Scholar
  11. Fisher, R. A. (1956). Statistical methods and scientific inference. New York, NY: Hafner.zbMATHGoogle Scholar
  12. Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, MI: Erlbaum.Google Scholar
  13. Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-economics, 33(5), 587–606.CrossRefGoogle Scholar
  14. Goodman, S. N. (1999). Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine, 130(12), 995–1004.CrossRefGoogle Scholar
  15. Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.CrossRefzbMATHGoogle Scholar
  16. Hofmann, S. G. (2002). Fisher’s fallacy and NHST’s flawed logic. American Psychologist, 57(1), 69–70.CrossRefGoogle Scholar
  17. Hubbard, R., & Lindsay, R. M. (2008). Why P values are not a useful measure of evidence in statistical significance testing. Theory and Psychology, 18(1), 69–88.CrossRefGoogle Scholar
  18. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), 696–701.CrossRefGoogle Scholar
  19. Ioannidis, J. P. A., Stanley, T. D., & Doucouliagos, H. (2017). The power of bias in economics research. The Economic Journal, 127(605), F236–F265.CrossRefGoogle Scholar
  20. Jeffreys, H. (1939). Theory of probability. Oxford: Clarendon Press.zbMATHGoogle Scholar
  21. Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method. American Psychologist, 56(1), 16–26.CrossRefGoogle Scholar
  22. Krueger, J. I., & Heck, P. R. (2017). The Heuristic value of p in inductive statistical inference. Frontiers in Psychology, 8(908), 1–16.  https://doi.org/10.3389/fpsyg.2017.00908.Google Scholar
  23. Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1–2), 187–192.CrossRefzbMATHGoogle Scholar
  24. Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.CrossRefGoogle Scholar
  25. Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.CrossRefGoogle Scholar
  26. Pollard, P., & Richardson, J. T. (1987). On the probability of making type I errors. Psychological Bulletin, 102(1), 159–163.CrossRefGoogle Scholar
  27. Royall, R. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.zbMATHGoogle Scholar
  28. Schneider, J. W. (2015). Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations. Scientometrics, 102(1), 411–432.CrossRefGoogle Scholar
  29. Sellke, T., Bayarri, M. J., & Berger, J. O. (2001). Calibration of rho values for testing precise null hypotheses. The American Statistician, 55, 62–71.MathSciNetCrossRefzbMATHGoogle Scholar
  30. Sober, E. (2008). Evidence and evolution. The logic behind science. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
  31. Szucs, D., & Ioannidis, J. P. A. (2017). When null hypothesis significance testing is unsuitable for research: A reassessment. Frontiers in Human Neuroscience, 11(390), 1–21.  https://doi.org/10.3389/fnhum.2017.00390.Google Scholar
  32. Trafimow, D. (2003). Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes’s theorem. Psychological Review, 110(3), 526.CrossRefGoogle Scholar
  33. Trafimow, D., & Rice, S. (2009). A test of the null hypothesis significance testing procedure correlation argument. The Journal of General Psychology, 136(3), 261–270.CrossRefGoogle Scholar
  34. Wu, J. (2018). Is there an intrinsic logical error in null hypothesis significance tests? Commentary on: “Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations”. Scientometrics.  https://doi.org/10.1007/s11192-018-2656-3.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2018

Authors and Affiliations

  1. Danish Centre for Studies in Research and Research Policy, Department of Political Science, Aarhus University, Aarhus C, Denmark
