Method Matters in Psychology, pp. 187–203

# Tests of Statistical Significance Made Sound

## Abstract

This chapter considers the nature and place of tests of statistical significance (ToSS) in science, with particular reference to psychology. Despite the enormous amount of attention given to this topic, psychology’s understanding of ToSS remains deficient. The major problem stems from a widespread and uncritical acceptance of null hypothesis significance testing, which is an indefensible amalgam of ideas adapted from Fisher’s thinking on the subject and from Neyman and Pearson’s alternative account. To correct for the deficiencies of this hybrid, it is suggested that psychology avail itself of two important and more recent viewpoints on ToSS, namely the neo-Fisherian and the error-statistical perspectives, both of which are argued to be a definite improvement on standard null hypothesis significance testing. It is concluded that ToSS can play a useful, if limited, role in psychological research.
