
My Ban on Null Hypothesis Significance Testing and Confidence Intervals

  • Conference paper
  • First Online:
  • Part of the book: Structural Changes and their Econometric Modeling (TES 2019)
  • Part of the book series: Studies in Computational Intelligence (SCI, volume 808)


Abstract

The journal Basic and Applied Social Psychology banned null hypothesis significance testing and confidence intervals. Was this justified, and if so, why? I address these questions with a focus on the different types of assumptions that compose the models on which p-values and confidence intervals are based. For the computation of p-values, in addition to problematic model assumptions, there is also the problem that p-values confound the implications of sample effect sizes and sample sizes. For the computation of confidence intervals, in contrast to the justification that they provide valuable information about the precision of the data, there is a triple confound involving three types of precision: measurement precision, precision of homogeneity, and sampling precision. Because all three can be estimated separately, provided the researcher has tested the reliability of the dependent variable, there is no reason to confound them via the computation of a confidence interval. Thus, the ban is justified with respect to both null hypothesis significance testing and confidence intervals.
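The two confounds named in the abstract can be made concrete with a small numerical sketch. The Python snippet below, with hypothetical numbers throughout, first shows that an identical standardized effect size yields very different p-values depending only on sample size, and then shows how measurement precision, homogeneity of true scores, and sampling precision can each be estimated separately once the reliability of the dependent variable is known. The second part reads the abstract through classical test theory (cf. Gulliksen 1987; Lord and Novick 1968); the decomposition is an assumption of this illustration, not a formula taken from the paper.

```python
# A minimal numerical sketch of the two confounds described in the abstract.
# All numbers are hypothetical; this is an illustration, not the paper's analysis.
import numpy as np
from scipy import stats

# (1) p-values confound sample effect size and sample size: the same
# standardized effect d yields very different p-values depending only on n
# (one-sample t-test, where t = d * sqrt(n)).
for d in (0.1, 0.3):
    for n in (20, 200):
        t = d * np.sqrt(n)
        p = 2 * stats.t.sf(abs(t), df=n - 1)
        print(f"d = {d:.1f}, n = {n:3d}  ->  p = {p:.4f}")

# (2) Under classical test theory (observed variance = true-score variance +
# error variance, reliability = their ratio), the three kinds of precision can
# be estimated separately once a reliability estimate for the DV is available.
sd_observed = 12.0   # hypothetical observed standard deviation of the DV
reliability = 0.80   # hypothetical reliability estimate for the DV
n = 50               # hypothetical sample size

measurement_sd = sd_observed * np.sqrt(1 - reliability)  # measurement (im)precision
true_score_sd = sd_observed * np.sqrt(reliability)       # spread of true scores: homogeneity
standard_error = sd_observed / np.sqrt(n)                 # sampling precision of the mean

print(f"measurement SD = {measurement_sd:.2f}")
print(f"true-score SD  = {true_score_sd:.2f}")
print(f"standard error = {standard_error:.2f}")
```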


Notes

  1. An example is the book by Briggs (2016), who is a distinguished participant at TES 2019.

  2. Richard Morey, in his blog (http://bayesfactor.blogspot.com/2015/11/neyman-does-science-part-1.html), has documented how even Neyman was unable to avoid misusing p-values in this way, though he warned against it himself.

  3. In fact, Rothman et al. (2013) provided arguments against random selection.

  4. The reader may wonder about p-values as used in NHST versus p-values used as continuous indices of alleged justified worry about the model. Although both are problematic for the reasons described, null hypothesis significance tests are worse because of the dichotomous thinking they encourage and the dramatic overestimates of effect sizes in scientific literatures that they promote (see Locascio 2017a for an explanation). If p-values were calculated but not used to draw any conclusions, their costs would be reduced, though still without providing any added benefits.

  5. Of course, even this very limited conclusion depends on the model being correct, and as we have already seen, the model is not correct because of problematic inferential assumptions.

  6. This assumes random sampling, an assumption that is most likely incorrect.

  7. This argument should not be interpreted as indicating that contemporary researchers are at an overall disadvantage. In fact, contemporary researchers have many advantages over the researchers of yesteryear, including better knowledge and better technology, among others.

  8. This sparse description may seem to imply that a priori procedures are simply another way to perform power analyses. However, this is not true, and I have provided demonstrations of the differences, including contradictory effects (Trafimow 2017b; Trafimow and MacDonald 2017); see the sketch following these notes.
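As a rough illustration of how such a priori calculations differ in spirit from power analysis, the sketch below computes, for the simplest case of a single mean under a normal model, the sample size needed for the sample mean to fall within a chosen fraction f of a standard deviation of the population mean with probability c. The n = (z/f)^2 form and all numbers are assumptions of this illustration; see Trafimow (2017b) and Trafimow and MacDonald (2017) for the actual procedures.

```python
# A sketch (not the cited papers' derivation) of an a priori sample-size
# calculation for a single mean under a normal model: choose n so that the
# sample mean falls within f SDs of the population mean with probability c.
from math import ceil
from scipy.stats import norm

def a_priori_n(f: float, c: float) -> int:
    """Required n for closeness f (in SD units) at confidence level c."""
    z = norm.ppf((1 + c) / 2)   # two-sided critical value of the standard normal
    return ceil((z / f) ** 2)

# Hypothetical targets: mean within 0.2 SD of the population mean with 95% probability.
print(a_priori_n(f=0.2, c=0.95))  # -> 97 participants
```

Unlike a power analysis, nothing in this calculation refers to rejecting a null hypothesis; the target is the closeness of the estimate to the population value, which is why the two approaches can recommend very different sample sizes.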

References

  • Bakker, M., van Dijk, A., Wicherts, J.M.: The rules of the game called psychological science. Perspect. Psychol. Sci. 7(6), 543–554 (2012)


  • Berk, R.A., Freedman, D.A.: Statistical assumptions as empirical commitments. In: Blomberg, T.G., Cohen, S. (eds.) Law, Punishment, and Social Control: Essays in Honor of Sheldon Messinger. 2nd edn., pp. 235–254. Aldine de Gruyter (2003)


  • Box, G.E.P., Draper, N.R.: Empirical Model-Building and Response Surfaces. Wiley, New York (1987)


  • Briggs, W.: Uncertainty: The Soul of Modeling, Probability and Statistics. Springer, New York (2016)


  • Cumming, G., Calin-Jageman, R.: Introduction to the New Statistics: Estimation, Open Science, and Beyond. Taylor and Francis Group, New York (2017)


  • Duhem, P.: The Aim and Structure of Physical Theory (P.P. Wiener, Trans). Princeton University Press, Princeton (1954). (Original work published 1906)


  • Earp, B.D., Trafimow, D.: Replication, falsification, and the crisis of confidence in social psychology. Front. Psychol. 6, 1–11, Article 621 (2015)


  • Gillies, D.: Philosophical Theories of Probability. Routledge, London (2000)


  • Greenland, S.: Invited commentary: the need for cognitive science in methodology. Am. J. Epidemiol. 186, 639–645 (2017)


  • Gulliksen, H.: Theory of Mental Tests. Lawrence Erlbaum Associates Publishers, Hillsdale (1987)


  • Halsey, L.G., Curran-Everett, D., Vowler, S.L., Drummond, G.B.: The fickle P value generates irreproducible results. Nat. Methods 12, 179–185 (2015). https://doi.org/10.1038/nmeth.3288


  • Hubbard, R.: Corrupt Research: The Case for Reconceptualizing Empirical Management and Social Science. Sage Publications, Los Angeles (2016)


  • John, L.K., Loewenstein, G., Prelec, D.: Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23(5), 524–532 (2012)


  • Lakatos, I.: The Methodology of Scientific Research Programmes. Cambridge University Press, Cambridge (1978)


  • Lord, F.M., Novick, M.R.: Statistical Theories of Mental Test Scores. Addison-Wesley, Reading (1968)


  • Nguyen, H.T.: On evidential measures of support for reasoning with integrated uncertainty: a lesson from the ban of P-values in statistical inference. In: Huynh, V.N., et al., (eds.) Integrated Uncertainty in Knowledge Modeling and Decision Making. Lecture Notes in Artificial Intelligence, vol. 9978, pp. 3–15. Springer (2016)


  • Open Science Collaboration: Estimating the reproducibility of psychological science. Science 349(6251), aac4716 (2015). https://doi.org/10.1126/science.aac4716

  • Rothman, K.J., Gallacher, J.E.J., Hatch, E.E.: Why representativeness should be avoided. Int. J. Epidemiol. 42(4), 1012–1014 (2013)


  • Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22(11), 1359–1366 (2011)


  • Speelman, C.P., McGann, M.: Editorial: challenges to mean-based analysis in psychology: the contrast between individual people and general science. Front. Psychol. 7, 1234 (2016)


  • Trafimow, D.: Editorial. Basic Appl. Soc. Psychol. 36(1), 1–2 (2014)


  • Trafimow, D.: Implications of an initial empirical victory for the truth of the theory and additional empirical victories. Philos. Psychol. 30(4), 411–433 (2017a)


  • Trafimow, D.: Using the coefficient of confidence to make the philosophical switch from a posteriori to a priori inferential statistics. Educ. Psychol. Meas. 77(5), 831–854 (2017b)


  • Trafimow, D.: An a priori solution to the replication crisis. Philos. Psychol. 31, 1188–1214 (2018)


  • Trafimow, D.: A taxonomy of model assumptions on which P is based and implications for added benefit in the soft sciences (under submission)


  • Trafimow, D., Amrhein, V., Areshenkoff, C.N., Barrera-Causil, C.J., Beh, E.J., Bilgiç, Y.K., Bono, R., Bradley, M.T., Briggs, W.M., Cepeda-Freyre, H.A., Chaigneau, S.E., Ciocca, D.R., Correa, J.C., Cousineau, D., de Boer, M.R., Dhar, S.S., Dolgov, I., Gómez-Benito, J., Grendar, M., Grice, J.W., Guerrero-Gimenez, M.E., Gutiérrez, A., Huedo-Medina, T.B., Jaffe, K., Janyan, A., Karimnezhad, A., Korner-Nievergelt, F., Kosugi, K., Lachmair, M., Ledesma, R.D., Limongi, R., Liuzza, M.T., Lombardo, R., Marks, M.J., Meinlschmidt, G., Nalborczyk, L., Nguyen, H.T., Ospina, R., Perezgonzalez, J.D., Pfister, R., Rahona, J.J., Rodríguez-Medina, D.A., Romão, X., Ruiz-Fernández, S., Suarez, I., Tegethoff, M., Tejo, M., van de Schoot, R., Vankov, I.I., Velasco-Forero, S., Wang, T., Yamada, Y., Zoppino, F.C.M., Marmolejo-Ramos, F.: Manipulating the alpha level cannot cure significance testing. Front. Psychol. 9, 699 (2018)


  • Trafimow, D., MacDonald, J.A.: Performing inferential statistics prior to data collection. Educ. Psychol. Meas. 77(2), 204–219 (2017)


  • Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 37(1), 1–2 (2015)


  • Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 38(1), 1–2 (2016)


  • Trafimow, D., Wang, T., Wang, C.: Means and standard deviations, or locations and scales? That is the question! New Ideas Psychol. 50, 34–37 (2018)


  • Trafimow, D., Wang, T., Wang, C.: From a sampling precision perspective, skewness is a friend and not an enemy! Educ. Psychol. Meas. (in press)


  • Trueblood, J.S., Busemeyer, J.R.: A quantum probability account of order effects in inference. Cogn. Sci. 35, 1518–1552 (2011)


  • Trueblood, J.S., Busemeyer, J.R.: A quantum probability model of causal reasoning. Front. Psychol. 3, 138 (2012)


  • Valentine, J.C., Aloe, A.M., Lau, T.S.: Life after NHST: how to describe your data without “p-ing” everywhere. Basic Appl. Soc. Psychol. 37(5), 260–273 (2015)


  • Wasserstein, R.L., Lazar, N.A.: The ASA’s statement on p-values: context, process, and purpose. Am. Stat. 70, 129–133 (2016)


  • Woodside, A.: The good practices manifesto: overcoming bad practices pervasive in current research in business. J. Bus. Res. 69(2), 365–381 (2016)


  • Ziliak, S.T., McCloskey, D.N.: The Cult of Statistical Significance: How the Standard Error Costs us Jobs, Justice, and Lives. The University of Michigan Press, Ann Arbor (2016)



Author information


Correspondence to David Trafimow.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Trafimow, D. (2019). My Ban on Null Hypothesis Significance Testing and Confidence Intervals. In: Kreinovich, V., Sriboonchitta, S. (eds) Structural Changes and their Econometric Modeling. TES 2019. Studies in Computational Intelligence, vol 808. Springer, Cham. https://doi.org/10.1007/978-3-030-04263-9_3
