
My Ban on Null Hypothesis Significance Testing and Confidence Intervals

  • Conference paper
  • First Online:
  • Part of the book: Structural Changes and their Econometric Modeling (TES 2019)
  • Part of the book series: Studies in Computational Intelligence (SCI, volume 808)


Abstract

The journal Basic and Applied Social Psychology banned null hypothesis significance testing and confidence intervals. Was this justified, and if so, why? I address these questions with a focus on the different types of assumptions that compose the models on which p-values and confidence intervals are based. For the computation of p-values, in addition to problematic model assumptions, there is also the problem that p-values confound the implications of sample effect sizes and sample sizes. For the computation of confidence intervals, in contrast to the justification that they provide valuable information about the precision of the data, there is a triple confound involving three types of precision: measurement precision, precision of homogeneity, and sampling precision. Because all three can be estimated separately, provided the researcher has tested the reliability of the dependent variable, there is no reason to confound them via the computation of a confidence interval. Thus, the ban is justified with respect to both null hypothesis significance testing and confidence intervals.
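The two confounds named in the abstract can be made concrete with a small numerical sketch. The Python snippet below, with hypothetical numbers throughout, first shows that an identical standardized effect size yields very different p-values depending only on sample size, and then shows how measurement precision, homogeneity of true scores, and sampling precision can each be estimated separately once the reliability of the dependent variable is known. The second part reads the abstract through classical test theory (cf. Gulliksen 1987; Lord and Novick 1968); the decomposition is an assumption of this illustration, not a formula taken from the paper.

```python
# A minimal numerical sketch of the two confounds described in the abstract.
# All numbers are hypothetical; this is an illustration, not the paper's analysis.
import numpy as np
from scipy import stats

# (1) p-values confound sample effect size and sample size: the same
# standardized effect d yields very different p-values depending only on n
# (one-sample t-test, where t = d * sqrt(n)).
for d in (0.1, 0.3):
    for n in (20, 200):
        t = d * np.sqrt(n)
        p = 2 * stats.t.sf(abs(t), df=n - 1)
        print(f"d = {d:.1f}, n = {n:3d}  ->  p = {p:.4f}")

# (2) Under classical test theory (observed variance = true-score variance +
# error variance, reliability = their ratio), the three kinds of precision can
# be estimated separately once a reliability estimate for the DV is available.
sd_observed = 12.0   # hypothetical observed standard deviation of the DV
reliability = 0.80   # hypothetical reliability estimate for the DV
n = 50               # hypothetical sample size

measurement_sd = sd_observed * np.sqrt(1 - reliability)  # measurement (im)precision
true_score_sd = sd_observed * np.sqrt(reliability)       # spread of true scores: homogeneity
standard_error = sd_observed / np.sqrt(n)                 # sampling precision of the mean

print(f"measurement SD = {measurement_sd:.2f}")
print(f"true-score SD  = {true_score_sd:.2f}")
print(f"standard error = {standard_error:.2f}")
```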


Notes

  1. An example is the book by Briggs (2016), who is a distinguished participant at TES 2019.

  2. Richard Morey, in his blog (http://bayesfactor.blogspot.com/2015/11/neyman-does-science-part-1.html), has documented how even Neyman was unable to avoid misusing p-values in this way, though he warned against it himself.

  3. In fact, Rothman et al. (2013) provided arguments against random selection.

  4. The reader may wonder about p-values as used in NHST versus p-values used as continuous indices of alleged justified worry about the model. Although both are problematic for the reasons described, null hypothesis significance tests are worse because of the dichotomous thinking they encourage and the dramatic overestimates of effect sizes in scientific literatures that they promote (see Locascio 2017a for an explanation). If p-values were calculated but not used to draw any conclusions, their costs would be reduced, though still without providing any added benefits.

  5. Of course, even this very limited conclusion depends on the model being correct, and as we have already seen, the model is not correct because of problematic inferential assumptions.

  6. This assumes random sampling, an assumption that is most likely incorrect.

  7. This argument should not be interpreted as indicating that contemporary researchers are at an overall disadvantage. In fact, contemporary researchers have many advantages over the researchers of yesteryear, including better knowledge and better technology, among others.

  8. This sparse description may seem to imply that a priori procedures are simply another way to perform power analyses. However, this is not true, and I have provided demonstrations of the differences, including contradictory effects (Trafimow 2017b; Trafimow and MacDonald 2017); see the sketch following these notes.
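As a rough illustration of how such a priori calculations differ in spirit from power analysis, the sketch below computes, for the simplest case of a single mean under a normal model, the sample size needed for the sample mean to fall within a chosen fraction f of a standard deviation of the population mean with probability c. The n = (z/f)^2 form and all numbers are assumptions of this illustration; see Trafimow (2017b) and Trafimow and MacDonald (2017) for the actual procedures.

```python
# A sketch (not the cited papers' derivation) of an a priori sample-size
# calculation for a single mean under a normal model: choose n so that the
# sample mean falls within f SDs of the population mean with probability c.
from math import ceil
from scipy.stats import norm

def a_priori_n(f: float, c: float) -> int:
    """Required n for closeness f (in SD units) at confidence level c."""
    z = norm.ppf((1 + c) / 2)   # two-sided critical value of the standard normal
    return ceil((z / f) ** 2)

# Hypothetical targets: mean within 0.2 SD of the population mean with 95% probability.
print(a_priori_n(f=0.2, c=0.95))  # -> 97 participants
```

Unlike a power analysis, nothing in this calculation refers to rejecting a null hypothesis; the target is the closeness of the estimate to the population value, which is why the two approaches can recommend very different sample sizes.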

References

  • Bakker, M., van Dijk, A., Wicherts, J.M.: The rules of the game called psychological science. Perspect. Psychol. Sci. 7(6), 543–554 (2012)


  • Berk, R.A., Freedman, D.A.: Statistical assumptions as empirical commitments. In: Blomberg, T.G., Cohen, S. (eds.) Law, Punishment, and Social Control: Essays in Honor of Sheldon Messinger. 2nd edn., pp. 235–254. Aldine de Gruyter (2003)


  • Box, G.E.P., Draper, N.R.: Empirical Model-Building and Response Surfaces. Wiley, New York (1987)


  • Briggs, W.: Uncertainty: The Soul of Modeling, Probability and Statistics. Springer, New York (2016)


  • Cumming, G., Calin-Jageman, R.: Introduction to the New Statistics: Estimation, Open Science, and Beyond. Taylor and Francis Group, New York (2017)


  • Duhem, P.: The Aim and Structure of Physical Theory (P.P. Wiener, Trans). Princeton University Press, Princeton (1954). (Original work published 1906)


  • Earp, B.D., Trafimow, D.: Replication, falsification, and the crisis of confidence in social psychology. Front. Psychol. 6, 1–11, Article 621 (2015)


  • Gillies, D.: Philosophical Theories of Probability. Routledge, London (2000)


  • Greenland, S.: Invited commentary: the need for cognitive science in methodology. Am. J. Epidemiol. 186, 639–645 (2017)


  • Gulliksen, H.: Theory of Mental Tests. Lawrence Erlbaum Associates Publishers, Hillsdale (1987)


  • Halsey, L.G., Curran-Everett, D., Vowler, S.L., Drummond, G.B.: The fickle P value generates irreproducible results. Nat. Methods 12, 179–185 (2015). https://doi.org/10.1038/nmeth.3288


  • Hubbard, R.: Corrupt Research: The Case for Reconceptualizing Empirical Management and Social Science. Sage Publications, Los Angeles (2016)


  • John, L.K., Loewenstein, G., Prelec, D.: Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23(5), 524–532 (2012)


  • Lakatos, I.: The Methodology of Scientific Research Programmes. Cambridge University Press, Cambridge (1978)


  • Lord, F.M., Novick, M.R.: Statistical Theories of Mental Test Scores. Addison-Wesley, Reading (1968)


  • Nguyen, H.T.: On evidential measures of support for reasoning with integrated uncertainty: a lesson from the ban of P-values in statistical inference. In: Huynh, V.N., et al., (eds.) Integrated Uncertainty in Knowledge Modeling and Decision Making. Lecture Notes in Artificial Intelligence, vol. 9978, pp. 3–15. Springer (2016)


  • Open Science Collaboration: Estimating the reproducibility of psychological science. Science 349(6251), aac4716 (2015). https://doi.org/10.1126/science.aac4716

  • Rothman, K.J., Gallacher, J.E.J., Hatch, E.E.: Why representativeness should be avoided. Int. J. Epidemiol. 42(4), 1012–1014 (2013)


  • Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22(11), 1359–1366 (2011)


  • Speelman, C.P., McGann, M.: Editorial: challenges to mean-based analysis in psychology: the contrast between individual people and general science. Front. Psychol. 7, 1234 (2016)


  • Trafimow, D.: Editorial. Basic Appl. Soc. Psychol. 36(1), 1–2 (2014)


  • Trafimow, D.: Implications of an initial empirical victory for the truth of the theory and additional empirical victories. Philos. Psychol. 30(4), 411–433 (2017a)


  • Trafimow, D.: Using the coefficient of confidence to make the philosophical switch from a posteriori to a priori inferential statistics. Educ. Psychol. Meas. 77(5), 831–854 (2017b)


  • Trafimow, D.: An a priori solution to the replication crisis. Philos. Psychol. 31, 1188–1214 (2018)


  • Trafimow, D.: A taxonomy of model assumptions on which P is based and implications for added benefit in the soft sciences (under submission)


  • Trafimow, D., Amrhein, V., Areshenkoff, C.N., Barrera-Causil, C.J., Beh, E.J., Bilgiç, Y.K., Bono, R., Bradley, M.T., Briggs, W.M., Cepeda-Freyre, H.A., Chaigneau, S.E., Ciocca, D.R., Correa, J.C., Cousineau, D., de Boer, M.R., Dhar, S.S., Dolgov, I., Gómez-Benito, J., Grendar, M., Grice, J.W., Guerrero-Gimenez, M.E., Gutiérrez, A., Huedo-Medina, T.B., Jaffe, K., Janyan, A., Karimnezhad, A., Korner-Nievergelt, F., Kosugi, K., Lachmair, M., Ledesma, R.D., Limongi, R., Liuzza, M.T., Lombardo, R., Marks, M.J., Meinlschmidt, G., Nalborczyk, L., Nguyen, H.T., Ospina, R., Perezgonzalez, J.D., Pfister, R., Rahona, J.J., Rodríguez-Medina, D.A., Romão, X., Ruiz-Fernández, S., Suarez, I., Tegethoff, M., Tejo, M., van de Schoot, R., Vankov, I.I., Velasco-Forero, S., Wang, T., Yamada, Y., Zoppino, F.C.M., Marmolejo-Ramos, F.: Manipulating the alpha level cannot cure significance testing. Front. Psychol. 9, 699 (2018)


  • Trafimow, D., MacDonald, J.A.: Performing inferential statistics prior to data collection. Educ. Psychol. Meas. 77(2), 204–219 (2017)


  • Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 37(1), 1–2 (2015)


  • Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 38(1), 1–2 (2016)


  • Trafimow, D., Wang, T., Wang, C.: Means and standard deviations, or locations and scales? That is the question! New Ideas Psychol. 50, 34–37 (2018)


  • Trafimow, D., Wang, T., Wang, C.: From a sampling precision perspective, skewness is a friend and not an enemy! Educ. Psychol. Meas. (in press)


  • Trueblood, J.S., Busemeyer, J.R.: A quantum probability account of order effects in inference. Cogn. Sci. 35, 1518–1552 (2011)


  • Trueblood, J.S., Busemeyer, J.R.: A quantum probability model of causal reasoning. Front. Psychol. 3, 138 (2012)


  • Valentine, J.C., Aloe, A.M., Lau, T.S.: Life after NHST: how to describe your data without “p-ing” everywhere. Basic Appl. Soc. Psychol. 37(5), 260–273 (2015)


  • Wasserstein, R.L., Lazar, N.A.: The ASA’s statement on p-values: context, process, and purpose. Am. Stat. 70, 129–133 (2016)


  • Woodside, A.: The good practices manifesto: overcoming bad practices pervasive in current research in business. J. Bus. Res. 69(2), 365–381 (2016)


  • Ziliak, S.T., McCloskey, D.N.: The Cult of Statistical Significance: How the Standard Error Costs us Jobs, Justice, and Lives. The University of Michigan Press, Ann Arbor (2016)



Author information


Correspondence to David Trafimow.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Trafimow, D. (2019). My Ban on Null Hypothesis Significance Testing and Confidence Intervals. In: Kreinovich, V., Sriboonchitta, S. (eds) Structural Changes and their Econometric Modeling. TES 2019. Studies in Computational Intelligence, vol 808. Springer, Cham. https://doi.org/10.1007/978-3-030-04263-9_3
