# Robust misinterpretation of confidence intervals

## Abstract

Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students—all in the field of psychology—were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding *p*-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.
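As an illustration of what the confidence level of a CI does refer to, the following is a minimal simulation sketch, not part of the study itself. In the frequentist definition, "95 %" describes the long-run proportion of intervals, produced by the same procedure over repeated samples, that contain the true parameter; it is not a probability statement about any single computed interval. The setup is an assumption chosen for simplicity: normally distributed data with a known standard deviation, a z-based interval for the mean, and hypothetical values for `true_mean`, `sigma`, `n`, and `reps`.

```python
import random
from statistics import NormalDist, fmean


def coverage(true_mean=0.0, sigma=1.0, n=25, reps=10_000, level=0.95, seed=1):
    """Estimate how often a z-based CI for the mean contains the true mean.

    All parameter values are hypothetical and chosen only for illustration.
    Assumes normal data with known standard deviation `sigma`.
    """
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for level=0.95
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5

    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = fmean(sample)
        # The interval is sample_mean +/- half_width; check whether it
        # contains the fixed true mean.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(f"Proportion of intervals containing the true mean: {coverage():.3f}")
```

Under these assumptions the reported proportion should land close to the nominal level. Each individual interval either contains the true mean or does not; the confidence level is a property of the procedure across repetitions.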

## Keywords

Confidence intervals, Significance testing, Inference

## Notes

### Acknowledgements

This work was supported by the starting grant “Bayes or Bust” awarded by the European Research Council, and by National Science Foundation Grants BCS-1240359 and SES-102408.
