Educational Psychology Review, Volume 11, Issue 2, pp 157–169

Journal Editorial Policies Regarding Statistical Significance Tests: Heat Is to Fire as p Is to Importance

  • Bruce Thompson

Abstract

The present paper responds to defenses of statistical significance testing offered by Levin and Robinson. First, some inaccurate perceptions of contemporary criticisms of statistical tests are noted. Second, areas of disagreement are explored. For example, it is noted that all nine empirical studies of reporting practices since 1994 show that “encouraging” (per the 1994 APA style manual) authors to report effect sizes has not worked; two reasons for this failure are explored. Finally, two important areas of agreement regarding needed improvements in contemporary practices are noted.

Keywords: statistical significance, effect size, editorial policy, statistics, research methods
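The title's analogy can be made concrete with a small numerical illustration (ours, not the article's; the effect size and group sizes below are assumed purely for illustration). Holding a small standardized mean difference fixed at Cohen's d = 0.2, the two-sample t-test p-value moves from clearly nonsignificant to highly significant solely as a function of sample size, which is the sense in which p indexes detectability rather than importance:

    # Minimal sketch (illustrative assumptions, not from the article):
    # a fixed effect size yields very different p-values as n grows,
    # so p reflects sample size and detectability, not importance.
    from scipy import stats

    d = 0.2  # fixed small standardized mean difference (Cohen's d)
    for n in (20, 100, 1000):  # illustrative per-group sample sizes
        # Equal groups with unit variances: t = d * sqrt(n / 2), df = 2n - 2
        t = d * (n / 2) ** 0.5
        p = 2 * stats.t.sf(t, df=2 * n - 2)  # two-tailed p-value
        print(f"n per group = {n:4d}, d = {d}, p = {p:.5f}")

The same d = 0.2 gives p ≈ .53 at 20 cases per group but p < .0001 at 1,000, so a significant p speaks as much to how many cases the researcher happened to gather as to whether the result matters.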


REFERENCES

  1. Biskin, B. H. (1998). Comment on significance testing. Measure. Eval. Counsel. Dev. 31: 58-62.
  2. Carver, R. (1978). The case against statistical significance testing. Harvard Educ. Rev. 48: 378-399.
  3. Carver, R. (1993). The case against statistical significance testing, revisited. J. Exp. Educ. 61: 287-292.
  4. Cohen, J. (1994). The earth is round (p < .05). Am. Psychol. 49: 997-1003.
  5. Heldref Foundation (1997). Guidelines for contributors. J. Exp. Educ. 65: 95-96.
  6. Huberty, C. J. (1993). Historical origins of statistical testing practices: The treatment of Fisher versus Neyman-Pearson views in textbooks. J. Exp. Educ. 61: 317-333.
  7. Huberty, C. J., and Pike, C. J. (1999). The historical origins of statistical significance testing. In Thompson, B. (ed.), Advances in Social Science Methodology, Vol. 5, JAI Press, Greenwich, CT (in press).
  8. Kirk, R. (1996). Practical significance: A concept whose time has come. Educ. Psychol. Measure. 56: 746-759.
  9. Lance, T., and Vacha-Haase, T. (1998). The Counseling Psychologist: Trends and usages of statistical significance testing. Paper presented at the annual meeting of the American Psychological Association, San Francisco, Aug.
  10. Levin, J. R. (1997). Overcoming feelings of powerlessness in “aging” researchers: A primer on statistical power in analysis of variance designs. Psychol. Aging 12: 84-106.
  11. Levin, J. R., and Robinson, D. H. (1999). Further reflections on hypothesis testing and editorial policy for primary research journals. Educ. Psychol. Rev. 11: 143-155.
  12. Loftus, G. R. (1994). Why psychology will never be a real science until we change the way we analyze data. Paper presented at the annual meeting of the American Psychological Association, Los Angeles, Aug.
  13. Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. J. Consult. Clin. Psychol. 46: 806-834.
  14. Mulaik, S. A., Raju, N. S., and Harshman, R. A. (1997). There is a time and place for significance testing. In Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.), What If There Were No Significance Tests? Erlbaum, Mahwah, NJ, pp. 65-115.
  15. Murphy, K. R. (1997). Editorial. J. Appl. Psychol. 82: 3-5.
  16. Nelson, N., Rosenthal, R., and Rosnow, R. L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. Am. Psychol. 41: 1299-1301.
  17. Ness, C., and Vacha-Haase, T. (1998). Statistical significance reporting: Current trends and usages within Professional Psychology: Research and Practice. Paper presented at the annual meeting of the American Psychological Association, San Francisco, Aug.
  18. Nilsson, J., and Vacha-Haase, T. (1998). A review of statistical significance reporting in the Journal of Counseling Psychology. Paper presented at the annual meeting of the American Psychological Association, San Francisco, Aug.
  19. Oakes, M. (1986). Statistical Inference: A Commentary for the Social and Behavioral Sciences, Wiley, New York.
  20. Reetz, D., and Vacha-Haase, T. (1998). Trends and usages of statistical significance testing in adult development and aging research: A review of Psychology and Aging. Paper presented at the annual meeting of the American Psychological Association, San Francisco, Aug.
  21. Robinson, D., and Levin, J. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educ. Res. 26(5): 21-26.
  22. Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychol. Bull. 86: 638-641.
  23. Rosenthal, R., and Gaito, J. (1963). The interpretation of level of significance by psychological researchers. J. Psychol. 55: 33-38.
  24. Rosnow, R. L., and Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. Am. Psychol. 44: 1276-1284.
  25. Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychol. Bull. 57: 416-428.
  26. Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychol. Methods 1: 115-129.
  27. Schmidt, F. L., and Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.), What If There Were No Significance Tests? Erlbaum, Mahwah, NJ, pp. 37-64.
  28. Shaver, J. (1985). Chance and nonsense. Phi Delta Kappan 67(1): 57-60.
  29. Snyder, P. A., and Thompson, B. (1999). Use of tests of statistical significance and other analytic choices in a school psychology journal: Review of practices and suggested alternatives. School Psychol. Q. (in press).
  30. Thompson, B. (1988). Program FACSTRAP: A program that computes bootstrap estimates of factor structure. Educ. Psychol. Measure. 48: 681-686.
  31. Thompson, B. (1989). Statistical significance, result importance, and result generalizability: Three noteworthy but somewhat different issues. Measure. Eval. Counsel. Dev. 22: 2-5.
  32. Thompson, B. (1992). DISCSTRA: A computer program that computes bootstrap resampling estimates of descriptive discriminant analysis function and structure coefficients and group centroids. Educ. Psychol. Measure. 52: 905-911.
  33. Thompson, B. (1993). The use of statistical significance tests in research: Bootstrap and other alternatives. J. Exp. Educ. 61: 361-377.
  34. Thompson, B. (1994a). Guidelines for authors. Educ. Psychol. Measure. 54(4): 837-847.
  35. Thompson, B. (1994b). The pivotal role of replication in psychological research: Empirically evaluating the replicability of sample results. J. Person. 62: 157-176.
  36. Thompson, B. (1995). Exploring the replicability of a study's results: Bootstrap statistics for the multivariate case. Educ. Psychol. Measure. 55: 84-94.
  37. Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educ. Res. 25(2): 26-30.
  38. Thompson, B. (1997). Editorial policies regarding statistical significance tests: Further comments. Educ. Res. 26(5): 29-32.
  39. Thompson, B. (1999a). Five methodology errors in educational research: The pantheon of statistical significance and other faux pas. In Thompson, B. (ed.), Advances in Social Science Methodology, Vol. 5, JAI Press, Greenwich, CT (in press). [Invited address presented at the 1998 annual meeting of the American Educational Research Association, San Diego.]
  40. Thompson, B. (1999b). In praise of brilliance, where that praise really belongs. Am. Psychol. (in press).
  41. Thompson, B. (1999c). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory Psychol. (in press). [Invited address presented at the 1997 annual meeting of the American Psychological Association, Chicago.]
  42. Thompson, B. (1999d). Why “encouraging” effect size reporting isn't working: The etiology of researcher resistance to changing practices. J. Psychol. (in press).
  43. Thompson, B., and Snyder, P. A. (1997). Statistical significance testing practices in the Journal of Experimental Education. J. Exp. Educ. 66: 75-83.
  44. Thompson, B., and Snyder, P. A. (1998). Statistical significance and reliability analyses in recent JCD research articles. J. Counsel. Dev. 76: 436-441.
  45. Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educ. Psychol. Measure. 58: 6-20.
  46. Vacha-Haase, T., and Nilsson, J. E. (1998). Statistical significance reporting: Current trends and usages within MECD. Measure. Eval. Counsel. Dev. 31: 46-57.
  47. Vacha-Haase, T., and Thompson, B. (1998). Further comments on statistical significance tests. Measure. Eval. Counsel. Dev. 31: 63-67.
  48. Zuckerman, M., Hodgins, H. S., Zuckerman, A., and Rosenthal, R. (1993). Contemporary issues in the analysis of data: A survey of 551 psychologists. Psychol. Sci. 4: 49-53.

Copyright information

© Plenum Publishing Corporation 1999

Authors and Affiliations

  • Bruce Thompson, TAMU Department of Educational Psychology, Texas A&M University and Baylor College of Medicine, College Station
