Educational Psychology Review, Volume 11, Issue 2, pp 143–155

Further Reflections on Hypothesis Testing and Editorial Policy for Primary Research Journals

  • Joel R. Levin
  • Daniel H. Robinson

Abstract

Questions have recently been raised about the value of statistical hypothesis testing, as well as the associated policy implications for publishing empirically based research in professional journals. In this Reflections note, we extend our earlier thoughts (Robinson and Levin, 1997) on what could, should, and should not be done to existing editorial practices.

statistical hypothesis testing statistical effect size educational statistics educational research

REFERENCES

  1. Abelson, R. P. (1997). A retrospective on the significance test ban of 1999 (If there were no significance tests, they would be invented). In Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.), What If There Were No Significance Tests? Erlbaum, Mahwah, NJ, pp. 118-141.
  2. American Psychological Association (1994). Publication Manual of the American Psychological Association, 4th ed., American Psychological Association, Washington, DC.
  3. Boling, N., and Robinson, D. H. (1997). Interactive media or cooperative learning: Which activity best supplements lecture-based distance education? Unpublished manuscript, Mississippi State University.
  4. Carver, R. (1978). The case against statistical significance testing. Harvard Educ. Rev. 48: 378-399.
  5. Cohen, J. (1990). Things I have learned (so far). Am. Psychol. 45: 1304-1312.
  6. Cohen, J. (1994). The earth is round (p < .05). Am. Psychol. 49: 997-1003.
  7. Derry, S., Levin, J. R., and Schauble, L. (1995). Stimulating statistical thinking through situated simulations. Teach. Psychol. 22: 51-57.
  8. Efron, B., and Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Stat. 37: 36-48.
  9. Estes, W. K. (1997). On the communication of information by displays of standard errors and confidence intervals. Psychon. Bull. Rev. 4: 330-341.
  10. Fern, E. F., and Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. J. Consumer Res. 23: 89-105.
  11. Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychol. Methods 1: 379-390.
  12. Hagen, R. L. (1997). In praise of the null hypothesis statistical test. Am. Psychol. 52: 15-24.
  13. Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.) (1997). What If There Were No Significance Tests? Erlbaum, Mahwah, NJ.
  14. Huberty, C. (1987). On statistical significance testing. Educ. Res. 16(8): 4-9.
  15. Jaeger, R. M. (ed.) (1988). Complementary Methods for Research in Education, American Educational Research Association, Washington, DC.
  16. Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., Kowalchuk, P. K., Lowman, L. L., Petroskey, M. D., Keselman, J. C., and Levin, J. R. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Rev. Educ. Res. 68: 350-386.
  17. Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educ. Psychol. Measure. 56: 746-759.
  18. Knapp, T. R. (1997). Personal communication, June.
  19. Levin, J. R. (1994). Crafting educational intervention research that's both credible and creditable. Educ. Psychol. Rev. 6: 231-243.
  20. Levin, J. R. (1995). The consultant's manual of researchers' common stat-illogical disorders. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, Apr.
  21. Levin, J. R. (1997). Overcoming feelings of powerlessness in “aging” research: A primer on statistical power in analysis of variance designs. Psychol. Aging 12: 84-106.
  22. Levin, J. R. (1998). To test or not to test H0? Educ. Psychol. Measure. 58: 313-333.
  23. Levin, J. R., and Neumann, E. (1999). Testing for predicted patterns: When interest in the whole is greater than in some of its parts. Psychol. Methods 4: 44-57.
  24. Lykken, D. T. (1968). Statistical significance in psychological research. Psychol. Bull. 70: 151-159.
  25. McCutchen, D., Bell, L. C., France, I. M., and Perfetti, C. A. (1991). Phoneme-specific interference in reading: The tongue-twister effect revisited. Read. Res. Q. 26: 87-103.
  26. McKeachie, W. J. (1997). Personal communication, Mar.
  27. Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. J. Consult. Clin. Psychol. 46: 806-834.
  28. Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.), What If There Were No Significance Tests? Erlbaum, Mahwah, NJ, pp. 393-425.
  29. Mulaik, S. A., Raju, N. S., and Harshman, R. A. (1997). There is a time and place for significance testing. In Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.), What If There Were No Significance Tests? Erlbaum, Mahwah, NJ, pp. 65-115.
  30. Platt, J. R. (1964). Strong inference. Science 146: 347-353.
  31. Reichardt, C. S., and Gollob, H. F. (1997). When confidence intervals should be used instead of statistical tests, and vice versa. In Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.), What If There Were No Significance Tests? Erlbaum, Mahwah, NJ, pp. 259-284.
  32. Rindskopf, D. M. (1997). Testing “small,” not null, hypotheses: Classical and Bayesian approaches. In Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.), What If There Were No Significance Tests? Erlbaum, Mahwah, NJ, pp. 319-332.
  33. Robinson, D. H., and Katayama, A. D. (1997). At-lexical, articulatory interference in silent reading: The “upstream” tongue-twister effect. Memory Cognit. 25: 661-665.
  34. Robinson, D. H., and Levin, J. R. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educ. Res. 26(5): 21-26.
  35. Robinson, D. H., Levin, J. R., Halbur, D., and O'Neill, L. (1999, Apr.). Does use of statistical language constitute a “significant” roadblock to readers' interpretations of research results? Paper presented at the annual meeting of the American Educational Research Association, Montreal.
  36. Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychol. Bull. 57: 416-428.
  37. Rozeboom, W. W. (1997). Good science is abductive, not hypothetico-deductive. In Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (eds.), What If There Were No Significance Tests? Erlbaum, Mahwah, NJ, pp. 335-391.
  38. Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychol. Methods 1: 115-129.
  39. Seaman, M. A., and Serlin, R. C. (1998). Equivalence confidence intervals for two-group comparisons of means. Psychol. Methods 3: 403-411.
  40. Shea, C. (1996). Psychologists debate accuracy of “significance test.” Chronicle Higher Educ. Aug. 16: A12-A17.
  41. Thompson, B. (1993). The use of statistical significance tests in research: Bootstrap and other alternatives. J. Exp. Educ. 61: 361-377.
  42. Thompson, B. (1994). Guidelines for authors. Educ. Psychol. Measure. 54: 837-847.
  43. Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educ. Res. 25(2): 26-30.
  44. Thompson, B. (1997). Editorial policies regarding statistical significance tests: Further comments. Educ. Res. 26(5): 29-32.
  45. Thompson, B., and Snyder, P. A. (1998). Statistical significance and reliability analyses in recent JCD research articles. J. Counsel. Dev. 76: 436-441.
  46. Thompson, B., and Snyder, P. A. (1997). Statistical significance testing practices in the Journal of Experimental Education. J. Exp. Educ. 66: 75-83.
  47. Tukey, J. W. (1991). The philosophy of multiple comparisons. Stat. Sci. 6: 100-116.
  48. Wollack, J. A. (1997). A nominal response model approach for detecting answer copying. Appl. Psychol. Measure. 21: 307-320.

Copyright information

© Plenum Publishing Corporation 1999

Authors and Affiliations

  • Joel R. Levin (1)
  • Daniel H. Robinson (2)
  1. Department of Educational Psychology, University of Wisconsin-Madison, Madison, Wisconsin
  2. University of Texas at Austin, USA
