Part of the book series: Higher Education: Handbook of Theory and Research (HATR, volume 19)

Summary

Effect sizes will be routinely reported only once editors promulgate policies that make these practices normatively expected. As Sedlmeier and Gigerenzer (1989) argued, “there is only one force that can effect a change, and that is the same force that helped institutionalize null hypothesis testing as the sine qua non for publication, namely, the editors of the major journals” (p. 315). Glantz (1980) agreed, noting that “The journals are the major force for quality control in scientific work” (p. 3).

The fact that 23 journals, including two major journals of large professional associations, now require effect size reporting bodes well for improved practices. As Fidler (2002) recently observed in her penetrating essay, “Of the major American associations, only all the journals of the American Educational Research Association have remained silent on all these issues” (p. 754).

As Thompson (1999a) noted, “It is doubtful that the field will ever settle on a single index to be used in all studies, given that so many choices exist and because the statistics can be translated into approximations across the two major classes” (p. 171). (A minimal sketch of one such translation between d-type and r-type indices appears after the list below.) But three practices should be expected:

  1. report effect sizes for all primary outcomes, even if they are not statistically significant (see Thompson, 2002c);

  2. interpret effect sizes by explicit and direct comparison with the effects in related prior studies (Thompson, 2002c; Wilkinson and APA Task Force on Statistical Inference, 1999); and

  3. compute confidence intervals (CIs) for results, including effect sizes (cf. Cumming and Finch, 2001), and when a reasonably large number of effects for current and prior studies are available, consider using graphics to facilitate comparisons (Wilkinson and APA Task Force on Statistical Inference, 1999); a noncentral-t sketch of such an interval also appears below.
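
To make the Thompson (1999a) point about translating across the two major classes concrete, the snippet below converts between a standardized mean difference (a d-type index) and a correlational (r-type) index. This is a minimal sketch rather than a procedure from the chapter: the shortcut r = d / sqrt(d^2 + 4) assumes two groups of roughly equal size, and the problems that arise when that assumption fails are precisely those documented by Aaron, Kromrey, and Ferron (1998). The function names and the choice of Python are illustrative assumptions.

```python
import math

def d_to_r(d, n1=None, n2=None):
    """Translate Cohen's d into a point-biserial r.

    If group sizes n1 and n2 are supplied, the exact two-group relation
    r = d / sqrt(d**2 + (N**2 - 2N) / (n1 * n2)) is used; otherwise the
    common equal-n shortcut r = d / sqrt(d**2 + 4) is applied (see Aaron,
    Kromrey, and Ferron, 1998, on the limits of that shortcut).
    """
    if n1 is not None and n2 is not None:
        total = n1 + n2
        a = (total ** 2 - 2 * total) / (n1 * n2)
    else:
        a = 4.0
    return d / math.sqrt(d ** 2 + a)

def r_to_d(r):
    """Translate r back into d under the same equal-n, large-sample shortcut."""
    return 2 * r / math.sqrt(1 - r ** 2)

print(round(d_to_r(0.5), 3))           # equal-n shortcut: about .24
print(round(d_to_r(0.5, 90, 10), 3))   # unequal groups yield a smaller r (about .15)
print(round(r_to_d(0.243), 3))         # back-translation: roughly .50
```

The unequal-group case illustrates why the two classes of indices translate into approximations rather than exact equivalents: the same d corresponds to different r values depending on the group-size split.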
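
As one illustration of the third practice, the sketch below computes a confidence interval for Cohen's d from two independent groups by inverting the noncentral t distribution, in the spirit of Cumming and Finch (2001) and Smithson (2001). It is an assumed implementation for illustration only; the function name cohens_d_ci and the use of SciPy are choices made here, not the chapter's.

```python
from scipy.optimize import brentq
from scipy.stats import nct

def cohens_d_ci(d, n1, n2, conf=0.95):
    """Confidence interval for Cohen's d (two independent groups),
    obtained by pivoting on the noncentral t distribution."""
    df = n1 + n2 - 2
    scale = (n1 * n2 / (n1 + n2)) ** 0.5   # maps d onto the observed t statistic
    t_obs = d * scale
    alpha = 1 - conf

    # The limits are the noncentrality parameters for which the observed t
    # falls at the upper and lower alpha/2 tails of the distribution.
    lower_nc = brentq(lambda nc: nct.cdf(t_obs, df, nc) - (1 - alpha / 2),
                      t_obs - 10, t_obs + 10)
    upper_nc = brentq(lambda nc: nct.cdf(t_obs, df, nc) - alpha / 2,
                      t_obs - 10, t_obs + 10)
    return lower_nc / scale, upper_nc / scale

# A "medium" observed effect (d = .50) from 30 participants per group
lo, hi = cohens_d_ci(0.50, 30, 30)
print(f"95% CI for d: [{lo:.2f}, {hi:.2f}]")   # wide: from near zero to about one
```

Even this small example reinforces the larger point: a single "medium" effect from a modest sample is estimated quite imprecisely, which is exactly why explicit comparison with effects from related prior studies matters.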


References

  • Aaron, B., Kromrey, J.D., and Ferron, J.M. (1998, November). Equating r-based and d-based effect size indices: Problems with a commonly recommended formula. Paper presented at the annual meeting of the Florida Educational Research Association, Orlando, FL (ERIC Document Reproduction Service No. ED 433 353).
  • Abelson, R.P. (1997). A retrospective on the significance test ban of 1999 (If there were no significance tests, they would be invented). In L.L. Harlow, S.A. Mulaik, and J.H. Steiger (eds.), What if There Were no Significance Tests? (pp. 117–141). Mahwah, NJ: Erlbaum.
  • American Psychological Association. (1994). Publication Manual of the American Psychological Association (4th edn.). Washington, DC: Author.
  • American Psychological Association. (2001). Publication Manual of the American Psychological Association (5th edn.). Washington, DC: Author.
  • Bagozzi, R.P., Fornell, C., and Larcker, D.F. (1981). Canonical correlation analysis as a special case of a structural relations model. Multivariate Behavioral Research 16: 437–454.
  • Baugh, F. (2002). Correcting effect sizes for score reliability: A reminder that measurement and substantive issues are linked inextricably. Educational and Psychological Measurement 62: 254–263.
  • Baugh, F., and Thompson, B. (2001). Using effect sizes in social science research: New APA and journal mandates for improved methodology practices. Journal of Research in Education 11(1): 120–129.
  • Boring, E.G. (1919). Mathematical vs. scientific importance. Psychological Bulletin 16: 335–338.
  • Carver, R. (1978). The case against statistical significance testing. Harvard Educational Review 48: 378–399.
  • Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin 70: 426–443.
  • Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press.
  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd edn.). Hillsdale, NJ: Erlbaum.
  • Cohen, J. (1994). The earth is round (p < .05). American Psychologist 49: 997–1003.
  • Cortina, J.M., and Dunlap, W.P. (1997). Logic and purpose of significance testing. Psychological Methods 2: 161–172.
  • Cumming, G., and Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement 61: 532–574.
  • Elmore, P., and Rotou, O. (2001, April). A primer on basic effect size concepts. Paper presented at the annual meeting of the American Educational Research Association, Seattle (ERIC Document Reproduction Service No. ED 453 260).
  • Ezekiel, M. (1930). Methods of Correlational Analysis. New York: Wiley.
  • Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement 62: 749–770.
  • Finch, S., Cumming, G., and Thomason, N. (2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement 61: 181–210.
  • Fleishman, A.I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement 40: 659–670.
  • Friedman, H. (1968). Magnitude of experimental effect and a table for its rapid estimation. Psychological Bulletin 70: 245–251.
  • Glantz, S.A. (1980). Biostatistics: How to detect, correct and prevent errors in the medical literature. Circulation 61: 1–7.
  • Glass, G. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher 5(10): 3–8.
  • Gregg, M., and Leinhardt, G. (2002). Learning from the Birmingham Civil Rights Institute: Documenting teacher development. American Educational Research Journal 39: 553–587.
  • Harris, M.J. (1991). Significance tests are not enough: The role of effect-size estimation in theory corroboration. Theory & Psychology 1: 375–382.
  • Herzberg, P.A. (1969). The parameters of cross-validation. Psychometrika Monograph Supplement 16: 1–67.
  • Hess, B., Olejnik, S., and Huberty, C.J. (2001). The efficacy of two improvement-over-chance effect sizes for two-group univariate comparisons under variance heterogeneity and non-normality. Educational and Psychological Measurement 61: 909–936.
  • Huberty, C.J. (1999). On some history regarding statistical testing. In B. Thompson (ed.), Advances in Social Science Methodology (Vol. 5, pp. 1–23). Stamford, CT: JAI Press.
  • Huberty, C.J. (2002). A history of effect size indices. Educational and Psychological Measurement 62: 227–240.
  • Huberty, C.J., and Holmes, S.E. (1983). Two-group comparisons and univariate classification. Educational and Psychological Measurement 43: 15–26.
  • Huberty, C.J., and Lowman, L.L. (2000). Group overlap as a basis for effect size. Educational and Psychological Measurement 60: 543–563.
  • Huberty, C.J., and Morris, J.D. (1988). A single contrast test procedure. Educational and Psychological Measurement 48: 567–578.
  • Hunter, J.E. (1997). Needed: A ban on the significance test. Psychological Science 8(1): 3–7.
  • Jacobson, N.S., Roberts, L.J., Berns, S.B., and McGlinchey, J.B. (1999). Methods for defining and determining the clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical Psychology 67: 300–307.
  • Kazdin, A.E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical Psychology 67: 332–339.
  • Kendall, P.C. (1999). Clinical significance. Journal of Consulting and Clinical Psychology 67: 283–284.
  • Kieffer, K.M., Reese, R.J., and Thompson, B. (2001). Statistical techniques employed in AERJ and JCP articles from 1988 to 1997: A methodological review. Journal of Experimental Education 69: 280–309.
  • Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement 56: 746–759.
  • Knapp, T.R. (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin 85: 410–416.
  • Kromrey, J.D., and Hines, C.V. (1996). Estimating the coefficient of cross-validity in multiple regression: A comparison of analytical and empirical methods. Journal of Experimental Education 64: 240–266.
  • Kupersmid, J. (1988). Improving what is published: A model in search of an editor. American Psychologist 43: 635–642.
  • Loftus, G.R. (1994, August). Why psychology will never be a real science until we change the way we analyze data. Paper presented at the annual meeting of the American Psychological Association, Los Angeles.
  • Lord, F.M. (1950). Efficiency of Prediction When a Regression Equation from One Sample Is Used in a New Sample (Research Bulletin 50-110). Princeton, NJ: Educational Testing Service.
  • Mittag, K.C., and Thompson, B. (2000). A national survey of AERA members’ perceptions of statistical significance tests and other statistical issues. Educational Researcher 29(4): 14–20.
  • Murray, L.W., and Dosser, D.A. (1987). How significant is a significant difference? Problems with the measurement of magnitude of effect. Journal of Counseling Psychology 34: 68–72.
  • Nelson, N., Rosenthal, R., and Rosnow, R.L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist 41: 1299–1301.
  • Oakes, M. (1986). Statistical Inference: A Commentary for the Social and Behavioral Sciences. New York: Wiley.
  • O’Grady, K.E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin 92: 766–777.
  • Olejnik, S., and Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology 25: 241–286.
  • Roberts, J.K., and Henson, R.K. (2002). Correction for bias in estimating effect sizes. Educational and Psychological Measurement 62: 241–253.
  • Robinson, D.H., and Wainer, H. (2002). On the past and future of null hypothesis significance testing. Journal of Wildlife Management 66: 263–271.
  • Rosenthal, R., and Gaito, J. (1963). The interpretation of level of significance by psychological researchers. Journal of Psychology 55: 33–38.
  • Rosnow, R.L., and Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist 44: 1276–1284.
  • Saunders, S.M., Howard, K.I., and Newman, F.L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment 10: 207–218.
  • Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods 1: 115–129.
  • Sedlmeier, P., and Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin 105: 309–316.
  • Shaver, J. (1985). Chance and nonsense. Phi Delta Kappan 67(1): 57–60.
  • Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement 61: 605–632.
  • Snyder, P. (2000). Guidelines for reporting results of group quantitative investigations. Journal of Early Intervention 23: 145–150.
  • Snyder, P., and Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education 61: 334–349.
  • Steiger, J.H., and Fouladi, R.T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers 4: 581–582.
  • Stevens, J. (1992). Applied Multivariate Statistics for the Social Sciences (2nd edn.). Hillsdale, NJ: Erlbaum.
  • Thompson, B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development 70: 434–438.
  • Thompson, B. (1993). The use of statistical significance tests in research: Bootstrap and other alternatives. Journal of Experimental Education 61: 361–377.
  • Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher 25(2): 26–30.
  • Thompson, B. (1998a). In praise of brilliance: Where that praise really belongs. American Psychologist 53: 799–800.
  • Thompson, B. (1998b). Review of What if there were no significance tests? Educational and Psychological Measurement 58: 332–344.
  • Thompson, B. (1999a). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory & Psychology 9: 167–183.
  • Thompson, B. (1999b). Journal editorial policies regarding statistical significance tests: Heat is to fire as p is to importance. Educational Psychology Review 11: 157–169.
  • Thompson, B. (2000a). Canonical correlation analysis. In L. Grimm, and P. Yarnold (eds.), Reading and Understanding More Multivariate Statistics (pp. 285–316). Washington, DC: American Psychological Association.
  • Thompson, B. (2000b). Ten commandments of structural equation modeling. In L. Grimm, and P. Yarnold (eds.), Reading and Understanding More Multivariate Statistics (pp. 261–284). Washington, DC: American Psychological Association.
  • Thompson, B. (2001). Significance, effect sizes, stepwise methods, and other issues: Strong arguments move the field. Journal of Experimental Education 70: 80–93.
  • Thompson, B. (ed.) (2002a). Score Reliability: Contemporary Thinking on Reliability Issues. Newbury Park, CA: Sage.
  • Thompson, B. (2002b). “Statistical,” “practical,” and “clinical”: How many kinds of significance do counselors need to consider? Journal of Counseling and Development 80: 64–71.
  • Thompson, B. (2002c). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher 31(3): 24–31.
  • Thompson, B., and Kieffer, K.M. (2000). Interpreting statistical significance test results: A proposed new “What if” method. Research in the Schools 7(2): 3–10.
  • Thompson, B., and Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement 60: 174–195.
  • Trusty, J., Thompson, B., and Petrocelli, J.V. (2004). Practical guide to implementing the requirement of reporting effect size in quantitative research in the Journal of Counseling & Development. Journal of Counseling and Development.
  • Tryon, W.W. (1998). The inscrutable null hypothesis. American Psychologist 53: 796.
  • Vacha-Haase, T., Nilsson, J.E., Reetz, D.R., Lance, T.S., and Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory & Psychology 10: 413–425.
  • Wilkinson, L., and APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist 54: 594–604 (reprint available through the APA Home Page: http://www.apa.org/journals/amp/amp548594.html).
  • Zuckerman, M., Hodgins, H.S., Zuckerman, A., and Rosenthal, R. (1993). Contemporary issues in the analysis of data: A survey of 551 psychologists. Psychological Science 4: 49–53.


Copyright information

© 2004 Kluwer Academic Publishers

About this chapter

Cite this chapter

Hill, C.R., Thompson, B. (2004). Computing and Interpreting Effect Sizes. In: Smart, J.C. (eds) Higher Education: Handbook of Theory and Research. Higher Education: Handbook of Theory and Research, vol 19. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2456-8_5
