Summary
Effect sizes will be routinely reported only once editors promulgate policies that make these practices normatively expected. As Sedlmeier and Gigerenzer (1989) argued, “there is only one force that can effect a change, and that is the same force that helped institutionalize null hypothesis testing as the sine qua non for publication, namely, the editors of the major journals” (p. 315). Glantz (1980) agreed, noting that “The journals are the major force for quality control in scientific work” (p. 3).
The fact that 23 journals, including two major journals of large professional associations, now require effect size reporting bodes well for improved practices. As Fidler (2002) recently observed in her penetrating essay, “Of the major American associations, only all the journals of the American Educational Research Association have remained silent on all these issues” (p. 754).
As Thompson (1999a) noted, “It is doubtful that the field will ever settle on a single index to be used in all studies, given that so many choices exist and because the statistics can be translated into approximations across the two major classes” (p. 171). But three practices should be expected:
1. report effect sizes for all primary outcomes, even if they are not statistically significant (see Thompson, 2002c);
2. interpret effect sizes by explicit and direct comparison with the effects in related prior studies (Thompson, 2002c; Wilkinson and APA Task Force on Statistical Inference, 1999); and
3. compute CIs for results, including effect sizes (cf. Cumming and Finch, 2001), and, when a reasonably large number of effects for current and prior studies are available, consider using graphics to facilitate comparisons (Wilkinson and APA Task Force on Statistical Inference, 1999).
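The first and third practices can be illustrated concretely. The sketch below (in Python, using only the standard library) computes Cohen's d for two hypothetical groups and an approximate interval from the conventional large-sample standard error; the data, function names, and 95% z-multiplier are illustrative assumptions, and exact CIs for d require noncentral t distributions (cf. Cumming and Finch, 2001; Smithson, 2001).

```python
import math

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)  # unbiased variances
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for d via its large-sample standard error.

    This is the conventional asymptotic approximation; exact intervals
    are based on noncentral t distributions (Cumming and Finch, 2001).
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Illustrative (made-up) scores for a treatment and a control group.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]

d = cohens_d(treatment, control)
lower, upper = d_confidence_interval(d, len(treatment), len(control))
print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Reporting the interval alongside d, rather than a p value alone, lets the effect be compared directly against the effects in related prior studies, as the second practice recommends.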
References
Aaron, B., Kromrey, J.D., and Ferron, J.M. (1998, November). Equating r-based and d-based effect size indices: Problems with a commonly recommended formula. Paper presented at the annual meeting of the Florida Educational Research Association, Orlando, FL (ERIC Document Reproduction Service No. ED 433 353).
Abelson, R.P. (1997). A retrospective on the significance test ban of 1999 (If there were no significance tests, they would be invented). In L.L. Harlow, S.A. Mulaik, and J.H. Steiger (eds.), What If There Were No Significance Tests? (pp. 117–141). Mahwah, NJ: Erlbaum.
American Psychological Association. (1994). Publication Manual of the American Psychological Association (4th edn.). Washington, DC: Author.
American Psychological Association. (2001). Publication Manual of the American Psychological Association (5th edn.). Washington, DC: Author.
Bagozzi, R.P., Fornell, C., and Larcker, D.F. (1981). Canonical correlation analysis as a special case of a structural relations model. Multivariate Behavioral Research 16: 437–454.
Baugh, F. (2002). Correcting effect sizes for score reliability: A reminder that measurement and substantive issues are linked inextricably. Educational and Psychological Measurement 62: 254–263.
Baugh, F., and Thompson, B. (2001). Using effect sizes in social science research: New APA and journal mandates for improved methodology practices. Journal of Research in Education 11(1): 120–129.
Boring, E.G. (1919). Mathematical vs. scientific importance. Psychological Bulletin 16: 335–338.
Carver, R. (1978). The case against statistical significance testing. Harvard Educational Review 48: 378–399.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin 70: 426–443.
Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd edn.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1994). The earth is round (p<.05). American Psychologist 49: 997–1003.
Cortina, J.M., and Dunlap, W.P. (1997). Logic and purpose of significance testing. Psychological Methods 2: 161–172.
Cumming, G., and Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement 61: 532–574.
Elmore, P., and Rotou, O. (2001, April). A primer on basic effect size concepts. Paper presented at the annual meeting of the American Educational Research Association, Seattle (ERIC Document Reproduction Service No. ED 453 260).
Ezekiel, M. (1930). Methods of Correlational Analysis. New York: Wiley.
Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement 62: 749–770.
Finch, S., Cumming, G., and Thomason, N. (2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement 61: 181–210.
Fleishman, A.I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement 40: 659–670.
Friedman, H. (1968). Magnitude of experimental effect and a table for its rapid estimation. Psychological Bulletin 70: 245–251.
Glantz, S.A. (1980). Biostatistics: How to detect, correct and prevent errors in the medical literature. Circulation 61: 1–7.
Glass, G. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher 5(10): 3–8.
Gregg, M., and Leinhardt, G. (2002). Learning from the Birmingham Civil Rights Institute: Documenting teacher development. American Educational Research Journal 39: 553–587.
Harris, M.J. (1991). Significance tests are not enough: The role of effect-size estimation in theory corroboration. Theory & Psychology 1: 375–382.
Herzberg, P.A. (1969). The parameters of cross-validation. Psychometrika Monograph Supplement 16: 1–67.
Hess, B., Olejnik, S., and Huberty, C.J. (2001). The efficacy of two improvement-over-chance effect sizes for two-group univariate comparisons under variance heterogeneity and non-normality. Educational and Psychological Measurement 61: 909–936.
Huberty, C.J. (1999). On some history regarding statistical testing. In B. Thompson (ed.), Advances in Social Science Methodology (Vol. 5, pp. 1–23). Stamford, CT: JAI Press.
Huberty, C.J. (2002). A history of effect size indices. Educational and Psychological Measurement 62: 227–240.
Huberty, C.J., and Holmes, S.E. (1983). Two-group comparisons and univariate classification. Educational and Psychological Measurement 43: 15–26.
Huberty, C.J., and Lowman, L.L. (2000). Group overlap as a basis for effect size. Educational and Psychological Measurement 60: 543–563.
Huberty, C.J., and Morris, J.D. (1988). A single contrast test procedure. Educational and Psychological Measurement 48: 567–578.
Hunter, J.E. (1997). Needed: A ban on the significance test. Psychological Science 8(1): 3–7.
Jacobson, N.S., Roberts, L.J., Berns, S.B., and McGlinchey, J.B. (1999). Methods for defining and determining the clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical Psychology 67: 300–307.
Kazdin, A.E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical Psychology 67: 332–339.
Kendall, P.C. (1999). Clinical significance. Journal of Consulting and Clinical Psychology 67: 283–284.
Kieffer, K.M., Reese, R.J., and Thompson, B. (2001). Statistical techniques employed in AERJ and JCP articles from 1988 to 1997: A methodological review. Journal of Experimental Education 69: 280–309.
Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement 56: 746–759.
Knapp, T.R. (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin 85: 410–416.
Kromrey, J.D., and Hines, C.V. (1996). Estimating the coefficient of cross-validity in multiple regression: A comparison of analytical and empirical methods. Journal of Experimental Education 64: 240–266.
Kupersmid, J. (1988). Improving what is published: A model in search of an editor. American Psychologist 43: 635–642.
Loftus, G.R. (1994, August). Why psychology will never be a real science until we change the way we analyze data. Paper presented at the annual meeting of the American Psychological Association, Los Angeles.
Lord, F.M. (1950). Efficiency of Prediction when a Regression Equation from One Sample is Used in a New Sample (Research Bulletin 50-110). Princeton, NJ: Educational Testing Service.
Mittag, K.C., and Thompson, B. (2000). A national survey of AERA members’ perceptions of statistical significance tests and other statistical issues. Educational Researcher 29(4): 14–20.
Murray, L.W., and Dosser, D.A. (1987). How significant is a significant difference? Problems with the measurement of magnitude of effect. Journal of Counseling Psychology 34: 68–72.
Nelson, N., Rosenthal, R., and Rosnow, R.L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist 41: 1299–1301.
Oakes, M. (1986). Statistical Inference: A Commentary for the Social and Behavioral Sciences. New York: Wiley.
O’Grady, K.E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin 92: 766–777.
Olejnik, S., and Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology 25: 241–286.
Roberts, J.K., and Henson, R.K. (2002). Correction for bias in estimating effect sizes. Educational and Psychological Measurement 62: 241–253.
Robinson, D.H., and Wainer, H. (2002). On the past and future of null hypothesis significance testing. Journal of Wildlife Management 66: 263–271.
Rosenthal, R., and Gaito, J. (1963). The interpretation of level of significance by psychological researchers. Journal of Psychology 55: 33–38.
Rosnow, R.L., and Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist 44: 1276–1284.
Saunders, S.M., Howard, K.I., and Newman, F.L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment 10: 207–218.
Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods 1: 115–129.
Sedlmeier, P., and Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin 105: 309–316.
Shaver, J. (1985). Chance and nonsense. Phi Delta Kappan 67(1): 57–60.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement 61: 605–632.
Snyder, P. (2000). Guidelines for reporting results of group quantitative investigations. Journal of Early Intervention 23: 145–150.
Snyder, P., and Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education 61: 334–349.
Steiger, J.H., and Fouladi, R.T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers 24: 581–582.
Stevens, J. (1992). Applied Multivariate Statistics for the Social Sciences (2nd edn.). Hillsdale, NJ: Erlbaum.
Thompson, B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development 70: 434–438.
Thompson, B. (1993). The use of statistical significance tests in research: Bootstrap and other alternatives. Journal of Experimental Education 61: 361–377.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher 25(2): 26–30.
Thompson, B. (1998a). In praise of brilliance: Where that praise really belongs. American Psychologist 53: 799–800.
Thompson, B. (1998b). Review of What if there were no significance tests? Educational and Psychological Measurement 58: 332–344.
Thompson, B. (1999a). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory & Psychology 9: 167–183.
Thompson, B. (1999b). Journal editorial policies regarding statistical significance tests: Heat is to fire as p is to importance. Educational Psychology Review 11: 157–169.
Thompson, B. (2000a). Canonical correlation analysis. In L. Grimm, and P. Yarnold (eds.), Reading and Understanding More Multivariate Statistics (pp. 285–316). Washington, DC: American Psychological Association.
Thompson, B. (2000b). Ten commandments of structural equation modeling. In L. Grimm, and P. Yarnold (eds.), Reading and Understanding More Multivariate Statistics (pp. 261–284). Washington, DC: American Psychological Association.
Thompson, B. (2001). Significance, effect sizes, stepwise methods, and other issues: Strong arguments move the field. Journal of Experimental Education 70: 80–93.
Thompson, B. (ed.) (2002a). Score Reliability: Contemporary Thinking on Reliability Issues. Newbury Park, CA: Sage.
Thompson, B. (2002b). “Statistical,” “practical,” and “clinical”: How many kinds of significance do counselors need to consider? Journal of Counseling and Development 80: 64–71.
Thompson, B. (2002c). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher 31(3), 24–31.
Thompson, B., and Kieffer, K.M. (2000). Interpreting statistical significance test results: A proposed new “What if” method. Research in the Schools 7(2): 3–10.
Thompson, B., and Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement 60: 174–195.
Trusty, J., Thompson, B., and Petrocelli, J.V. (2004). Practical guide to implementing the requirement of reporting effect size in quantitative research in the Journal of Counseling & Development. Journal of Counseling and Development.
Tryon, W.W. (1998). The inscrutable null hypothesis. American Psychologist 53: 796.
Vacha-Haase, T., Nilsson, J.E., Reetz, D.R., Lance, T.S., and Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory & Psychology 10: 413–425.
Wilkinson, L., and APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist 54: 594–604 (reprint available through the APA Home Page: http://www.apa.org/journals/amp/amp548594.html).
Zuckerman, M., Hodgins, H.S., Zuckerman, A., and Rosenthal, R. (1993). Contemporary issues in the analysis of data: A survey of 551 psychologists. Psychological Science 4: 49–53.
© 2004 Kluwer Academic Publishers
Hill, C.R., Thompson, B. (2004). Computing and Interpreting Effect Sizes. In: Smart, J.C. (eds) Higher Education: Handbook of Theory and Research. Higher Education: Handbook of Theory and Research, vol 19. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2456-8_5
Print ISBN: 978-1-4020-1919-7
Online ISBN: 978-1-4020-2456-6