Summary
Effect sizes will be routinely reported only once editors promulgate policies that make these practices normatively expected. As Sedlmeier and Gigerenzer (1989) argued, “there is only one force that can effect a change, and that is the same force that helped institutionalize null hypothesis testing as the sine qua non for publication, namely, the editors of the major journals” (p. 315). Glantz (1980) agreed, noting that “The journals are the major force for quality control in scientific work” (p. 3).
The fact that 23 journals, including two major journals of large professional associations, now require effect size reporting bodes well for improved practices. As Fidler (2002) recently observed in her penetrating essay, “Of the major American associations, only all the journals of the American Educational Research Association have remained silent on all these issues” (p. 754).
As Thompson (1999a) noted, “It is doubtful that the field will ever settle on a single index to be used in all studies, given that so many choices exist and because the statistics can be translated into approximations across the two major classes” (p. 171). But three practices should be expected:
1. report effect sizes for all primary outcomes, even if they are not statistically significant (see Thompson, 2002c);
2. interpret effect sizes by explicit and direct comparison with the effects in related prior studies (Thompson, 2002c; Wilkinson and APA Task Force on Statistical Inference, 1999); and
3. compute CIs for results, including effect sizes (cf. Cumming and Finch, 2001), and, when a reasonably large number of effects for current and prior studies are available, consider using graphics to facilitate comparisons (Wilkinson and APA Task Force on Statistical Inference, 1999).
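The first and third practices can be illustrated concretely. The sketch below (in Python, using only the standard library) computes Cohen's d for two hypothetical groups and an approximate interval from the conventional large-sample standard error; the data, function names, and 95% z-multiplier are illustrative assumptions, and exact CIs for d require noncentral t distributions (cf. Cumming and Finch, 2001; Smithson, 2001).

```python
import math

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)  # unbiased variances
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for d via its large-sample standard error.

    This is the conventional asymptotic approximation; exact intervals
    are based on noncentral t distributions (Cumming and Finch, 2001).
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Illustrative (made-up) scores for a treatment and a control group.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]

d = cohens_d(treatment, control)
lower, upper = d_confidence_interval(d, len(treatment), len(control))
print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Reporting the interval alongside d, rather than a p value alone, lets the effect be compared directly against the effects in related prior studies, as the second practice recommends.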
References
Aaron, B., Kromrey, J.D., and Ferron, J.M. (1998, November). Equating r-based and d-based effect size indices: Problems with a commonly recommended formula. Paper presented at the annual meeting of the Florida Educational Research Association, Orlando, FL (ERIC Document Reproduction Service No. ED 433 353).
Abelson, R.P. (1997). A retrospective on the significance test ban of 1999 (If there were no significance tests, they would be invented). In L.L. Harlow, S.A. Mulaik, and J.H. Steiger (eds.), What If There Were No Significance Tests? (pp. 117–141). Mahwah, NJ: Erlbaum.
American Psychological Association. (1994). Publication Manual of the American Psychological Association (4th edn.). Washington, DC: Author.
American Psychological Association. (2001). Publication Manual of the American Psychological Association (5th edn.). Washington, DC: Author.
Bagozzi, R.P., Fornell, C., and Larcker, D.F. (1981). Canonical correlation analysis as a special case of a structural relations model. Multivariate Behavioral Research 16: 437–454.
Baugh, F. (2002). Correcting effect sizes for score reliability: A reminder that measurement and substantive issues are linked inextricably. Educational and Psychological Measurement 62: 254–263.
Baugh, F., and Thompson, B. (2001). Using effect sizes in social science research: New APA and journal mandates for improved methodology practices. Journal of Research in Education 11(1): 120–129.
Boring, E.G. (1919). Mathematical vs. scientific importance. Psychological Bulletin 16: 335–338.
Carver, R. (1978). The case against statistical significance testing. Harvard Educational Review 48: 378–399.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin 70: 426–443.
Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd edn.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1994). The earth is round (p<.05). American Psychologist 49: 997–1003.
Cortina, J.M., and Dunlap, W.P. (1997). Logic and purpose of significance testing. Psychological Methods 2: 161–172.
Cumming, G., and Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement 61: 532–574.
Elmore, P., and Rotou, O. (2001, April). A primer on basic effect size concepts. Paper presented at the annual meeting of the American Educational Research Association, Seattle (ERIC Document Reproduction Service No. ED 453 260).
Ezekiel, M. (1930). Methods of Correlational Analysis. New York: Wiley.
Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement 62: 749–770.
Finch, S., Cumming, G., and Thomason, N. (2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement 61: 181–210.
Fleishman, A.I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement 40: 659–670.
Friedman, H. (1968). Magnitude of experimental effect and a table for its rapid estimation. Psychological Bulletin 70: 245–251.
Glantz, S.A. (1980). Biostatistics: How to detect, correct and prevent errors in the medical literature. Circulation 61: 1–7.
Glass, G. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher 5(10): 3–8.
Gregg, M., and Leinhardt, G. (2002). Learning from the Birmingham Civil Rights Institute: Documenting teacher development. American Educational Research Journal 39: 553–587.
Harris, M.J. (1991). Significance tests are not enough: The role of effect-size estimation in theory corroboration. Theory & Psychology 1: 375–382.
Herzberg, P.A. (1969). The parameters of cross-validation. Psychometrika Monograph Supplement 16: 1–67.
Hess, B., Olejnik, S., and Huberty, C.J. (2001). The efficacy of two improvement-over-chance effect sizes for two-group univariate comparisons under variance heterogeneity and non-normality. Educational and Psychological Measurement 61: 909–936.
Huberty, C.J. (1999). On some history regarding statistical testing. In B. Thompson (ed.), Advances in Social Science Methodology (Vol. 5, pp. 1–23). Stamford, CT: JAI Press.
Huberty, C.J. (2002). A history of effect size indices. Educational and Psychological Measurement 62: 227–240.
Huberty, C.J., and Holmes, S.E. (1983). Two-group comparisons and univariate classification. Educational and Psychological Measurement 43: 15–26.
Huberty, C.J., and Lowman, L.L. (2000). Group overlap as a basis for effect size. Educational and Psychological Measurement 60: 543–563.
Huberty, C.J., and Morris, J.D. (1988). A single contrast test procedure. Educational and Psychological Measurement 48: 567–578.
Hunter, J.E. (1997). Needed: A ban on the significance test. Psychological Science 8(1): 3–7.
Jacobson, N.S., Roberts, L.J., Berns, S.B., and McGlinchey, J.B. (1999). Methods for defining and determining the clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical Psychology 67: 300–307.
Kazdin, A.E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical Psychology 67: 332–339.
Kendall, P.C. (1999). Clinical significance. Journal of Consulting and Clinical Psychology 67: 283–284.
Kieffer, K.M., Reese, R.J., and Thompson, B. (2001). Statistical techniques employed in AERJ and JCP articles from 1988 to 1997: A methodological review. Journal of Experimental Education 69: 280–309.
Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement 56: 746–759.
Knapp, T.R. (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin 85: 410–416.
Kromrey, J.D., and Hines, C.V. (1996). Estimating the coefficient of cross-validity in multiple regression: A comparison of analytical and empirical methods. Journal of Experimental Education 64: 240–266.
Kupersmid, J. (1988). Improving what is published: A model in search of an editor. American Psychologist 43: 635–642.
Loftus, G.R. (1994, August). Why psychology will never be a real science until we change the way we analyze data. Paper presented at the annual meeting of the American Psychological Association, Los Angeles.
Lord, F.M. (1950). Efficiency of Prediction when a Regression Equation from One Sample is Used in a New Sample (Research Bulletin 50-110). Princeton, NJ: Educational Testing Service.
Mittag, K.C., and Thompson, B. (2000). A national survey of AERA members’ perceptions of statistical significance tests and other statistical issues. Educational Researcher 29(4): 14–20.
Murray, L.W., and Dosser, D.A. (1987). How significant is a significant difference? Problems with the measurement of magnitude of effect. Journal of Counseling Psychology 34: 68–72.
Nelson, N., Rosenthal, R., and Rosnow, R.L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist 41: 1299–1301.
Oakes, M. (1986). Statistical Inference: A Commentary for the Social and Behavioral Sciences. New York: Wiley.
O’Grady, K.E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin 92: 766–777.
Olejnik, S., and Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology 25: 241–286.
Roberts, J.K., and Henson, R.K. (2002). Correction for bias in estimating effect sizes. Educational and Psychological Measurement 62: 241–253.
Robinson, D.H., and Wainer, H. (2002). On the past and future of null hypothesis significance testing. Journal of Wildlife Management 66: 263–271.
Rosenthal, R., and Gaito, J. (1963). The interpretation of level of significance by psychological researchers. Journal of Psychology 55: 33–38.
Rosnow, R.L., and Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist 44: 1276–1284.
Saunders, S.M., Howard, K.I., and Newman, F.L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment 10: 207–218.
Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods 1: 115–129.
Sedlmeier, P., and Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin 105: 309–316.
Shaver, J. (1985). Chance and nonsense. Phi Delta Kappan 67(1): 57–60.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement 61: 605–632.
Snyder, P. (2000). Guidelines for reporting results of group quantitative investigations. Journal of Early Intervention 23: 145–150.
Snyder, P., and Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education 61: 334–349.
Steiger, J.H., and Fouladi, R.T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers 24: 581–582.
Stevens, J. (1992). Applied Multivariate Statistics for the Social Sciences (2nd edn.). Hillsdale, NJ: Erlbaum.
Thompson, B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development 70: 434–438.
Thompson, B. (1993). The use of statistical significance tests in research: Bootstrap and other alternatives. Journal of Experimental Education 61: 361–377.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher 25(2): 26–30.
Thompson, B. (1998a). In praise of brilliance: Where that praise really belongs. American Psychologist 53: 799–800.
Thompson, B. (1998b). Review of What if there were no significance tests? Educational and Psychological Measurement 58: 332–344.
Thompson, B. (1999a). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory & Psychology 9: 167–183.
Thompson, B. (1999b). Journal editorial policies regarding statistical significance tests: Heat is to fire as p is to importance. Educational Psychology Review 11: 157–169.
Thompson, B. (2000a). Canonical correlation analysis. In L. Grimm, and P. Yarnold (eds.), Reading and Understanding More Multivariate Statistics (pp. 285–316). Washington, DC: American Psychological Association.
Thompson, B. (2000b). Ten commandments of structural equation modeling. In L. Grimm, and P. Yarnold (eds.), Reading and Understanding More Multivariate Statistics (pp. 261–284). Washington, DC: American Psychological Association.
Thompson, B. (2001). Significance, effect sizes, stepwise methods, and other issues: Strong arguments move the field. Journal of Experimental Education 70: 80–93.
Thompson, B. (ed.) (2002a). Score Reliability: Contemporary Thinking on Reliability Issues. Newbury Park, CA: Sage.
Thompson, B. (2002b). “Statistical,” “practical,” and “clinical”: How many kinds of significance do counselors need to consider? Journal of Counseling and Development 80: 64–71.
Thompson, B. (2002c). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher 31(3), 24–31.
Thompson, B., and Kieffer, K.M. (2000). Interpreting statistical significance test results: A proposed new “What if” method. Research in the Schools 7(2): 3–10.
Thompson, B., and Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement 60: 174–195.
Trusty, J., Thompson, B., and Petrocelli, J.V. (2004). Practical guide to implementing the requirement of reporting effect size in quantitative research in the Journal of Counseling & Development. Journal of Counseling and Development.
Tryon, W.W. (1998). The inscrutable null hypothesis. American Psychologist 53: 796.
Vacha-Haase, T., Nilsson, J.E., Reetz, D.R., Lance, T.S., and Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory & Psychology 10: 413–425.
Wilkinson, L., and APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist 54: 594–604 (reprint available through the APA Home Page: http://www.apa.org/journals/amp/amp548594.html).
Zuckerman, M., Hodgins, H.S., Zuckerman, A., and Rosenthal, R. (1993). Contemporary issues in the analysis of data: A survey of 551 psychologists. Psychological Science 4: 49–53.
© 2004 Kluwer Academic Publishers
Hill, C.R., Thompson, B. (2004). Computing and Interpreting Effect Sizes. In: Smart, J.C. (eds) Higher Education: Handbook of Theory and Research. Higher Education: Handbook of Theory and Research, vol 19. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2456-8_5
Print ISBN: 978-1-4020-1919-7
Online ISBN: 978-1-4020-2456-6