
Educational Psychology Review, Volume 25, Issue 2, pp 157–209

The Impact of APA and AERA Guidelines on Effect Size Reporting

  • Chao-Ying Joanne Peng
  • Li-Ting Chen
  • Hsu-Min Chiang
  • Yi-Chen Chiang
Research into Practice

Abstract

Given the long history of effect size (ES) indices (Olejnik & Algina, Contemporary Educational Psychology, 25, 241–286, 2000) and various attempts by APA and AERA to encourage the reporting and interpretation of ES to supplement findings from inferential statistical analyses, it is essential to document the impact of APA and AERA standards on ES reporting practices. In this paper, we investigated this impact by examining findings from 31 published reviews and from our own review of 451 articles published in 2009 and 2010. The 32 reviews were divided into two periods: before and after 1999. A total of 116 journals were reviewed. Findings from these 32 reviews revealed that, since 1999, ES reporting has improved in rate, variety, interpretation, confidence intervals, and fullness. Yet several inadequate practices persisted: (1) the dominance of Cohen's d and the unadjusted R2, (2) the mere labeling of ES (e.g., as small, medium, or large), (3) the under-reporting of confidence intervals for ES, and (4) a lack of integration between ES and statistical tests. The paper concludes with Internet resources and recommendations for improving ES reporting practices.
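
To make the recommended practice concrete (reporting an ES and its confidence interval alongside the test statistic, rather than in isolation), the sketch below computes Cohen's d for two independent groups together with a percentile bootstrap confidence interval in the spirit of Efron and Tibshirani (1993). This is a minimal sketch, not code from the paper; the data are hypothetical and the helper names cohens_d and bootstrap_ci are ours.

```python
import numpy as np
from scipy import stats

def cohens_d(x, y):
    # Pooled-SD standardized mean difference (Cohen's d)
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(sp2)

def bootstrap_ci(x, y, n_boot=10_000, alpha=0.05, seed=1):
    # Percentile bootstrap CI: resample each group with replacement
    rng = np.random.default_rng(seed)
    ds = [cohens_d(rng.choice(x, size=len(x), replace=True),
                   rng.choice(y, size=len(y), replace=True))
          for _ in range(n_boot)]
    return np.percentile(ds, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical scores for two independent groups
x = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.3])
y = np.array([4.2, 4.6, 4.0, 5.0, 4.4, 4.8, 4.1, 4.7])

t, p = stats.ttest_ind(x, y)     # the statistical test
d = cohens_d(x, y)               # the effect size
lo, hi = bootstrap_ci(x, y)      # interval estimate for d
print(f"t(14) = {t:.2f}, p = {p:.4f}, d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reporting t, p, d, and the interval together addresses inadequate practices (3) and (4) above; when normality or equal-variance assumptions are doubtful, a noncentral-t or robust interval (e.g., Algina et al. 2005; Steiger 2004) is preferable to the simple percentile bootstrap.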

Keywords

Effect size · Impact · Statistical test · R2 · Cohen's d · η2 · Meta-analysis · Review

References

  1. Algina, J., & Keselman, H. J. (2003). Approximate confidence intervals for effect sizes. Educational and Psychological Measurement, 68, 233–244. doi: 10.1177/0013164403256358.
  2. Algina, J., Keselman, H. J., & Penfield, R. D. (2005). An alternative to Cohen's standardized mean difference effect size: a robust parameter and confidence interval in the two independent groups case. Psychological Methods, 10, 317–328. doi: 10.1037/1082-989X.10.3.317.
  3. Algina, J., Keselman, H. J., & Penfield, R. D. (2006). Confidence intervals for an effect size when variances are not equal. Journal of Modern Applied Statistical Methods, 5, 2–13. Retrieved from http://www.jmasm.com.
  4. Alhija, F. N.-A., & Levy, A. (2009). Effect size reporting practices in published articles. Educational and Psychological Measurement, 69, 245–265. doi: 10.1177/0013164408315266.
  5. American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35(6), 33–40. doi: 10.3102/0013189X035006033.
  6. American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: American Psychological Association.
  7. American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.
  8. Andersen, M. B., McCullagh, P., & Wilson, G. J. (2007). But what do the numbers really tell us?: arbitrary metrics and effect size reporting in sport psychology research. Journal of Sport & Exercise Psychology, 29, 664–672. Retrieved from http://journals.humankinetics.com/jsep.
  9. APA Publications and Communications Board Working Group on Journal Article Reporting Standards. (2008). Reporting standards for research in psychology: why do we need them? What might they be? American Psychologist, 63, 839–851. doi: 10.1037/0003-066X.63.9.839.
  10. Armstrong, S. A., & Henson, R. K. (2004). Statistical and practical significance in the IJPT: a research review from 1993–2003. International Journal of Play Therapy, 13(2), 9–30. doi: 10.1037/h0088888.
  11. Bonett, D. G. (2008). Confidence intervals for standardized linear contrasts of means. Psychological Methods, 13, 99–109. doi: 10.1037/1082-989X.13.2.99.
  12. Byrd, J. K. (2007). A call for statistical reform in EAQ. Educational Administration Quarterly, 43, 381–391. doi: 10.1177/0013161X06297137.
  13. Camp, C. J., & Maxwell, S. E. (1983). A comparison of various strength of association measures commonly used in gerontological research. Journal of Gerontology, 38, 3–7.
  14. Carroll, R. M., & Nordholm, L. A. (1975). Sampling characteristics of Kelley's ε2 and Hays' ω2. Educational and Psychological Measurement, 35, 541–554. doi: 10.1177/001316447503500304.
  15. Cliff, N. (1993). Dominance statistics: ordinal analyses to answer ordinal questions. Psychological Bulletin, 114, 494–509. doi: 10.1037/0033-2909.114.3.494.
  16. Cliff, N. (1996). Answering ordinal questions with ordinal data using ordinal statistics. Multivariate Behavioral Research, 31, 331–350. doi: 10.1207/s15327906mbr3103_4.
  17. Cochran-Smith, M., & Zeichner, K. M. (Eds.). (2005). Studying teacher education: the report of the AERA Panel on Research and Teacher Education. Mahwah, NJ: Lawrence Erlbaum.
  18. Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95–121). New York: McGraw-Hill.
  19. Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.
  20. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
  21. Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.
  22. Crosnoe, R., & Cooper, C. E. (2010). Economically disadvantaged children's transitions into elementary school: linking family processes, school contexts, and educational policy. American Educational Research Journal, 47(2), 258–291. doi: 10.3102/0002831209351564.
  23. Delaney, H. D., & Vargha, A. (2002). Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sized samples. Psychological Methods, 7(4), 485–503. doi: 10.1037/1082-989X.7.4.485.
  24. Dunlap, W. P. (1999). A program to compute McGraw and Wong's common language effect size indicator. Behavior Research Methods, Instruments, & Computers, 31, 706–709. doi: 10.3758/BF03200750.
  25. Dunleavy, E. M., Barr, C. D., Glenn, D. M., & Miller, K. R. (2006). Effect size reporting in applied psychology: how are we doing? The Industrial-Organizational Psychologist, 43(4), 29–37. Retrieved from http://www.openj-gate.com/browse/Archive.aspx?year=2009&Journal_id=102632.
  26. Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
  27. Fidler, F., Cumming, G., Thomason, N., Pannuzzo, D., Smith, J., Fyffe, P., … Schmitt, R. (2005). Evaluating the effectiveness of editorial policy to improve statistical practice: the case of the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 73, 136–143. doi: 10.1037/0022-006X.73.1.136.
  28. Fox, C. L., & Boulton, M. J. (2003). Evaluating the effectiveness of a social skills training (SST) program for victims of bullying. Educational Research, 64, 231–247. doi: 10.1080/0013188032000137238.
  29. Friedman, H. (1968). Magnitude of experimental effect and a table for its rapid estimation. Psychological Bulletin, 70, 245–251. doi: 10.1037/h0026258.
  30. Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141, 2–18. doi: 10.1037/a0024338.
  31. Garrison, A. M., & Kahn, J. H. (2010). Intraindividual relations between the intensity and disclosure of daily emotional events: the moderating role of depressive symptoms. Journal of Counseling Psychology, 57(2), 187–197. doi: 10.1037/a0018386.
  32. Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. doi: 10.3102/0013189X005010003.
  33. Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate conceptualization of effect size. Psychological Methods, 6, 135–146. doi: 10.1037/1082-989X.6.2.135.
  34. Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: univariate and multivariate applications (2nd ed.). New York: Routledge.
  35. Harrison, J., Thompson, B., & Vannest, K. J. (2009). Interpreting the evidence for effective interventions to increase the academic performance of students with ADHD: relevance of the statistical significance controversy. Review of Educational Research, 79, 740–775. doi: 10.3102/0034654309331516.
  36. Hays, W. L. (1963). Statistics for psychologists. New York: Holt, Rinehart & Winston.
  37. Hedges, L. V. (1981). Distributional theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107–128. doi: 10.2307/1164588.
  38. Hedges, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92, 490–499. doi: 10.1037/0033-2909.92.2.490.
  39. Hedges, L. V., & Olkin, I. (1984). Nonparametric estimators of effect size in meta-analysis. Psychological Bulletin, 96, 573–580. doi: 10.1037/0033-2909.96.3.573.
  40. Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
  41. Hess, M. R., & Kromrey, J. D. (2004). Robust confidence intervals for effect sizes: a comparative study of Cohen's d and Cliff's delta under non-normality and heterogeneous variances. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
  42. Hogarty, K. Y., & Kromrey, J. D. (2001, April). We've been reporting some effect sizes: can you guess what they mean? Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
  43. Hsieh, P., Acee, T., Chung, W.-H., Hsieh, Y.-P., Kim, H., Thomas, G. D., … Robinson, D. H. (2005). Is educational intervention research on the decline? Journal of Educational Psychology, 97, 523–529. doi: 10.1037/0022-0663.97.4.523.
  44. Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: correcting error and bias in research findings. Thousand Oaks, CA: SAGE Publications.
  45. Jitendra, A. K., Griffin, C. C., Haria, P., Leh, J., Adams, A., & Kaduvettoor, A. (2007). A comparison of single and multiple strategy instruction on third-grade students' mathematical problem solving. Journal of Educational Psychology, 99, 115–127. doi: 10.1037/0022-0663.99.1.115.
  46. Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals around the standardized mean difference: bootstrap and parametric confidence intervals. Educational and Psychological Measurement, 65, 51–69. doi: 10.1177/0013164404264850.
  47. Keppel, G. (1973). Design and analysis: a researcher's handbook. Englewood Cliffs, NJ: Prentice-Hall.
  48. Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129. doi: 10.1037/1082-989X.13.2.110.
  49. Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., … Levin, J. R. (1998). Statistical practices of educational researchers: an analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386. doi: 10.3102/00346543068003350.
  50. Kieffer, K. M., Reese, R. J., & Thompson, B. (2001). Statistical techniques employed in AERJ and JCP articles from 1988 to 1997: a methodological review. The Journal of Experimental Education, 69, 280–309. doi: 10.1080/00220970109599489.
  51. Kirk, R. E. (1996). Practical significance: a concept whose time has come. Educational and Psychological Measurement, 56, 746–759. doi: 10.1177/0013164496056005002.
  52. Kraemer, H. C., & Andrews, G. (1982). A nonparametric technique for meta-analysis effect size calculation. Psychological Bulletin, 91, 404–412.
  53. Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59(11), 990–996. doi: 10.1016/j.biopsych.2005.09.014.
  54. Kromrey, J. D., & Coughlin, K. B. (2007, November). ROBUST_ES: a SAS macro for computing robust estimates of effect size. Paper presented at the annual meeting of the SouthEast SAS Users Group, Hilton Head, SC. Retrieved from http://analytics.ncsu.edu/sesug/2007/PO19.pdf.
  55. Lipsey, M. W., & Wilson, D. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
  56. Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., Roberts, M., Anthony, K. S., & Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms (NCSER 2013–3000). Washington, DC: National Center for Special Education Research, Institute of Education Sciences, US Department of Education.
  57. MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149. doi: 10.1037/1082-989X.1.2.130.
  58. Matthews, M. S., Gentry, M., McCoach, D. B., Worrell, F. C., Matthews, D., & Dixon, F. (2008). Evaluating the state of a field: effect size reporting in gifted education. The Journal of Experimental Education, 77(1), 55–68. doi: 10.3200/JEXE.77.1.55-68.
  59. Maxwell, S. E., Camp, C. J., & Arvey, R. D. (1981). Measures of strength of association: a comparative examination. Journal of Applied Psychology, 66, 525–534. doi: 10.1037/0021-9010.66.5.525.
  60. McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological Methods, 11, 386–401. doi: 10.1037/1082-989X.11.4.386.
  61. McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361–365. doi: 10.1037/0033-2909.111.2.361.
  62. Meline, T., & Schmitt, J. F. (1997). Case studies for evaluating significance in group designs. American Journal of Speech-Language Pathology, 6(1), 33–41. Retrieved from http://ajslp.asha.org/.
  63. Meline, T., & Wang, B. (2004). Effect-size reporting practices in AJSLP and other ASHA journals, 1999–2003. American Journal of Speech-Language Pathology, 13, 202–207. Retrieved from http://ajslp.asha.org/.
  64. Mohr, J. J., Weiner, J. L., Chopp, R. M., & Wong, S. J. (2009). Effects of client bisexuality on clinical judgment: when is bias most likely to occur? Journal of Counseling Psychology, 56, 164–175. doi: 10.1037/a0012816.
  65. Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, 236, 333–380. Retrieved from http://rstl.royalsocietypublishing.org/.
  66. Odgaard, E. C., & Fowler, R. L. (2010). Confidence intervals for effect sizes: compliance and clinical significance in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 78, 287–297. doi: 10.1037/a0019294.
  67. Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286. doi: 10.1006/ceps.2000.1040.
  68. Osborne, J. W. (2008). Sweating the small stuff in educational psychology: how effect size and power reporting failed to change from 1969 to 1999, and what that means for the future of changing practices. Educational Psychology, 28, 151–160. doi: 10.1080/01443410701491718.
  69. Paul, K. M., & Plucker, J. A. (2004). Two steps forward, one step back: effect size reporting in gifted education research from 1995–2000. Roeper Review, 26(2), 68–72.
  70. Pearson, K. (1905). Mathematical contributions to the theory of evolution: XIV. On the general theory of skew correlations and nonlinear regression (Draper's Company Research Memoirs, Biometric Series II). London: Dulau.
  71. Peng, C.-Y. J., & Chen, L.-T. (2013). Beyond Cohen's d: alternative effect size measures for between-subject designs. The Journal of Experimental Education (in press).
  72. Peng, C.-Y., Chen, L.-T., Chiang, H.-M., & Chiang, Y.-C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review. doi: 10.1007/s10648-013-9218-2.
  73. Plucker, J. A. (1997). Debunking the myth of the "highly significant" result: effect sizes in gifted education research. Roeper Review, 20, 122–126. doi: 10.1080/02783199709553873.
  74. Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis. New York: Russell Sage Foundation.
  75. Ruscio, J. (2008). A probability-based measure of effect size: robustness to base rates and other factors. Psychological Methods, 13, 19–30. doi: 10.1037/1082-989X.13.1.19.
  76. Schatz, P., Jay, K. A., McComb, J., & McLaughlin, J. R. (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology, 20, 1053–1059. doi: 10.1016/j.acn.2005.06.006.
  77. Snyder, P., Thompson, B., McLean, M. E., & Smith, B. J. (2002). Examination of quantitative methods used in early intervention research: linkages with recommended practices. Journal of Early Intervention, 25, 137–150. doi: 10.1177/105381510202500211.
  78. Smith, M. L., & Honoré, H. H. (2008). Effect size reporting in current health education literature. American Journal of Health Studies, 23, 130–135. Retrieved from http://www.va-ajhs.com/.
  79. Snyder, P. A., & Thompson, B. (1998). Use of tests of statistical significance and other analytic choices in a school psychology journal: review of practices and suggested alternatives. School Psychology Quarterly, 13, 335–348. doi: 10.1037/h0088990.
  80. Staudte, R. G., & Sheather, S. J. (1990). Robust estimation and testing. New York: Wiley.
  81. Steiger, J. H. (2004). Beyond the F test: effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182. doi: 10.1037/1082-989X.9.2.164.
  82. Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. Harlow, S. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Hillsdale, NJ: Erlbaum.
  83. Sun, S. Y., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102, 989–1004. doi: 10.1037/a0019507.
  84. Thompson, B. (1999). Improving research clarity and usefulness with effect size indices as supplements to statistical significance tests. Exceptional Children, 65, 329–337. Retrieved from http://journals.cec.sped.org/ec/.
  85. Thompson, B. (2002). What future quantitative social science research could look like: confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32. doi: 10.3102/0013189X031003025.
  86. Thompson, B. (2006). Foundations of behavioral statistics: an insight-based approach. New York: Guilford.
  87. Thompson, B., & Snyder, P. A. (1997). Statistical significance testing practices. The Journal of Experimental Education, 66, 75–83. doi: 10.1080/00220979709601396.
  88. Thompson, B., & Snyder, P. A. (1998). Statistical significance and reliability analyses in recent Journal of Counseling & Development research articles. Journal of Counseling and Development, 76, 436–441.
  89. Trusty, J., Thompson, B., & Petrocelli, J. V. (2004). Practical guide for reporting effect size in quantitative research in the Journal of Counseling & Development. Journal of Counseling and Development, 82, 107–110.
  90. Vacha-Haase, T., & Ness, C. (1999). Statistical significance testing as it relates to practice: use within Professional Psychology. Professional Psychology: Research and Practice, 30, 104–105.
  91. Vacha-Haase, T., & Nilsson, J. E. (1998). Statistical significance reporting: current trends and usages in MECD. Measurement and Evaluation in Counseling and Development, 31, 46–57. Retrieved from http://mec.sagepub.com.
  92. Vacha-Haase, T., Nilsson, J. E., Reetz, D. R., Lance, T. S., & Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory and Psychology, 10, 413–425. doi: 10.1177/0959354300103006.
  93. Vansteenkiste, M., Sierens, E., Soenens, B., Luyckx, K., & Lens, W. (2009). Motivational profiles from a self-determination perspective: the quality of motivation matters. Journal of Educational Psychology, 101, 671–688. doi: 10.1037/a0015083.
  94. Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25, 101–132. doi: 10.2307/1165329.
  95. Wilcox, R. R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Elsevier Academic Press.
  96. Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: guidelines and explanations. American Psychologist, 54, 594–604. doi: 10.1037/0003-066X.54.8.594.
  97. Yin, P., & Fan, X. (2001). Estimating R2 shrinkage in multiple regression: a comparison of different analytical methods. The Journal of Experimental Education, 69, 203–224. doi: 10.1080/00220970109600656.
  98. Zientek, L. R., Capraro, M. M., & Capraro, R. M. (2008). Reporting practices in quantitative teacher education research: one look at the evidence cited in the AERA Panel Report. Educational Researcher, 37, 208–216. doi: 10.3102/0013189X08319762.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Chao-Ying Joanne Peng (1)
  • Li-Ting Chen (1)
  • Hsu-Min Chiang (2)
  • Yi-Chen Chiang (1)

  1. Department of Counseling and Educational Psychology, Indiana University, Bloomington, USA
  2. Columbia University—Teachers College, New York, USA
