
The sensitivity of three methods to nonnormality and unequal variances in interval estimation of effect sizes

Published in: Behavior Research Methods

Abstract

Confidence interval (CI) estimation for an effect size (ES) provides a range of possible population ESs supported by the data. In this article, we investigated the noncentral t method, Bonett’s method, and the bias-corrected and accelerated (BCa) bootstrap method for constructing CIs when a standardized linear contrast of means is defined as the ES. The noncentral t method assumes normality and equal variances, Bonett’s method assumes only normality, and the BCa bootstrap method makes no distributional assumptions. We simulated data for three and four groups from a variety of populations (one normal and five nonnormal) with varied variance ratios (1, 2.25, 4, 8), population ESs (0, 0.2, 0.5, 0.8), and sample-size patterns (one equal and two unequal). Results showed that the noncentral t method performed best among the three methods under the joint condition of ES = 0 and equal variances. Its performance was comparable to that of the other two methods when (1) sample sizes were equal, group weights were unequal, and the last group was sampled from a leptokurtic distribution, or (2) sample sizes and group weights were equal, and either all groups were sampled from a normal population or only the last group was sampled from a nonnormal distribution. Under the remaining conditions, Bonett’s and the BCa bootstrap methods performed better than the noncentral t method. The BCa bootstrap method is the method of choice when the sample size per group is 30 or more. Findings from this study have implications for simultaneous comparisons of means, and of ranked means, in between- and within-subjects designs.
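To make the BCa procedure concrete, below is a minimal, illustrative sketch of a BCa bootstrap CI for a standardized linear contrast of means (contrast of group means divided by the pooled SD), resampling with replacement within each group and estimating the acceleration from delete-one jackknife values. The function names and this pure-Python implementation are assumptions for illustration, not the authors’ simulation code; in practice one would use a vetted bootstrap library.

```python
import random
from statistics import NormalDist, mean, stdev

def standardized_contrast(groups, weights):
    """ES = sum(c_j * mean_j) / pooled SD (equal-variance pooling)."""
    sd = (sum((len(g) - 1) * stdev(g) ** 2 for g in groups)
          / sum(len(g) - 1 for g in groups)) ** 0.5
    return sum(c * mean(g) for c, g in zip(weights, groups)) / sd

def bca_ci(groups, weights, alpha=0.05, n_boot=2000, seed=1):
    """BCa bootstrap CI for the standardized contrast (needs >= 3 obs/group)."""
    rng = random.Random(seed)
    theta = standardized_contrast(groups, weights)
    # 1. Bootstrap replicates: resample with replacement WITHIN each group.
    boots = sorted(
        standardized_contrast([[rng.choice(g) for _ in g] for g in groups],
                              weights)
        for _ in range(n_boot))
    nd = NormalDist()
    # 2. Bias correction z0: share of replicates below the point estimate,
    #    clamped away from 0 and 1 so inv_cdf stays finite.
    p = sum(b < theta for b in boots) / n_boot
    z0 = nd.inv_cdf(min(max(p, 1 / n_boot), 1 - 1 / n_boot))
    # 3. Acceleration a from delete-one jackknife values of the contrast.
    jack = []
    for j, g in enumerate(groups):
        for i in range(len(g)):
            jgroups = groups[:j] + [g[:i] + g[i + 1:]] + groups[j + 1:]
            jack.append(standardized_contrast(jgroups, weights))
    jbar = mean(jack)
    den = 6 * sum((jbar - t) ** 2 for t in jack) ** 1.5
    a = sum((jbar - t) ** 3 for t in jack) / den if den else 0.0
    # 4. BCa-adjusted percentile endpoints.
    def endpoint(z):
        q = nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        return boots[min(int(q * n_boot), n_boot - 1)]
    return endpoint(nd.inv_cdf(alpha / 2)), endpoint(nd.inv_cdf(1 - alpha / 2))
```

For example, with three groups and weights (1, −0.5, −0.5), the contrast compares the first group’s mean against the average of the other two, in pooled-SD units.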


References

  • Algina, J., Keselman, H. J., & Penfield, R. D. (2005a). An alternative to Cohen’s standardized mean difference effect size: A robust parameter and confidence interval in the two independent groups case. Psychological Methods, 10, 317–328. doi:10.1037/1082-989X.10.3.31

  • Algina, J., Keselman, H. J., & Penfield, R. D. (2005b). Effect sizes and their intervals: The two-level repeated measures case. Educational and Psychological Measurement, 65, 241–258. doi:10.1177/0013164404268675

  • Barker, N. (2005). A practical introduction to the bootstrap using the SAS system. Proceedings of SAS conference: Phuse. Retrieved from http://www.lexjansen.com/phuse/2005/pk/pk02.pdf

  • Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational and Psychological Measurement, 62, 197–226. doi:10.1177/0013164402062002001

  • Bonett, D. G. (2008). Confidence intervals for standardized linear contrasts of means. Psychological Methods, 13, 99–109. doi:10.1037/1082-989X.13.2.99

  • Bonett, D. G., & Price, R. M. (2002). Statistical inference for a linear function of medians: Confidence intervals, hypothesis testing, and sample size requirements. Psychological Methods, 7, 370–383. doi:10.1037/1082-989x.7.3.370

  • Bradley, J. V. (1978). Robustness? British Journal of Mathematical & Statistical Psychology, 31, 144–152. doi:10.1111/j.2044-8317.1978.tb00581.x

  • Chen, L.-T., & Peng, C.-Y. J. (2013). Constructing confidence intervals for effect sizes in ANOVA designs. Journal of Modern Applied Statistical Methods, 12, 82–104.

  • Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114, 494–509. doi:10.1037/0033-2909.114.3.494

  • Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Erlbaum.

  • Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574. doi:10.1177/0013164401614002

  • Deng, N., Allison, J. J., Fang, H. J., Ash, A. S., & Ware, J. E. (2013). Using the bootstrap to establish statistical significance for relative validity comparisons among patient-reported outcome measures. Health and Quality of Life Outcomes, 11, 89. doi:10.1186/1477-7525-11-89

  • Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

  • Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43, 521–532.

  • Harwell, M. (1997). An empirical study of Hedges’s homogeneity test. Psychological Methods, 2, 219–231. doi:10.1037//1082-989x.2.2.219

  • Harwell, M. R., Rubinstein, E. N., Hayes, W. S., & Olds, C. C. (1992). Summarizing Monte Carlo results in methodological research: The one- and two-factor fixed effects ANOVA cases. Journal of Educational and Behavioral Statistics, 17, 315–339. doi:10.3102/10769986017004315

  • Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego: Academic Press.

  • Hess, M. R., & Kromrey, J. D. (2004). Robust confidence intervals for effect sizes: A comparative study of Cohen’s d and Cliff’s delta under non-normality and heterogeneous variance. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

  • Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals around the standardized mean difference: Bootstrap and parametric confidence intervals. Educational and Psychological Measurement, 65, 51–69. doi:10.1177/0013164404264850

  • Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129. doi:10.1037/1082-989x.13.2.110

  • Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., Kowalchuk, R. K., Lowman, L. L., Petoskey, M. D., Keselman, J. C., & Levin, J. R. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386. doi:10.3102/00346543068003350

  • Kirk, R. E. (2013). Experimental design: Procedures for the behavioral sciences (4th ed.). Thousand Oaks: Sage.

  • Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.

  • Kratochwill, T. R., & Levin, J. R. (1992). Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Erlbaum.

  • Odgaard, E. C., & Fowler, R. L. (2010). Confidence intervals for effect sizes: Compliance and clinical significance in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 78, 287–297. doi:10.1037/a0019294

  • Peng, C.-Y. J., & Chen, L.-T. (2014). Beyond Cohen's d: Alternative effect size measures for between-subject designs. The Journal of Experimental Education, 82, 22–50. doi:10.1080/00220973.2012.745471

  • Peng, C.-Y. J., Chen, L.-T., Chiang, H.-M., & Chiang, Y.-C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review, 25, 157–209. doi:10.1007/s10648-013-9218-2

  • Ramsey, P. H., Barrera, K., Hachimine-Semprebom, P., & Liu, C.-C. (2011). Pairwise comparisons of means under realistic nonnormality, unequal variances, outliers and equal sample sizes. Journal of Statistical Computation and Simulation, 81, 125–135. doi:10.1080/00949650903219935

  • Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks: Sage.

  • Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182. doi:10.1037/1082-989X.9.2.164

  • Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum.

  • Stuart, A., & Ord, J. K. (1994). Kendall’s advanced theory of statistics (Vol. 1, 6th ed.). London: Arnold.

  • Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32. doi:10.3102/0013189X031003025

  • Thompson, B. (2008). Computing and interpreting effect sizes, confidence intervals, and confidence intervals for effect sizes. In J. W. Osborne (Ed.), Best Practices in Quantitative Methods (pp. 246–262). Thousand Oaks: Sage.

  • Viechtbauer, W. (2007). Approximate confidence intervals for standardized effect sizes in the two-independent and two-dependent samples design. Journal of Educational and Behavioral Statistics, 32, 39–60. doi:10.3102/1076998606298034

  • Wilcox, R. R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Academic Press.

Acknowledgments

This research was supported in part by the Maris M. Proffitt and Mary Higgins Proffitt Endowment Grant of Indiana University, awarded to the second author while the first author worked on the project as a research assistant. We thank the editor, two reviewers, and Po-Ju Wu for their insightful comments on an earlier version of the manuscript.

Author information

Corresponding author

Correspondence to Li-Ting Chen.

About this article

Cite this article

Chen, L.-T., & Peng, C.-Y. J. (2015). The sensitivity of three methods to nonnormality and unequal variances in interval estimation of effect sizes. Behavior Research Methods, 47, 107–126. https://doi.org/10.3758/s13428-014-0461-3

