Abstract
Confidence interval (CI) estimation for an effect size (ES) provides a range of possible population ESs supported by data. In this article, we investigated the noncentral t method, Bonett’s method, and the bias-corrected and accelerated (BCa) bootstrap method for constructing CIs when a standardized linear contrast of means is defined as an ES. The noncentral t method assumes normality and equal variances, Bonett’s method assumes only normality, and the BCa bootstrap method makes no assumptions. We simulated data for three and four groups from a variety of populations (one normal and five nonnormal) with varied variance ratios (1, 2.25, 4, 8), population ESs (0, 0.2, 0.5, 0.8), and sample size patterns (one equal and two unequal). Results showed that the noncentral method performed the best among the three methods under the joint condition of ES = 0 and equal variances. Performance of the noncentral method was comparable to that of the other two methods under (1) equal sample size, unequal weight for each group, and the last group sampled from a leptokurtic distribution, or (2) equal sample size and equal weight for all groups, when all are sampled from a normal population, or only the last group sampled from a nonnormal distribution. In the remaining conditions, Bonett’s and the BCa bootstrap methods performed better than the noncentral method. The BCa bootstrap method is the method of choice when the sample size per group is 30 or more. Findings from this study have implications for simultaneous comparisons of means and of ranked means in between- and within-subjects designs.
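The two quantities at the center of the abstract can be sketched concretely. What follows is a rough illustration, not the authors’ code: the standardized linear contrast ES divides a weighted combination of group means by the square root of the unweighted average of the within-group variances (the standardizer Bonett, 2008, uses), and a BCa bootstrap interval adjusts the percentile endpoints with a bias-correction term z0 and a jackknife-based acceleration term a. The function names, the choice to resample within each group independently, and the leave-one-out jackknife over all observations are assumptions of this sketch.

```python
import numpy as np
from statistics import NormalDist


def std_contrast(groups, weights):
    """Standardized linear contrast: sum(c_j * mean_j) / sigma_hat,
    where sigma_hat^2 is the unweighted average of the group variances
    (the standardizer used by Bonett, 2008)."""
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    return np.asarray(weights) @ means / np.sqrt(variances.mean())


def bca_ci(groups, weights, level=0.95, n_boot=2000, rng=None):
    """BCa bootstrap CI for the standardized contrast (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    theta_hat = std_contrast(groups, weights)

    # Bootstrap distribution: resample within each group independently.
    boot = np.empty(n_boot)
    for b in range(n_boot):
        resampled = [rng.choice(g, size=len(g), replace=True) for g in groups]
        boot[b] = std_contrast(resampled, weights)

    nd = NormalDist()
    # Bias correction z0: based on the share of bootstrap values below theta_hat.
    prop = np.clip(np.mean(boot < theta_hat), 1 / n_boot, 1 - 1 / n_boot)
    z0 = nd.inv_cdf(prop)

    # Acceleration a: leave-one-out jackknife over every observation.
    jack = []
    for j, g in enumerate(groups):
        for i in range(len(g)):
            loo = [np.delete(g, i) if k == j else groups[k]
                   for k in range(len(groups))]
            jack.append(std_contrast(loo, weights))
    jack = np.array(jack)
    d = jack.mean() - jack
    a = (d ** 3).sum() / (6 * (d ** 2).sum() ** 1.5)

    # BCa-adjusted percentile endpoints.
    z_lo = nd.inv_cdf((1 - level) / 2)
    z_hi = -z_lo

    def adjusted(z):
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    lo, hi = np.quantile(boot, [adjusted(z_lo), adjusted(z_hi)])
    return lo, hi
```

For example, with three groups and contrast weights (1, −0.5, −0.5), `std_contrast` returns the mean of group 1 minus the average of the other two means, in standardizer units; `bca_ci` then brackets that estimate without assuming normality or equal variances, which is the property the simulations above probe.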
References
Algina, J., Keselman, H. J., & Penfield, R. D. (2005a). An alternative to Cohen’s standardized mean difference effect size: A robust parameter and confidence interval in the two independent groups case. Psychological Methods, 10, 317–328. doi:10.1037/1082-989X.10.3.317
Algina, J., Keselman, H. J., & Penfield, R. D. (2005b). Effect sizes and their intervals: The two-level repeated measures case. Educational and Psychological Measurement, 65, 241–258. doi:10.1177/0013164404268675
Barker, N. (2005). A practical introduction to the bootstrap using the SAS system. Proceedings of SAS conference: Phuse. Retrieved from http://www.lexjansen.com/phuse/2005/pk/pk02.pdf
Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational and Psychological Measurement, 62, 197–226. doi:10.1177/0013164402062002001
Bonett, D. G. (2008). Confidence intervals for standardized linear contrasts of means. Psychological Methods, 13, 99–109. doi:10.1037/1082-989X.13.2.99
Bonett, D. G., & Price, R. M. (2002). Statistical inference for a linear function of medians: Confidence intervals, hypothesis testing, and sample size requirements. Psychological Methods, 7, 370–383. doi:10.1037/1082-989x.7.3.370
Bradley, J. V. (1978). Robustness? British Journal of Mathematical & Statistical Psychology, 31, 144–152. doi:10.1111/j.2044-8317.1978.tb00581.x
Chen, L.-T., & Peng, C.-Y. J. (2013). Constructing confidence intervals for effect sizes in ANOVA designs. Journal of Modern Applied Statistical Methods, 12, 82–104.
Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114, 494–509. doi:10.1037/0033-2909.114.3.494
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Erlbaum.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574. doi:10.1177/0013164401614002
Deng, N., Allison, J. J., Fang, H. J., Ash, A. S., & Ware, J. E. (2013). Using the bootstrap to establish statistical significance for relative validity comparisons among patient-reported outcome measures. Health and Quality of Life Outcomes, 11, 89. doi:10.1186/1477-7525-11-89
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43, 521–532.
Harwell, M. (1997). An empirical study of Hedges’s homogeneity test. Psychological Methods, 2, 219–231. doi:10.1037//1082-989x.2.2.219
Harwell, M. R., Rubinstein, E. N., Hayes, W. S., & Olds, C. C. (1992). Summarizing Monte Carlo results in methodological research: The one- and two-factor fixed effects ANOVA cases. Journal of Educational and Behavioral Statistics, 17, 315–339. doi:10.3102/10769986017004315
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego: Academic Press.
Hess, M. R., & Kromrey, J. D. (2004). Robust confidence intervals for effect sizes: A comparative study of Cohen’s d and Cliff’s delta under non-normality and heterogeneous variance. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals around the standardized mean difference: Bootstrap and parametric confidence intervals. Educational and Psychological Measurement, 65, 51–69. doi:10.1177/0013164404264850
Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129. doi:10.1037/1082-989x.13.2.110
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., Kowalchuk, R. K., Lowman, L. L., Petoskey, M. D., Keselman, J. C., & Levin, J. R. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386. doi:10.3102/00346543068003350
Kirk, R. E. (2013). Experimental design: Procedures for the behavioral sciences (4th ed.). Thousand Oaks: Sage.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
Kratochwill, T. R., & Levin, J. R. (1992). Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Erlbaum.
Odgaard, E. C., & Fowler, R. L. (2010). Confidence intervals for effect sizes: Compliance and clinical significance in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 78, 287–297. doi:10.1037/a0019294
Peng, C.-Y. J., & Chen, L.-T. (2014). Beyond Cohen's d: Alternative effect size measures for between-subject designs. The Journal of Experimental Education, 82, 22–50. doi:10.1080/00220973.2012.745471
Peng, C.-Y. J., Chen, L.-T., Chiang, H.-M., & Chiang, Y.-C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review, 25, 157–209. doi:10.1007/s10648-013-9218-2
Ramsey, P. H., Barrera, K., Hachimine-Semprebom, P., & Liu, C.-C. (2011). Pairwise comparisons of means under realistic nonnormality, unequal variances, outliers and equal sample sizes. Journal of Statistical Computation and Simulation, 81, 125–135. doi:10.1080/00949650903219935
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks: Sage.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182. doi:10.1037/1082-989X.9.2.164
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum.
Stuart, A., & Ord, J. K. (1994). Kendall’s advanced theory of statistics (Vol. I, 6th ed.). London: Arnold.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32. doi:10.3102/0013189X031003025
Thompson, B. (2008). Computing and interpreting effect sizes, confidence intervals, and confidence intervals for effect sizes. In J. W. Osborne (Ed.), Best Practices in Quantitative Methods (pp. 246–262). Thousand Oaks: Sage.
Viechtbauer, W. (2007). Approximate confidence intervals for standardized effect sizes in the two-independent and two-dependent samples design. Journal of Educational and Behavioral Statistics, 32, 39–60. doi:10.3102/1076998606298034
Wilcox, R. R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Academic Press.
Acknowledgments
This research was supported in part by the Maris M. Proffitt and Mary Higgins Proffitt Endowment Grant of Indiana University, awarded to the second author while the first author worked on the project as a research assistant. We thank the editor, two reviewers, and Po-Ju Wu for their insightful comments on an earlier version of the manuscript.
Cite this article
Chen, L.-T., & Peng, C.-Y. J. The sensitivity of three methods to nonnormality and unequal variances in interval estimation of effect sizes. Behavior Research Methods, 47, 107–126 (2015). https://doi.org/10.3758/s13428-014-0461-3