Abstract
Effect size is a concept that can be especially useful in bioequivalence studies and in studies designed to find important, and not just statistically significant, differences among responses to treatments based on independent random samples. We develop and explore a new effect size, related to a maximal superiority ordering, for assessing the separation among two or more normal distributions that may have different means and different variances. Confidence intervals and hypothesis tests for this effect size are developed using a p-value obtained by averaging over a distribution on the variances. Since there is almost always some difference among treatments, researchers should consider replacing the usual hypothesis test of exactly no effect with a test that an appropriate effect size has at least, or at most, some meaningful magnitude, when one is available, possibly established using the framework developed here. A simulation study of type I error rate, power, and interval length is presented. R code for constructing the confidence intervals and carrying out the tests can be downloaded from the author’s website.
Cite this article
Ling, Y., Nelson, P.I. Effect size for comparing two or more normal distributions based on maximal contrasts in outcomes. Stat Methods Appl 23, 381–399 (2014). https://doi.org/10.1007/s10260-014-0254-y
DOI: https://doi.org/10.1007/s10260-014-0254-y