Effect size for comparing two or more normal distributions based on maximal contrasts in outcomes

Statistical Methods & Applications

Abstract

Effect size is especially useful in bioequivalence studies and in studies designed to find important, and not just statistically significant, differences among responses to treatments based on independent random samples. We develop and explore a new effect size, related to a maximal superiority ordering, for assessing the separation among two or more normal distributions that may have different means and different variances. Confidence intervals and hypothesis tests for this effect size are developed using a p value obtained by averaging over a distribution on the variances. Because there is almost always some difference among treatments, researchers should consider replacing the usual test of exactly no effect with a test that an appropriate effect size has at least, or at most, some meaningful magnitude, when such a magnitude is available, possibly established using the framework developed here. A simulation study of type I error rate, power and interval length is presented. R code for constructing the confidence intervals and carrying out the tests can be downloaded from the author's website.
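
To make the idea of separation between normal distributions with unequal variances concrete, the following is a minimal Python sketch of a related and well-known quantity, the probability of superiority P(X > Y) for two independent normal variables. It is not the effect size developed in the article, whose definition is not given in this abstract. The function names and the pairwise maximum over groups are assumptions introduced purely for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_superiority(mu_x, sd_x, mu_y, sd_y):
    """P(X > Y) for independent X ~ N(mu_x, sd_x^2) and Y ~ N(mu_y, sd_y^2).

    X - Y is normal with mean mu_x - mu_y and variance sd_x^2 + sd_y^2,
    so the probability is Phi((mu_x - mu_y) / sqrt(sd_x^2 + sd_y^2)).
    """
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return normal_cdf((mu_x - mu_y) / math.sqrt(sd_x ** 2 + sd_y ** 2))


def max_pairwise_superiority(groups):
    """Largest P(X_i > X_j) over all ordered pairs of distinct groups.

    `groups` is a list of (mean, sd) tuples. This pairwise maximum is a
    hypothetical illustration only, not the article's maximal-contrast
    effect size.
    """
    best_value, best_pair = -1.0, None
    for i, (mu_i, sd_i) in enumerate(groups):
        for j, (mu_j, sd_j) in enumerate(groups):
            if i == j:
                continue
            p = prob_superiority(mu_i, sd_i, mu_j, sd_j)
            if p > best_value:
                best_value, best_pair = p, (i, j)
    return best_value, best_pair


if __name__ == "__main__":
    # Three hypothetical treatment groups given as (mean, sd).
    groups = [(0.0, 1.0), (1.0, 1.0), (3.0, 2.0)]
    value, pair = max_pairwise_superiority(groups)
    print(f"largest pairwise P(X_i > X_j) = {value:.4f} for pair {pair}")
```

A value of 0.5 indicates no separation between a pair of groups, and values near 1 indicate that one group almost always yields the larger outcome. The sketch uses known parameters; inference from samples, as in the article, would additionally need to account for the unknown variances.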

Author information

Correspondence to Yan Ling.

About this article

Cite this article

Ling, Y., Nelson, P.I. Effect size for comparing two or more normal distributions based on maximal contrasts in outcomes. Stat Methods Appl 23, 381–399 (2014). https://doi.org/10.1007/s10260-014-0254-y

  • DOI: https://doi.org/10.1007/s10260-014-0254-y
