Abstract
As a generalization of the standardized mean difference between two independent populations, two effect size measures have been proposed to represent the degree of disparity among several treatment groups. One index is the standard deviation of the standardized means; the second is the range of the standardized means. Despite the evident utility of the two measures, the associated test procedures for detecting a minimal important difference among standardized means have not been well explicated. This article reviews and compares the two approaches to testing the hypothesis that treatments have negligible effects, rather than the conventional hypothesis of no difference. The primary emphasis is on revealing the underlying properties of the two methods with regard to power behavior and sample size requirements across a variety of design configurations. To enhance practical usefulness, a complete set of computer algorithms for calculating critical values, p-values, power levels, and sample sizes is also developed.
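The two indices described above can be sketched numerically. The function below is an illustrative assumption, not the author's implementation: it standardizes a set of group means by a common standard deviation, then returns the SD-based index (a Cohen's f-type measure) and the range-based index. The function name and example values are hypothetical.

```python
import statistics

def standardized_effect_sizes(means, sigma):
    """Two generalized effect-size indices for k group means.

    f_index: population standard deviation of the standardized means
             (SD-based index, in the spirit of Cohen's f).
    d_range: range of the standardized means, i.e., the maximum
             standardized difference between any two group means.
    """
    grand_mean = sum(means) / len(means)
    std_means = [(m - grand_mean) / sigma for m in means]
    f_index = statistics.pstdev(std_means)           # SD of standardized means
    d_range = max(std_means) - min(std_means)        # range of standardized means
    return f_index, d_range

# Example: three groups with means 50, 55, 62 and common SD 10
f_index, d_range = standardized_effect_sizes([50, 55, 62], 10)
```

For these example values the range-based index equals the largest pairwise standardized difference, (62 − 50)/10 = 1.2, while the SD-based index is smaller, as it averages dispersion across all groups rather than tracking only the extremes.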
Ethics declarations
Funding
The author has no support or funding to report.
Conflict of Interest
The author declares that he has no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Shieh, G. On Detecting a Minimal Important Difference among Standardized Means. Curr Psychol 37, 640–647 (2018). https://doi.org/10.1007/s12144-016-9549-5