Abstract
Educators often seek to demonstrate the equivalence of groups, such as whether students achieve comparable success regardless of the site at which they trained. An often underappreciated methodological consideration is how to operationalize equivalence. This study examined whether a distribution-based approach, grounded in effect size, can identify an appropriate equivalence threshold for medical education data. Thirty-nine individuals rated program site equivalence on a series of simulated pairwise bar graphs representing one of four measures with which they had prior experience: (1) undergraduate academic achievement, (2) a student experience survey, (3) an Objective Structured Clinical Exam global rating scale, or (4) a licensing exam. Descriptive statistics and repeated measures ANOVA examined the effects on equivalence ratings of (a) the difference between means, (b) variability in scores, and (c) which program site (the larger or smaller) scored higher. The equivalence threshold was defined as the point at which 50% of participants rated the sites as non-equivalent. Across the four measures, the equivalence thresholds converged on an average effect size of Cohen's d = 0.57 (range of 0.50–0.63). This corresponded to an average mean difference of 10% (range of 3–13%). These results are discussed in reference to findings from the health-related quality of life literature, which has demonstrated that d = 0.50 represents a consistent threshold for perceived change. This study provides preliminary, empirically based guidance for defining an equivalence threshold for researchers and evaluators conducting equivalence tests.
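As a rough illustration of how a distribution-based threshold of this kind could be applied in practice, the sketch below computes Cohen's d from two sites' summary statistics and flags the pair as non-equivalent when |d| exceeds 0.5, the lower end of the range reported above. The function name, the 0–100 score scale, and all numeric values are hypothetical, introduced only for illustration; this is a sketch of the general idea, not the study's procedure.

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical summary statistics for two program sites on a 0-100 exam scale.
d = cohens_d(mean_a=78.0, mean_b=72.0, sd_a=12.0, sd_b=11.0, n_a=120, n_b=40)

# Distribution-based rule suggested by the findings above: treat the sites as
# equivalent only when |d| falls below roughly 0.5.
THRESHOLD_D = 0.5
print(f"d = {d:.2f}; sites equivalent: {abs(d) < THRESHOLD_D}")
```

With these made-up numbers, a 6-point gap against a pooled standard deviation near 12 works out to d ≈ 0.51, so the pair would be flagged as non-equivalent under a d = 0.50 threshold.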
Acknowledgments
The authors wish to thank Dr. Chris Lovato for her helpful comments on the manuscript. This study was supported by the Evaluation Studies Unit, Faculty of Medicine, University of British Columbia.
Cite this article
Rusticus, S.A., Eva, K.W. Defining equivalence in medical education evaluation and research: does a distribution-based approach work? Adv in Health Sci Educ 21, 359–373 (2016). https://doi.org/10.1007/s10459-015-9633-x