Does scale length matter? A comparison of nine- versus five-point rating scales for the mini-CEX
Rent the article at a discountRent now
* Final gross prices may vary according to local VAT.Get Access
Educators must often decide how many points to use in a rating scale. No studies have compared interrater reliability for different-length scales, and few have evaluated accuracy. This study sought to evaluate the interrater reliability and accuracy of mini-clinical evaluation exercise (mini-CEX) scores, comparing the traditional mini-CEX nine-point scale to a five-point scale. Methods: The authors conducted a validity study in an academic internal medicine residency program. Fifty-two program faculty participated. Participants rated videotaped resident-patient encounters using the mini-CEX with both a nine-point scale and a five-point scale. Some cases were scripted to reflect a specific level of competence (unsatisfactory, satisfactory, superior). Outcome measures included mini-CEX scores, accuracy (scores compared to scripted competence level), interrater reliability, and domain intercorrelation. Results: Interviewing, exam, counseling, and overall ratings varied significantly across levels of competence (P < .0001). Nine-point scale scores accurately classified competence more often (391/720 [54%] for overall ratings) than five-point scores (316/723 [44%], P < .0001). Interrater reliability was similar for scores from the nine- and five-point scales (0.43 and 0.40, respectively, for overall ratings). With the exception of correlation between exam and counseling scores using the five-point scale (r = 0.38, P = .13), score correlations among all domain combinations were high (r = 0.46–0.89) and statistically significant (P ≤ .015) for both scales. Conclusions: Mini-CEX scores demonstrated modest interrater reliability and accuracy. Although interrater reliability is similar for nine- and five-point scales, nine-point scales appear to provide more accurate scores. This has implications for many educational assessments.
- Beckman, T. J., Ghosh, A. K., Cook, D. A., Erwin, P. J., & Mandrekar, J. N. (2004). How reliable are assessments of clinical teaching? A review of the published instruments. Journal of General Internal Medicine, 19, 971–977. doi:10.1111/j.1525-1497.2004.40066.x. CrossRef
- Brennan, R. L. (2001). Generalizability theory. New York: Springer.
- Cook, D. A., Dupras, D. M., Beckman, T. J., Thomas, K. G., & Pankratz, V. S. (2008). Effect of rater training on reliability and accuracy of mini-CEX scores: A randomized, controlled trial. Journal of General Internal Medicine (in press). doi:10.1007/s11606-008-0842-3.
- Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619. doi:10.1177/001316447303300309. CrossRef
- Hancock, G. R., & Klockars, A. J. (1991). The effect of scale manipulations on validity: Targetting frequency rating scales for anticipated performance levels. Applied Ergonomics, 22, 147–154. doi:10.1016/0003-6870(91)90153-9. CrossRef
- Harvill, L. M. (1991). NCME instructional module: Standard error of measurement. Educational Measurement: Issues and Practice, 10(2), 33–41. doi:10.1111/j.1745-3992.1991.tb00195.x. CrossRef
- Holmboe, E. S., Hawkins, R. E., & Huot, S. J. (2004). Effects of training in direct observation of medical residents’ clinical competence: A randomized trial. Annals of Internal Medicine, 140, 874–881.
- Holmboe, E. S., Huot, S., Chung, J., Norcini, J., & Hawkins, R. E. (2003). Construct validity of the mini-clinical evaluation exercise (mini-CEX). Academic Medicine, 78, 826–830. doi:10.1097/00001888-200308000-00018. CrossRef
- Jenkins, G. D., & Taber, T. D. (1977). A Monte Carlo study of factors affecting three indices of composite scale reliability. The Journal of Applied Psychology, 62(4), 392–398. doi:10.1037/0021-9010.62.4.392. CrossRef
- Kogan, J. R., Bellini, L. M., & Shea, J. A. (2003). Feasibility, reliability, and validity of the mini-clinical evaluation exercise (mCEX) in a medicine core clerkship. Academic Medicine, 78(10, Suppl), S33–S35. doi:10.1097/00001888-200310001-00011. CrossRef
- Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174. doi:10.2307/2529310. CrossRef
- Margolis, M. J., Clauser, B. E., Cuddy, M. M., Ciccone, A., Mee, J., Harik, P., et al. (2006). Use of the mini-clinical evaluation exercise to rate examinee performance on a multiple-station clinical skills examination: A validity study. Academic Medicine, 81(10, Suppl), S56–S60. doi:10.1097/01.ACM.0000236514.53194.f4. CrossRef
- Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Study I: Reliability and validity. Educational and Psychological Measurement, 31, 657–674. doi:10.1177/001316447103100307. CrossRef
- Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. doi:10.1037/h0043158. CrossRef
- Nishisato, S., & Torii, Y. (1970). Effects of categorizing continuous normal variables on product-moment correlation. Japanese Psychological Research, 13, 45–49.
- Norcini, J. J., Blank, L. L., Arnold, G. K., & Kimball, H. R. (1995). The mini-CEX (clinical evaluation exercise): A preliminary investigation. Annals of Internal Medicine, 123, 795–799.
- Norcini, J. J., Blank, L. L., Duffy, F. D., & Fortna, G. S. (2003). The mini-CEX: A method for assessing clinical skills. Annals of Internal Medicine, 138, 476–481.
- Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1–15. doi:10.1016/S0001-6918(99)00050-5. CrossRef
- Streiner, D. L., & Norman, G. R. (2003). Health measurement scales: A practical guide to their development and use (3rd ed.). New York: Oxford University Press.
- Weng, L.-J. (2004). Impact of the number of response categories and anchor labels on coefficient alpha and test–retest reliability. Educational and Psychological Measurement, 64, 956–972. doi:10.1177/0013164404268674. CrossRef
- Does scale length matter? A comparison of nine- versus five-point rating scales for the mini-CEX
Advances in Health Sciences Education
Volume 14, Issue 5 , pp 655-664
- Cover Date
- Print ISSN
- Online ISSN
- Springer Netherlands
- Additional Links
- Medical education
- Educational measurement
- Clinical competence
- Reproducibility of results
- Interrater reliability
- Industry Sectors