Does scale length matter? A comparison of nine- versus five-point rating scales for the mini-CEX

Cook, David A.; Beckman, Thomas J.

doi:10.1007/s10459-008-9147-x

Published: 26 November 2008

Advances in Health Sciences Education, volume 14, pages 655–664 (2009)

Abstract

Educators must often decide how many points to use in a rating scale. No studies have compared interrater reliability for different-length scales, and few have evaluated accuracy. This study sought to evaluate the interrater reliability and accuracy of mini-clinical evaluation exercise (mini-CEX) scores, comparing the traditional mini-CEX nine-point scale to a five-point scale.

Methods: The authors conducted a validity study in an academic internal medicine residency program. Fifty-two program faculty participated. Participants rated videotaped resident-patient encounters using the mini-CEX with both a nine-point scale and a five-point scale. Some cases were scripted to reflect a specific level of competence (unsatisfactory, satisfactory, superior). Outcome measures included mini-CEX scores, accuracy (scores compared to scripted competence level), interrater reliability, and domain intercorrelation.

Results: Interviewing, exam, counseling, and overall ratings varied significantly across levels of competence (P < .0001). Nine-point scale scores accurately classified competence more often (391/720 [54%] for overall ratings) than five-point scores (316/723 [44%], P < .0001). Interrater reliability was similar for scores from the nine- and five-point scales (0.43 and 0.40, respectively, for overall ratings). With the exception of correlation between exam and counseling scores using the five-point scale (r = 0.38, P = .13), score correlations among all domain combinations were high (r = 0.46–0.89) and statistically significant (P ≤ .015) for both scales.

Conclusions: Mini-CEX scores demonstrated modest interrater reliability and accuracy. Although interrater reliability is similar for nine- and five-point scales, nine-point scales appear to provide more accurate scores. This has implications for many educational assessments.
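The accuracy comparison reported in the abstract (391/720 versus 316/723 correct classifications for overall ratings) can be roughly checked with a short calculation. The sketch below uses a pooled two-proportion z-test, which is an assumption made here for illustration: the abstract does not state which test the authors used, and because the same 52 faculty contributed multiple ratings, the observations are not independent as this simple test presumes. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    Returns (p1, p2, z, two_sided_p). Treats every rating as an
    independent observation, which is a simplifying assumption.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis of equal accuracy
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution
    two_sided_p = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, two_sided_p


# Counts taken from the abstract (overall ratings)
p_nine, p_five, z, p_value = two_proportion_z_test(391, 720, 316, 723)

print(f"nine-point accuracy: {p_nine:.1%}")
print(f"five-point accuracy: {p_five:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

Under these assumptions the naive test points in the same direction as the reported result; an analysis that accounts for repeated ratings by the same raters could yield a different P value.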

Acknowledgments

The authors thank K. G. Thomas and D. M. Dupras for assistance in study planning and execution, F. Enders for assistance in statistical planning, and E. S. Holmboe for use of scripted cases. Funding was provided by the Mayo Education Innovation Program. A paper based on this study was presented at the 2008 meeting of the American Educational Research Association in New York.

Author information

Authors and Affiliations

Division of General Internal Medicine and Office of Education Research, Mayo Clinic College of Medicine, Baldwin 4-A, 200 First Street SW, Rochester, MN, 55905, USA
David A. Cook & Thomas J. Beckman

Corresponding author

Correspondence to David A. Cook.

About this article

Cite this article

Cook, D.A., Beckman, T.J. Does scale length matter? A comparison of nine- versus five-point rating scales for the mini-CEX. Adv in Health Sci Educ 14, 655–664 (2009). https://doi.org/10.1007/s10459-008-9147-x

Received: 19 March 2008
Accepted: 10 November 2008
Published: 26 November 2008
Issue Date: December 2009
DOI: https://doi.org/10.1007/s10459-008-9147-x
