Effect of Rater Training on Reliability and Accuracy of Mini-CEX Scores: A Randomized, Controlled Trial

  • David A. Cook
  • Denise M. Dupras
  • Thomas J. Beckman
  • Kris G. Thomas
  • V. Shane Pankratz
Original Article



Mini-CEX scores assess resident competence. Rater training might improve mini-CEX score interrater reliability, but evidence is lacking.


Evaluate a rater training workshop using interrater reliability and accuracy.


Randomized trial (immediate versus delayed workshop) and single-group pre/post study (randomized groups combined).


Academic medical center.


Fifty-two internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees).


The workshop included rater error training, performance dimension training, behavioral observation training, and frame of reference training using lecture, video, and facilitated discussion. The delayed group received no intervention until after the posttest.


Mini-CEX ratings of videotaped resident–patient encounters at baseline (just before the workshop for the workshop group) and four weeks later; mini-CEX ratings of live resident–patient encounters during the year preceding and the year following the workshop; rater confidence in using the mini-CEX.


Among 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6–5.2], workshop 4.8 [4.5–5.1]) and follow-up (delayed 5.4 [5.0–5.7], workshop 5.3 [5.0–5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18), but the standard error of measurement was similar for both periods.
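
The last result, a higher ICC with an unchanged standard error of measurement (SEM), can be illustrated with a small sketch. It assumes a one-way random-effects ICC (the abstract does not state which ICC form was used) and uses invented ratings, not study data; the function name `icc_oneway` is hypothetical. Both example tables have the same rater disagreement within each encounter, but the second has a wider spread of true performance between encounters, so its ICC is higher while its SEM is identical.

```python
from math import sqrt


def icc_oneway(ratings):
    """Return (ICC, SEM) for a table with one row per encounter, one column per rater.

    Assumption: one-way random-effects ICC for a single rating; the
    abstract does not say which ICC form the authors computed.
    """
    n = len(ratings)       # number of rated encounters
    k = len(ratings[0])    # number of raters per encounter
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Mean square between encounters and mean square within encounters
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    sem = sqrt(ms_within)  # rater error expressed in rating-scale units
    return icc, sem


# Invented ratings (not study data): two raters, four encounters each.
narrow = [[4, 5], [5, 6], [4, 5], [5, 6]]  # encounters differ little
wide = [[2, 3], [6, 7], [3, 4], [7, 8]]    # encounters differ a lot

icc_narrow, sem_narrow = icc_oneway(narrow)
icc_wide, sem_wide = icc_oneway(wide)
print(f"narrow: ICC={icc_narrow:.2f}, SEM={sem_narrow:.2f}")
print(f"wide:   ICC={icc_wide:.2f}, SEM={sem_wide:.2f}")
```

Under these assumptions the ICC depends on how much the rated encounters differ from one another, whereas the SEM reflects rater disagreement alone, which is one way a reliability coefficient can rise without raters agreeing any more closely.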


Rater training did not improve interrater reliability or accuracy of mini-CEX scores.

Clinical trials registration

clinicaltrials.gov identifier NCT00667940


medical education; faculty development; rater training; clinical competence; assessment; randomized trial

Supplementary material

11606_2008_842_MOESM1_ESM.doc (128 kb)
Appendix: Cook et al, CEX Rater Training (DOC 129 kb)



Copyright information

© Society of General Internal Medicine 2008

Authors and Affiliations

  • David A. Cook (1, 2)
  • Denise M. Dupras (3)
  • Thomas J. Beckman (1, 2)
  • Kris G. Thomas (3)
  • V. Shane Pankratz (4)
  1. Office of Education Research, College of Medicine, Mayo Clinic, Rochester, USA
  2. Division of General Internal Medicine, College of Medicine, Mayo Clinic, Rochester, USA
  3. Division of Primary Care Internal Medicine, College of Medicine, Mayo Clinic, Rochester, USA
  4. Division of Biostatistics, College of Medicine, Mayo Clinic, Rochester, USA
