Effect of Rater Training on Reliability and Accuracy of Mini-CEX Scores: A Randomized, Controlled Trial
Background
Mini-clinical evaluation exercise (mini-CEX) scores assess resident competence. Rater training might improve the interrater reliability of mini-CEX scores, but evidence is lacking.
Objective
To evaluate a rater training workshop using interrater reliability and accuracy as outcomes.
Design
Randomized trial (immediate versus delayed workshop) and single-group pre/post study (randomized groups combined).
Setting
Academic medical center.
Participants
Fifty-two internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees).
Intervention
The workshop included rater error training, performance dimension training, behavioral observation training, and frame of reference training, delivered through lecture, video, and facilitated discussion. The delayed group received no intervention until after the posttest.
Measurements
Mini-CEX ratings at baseline (just before the workshop for the workshop group) and four weeks later, using videotaped resident–patient encounters; mini-CEX ratings of live resident–patient encounters during the year preceding and the year following the workshop; rater confidence in using the mini-CEX.
Results
Among 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6–5.2], workshop 4.8 [4.5–5.1]) and follow-up (delayed 5.4 [5.0–5.7], workshop 5.3 [5.0–5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18), but the standard error of measurement was similar for both periods.
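To illustrate how an ICC can rise while the standard error of measurement (SEM) stays about the same, the following is a minimal sketch and not the authors' analysis. It assumes a one-way random-effects, single-rater ICC and the classical relation SEM = SD × sqrt(1 − reliability); the abstract does not state which ICC form or SEM formula was used. The function names and the demonstration ratings are hypothetical.

```python
from math import sqrt
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects, single-rater ICC (an assumed form).

    ratings: list of rows, one row per rated encounter,
    each row holding the scores from k raters.
    """
    n = len(ratings)       # number of encounters
    k = len(ratings[0])    # raters per encounter
    grand = mean(score for row in ratings for score in row)
    row_means = [mean(row) for row in ratings]
    # Between-encounter and within-encounter mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (score - m) ** 2 for row, m in zip(ratings, row_means) for score in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def sem(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def sd_ratio_for_equal_sem(icc_before, icc_after):
    """Ratio SD_after / SD_before that leaves the SEM unchanged."""
    return sqrt((1.0 - icc_before) / (1.0 - icc_after))


if __name__ == "__main__":
    # Hypothetical ratings: 4 encounters, each scored by 3 raters.
    demo = [[4, 5, 4], [6, 6, 5], [3, 4, 4], [5, 6, 6]]
    print("ICC:", round(icc_oneway(demo), 3))
    # ICCs reported for live encounters before and after the workshop.
    print("SD ratio:", round(sd_ratio_for_equal_sem(0.18, 0.34), 3))
```

Under these assumptions, an increase in ICC from 0.18 to 0.34 would leave the SEM unchanged if the score standard deviation were roughly 11% larger in the later period. The abstract does not report the score standard deviations for live encounters, so this is only one possible reading of the result.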
Conclusions
Rater training did not improve interrater reliability or accuracy of mini-CEX scores.
Clinical trials registration
clinicaltrials.gov identifier NCT00667940
Journal of General Internal Medicine
Volume 24, Issue 1, pp 74-79
- Keywords
- medical education
- faculty development
- rater training
- clinical competence
- randomized trial
- Author Affiliations
- 1. Office of Education Research, College of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- 2. Division of General Internal Medicine, College of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- 3. Division of Primary Care Internal Medicine, College of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- 4. Division of Biostatistics, College of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA