Effect of Rater Training on Reliability and Accuracy of Mini-CEX Scores: A Randomized, Controlled Trial

  • Original Article
  • Journal of General Internal Medicine

Abstract

Background

Mini-CEX scores assess resident competence. Rater training might improve mini-CEX score interrater reliability, but evidence is lacking.

Objective

Evaluate a rater training workshop using interrater reliability and accuracy.

Design

Randomized trial (immediate versus delayed workshop) and single-group pre/post study (randomized groups combined).

Setting

Academic medical center.

Participants

Fifty-two internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees).

Intervention

The workshop included rater error training, performance dimension training, behavioral observation training, and frame of reference training, delivered through lecture, video, and facilitated discussion. The delayed group received no intervention until after the posttest.

Measurements

Mini-CEX ratings of videotaped resident–patient encounters at baseline (just before the workshop for the workshop group) and four weeks later; mini-CEX ratings of live resident–patient encounters during the year preceding and the year following the workshop; and rater confidence in using the mini-CEX.

Results

Among 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6–5.2], workshop 4.8 [4.5–5.1]) and follow-up (delayed 5.4 [5.0–5.7], workshop 5.3 [5.0–5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18) but the standard error of measurement was similar for both periods.
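The last result above, a higher ICC with a similar standard error of measurement (SEM), can look contradictory. A minimal sketch of why it need not be, assuming a one-way random-effects ICC (the abstract does not state which ICC model the authors used) and using made-up toy ratings that are not study data: the ICC depends on how much the rated encounters differ from one another, whereas the SEM reflects only disagreement among raters about the same encounter. The function name and both data sets are hypothetical.

```python
from statistics import mean


def one_way_icc_and_sem(ratings):
    """Return (ICC, SEM) under a one-way random-effects model.

    ratings: list of rows, one row per rated encounter,
             each row holding the scores given by k raters.
    """
    n = len(ratings)       # number of encounters
    k = len(ratings[0])    # raters per encounter
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-encounter and within-encounter mean squares.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    icc = (msb - msw) / (msb + (k - 1) * msw)
    sem = msw ** 0.5  # rater disagreement, in rating-scale units
    return icc, sem


# Toy data (hypothetical): the two raters always differ by one point,
# but the encounters span a narrow versus a wide range of performance.
narrow = [[4, 5], [5, 6], [5, 4], [6, 5]]
wide = [[2, 3], [4, 5], [6, 5], [8, 7]]

icc_narrow, sem_narrow = one_way_icc_and_sem(narrow)
icc_wide, sem_wide = one_way_icc_and_sem(wide)
print(f"narrow: ICC={icc_narrow:.2f}, SEM={sem_narrow:.2f}")
print(f"wide:   ICC={icc_wide:.2f}, SEM={sem_wide:.2f}")
```

In this sketch the two toy data sets have identical rater disagreement and therefore the same SEM, yet the ICC is much higher when the encounters themselves vary more. A rise in ICC without a change in SEM is therefore consistent with greater spread in the performances being rated rather than with closer agreement among raters.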

Conclusions

Rater training did not improve interrater reliability or accuracy of mini-CEX scores.

Clinical trials registration

clinicaltrials.gov identifier NCT00667940

Acknowledgments

We thank E.S. Holmboe for use of scripted cases, G.R. Norman for advice on psychometric analyses, and Mayo Internal Medicine faculty for their participation in this study. Funding was provided by internal sources (the Mayo Education Innovation Program). Study sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; or preparation, review, or approval of the manuscript. Dr. Cook had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Conflict of Interest

None disclosed.

Author information

Correspondence to David A. Cook MD, MHPE.

Electronic supplementary material

Below is the link to the electronic supplementary material

Appendix

Cook et al, CEX Rater Training (DOC 129kb)

About this article

Cite this article

Cook, D.A., Dupras, D.M., Beckman, T.J. et al. Effect of Rater Training on Reliability and Accuracy of Mini-CEX Scores: A Randomized, Controlled Trial. J GEN INTERN MED 24, 74–79 (2009). https://doi.org/10.1007/s11606-008-0842-3

  • DOI: https://doi.org/10.1007/s11606-008-0842-3
