Effect of Rater Training on Reliability and Accuracy of Mini-CEX Scores: A Randomized, Controlled Trial

Cook, David A.; Dupras, Denise M.; Beckman, Thomas J.; Thomas, Kris G.; Pankratz, V. Shane

doi:10.1007/s11606-008-0842-3

Original Article
Published: 11 November 2008

Volume 24, pages 74–79 (2009)
Journal of General Internal Medicine

David A. Cook MD, MHPE¹,²,
Denise M. Dupras MD, PhD³,
Thomas J. Beckman MD¹,²,
Kris G. Thomas MD³ &
V. Shane Pankratz PhD⁴

Abstract

Background

Mini-CEX scores assess resident competence. Rater training might improve mini-CEX score interrater reliability, but evidence is lacking.

Objective

Evaluate a rater training workshop using interrater reliability and accuracy.

Design

Randomized trial (immediate versus delayed workshop) and single-group pre/post study (randomized groups combined).

Setting

Academic medical center.

Participants

Fifty-two internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees).

Intervention

The workshop included rater error training, performance dimension training, behavioral observation training, and frame of reference training, delivered through lecture, video, and facilitated discussion. The delayed group received no intervention until after the posttest.

Measurements

Mini-CEX ratings of videotaped resident–patient encounters at baseline (just before the workshop for the workshop group) and four weeks later; mini-CEX ratings of live resident–patient encounters in the year preceding and the year following the workshop; rater confidence in using the mini-CEX.

Results

Among 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6–5.2], workshop 4.8 [4.5–5.1]) and follow-up (delayed 5.4 [5.0–5.7], workshop 5.3 [5.0–5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18) but the standard error of measurement was similar for both periods.
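The last result above (a higher ICC with a similar standard error of measurement) can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC for single ratings and the classical relation SEM = SD × sqrt(1 − reliability); the abstract does not state which ICC form or SEM formula the authors used, so both are assumptions here. The function names, the toy ratings, and the baseline SD of 1.0 are hypothetical and are not study data.

```python
import math


def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    `ratings` is a list of rated encounters, each a list of k scores
    (one per rater). Assumes every encounter has the same k >= 2.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-encounter and within-encounter mean squares (one-way ANOVA).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def sd_for_equal_sem(sd_before, rel_before, rel_after):
    """Observed-score SD needed at rel_after to keep the SEM unchanged."""
    return sem(sd_before, rel_before) / math.sqrt(1.0 - rel_after)


# Hypothetical toy ratings (3 encounters x 2 raters), not study data.
toy = [[4, 5], [6, 6], [2, 3]]
toy_icc = icc_oneway(toy)

# ICCs taken from the abstract; the baseline SD of 1.0 is a placeholder.
sd_after = sd_for_equal_sem(1.0, 0.18, 0.34)
```

Under these assumptions, an ICC rising from 0.18 to 0.34 leaves the SEM unchanged if the spread of observed scores is roughly 11% larger in the second period. This is one way a reliability coefficient can increase without any gain in the precision of individual scores.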

Conclusions

Rater training did not improve interrater reliability or accuracy of mini-CEX scores.

Clinical trials registration

clinicaltrials.gov identifier NCT00667940

Acknowledgments

We thank E.S. Holmboe for use of scripted cases, G.R. Norman for advice on psychometric analyses, and Mayo Internal Medicine faculty for their participation in this study. Funding was provided by internal sources (the Mayo Education Innovation Program). Study sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; or preparation, review, or approval of the manuscript. Dr. Cook had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Conflict of Interest

None disclosed.

Author information

Authors and Affiliations

Office of Education Research, College of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
David A. Cook MD, MHPE & Thomas J. Beckman MD
Division of General Internal Medicine, College of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
David A. Cook MD, MHPE & Thomas J. Beckman MD
Division of Primary Care Internal Medicine, College of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
Denise M. Dupras MD, PhD & Kris G. Thomas MD
Division of Biostatistics, College of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
V. Shane Pankratz PhD

Corresponding author

Correspondence to David A. Cook MD, MHPE.

Electronic supplementary material

Below is the link to the electronic supplementary material

Appendix

Cook et al, CEX Rater Training (DOC 129kb)

About this article

Cite this article

Cook, D.A., Dupras, D.M., Beckman, T.J. et al. Effect of Rater Training on Reliability and Accuracy of Mini-CEX Scores: A Randomized, Controlled Trial. J GEN INTERN MED 24, 74–79 (2009). https://doi.org/10.1007/s11606-008-0842-3

Received: 07 May 2008
Revised: 03 September 2008
Accepted: 09 October 2008
Published: 11 November 2008
Issue Date: January 2009
DOI: https://doi.org/10.1007/s11606-008-0842-3
