Abstract
In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to improve rater performance and contributions during assessment events. Historically, rater training programs have focused on modifying faculty behaviours to achieve psychometric ideals (e.g., reliability, inter-rater reliability, accuracy). The authors argue that these ideals may now be poorly aligned with contemporary research informing work-based assessment, introducing a compatibility threat with no clear direction on how to proceed. To address this issue, the authors provide a brief historical review of "rater training" and analyze the literature examining the effectiveness of rater training programs, focusing mainly on what has served to define effectiveness or improvement. They then draw on philosophical and conceptual shifts in assessment to demonstrate why the function, effectiveness aims, and structure of rater training require reimagining. These shifts include changing competencies for assessors, viewing assessment as a complex cognitive task enacted in a social context, evolving views on biases, and reprioritizing which validity evidence should be most sought in medical education. The authors aim to advance the discussion on rater training by challenging implicit incompatibility issues and stimulating ways to overcome them. They propose that "rater training" (a moniker they suggest be reserved for strong psychometric aims) be augmented with "assessor readiness" programs that link to contemporary assessment science and enact the principle of compatibility between that science and ways of engaging with advances in real-world faculty-learner contexts.
Acknowledgements
None.
Funding
None.
Author information
Contributions
All authors are responsible for the content of this manuscript, and we have not excluded any qualified authors in the production of the manuscript. WT and MF provided the initial conceptualization. WT prepared the initial draft of the manuscript. All authors contributed to subsequent versions and the final manuscript.
Ethics declarations
Competing interests
The authors report no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tavares, W., Kinnear, B., Schumacher, D.J. et al. “Rater training” re-imagined for work-based assessment in medical education. Adv in Health Sci Educ 28, 1697–1709 (2023). https://doi.org/10.1007/s10459-023-10237-8
DOI: https://doi.org/10.1007/s10459-023-10237-8