
Augmenting physician examiner scoring in objective structured clinical examinations: including the standardized patient perspective

Advances in Health Sciences Education


In Canada, high-stakes objective structured clinical examinations (OSCEs) administered by the Medical Council of Canada have relied exclusively on physician examiners (PEs) for scoring. Prior research has looked at using standardized patients (SPs) to replace PEs. This paper reports on two studies that implement and evaluate an SP scoring tool to augment PE scoring. The unique aspect of this work is that it explores the benefits of combining SP and PE scores. SP focus groups developed rating scales for four dimensions they labelled: Listening, Communication, Empathy/Rapport, and Global Impression. In Study I, 43 SPs from one site of a national PE-scored OSCE rated 60 examinees with the initial SP rating scales. In Study II, 137 SPs used slightly revised rating scales with optional narrative comments to score 275 examinees at two sites. Examinees were blinded to SP scoring, and SP ratings did not count toward their results. Separate PE and SP scoring was examined using descriptive statistics and correlations. Combinations of SP and PE scoring were assessed using pass rates, reliability, and decision consistency and accuracy indices. In Study II, SP and PE comments were also examined. SPs showed greater variability in their scoring and rated examinees lower than PEs on common elements, resulting in slightly lower pass rates when the scores were combined. There was a moderate tendency for both SPs and PEs to make negative comments for the same examinee, but for different reasons. We argue that SPs and PEs assess performance from different perspectives, and that combining scores from both augments the overall reliability of scores and pass/fail decisions. There is potential to provide examinees with feedback comments from each group.
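As a rough illustration of what combining SP and PE scores can involve, the Python sketch below forms a weighted composite per station, estimates internal-consistency reliability with Cronbach's alpha, and computes a pass rate against a cut score. It is not the authors' procedure: the function names, the SP weight of 0.25, the cut score, and the data layout (one row per examinee, one column per station) are all assumptions made for the example. The decision consistency and accuracy indices mentioned in the abstract are not reproduced here.

```python
from statistics import mean, variance


def combine(pe_scores, sp_scores, sp_weight=0.25):
    """Weighted composite of PE and SP station scores.

    sp_weight is a hypothetical value chosen for illustration only.
    """
    return [
        [(1 - sp_weight) * pe + sp_weight * sp for pe, sp in zip(pe_row, sp_row)]
        for pe_row, sp_row in zip(pe_scores, sp_scores)
    ]


def cronbach_alpha(scores):
    """Cronbach's alpha, treating each station (column) as an item."""
    k = len(scores[0])
    station_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(station_vars) / total_var)


def pass_rate(scores, cut_score):
    """Share of examinees whose mean station score meets the cut score."""
    totals = [mean(row) for row in scores]
    return sum(total >= cut_score for total in totals) / len(totals)


# Made-up scores for four examinees across three stations (illustration only).
pe = [[80, 70, 90], [60, 65, 70], [50, 55, 45], [75, 80, 85]]
sp = [[70, 60, 80], [50, 60, 65], [55, 50, 40], [70, 70, 80]]

combined = combine(pe, sp)
print("alpha (PE only): ", round(cronbach_alpha(pe), 3))
print("alpha (combined):", round(cronbach_alpha(combined), 3))
print("pass rate (PE only): ", pass_rate(pe, cut_score=60))
print("pass rate (combined):", pass_rate(combined, cut_score=60))
```

Comparing the PE-only figures with the combined figures on real data is the kind of check that would show whether adding the SP perspective shifts reliability or pass rates.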






We would like to acknowledge the contributions of Dr. Claire Touchie and Anthony King for reviewing this manuscript and providing insightful comments. In addition, we would like to thank Dr. Gordon Page for sharing his knowledge of the literature on SP scoring and Dr. Andrea Gotzmann for her statistical advice.

Author information


Corresponding author

Correspondence to Marguerite Roy.

Ethics declarations

Ethical standards

Examinees registering for examinations with the Medical Council of Canada (MCC) sign an agreement allowing the collected data to be used for quality assurance studies. This study used MCC data to inform OSCE scoring methods in order to ensure fair and reliable examinations. In keeping with the Privacy and Confidentiality provisions of the World Medical Association Declaration of Helsinki (2013), Ethical Principles for Medical Research Involving Human Subjects, every precaution was taken to protect the privacy of the examinees and the confidentiality of their personal information. When the final research dataset was assembled, all identifiers were deleted; only aggregate summaries are presented. This quality assurance study was reviewed by the Research Advisory Committee and the Central Examination Committee of the Medical Council of Canada and follows the principles of the Tri-Council Policy Statement regarding the ethical conduct of research.



Cite this article

Roy, M., Wojcik, J., Bartman, I. et al. Augmenting physician examiner scoring in objective structured clinical examinations: including the standardized patient perspective. Adv in Health Sci Educ 26, 313–328 (2021).
