Medical Science Educator, Volume 27, Issue 3, pp 497–502

Detection of Biased Rating of Medical Students by Standardized Patients: Opportunity for Improvement

  • Marian Manciu
  • Roszella Trevino
  • Zuber D. Mulla
  • Claudia Cortez
  • Sanja Kupesic Plavsic
Original Research


This paper aims to assess the interrater reliability of standardized patients (SPs) as they evaluate the clinical skills of medical students, and to detect possible rating bias among SPs. The ratings given by 13 SPs to 6 students examined in 4 clinical stations were analyzed. Each SP contributed at least 3 and at most 10 pairwise ratings, with an average of approximately 5 ratings per SP. The standard Cohen’s kappa statistic was calculated, and the distribution of scores among SPs was compared with both ANOVA and the Kruskal-Wallis H test (one-way ANOVA on ranks). Furthermore, the numbers of discrepancies between pairwise raters (showing either “positive” or “negative” bias in the rating) were analyzed using ANOVA and a χ² goodness-of-fit test. The conventional method, which compared the statistics of the raters’ kappa scores (including the prevalence-adjusted bias-adjusted kappa scores), did not reject the null hypothesis that the raters (SPs) are similar. However, the analysis of the distribution of the discrepancies among the raters revealed that the differences between raters cannot be attributed to chance, particularly when a distinction was made between their overall positive and negative bias. A strong negative bias (p < 0.001) was detected, and the SPs responsible for this bias were identified. The statistical method suggested here, which explicitly takes into account the positive and the negative bias of the raters, is more sensitive than the conventional method (Cohen’s kappa). Since outliers (biased SPs) affect the fairness of the grading of medical students, it is important to detect any statistically significant bias in the rating and to adjust the SP’s assessment accordingly.
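
The quantities named in the abstract can be illustrated with a minimal, standard-library sketch. It assumes binary checklist scoring (1 = item credited, 0 = not credited) and assumes that a "positive" discrepancy means the SP credited an item the paired rater did not, with "negative" the reverse; the abstract does not spell out the exact coding. The function names (`cohen_kappa`, `pabak`, `discrepancies`, `chi2_sf`, `chi2_gof`) and all data are hypothetical illustrations, not the authors' code or the study's data. The χ² goodness-of-fit step tests the assumed null hypothesis that negative discrepancies are spread across SPs in proportion to the number of items each SP rated. The ANOVA and Kruskal-Wallis comparisons are not sketched here.

```python
import math
from collections import Counter


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same items (any category labels)."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # chance agreement from each rater's marginal category frequencies
    p_e = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n**2
    if p_e == 1:
        return math.nan  # both raters used one identical category: kappa is undefined
    return (p_o - p_e) / (1 - p_e)


def pabak(r1, r2):
    """Prevalence-adjusted bias-adjusted kappa for binary items: 2 * p_o - 1."""
    p_o = sum(a == b for a, b in zip(r1, r2)) / len(r1)
    return 2 * p_o - 1


def discrepancies(sp, ref):
    """Return (positive, negative) disagreement counts of an SP against a paired rater.

    Assumed coding: 1 = item credited, 0 = not credited. "Positive" means the SP
    credited an item the paired rater did not; "negative" is the reverse.
    """
    pos = sum(1 for a, b in zip(sp, ref) if a == 1 and b == 0)
    neg = sum(1 for a, b in zip(sp, ref) if a == 0 and b == 1)
    return pos, neg


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2
    # start from df = 1 (erfc) or df = 2 (exp), then step df up by 2 using
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    q, a = (math.erfc(math.sqrt(y)), 0.5) if df % 2 else (math.exp(-y), 1.0)
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q


def chi2_gof(observed, expected):
    """Pearson chi-square goodness-of-fit: returns (statistic, df, p-value)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical checklist scores for one SP and a paired rater (not study data).
    sp = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    ref = [1, 1, 1, 1, 0, 1, 0, 0, 1, 1]
    print("kappa:", cohen_kappa(sp, ref), "PABAK:", pabak(sp, ref))
    print("(positive, negative):", discrepancies(sp, ref))

    # Hypothetical negative-discrepancy counts for three SPs who each rated 40 items.
    # Assumed null hypothesis: negative discrepancies are proportional to items rated.
    neg_counts = [2, 3, 11]
    items_rated = [40, 40, 40]
    total = sum(neg_counts)
    expected = [total * k / sum(items_rated) for k in items_rated]
    print("chi2, df, p:", chi2_gof(neg_counts, expected))
```

The p-value helper uses the recurrence for the regularized upper incomplete gamma function so that the sketch needs only the standard library; `scipy.stats.chi2.sf` or `scipy.stats.chisquare` would serve the same purpose. Note that the kappa-based summary in the toy example says nothing about the direction of disagreement, whereas the separate positive and negative counts do, which is the distinction the abstract credits for the added sensitivity.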


Keywords: Standardized patients · Inter-rater agreement



Research reported in this paper was supported by the National Institute of General Medical Sciences of the National Institutes of Health under linked Award Numbers RL5GM118969, TL4GM118971, and UL1GM118970. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


Copyright information

© International Association of Medical Science Educators 2017

Authors and Affiliations

  1. Physics Department, University of Texas at El Paso, El Paso, USA
  2. Department of Obstetrics and Gynecology, Center for Advanced Teaching and Assessment in Clinical Simulation, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center El Paso, El Paso, USA
