Detection of Biased Rating of Medical Students by Standardized Patients: Opportunity for Improvement
This paper aims to assess the inter-rater reliability of standardized patients (SPs) as they assess the clinical skills of medical students and to detect possible rating bias among SPs. The ratings received by 6 students examined in 4 clinical stations by 13 SPs were examined. Each SP contributed at least 3 and at most 10 pairwise ratings, with an average of approximately 5 ratings per SP. The standard Cohen's kappa statistic was calculated, and the distribution of scores among SPs was compared via both ANOVA and the Kruskal-Wallis H test (one-way ANOVA by ranks). Furthermore, the discrepancies between pairwise raters (showing either "positive" or "negative" bias in the rating) were analyzed using ANOVA and a χ² goodness-of-fit test. The conventional method, which compared the kappa statistics of the raters (including the prevalence-adjusted bias-adjusted kappa scores), did not reject the null hypothesis that the raters (SPs) are similar. However, analysis of the distribution of the discrepancies among the raters revealed that the differences between raters cannot be attributed to chance, particularly when a distinction was made between their overall positive and negative bias. A strong (p < 0.001) negative bias was detected, and the SPs responsible for this bias were identified. The statistical method suggested here, which explicitly takes into account the positive and negative bias of the raters, is more sensitive than the conventional method (Cohen's kappa). Since the outliers (the biased SPs) affect the fairness of the grading of the medical students, it is important to detect any statistically significant bias in the rating and to adjust the SPs' assessments accordingly.
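The two analyses contrasted in the abstract can be illustrated with a minimal sketch. The code below uses hypothetical pass/fail ratings (not the study's data) to compute Cohen's kappa for a pair of raters and then the χ² goodness-of-fit statistic on the counts of positive versus negative discrepancies, under the null hypothesis that discrepancies split evenly in direction; all rating values and variable names are illustrative assumptions.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(r1)
    # Observed agreement: fraction of items rated identically
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal category frequencies
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[c] * c2[c] for c in set(r1) | set(r2)) / n**2
    return (po - pe) / (1 - pe)

def chi2_gof_stat(observed, expected):
    """Chi-square goodness-of-fit statistic (compare to a critical value)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical pass/fail ratings by two SPs on the same eight encounters
sp_a = [1, 1, 0, 1, 0, 1, 1, 0]
sp_b = [1, 0, 0, 1, 1, 1, 0, 0]

kappa = cohens_kappa(sp_a, sp_b)

# Directional discrepancies: "positive" bias = SP A rated higher,
# "negative" bias = SP A rated lower than SP B
pos = sum(a > b for a, b in zip(sp_a, sp_b))
neg = sum(a < b for a, b in zip(sp_a, sp_b))

# Under the no-bias null, discrepancies are expected to split 50/50
stat = chi2_gof_stat([pos, neg], [(pos + neg) / 2] * 2)
```

The point of the sketch is that kappa only summarizes overall agreement, whereas splitting the disagreements by direction (as in the last two steps) is what makes a systematic positive or negative bias detectable.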
Keywords: Standardized patients · Inter-rater agreement
Research reported in this paper was supported by the National Institute of General Medical Sciences of the National Institutes of Health under linked Award Numbers RL5GM118969, TL4GM118971, and UL1GM118970. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.