Differences in Narrative Language in Evaluations of Medical Students by Gender and Under-represented Minority Status
In varied educational settings, narrative evaluations have revealed systematic and deleterious differences in language describing women and those underrepresented in their fields. In medicine, limited qualitative studies show differences in narrative language by gender and under-represented minority (URM) status.
To identify and enumerate text descriptors in a database of medical student evaluations using natural language processing, and to determine whether those descriptors differ by gender and URM status.
An observational study of core clerkship evaluations of third-year medical students, including data on student gender, URM status, clerkship grade, and specialty.
A total of 87,922 clerkship evaluations from core clinical rotations at two medical schools in different geographic areas.
We employed natural language processing to identify differences in the text of evaluations for women compared to men and for URM compared to non-URM students.
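One way such a group comparison could be set up is sketched below. This is a minimal illustration, not the study's pipeline: the abstract does not name the statistical test, so the Pearson chi-square test on a 2x2 table, the helper names (`doc_frequency`, `chi_square_2x2`, `compare_word`), and the toy evaluation strings are all assumptions made for the example.

```python
import math
import re


def doc_frequency(evaluations, word):
    """Count evaluations that contain `word` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for text in evaluations if pattern.search(text))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). For 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        # A degenerate table carries no evidence of a difference.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def compare_word(group_a, group_b, word):
    """Test whether `word` appears in a different share of each group's evaluations."""
    with_a = doc_frequency(group_a, word)
    with_b = doc_frequency(group_b, word)
    return chi_square_2x2(with_a, len(group_a) - with_a,
                          with_b, len(group_b) - with_b)


if __name__ == "__main__":
    # Invented toy strings, not data from the study.
    group_a = ["A lovely and energetic student.", "Dependable and lovely."]
    group_b = ["Scientific and dependable.", "Energetic, knowledgeable."]
    statistic, p_value = compare_word(group_a, group_b, "lovely")
    print(f"chi2={statistic:.2f}, p={p_value:.3f}")
```

With many candidate words tested at once, a real analysis would also need a multiple-comparison correction and, to compare students receiving the same grade, some form of stratification or adjustment; neither is shown here.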
We found that of the ten most common words, such as “energetic” and “dependable,” none differed by gender or URM status. Of the 37 words that differed by gender, 62% represented personal attributes, such as “lovely” appearing more frequently in evaluations of women (p < 0.001), while 19% represented competency-related behaviors, such as “scientific” appearing more frequently in evaluations of men (p < 0.001). Of the 53 words that differed by URM status, 30% represented personal attributes, such as “pleasant” appearing more frequently in evaluations of URM students (p < 0.001), and 28% represented competency-related behaviors, such as “knowledgeable” appearing more frequently in evaluations of non-URM students (p < 0.001).
Many words and phrases reflected students’ personal attributes rather than competency-related behaviors, suggesting a gap in implementing competency-based evaluation of students.
We observed a significant difference in narrative evaluations associated with gender and URM status, even among students receiving the same grade. This finding raises concern for implicit bias in narrative evaluation, consistent with prior studies, and suggests opportunities for improvement.
KEY WORDS: medical education; medical education—assessment/evaluation; medical student and residency education
The authors would like to thank Roy Cherian, Cassidy Clarity, Gato Gourley, Bonnie Hellevig, Mark Lovett, Kate Radcliffe, and Alvin Rajkomar.
Dr. Sarkar is supported by the National Cancer Institute (K24CA212294).
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they do not have a conflict of interest.