Differences in Narrative Language in Evaluations of Medical Students by Gender and Under-represented Minority Status

  • Alexandra E. Rojek
  • Raman Khanna
  • Joanne W. L. Yim
  • Rebekah Gardner
  • Sarah Lisker
  • Karen E. Hauer
  • Catherine Lucey
  • Urmimala Sarkar



Background

In varied educational settings, narrative evaluations have revealed systematic and deleterious differences in language describing women and those underrepresented in their fields. In medicine, limited qualitative studies show differences in narrative language by gender and under-represented minority (URM) status.


Objective

To identify and enumerate text descriptors in a database of medical student evaluations using natural language processing, and to identify differences in those descriptions by gender and URM status.


Design

An observational study of core clerkship evaluations of third-year medical students, including data on student gender, URM status, clerkship grade, and specialty.


Participants

A total of 87,922 clerkship evaluations from core clinical rotations at two medical schools in different geographic areas.

Main Measures

We employed natural language processing to identify differences in the text of evaluations for women compared to men and for URM compared to non-URM students.
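
The abstract does not specify the processing pipeline, so the following is only a minimal sketch of one way such a comparison could be set up, not the authors' method. It assumes each evaluation is a plain-text string already labeled by group, counts how many evaluations in each group contain a given word, and compares the two proportions with a pooled two-proportion z-test. The function names (`doc_frequencies`, `two_proportion_p`, `compare_groups`) and the choice of test are hypothetical and introduced here for illustration.

```python
import math
import re
from collections import Counter


def doc_frequencies(texts):
    """Count, for each word, how many evaluations contain it at least once."""
    counts = Counter()
    for text in texts:
        # Simplistic tokenizer (an assumption): lowercase runs of letters/apostrophes.
        counts.update(set(re.findall(r"[a-z']+", text.lower())))
    return counts


def two_proportion_p(k1, n1, k2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # The word appears in every evaluation or in none: no detectable difference.
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def compare_groups(texts_a, texts_b):
    """Map each word seen in either group to a p-value for its usage difference."""
    if not texts_a or not texts_b:
        raise ValueError("both groups must contain at least one evaluation")
    freq_a, freq_b = doc_frequencies(texts_a), doc_frequencies(texts_b)
    n_a, n_b = len(texts_a), len(texts_b)
    return {
        word: two_proportion_p(freq_a[word], n_a, freq_b[word], n_b)
        for word in set(freq_a) | set(freq_b)
    }
```

Because a comparison like this tests many words at once, the raw p-values would need some form of multiple-testing correction before any word is reported as differing between groups; the abstract does not state which correction was applied.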

Key Results

We found that of the ten most common words, such as “energetic” and “dependable,” none differed by gender or URM status. Of the 37 words that differed by gender, 62% represented personal attributes, such as “lovely” appearing more frequently in evaluations of women (p < 0.001), while 19% represented competency-related behaviors, such as “scientific” appearing more frequently in evaluations of men (p < 0.001). Of the 53 words that differed by URM status, 30% represented personal attributes, such as “pleasant” appearing more frequently in evaluations of URM students (p < 0.001), and 28% represented competency-related behaviors, such as “knowledgeable” appearing more frequently in evaluations of non-URM students (p < 0.001).


Conclusions

Many words and phrases reflected students’ personal attributes rather than competency-related behaviors, suggesting a gap in implementing competency-based evaluation of students.

We observed a significant difference in narrative evaluations associated with gender and URM status, even among students receiving the same grade. This finding raises concern for implicit bias in narrative evaluation, consistent with prior studies, and suggests opportunities for improvement.


Key Words

medical education; medical education - assessment/evaluation; medical student and residency education



Acknowledgments

The authors would like to thank Roy Cherian, Cassidy Clarity, Gato Gourley, Bonnie Hellevig, Mark Lovett, Kate Radcliffe, and Alvin Rajkomar.

Funding Information

Dr. Sarkar is supported by the National Cancer Institute (K24CA212294).

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they do not have a conflict of interest.

Supplementary material

ESM 1: 11606_2019_4889_MOESM1_ESM.docx (DOCX)
ESM 2: 11606_2019_4889_MOESM2_ESM.pdf (PDF, 211 kb)
ESM 3: 11606_2019_4889_MOESM3_ESM.pdf (PDF, 191 kb)



Copyright information

© Society of General Internal Medicine 2019

Authors and Affiliations

  • Alexandra E. Rojek (1)
  • Raman Khanna (2)
  • Joanne W. L. Yim (3)
  • Rebekah Gardner (4)
  • Sarah Lisker (1, 5)
  • Karen E. Hauer (1)
  • Catherine Lucey (1)
  • Urmimala Sarkar (1, 5), corresponding author

  1. University of California, San Francisco School of Medicine, San Francisco, USA
  2. Division of Hospital Medicine, University of California, San Francisco, School of Medicine, San Francisco, USA
  3. Health Informatics, UCSF Health, University of California, San Francisco, San Francisco, USA
  4. Warren Alpert Medical School of Brown University, Providence, USA
  5. UCSF Center for Vulnerable Populations, San Francisco, USA
