Abstract
Assessment of clinical teachers by learners is problematic. Construct-irrelevant factors influence ratings, and women teachers often receive lower ratings than men. However, most studies focus only on numeric scores. The authors therefore analyzed written comments on 4032 teacher assessments, representing 282 women and 448 men teachers in one Department of Medicine, to explore gender differences. NVivo was used to search for 61 evidence- and theory-based terms purported to reflect teaching excellence, and term frequencies were compared using 2 × 2 chi-squared tests. Linguistic Inquiry and Word Count (LIWC) software was used to categorize comment data, which were analyzed using linear regressions. The only significant difference in the NVivo analysis was that men were more likely than women to have the word “available” in a comment (OR 1.4, p < 0.05). A subset of LIWC variables showed significant gender differences, but all effects were modest. Men teachers had more positive emotion words written about them, while negative emotion words appeared equally often for men and women. Significant differences were more often associated with the gender of the residents who wrote the comments than with the gender of the teachers. For example, women residents used more social and gender-related words (β 1.87, p < 0.001) and fewer words related to power or achievement (β −3.78, p < 0.001) than men residents. Profound gender differences were not found in teacher assessment comments in this large, diverse academic department of medicine, a finding that differs from other studies. The authors explore possible reasons, including differences in departmental culture and issues related to the methods used.
Acknowledgements
The authors wish to thank Mr. Ed Lorens, Research Officer in the Department of Medicine, for compiling and anonymizing the data.
Funding
Dr. Ginsburg is supported as the Canada Research Chair for Health Professions Education.
Ethics declarations
Ethical approval
The Research Ethics Board at the University of Toronto gave approval for this study.
Appendix
Number of individuals with each code attached (count columns) and percentage of total codes by gender (percentage columns). M = men teachers; W = women teachers.

Code | M with code | M without code | W with code | W without code | Total (730) | Chi-squared test | Gender_Teacher = Man (448; 61.36%) | Gender_Teacher = Woman (282; 38.60%) | Total (730)
---|---|---|---|---|---|---|---|---|---
1: Available | 148 | 300 | 72 | 210 | 220 | χ² = 4.6283, p = 0.031449; significant at p < 0.05 | 67.27% | 32.73% | 100%
2: Unavailable | 3 | 445 | 0 | 282 | 3 | NS | 100% | 0% | 100%
3: Approachable | 141 | 307 | 94 | 188 | 235 | | 60% | 40% | 100%
4: Not approachable, uncomfortable | 3 | 445 | 3 | 279 | 6 | | 50% | 50% | 100%
5: Comfortable, welcoming, safe | 0 | 448 | 0 | 282 | 0 | | 0.00% | 0.00% | 0%
6: Comfortable | 32 | 416 | 30 | 252 | 62 | | 51.61% | 48.39% | 100%
7: Welcoming | 44 | 404 | 22 | 260 | 66 | | 66.67% | 33.33% | 100%
8: Safe environment | 26 | 422 | 21 | 261 | 47 | | 55% | 45% | 100%
9: Support | 224 | 224 | 159 | 123 | 383 | | 58.49% | 41.51% | 100%
10: Explore limits | 14 | 434 | 9 | 273 | 23 | | 60.87% | 39.13% | 100%
11: Autonomy | 57 | 391 | 38 | 244 | 95 | | 60.00% | 40.00% | 100%
12: Micromanage, hands-on | 7 | 441 | 8 | 274 | 15 | | 47% | 53% | 100%
13: Independence | 103 | 345 | 69 | 213 | 172 | | 59.88% | 40.12% | 100%
14: Feedback | 138 | 310 | 88 | 194 | 226 | | 61% | 39% | 100%
15: Feedback - neg | 15 | 433 | 10 | 272 | 25 | | 60.00% | 40.00% | 100%
16: Personality | 10 | 438 | 4 | 278 | 14 | | 71.43% | 28.57% | 100%
17: Personality characteristics | 0 | 448 | 0 | 282 | 0 | | 0.00% | 0.00% | 0%
18: Friendly | 56 | 392 | 29 | 253 | 85 | | 65.88% | 34.12% | 100%
19: Intimidating | 5 | 443 | 4 | 278 | 9 | | 55.56% | 44.44% | 100%
20: Not intimidating | 11 | 437 | 7 | 275 | 18 | | 61% | 39% | 100%
21: Kind | 86 | 362 | 60 | 222 | 146 | | 59% | 41% | 100%
22: Caring | 32 | 416 | 27 | 255 | 59 | | 54.24% | 45.76% | 100%
23: Empathic | 15 | 433 | 10 | 272 | 25 | | 60.00% | 40.00% | 100%
24: Warm | 9 | 439 | 11 | 271 | 20 | | 45.00% | 55.00% | 100%
25: Cold | 2 | 446 | 1 | 281 | 3 | | 66.67% | 33.33% | 100%
26: Belittling, condescending etc | 5 | 443 | 4 | 278 | 9 | | 55.56% | 44.44% | 100%
27: Enthusiastic | 48 | 400 | 37 | 245 | 85 | | 56.47% | 43.53% | 100%
28: Eager | 13 | 435 | 6 | 276 | 19 | | 68.42% | 31.58% | 100%
29: Fun, exciting | 69 | 379 | 32 | 250 | 101 | | 68.32% | 31.68% | 100%
30: Sense of humour, funny | 23 | 425 | 7 | 275 | 30 | χ² = 3.099, p < 0.079 | 77% | 23% | 100%
31: Person | 58 | 390 | 28 | 254 | 86 | | 67.44% | 32.56% | 100%
32: Human | 11 | 437 | 3 | 279 | 14 | | 79% | 21% | 100%
33: Respect | 105 | 343 | 68 | 214 | 173 | | 60.69% | 39.31% | 100%
34: Disrespect | 0 | 448 | 1 | 281 | 1 | | 0.00% | 100.00% | 100%
35: Learner | 50 | 398 | 38 | 244 | 88 | | 56.82% | 43.18% | 100%
36: Learning-top | 0 | 448 | 0 | 282 | 0 | | 0.00% | 0.00% | 0%
37: Learning | 213 | 235 | 138 | 144 | 351 | | 60.68% | 39.32% | 100%
38: Learned | 83 | 365 | 47 | 235 | 130 | | 63.85% | 36.15% | 100%
39: Teacher | 324 | 124 | 198 | 84 | 522 | | 62.07% | 37.93% | 100%
40: Teaching | 20 | 428 | 11 | 271 | 31 | | 64.52% | 35.48% | 100%
41: Educator | 35 | 413 | 32 | 250 | 67 | | 52.24% | 47.76% | 100%
42: Attending | 49 | 399 | 20 | 262 | 69 | χ² = 2.99, df = 1, p = 0.084 | 71.01% | 28.99% | 100%
43: Supervisor | 75 | 373 | 41 | 241 | 116 | | 64.66% | 35.34% | 100%
44: Doctor | 25 | 423 | 8 | 274 | 33 | χ² = 3.018, df = 1, p = 0.082 | 76% | 24% | 100%
45: Dr | 53 | 395 | 26 | 256 | 79 | | 67.09% | 32.91% | 100%
46: Physician | 108 | 340 | 66 | 216 | 174 | | 62.07% | 37.93% | 100%
47: Clinician | 40 | 408 | 29 | 253 | 69 | | 57.97% | 42.03% | 100%
48: Positive Adjectives | 0 | 448 | 0 | 282 | 0 | | 0.00% | 0.00% | 0%
49: Good | 189 | 259 | 110 | 172 | 299 | | 63.21% | 36.79% | 100%
50: Excellent | 290 | 158 | 173 | 109 | 463 | | 62.63% | 37.37% | 100%
51: Exemplary | 57 | 391 | 27 | 255 | 84 | | 67.86% | 32.14% | 100%
52: Outstanding | 71 | 377 | 44 | 238 | 115 | | 61.74% | 38.26% | 100%
53: Exceptional | 109 | 339 | 60 | 222 | 169 | | 64.50% | 35.50% | 100%
54: Role Model | 159 | 289 | 113 | 169 | 272 | | 58.46% | 41.54% | 100%
55: Evidence | 47 | 401 | 24 | 258 | 71 | | 66.20% | 33.80% | 100%
56: Bedside manner | 40 | 408 | 23 | 259 | 63 | | 63.49% | 36.51% | 100%
57: Time - pos | 163 | 285 | 109 | 173 | 272 | | 60% | 40% | 100%
58: Time - neg | 28 | 420 | 20 | 262 | 48 | | 58.33% | 41.67% | 100%
59: Efficient | 82 | 366 | 39 | 243 | 121 | | 67.77% | 32.23% | 100%
60: Inefficient | 6 | 442 | 3 | 279 | 9 | | 66.67% | 33.33% | 100%
61: Disorganized | 3 | 445 | 3 | 279 | 6 | | 50% | 50% | 100%
Total (unique) | 437 | | 276 | | 713 | | 61.29% | 38.71% | 100%
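
As an illustration of the 2 × 2 chi-squared comparison described in the abstract, the following minimal Python sketch recomputes the test for the "Available" code from the four counts in the appendix table. It is not the authors' analysis script. The function names are hypothetical, and it assumes a Pearson chi-squared statistic without continuity correction (the source does not state whether a correction was applied) and a simple cross-product odds ratio.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction)
    for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_one_df(chi2):
    """Upper-tail p-value of a chi-squared variable with 1 degree of freedom.
    For 1 df the survival function equals erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Counts from the "1: Available" row of the appendix table.
men_with, men_without = 148, 300      # men teachers with / without the code
women_with, women_without = 72, 210   # women teachers with / without the code

chi2 = chi_squared_2x2(men_with, men_without, women_with, women_without)
p = p_value_one_df(chi2)
odds = odds_ratio(men_with, men_without, women_with, women_without)
share_men = men_with / (men_with + women_with)  # share of coded teachers who are men

print(f"chi-squared = {chi2:.4f}, p = {p:.4f}")
print(f"odds ratio (men vs. women) = {odds:.2f}")
print(f"share of 'Available' codes attached to men = {share_men:.2%}")
```

Under these assumptions the sketch gives a chi-squared statistic of about 4.63 (p ≈ 0.031), an odds ratio of about 1.4, and a 67.27% share for men, in line with the values reported in the abstract and the table.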
Cite this article
Ginsburg, S., Stroud, L., Lynch, M. et al. Beyond the ratings: gender effects in written comments from clinical teaching assessments. Adv in Health Sci Educ 27, 355–374 (2022). https://doi.org/10.1007/s10459-021-10088-1
DOI: https://doi.org/10.1007/s10459-021-10088-1