Beyond the ratings: gender effects in written comments from clinical teaching assessments

Advances in Health Sciences Education

Abstract

Assessment of clinical teachers by learners is problematic. Construct-irrelevant factors influence ratings, and women teachers often receive lower ratings than men. However, most studies focus only on numeric scores. The authors therefore analyzed written comments on 4032 teacher assessments, representing 282 women and 448 men teachers in one Department of Medicine, to explore gender differences. NVivo was used to search for 61 evidence- and theory-based terms purported to reflect teaching excellence, and the resulting counts were analyzed using 2 × 2 chi-squared tests. Linguistic Inquiry and Word Count (LIWC) was used to categorize comment data, which were analyzed using linear regressions. The only significant difference in the NVivo analysis was that men were more likely than women to have the word “available” in a comment (OR 1.4, p < 0.05). A subset of LIWC variables showed significant gender differences, but all effects were modest. Men teachers had more positive emotion words written about them, while negative emotion words appeared equally often for men and women. Significant differences were associated more often with the gender of the residents who wrote the comments than with the gender of the teachers. For example, women residents used more social and gender-related words (β 1.87, p < 0.001) and fewer words related to power or achievement (β −3.78, p < 0.001) than men residents. Profound gender differences were not found in teacher assessment comments in this large, diverse academic department of medicine, a finding that differs from other studies. The authors explore possible reasons, including differences in departmental culture and issues related to the methods used.
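The 2 × 2 chi-squared test and odds ratio reported for “available” can be checked against the counts listed in the Appendix (148 of 448 men teachers and 72 of 282 women teachers had the term attached). The following is a minimal sketch rather than the authors' analysis code: the function names are illustrative, and it assumes an uncorrected Pearson chi-squared test (no Yates continuity correction) with one degree of freedom.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction)
    for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom.
    For df = 1 this equals erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def odds_ratio(a, b, c, d):
    """Odds of the term appearing for group 1 relative to group 2."""
    return (a * d) / (b * c)


# Counts for code 1 ("Available") as listed in the Appendix.
men_with, men_without = 148, 300
women_with, women_without = 72, 210

chi2 = chi_squared_2x2(men_with, men_without, women_with, women_without)
p = p_value_df1(chi2)
or_available = odds_ratio(men_with, men_without, women_with, women_without)

print(f"chi2 = {chi2:.4f}, p = {p:.4f}, OR = {or_available:.2f}")
```

Under these assumptions the statistic and p-value agree with the Appendix entry for “Available” (chi-square 4.6283, p about 0.031), and the odds ratio rounds to the 1.4 quoted in the abstract.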

Acknowledgements

The authors wish to thank Mr. Ed Lorens, Research Officer in the Department of Medicine, for compiling and anonymizing the data.

Funding

Dr. Ginsburg is supported as the Canada Research Chair for Health Professions Education.

Author information

Corresponding author

Correspondence to Shiphra Ginsburg.

Ethics declarations

Ethical approval

The Research Ethics Board at the University of Toronto gave approval for this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Number of individuals with each code attached, and percentage of total codes by teacher gender. M = men teachers (Gender_Teacher = Man, n = 448, 61.36%); W = women teachers (Gender_Teacher = Woman, n = 282, 38.60%); total n = 730. Test results are listed where reported; NS = not significant.

| Code | M with code | M without code | W with code | W without code | Total (730) | Test result | % of codes, M | % of codes, W | Total % |
|---|---|---|---|---|---|---|---|---|---|
| 1: Available | 148 | 300 | 72 | 210 | 220 | χ² = 4.6283, p = 0.031449 (significant at p < 0.05) | 67.27% | 32.73% | 100% |
| 2: Unavailable | 3 | 445 | 0 | 282 | 3 | NS | 100% | 0% | 100% |
| 3: Approachable | 141 | 307 | 94 | 188 | 235 | | 60% | 40% | 100% |
| 4: Not approachable, uncomfortable | 3 | 445 | 3 | 279 | 6 | | 50% | 50% | 100% |
| 5: Comfortable, welcoming, safe | 0 | 448 | 0 | 282 | 0 | | 0.00% | 0.00% | 0% |
| 6: Comfortable | 32 | 416 | 30 | 252 | 62 | | 51.61% | 48.39% | 100% |
| 7: Welcoming | 44 | 404 | 22 | 260 | 66 | | 66.67% | 33.33% | 100% |
| 8: Safe environment | 26 | 422 | 21 | 261 | 47 | | 55% | 45% | 100% |
| 9: Support | 224 | 224 | 159 | 123 | 383 | | 58.49% | 41.51% | 100% |
| 10: Explore limits | 14 | 434 | 9 | 273 | 23 | | 60.87% | 39.13% | 100% |
| 11: Autonomy | 57 | 391 | 38 | 244 | 95 | | 60.00% | 40.00% | 100% |
| 12: Micromanage, hands-on | 7 | 441 | 8 | 274 | 15 | | 47% | 53% | 100% |
| 13: Independence | 103 | 345 | 69 | 213 | 172 | | 59.88% | 40.12% | 100% |
| 14: Feedback | 138 | 310 | 88 | 194 | 226 | | 61% | 39% | 100% |
| 15: Feedback - neg | 15 | 433 | 10 | 272 | 25 | | 60.00% | 40.00% | 100% |
| 16: Personality | 10 | 438 | 4 | 278 | 14 | | 71.43% | 28.57% | 100% |
| 17: Personality characteristics | 0 | 448 | 0 | 282 | 0 | | 0.00% | 0.00% | 0% |
| 18: Friendly | 56 | 392 | 29 | 253 | 85 | | 65.88% | 34.12% | 100% |
| 19: Intimidating | 5 | 443 | 4 | 278 | 9 | | 55.56% | 44.44% | 100% |
| 20: Not intimidating | 11 | 437 | 7 | 275 | 18 | | 61% | 39% | 100% |
| 21: Kind | 86 | 362 | 60 | 222 | 146 | | 59% | 41% | 100% |
| 22: Caring | 32 | 416 | 27 | 255 | 59 | | 54.24% | 45.76% | 100% |
| 23: Empathic | 15 | 433 | 10 | 272 | 25 | | 60.00% | 40.00% | 100% |
| 24: Warm | 9 | 439 | 11 | 271 | 20 | | 45.00% | 55.00% | 100% |
| 25: Cold | 2 | 446 | 1 | 281 | 3 | | 66.67% | 33.33% | 100% |
| 26: Belittling, condescending etc | 5 | 443 | 4 | 278 | 9 | | 55.56% | 44.44% | 100% |
| 27: Enthusiastic | 48 | 400 | 37 | 245 | 85 | | 56.47% | 43.53% | 100% |
| 28: Eager | 13 | 435 | 6 | 276 | 19 | | 68.42% | 31.58% | 100% |
| 29: Fun, exciting | 69 | 379 | 32 | 250 | 101 | | 68.32% | 31.68% | 100% |
| 30: Sense of humour, funny | 23 | 425 | 7 | 275 | 30 | χ² = 3.099, p < 0.079 | 77% | 23% | 100% |
| 31: Person | 58 | 390 | 28 | 254 | 86 | | 67.44% | 32.56% | 100% |
| 32: Human | 11 | 437 | 3 | 279 | 14 | | 79% | 21% | 100% |
| 33: Respect | 105 | 343 | 68 | 214 | 173 | | 60.69% | 39.31% | 100% |
| 34: Disrespect | 0 | 448 | 1 | 281 | 1 | | 0.00% | 100.00% | 100% |
| 35: Learner | 50 | 398 | 38 | 244 | 88 | | 56.82% | 43.18% | 100% |
| 36: Learning-top | 0 | 448 | 0 | 282 | 0 | | 0.00% | 0.00% | 0% |
| 37: Learning | 213 | 235 | 138 | 144 | 351 | | 60.68% | 39.32% | 100% |
| 38: Learned | 83 | 365 | 47 | 235 | 130 | | 63.85% | 36.15% | 100% |
| 39: Teacher | 324 | 124 | 198 | 84 | 522 | | 62.07% | 37.93% | 100% |
| 40: Teaching | 20 | 428 | 11 | 271 | 31 | | 64.52% | 35.48% | 100% |
| 41: Educator | 35 | 413 | 32 | 250 | 67 | | 52.24% | 47.76% | 100% |
| 42: Attending | 49 | 399 | 20 | 262 | 69 | χ² = 2.99, df = 1, p = 0.084 | 71.01% | 28.99% | 100% |
| 43: Supervisor | 75 | 373 | 41 | 241 | 116 | | 64.66% | 35.34% | 100% |
| 44: Doctor | 25 | 423 | 8 | 274 | 33 | χ² = 3.018, df = 1, p = 0.082 | 76% | 24% | 100% |
| 45: Dr | 53 | 395 | 26 | 256 | 79 | | 67.09% | 32.91% | 100% |
| 46: Physician | 108 | 340 | 66 | 216 | 174 | | 62.07% | 37.93% | 100% |
| 47: Clinician | 40 | 408 | 29 | 253 | 69 | | 57.97% | 42.03% | 100% |
| 48: Positive Adjectives | 0 | 448 | 0 | 282 | 0 | | 0.00% | 0.00% | 0% |
| 49: Good | 189 | 259 | 110 | 172 | 299 | | 63.21% | 36.79% | 100% |
| 50: Excellent | 290 | 158 | 173 | 109 | 463 | | 62.63% | 37.37% | 100% |
| 51: Exemplary | 57 | 391 | 27 | 255 | 84 | | 67.86% | 32.14% | 100% |
| 52: Outstanding | 71 | 377 | 44 | 238 | 115 | | 61.74% | 38.26% | 100% |
| 53: Exceptional | 109 | 339 | 60 | 222 | 169 | | 64.50% | 35.50% | 100% |
| 54: Role Model | 159 | 289 | 113 | 169 | 272 | | 58.46% | 41.54% | 100% |
| 55: Evidence | 47 | 401 | 24 | 258 | 71 | | 66.20% | 33.80% | 100% |
| 56: Bedside manner | 40 | 408 | 23 | 259 | 63 | | 63.49% | 36.51% | 100% |
| 57: Time - pos | 163 | 285 | 109 | 173 | 272 | | 60% | 40% | 100% |
| 58: Time - neg | 28 | 420 | 20 | 262 | 48 | | 58.33% | 41.67% | 100% |
| 59: Efficient | 82 | 366 | 39 | 243 | 121 | | 67.77% | 32.23% | 100% |
| 60: Inefficient | 6 | 442 | 3 | 279 | 9 | | 66.67% | 33.33% | 100% |
| 61: Disorganized | 3 | 445 | 3 | 279 | 6 | | 50% | 50% | 100% |
| Total (unique) | 437 | | 276 | | 713 | | 61.29% | 38.71% | 100% |
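The percentage columns in the Appendix appear to give each gender's share of the teachers who had a given code attached (for example, 148 of 220 for “Available”). A minimal sketch of that calculation follows, under that assumption; the helper name `gender_share` and the choice of example rows are illustrative, not taken from the authors' workflow.

```python
def gender_share(men_with_code, women_with_code):
    """Return (men %, women %) of all teachers who had the code attached.
    Codes attached to nobody are reported as (0.0, 0.0)."""
    total = men_with_code + women_with_code
    if total == 0:
        return 0.0, 0.0
    men_pct = round(100 * men_with_code / total, 2)
    women_pct = round(100 * women_with_code / total, 2)
    return men_pct, women_pct


# Share of men among all 730 teachers, for comparison with each code's share.
# (The appendix header lists this baseline as 61.36%.)
baseline_men_pct = round(100 * 448 / 730, 2)

# (men with code, women with code) for a few rows of the Appendix.
rows = {
    "Available": (148, 72),
    "Attending": (49, 20),
    "Warm": (9, 11),
}

shares = {code: gender_share(m, w) for code, (m, w) in rows.items()}

for code, (men_pct, women_pct) in shares.items():
    print(f"{code}: men {men_pct}%, women {women_pct}% "
          f"(baseline men {baseline_men_pct}%)")
```

Comparing a code's share with the baseline proportion of men teachers gives a quick sense of which terms are over- or under-represented for one gender, although only the chi-squared tests indicate whether such a difference is statistically significant.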

About this article

Cite this article

Ginsburg, S., Stroud, L., Lynch, M. et al. Beyond the ratings: gender effects in written comments from clinical teaching assessments. Adv in Health Sci Educ 27, 355–374 (2022). https://doi.org/10.1007/s10459-021-10088-1

  • DOI: https://doi.org/10.1007/s10459-021-10088-1
