Evaluating Student Evaluations of Teaching: a Review of Measurement and Equity Bias in SETs and Recommendations for Ethical Reform

Abstract

Student evaluations of teaching (SETs) are ubiquitous in the academe as a metric for assessing teaching and are frequently used in critical personnel decisions. Yet there is ample evidence documenting both measurement and equity bias in these assessments. SETs have low or no correlation with learning. Furthermore, scholars using different data and different methodologies routinely find that women faculty, faculty of color, and other marginalized groups are subject to a disadvantage in SETs. Extant research on bias in teaching evaluations tends to review only the aspect of the literature most pertinent to that study. In this paper, we review a novel dataset of over 100 articles on bias in student evaluations of teaching and provide a nuanced review of this broad but established literature. We find that women and other marginalized groups do face significant biases in standard evaluations of teaching; however, the effect of gender is conditional upon other factors. We conclude with recommendations for the judicious use of SETs and avenues for future research.

Notes

  1. For now, the entirety of this discussion and related research is binary in its orientation. We recognize that gender is more complex than women and men and acknowledge that gender identity that does not overtly conform to the binary likely complicates evaluations of teaching further than the existing body of knowledge has even identified.

  2. A full list of articles and article summaries is available at < redacted >.

  3. Though see Basow and Montgomery (2005), which finds no significant interactions between student and faculty gender.

  4. Research also finds that the role of attractiveness is more relevant to women, who are more likely to get comments about their appearance (Mitchell & Martin, 2018; Key & Ardoin, 2019). This is problematic given that attractiveness has been shown to be correlated with evaluations of instructional quality (Rosen, 2018).

References

  1. Abel, M. H., & Meltzer, A. L. (2007). Student ratings of a male and female professors’ lecture on sex discrimination in the workforce. Sex Roles, 57(3–4), 173–180

  2. Abrami, P. C. (2001). Improving judgments about teaching effectiveness using teacher rating forms. New Directions for Institutional Research, 2001(109), 59–87

  3. Adams, M. J. D., & Umbach, P. D. (2012). Nonresponse and online student evaluations of teaching: understanding the influence of salience, fatigue, and academic environments. Research in Higher Education, 53(5), 576–591

  4. Anderson, K. J. (2010). Students’ stereotypes of professors: An exploration of the double violations of ethnicity and gender. Social Psychology of Education, 13(4), 459–472

  5. Anderson, K. J., & Kanner, M. (2011). Inventing a gay agenda: Students’ perceptions of lesbian and gay professors. Journal of Applied Social Psychology, 41(6), 1538–1564

  6. Anderson, K. J., & Smith, G. (2005). Students’ preconceptions of professors: Benefits and barriers according to ethnicity and gender. Hispanic Journal of Behavioral Sciences, 27(2), 184–201

  7. Aguirre Jr, A. (2000). Women and minority faculty in the academic workplace: Recruitment, retention, and academic culture. ASHE-ERIC Higher Education Report, 27(6). San Francisco: Jossey-Bass

  8. APSA. (2011). Political science in the 21st century: Report of the task force on political science in the 21st century

  9. Arbuckle, J., & Williams, B. D. (2003). Students’ perceptions of expressiveness: Age and gender effects on teacher evaluations. Sex Roles, 49(9–10), 507–516

  10. Arreola, R. A. (2004). Developing a comprehensive faculty evaluation system. Magna Publications

  11. Bachen, C. M., McLoughlin, M. M., & Garcia, S. S. (1999). Assessing the role of gender in college students’ evaluations of faculty. Communication Education, 48(3), 193–210

  12. Baker, P., & Copp, M. (1997). Gender matters most: the interaction of gendered expectations, feminist course content, and pregnancy in student course evaluations. Teaching Sociology: 29–43

  13. Barbezat, D. A., & Hughes, J. W. (2005). Salary structure effects and the gender pay gap in academia. Research in Higher Education, 46(6), 621–640.

  14. Bos, A. L., Sweet-Cushman, J., & Schneider, M. C. (2019). Family-friendly academic conferences: a missing link to fix the “leaky pipeline”? Politics, Groups, and Identities, 7(3), 748–758.

  15. Basow, S. A., & Distenfeld, M. S. (1985). Teacher expressiveness: More important for male teachers than female teachers? Journal of Educational Psychology, 77(1), 45

  16. Basow, S. A., & Howe, K. G. (1987). Evaluations of college professors: Effects of professors’ sex-type, and sex, and students’ sex. Psychological Reports, 60(2), 671–678

  17. Basow, S. A. (1995). Student evaluations of college professors: When gender matters. Journal of Educational Psychology, 87(4), 656

  18. Basow, S. A. (2000). Best and worst professors: Gender patterns in students’ choices. Sex Roles, 43(5–6), 407–417

  19. Basow, S. A., & Montgomery, S. (2005). Student ratings and professor self-ratings of college teaching: Effects of gender and divisional affiliation. Journal of Personnel Evaluation in Education, 18(2), 91–106

  20. Basow, S. A., & Silberg, N. T. (1987). Student evaluations of college professors: Are female and male professors rated differently? Journal of Educational Psychology, 79(3), 308

  21. Bennett, S. K. (1982). Student perceptions of and expectations for male and female instructors: Evidence relating to the question of gender bias in teaching evaluation. Journal of Educational Psychology, 74(2), 170

  22. Benton, S. L., & Cashin, W. E. (2012). Student ratings of teaching: a summary of research and literature (IDEA Paper no. 50). Manhattan, KS: The IDEA Center

  23. Bian, L., Leslie, S.-J., & Cimpian, A. (2017). Gender stereotypes about intellectual ability emerge early and influence children’s interests. Science, 355(6323), 389–391

  24. Boring, A. (2017). Gender biases in student evaluations of teaching. Journal of Public Economics, 145, 27–41

  25. Boring, A., Ottoboni, K., & Stark, P. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research

  26. Bray, J. H., & Howard, G. S. (1980). Interaction of teacher and student sex and sex role orientations and student evaluations of college instruction. Contemporary Educational Psychology, 5(3), 241–248

  27. Burns-Glover, A. L., & Veith, D. J. (1995). Revisiting gender and teaching evaluations: Sex still makes a difference. Journal of Social Behavior and Personality, 10(4), 69

  28. Centra, J. A. (2000). Evaluating the Teaching Portfolio: A Role for Colleagues. New Directions for Teaching and Learning, 83, 87–93

  29. Centra, J. A., & Gaubatz, N. B. (1998). Is there gender bias in student ratings of instruction? Journal of Higher Education, 70, 17–33

  30. Chamberlin, M. S., & Hickey, J. S. (2001). Student evaluations of faculty performance: The role of gender expectations in differential evaluations. Educational Research Quarterly, 25(2), 3

  31. Chapman, D. D., & Joines, J. A. (2017). Strategies for Increasing Response Rates for Online End-of-Course Evaluations. International Journal of Teaching and Learning in Higher Education, 29(1), 47–60

  32. Chávez, K., & Mitchell, K. M. (2020). Exploring bias in student evaluations: Gender, race, and ethnicity. PS: Political Science & Politics, 53(2), 270–274

  33. Chism, N. V. N. (2007). Peer review of teaching: A sourcebook. Bolton, Massachusetts: Anker

  34. Eagly, A. H., & Karau, S. J. (2002). Role congruity theory of prejudice toward female leaders. Psychological Review, 109(3), 573

  35. El-Alayli, A., Hansen-Brown, A. A., & Ceynar, M. (2018). Dancing backwards in high heels: Female professors experience more work demands and special favor requests, particularly from academically entitled students. Sex Roles, 79(3–4), 136–150

  36. Elmore, P. B., & LaPointe, K. A. (1974). Effects of teacher sex and student sex on the evaluation of college instructors. Journal of Educational Psychology, 66(3), 386.

  37. Elmore, P. B., & LaPointe, K. A. (1975). Effect of teacher sex, student sex, and teacher warmth on the evaluation of college instructors. Journal of Educational Psychology, 67(3), 368

  38. Esarey, J., & Valdes, N. (2020). Unbiased, reliable, and valid student evaluations can still be unfair. Assessment & Evaluation in Higher Education

  39. Ewing, V. L., Stukas Jr, A. A., & Sheehan, E. P. (2003). Student prejudice against gay male and lesbian lecturers. The Journal of Social Psychology, 143(5), 569–579

  40. Fan, Y., Shepherd, L. J., Slavich, E., Waters, D., Stone, M., Abel, R., & Johnston, E. L. (2019). Gender and cultural bias in student evaluations: Why representation matters. PLoS One, 14(2), e0209749

  41. Feldman, K. A. (1992). College students’ views of male and female college teachers: Part I—Evidence from the social laboratory and experiments. Research in Higher Education, 33(3), 317–375

  42. Fischer, E., & Hänze, M. (2019). Bias hypotheses under scrutiny: investigating the validity of student assessment of university teaching by means of external observer ratings. Assessment & Evaluation in Higher Education, 44(5), 772–786

  43. Franklin, J. (2001). Interpreting the numbers: Using a narrative to help others read student evaluations of your teaching accurately. New Directions for Teaching and Learning, 87, 85–100

  44. Franklin, J., & Theall, M. (1995). The relationship of disciplinary differences and the value of class preparation time to student ratings of teaching. New Directions for Teaching and Learning, 1995(64), 41–48

  45. Freeman, H. R. (1994). Student evaluations of college instructors: Effects of type of course taught, instructor gender and gender role, and student gender. Journal of Educational Psychology, 86(4), 627

  46. Greenwald, A. G., & Gillmore, G. M. (1997). No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology, 89(4), 743

  47. Hamermesh, D. S., & Parker, A. (2005). Beauty in the classroom: Instructors’ pulchritude and putative pedagogical productivity. Economics of Education Review, 24(4), 369–376

  48. Harris, M. B. (1975). Sex role stereotypes and teacher evaluations. Journal of Educational Psychology, 67(6), 751

  49. Ḥaṭiva, N. (2013a). Student ratings of instruction: a practical approach to designing, operating, and reporting. Oron Publications

  50. Ḥaṭiva, N. (2013b). Student ratings of instruction: Recognizing effective teaching. Oron Publications

  51. Hessler, M., Pöpping, D. M., Hollstein, H., Ohlenburg, H., Arnemann, P. H., Massoth, C., et al. (2018). Availability of cookies during an academic course session affects evaluation of teaching. Medical Education, 52(10), 1064–1072

  52. Himelein, M. J. (2018). Pitfalls of using student comments in the evaluation of faculty. Academic Briefing: Expert Advice for Higher Ed Leaders. https://www.academicbriefing.com/human-resources/faculty-evaluation/pitfalls-of-using-student-comments-evaluation-of-faculty/

  53. Kaschak, E. (1978). Sex bias in student evaluations of college professors. Psychology of Women Quarterly, 2(3), 235–243

  54. Kaschak, E. (1981). Another look at sex bias in students’ evaluations of professors: Do winners get the recognition that they have been given? Psychology of Women Quarterly, 5(5_suppl), 767–772

  55. Key, E., & Ardoin, P. (2019). Students rate male instructors more highly than female instructors. We tried to counter that hidden bias. Washington Post. Accessed 3 Sep 2019. https://www.washingtonpost.com/politics/2019/08/20/students-rate-male-instructors-more-highly-than-female-instructors-we-tried-counter-that-hidden-bias/

  56. Kierstead, D., D’agostino, P., & Dill, H. (1988). Sex role stereotyping of college professors: Bias in students’ ratings of instructors. Journal of Educational Psychology, 80(3), 342

  57. Leslie, S.-J., Cimpian, A., Meyer, M., & Freeland, E. (2015). Expectations of brilliance underlie gender distributions across academic disciplines. Science, 347(6219), 262–265

  58. Lindahl, M. W., & Unger, M. L. (2010). Cruelty in student teaching evaluations. College Teaching, 58(3), 71–76

  59. Linse, A. R. (2017). Interpreting and using student ratings data: Guidance for faculty serving as administrators and on evaluation committees. Studies in Educational Evaluation, 54, 94–106

  60. MacNell, L., Driscoll, A., & Hunt, A. N. (2015). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291–303

  61. Marsh, H. W. (1980). Research on students’ evaluations of teaching effectiveness. Instructional Evaluation, 4(5), 5–13

  62. Marsh, H. W. (1982a). Factors affecting students’ evaluations of the same course taught by the same instructor on different occasions. American Educational Research Journal, 19(4), 485–497

  63. Marsh, H. W. (1982b). Validity of students’ evaluations of college teaching: A multitrait–multimethod analysis. Journal of Educational Psychology, 74(2), 264

  64. Marsh, H. W. (1984). Students’ evaluations of university teaching: Dimensionality, reliability, validity, potential biases, and utility. Journal of Educational Psychology, 76(5), 707

  65. Martin, E. (1984). Power and authority in the classroom: Sexist stereotypes in teaching evaluations. Signs: Journal of Women in Culture and Society, 9(3), 482–492

  66. McPherson, M. A., Jewell, R. T., & Kim, M. (2009). What determines student evaluation scores? A random effects analysis of undergraduate economics classes. Eastern Economic Journal, 35(1), 37–51

  67. Mengel, F., Sauermann, J., & Zölitz, U. (2018). Gender bias in teaching evaluations. Journal of the European Economic Association, 17(2), 535–566

  68. Miles, P., & House, D. (2015). The Tail Wagging the Dog; An Overdue Examination of Student Teaching Evaluations. International Journal of Higher Education, 4(2), 116–126

  69. Miller, J., & Seldin, P. (2014). Changing Practices in Faculty Evaluations: Can Better Evaluation Make a Difference? Academe, 100(3), 35–38

  70. Miller, J., & Chamberlin, M. (2000). Women are teachers, men are professors: A study of student perceptions. Teaching Sociology, 28(4), 283

  71. Mitchell, K. M. W., & Martin, J. (2018). Gender bias in student evaluations. PS: Political Science & Politics, 51(3), 648–652

  72. Murray, H. G. (1984). The impact of formative and summative evaluation of teaching in North American universities. Assessment and Evaluation in Higher Education, 9(2), 117–132

  73. Murray, H. G. (1997). Does evaluation of teaching lead to improvement of teaching? The International Journal for Academic Development, 2(1), 8–23.

  74. Perna, L. W. (2005). The benefits of higher education: Sex, racial/ethnic, and socioeconomic group differences. The Review of Higher Education, 29(1), 23–52.

  75. Peterson, D. A. M., Biederman, L. A., Andersen, D., Ditonto, T. M., & Roe, K. (2019). Mitigating gender bias in student evaluations of teaching. PLoS One, 14(5), e0216241

  76. Piatak, J., & Mohr, Z. (2019). More gender bias in academia? Examining the influence of gender and formalization on student worker rule following. Journal of Behavioral Public Administration, 2(2)

  77. Reid, L. D. (2010). The role of perceived race and gender in the evaluation of college teaching on RateMyProfessors.com. Journal of Diversity in Higher Education, 3(3), 137

  78. Ridgeway, C. L. (2011). Framed by gender: How gender inequality persists in the modern world. Oxford University Press

  79. Rivera, L. A., & Tilcsik, A. (2019). Scaling Down Inequality: Rating Scales, Gender Bias, and the Architecture of Evaluation. American Sociological Review, 84(2), 248–274

  80. Rosen, A. S. (2018). Correlations, trends and potential biases among publicly accessible web-based student evaluations of teaching: a large-scale study of RateMyProfessors.com data. Assessment & Evaluation in Higher Education, 43(1), 31–44

  81. Rowden, G. V., & Carlson, R. E. (1996). Gender issues and students’ perceptions of instructors’ immediacy and evaluation of teaching and course. Psychological Reports, 78(3), 835–839

  82. Seldin, P., Miller, J. E., & Seldin, C. A. (2010). The teaching portfolio: A practical guide to improved performance and promotion/tenure decisions. John Wiley & Sons

  83. Sidanius, J., & Crane, M. (1989). Job evaluation and gender: The case of university faculty. Journal of Applied Social Psychology, 19(2), 174–197

  84. Sinclair, L., & Kunda, Z. (2000). Motivated stereotyping of women: She’s fine if she praised me but incompetent if she criticized me. Personality and Social Psychology Bulletin, 26(11), 1329–1342.

  85. Smith, B. P., & Hawkins, B. (2011). Examining student evaluations of black college faculty: does race matter? Journal of Negro Education, 80(2)

  86. Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598–642

  87. Sprague, J., & Massoni, K. (2005). Student evaluations and gendered expectations: What we can’t count can hurt us. Sex Roles, 53(11–12), 779–793

  88. Stark, P., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen. Center for Teaching and Learning, University of California, Berkeley. Retrieved https://www.scienceopen.com/document

  89. Storage, D., Horne, Z., Cimpian, A., & Leslie, S.-J. (2016). The frequency of “brilliant” and “genius” in teaching evaluations predicts the representation of women and African Americans across fields. PLoS One, 11(3), e0150194

  90. Subtirelu, N. C. (2015). “She does have an accent but…”: Race and language ideology in students’ evaluations of mathematics instructors on RateMyProfessors.com. Language in Society, 44(1), 35–62

  91. Theall, M., & Franklin, J. (2001). Looking for bias in all the wrong places: A search for truth or a witch hunt in student ratings of instruction? New Directions for Institutional Research, 2001(109), 45–56

  92. Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42

  93. Uttl, B., White, C. A., & Morin, A. (2013). The numbers tell it all: students don’t like numbers! PLoS One, 8(12), e83443

  94. Wachtel, H. K. (1998). Student evaluation of college teaching effectiveness: A brief review. Assessment & Evaluation in Higher Education, 23(2), 191–212

  95. Wagner, N., Rieger, M., & Voorvelt, K. (2016). Gender, ethnicity and teaching evaluations: Evidence from mixed teaching teams. Economics of Education Review, 54, 79–94

  96. Wallace, S. L., Lewis, A. K., & Allen, M. D. (2019). The State of the Literature on Student Evaluations of Teaching and an Exploratory Analysis of Written Comments: Who Benefits Most? College Teaching, 67(1), 1–14

  97. Wallisch, P., & Cachia, J. (2019). Determinants of perceived teaching quality: the role of divergent interpretations of expectations

  98. Wigington, H., Tollefson, N., & Rodriguez, E. (1989). Students’ ratings of instructors revisited: Interactions among class and instructor variables. Research in Higher Education, 30(3), 331–344

  99. Whitworth, J. E., Price, B. A., & Randall, C. H. (2002). Factors that affect college of business student opinion of teaching and learning. Journal of Education for Business, 77(5), 282–289

  100. Wright, S. L., & Jenkins-Guarnieri, M. A. (2012). Student evaluations of teaching: combining the meta-analyses and demonstrating further evidence for effective use. Assessment & Evaluation in Higher Education, 37(6), 683–699

  101. Youmans, R. J., & Jee, B. D. (2007). Fudging the numbers: Distributing chocolate influences student evaluations of an undergraduate course. Teaching of Psychology, 34(4), 245–247

  102. Young, S., Rush, L., & Shaw, D. (2009). Evaluating Gender Bias in Ratings of University Instructors’ Teaching Effectiveness. International Journal for the Scholarship of Teaching and Learning, 3(2), n2

Author information

Corresponding author

Correspondence to Jennie Sweet-Cushman.

Ethics declarations

Conflict of Interest

The authors hereby acknowledge no financial or non-financial conflict of interest.

About this article

Cite this article

Kreitzer, R.J., Sweet-Cushman, J. Evaluating Student Evaluations of Teaching: a Review of Measurement and Equity Bias in SETs and Recommendations for Ethical Reform. J Acad Ethics (2021). https://doi.org/10.1007/s10805-021-09400-w

Keywords

  • Teaching evaluations
  • Gender stereotypes
  • Gender bias
  • Gender