Student ratings of teaching play a significant role in career outcomes for higher education instructors. Although instructor gender has been shown to play an important role in influencing student ratings, the extent and nature of that role remains contested. While difficult to separate gender from teaching practices in person, it is possible to disguise an instructor’s gender identity online. In our experiment, assistant instructors in an online class each operated under two different gender identities. Students rated the male identity significantly higher than the female identity, regardless of the instructor’s actual gender, demonstrating gender bias. Given the vital role that student ratings play in academic career trajectories, this finding warrants considerable attention.
This is a preview of subscription content, access via your institution.
Buy single article
Instant access to the full article PDF.
Tax calculation will be finalised during checkout.
Subscribe to journal
Immediate online access to all issues from 2019. Subscription will auto renew annually.
Tax calculation will be finalised during checkout.
To clarify the language we use throughout the paper, we refer to all three persons responsible for grading and directly interacting with students as “instructors.” The course “professor” was the person responsible for course design and content preparation, while the two “assistant instructors” worked under the professor’s direction to manage and teach their respective discussion groups.
A one-way ANOVA test confirmed that there was no significant variation among all six groups’ discussion board grades and overall grades for the course.
We acknowledge that the application of parametric analytical techniques (ANOVA, MANOVA, and t-tests) to ordinal data (the Likert scale responses) remains controversial among social scientists and statisticians. (See Knapp (1990) for a relatively balanced review of the debate.) We side with the arguments of Gaito (1980) and Armstrong (1981) and argue that it is appropriate to do so in our case as the concept being measured is interval, even if the data labels are not. This practice is common within higher education research. (e.g. Centra & Gaubatz  Young, Rush, & Shaw ; Basow ; and Knol et al. )
While we acknowledge that a significance level of .05 is conventional in social science and higher education research, we side with Skipper, Guenther, and Nass (1967), Labovitz (1968), and Lai (1973) in pointing out the arbitrary nature of conventional significance levels. Considering our study design, we have used a significance level of .10 for some tests where: 1) the results support the hypothesis and we are consequently more willing to reject the null hypothesis of no difference; 2) our hypothesis is strongly supported theoretically and by empirical results in other studies that use lower significance levels; 3) our small n may be obscuring large differences; and 4) the gravity of an increased risk of Type I error is diminished in light of the benefit of decreasing the risk of a Type II error (Labovitz, 1968; Lai, 1973).
Abrami, P. C., d’Apollonia, S., & Rosenfield, S. (2007). The dimensionality of student ratings of instruction: What we know and what we do not. In R. P. Perry & J. C. Smart (Eds.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 385–445). Dordrecht, The Netherlands: Springer.
Acker, J. (1990). Hierarchies, job, and bodies: A theory of gendered organizations. Gender and Society, 4, 81–95.
Andersen, K., & Miller, E. D. (1997). Gender and student evaluations of teaching. Ps-Political Science and Politics, 30, 216–219.
Armstrong, G. D. (1981). Parametric statistics and ordinal data: A pervasive misconception. Nursing Research, 30, 60–62.
Bachen, C. M., McLoughlin, M. M., & Garcia, S. S. (1999). Assessing the role of gender in college students' evaluations of faculty. Communication Education, 48, 193–210.
Basow, S. A. (1995). Student evaluations of college professors: When gender matters. Journal of Educational Psychology, 87, 656–665.
Basow, S. A., & Montgomery, S. (2005). Student ratings and professor self-rating of college teaching: Effects of gender and divisional affiliation. Journal of Personnel Evaluation in Education, 18, 91–106.
Basow, S. A., Phelan, J. E., & Capotosto, L. (2006). Gender patterns in college students' choices of their best and worst professors. Psychology of Women Quarterly, 30, 25–35.
Basow, S. A., & Silberg, N. T. (1987). Student evaluations of college professors: Are female and male professors rated differently? Journal of Educational Psychology, 79, 308–314.
Bennett, S. K. (1982). Student perceptions of and expectations for male and female instructors: Evidence relating to the question of gender bias in teaching evaluation. Journal of Educational Psychology, 74, 170–179.
Benton, S. L., & Cashin, W. E. (2014). Student ratings of instruction in college and university courses. In M. B. Paulsen (Ed.), Higher education: Handbook of theory and research (pp. 279–326). Dordrecht, The Netherlands: Springer.
Burns-Glover, A. L., & Veith, D. J. (1995). Revisiting gender and teaching evaluations: Sex still makes a difference. Journal of Social Behavior and Personality, 10, 69–80.
Centra, J. A. (2007). Differences in responses to the student instructional report: Is it bias? Princeton, NJ: Educational Testing Service.
Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of teaching? Journal of Higher Education, 71, 17–33.
Chamberlin, M. S., & Hickey, J. S. (2001). Student evaluations of faculty performance: The role of gender expectations in differential evaluations. Educational Research Quarterly, 25, 3–14.
Curtis, J. W. (2011). Persistent inequity: Gender and academic employment. Report from the American Association of University Professors. Retrieved from http://www.aaup.org/NR/rdonlyres/08E023AB-E6D8-4DBD-99A0-24E5EB73A760/0/persistent_inequity.pdf
Dalmia, S., Giedeman, D. C., Klein, H. A., & Levenburg, N. M. (2005). Women in academia: An analysis of their expectations, performance and pay. Forum on Public Policy, 1, 160–177.
Davis, B. G. (2009). Tools for teaching (2nd ed.). San Francisco, CA: Jossey-Bass.
Feldman, K. A. (1992). College students’ views of male and female college teachers: Evidence from the social laboratory and experiments – Part 1. Research in Higher Education, 33, 317–375.
Feldman, K. A. (1993). College students’ views of male and female college teachers: Evidence from the social laboratory and experiments – Part 2. Research in Higher Education, 34, 151–211.
Gaito, J. (1980). Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87, 564–567.
Garson, G. D. (2012). General linear models: Multivariate GLM & MANOVA/MANCOVA. Asheboro, NC: Statistical Associates.
Goldberg, P. (1968). Are women prejudiced against women? Trans-action, 5, 28–30.
Greenwald, A. G. (1997). Validity concerns and usefulness of student ratings of instruction. American Psychologist, 52, 1182–1186.
Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis with readings (5th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Hampton, S. E., & Reiser, R. A. (2004). Effects of a theory-based feedback and consultation process on instruction and learning in college classrooms. Research in Higher Education, 45, 497–527.
Johnson, V. E. (2003). Grade inflation: A crisis in college education. New York, NY: Springer.
Johnson, A. (2006). Power, privilege, and difference. Boston, MA: McGraw-Hill.
Knapp, T. R. (1990). Treating ordinal scales as interval scales: An attempt to resolve the controversy. Nursing Research, 39, 121–123.
Knol, M. H., Veld, R., Vorst, H. C. M., van Driel, J. H., & Mellenbergh, G. J. (2013). Experimental effects of student evaluations coupled with collaborative consultation on college professors’ instructional skills. Research in Higher Education, 54, 825–850.
Labovitz, S. (1968). Criteria for selecting a significance level: A note on the sacredness of.05. The American Sociologist, 3, 220–222.
Lai, M.K. (1973). The case against tests of statistical significance. Report from the Teacher Education Division Publication Series. Retrieved from http://files.eric.ed.gov/fulltext/ED093926.pdf
Liu, O. L. (2012). Student evaluation of instruction: In the new paradigm of distance education. Research in Higher Education, 53, 471–486.
Lorber, J. (1994). Paradoxes of gender. New Haven, CT: Yale University Press.
Marsh, H. W. (2001). Distinguishing between good (useful) and bad workloads on students’ evaluations of teaching. American Educational Research Journal, 38, 183–212.
Marsh, H. W. (2007). Students’ evaluations of university teaching: Dimensionality, reliability, validity, potential biases and usefulness. In R. P. Perry & J. C. Smart (Eds.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 319–383). Dordrecht, The Netherlands: Springer.
Miller, J., & Chamberlin, M. (2000). Women are teachers, men are professors: A study of student perceptions. Teaching Sociology, 28, 283–298.
Monroe, K., Ozyurt, S., Wrigley, T., & Alexander, A. (2008). Gender equality in academia: Bad news from the trenches, and some possible solutions. Perspectives on Politics, 6, 215–233.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. Cambridge, MA: Cambridge University Press.
Morris, L. V. (2011). Women in higher education: Access, success, and the future. Innovative Higher Education, 36, 145–147.
Morrison, K., & Johnson, T. (2013). Editorial. Educational Research and Evaluation, 19, 579–584.
Murray, H. G. (2007). Low-inference teaching behaviors and college teaching effectiveness: Recent developments and controversies. In R. P. Perry & J. C. Smart (Eds.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 145–183). Dordrecht, The Netherlands: Springer.
O’Sullivan, P. D., Hunt, S. K., & Lippert, L. R. (2004). Mediated immediacy: A language of affiliation in a technological age. Journal of Language and Social Psychology, 23, 464–490.
Paludi, M. A., & Strayer, L. A. (1985). What’s in an author’s name? Differential evaluations of performance as a function of author’s name. Sex Roles, 12, 353–361.
Perry, R. P., & Smart, J. C. (Eds.). (2007). The scholarship of teaching and learning in higher education: An evidence-based perspective. Dordrecht, The Netherlands: Springer.
Risman, B. J. (2004). Gender as a social structure: Theory wrestling with activism. Gender & Society, 18, 429–450.
Rowden, G. V., & Carlson, R. E. (1996). Gender issues and students' perceptions of instructors' immediacy and evaluation of teaching and course. Psychological Reports, 78, 835–839.
Sandler, B. R. (1991). Women faculty at work in the classroom, or, why it still hurts to be a woman in labor. Communication Education, 40, 6–15.
Sidanius, J., & Crane, M. (1989). Job evaluation and gender: The case of university faculty. Journal of Applied Social Psychology, 19, 174–197.
Simeone, A. (1987). Academic women: Working toward equality. South Hadley, MA: Bergin and Garvey.
Skipper, J. K., Guenther, A. C., & Nass, G. (1967). The sacredness of.05: A note concerning the uses of statistical levels of significance in social science. The American Sociologist, 1, 16–18.
Sprague, J., & Massoni, K. (2005). Student evaluations and gendered expectations: What we can't count can hurt us. Sex Roles, 53, 779–793.
Statham, A., Richardson, L., & Cook, J. A. (1991). Gender and university teaching: A negotiated difference. Albany, NY: State University of New York Press.
Subramanya, S. R. (2014). Toward a more effective and useful end-of-course evaluation scheme. Journal of Research in Innovative Teaching, 7, 143–157.
Svanum, S., & Aigner, C. (2011). The influences of course effort, mastery and performance goals, grade expectancies, and earned course grades on student ratings of course satisfaction. British Journal of Educational Psychology, 81, 667–679.
Svinicki, M., & McKeachie, W. J. (2010). McKeachie’s teaching tips: Strategies, research, and theory for college and university teachers (13th ed.). Belmont, CA: Wadsworth.
Theall, M., Abrami, P. C., & Mets, L. A. (Eds.). (2001). The student ratings debate: Are they valid? How can we best use them? San Francisco, CA: Jossey-Bass.
West, C., & Zimmerman, D. H. (1987). Doing gender. Gender & Society, 1, 125–151.
Young, S., Rush, L., & Shaw, D. (2009). Evaluating gender bias in ratings of university instructors' teaching effectiveness. International Journal of Scholarship of Teaching and Learning, 3, 1–14.
About this article
Cite this article
MacNell, L., Driscoll, A. & Hunt, A.N. What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching. Innov High Educ 40, 291–303 (2015). https://doi.org/10.1007/s10755-014-9313-4
- gender inequality
- gender bias
- student ratings of teaching
- student evaluations of instruction