Abstract
This study aimed to demonstrate how one university worked to overcome some of the measurement problems associated with legacy student rating instruments through the creation and investigation of a new student rating instrument based on the most current scholarship on teaching and learning. Measurement problems with legacy instruments include asking about consumer satisfaction (including the use of global ratings) rather than directly assessing the quality of teaching, asking students for self-reports of learning, and asking students to make judgments about the internal state of their instructors. A new instrument was created to intentionally reduce these problems. The new instrument and its predecessor were both administered by 54 instructors in 81 classes and completed by 2,013 students. The following semester, the new instrument was administered university-wide to 58,320 students across 3,669 classes taught by 1,450 instructors. The findings indicate that the new instrument created by this process is both reliable and valid but does not reflect multidimensionality. There is also no compelling evidence of bias according to gender or race. This work illustrates the process by which new and better instruments might be created and tested in order to replace flawed legacy instruments in higher education.
Data Availability
These data belong to the institution and are not available for use. The instrument used is available at https://sites.google.com/mail.fresnostate.edu/fresno-state-sri/home.
Funding
No funding was received to conduct this research.
Ethics declarations
Competing interests
The authors have no competing interests to declare.
Ethics approval and consent to participate
This research was approved by the Committee for the Protection of Human Subjects at California State University, Fresno as minimal risk.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dyer, K., Donnelly-Hermosillo, D. Student Ratings of Instruction: Updating Measures to Reflect Recent Scholarship. Res High Educ (2024). https://doi.org/10.1007/s11162-024-09804-8