Abstract
Virtually every institution of higher education in the US uses some type of student teaching evaluation (STE) instrument as a means of assessing instructors’ instructional performance in courses. Unfortunately, many administrators and faculty misinterpret STE ratings. Therefore, the present article provides a comprehensive critique of STE instruments. In particular, we build on Messick’s (1989, 1995) conceptualization of validity to yield what we refer to as a meta-validity model that subdivides content-, criterion-, and construct-related validity into several areas of evidence. We use our meta-validity model to conduct a meta-validity analysis of STEs. Specifically, we assess the score-validity of STEs based on findings from the extant literature. We conclude that strong evidence has been provided with respect to areas of criterion-related validity; however, for the most part, weak or inadequate evidence has been provided with regard to areas of both content-related and construct-related validity. This seriously calls into question both the score-validity and utility of STEs.
References
Aleamoni L.M. (1981). The use of student evaluations in the improvement of instruction. NACTA J. 20: 16
Alreck P.L. and Settle R.B. (1985). The Survey Research Handbook. Irwin, Homewood, IL
Ambady N. and Rosenthal R. (1992). Half a minute. Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. J. Pers. Soc. Psychol. 64: 431–441
American Educational Research Association, American Psychological Association and National Council on Measurement in Education (1985). Standards for Educational and Psychological Testing. American Psychological Association, Washington
American Educational Research Association, American Psychological Association and National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing (Rev edn). American Educational Research Association, Washington
Babad E. (2001). Students’ course selection: differential considerations for first and last course. Res. High. Educ. 42: 469–492
Blackburn R.T. and Clark M.J. (1975). An assessment of faculty performance. Some correlates between administrators, colleagues, students and self-ratings. Sociol. Educ. 48: 242–256
Braskamp L.A., Brandenberg D.C. and Ory J.C. (1984). Evaluating teaching effectiveness: a practical guide. Sage, Beverly Hills, CA
Campbell D.T. and Fiske D.W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychol. Bull. 56: 81–105
Centra J.A. (1974). The relationship between student and alumni ratings of teachers. Educ. Psychol. Meas. 34: 321–326
Centra J.A. (1976). The influence of different directions on student ratings of instruction. J. Educ. Meas. 13: 277–282
Centra J.A. (1993). Reflective faculty evaluation. Jossey-Bass, San Francisco, CA
Centra J.A. and Creech F.R. (1976). The relationship between student, teacher, and course characteristics and student ratings of teacher effectiveness. Project Report 76–1. Educational Testing Service, Princeton, NJ
Cohen P.A. (1980). Effectiveness of student-rating feedback for improving college instruction: a meta-analysis of findings. Res. High. Educ. 13: 321–341
Cohen P.A. (1981). Student ratings of instruction and student achievement: a meta-analysis of multisection validity studies. Rev. Educ. Res. 51: 281–309
Crocker L. and Algina J. (1986). Introduction to classical and modern test theory. Holt, Rinehart, & Winston, Orlando, FL
D’Apollonia S. and Abrami P.C. (1997). Navigating student ratings of instruction. Am. Psychol. 52(11): 1198–1208
Dommeyer C.J., Baum P., Chapman K.S. and Hanna R.W. (2002). Attitudes of business faculty towards two methods of collecting teaching evaluations. Paper vs. online. Assess. Eval. High. Educ. 27: 455–462
Doyle K.O. and Crichton L.A. (1978). Student, peer and self-evaluations of college instruction. J. Educ. Psychol. 70: 815–826
Feldman K.A. (1978). Course characteristics and college students’ ratings of their teachers and courses: what we know and what we don’t. Res. High. Educ. 9: 199–242
Feldman K.A. (1979). The significance of circumstances for college students’ ratings of their teachers and courses: a review and analysis. Res. High. Educ. 10: 149–172
Feldman K.A. (1984). Class size and college students’ evaluations of teachers and courses: a closer look. Res. High. Educ. 21: 45–116
Feldman K.A. (1989). The association between student ratings of specific instructional dimensions and student achievement: refining and extending the synthesis of data from multisection validity studies. Res. High. Educ. 30: 583–645
Fernald P.S. (1990). Students’ ratings of instruction: standardized and customized. Teach. Psychol. 17: 105–109
Gray M. and Bergmann B.R. (2003). Student teaching evaluations: inaccurate, demeaning, misused. Academe 89(5): 44–46
Greenwald A.G. and Gillmore G.M. (1997). Grading leniency is a removable contaminant of student ratings. Am. Psychol. 52: 1209–1217
Guthrie E.R. (1954). The evaluation of teaching: a progress report. University of Washington, Seattle, WA
Haskell R.E. (1997). Academic freedom, tenure, and student evaluation of faculty: galloping polls in the 21st century. Educ. Policy Anal. Arch. 5(6). Retrieved February 3, 2005, from http://epaa.asu.edu/epaa/v5n6.html
Howard G.S., Conway C.G. and Maxwell S.E. (1985). Construct validity of measures of college teaching effectiveness. J. Educ. Psychol. 77: 187–196
Kierstead D.P., D’Agostino P. and Dill H. (1988). Sex role stereotyping of college professors: bias in students’ ratings of instructors. J. Educ. Psychol. 80: 342–344
Kulik J.A. (2001). Student ratings: validity, utility and controversy. New Dir. Inst. Res. 109: 9–25
Lombardo J.P. and Tocci M.E. (1979). Attribution of positive and negative characteristics of instructors as a function of attractiveness and sex of instructor and sex of subject. Percept. Mot. Skills 48: 491–494
Marsh H.W. (1984). Students’ evaluations of university teaching: dimensionality, reliability, validity, potential biases and utility. J. Educ. Psychol. 76: 707–754
Marsh H.W. (1987). Students’ evaluations of university teaching: research findings, methodological issues and directions for future research. Int. J. Educ. Res. 11: 253–388
Marsh H.W. and Bailey M. (1993). Multidimensional students’ evaluations of teaching effectiveness. A profile analysis. J. High. Educ. 64: 1–18
Marsh H.W., Overall J.U. and Kessler S.P. (1979). Validity of student evaluations of instructional effectiveness: a comparison of faculty self-evaluations and evaluations by their students. J. Educ. Psychol. 71: 149–160
Marsh H.W. and Roche L.A. (1993). The use of students’ evaluations and an individually structured intervention to enhance university teaching. Am. Educ. Res. J. 30: 217–251
Marsh H.W. and Roche L.A. (2000). Effects of grading leniency and low workload in students’ evaluations of teaching: popular myth, bias, validity, or innocent bystanders?. J. Educ. Psychol. 92: 202–228
McCallum L.W. (1984). A meta-analysis of course evaluation data and its use in the tenure decision. Res. High. Educ. 21: 150–158
Messick S. (1989). Validity. In: Linn, R.L. (ed.) Educational Measurement, 3rd edn., pp 13–103. Macmillan, Old Tappan, NJ
Messick S. (1995). Validity of psychological assessment: validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Am. Psychol. 50: 741–749
Murray H.G. (1983). Low-inference classroom teaching behaviors and student ratings of college teaching effectiveness. J. Educ. Psychol. 75: 138–149
Naftulin D.H., Ware J.E. and Donnelly F.A. (1973). The Doctor Fox lecture: a paradigm of educational seduction. J. Med. Educ. 48: 630–635
Newport F.J. (1996). Rating teaching in the USA: probing the qualifications of student raters and novice teachers. Assess. Eval. High. Educ. 21(1): 17–21
Onwuegbuzie A.J. and Daniel L.G. (2002). Uses and misuses of the correlation coefficient. Res. Sch. 9(1): 73–90
Onwuegbuzie A.J. and Daniel L.G. (2003). Typology of analytical and interpretational errors in quantitative and qualitative educational research. Curr. Issues Educ. [On-line], 6(2). Available: http://cie.ed.asu.edu/volume6/number2/
Onwuegbuzie A.J. and Weems G.H. (2004). Response categories on rating scales: characteristics of item respondents who frequently utilize midpoint. Res. Sch. 11(1): 51–60
Onwuegbuzie A.J., Witcher A.E., Collins K.M.T., Filer J.D., Wiedmaier C.D. and Moore C.W. (2007). Students’ perceptions of characteristics of effective college teachers: a validity study of a teaching evaluation form using a mixed-methods analysis. Am. Educ. Res. J. 44: 113–160
Ory J.C. (2000). Teaching evaluation: past, present and future. New Dir. Teach. Learn. 83: 13–18
Ory J.C., Braskamp L.A. and Pieper D.M. (1980). The congruency of student evaluative information collected by three methods. J. Educ. Psychol. 72: 181–185
Ory J.C. and Ryan K. (2001). How do student ratings measure up to a new validity framework?. New Dir. Inst. Res. 109: 27–44
Overall J.U. and Marsh H.W. (1979). Midterm feedback from students: its relationship to instructional improvement and students’ cognitive and affective outcomes. J. Educ. Psychol. 71: 856–865
Overall J.U. and Marsh H.W. (1980). Students’ evaluations of instruction: a longitudinal study of their stability. J. Educ. Psychol. 72: 321–325
Penny A.R. (2003). Changing the agenda for research into students’ views about university teaching: four shortcomings of SRT research. Teach. High. Educ. 8(3): 399–411
Peterson K. and Kauchak D. (1982). Teacher evaluation: perspectives, practices and promises. Utah University Center for Educational Practice, Salt Lake City, Utah
Rodin M. and Rodin B. (1972). Student evaluations of teachers. Science 177: 1164–1166
Schmelkin L.P., Spencer K.J. and Gellman E.S. (1997). Faculty perspectives on course and teacher evaluations. Res. High. Educ. 38(5): 575–592
Seldin P. (1984). Changing practices in faculty evaluation. Jossey-Bass, San Francisco, CA
Seldin P. (1993). The use and abuse of student ratings of professors. Chron. High. Educ. 21: A40
Shapiro E.G. (1990). Effects of instructor and class characteristics on students’ class evaluations. Res. High. Educ. 31: 135–148
Simmons T.L. (1996). Student evaluation of teachers: professional practice or punitive policy?. JALT Test. Eval. N-SIG Newsl. 1(1): 12–16
Spencer K.J. and Schmelkin L.P. (2002). Student perspectives on teaching and its evaluation. Assess. Eval. High. Educ. 27: 397–409
Sun A. and Valiga M.J. (1997). Using generalizability theory to assess the reliability of student ratings of academic advising. J. Exp. Educ. 65: 367–379
Theall M. and Franklin J. (2001). Looking for bias in all the wrong places: a search for truth or a witch hunt in student ratings of instruction?. New Dir. Inst. Res. 109: 45–56
Washburn K. and Thornton J.F. (eds) (1996). Dumbing down: essays on the strip mining of American culture. W.W. Norton & Company, New York
Weems G.H. and Onwuegbuzie A.J. (2001). The impact of midpoint responses and reverse coding on survey data. Meas. Eval. Couns. Dev. 34: 166–176
Williams W.M. and Ceci S.J. (1997). How’m I doing? Problems with student ratings of instructors and courses. Change 29(5): 13–23
Cite this article
Onwuegbuzie, A.J., Daniel, L.G. & Collins, K.M.T. A meta-validation model for assessing the score-validity of student teaching evaluations. Qual Quant 43, 197–209 (2009). https://doi.org/10.1007/s11135-007-9112-4
DOI: https://doi.org/10.1007/s11135-007-9112-4