A meta-validation model for assessing the score-validity of student teaching evaluations

  • Original Paper
  • Published in: Quality & Quantity

Abstract

Virtually every institution of higher education in the US uses some type of student teaching evaluation (STE) instrument to assess instructors’ performance in courses. Unfortunately, many administrators and faculty misinterpret STE ratings. Therefore, the present article provides a comprehensive critique of STE instruments. In particular, we build on Messick’s (1989, 1995) conceptualization of validity to yield what we refer to as a meta-validity model that subdivides content-, criterion-, and construct-related validity into several areas of evidence. We use our meta-validity model to conduct a meta-validity analysis of STEs. Specifically, we assess the score-validity of STEs based on findings from the extant literature. We conclude that strong evidence has been provided with respect to areas of criterion-related validity; however, for the most part, weak or inadequate evidence has been provided with regard to areas of both content-related and construct-related validity. This finding seriously calls into question both the score-validity and utility of STEs.

References

  • Aleamoni L.M. (1981). The use of student evaluations in the improvement of instruction. NACTA J. 20: 16

  • Alreck P.L. and Settle R.B. (1985). The Survey Research Handbook. Irwin, Homewood, IL

  • Ambady N. and Rosenthal R. (1992). Half a minute. Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. J. Pers. Soc. Psychol. 64: 431–441

  • American Educational Research Association, American Psychological Association and National Council on Measurement in Education (1985). Standards for Educational and Psychological Testing. American Psychological Association, Washington

  • American Educational Research Association, American Psychological Association and National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing (Rev edn). American Educational Research Association, Washington

  • Babad E. (2001). Students’ course selection: differential considerations for first and last course. Res. High. Educ. 42: 469–492

  • Blackburn R.T. and Clark M.J. (1975). An assessment of faculty performance. Some correlates between administrators, colleagues, students and self-ratings. Sociol. Educ. 48: 242–256

  • Braskamp L.A., Brandenberg D.C. and Ory J.C. (1984). Evaluating teaching effectiveness: a practical guide. Sage, Beverly Hills, CA

  • Campbell D.T. and Fiske D.W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychol. Bull. 56: 81–105

  • Centra J.A. (1974). The relationship between student and alumni ratings of teachers. Educ. Psychol. Meas. 34: 321–326

  • Centra J.A. (1976). The influence of different directions on student ratings of instruction. J. Educ. Meas. 13: 277–282

  • Centra J.A. (1993). Reflective faculty evaluation. Jossey-Bass, San Francisco, CA

  • Centra J.A. and Creech F.R. (1976). The relationship between student, teacher, and course characteristics and student ratings of teacher effectiveness. Project Report 76–1. Educational Testing Service, Princeton, NJ

  • Cohen P.A. (1980). Effectiveness of student-rating feedback for improving college instruction: a meta-analysis of findings. Res. High. Educ. 13: 321–341

  • Cohen P.A. (1981). Student ratings of instruction and student achievement: a meta-analysis of multisection validity studies. Rev. Educ. Res. 51: 281–309

  • Crocker L. and Algina J. (1986). Introduction to classical and modern test theory. Holt, Rinehart, & Winston, Orlando, FL

  • D’Apollonia S. and Abrami P.C. (1997). Navigating student ratings of instruction. Am. Psychol. 52(11): 1198–1208

  • Dommeyer C.J., Baum P., Chapman K.S. and Hanna R.W. (2002). Attitudes of business faculty towards two methods of collecting teaching evaluations. Paper vs. online. Assess. Eval. High. Educ. 27: 455–462

  • Doyle K.O. and Crichton L.A. (1978). Student, peer and self-evaluations of college instruction. J. Educ. Psychol. 70: 815–826

  • Feldman K.A. (1978). Course characteristics and college students’ ratings of their teachers and courses: what we know and what we don’t. Res. High. Educ. 9: 199–242

  • Feldman K.A. (1979). The significance of circumstances for college students’ ratings of their teachers and courses: a review and analysis. Res. High. Educ. 10: 149–172

  • Feldman K.A. (1984). Class size and college students’ evaluations of teachers and courses: a closer look. Res. High. Educ. 21: 45–116

  • Feldman K.A. (1989). The association between student ratings of specific instructional dimensions and student achievement: refining and extending the synthesis of data from multisection validity studies. Res. High. Educ. 30: 583–645

  • Fernald P.S. (1990). Students’ ratings of instruction: standardized and customized. Teach. Psychol. 17: 105–109

  • Gray M. and Bergmann B.R. (2003). Student teaching evaluations: inaccurate, demeaning, misused. Academe 89(5): 44–46

  • Greenwald A.G. and Gillmore G.M. (1997). Grading leniency is a removable contaminant of student ratings. Am. Psychol. 52: 1209–1217

  • Guthrie E.R. (1954). The evaluation of teaching: a progress report. University of Washington, Seattle, WA

  • Haskell R.E. (1997). Academic freedom, tenure, and student evaluation of faculty: galloping polls in the 21st century. Educ. Policy Anal. Arch. 5(6). Retrieved February 3, 2005, from http://epaa.asu.edu/epaa/v5n6.html

  • Howard G.S., Conway C.G. and Maxwell S.E. (1985). Construct validity of measures of college teaching effectiveness. J. Educ. Psychol. 77: 187–196

  • Kierstead D.P., D’Agostino P. and Dill H. (1988). Sex role stereotyping of college professors: bias in students’ ratings of instructors. J. Educ. Psychol. 80: 342–344

  • Kulik J.A. (2001). Student ratings: validity, utility and controversy. New Dir. Inst. Res. 109: 9–25

  • Lombardo J.P. and Tocci M.E. (1979). Attribution of positive and negative characteristics of instructors as a function of attractiveness and sex of instructor and sex of subject. Percept. Mot. Skills 48: 491–494

  • Marsh H.W. (1984). Students’ evaluations of university teaching: dimensionality, reliability, validity, potential biases and utility. J. Educ. Psychol. 76: 707–754

  • Marsh H.W. (1987). Students’ evaluations of university teaching: research findings, methodological issues and directions for future research. Int. J. Educ. Res. 11: 253–388

  • Marsh H.W. and Bailey M. (1993). Multidimensional students’ evaluations of teaching effectiveness. A profile analysis. J. High. Educ. 64: 1–18

  • Marsh H.W., Overall J.U. and Kessler S.P. (1979). Validity of student evaluations of instructional effectiveness: a comparison of faculty self-evaluations and evaluations by their students. J. Educ. Psychol. 71: 149–160

  • Marsh H.W. and Roche L.A. (1993). The use of students’ evaluations and an individually structured intervention to enhance university teaching. Am. Educ. Res. J. 30: 217–251

  • Marsh H.W. and Roche L.A. (2000). Effects of grading leniency and low workload in students’ evaluations of teaching: popular myth, bias, validity, or innocent bystanders? J. Educ. Psychol. 92: 202–228

  • McCallum L.W. (1984). A meta-analysis of course evaluation data and its use in the tenure decision. Res. High. Educ. 21: 150–158

  • Messick S. (1989). Validity. In: Linn R.L. (ed) Educational Measurement, 3rd edn., pp 13–103. Macmillan, Old Tappan, NJ

  • Messick S. (1995). Validity of psychological assessment: validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Am. Psychol. 50: 741–749

  • Murray H.G. (1983). Low-inference classroom teaching behaviors and student ratings of college teaching effectiveness. J. Educ. Psychol. 75: 138–149

  • Naftulin D.H., Ware J.E. and Donnelly F.A. (1973). The doctor fox lecture: a paradigm of educational seduction. J. Med. Educ. 48: 630–635

  • Newport F.J. (1996). Rating teaching in the USA: probing the qualifications of student raters and novice teachers. Assess. Eval. High. Educ. 21(1): 17–21

  • Onwuegbuzie A.J. and Daniel L.G. (2002). Uses and misuses of the correlation coefficient. Res. Sch. 9(1): 73–90

  • Onwuegbuzie A.J. and Daniel L.G. (2003). Typology of analytical and interpretational errors in quantitative and qualitative educational research. Curr. Issues Educ. [On-line] 6(2). Available: http://cie.ed.asu.edu/volume6/number2/

  • Onwuegbuzie A.J. and Weems G.H. (2004). Response categories on rating scales: characteristics of item respondents who frequently utilize midpoint. Res. Sch. 11(1): 51–60

  • Onwuegbuzie A.J., Witcher W.A.E., Collins K.M.T., Filer J.D., Wiedmaier C.D. and Moore C.W. (2007). Students’ perceptions of characteristics of effective college teachers: a validity study of a teaching evaluation form using a mixed-methods analysis. Am. Educ. Res. J. 44: 113–160

  • Ory J.C. (2000). Teaching evaluation: past, present and future. New Dir. Teach. Learn. 83: 13–18

  • Ory J.C., Braskamp L.A. and Pieper D.M. (1980). The congruency of student evaluative information collected by three methods. J. Educ. Psychol. 72: 181–185

  • Ory J.C. and Ryan K. (2001). How do student ratings measure up to a new validity framework? New Dir. Inst. Res. 109: 27–44

  • Overall J.U. and Marsh H.W. (1979). Midterm feedback from students: its relationship to instructional improvement and students’ cognitive and affective outcomes. J. Educ. Psychol. 71: 856–865

  • Overall J.U. and Marsh H.W. (1980). Students’ evaluations of instruction: a longitudinal study of their stability. J. Educ. Psychol. 72: 321–325

  • Penny A.R. (2003). Changing the agenda for research into students’ views about university teaching: four shortcomings of SRT research. Teach. High. Educ. 8(3): 399–411

  • Peterson K. and Kauchak D. (1982). Teacher evaluation: perspectives, practices and promises. Utah University Center for Educational Practice, Salt Lake City, Utah

  • Rodin M. and Rodin B. (1972). Student evaluations of teachers. Science 177: 1164–1166

  • Schmelkin L.P., Spencer K.J. and Gellman E.S. (1997). Faculty perspectives on course and teacher evaluations. Res. High. Educ. 38(5): 575–592

  • Seldin P. (1984). Changing practices in faculty evaluation. Jossey-Bass, San Francisco, CA

  • Seldin P. (1993). The use and abuse of student ratings of professors. Chron. High. Educ. 21: A40

  • Shapiro E.G. (1990). Effects of instructor and class characteristics on students’ class evaluations. Res. High. Educ. 31: 135–148

  • Simmons T.L. (1996). Student evaluation of teachers: professional practice or punitive policy? JALT Test. Eval. N-SIG Newsl. 1(1): 12–16

  • Spencer K.J. and Schmelkin L.P. (2002). Student perspectives on teaching and its evaluation. Assess. Eval. High. Educ. 27: 397–409

  • Sun A. and Valiga M.J. (1997). Using generalizability theory to assess the reliability of student ratings of academic advising. J. Exp. Educ. 65: 367–379

  • Theall M. and Franklin J. (2001). Looking for bias in all the wrong places: a search for truth or a witch hunt in student ratings of instruction? New Dir. Inst. Res. 109: 45–56

  • Washburn K. and Thornton J.F. (eds) (1996). Dumbing down: essays on the strip mining of American culture. W.W. Norton & Company, New York

  • Weems G.H. and Onwuegbuzie A.J. (2001). The impact of midpoint responses and reverse coding on survey data. Meas. Eval. Couns. Dev. 34: 166–176

  • Williams W.M. and Ceci S.J. (1997). How’m I doing? Problems with student ratings of instructors and courses. Change 29(5): 13–23

Author information

Corresponding author

Correspondence to Anthony J. Onwuegbuzie.

About this article

Cite this article

Onwuegbuzie, A.J., Daniel, L.G. & Collins, K.M.T. A meta-validation model for assessing the score-validity of student teaching evaluations. Qual Quant 43, 197–209 (2009). https://doi.org/10.1007/s11135-007-9112-4

  • DOI: https://doi.org/10.1007/s11135-007-9112-4
