The authors examined whether feedback from student ratings of instruction not augmented with consultation helps college teachers to improve their student ratings on a long-term basis. The study reported was conducted in an institution where no previous teaching-effectiveness evaluations had taken place. At the end of each of four consecutive semesters, student ratings were assessed and teachers were provided with feedback. Data from 3122 questionnaires evaluating 12 teachers were analyzed using polynomial and piecewise random coefficient models. Results revealed that student ratings increased from the no-feedback baseline semester to the second semester and then gradually decreased from the second to the fourth semester, although feedback was provided after each semester. The findings suggest that student ratings not augmented with consultation are far less effective than typically assumed when considered from a long-term perspective.
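The piecewise pattern described above (a change from the first to the second semester, then a separate trend from the second to the fourth) is commonly represented with two time variables. The following is a minimal sketch of one standard coding scheme, not necessarily the coding the authors used; the function name `piecewise_codes` and the `breakpoint` parameter are hypothetical.

```python
# Hypothetical illustration: time codes for a two-piece growth model
# over four semesters, with the breakpoint assumed at semester 2.

def piecewise_codes(semester, breakpoint=2):
    """Return (piece1, piece2) time codes for a 1-based semester index."""
    piece1 = min(semester, breakpoint) - 1  # change up to the breakpoint
    piece2 = max(semester - breakpoint, 0)  # change after the breakpoint
    return piece1, piece2

# One row of codes per semester; each column would enter the model
# as a separate slope predictor.
codes = [piecewise_codes(s) for s in (1, 2, 3, 4)]
for semester, (p1, p2) in zip((1, 2, 3, 4), codes):
    print(semester, p1, p2)
```

With this coding, the slope on the first variable captures the baseline-to-second-semester change and the slope on the second captures the per-semester trend afterwards.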
Abrami P.C., d’Apollonia S. (1991). Multidimensional students’ evaluations of teaching effectiveness? Generalizability of “N = 1” research: Comment on Marsh (1991). Journal of Educational Psychology 83:411–415
Abrami P.C., d’Apollonia S., Cohen P.A. (1990). Validity of student ratings of instruction: What we know and what we do not. Journal of Educational Psychology 82:219–231
Adair J.G., Sharpe D., Huynh C.L. (1989). Hawthorne control procedures in educational experiments: A reconsideration of their use and effectiveness. Review of Educational Research 59:215–228
Akaike H. (1973). Information theory as an extension of the maximum likelihood principle. In: Petrov B.N., Csaki F. (eds). Second international symposium on information theory. Akademiai Kiado, Budapest, Hungary, pp. 267–281
Armstrong S.J. (1998). Are student ratings of instruction useful? American Psychologist 53:1223–1224
Basow S.A. (1995). Student evaluations of college professors: When gender matters. Journal of Educational Psychology 87:656–665
Biesanz J.C., Deeb-Sossa N., Papadakis A.A., Bollen K.A., Curran P.J. (2004). The role of coding time in estimating and interpreting growth curve models. Psychological Methods 9:30–52
Bliese P.D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In: Klein K.J., Kozlowski S.W.J. (eds). Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions. Jossey-Bass, San Francisco, CA, pp. 349–381
Bliese P.D., Jex S.M. (2002). Incorporating a multilevel perspective into occupational stress research: Theoretical, methodological, and practical implications. Journal of Occupational Health Psychology 7:265–276
Bliese P.D., Ployhart R.E. (2002). Growth modeling using random coefficient models: Model building, testing, and illustrations. Organizational Research Methods 5:362–387
Bryk A.S., Raudenbush S.W. (1987). Applications of hierarchical linear models to assessing change. Psychological Bulletin 101:147–158
Campbell D.T., Stanley J.C. (1963). Experimental and quasi-experimental designs for research. Rand McNally, Chicago
Carlson K.D., Schmidt F.L. (1999). Impact of experimental design on effect size: Findings from the research literature on training. Journal of Applied Psychology 84:851–862
Carter R.E. (1989). Comparison of criteria for academic promotion of medical-school and university-based psychologists. Professional Psychology: Research and Practice 20:400–403
Cashin W.E., Downey R.G. (1992). Using global student rating items for summative evaluation. Journal of Educational Psychology 84:563–572
Cohen J., Cohen P., West S.G., Aiken L.S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed). Erlbaum, Mahwah, NJ
Cohen P.A. (1980). Effectiveness of student-rating feedback for improving college instruction: A meta-analysis. Research in Higher Education 13:321–341
Coleman J., McKeachie W.J. (1981). Effects of instructor/course evaluations on student course selection. Journal of Educational Psychology 73:224–226
Cronbach L.J., Furby L. (1970). How we should measure “change”: Or should we? Psychological Bulletin 74:68–80
d’Apollonia S., Abrami P.C. (1997). Navigating student ratings of instruction. American Psychologist 52:1198–1208
DeShon R.P., Ployhart R.E., Sacco J.M. (1998). The estimation of reliability in longitudinal models. International Journal of Behavioral Development 22:493–515
Diehl J.M. (2002). VBVOR–VBREF. Fragebögen zur studentischen Evaluation von Hochschulveranstaltungen – Manual [VBVOR–VBREF. Questionnaires for students’ evaluations of college courses – Manual]. Retrieved on March 17, 2005 from http://www.psychol.uni-giessen.de/dl/det/diehl/2368/
Diehl J.M. (2003). Normierung zweier Fragebögen zur studentischen Beurteilung von Vorlesungen und Seminaren [Student evaluations of lectures and seminars: Norms for two recently developed questionnaires]. Psychologie in Erziehung und Unterricht 50:27–42
Diehl J.M., Kohr H.-U. (1977). Entwicklung eines Fragebogens zur Beurteilung von Hochschulveranstaltungen im Fach Psychologie [Development of a psychology course evaluation questionnaire]. Psychologie in Erziehung und Unterricht 24:61–75
Firebaugh G. (1978). A rule for inferring individual-level relationships from aggregate data. American Sociological Review 43:557–572
Franklin J., Theall M. (2002). Faculty thinking about the design and evaluation of instruction. In: Hativa N., Goodyear P. (eds). Teacher thinking, beliefs and knowledge in higher education. Kluwer, Dordrecht, The Netherlands
Greenwald A.G. (1997). Validity concerns and usefulness of student ratings of instruction. American Psychologist 52:1182–1186
Greenwald A.G., Gillmore G.M. (1997a). Grading leniency is a removable contaminant of student ratings. American Psychologist 52:1209–1217
Greenwald A.G., Gillmore G.M. (1997b). No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology 89:743–751
Greenwald A.G., Gillmore G.M. (1998). How useful are student ratings? Reactions to comments on the current issues section. American Psychologist 53:1228–1229
Guzzo R.A., Jette R.D., Katzell R.A. (1985). The effects of psychologically based intervention programs on worker productivity: A meta-analysis. Personnel Psychology 38:275–291
Hernández-Lloreda M.V., Colmenares F., Martínez-Arias R. (2004). Application of piecewise hierarchical linear growth modeling to the study of continuity in behavioral development of baboons (Papio hamadryas). Journal of Comparative Psychology 118:316–324
Hofmann D.A., Jacobs R., Baratta J.E. (1993). Dynamic criteria and the measurement of change. Journal of Applied Psychology 78:194–204
Howell A.J., Symbaluk D.G. (2001). Published student ratings of instruction: Revealing and reconciling the views of students and faculty. Journal of Educational Psychology 93:790–796
James L.R., Demaree R.G., Wolf G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology 69:85–98
Klein K.J., Dansereau F., Hall R.J. (1994). Levels issues in theory development, data collection, and analysis. Academy of Management Review 19:195–229
Kluger A.N., DeNisi A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin 119:254–284
L’Hommedieu R., Menges R.J., Brinko K.T. (1990). Methodological explanations for the modest effects of feedback from student ratings. Journal of Educational Psychology 82:232–241
Longford N. (1993). Random coefficient models. Oxford University Press, Oxford
Marsh H.W. (1991). Multidimensional students’ evaluations of teaching effectiveness: A test of alternative higher-order structures. Journal of Educational Psychology 83:285–296
Marsh H.W. (1994). Comments to: “Review of the dimensionality of student ratings of instruction: I. Introductory remarks. II. Aggregation of factor studies. III. A meta-analysis of the factor studies”. Instructional Evaluation and Faculty Development 14:13–19
Marsh H.W., Hocevar D. (1991). The multidimensionality of students’ evaluations of teaching effectiveness: The generality of factor structures across academic discipline, instructor level, and course level. Teaching and Teacher Education 7:9–18
Marsh H.W., Roche L.A. (1993). The use of student evaluations and an individually structured intervention to enhance university teaching effectiveness. American Educational Research Journal 30:217–251
Marsh H.W., Roche L.A. (1997). Making students’ evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist 52:1187–1197
Marsh H.W., Roche L.A. (2000). Effects of grading leniency and low workload on students’ evaluations of teaching: Popular myth, bias, validity, or innocent bystanders? Journal of Educational Psychology 92:202–228
McKeachie W.J. (1997). Student ratings—the validity of use. American Psychologist 52:1218–1225
Pinheiro J.C., Bates D.M. (2000). Mixed-effects models in S and S-PLUS. Springer, New York
R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Raftery A.E. (1995). Bayesian model selection in social research. Sociological Methodology 25:111–196
Raudenbush S.W., Bryk A.S. (2002). Hierarchical linear models: Applications and data analysis methods, 2nd edn. Sage, Thousand Oaks, CA
Rousseau D. (1985). Issues of level in organizational research: Multi-level and cross-level perspectives. In: Cummings L.L., Staw B.M. (eds). Research in organizational behavior, vol. 7. JAI Press, Greenwich, CT, pp. 1–37
Schwarz G. (1978). Estimating the dimension of a model. Annals of Statistics 6:461–464
Stevens J.J., Aleamoni L.M. (1985). The use of evaluative feedback for instructional improvement: A longitudinal perspective. Instructional Science 13:285–304
Ting K. (2000). Cross-level effects of class characteristics on students’ perceptions of teaching quality. Journal of Educational Psychology 92:818–825
Wilhelm W.B. (2004). The relative influence of published teaching evaluations and other instructor attributes on course choice. Journal of Marketing Education 26:17–30
Wood R.E., Locke E.A. (1990). Goal setting and strategy effects on complex tasks. In: Cummings L.L., Staw B.M. (eds). Research in organizational behavior, vol. 12. JAI Press, Greenwich, CT, pp. 73–109
We would like to thank Jessica Ippolito, Anette Kluge, Jan Schilling, and two anonymous reviewers for their helpful comments on an earlier version of this article, and Paul D. Bliese for answering questions on random coefficient modeling and data aggregation. Further thanks go to Susannah Goss for improving the language of this article.
Appendix: Items used in the study
1. What grade would you give the instructor?
2. What grade would you give this class?
3. The instructor was not particularly interested in the students’ progress. (R)
4. The instructor’s attitude toward the students was cold and impersonal. (R)
5. The instructor seemed to see teaching as a duty and a routine activity. (R)
6. The instructor was clearly only interested in getting through the material. (R)
7. It was easy to follow the material covered in the course.
8. Too much material was covered in the course. (R)
9. The pace was too fast. (R)
10. You had to put in a lot of extra work to keep up with the course. (R)
11. The course was often confusing because it seemed to lack structure, and it was easy to get lost. (R)
12. The instructor presented the material in a clear and understandable manner.
13. The instructor planned and delivered the course well.
14. The course was clearly structured.
Note: Original German versions of the items may be found in Diehl (2002). Items 1 and 2 are from the global subscale, items 3–6 from the rapport subscale, items 7–10 from the difficulty subscale, and items 11–14 from the teaching skill subscale. R = items reverse-scored to form the overall index.
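Reverse scoring as mentioned in the note can be sketched as follows, counting the items in the order listed. This is an illustration under stated assumptions, not the authors' procedure: the response range (`scale_min`, `scale_max`) is not given in the source and is assumed here, all items are assumed to share one scale, and taking the mean as the overall index is also an assumption. The function name `overall_index` is hypothetical.

```python
# Positions of the items marked (R) in the appendix list.
REVERSED_ITEMS = {3, 4, 5, 6, 8, 9, 10, 11}

def overall_index(responses, scale_min=1, scale_max=5):
    """Average item responses after flipping the reverse-keyed items.

    responses: dict mapping item number (1-14) to a numeric response.
    scale_min, scale_max: ASSUMED response range (not stated in the source).
    """
    scored = []
    for item, value in sorted(responses.items()):
        if item in REVERSED_ITEMS:
            # Mirror the response around the scale midpoint.
            value = scale_min + scale_max - value
        scored.append(value)
    return sum(scored) / len(scored)

# Made-up responses for illustration only: every item answered with 4.
example = {item: 4 for item in range(1, 15)}
print(round(overall_index(example), 3))
```

Flipping the (R) items first ensures that a higher value points in the same direction on every item before the responses are combined.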
Cite this article
Lang, J.W.B., Kersting, M. Regular Feedback from Student Ratings of Instruction: Do College Teachers Improve their Ratings in the Long Run? Instr Sci 35, 187–205 (2007). https://doi.org/10.1007/s11251-006-9006-1