Regular Feedback from Student Ratings of Instruction: Do College Teachers Improve their Ratings in the Long Run?

Published in Instructional Science

Abstract

The authors examined whether feedback from student ratings of instruction not augmented with consultation helps college teachers to improve their student ratings on a long-term basis. The study reported was conducted in an institution where no previous teaching-effectiveness evaluations had taken place. At the end of each of four consecutive semesters, student ratings were assessed and teachers were provided with feedback. Data from 3122 questionnaires evaluating 12 teachers were analyzed using polynomial and piecewise random coefficient models. Results revealed that student ratings increased from the no-feedback baseline semester to the second semester and then gradually decreased from the second to the fourth semester, although feedback was provided after each semester. The findings suggest that student ratings not augmented with consultation are far less effective than typically assumed when considered from a long-term perspective.
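The piecewise random coefficient approach mentioned in the abstract rests on splitting the time variable at a breakpoint so that the rise and the subsequent decline get separate slopes. The coding below is an illustrative sketch (the semester coding and variable names are assumptions, not the authors' actual analysis code):

```python
import numpy as np

# Semesters coded 0-3: 0 = no-feedback baseline, 1-3 = semesters
# following each round of student-rating feedback.
t = np.array([0, 1, 2, 3])

# Piecewise coding with a breakpoint at the second semester (t = 1):
# "rise" carries the change from baseline to semester 2,
# "decline" carries the change from semester 2 onward.
rise = np.minimum(t, 1)         # -> [0, 1, 1, 1]
decline = np.maximum(t - 1, 0)  # -> [0, 0, 1, 2]

# In a random coefficient model, ratings would be regressed on both
# predictors with teacher-level random effects; the pattern reported
# in the abstract corresponds to a positive "rise" slope and a
# negative "decline" slope.
print(rise.tolist(), decline.tolist())
```

With this coding, a single linear trend cannot mimic the reported rise-then-decline pattern, which is why the piecewise model is compared against the polynomial one.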


References

  • Abrami P.C., d’Apollonia S. (1991). Multidimensional students’ evaluations of teaching effectiveness? Generalizability of “N = 1” research: Comment on Marsh (1991). Journal of Educational Psychology 83:411–415

  • Abrami P.C., d’Apollonia S., Cohen P.A. (1990). Validity of student ratings of instruction: What we know and what we do not. Journal of Educational Psychology 82:219–231

  • Adair J.G., Sharpe D., Huynh C.L. (1989). Hawthorne control procedures in educational experiments: A reconsideration of their use and effectiveness. Review of Educational Research 59:215–228

  • Akaike H. (1973). Information theory as an extension of the maximum likelihood principle. In: Petrov B.N., Csaki F. (eds). Second international symposium on information theory. Akademiai Kiado, Budapest, Hungary, pp. 267–281

  • Armstrong S.J. (1998). Are student ratings of instruction useful? American Psychologist 53:1223–1224

  • Basow S.A. (1995). Student evaluations of college professors: When gender matters. Journal of Educational Psychology 87:656–665

  • Biesanz J.C., Deeb-Sossa N., Papadakis A.A., Bollen K.A., Curran P.J. (2004). The role of coding time in estimating and interpreting growth curve models. Psychological Methods 9:30–52

  • Bliese P.D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In: Klein K.J., Kozlowski S.W.J. (eds). Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions. Jossey-Bass, San Francisco, CA, pp. 349–381

  • Bliese P.D., Jex S.M. (2002). Incorporating a multilevel perspective into occupational stress research: Theoretical, methodological, and practical implications. Journal of Occupational Health Psychology 7:265–276

  • Bliese P.D., Ployhart R.E. (2002). Growth modeling using random coefficient models: Model building, testing, and illustrations. Organizational Research Methods 5:362–387

  • Bryk A.S., Raudenbush S.W. (1987). Applications of hierarchical linear models to assessing change. Psychological Bulletin 101:147–158

  • Campbell D.T., Stanley J.C. (1963). Experimental and quasi-experimental designs for research. Rand McNally, Chicago

  • Carlson K.D., Schmidt F.L. (1999). Impact of experimental design on effect size: Findings from the research literature on training. Journal of Applied Psychology 84:851–862

  • Carter R.E. (1989). Comparison of criteria for academic promotion of medical-school and university-based psychologists. Professional Psychology: Research and Practice 20:400–403

  • Cashin W.E., Downey R.G. (1992). Using global student rating items for summative evaluation. Journal of Educational Psychology 84:563–572

  • Cohen J., Cohen P., West S.G., Aiken L.S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed). Erlbaum, Mahwah, NJ

  • Cohen P.A. (1980). Effectiveness of student-rating feedback for improving college instruction: A meta-analysis. Research in Higher Education 13:321–341

  • Coleman J., McKeachie W.J. (1981). Effects of instructor/course evaluations on student course selection. Journal of Educational Psychology 73:224–226

  • Cronbach L.J., Furby L. (1970). How we should measure “change”: Or should we? Psychological Bulletin 74:68–80

  • d’Apollonia S., Abrami P.C. (1997). Navigating student ratings of instruction. American Psychologist 52:1198–1208

  • DeShon R.P., Ployhart R.E., Sacco J.M. (1998). The estimation of reliability in longitudinal models. International Journal of Behavioral Development 22:493–515

  • Diehl, J.M. (2002) VBVOR–VBREF. Fragebögen zur studentischen Evaluation von Hochschulveranstaltungen – Manual. [VBVOR – VBREF. Questionnaires for students’ evaluations of college courses–Manual]. Retrieved on March 17, 2005 from http://www.psychol.uni-giessen.de/dl/det/diehl/2368/

  • Diehl J.M. (2003). Normierung zweier Fragebögen zur studentischen Beurteilung von Vorlesungen und Seminaren [Student evaluations of lectures and seminars: Norms for two recently developed questionnaires]. Psychologie in Erziehung und Unterricht 50:27–42

  • Diehl J.M., Kohr H.-U. (1977). Entwicklung eines Fragebogens zur Beurteilung von Hochschulveranstaltungen im Fach Psychologie [Development of a psychology course evaluation questionnaire]. Psychologie in Erziehung und Unterricht 24:61–75

  • Firebaugh G. (1978). A rule for inferring individual-level relationships from aggregate data. American Sociological Review 43:557–572

  • Franklin J., Theall M. (2002). Faculty thinking about the design and evaluation of instruction. In: Hativa N., Goodyear P. (eds). Teacher thinking, beliefs and knowledge in higher education. Kluwer, Dordrecht, The Netherlands

  • Greenwald A.G. (1997). Validity concerns and usefulness of student ratings of instruction. American Psychologist 52:1182–1186

  • Greenwald A.G., Gillmore G.M. (1997a). Grading leniency is a removable contaminant of student ratings. American Psychologist 52:1209–1217

  • Greenwald A.G., Gillmore G.M. (1997b). No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology 89:743–751

  • Greenwald A.G., Gillmore G.M. (1998). How useful are student ratings? Reactions to comments on the current issues section. American Psychologist 53:1228–1229

  • Guzzo R.A., Jette R.D., Katzell R.A. (1985) The effects of psychologically based intervention programs on worker productivity: A meta-analysis. Personnel Psychology 38:275–291

  • Hernández-Lloreda M.V., Colmenares F., Martínez-Arias R. (2004). Application of piecewise hierarchical linear growth modeling to the study of continuity in behavioral development of baboons (Papio hamadryas). Journal of Comparative Psychology 118:316–324

  • Hofmann D.A., Jacobs R., Baratta J.E. (1993). Dynamic criteria and the measurement of change. Journal of Applied Psychology 78:194–204

  • Howell A.J., Symbaluk D.G. (2001). Published student ratings of instruction: Revealing and reconciling the views of students and faculty. Journal of Educational Psychology 93:790–796

  • James L.R., Demaree R.G., Wolf G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology 69:85–98

  • Klein K.J., Dansereau F., Hall R.J. (1994). Levels issues in theory development, data collection, and analysis. Academy of Management Review 19:195–229

  • Kluger A.N., DeNisi A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin 119:254–284

  • L’Hommedieu R., Menges R.J., Brinko K.T. (1990). Methodological explanations for the modest effects of feedback from student ratings. Journal of Educational Psychology 82:232–241

  • Longford N. (1993). Random coefficient models. Oxford University Press, Oxford

  • Marsh H.W. (1991). Multidimensional students’ evaluations of teaching effectiveness: A test of alternative higher-order structures. Journal of Educational Psychology 83:285–296

  • Marsh H.W. (1994). Comments to: “Review of the dimensionality of student ratings of instruction: I. Introductory remarks. II. Aggregation of factor studies. III. A meta-analysis of the factor studies”. Instructional Evaluation and Faculty Development 14:13–19

  • Marsh H.W., Hocevar D. (1991). The multidimensionality of students’ evaluations of teaching effectiveness: The generality of factor structures across academic discipline, instructor level, and course level. Teaching and Teacher Education 7:9–18

  • Marsh H.W., Roche L.A. (1993). The use of student evaluations and an individually structured intervention to enhance university teaching effectiveness. American Educational Research Journal 30:217–251

  • Marsh H.W., Roche L.A. (1997). Making students’ evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist 52:1187–1197

  • Marsh H.W., Roche L.A. (2000). Effects of grading leniency and low workload on students’ evaluations of teaching: Popular myth, bias, validity, or innocent bystanders? Journal of Educational Psychology 92:202–228

  • McKeachie W.J. (1997). Student ratings—the validity of use. American Psychologist 52:1218–1225

  • Pinheiro J.C., Bates D.M. (2000). Mixed-effects models in S and S-PLUS. Springer, New York

  • R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Raftery A.E. (1995). Bayesian model selection in social research. Sociological Methodology 25:111–196

  • Raudenbush S.W., Bryk A.S. (2002). Hierarchical linear models: Applications and data analysis methods, 2nd edn. Sage, Thousand Oaks, CA

  • Rousseau D. (1985). Issues of level in organizational research: Multi-level and cross-level perspectives. In: Cummings L.L., Staw B.M. (eds). Research in organizational behavior, vol. 7. JAI Press, Greenwich, CT, pp. 1–37

  • Schwarz G. (1978). Estimating the dimension of a model. Annals of Statistics 6:461–464

  • Stevens J.J., Aleamoni L.M. (1985). The use of evaluative feedback for instructional improvement: A longitudinal perspective. Instructional Science 13:285–304

  • Ting K. (2000). Cross-level effects of class characteristics on students’ perceptions of teaching quality. Journal of Educational Psychology 92:818–825

  • Wilhelm W.B. (2004). The relative influence of published teaching evaluations and other instructor attributes on course choice. Journal of Marketing Education 26:17–30

  • Wood R.E., Locke E.A. (1990). Goal setting and strategy effects on complex tasks. In: Cummings L.L., Staw B.M. (eds). Research in organizational behavior, vol. 12. JAI Press, Greenwich, CT, pp. 73–109


Acknowledgements

We would like to thank Jessica Ippolito, Anette Kluge, Jan Schilling, and two anonymous reviewers for their helpful comments on an earlier version of this article, and Paul D. Bliese for answering questions on random coefficient modeling and data aggregation. Further thanks go to Susannah Goss for improving the language of this article.

Author information

Correspondence to Jonas W. B. Lang.

Appendix: Items used in the study

  1. What grade would you give the instructor?

  2. What grade would you give this class?

  3. The instructor was not particularly interested in the students’ progress. (R)

  4. The instructor’s attitude toward the students was cold and impersonal. (R)

  5. The instructor seemed to see teaching as a duty and a routine activity. (R)

  6. The instructor was clearly only interested in getting through the material. (R)

  7. It was easy to follow the material covered in the course.

  8. Too much material was covered in the course. (R)

  9. The pace was too fast. (R)

  10. You had to put in a lot of extra work to keep up with the course. (R)

  11. The course was often confusing because it seemed to lack structure, and it was easy to get lost. (R)

  12. The instructor presented the material in a clear and understandable manner.

  13. The instructor planned and delivered the course well.

  14. The course was clearly structured.

Note: Original German versions of the items may be found in Diehl (2002). Items 1 and 2 are from the global subscale, items 3–6 from the rapport subscale, items 7–10 from the difficulty subscale, and items 11–14 from the teaching skill subscale. R = items reverse-scored to form the overall index.
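Combining regular and reverse-scored (R) items into one overall index can be sketched as follows. The 5-point response scale and the example responses are illustrative assumptions, not the questionnaire's actual format (which is documented in Diehl, 2002):

```python
# Reverse-score a response on an assumed k-point scale: (k + 1) - response.
def reverse_score(response: int, k: int = 5) -> int:
    # Illustrative: on a 5-point scale this maps 1 -> 5, 2 -> 4, ..., 5 -> 1.
    return (k + 1) - response

# Hypothetical responses to items 3-6 (rapport subscale, all marked R):
responses = [2, 1, 3, 2]
rescored = [reverse_score(r) for r in responses]
print(rescored)  # [4, 5, 3, 4]

# After rescoring, regular and reversed items point in the same direction
# and can be averaged into the overall index.
overall = sum(rescored) / len(rescored)
print(overall)  # 4.0
```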

Cite this article

Lang, J.W.B., Kersting, M. Regular Feedback from Student Ratings of Instruction: Do College Teachers Improve their Ratings in the Long Run? Instr Sci 35, 187–205 (2007). https://doi.org/10.1007/s11251-006-9006-1
