Instructional Science, Volume 35, Issue 3, pp 187–205

Regular Feedback from Student Ratings of Instruction: Do College Teachers Improve their Ratings in the Long Run?

  • Jonas W. B. Lang
  • Martin Kersting


The authors examined whether feedback from student ratings of instruction not augmented with consultation helps college teachers to improve their student ratings on a long-term basis. The study reported was conducted in an institution where no previous teaching-effectiveness evaluations had taken place. At the end of each of four consecutive semesters, student ratings were assessed and teachers were provided with feedback. Data from 3122 questionnaires evaluating 12 teachers were analyzed using polynomial and piecewise random coefficient models. Results revealed that student ratings increased from the no-feedback baseline semester to the second semester and then gradually decreased from the second to the fourth semester, although feedback was provided after each semester. The findings suggest that student ratings not augmented with consultation are far less effective than typically assumed when considered from a long-term perspective.


Keywords: feedback, long-term effects, student ratings, teaching effectiveness
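The piecewise random coefficient models mentioned in the abstract separate the initial feedback effect from the subsequent trend by recoding the time variable into two pieces. The sketch below shows one common coding scheme for the four semesters; it is a hypothetical illustration of the general technique, not the authors' exact specification.

```python
# Hypothetical sketch of the piecewise time coding commonly used in
# piecewise random coefficient (growth) models; not the authors' exact
# specification.
semesters = [0, 1, 2, 3]  # 0 = no-feedback baseline semester

# time1 codes the shift after the first round of feedback;
# time2 codes the linear trend from the second semester onward.
time1 = [min(s, 1) for s in semesters]      # [0, 1, 1, 1]
time2 = [max(s - 1, 0) for s in semesters]  # [0, 0, 1, 2]

# A model of the form  rating = b0 + b1*time1 + b2*time2 + error
# then lets b1 capture the initial increase in ratings and b2 the
# gradual change across the following semesters.
print(time1, time2)
```

With this coding, a positive estimate for the first slope and a negative estimate for the second would match the pattern the study reports: an initial rise after feedback, followed by a gradual decline.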





Acknowledgements

We would like to thank Jessica Ippolito, Anette Kluge, Jan Schilling, and two anonymous reviewers for their helpful comments on an earlier version of this article, and Paul D. Bliese for answering questions on random coefficient modeling and data aggregation. Further thanks go to Susannah Goss for improving the language of this article.



Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  1. Institute of Psychology, RWTH Aachen University, Aachen, Germany
