Once good teaching, always good teaching? The differential stability of student perceptions of teaching quality

  • Holger Gaertner
  • Martin Brunner


In many countries, students are asked about their perceptions of teaching so that decisions about the further development of teaching practices can be based on their feedback. The stability of this measure of teaching quality is a prerequisite for generalizing the results to other teaching situations. The present study aims to expand the empirical body of knowledge on the effects of situational factors on the stability of students’ perceptions of teaching quality. We therefore investigate whether the degree of stability is moderated by three situational factors: the time between assessments, the subjects taught by teachers, and students’ grade levels. To this end, we analyzed data from a web-based student feedback system. The study involved 497 teachers, each of whom conducted two student surveys. We examined the differential stability of student perceptions of 16 teaching constructs, with stability operationalized as the latent correlation between aggregated student perceptions of the same teacher’s teaching in the two surveys. Tests of metric invariance indicated that student ratings provided measures of teaching constructs that were invariant across time, subjects, and grade levels. Stability was moderated to some extent by grade level, but not by the subjects taught or by the time between surveys. The results provide evidence of the extent to which situational factors may affect the stability of student perceptions of teaching constructs. The generalizability of the students’ feedback results to other teaching situations is discussed.


Keywords: Stability, Student perception, Instruction, Generalizability, Situation



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Institute for School Quality Improvement (ISQ), Freie Universität Berlin, Berlin, Germany
  2. Universität Potsdam, Potsdam, Germany
