
Once good teaching, always good teaching? The differential stability of student perceptions of teaching quality


Abstract

In many countries, students are asked about their perceptions of teaching so that decisions about the further development of teaching practices can be made on the basis of this feedback. The stability of this measurement of teaching quality is a prerequisite for generalizing the results to other teaching situations. The present study aims to expand the extant empirical body of knowledge on the effects of situational factors on the stability of students’ perceptions of teaching quality. We therefore investigate whether the degree of stability is moderated by three situational factors: the time between assessments, the subjects taught by teachers, and students’ grade levels. To this end, we analyzed data from a web-based student feedback system. The study involved 497 teachers, each of whom conducted two student surveys. We examined the differential stability of student perceptions of 16 teaching constructs, operationalized as latent correlations between aggregated student perceptions of the same teacher’s teaching. Tests of metric invariance indicated that student ratings provided measures of the teaching constructs that were invariant across time, subjects, and grade levels. Stability was moderated to some extent by grade level, but not by the subject taught or by the time between surveys. The results provide evidence of the extent to which situational factors may affect the stability of student perceptions of teaching constructs. The generalizability of student feedback results to other teaching situations is discussed.
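As a schematic illustration of this operationalization (generic SEM notation with illustrative symbols, not the authors’ own), student ratings are first aggregated to class means per teacher and survey occasion, a latent teaching construct is measured by these class-mean items, and differential stability is the latent test-retest correlation between the two occasions:

  % Class-mean aggregation of student i's rating of item k for teacher j at occasion t
  \bar{x}_{jkt} = \frac{1}{n_{jt}} \sum_{i=1}^{n_{jt}} x_{ijkt}

  % Measurement model for the latent teaching construct \eta_{jt}
  \bar{x}_{jkt} = \nu_{kt} + \lambda_{kt}\,\eta_{jt} + \varepsilon_{jkt}

  % Differential stability as the latent correlation across the two surveys
  r_{\mathrm{stab}} = \operatorname{Corr}(\eta_{j1}, \eta_{j2})

Metric invariance then corresponds to equal loadings \lambda_{k1} = \lambda_{k2} across occasions (and, analogously, across subjects and grade levels), and the moderation question is whether r_{\mathrm{stab}} differs between the time, subject, and grade-level subgroups.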


Notes

1. A total of 22 of these 96 teachers conducted both surveys in the same school year. In these cases, either the very same class or a parallel class may have been surveyed; the available data do not allow these two cases to be distinguished. However, the general pattern of results for group B does not change if these 22 surveys are excluded from the analysis.

2. There was one exception: achievement expectations across grade levels (ΔCFI = .011).

3. Subgroup B showed small increases in the mean values at the second measurement point (up to a maximum difference of 0.09 on the original scale from 1 to 4). However, in most cases (12 out of 16 comparisons), these differences did not reach statistical significance.


Author information


Corresponding author

Correspondence to Holger Gaertner.

Appendices

Appendix A

Table 4 Fit indices for multi-group models over time/subject/grade

Appendix B

Example Mplus input file for the construct clarity

Below is example syntax for the construct clarity and the comparison of stability over time (model A vs. B); for constructs with more items, the syntax is extended accordingly. All analyses were performed with the statistical program Mplus (version 7.4), and the estimation method is always maximum likelihood (ML). The grouping variable is always dichotomous and refers to the comparison over time (model A vs. B), over subjects (model A vs. C), or over grade levels (model A vs. D). The model test (Wald test) refers to the invariance testing.

(Figure a: Mplus input syntax for the construct clarity, not reproduced here.)
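Because the original input file is not reproduced here, the following is a minimal sketch of what such a two-group Mplus input could look like. The item names (cla1–cla3 for the first survey, clb1–clb3 for the second), the data file name, and the exact parameterization are illustrative assumptions, not the authors’ actual syntax; factor variances are fixed at 1 so that the WITH parameter is estimated as a correlation.

  ! Minimal illustrative sketch, not the authors' actual input file.
  TITLE:    Stability of clarity, group A vs. group B;
  DATA:     FILE IS clarity_teacher_level.dat;
  VARIABLE: NAMES ARE group cla1 cla2 cla3 clb1 clb2 clb3;
            USEVARIABLES ARE cla1-clb3;
            GROUPING IS group (1 = A  2 = B);
  ANALYSIS: ESTIMATOR IS ML;
  MODEL:    ! Clarity at the first and second survey; loadings estimated freely,
            ! factor variances fixed at 1.
            clar1 BY cla1* cla2 cla3;
            clar2 BY clb1* clb2 clb3;
            clar1@1; clar2@1;
            clar1 WITH clar2 (r_a);    ! stability coefficient in group A
  MODEL B:  clar1 WITH clar2 (r_b);    ! stability coefficient in group B
  MODEL TEST:
            0 = r_a - r_b;             ! Wald test of equal stability across groups
  OUTPUT:   STDYX;

For the comparisons over subject (model A vs. C) and grade level (model A vs. D), only the grouping variable and the subgroup labels would change.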


About this article


Cite this article

Gaertner, H., Brunner, M. Once good teaching, always good teaching? The differential stability of student perceptions of teaching quality. Educ Asse Eval Acc 30, 159–182 (2018). https://doi.org/10.1007/s11092-018-9277-5
