Teacher Evaluations and Pupil Achievement Gains: Evidence from Classroom Observations

Published in De Economist.

Abstract

This article investigates the relationship between teacher evaluations and pupil achievement gains in primary education. Teacher evaluations were conducted by trained external evaluators who scored teachers on a detailed rubric containing 75 classroom practices. These practices reflect pedagogical, didactic and classroom-organization competences considered crucial for effective teaching. Conditional on previous-year test scores and several pupil and classroom characteristics, the score on this rubric significantly predicts pupil performance gains on standardized tests in math, reading and spelling. Estimated test score gains are on the order of 0.4 standard deviations in math and spelling and 0.25 standard deviations in reading if a pupil is assigned a teacher from the top quartile of the evaluation-rubric distribution instead of the bottom quartile. The observation rubric seems particularly well suited to identifying weak teachers. Such observations may inform targeted teacher improvement plans and personnel decisions.
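
The specification described above can be sketched as a simple conditional gain regression. Everything below (variable names, coefficient sizes, simulated data) is hypothetical and only illustrates regressing end-of-year scores on a rubric score while conditioning on previous-year scores and pupil controls; it is not the paper's data or code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pupil-level data; names and magnitudes are illustrative only
n = 500
prev_score = rng.normal(0.0, 1.0, n)        # previous-year standardized test score
rubric = rng.normal(0.0, 1.0, n)            # standardized teacher evaluation score
girl = rng.integers(0, 2, n).astype(float)  # example pupil-level control

# Simulated end-of-year score with a built-in rubric effect of 0.15 SD
score = 0.7 * prev_score + 0.15 * rubric + 0.05 * girl + rng.normal(0.0, 0.5, n)

# OLS: end-of-year score on prior score, rubric score, and the control
X = np.column_stack([np.ones(n), prev_score, rubric, girl])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(float(beta[2]))                       # estimated rubric coefficient, per SD
```

In an actual value-added analysis the standard errors would be clustered at the classroom level; that machinery is omitted here.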


Notes

  1. See e.g. Hanushek and Rivkin (2010) and Harris and Sass (2011) for reviews of the literature.

  2. See, among others, Rivkin et al. (2005), Clotfelter et al. (2006), Jacob (2007), and Staiger and Rockoff (2010). Notable exceptions are two recent papers by Harris and Sass (2011) and Wiswall (2013), which find that teacher productivity continues to increase with experience (far) beyond the first couple of years on the job.

  3. Weisberg et al. (2009) show, in an analysis of teacher evaluation systems in 14 US school districts, that most districts use only a binary rating system in which more than 98% of teachers are rated in the highest category (usually labeled “satisfactory”).

  4. The official competence requirements for teachers that are used by the Education Inspectorate of the Netherlands, and that are part of the national Law on Occupations in Education (Wet Beroepen in Onderwijs), have been translated into corresponding observable classroom practices in the rubric.

  5. Teacher experience is weighted in the same way as the classroom TES-score; we define it as the teacher experience a classroom of children is exposed to.

  6. That is, parents who only finished the lowest level of secondary school or less.

  7. They conclude this from comparing estimates of teacher value added with and without controlling for previously unobserved parent characteristics, as well as from applying a quasi-experimental research design based on changes in teaching staff.

  8. Appendix Table 12 shows the relationship between previous-year test scores and the start-of-year teacher evaluation score, based on a regression with school- and grade-fixed effects. Better teachers (judged by the start-of-year evaluation score) appear to be assigned to pupils who are weaker in math (see column 1). For spelling and reading, however, no significant relationship was found, and the point estimates are of the opposite sign.
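
As a sketch of the kind of check this note describes, the following regresses a start-of-year evaluation score on the previous-year math score with school- and grade-fixed effects entered as dummies. All data and magnitudes are hypothetical; the negative coefficient is built in only to mirror the note's math finding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 99 classrooms spread over schools and grades
n = 99
school = rng.integers(0, 20, n)                   # school identifiers
grade = rng.integers(3, 8, n)                     # grade identifiers
prev_math = rng.normal(0.0, 1.0, n)               # previous-year math score (standardized)
tes = -0.5 * prev_math + rng.normal(0.0, 0.5, n)  # start-of-year evaluation score

# Design matrix: intercept, previous-year score, school and grade dummies
cols = [np.ones(n), prev_math]
for s in np.unique(school)[1:]:                   # drop one category per set
    cols.append((school == s).astype(float))
for g in np.unique(grade)[1:]:
    cols.append((grade == g).astype(float))
X = np.column_stack(cols)

# OLS with fixed effects; beta[1] is the coefficient on the prior math score
beta, *_ = np.linalg.lstsq(X, tes, rcond=None)
print(float(beta[1]))
```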

  9. It should be noted that the estimates for reading, spelling and math are not statistically significantly different from each other. We should therefore be cautious in interpreting these results as indicating that the relationship is strongest for spelling and weakest for reading.

  10. A review by Hanushek and Rivkin (2010) of value-added estimates of teacher effectiveness, expressed in SD of pupil test scores, shows that estimated coefficients are larger for math than for reading in every study.

  11. The difference in the average TES-score between teachers in the lowest quartile (i.e., 34 competences shown) and teachers in the highest quartile (i.e., 68 competences shown) amounts to 2.5 SD.
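
The arithmetic in this note implies a rubric SD of about 13.6 competences; that figure is backed out from the quoted numbers here and is not reported in the text.

```python
# Quartile means of competences shown, as quoted in note 11
low_quartile_mean = 34
high_quartile_mean = 68
gap_in_sd = 2.5  # the quartile gap expressed in SD of the TES-score

# Implied SD of the TES-score (backed out, not reported in the text)
implied_sd = (high_quartile_mean - low_quartile_mean) / gap_in_sd
print(implied_sd)  # 13.6
```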

  12. Comparable estimates in Kane et al. (2011) for Cincinnati’s Teacher Evaluation System are 0.09 for math and 0.13 for reading. Kane and Staiger (2012) find estimates on the order of 0.05 to 0.11 SD for four different rubric instruments used in the Measures of Effective Teaching Project.

  13. For instance, the cumulative effect on cognitive skills of being in a class with five fewer pupils for three consecutive years is estimated to be about 0.15 SD (Fredriksson et al. 2013; Krueger 1999). Estimates of the effect of a year in school on cognitive test scores are on the order of 0.2 SD (e.g. Angrist and Krueger 1991; Hansen et al. 2004; Webbink and Gerritsen 2013).

  14. Estimates in Table 8 should be compared with those in Table 6 on the restricted sample of 88 classrooms for which two evaluations per teacher were carried out.

  15. Similar findings have been reported elsewhere for various evaluation rubrics.

  16. We also investigated to what extent we could differentiate between (subsets of) competences by including them in the regressions simultaneously. However, disentangling competence factors is difficult because of multicollinearity: we work with 99 classrooms and (highly) correlated competences. A principal component analysis reveals that the first component explains about 25% of the variance, and that the first 36 of the 75 components explain about 90%. However, we could not give the identified principal components a clear interpretation, which kept us from using them in our analysis.
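
The principal component analysis this note describes can be reproduced in outline via the eigendecomposition of the item correlation matrix. The data below are a synthetic stand-in (99 classrooms, 75 binary items driven by one common factor plus noise); the variance shares will not match the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the real data: 99 classrooms, 75 binary items
n_class, n_items = 99, 75
common = rng.normal(size=(n_class, 1))            # one common factor
loadings = rng.normal(size=(1, n_items))
latent = common @ loadings + 1.5 * rng.normal(size=(n_class, n_items))
scores = (latent > 0).astype(float)               # item shown (1) or not (0)

# Principal components via eigendecomposition of the correlation matrix
corr = np.corrcoef(scores, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]          # eigenvalues, descending
explained = eigvals / eigvals.sum()               # variance share per component
cumulative = np.cumsum(explained)
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)

print(float(explained[0]))                        # share explained by first PC
print(n_for_90)                                   # components needed for 90%
```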

  17. Comparable estimates for Cincinnati’s TES are 0.09 in math and 0.08 in reading (Kane et al. 2011). Rockoff and Speroni (2011) report that a 1 SD higher rating by mentor teachers is associated with 0.05 SD higher math achievement.

  18. Furthermore, 86% of the principals agree that the rubric is a good instrument to distinguish weak from good teachers. Sixty percent of surveyed teachers are positive about measuring teacher competences through classroom observations, compared with 33% who are neutral and 7% who are negative. Just 13% of teachers think that classroom observations fail to obtain a good picture of their competences.

References

  • Aaronson, D., Barrow, L., & Sander, W. (2007). Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics, 25(1), 95–135.

  • Angrist, J., & Krueger, A. (1991). Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics, 106(4), 979–1014.

  • Araujo, M., Carneiro, P., Cruz-Aguayo, Y., & Schady, N. (2016). Teacher quality and learning outcomes in kindergarten. IZA Discussion Paper No. 9796.

  • Chetty, R., Friedman, J. N., & Rockoff, J. (2013a). The long-term impact of teachers: Teacher value-added and student outcomes in adulthood. NBER Working Paper No. 17699.

  • Chetty, R., Friedman, J. N., & Rockoff, J. (2013b). Measuring the impact of teachers I: Evaluating bias in teacher value-added estimates. NBER Working Paper No. 19423.

  • Clotfelter, C., Ladd, H., & Vigdor, J. (2006). Teacher–student matching and the assessment of teacher effectiveness. NBER Working Paper No. 11936.

  • Fredriksson, P., Öckert, B., & Oosterbeek, H. (2013). Long-term effects of class size. Quarterly Journal of Economics, 128(1), 249–285.

  • Grossman, P., Loeb, S., Cohen, J., & Wyckoff, J. (2013). Measure for measure: The relationship between measures of instructional practice in middle school English language arts and teachers’ value-added scores. American Journal of Education, 119(3), 445–470.

  • Hansen, K., Heckman, J., & Mullen, K. (2004). The effect of schooling and ability on achievement test scores. Journal of Econometrics, 121(1–2), 39–98.

  • Hanushek, E., & Rivkin, S. (2010). Using value-added measures of teacher quality. American Economic Review, 100(2), 267–271.

  • Harris, D., & Sass, T. (2011). Teacher training, teacher quality and student achievement. Journal of Public Economics, 95, 798–812.

  • Harris, D., & Sass, T. (2014). Skills, productivity and the evaluation of teacher performance. Economics of Education Review (forthcoming). doi:10.1016/j.econedurev.2014.03.002.

  • Holtzapple, E. (2003). Criterion-related validity evidence for a standards-based teacher evaluation system. Journal of Personnel Evaluation in Education, 17(3), 207–219.

  • Jacob, B. (2007). The challenges of staffing urban schools with effective teachers. The Future of Children, 17(1), 129–154.

  • Jacob, B., & Lefgren, L. (2008). Principals as agents: Subjective performance measurement in education. Journal of Labor Economics, 26(1), 101–136.

  • Kane, T., & Staiger, D. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. NBER Working Paper No. 14601.

  • Kane, T., Taylor, E., Tyler, J., & Wooten, A. (2011). Identifying effective classroom practices using student achievement data. Journal of Human Resources, 46(3), 587–613.

  • Kane, T., & Staiger, D. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Measures of Effective Teaching Research Paper.

  • Kane, T., McCaffrey, D., Miller, T., & Staiger, D. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Measures of Effective Teaching Research Paper.

  • Krueger, A. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114, 497–532.

  • Ministry of Education. (2013). Kerncijfers 2008–2012: Onderwijs, cultuur en wetenschap. The Hague.

  • Nye, B., Konstantopoulos, S., & Hedges, L. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257.

  • Rivkin, S., Hanushek, E., & Kain, J. (2005). Teachers, schools and academic achievement. Econometrica, 73(2), 417–458.

  • Rockoff, J. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252.

  • Rockoff, J., & Speroni, C. (2011). Subjective and objective evaluations of teacher effectiveness: Evidence from New York City. Labour Economics, 18, 687–696.

  • Rockoff, J., Staiger, D., Kane, T., & Taylor, E. (2012). Information and employee evaluation: Evidence from a randomized intervention in public schools. American Economic Review, 102(7), 3184–3213.

  • Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.

  • Staiger, D., & Rockoff, J. (2010). Searching for effective teachers with imperfect information. Journal of Economic Perspectives, 24(3), 97–118.

  • Taylor, E., & Tyler, J. (2012). The effect of evaluation on teacher performance. American Economic Review, 102(7), 3628–3651.

  • Tyler, J., Taylor, E., Kane, T., & Wooten, A. (2009). Using student performance data to identify effective classroom practices. Working Paper.

  • Tyler, J., Taylor, E., Kane, T., & Wooten, A. (2010). Using student performance data to identify effective classroom practices. American Economic Review Papers and Proceedings, 100, 256–260.

  • Webbink, D., & Gerritsen, S. (2013). How much do children learn in school? Evidence from school entry rules. CPB Netherlands Bureau for Economic Policy Analysis Discussion Paper No. 255.

  • Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on teacher effectiveness. New York: The New Teacher Project.

  • Wiswall, M. (2013). The dynamics of teacher quality. Journal of Public Economics, 100, 61–78.

Corresponding author

Correspondence to Marc van der Steeg.

Appendix

See Tables 9, 10, 11, 12, 13.

Table 9 Recent literature on relationship between teacher evaluations and pupil test scores
Table 10 The teacher evaluation rubric “Amsterdamse Kijkwijzer”

The “Amsterdamse Kijkwijzer” rubric was developed by KPC Groep in cooperation with the school boards and with KBA (Kwaliteitsaanpak Basisonderwijs Amsterdam), a program set up by the municipality of Amsterdam to improve the quality of primary education. In the rubric, the competences identified in the national competence standard for teachers (the so-called SBL-competences) and the most important aspects of the framework used by the Inspectorate of Education have been translated into concrete observable behavior.

Table 11 Matrix of pair-wise correlations between classroom variables \((n=99)\)
Table 12 Assignment of teachers to classes: relationship between start-of-year teacher evaluation score and previous year math, spelling and reading score
Table 13 Matrix of pair-wise correlations among scores on subsets of classroom practices by type and level and the total score on the rubric


van der Steeg, M., Gerritsen, S. Teacher Evaluations and Pupil Achievement Gains: Evidence from Classroom Observations. De Economist 164, 419–443 (2016). https://doi.org/10.1007/s10645-016-9280-5
