Abstract
This chapter investigates the relationship between teacher evaluations and pupil performance gains in primary education. Teacher evaluations were conducted by trained external evaluators who scored teachers on a detailed rubric containing 75 classroom practices. These practices reflect the pedagogical, didactical and classroom-organization competences considered crucial for effective teaching. Conditional on previous-year test scores and several pupil and classroom characteristics, the score on this rubric significantly predicts pupil performance gains on standardized tests in math, reading and spelling. Estimated test score gains are on the order of 0.4 standard deviations in math and spelling and 0.25 standard deviations in reading if a pupil is assigned a teacher from the top quartile instead of the bottom quartile of the distribution of the evaluation rubric. The observation rubric seems particularly well suited to identifying weak teachers. Such observations may inform targeted teacher improvement plans and personnel decisions.
Notes
See, among others, Rivkin et al. (2005), Clotfelter et al. (2006), Jacob (2007) and Staiger and Rockoff (2010). Notable exceptions are two recent papers by Harris and Sass (2011) and Wiswall (2013) that find that teacher productivity keeps on increasing with experience (far) beyond the first couple of years on the job.
Weisberg et al. (2009) show in an analysis of teacher evaluation systems in 14 school districts in the US that most districts only have a binary rating system in which more than 98 % of teachers are rated in the highest category (usually labeled “satisfactory”).
The official competence requirements for teachers that are used by the Education Inspectorate of the Netherlands and that are part of the national Law on Occupations in Education (Wet Beroepen in Onderwijs) have been transferred to corresponding observable classroom practices in the rubric.
Teacher experience has been weighted in the same way as the TES score for a classroom; we define this as the teacher experience a classroom of children is exposed to.
That is, parents who only finished the lowest level of secondary school or less.
They conclude this from comparing estimates of teacher value added with and without controlling for previously unobserved parent characteristics, as well as from applying a quasi-experimental research design based on changes in teaching staff.
Appendix Table 12 shows the relationship between previous-year test scores and the start-of-year teacher evaluation score, based on a regression with school- and grade-fixed effects. It seems that better teachers (based on the start-of-year teacher evaluation score) are assigned to weaker pupils in math (see column 1). However, no significant relationship was found for spelling and reading, and the point estimates are of the opposite sign.
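The within-school/grade comparison described in this note can be sketched as follows. This is a hypothetical illustration on simulated data, not the authors' code: group cells, effect sizes and sample sizes are assumptions; the fixed effects are absorbed by demeaning both variables within school-grade cells before running OLS.

```python
# Hypothetical sketch of a fixed-effects regression of previous-year test
# scores on the start-of-year teacher evaluation (TES) score, with school-
# grade fixed effects absorbed by within-cell demeaning. Simulated data.
import numpy as np

rng = np.random.default_rng(0)

n_cells, pupils_per_cell = 30, 20                 # assumed sizes
cell = np.repeat(np.arange(n_cells), pupils_per_cell)
cell_effect = rng.normal(0.0, 1.0, n_cells)[cell]  # school-grade fixed effect

# Simulate a negative within-cell association (better-scored teachers
# matched to weaker pupils), as found for math in Appendix Table 12.
tes = rng.normal(0.0, 1.0, cell.size) + cell_effect
score = -0.3 * tes + cell_effect + rng.normal(0.0, 1.0, cell.size)

def demean(x, groups):
    """Subtract the group mean from each observation (within transformation)."""
    means = np.bincount(groups, weights=x) / np.bincount(groups)
    return x - means[groups]

tes_w, score_w = demean(tes, cell), demean(score, cell)
beta = (tes_w @ score_w) / (tes_w @ tes_w)  # within-OLS slope
print(round(beta, 2))  # negative by construction in this simulation
```

Demeaning within cells is numerically equivalent to including a full set of school-grade dummies, which is why the single slope above is the fixed-effects estimate.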
It should be noted that the estimates for reading, spelling and math are not statistically significantly different from each other. We should therefore be cautious in interpreting these results as implying that the relationship is strongest for spelling and weakest for reading.
A review by Hanushek and Rivkin (2010) of value-added estimates of teacher effectiveness, expressed in SD of pupil test scores, shows that estimated coefficients are larger for math than for reading in every study.
The difference in the average TES score between teachers in the lowest quartile (i.e. 34 competences shown) and teachers in the highest quartile (i.e. 68 competences shown) amounts to 2.5 SD.
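As a back-of-envelope check of the figures in this note (34 vs. 68 competences, a gap of 2.5 SD), one can recover the implied standard deviation of the TES score in competence units:

```python
# Back-of-envelope arithmetic using the figures quoted in the note:
# bottom-quartile teachers show 34 of the 75 competences, top-quartile
# teachers show 68, and this gap corresponds to 2.5 SD of the TES score.
low_q, high_q, gap_in_sd = 34, 68, 2.5
implied_sd = (high_q - low_q) / gap_in_sd  # competences per SD of TES score
print(implied_sd)  # 13.6
```

So one standard deviation of the TES score corresponds to roughly 13–14 competences on the 75-item rubric, under the note's figures.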
Comparable estimates in Kane et al. (2011) for Cincinnati’s Teacher Evaluation System are 0.09 for math and 0.13 for reading. Kane and Staiger (2012) find estimates on the order of 0.05 to 0.11 SD for four different rubric instruments used in the Measures of Effective Teaching Project.
For instance, the cumulative effect of being in a class with five fewer pupils for three consecutive years on cognitive skills is estimated to be about 0.15 SD (Fredriksson et al. 2013; Krueger 1999). Estimates of the effect of a year in school on scores on cognitive tests are on the order of 0.2 SD (e.g. Angrist and Krueger 1991; Hansen et al. 2004; Webbink and Gerritsen 2013).
Similar findings have been reported elsewhere for various evaluation rubrics.
We also investigated to what extent we could differentiate between (subsets of) competences by including them in the regressions simultaneously. However, disentangling competence factors is difficult due to multicollinearity, as we work with 99 classrooms and (highly) correlated competences. A principal component analysis reveals that the first component explains about 25 % of the variance and that the first 36 of the 75 components explain about 90 %. However, we could not give a clear interpretation of the identified principal components, which kept us from using them in our analysis.
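The principal component analysis described here can be sketched as below. This is a minimal illustration on simulated data with the same dimensions (99 classrooms, 75 correlated items); the factor structure and noise level are assumptions, so the simulated variance shares will not match the paper's 25 %/90 % figures exactly.

```python
# Hypothetical PCA sketch: 99 classrooms scored on 75 correlated competence
# items; inspect how much variance the leading components explain.
import numpy as np

rng = np.random.default_rng(1)
n_classrooms, n_items = 99, 75

# Simulate correlated items: a few latent factors plus item-specific noise.
latent = rng.normal(size=(n_classrooms, 5))
loadings = rng.normal(size=(5, n_items))
X = latent @ loadings + rng.normal(scale=2.0, size=(n_classrooms, n_items))

# PCA via SVD of the column-centered data matrix.
Xc = X - X.mean(axis=0)
sing_vals = np.linalg.svd(Xc, compute_uv=False)
explained = sing_vals**2 / np.sum(sing_vals**2)  # variance share per component
cum = np.cumsum(explained)

print(f"first component explains: {explained[0]:.0%}")
print(f"components needed for 90% of variance: {np.searchsorted(cum, 0.90) + 1}")
```

Multicollinearity among the 75 items is exactly why the variance concentrates in a handful of leading components; the difficulty noted in the text is that such components are linear blends of many items and thus hard to interpret substantively.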
Furthermore, 86 % of the principals agree that the rubric is a good instrument for distinguishing weak from good teachers. Sixty percent of surveyed teachers are positive about measuring teacher competences through classroom observations, compared to 33 % neutral and 7 % negative. Just 13 % of teachers think that classroom observations fail to obtain a good picture of their competences.
References
Aaronson, D., Barrow, L., & Sander, W. (2007). Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics, 25(1), 95–135.
Angrist, J., & Krueger, A. (1991). Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics, 106(4), 979–1014.
Araujo, M., Carneiro, P., Cruz-Aguayo, Y., & Schady, N. (2016). Teacher quality and learning outcomes in kindergarten. IZA Discussion Paper No. 9796.
Chetty, R., Friedman, J., & Rockoff, J. (2013a). The long-term impact of teachers: Teacher value-added and student outcomes in adulthood. NBER Working Paper No. 17699.
Chetty, R., Friedman, J., & Rockoff, J. (2013b). Measuring the impact of teachers I: Evaluating bias in teacher value-added estimates. NBER Working Paper No. 19423.
Clotfelter, C., Ladd, H., & Vigdor, J. (2006). Teacher–student matching and the assessment of teacher effectiveness. NBER Working Paper No. 11936.
Fredriksson, P., Öckert, B., & Oosterbeek, H. (2013). Long-term effects of class size. Quarterly Journal of Economics, 128(1), 249–285.
Grossman, P., Loeb, S., Cohen, J., & Wyckoff, J. (2013). Measure for measure: The relationship between measures of instructional practice in middle school English language arts and teachers’ value-added scores. American Journal of Education, 119(3), 445–470.
Hansen, K., Heckman, J., & Mullen, K. (2004). The effect of schooling and ability on achievement test scores. Journal of Econometrics, 121(1–2), 39–98.
Hanushek, E., & Rivkin, S. (2010). Using value-added measures of teacher quality. American Economic Review, 100(2), 267–271.
Harris, D., & Sass, T. (2011). Teacher training, teacher quality and student achievement. Journal of Public Economics, 95, 798–812.
Harris, D., & Sass, T. (2014). Skills, productivity and the evaluation of teacher performance. Economics of Education Review (forthcoming). doi:10.1016/j.econedurev.2014.03.002.
Holtzapple, E. (2003). Criterion-related validity evidence for a standards-based teacher evaluation system. Journal of Personnel Evaluation in Education, 17(3), 207–219.
Jacob, B. (2007). The challenges of staffing urban schools with effective teachers. The Future of Children, 17(1), 129–154.
Jacob, B., & Lefgren, L. (2008). Principals as agents: Subjective performance measurement in education. Journal of Labor Economics, 26(1), 101–136.
Kane, T., & Staiger, D. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. National Bureau of Economic Research Working Paper No. 14601.
Kane, T., Taylor, E., Tyler, J., & Wooten, A. (2011). Identifying effective classroom practices using student achievement data. Journal of Human Resources, 46(3), 587–613.
Kane, T., & Staiger, D. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Measures of Effective Teaching Research Paper.
Kane, T., McCaffrey, D., Miller, T., & Staiger, D. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Measures of Effective Teaching Research Paper.
Krueger, A. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114, 497–532.
Ministry of Education. (2013). Kerncijfers 2008–2012: Onderwijs, cultuur en wetenschap. The Hague.
Nye, B., Konstantopoulos, S., & Hedges, L. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257.
Rivkin, S., Hanushek, E., & Kain, J. (2005). Teachers, schools and academic achievement. Econometrica, 73(2), 417–458.
Rockoff, J. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252.
Rockoff, J., & Speroni, C. (2011). Subjective and objective evaluations of teacher effectiveness: Evidence from New York City. Labour Economics, 18, 687–696.
Rockoff, J., Staiger, D., Kane, T., & Taylor, E. (2012). Information and employee evaluation: Evidence from a randomized intervention in public schools. American Economic Review, 102(7), 3184–3213.
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.
Staiger, D., & Rockoff, J. (2010). Searching for effective teachers with imperfect information. Journal of Economic Perspectives, 24(3), 97–118.
Taylor, E., & Tyler, J. (2012). The effect of evaluation on teacher performance. American Economic Review, 102(7), 3628–3651.
Tyler, J., Taylor, E., Kane, T., & Wooten, A. (2009). Using student performance data to identify effective classroom practices. Working Paper.
Tyler, J., Taylor, E., Kane, T., & Wooten, A. (2010). Using student performance data to identify effective classroom practices. American Economic Review Papers and Proceedings, 100, 256–260.
Webbink, D., & Gerritsen, S. (2013). How much do children learn in school? Evidence from school entry rules. CPB Netherlands Bureau for Economic Policy Analysis Discussion Paper No. 255.
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on teacher effectiveness. New York: The New Teacher Project.
Wiswall, M. (2013). The dynamics of teacher quality. Journal of Public Economics, 100(C), 61–78.
Appendix
The “Amsterdamse Kijkwijzer” rubric has been developed by KPC Groep in cooperation with the school boards and with KBA (Kwaliteitsaanpak Basisonderwijs Amsterdam), a program set up by the municipality of Amsterdam to improve the quality of primary education in the city. In the rubric, the competences identified in the national competence standard for teachers (the so-called SBL competences) and the most important aspects of the framework used by the Inspectorate of Education have been translated into concrete observable behavior.
van der Steeg, M., Gerritsen, S. Teacher Evaluations and Pupil Achievement Gains: Evidence from Classroom Observations. De Economist 164, 419–443 (2016). https://doi.org/10.1007/s10645-016-9280-5