Abstract
This chapter investigates the relationship between teacher evaluations and pupil performance gains in primary education. Teacher evaluations were conducted by trained external evaluators who scored teachers on a detailed rubric containing 75 classroom practices. These practices reflect the pedagogical, didactical and classroom-organization competences considered crucial for effective teaching. Conditional on previous-year test scores and several pupil and classroom characteristics, the score on this rubric significantly predicts pupil performance gains on standardized tests in math, reading and spelling. Estimated test score gains are on the order of 0.4 standard deviations in math and spelling and 0.25 standard deviations in reading if a pupil is assigned a teacher from the top quartile instead of the bottom quartile of the distribution of the evaluation rubric. The observation rubric seems particularly well suited to identifying weak teachers. Such observations may inform targeted teacher improvement plans and personnel decisions.
Notes
See, among others, Rivkin et al. (2005), Clotfelter et al. (2006), Jacob (2007) and Staiger and Rockoff (2010). Notable exceptions are two recent papers by Harris and Sass (2011) and Wiswall (2013) that find that teacher productivity keeps on increasing with experience (far) beyond the first couple of years on the job.
Weisberg et al. (2009) show in an analysis of teacher evaluation systems in 14 school districts in the US that most districts only have a binary rating system in which more than 98 % of teachers are rated in the highest category (usually labeled “satisfactory”).
The official competence requirements for teachers that are used by the Education Inspectorate of the Netherlands and that are part of the national Law on Occupations in Education (Wet Beroepen in Onderwijs) have been transferred to corresponding observable classroom practices in the rubric.
Teacher experience has been weighted in the same way as the TES score for a classroom; we define this as the teacher experience a classroom of children is exposed to.
That is, parents who only finished the lowest level of secondary school or less.
They conclude this from comparing estimates of teacher value added with and without controlling for previously unobserved parent characteristics, as well as from applying a quasi-experimental research design based on changes in teaching staff.
Appendix Table 12 shows the relationship between previous-year test scores and the start-of-year teacher evaluation score, based on a regression with school- and grade-fixed effects. It seems that better teachers (based on the start-of-year teacher evaluation score) are assigned to weaker pupils in math (see column 1). However, no significant relationship was found for spelling and reading, and the point estimates are of the opposite sign.
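The within-school/grade comparison described in this note can be sketched as follows. This is a hypothetical illustration on simulated data, not the authors' code: group cells, effect sizes and sample sizes are assumptions; the fixed effects are absorbed by demeaning both variables within school-grade cells before running OLS.

```python
# Hypothetical sketch of a fixed-effects regression of previous-year test
# scores on the start-of-year teacher evaluation (TES) score, with school-
# grade fixed effects absorbed by within-cell demeaning. Simulated data.
import numpy as np

rng = np.random.default_rng(0)

n_cells, pupils_per_cell = 30, 20                 # assumed sizes
cell = np.repeat(np.arange(n_cells), pupils_per_cell)
cell_effect = rng.normal(0.0, 1.0, n_cells)[cell]  # school-grade fixed effect

# Simulate a negative within-cell association (better-scored teachers
# matched to weaker pupils), as found for math in Appendix Table 12.
tes = rng.normal(0.0, 1.0, cell.size) + cell_effect
score = -0.3 * tes + cell_effect + rng.normal(0.0, 1.0, cell.size)

def demean(x, groups):
    """Subtract the group mean from each observation (within transformation)."""
    means = np.bincount(groups, weights=x) / np.bincount(groups)
    return x - means[groups]

tes_w, score_w = demean(tes, cell), demean(score, cell)
beta = (tes_w @ score_w) / (tes_w @ tes_w)  # within-OLS slope
print(round(beta, 2))  # negative by construction in this simulation
```

Demeaning within cells is numerically equivalent to including a full set of school-grade dummies, which is why the single slope above is the fixed-effects estimate.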
It should be noted that the estimates for reading, spelling and math are not statistically significantly different from each other. We should therefore be cautious in interpreting these results as implying that the relationship is strongest for spelling and weakest for reading.
A review by Hanushek and Rivkin (2010) of value-added estimates of teacher effectiveness, expressed in SD of pupil test scores, shows that estimated coefficients are larger for math than for reading in every study.
The difference in the average TES score between teachers in the lowest quartile (i.e. 34 competences shown) and teachers in the highest quartile (i.e. 68 competences shown) amounts to 2.5 SD.
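As a back-of-envelope check of the figures in this note (34 vs. 68 competences, a gap of 2.5 SD), one can recover the implied standard deviation of the TES score in competence units:

```python
# Back-of-envelope arithmetic using the figures quoted in the note:
# bottom-quartile teachers show 34 of the 75 competences, top-quartile
# teachers show 68, and this gap corresponds to 2.5 SD of the TES score.
low_q, high_q, gap_in_sd = 34, 68, 2.5
implied_sd = (high_q - low_q) / gap_in_sd  # competences per SD of TES score
print(implied_sd)  # 13.6
```

So one standard deviation of the TES score corresponds to roughly 13–14 competences on the 75-item rubric, under the note's figures.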
Comparable estimates in Kane et al. (2011) for Cincinnati’s Teacher Evaluation System are 0.09 for math and 0.13 for reading. Kane and Staiger (2012) find estimates on the order of 0.05 to 0.11 SD for four different rubric instruments used in the Measures of Effective Teaching Project.
For instance, the cumulative effect of being in a class with five fewer pupils for three consecutive years on cognitive skills is estimated to be about 0.15 SD (Fredriksson et al. 2013; Krueger 1999). Estimates of the effect of a year in school on scores on cognitive tests are on the order of 0.2 SD (e.g. Angrist and Krueger 1991; Hansen et al. 2004; Webbink and Gerritsen 2013).
Similar findings have been reported elsewhere for various evaluation rubrics.
We also investigated to what extent we could differentiate between (subsets of) competences by including them in the regressions simultaneously. However, disentangling competence factors is difficult due to multicollinearity, as we work with 99 classrooms and (highly) correlated competences. A principal component analysis reveals that the first component explains about 25 % of the variance and that the first 36 of the 75 components explain about 90 %. However, we could not give a clear interpretation of the identified principal components, which kept us from using them in our analysis.
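The principal component analysis described here can be sketched as below. This is a minimal illustration on simulated data with the same dimensions (99 classrooms, 75 correlated items); the factor structure and noise level are assumptions, so the simulated variance shares will not match the paper's 25 %/90 % figures exactly.

```python
# Hypothetical PCA sketch: 99 classrooms scored on 75 correlated competence
# items; inspect how much variance the leading components explain.
import numpy as np

rng = np.random.default_rng(1)
n_classrooms, n_items = 99, 75

# Simulate correlated items: a few latent factors plus item-specific noise.
latent = rng.normal(size=(n_classrooms, 5))
loadings = rng.normal(size=(5, n_items))
X = latent @ loadings + rng.normal(scale=2.0, size=(n_classrooms, n_items))

# PCA via SVD of the column-centered data matrix.
Xc = X - X.mean(axis=0)
sing_vals = np.linalg.svd(Xc, compute_uv=False)
explained = sing_vals**2 / np.sum(sing_vals**2)  # variance share per component
cum = np.cumsum(explained)

print(f"first component explains: {explained[0]:.0%}")
print(f"components needed for 90% of variance: {np.searchsorted(cum, 0.90) + 1}")
```

Multicollinearity among the 75 items is exactly why the variance concentrates in a handful of leading components; the difficulty noted in the text is that such components are linear blends of many items and thus hard to interpret substantively.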
Furthermore, 86 % of the principals agree that the rubric is a good instrument for distinguishing weak from good teachers. Sixty percent of surveyed teachers are positive about measuring teacher competences through classroom observations, compared to 33 % neutral and 7 % negative. Just 13 % of teachers think that classroom observations fail to obtain a good picture of their competences.
References
Aaronson, D., Barrow, L., & Sander, W. (2007). Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics, 25(1), 95–135.
Angrist, J., & Krueger, A. (1991). Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics, 106(4), 979–1014.
Araujo, M., Carneiro, P., Cruz-Aguayo, Y., & Schady, N. (2016). Teacher quality and learning outcomes in kindergarten. IZA Discussion Paper No. 9796.
Chetty, R., Friedman, J., & Rockoff, J. (2013a). The long-term impact of teachers: Teacher value-added and student outcomes in adulthood. NBER Working Paper No. 17699.
Chetty, R., Friedman, J., & Rockoff, J. (2013b). Measuring the impact of teachers I: Evaluating bias in teacher value-added estimates. NBER Working Paper No. 19423.
Clotfelter, C., Ladd, H., & Vigdor, J. (2006). Teacher–student matching and the assessment of teacher effectiveness. NBER Working Paper No. 11936.
Fredriksson, P., Öckert, B., & Oosterbeek, H. (2013). Long-term effects of class size. Quarterly Journal of Economics, 128(1), 249–285.
Grossman, P., Loeb, S., Cohen, J., & Wyckoff, J. (2013). Measure for measure: The relationship between measures of instructional practice in middle school English language arts and teachers’ value-added scores. American Journal of Education, 119(3), 445–470.
Hansen, K., Heckman, J., & Mullen, K. (2004). The effect of schooling and ability on achievement test scores. Journal of Econometrics, 121(1–2), 39–98.
Hanushek, E., & Rivkin, S. (2010). Using value-added measures of teacher quality. American Economic Review, 100(2), 267–271.
Harris, D., & Sass, T. (2011). Teacher training, teacher quality and student achievement. Journal of Public Economics, 95, 798–812.
Harris, D., & Sass, T. (2014). Skills, productivity and the evaluation of teacher performance. Economics of Education Review (forthcoming). doi:10.1016/j.econedurev.2014.03.002.
Holtzapple, E. (2003). Criterion-related validity evidence for a standards-based teacher evaluation system. Journal of Personnel Evaluation in Education, 17(3), 207–219.
Jacob, B. (2007). The challenges of staffing urban schools with effective teachers. The Future of Children, 17(1), 129–154.
Jacob, B., & Lefgren, L. (2008). Principals as agents: Subjective performance measurement in education. Journal of Labor Economics, 26(1), 101–136.
Kane, T., & Staiger, D. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. National Bureau of Economic Research Working Paper No. 14601.
Kane, T., Taylor, E., Tyler, J., & Wooten, A. (2011). Identifying effective classroom practices using student achievement data. Journal of Human Resources, 46(3), 587–613.
Kane, T., & Staiger, D. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Measures of Effective Teaching Research Paper.
Kane, T., McCaffrey, D., Miller, T., & Staiger, D. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Measures of Effective Teaching Research Paper.
Krueger, A. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114, 497–532.
Ministry of Education. (2013). Kerncijfers 2008–2012: Onderwijs, cultuur en wetenschap. The Hague.
Nye, B., Konstantopoulos, S., & Hedges, L. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257.
Rivkin, S., Hanushek, E., & Kain, J. (2005). Teachers, schools and academic achievement. Econometrica, 73(2), 417–458.
Rockoff, J. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252.
Rockoff, J., & Speroni, C. (2011). Subjective and objective evaluations of teacher effectiveness: Evidence from New York City. Labour Economics, 18, 687–696.
Rockoff, J., Staiger, D., Kane, T., & Taylor, E. (2012). Information and employee evaluation: Evidence from a randomized intervention in public schools. American Economic Review, 102(7), 3184–3213.
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.
Staiger, D., & Rockoff, J. (2010). Searching for effective teachers with imperfect information. Journal of Economic Perspectives, 24(3), 97–118.
Taylor, E., & Tyler, J. (2012). The effect of evaluation on teacher performance. American Economic Review, 102(7), 3628–3651.
Tyler, J., Taylor, E., Kane, T., & Wooten, A. (2009). Using student performance data to identify effective classroom practices. Working Paper.
Tyler, J., Taylor, E., Kane, T., & Wooten, A. (2010). Using student performance data to identify effective classroom practices. American Economic Review Papers and Proceedings, 100, 256–260.
Webbink, D., & Gerritsen, S. (2013). How much do children learn in school? Evidence from school entry rules. CPB Netherlands Bureau for Economic Policy Analysis Discussion Paper No. 255.
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on teacher effectiveness. New York: The New Teacher Project.
Wiswall, M. (2013). The dynamics of teacher quality. Journal of Public Economics, 100(C), 61–78.
Appendix
The “Amsterdamse Kijkwijzer” rubric has been developed by KPC Groep in cooperation with the school boards and with KBA (Kwaliteitsaanpak Basisonderwijs Amsterdam), a program set up by the municipality of Amsterdam to improve the quality of primary education in the city. In the rubric, the competences identified in the national competence standard for teachers (the so-called SBL competences) and the most important aspects of the framework used by the Inspectorate of Education have been translated into concrete observable behavior.
van der Steeg, M., Gerritsen, S. Teacher Evaluations and Pupil Achievement Gains: Evidence from Classroom Observations. De Economist 164, 419–443 (2016). https://doi.org/10.1007/s10645-016-9280-5