Principal holistic judgments and high-stakes evaluations of teachers

Briggs, Derek C.; Dadey, Nathan

doi:10.1007/s11092-016-9256-7


Published: 03 December 2016

Volume 29, pages 155–178 (2017)

Educational Assessment, Evaluation and Accountability



Abstract

Results from a sample of 1,013 Georgia principals who rated 12,617 teachers are used to compare holistic and analytic principal judgments with indicators of student growth central to the state’s teacher evaluation system. Holistic principal judgments were compared to mean student growth percentiles (MGPs) and analytic judgments from a formal observation protocol. The correlations of a holistic principal rating with teacher MGPs and observation protocol scores were 0.22 and 0.32, respectively. Teachers selected as most successful at increasing student achievement had a mean MGP that was a full SD higher than that of teachers selected as least successful, and a mean observation protocol score that was 1.35 SDs higher. Holistic principal judgments appear to be much more strongly influenced by observations of teachers’ classroom practices than by evidence of growth in student achievement.
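The two kinds of summary statistic reported above, a correlation and a difference in group means expressed in SD units, can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, and scaling the gap by the SD of the full sample is an assumption, since the abstract does not say which SD was used.

```python
from statistics import mean, pstdev


def standardized_gap(most, least, everyone):
    """Difference between two group means, in SD units.

    Assumption: the gap is scaled by the population SD of the full
    sample (`everyone`); the source does not specify the scaling SD.
    """
    return (mean(most) - mean(least)) / pstdev(everyone)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    all_mgps = [40, 60]
    print(standardized_gap([60], [40], all_mgps))
    print(pearson_r([1, 2, 3], [2, 4, 6]))
```

A gap near 1.0 from `standardized_gap` would correspond to the "full SD" difference described in the abstract, under the scaling assumption stated above.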


Notes

Ehlert, Koedel, Parsons and Podgursky (2013) argue that even if it were possible to properly specify a value-added model that could isolate the effect of a teacher on student academic growth, it would still be preferable to include additional classroom level covariates at the risk of “overcorrecting” the model in order to create an optimal incentive structure to recruit and retain the best teachers in high-needs school districts.
Only about one third of all teachers in Georgia teach in subjects or grades for which it is possible to compute an MGP. For all other teachers, an aggregated growth statistic is computed on the basis of student performance on “student learning objectives” (SLOs). Because SLOs were still relatively new and in the process of being standardized at the time of this study, we exclude them here to keep the scope of our investigation manageable. For a detailed evaluation of SLOs in Georgia, see Buckley (2015).
For a primer on the student growth percentile methodology as it has been implemented in Georgia, see http://www.gadoe.org/Curriculum-Instruction-and-Assessment/Assessment/Pages/Georgia-Student-Growth-Model.aspx.
The acronym “MGP” is easily misread because the M can refer to either a median or a mean. Many states using SGPs take the median rather than the mean. In Georgia, the decision was made to use the mean in part because research conducted by Castellano and Ho (2015) suggests that the mean is more stable under random fluctuations in student cohorts.
In the survey, “professional development support” is defined by example: “all teachers can benefit from support in the form of professional development (PD) that helps them become better at their job. Examples of these kinds of PD supports might include workshops offered at the district or school level, presentations offered by professional speakers from outside the school, periodic meetings in teacher teams during the school year, one-on-one coaching and feedback on teaching from a mentor or mentors, and taking coursework at an institution of higher education.”
We regard this as a quasi-ordinal rating scale in the sense that a rating of 4 is meant to indicate a teacher whom a principal believes to require more support than does a teacher with a rating of 1, 2, or 3, and a rating of 1 indicates a teacher whom a principal believes to require less support than does a teacher with a rating of 2, 3, or 4. However, it is not clear that a rating of 3 necessarily indicates a greater level of support than a rating of 2, and in subsequent analyses, we sometimes collapse the middle two categories or restrict focus to the top and bottom categories.
The courses for which EOCTs exist are Mathematics I, Mathematics II, Coordinate Algebra, Georgia Performance Standards Algebra, Analytic Geometry, Georgia Performance Standards Geometry, United States History, Economics, Biology, Physical Science, Ninth Grade Literature and Composition, and American Literature and Composition.
This assumption is surely violated by the clustering of teachers within schools and school districts. However, because our sample sizes are so large in this regression context, involving the full population of teachers in the state, producing cluster-adjusted standard errors would have no impact on conventional tests of statistical significance, and such tests are not relevant to the approach anyway.
These variables are meant to be illustrative rather than exhaustive; other variables that could have been included are racial/ethnic composition, attendance rates, student “churn” (students who enter and exit the classroom throughout the year), the proportion of students in gifted and talented programs, etc. Indeed, one challenge with this approach is that it can be unclear where one should stop in adding factors that need to be controlled.
In results not shown here due to space constraints, we verify this by regressing principals’ least/most successful judgments for teachers on MGP and TAPS scores. We find that the TAPS score variable has the strongest influence, especially in judgments of teachers deemed least successful at increasing student achievement.
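The distinction drawn in the note on the “MGP” acronym can be made concrete with a small sketch. It assumes only that a teacher-level growth statistic is formed by aggregating that teacher's student growth percentiles (SGPs); the function name and the SGP values are hypothetical and are not taken from the Georgia data.

```python
from statistics import mean, median


def aggregate_sgps(sgps):
    """Return both candidate teacher-level aggregates of student SGPs.

    'mean_gp' is the mean growth percentile (the Georgia convention
    described in the notes); 'median_gp' is the median growth
    percentile used in many other states.
    """
    return {"mean_gp": mean(sgps), "median_gp": median(sgps)}


if __name__ == "__main__":
    # Two hypothetical classrooms that differ in a single student.
    classroom_a = [10, 50, 90]
    classroom_b = [10, 20, 90]
    print(aggregate_sgps(classroom_a))
    print(aggregate_sgps(classroom_b))
```

In this toy case, changing one student's SGP from 50 to 20 moves the median by 30 points but the mean by only 10, which is the kind of sensitivity to cohort composition that the stability argument concerns.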

References

Adler, M. (2014). Review of measuring the impacts of teachers. National Education Policy Center. http://nepc.colorado.edu/thinktank/review-measuring-impact-of-teachers.
American Statistical Association (2014). ASA Statement on Using Value-Added Models for Educational Assessment. https://www.amstat.org/asa/files/pdfs/POLASAVAM-Statement.pdf. Accessed 12 June 2016.
Ballou, D. (2012). Review of long-term impacts of teachers. National Education Policy Center. http://nepc.colorado.edu/thinktank/review-long-term-impacts.
Ballou, D., Sanders, W., & Wright, P. (2004). Controlling for student background in value-added assessment of teachers. Journal of Educational and Behavioral Statistics, 29(1), 37–65.
Betebenner, D. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4), 42–51.
Castellano, K. E., & Ho, A. D. (2015). Practical differences among aggregate-level conditional status metrics: from median student growth percentiles to value-added models. Journal of Educational and Behavioral Statistics, 40(1), 35–68.
Chetty, R., Friedman, J., & Rockoff, J. (2014a). Measuring the impacts of teachers I: evaluating bias in teacher value-added estimates. American Economic Review, 104(9), 2593–2632.
Chetty, R., Friedman, J., & Rockoff, J. (2014b). Measuring the impacts of teachers II: teacher value-added and student outcomes in adulthood. American Economic Review, 104(9), 2633–2679.
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. J. (2013). The sensitivity of value-added estimates to specification adjustments: evidence from school- and teacher-level models in Missouri. Statistics and Public Policy, 1(1), 19–27.
Goldhaber, D., Walch, J., & Gabele, B. (2013). Does the model matter? Exploring the relationship between different student achievement-based teacher assessments. Statistics and Public Policy, 1(1), 28–39.
Goldring, E., Grissom, J. A., Rubin, M., Neumerski, C. M., Cannata, M., Drake, T., & Schuermann, P. (2015). Make room value added: principals’ human capital decisions and the emergence of teacher observation data. Educational Researcher, 44(2), 96–104.
Guarino, C., Reckase, M., Stacy, B., & Wooldridge, J. (2015). A comparison of student growth percentile and value-added models of teacher performance. Statistics and Public Policy, 2(1). doi:10.1080/2330443X.2015.1034820.
Harris, D. N., Ingle, W. K., & Rutledge, S. A. (2014). How teacher evaluation methods matter for accountability: a comparative analysis of teacher effectiveness ratings by principals and teacher value-added measures. American Educational Research Journal, 51(1), 73–112.
Hill, H., Charalambous, C., & Kraft, M. (2012). When rater reliability is not enough: teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64.
Ingle, W. K., Rutledge, S. A., & Bishop, J. L. (2011). Context matters: principals’ sense-making of teacher hiring and on-the-job performance. Journal of Educational Administration, 49, 579–610.
Jacob, B. A., & Lefgren, L. (2008). Can principals identify effective teachers? Evidence on subjective performance evaluation in education. Journal of Labor Economics, 26(1), 101–136.
Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013a). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Seattle, WA: Bill and Melinda Gates Foundation.
Mashburn, A., Meyer, J., Allen, J., & Pianta, R. (2014). The effect of observation length and presentation order on the reliability and validity of an observational measure of teaching quality. Educational and Psychological Measurement, 74(3), 400–422.
McCaffrey, D. F., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal variability of teacher effect estimates. Education Finance and Policy, 4(4), 572–606.
Reardon, S. F., & Raudenbush, S. W. (2009). Assumptions of value-added models for estimating school effects. Education Finance and Policy, 4(4), 492–519.
Rockoff, J., Staiger, D. O., Kane, T. J., & Taylor, E. (2012). Information and employee evaluation: evidence from a randomized intervention in public schools. American Economic Review, 102(7), 3184–3213.
Rothstein, J. (2009). Student sorting and bias in value-added estimation: selection on observables and unobservables. Education Finance and Policy, 4(4), 537–571.
Rothstein, J. (2010). Teacher quality in educational production: tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.
Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
Walsh, E., & Isenberg, E. (2015). How does value added compare to student growth percentiles? Statistics and Public Policy, 2(1). doi:10.1080/2330443X.2015.1034390.
Whitehurst, G., Chingos, M., & Lindquist, K. (2014). Evaluating teachers with classroom observations: lessons learned in four districts. Brown Center on Education Policy at Brookings.


Author information

Authors and Affiliations

University of Colorado, Boulder, CO, USA
Derek C. Briggs
National Center for the Improvement of Educational Assessment, Dover, NH, USA
Nathan Dadey


Corresponding author

Correspondence to Derek C. Briggs.


About this article

Cite this article

Briggs, D.C., Dadey, N. Principal holistic judgments and high-stakes evaluations of teachers. Educ Asse Eval Acc 29, 155–178 (2017). https://doi.org/10.1007/s11092-016-9256-7


Received: 03 May 2016
Accepted: 11 November 2016
Published: 03 December 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s11092-016-9256-7
