
New evidence concerning school accountability and mathematics instructional quality in the no child left behind era

Educational Assessment, Evaluation and Accountability

Abstract

Using longitudinal data from the No Child Left Behind (NCLB) era, I applied regression techniques and found a positive association between a school’s failure to reach “adequate yearly progress” (AYP) in mathematics and subsequent changes in the quality of middle grades mathematics instruction in districts where district leaders had adopted robust theories of action for improving mathematics instruction. The positive association was robust to multiple sensitivity tests and may reflect a causal relationship. The evidence suggests that educational leaders in similar contexts can use school failure to reach accountability standards, as measured by standardized assessments, to promote instructional quality in mathematics.


Notes

  1. Local education agencies give charter schools autonomy from some local policies and regulations with the expectation that charter school leaders will use this autonomy to rapidly improve student achievement.

  2. Three rubrics from the IQA toolkit were not included because previous work suggested the scores they produce are unreliable (Wilhelm and Kim 2015).

  3. A dependability coefficient is used in absolute decision making and is appropriate when comparing scores to a threshold (Hill, Charalambous, & Kraft, 2012).

  4. The generalizability coefficient of the Mathematical Quality of Instruction (MQI) instrument, based on two observations, is less than 0.60 (Hill et al. 2012). Because an instrument’s generalizability coefficient is always at least as large as the dependability coefficient associated with a particular data collection and scoring procedure (Brennan 2001), the MQI’s dependability coefficient must also be below 0.60; the dependability coefficient of the IQA is therefore greater than that of the MQI.
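    To see why, consider a minimal generalizability-theory sketch for a hypothetical teachers-crossed-with-occasions (p × o) design; the notation follows Brennan (2001), but the design is illustrative rather than the paper’s own. With \( n_o \) observation occasions, the generalizability and dependability coefficients are \( E{\rho}^2={\sigma}_p^2/\left({\sigma}_p^2+{\sigma}_{po}^2/{n}_o\right) \) and \( \Phi ={\sigma}_p^2/\left({\sigma}_p^2+\left({\sigma}_o^2+{\sigma}_{po}^2\right)/{n}_o\right) \), where \( {\sigma}_p^2 \), \( {\sigma}_o^2 \), and \( {\sigma}_{po}^2 \) are the teacher, occasion, and interaction-plus-residual variance components. Absolute error adds the occasion main effect \( {\sigma}_o^2\ge 0 \) to the denominator, so \( \Phi \le E{\rho}^2 \) always.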

  5. Exploratory factor analyses conducted by ACRO researchers suggested scores from the eight rubrics loaded onto two constructs. The first construct was primarily based on variation in Task Potential and Implementation scores, while the second was based on variation in the remaining six rubrics. ACRO researchers decided that each construct should determine 50% of the IQA composite score: Task Potential and Implementation scores were weighted at 25% each, and each of the six remaining rubrics was weighted at 8.33% (i.e., 50%/6).
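    A minimal Python sketch of this weighting scheme follows; the rubric keys and scores are hypothetical placeholders rather than ACRO’s data.

      # Hypothetical rubric scores; Task Potential and Implementation carry
      # 25% each, and the six remaining rubrics carry 50%/6 = 8.33% each.
      scores = {
          "task_potential": 3.0,
          "implementation": 2.0,
          "rubric_3": 3.0, "rubric_4": 2.0, "rubric_5": 4.0,
          "rubric_6": 3.0, "rubric_7": 2.0, "rubric_8": 3.0,
      }
      weights = {name: 0.25 if name in ("task_potential", "implementation")
                 else 0.50 / 6 for name in scores}

      assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights sum to 100%
      iqa_composite = sum(weights[n] * scores[n] for n in scores)
      print(f"IQA composite: {iqa_composite:.2f}")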

  6. Thirty schools were observed over the 4-year study period. Because each record is a year-to-year change at the school level, each school contributes at most three records, yielding a maximum of 90 records at the school-by-year level. However, one school participated for only 2 years and six schools participated for 3 years. See Table 1.

  7. The survey allowed respondents to choose both White and Hispanic.

  8. It is also plausible that it is easier to improve IQA for teachers in some grades than in others. If teacher-participants in failing schools happened to teach in these “easier” grades, this could explain away the positive relationship between school failure and IQA. I addressed this possibility by adding grade fixed effects (i.e., dichotomous variables for whether a teacher taught sixth, seventh, or eighth grade) as right-hand side variables. The grade fixed effects were not jointly significant; that is, as a group they were unrelated to changes in IQA.

    Additionally, several school-by-year cells passed math AYP in the first year of the study period. It is plausible that instruction improved dramatically that year, perhaps because it was the first year districts could respond to ACRO-provided feedback, and district leadership may have wanted to show ACRO researchers they could act productively on that feedback. Such relationships could explain a sizeable portion of the observed positive relationship of interest. To address this explanation, I added year fixed effects as right-hand side variables (i.e., dichotomous variables for whether the record came from the first, second, or third study year). Like the grade fixed effects, the year fixed effects did not predict changes in IQA, so subsequent models excluded them.
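    The following Python sketch illustrates this kind of joint-significance check; the file, column, and variable names are hypothetical stand-ins, not the study’s own, and grades are assumed to be coded 6–8 with grade 6 as the omitted category.

      import pandas as pd
      import statsmodels.formula.api as smf

      df = pd.read_csv("teacher_year_records.csv")  # hypothetical file

      # Regress the change in IQA on school failure status plus grade fixed
      # effects, clustering standard errors by school to mirror the nesting
      # of teachers within schools.
      model = smf.ols("iqa_change ~ failed_math_ayp + C(grade)", data=df).fit(
          cov_type="cluster", cov_kwds={"groups": df["school_id"]}
      )

      # Joint test that all grade dummies are zero; a large p-value matches
      # the note's finding that grade fixed effects do not predict IQA changes.
      print(model.f_test("C(grade)[T.7] = 0, C(grade)[T.8] = 0"))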

  9. This explanation is counterintuitive: if instruction in failing schools were improving, these schools presumably would not continue to fail.

  10. While there is no consensus regarding the cutoffs to use when identifying highly influential data based on these two statistics, common rules of thumb flag values of Cook’s D exceeding 4/n (where n is the analytical sample size) and studentized residuals exceeding 2 in absolute value (Bollen and Jackman 1990).
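    Continuing the hypothetical statsmodels sketch from note 8, these diagnostics can be computed and the rules of thumb applied as follows.

      import numpy as np

      influence = model.get_influence()
      cooks_d = influence.cooks_distance[0]               # Cook's D values
      studentized = influence.resid_studentized_external  # studentized residuals

      n = len(cooks_d)
      flagged = (cooks_d > 4 / n) | (np.abs(studentized) > 2)
      print(f"{flagged.sum()} of {n} records flagged as potentially influential")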

  11. The robust and quantile estimators were unable to account for the clustering of teachers within schools; thus, the standard errors in rows IV and V of Table 4 are not comparable to those from the other models.
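    As a rough illustration, a Huber-type robust regression and a median (quantile) regression can be fit in statsmodels as sketched below; neither routine accepts the clustered covariance used elsewhere in the paper, which is why these standard errors are flagged. Variable names continue the earlier hypothetical sketches and are not the paper’s own specification.

      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      # Robust (Huber) estimator: downweights outlying observations.
      robust_fit = smf.rlm("iqa_change ~ failed_math_ayp + C(grade)",
                           data=df, M=sm.robust.norms.HuberT()).fit()

      # Quantile regression at the median of the outcome distribution.
      median_fit = smf.quantreg("iqa_change ~ failed_math_ayp + C(grade)",
                                data=df).fit(q=0.5)

      print(robust_fit.params["failed_math_ayp"],
            median_fit.params["failed_math_ayp"])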

References

  • Anagnostopoulos, D. (2003). The new accountability, student failure and teachers’ work in urban high schools. Educational Policy, 17(3), 291–316. https://doi.org/10.1177/0895904803254481.

  • Baker, D. (2014). The schooled society (1st ed.). Stanford: Stanford University Press.

  • Ball, D. L. (2000). Bridging practices: intertwining content and pedagogy in teaching and learning to teach. Journal of Teacher Education, 51(3), 241–247.

  • Bollen, K. A., & Jackman, R. W. (1990). Regression diagnostics: an expository treatment of outliers and influential cases. In J. Fox & J. S. Long (Eds.), Modern methods of data analysis. Newbury Park: Sage Publications.

  • Boston, M. D. (2012). Assessing instructional quality in mathematics. The Elementary School Journal, 113(1), 76–104.

  • Boston, M. D., & Wilhelm, A. G. (2015). Middle school mathematics instruction in instructionally focused urban districts. Urban Education, 1–33. https://doi.org/10.1177/0042085915574528.

  • Boston, M., & Wolf, M. K. (2006). Assessing academic rigor in mathematics instruction: the development of the Instructional Quality Assessment toolkit [Technical report]. https://doi.org/10.1037/e644922011-001.

  • Boston, M., Bostic, J., Lesseig, K., & Sherman, M. (2015). A comparison of mathematics classroom observation protocols. Mathematics Teacher Educator, 3(2), 154–175.

  • Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.

  • Campbell, S. L., & Ronfeldt, M. (2018). Observational evaluation of teachers: measuring more than we bargained for? American Educational Research Journal, 55(6), 1233–1267. https://doi.org/10.3102/0002831218776216.

  • Chiang, H. (2009). How accountability pressure on failing schools affects student achievement. Journal of Public Economics, 93(9–10), 1045–1057. https://doi.org/10.1016/j.jpubeco.2009.06.002.

  • Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher-student matching and the assessment of teacher effectiveness. The Journal of Human Resources, 41(4), 778–820.

  • Cobb, P., Jackson, K., Henrick, E., & Smith, T. M. (2018). Systems for instructional improvement: creating coherence from the classroom to the district office (1st ed.). Cambridge: Harvard Education Press.

  • Cochran-Smith, M., & Lytle, S. L. (1999). Relationships of knowledge and practice: teacher learning in communities. Review of Research in Education, 24(1), 249–305. https://doi.org/10.3102/0091732X024001249.

  • Dee, T. S., Jacob, B., & Schwartz, N. L. (2013). The effects of NCLB on school resources and practices. Educational Evaluation and Policy Analysis, 35(2), 252–279. https://doi.org/10.3102/0162373712467080.

  • Desimone, L. M., Hochberg, E. D., & McMaken, J. (2016). Teacher knowledge and instructional quality of beginning teachers: growth and linkages. Teachers College Record, 118(5).

  • Franke, M. L., Kazemi, E., & Battey, D. (2007). Mathematics teaching and classroom practice. In F. K. Lester (Ed.), Second handbook of research on mathematics teaching and learning (2nd ed., pp. 225–256). Charlotte: National Council of Teachers of Mathematics.

  • Fuller, B., Wright, J., Gesicki, K., & Kang, E. (2007). Gauging growth: how to judge No Child Left Behind? Educational Researcher, 36(5), 268–278. https://doi.org/10.3102/0013189X07306556.

  • Gamoran, A., Porter, A. C., Smithson, J., & White, P. A. (1997). Upgrading high school mathematics instruction: improving learning opportunities for low-achieving, low-income youth. Educational Evaluation and Policy Analysis, 19(4), 325–338.

  • Hamilton, L., Berends, M., & Stecher, B. M. (2005). Teachers’ responses to standards-based accountability. Santa Monica: RAND Corporation.

  • Hannaway, J., & Hamilton, L. (2008). Performance-based accountability policies: implications for school and classroom practices. https://doi.org/10.1037/e722482011-001.

  • Harris, D. N., & Sass, T. R. (2011). Teacher training, teacher quality and student achievement. Journal of Public Economics, 95(7–8), 798–812. https://doi.org/10.1016/j.jpubeco.2010.11.009.

  • Herman, J. L. (2004). The effects of testing on instruction. In S. H. Fuhrman & R. F. Elmore (Eds.), Redesigning accountability systems for education. New York: Teachers College Press.

  • Hiebert, J., Carpenter, T. P., Fennema, E., Fuson, K. C., Wearne, D., Murray, H., et al. (1997). Making sense: teaching and learning mathematics with understanding. Portsmouth: Heinemann.

  • Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64. https://doi.org/10.3102/0013189X12437203.

  • Holmstrom, B., & Milgrom, P. (1991). Multitask principal-agent analyses: incentive contracts, asset ownership, and job design. Journal of Law, Economics, and Organization, 7(Special Issue), 24–52.

  • Jacob, B. A. (2005). Accountability, incentives and behavior: the impact of high-stakes testing in the Chicago public schools. Journal of Public Economics, 89(5–6), 761–796. https://doi.org/10.1016/j.jpubeco.2004.08.004.

  • Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2011). Identifying effective classroom practices using student achievement data. The Journal of Human Resources, 46(3), 587–613.

  • Kazemi, E., & Franke, M. L. (2004). Teacher learning in mathematics: using student work to promote collective inquiry. Journal of Mathematics Teacher Education, 7(3), 203–235.

  • Kim, J. S., & Sunderman, G. L. (2005). Measuring academic proficiency under the No Child Left Behind Act: implications for educational equity. Educational Researcher, 34(8), 3–13. https://doi.org/10.3102/0013189X034008003.

  • Ladd, H. F., & Sorensen, L. C. (2017). Returns to teacher experience: student achievement and motivation in middle school. Education Finance and Policy, 12(2), 241–279. https://doi.org/10.1162/EDFP_a_00194.

  • Manna, P. (2011). Collision course: federal education policy meets state and local realities. Washington, DC: CQ Press.

  • McGuinn, P. (2006). No Child Left Behind and the transformation of federal education policy, 1965–2005. Lawrence: University Press of Kansas.

  • Mintrop, H., & Sunderman, G. L. (2009). Predictable failure of federal sanctions-driven accountability for school improvement—and why we may retain it anyway. Educational Researcher, 38(5), 353–364. https://doi.org/10.3102/0013189X09339055.

  • MIST Instruments. (n.d.). Retrieved July 13, 2019, from Peabody College of Education and Human Development website: https://peabody.vanderbilt.edu/departments/tl/teaching_and_learning_research/mist/mist_instruments.php

  • Monfils, L. F., Firestone, W. A., Hicks, J. E., Martinez, M. C., Schorr, R. Y., & Camilli, G. (2004). Teaching to the test. In W. A. Firestone, R. Y. Schorr, & L. F. Monfils (Eds.), The ambiguity of teaching to the test. Mahwah: Lawrence Erlbaum Associates.

  • National Center for Education Statistics. (2003). Highlights from the TIMSS 1999 video study of eighth-grade mathematics teaching (pp. 1–12). Washington, DC.

  • National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston: NCTM.

  • NEA - ESEA/NCLB Update #129. (2012). Retrieved from http://www.nea.org/home/50526.htm

  • Polikoff, M. S. (2015). The stability of observational and student survey measures of teaching effectiveness. American Journal of Education, 121(2), 183–212.

  • Popham, W. J. (2001). Teaching to the test? Educational Leadership, 58(6), 16–20.

  • Quint, J. C., Akey, T. M., Rappaport, S., & Willner, C. J. (2007). Instructional leadership, teaching quality, and student achievement: suggestive evidence from three urban school districts. New York: MDRC.

  • Resnick, L., Matsumura, L. C., & Junker, B. (2006). Measuring reading comprehension and mathematics instruction in urban middle schools: a pilot study of the Instructional Quality Assessment (CSE Technical Report No. 681). Los Angeles: Center for the Study of Evaluation.

  • Rockoff, J. E. (2004). The impact of individual teachers on student achievement: evidence from panel data. The American Economic Review, 94(2), 247–252.

  • Rockoff, J., & Turner, L. J. (2010). Short-run impacts of accountability on school quality. American Economic Journal: Economic Policy, 2, 119–147.

  • Schools and Staffing Survey. (n.d.-a). Among public school teachers born in 1946 or later, total number of teachers, average years of teaching experience, average age, and percentage distribution by sex, marital status, years of teaching experience, race/ethnicity, and selected year of birth: 2011–12. Retrieved July 3, 2017, from https://nces.ed.gov/surveys/sass/tables/sass1112_20170125_t1n.asp

  • Schools and Staffing Survey. (n.d.-b). Characteristics of public, private, and Bureau of Indian Education elementary and secondary school teachers in the United States: results from the 2007–08 Schools and Staffing Survey. Retrieved July 3, 2017, from https://nces.ed.gov/pubs2009/2009324/tables/sass0708_2009324_t12n_04.asp

  • Staiger, D. O., & Kane, T. J. (2013). Making decisions with imprecise performance measures: the relationship between annual student achievement gains and a teacher’s career value-added.

  • StataCorp. (2013). Stata 13 base reference manual. College Station: Stata Press.

  • Stein, M. K., & Lane, S. (1996). Instructional tasks and the development of student capacity to think and reason: an analysis of the relationship between teaching and learning in a reform mathematics project. Educational Research and Evaluation, 2(1), 50–80.

  • Stein, M. K., Grover, B. W., & Henningsen, M. A. (1996). Building student capacity for mathematical thinking and reasoning: an analysis of mathematical tasks used in reform classrooms. American Educational Research Journal, 33(2), 455–488.

  • Steinberg, M. P., & Garrett, R. (2016). Classroom composition and measured teacher performance: what do teacher observation scores really measure? Educational Evaluation and Policy Analysis, 38(2), 293–317. https://doi.org/10.3102/0162373715616249.

  • Toch, T. (2006). Margins of error: the education testing industry in the No Child Left Behind era. Washington, DC: Education Sector.

  • Watanabe, M. (2007). Displaced teacher and state priorities in a high-stakes accountability context. Educational Policy. https://doi.org/10.1177/0895904805284114.

  • Wilhelm, A. G., & Kim, S. (2015). Generalizing from observations of mathematics teachers’ instructional practice using the Instructional Quality Assessment. Journal for Research in Mathematics Education, 46(3), 270–279.


Author information


Corresponding author

Correspondence to Seth B. Hunter.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: IQA Task Potential, Implementation, and Discussion Rubrics

The IQA rubrics below are reproduced from Boston and Wolf (2006). For additional details, see “MIST Instruments” (n.d.).

Table 7 Academic Rigor: Potential of the Task
Table 8 Academic Rigor: Implementation
Table 9 Academic Rigor: Discussion

Appendix 2: Some Measurement Properties of IQA

This section describes the procedures used to estimate the year-to-year stability coefficient of IQA scores and their reliability as a measure of the long-term component of a teacher’s instructional practices.

Like Polikoff (2015), I estimate the year-to-year stability of the IQA score by regressing the ith teacher’s score in year t (IQAit) on the score the same teacher received in year t − 1 (IQAi,t-1) and district dummy variables. The stability coefficient, the coefficient on the teacher’s lagged IQA score, represents the extent to which a teacher implements similar instructional practices from year to year. The model uses 241 teacher–year observations; the overall adjusted R2 was 0.13, and the stability coefficient and its district-clustered standard error were 0.31 and 0.11, respectively. Polikoff (2015) estimated stability coefficients of 0.49 for Charlotte Danielson’s Framework for Teaching and 0.12 for the MQI observational rubrics. Thus, the IQA stability coefficient is lower than that of the generic Danielson rubric but much higher than that of the far more comparable MQI rubric.
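A minimal Python sketch of this stability regression follows; the file and column names are hypothetical, and the clustering mirrors the district-clustered standard errors reported above.

    import pandas as pd
    import statsmodels.formula.api as smf

    panel = pd.read_csv("teacher_iqa_panel.csv")  # hypothetical file
    panel = panel.sort_values(["teacher_id", "year"])
    panel["iqa_lag"] = panel.groupby("teacher_id")["iqa"].shift(1)
    panel = panel.dropna(subset=["iqa_lag"])

    # Current-year IQA on lagged IQA plus district dummies; the coefficient
    # on iqa_lag is the stability coefficient (0.31 in the paper).
    stability = smf.ols("iqa ~ iqa_lag + C(district)", data=panel).fit(
        cov_type="cluster", cov_kwds={"groups": panel["district"]}
    )
    print(stability.params["iqa_lag"], stability.bse["iqa_lag"])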

Staiger and Kane (2013) argue that year-to-year Pearson correlations of noisy measurements, like the stability coefficient in the model above, underestimate the reliability of an instrument as a measure of “true” performance. These authors show the square root of the year-to-year correlation of teacher performance is a better estimate of instrument reliability. The square root of the year-to-year correlation (i.e., \( \sqrt{\rho_{IQA_{it}{IQA}_{i,t-1}}} \)) represents the correlation of a short-term measurement (IQAit) with a teacher’s true, time-invariant performance (i.e., the teacher-level mean IQA score, \( \overline{IQA_{it}} \)), producing what Staiger and Kane (2013) call a year-to-career correlation. Conceptually, a year-to-career correlation is better than a year-to-year correlation because short-term measures (i.e., IQAit) contain more measurement error than the career measure (i.e., \( \overline{IQA_{it}} \)), which attenuates the year-to-year correlation (Staiger and Kane 2013). In my analytical dataset of 241 teacher–year observations, \( {\rho}_{IQA_{it}{IQA}_{i,t-1}} \) = 0.36, so the reliability of IQA scores as a measure of the long-term component of teacher instructional practices is \( \sqrt{0.36} \) = 0.60.
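The corresponding computation, continuing the hypothetical panel from the sketch above, is the Pearson correlation between adjacent-year scores followed by its square root.

    import numpy as np

    # Year-to-year correlation of adjacent IQA scores, then the Staiger and
    # Kane (2013) year-to-career adjustment: its square root.
    rho = np.corrcoef(panel["iqa"], panel["iqa_lag"])[0, 1]
    print(np.sqrt(rho))  # with rho = 0.36 this yields 0.60, as reported above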


About this article


Cite this article

Hunter, S.B. New evidence concerning school accountability and mathematics instructional quality in the no child left behind era. Educ Asse Eval Acc 31, 409–436 (2019). https://doi.org/10.1007/s11092-019-09307-6

