High-stakes testing pressures schools to raise test scores, but schools respond to pressure in different ways. Some responses produce real, broad increases in learning, but other responses can raise reported test scores without increasing learning. We estimate the effect of an accountability program on reading scores and math scores in Chile. Over a 6-year period, fourth-grade reading and math scores rose by 0.2 to 0.3 standard deviations, on average, and half the rise was due to the accountability program. However, many schools, especially schools serving disadvantaged students, inflated their accountability ratings by having low-performing students miss high-stakes tests. To encourage healthier responses to accountability, we recommend setting accountability goals that are attainable for schools with disadvantaged students, and providing incentives for all students to take high-stakes tests.
Practically all the students who missed reading tests missed math tests as well. We got essentially the same result if Yijt represented missing the math test, missing the reading test, or missing both.
A third approach would be to restrict the data to public schools, but then the model would not be identified because the SEP variable would be almost perfectly collinear with the year fixed effects. This collinearity would result from the fact that nearly all public schools joined SEP in 2008 (see Fig. 1).
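The collinearity described in this note can be illustrated with a minimal sketch. It assumes a hypothetical panel of three public schools observed from 2005 to 2011, all of which join SEP in 2008; the school labels, the year range, and the simplification from "nearly all" to "all" are assumptions made for illustration, not details from the data.

```python
# Hypothetical panel: three public schools, 2005-2011, all joining SEP in 2008.
# (Assumption for illustration: the text says "nearly all", not all.)
years = list(range(2005, 2012))
schools = ["A", "B", "C"]
SEP_START = 2008

rows = []
for school in schools:
    for year in years:
        # One dummy per year (year fixed effects) plus the SEP indicator.
        year_dummies = {y: int(y == year) for y in years}
        sep = int(year >= SEP_START)
        rows.append({"school": school, "year": year,
                     "sep": sep, "dummies": year_dummies})

# In every row, the SEP indicator equals the sum of the year dummies for
# 2008 onward, so it is an exact linear combination of the year fixed effects.
post_years = [y for y in years if y >= SEP_START]
collinear = all(
    r["sep"] == sum(r["dummies"][y] for y in post_years) for r in rows
)
print(collinear)
```

Because the SEP column adds no variation beyond the year dummies in this restricted sample, its coefficient could not be estimated separately. With a few schools joining later, the collinearity would be near-perfect rather than exact.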
In an earlier version of this article, the imputation model was even more flexible; its parameters were estimated separately within every school and year. That model ran much more slowly, however, and may have been overfit in small schools. In any case, its results were practically identical to those obtained from the model we have described here.
In every year, the Ministry of Education reported scores for schools with at least six fourth-grade scores in each subject, but some students miss tests, so a school with more than six fourth graders might have fewer than six test scores. By limiting the analysis to schools with at least fifteen fourth graders, we ensured that those schools would have at least six test scores. Any cutoff between ten and twenty fourth graders produced similar results.
Under SEP’s accountability system, Mineduc classified schools as “autonomous” (the highest level), “emergent” (the intermediate level), or “in recovery” (the lowest level). From 2008 to 2011, Mineduc classified 12% of SEP schools as “autonomous,” 88% as “emergent,” and none as “in recovery.” After 2012, Mineduc classified 2.5% of SEP schools as “in recovery.”
A typical threshold for repeating was a GPA of four, but the exact threshold varied from school to school.
The vulnerability index was calculated by the National School and Scholarship Aid Board, the public body charged with providing food assistance to schools (Mineduc 2008b).
Imputation often has little effect on fixed-effect estimates (Young and Johnson 2015). In an earlier study on the effect of high-stakes testing in Chicago, imputing missing scores also had little effect on fixed-effects estimates (Jacob 2005). However, the Chicago study imputed a constant, which can bias estimates (Allison 2002), instead of using multiple imputation. The Chicago study also did not examine the effect of missing test scores on school accountability ratings, which can be larger, as we have shown.
The only exception is the effect of being 3 years before SEP, which is significant at p < .05. However, this could be an artifact of multiple tests. With 7 years before participation, the probability of one of those years having .01 < p < .05 would be approximately 30%, even if there were no pre-trends at all.
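The multiple-testing arithmetic in this note can be checked with a short sketch. It assumes the seven pre-participation tests are independent and that each p value is uniform under the null of no pre-trends; both are simplifying assumptions, and the variable names are illustrative.

```python
# Chance that at least one of seven null tests looks "significant",
# assuming independent tests with uniform p values (a simplification).
n_years = 7

# At least one test with p < .05
p_any_below_05 = 1 - (1 - 0.05) ** n_years

# At least one test with .01 < p < .05 (per-test probability .04)
p_any_between_01_05 = 1 - (1 - 0.04) ** n_years

print(round(p_any_below_05, 3), round(p_any_between_01_05, 3))
```

Under these assumptions, the chance of at least one p < .05 is about 30%, and the chance of at least one p in the narrower band .01 < p < .05 is about 25%. Either way, a single significant pre-period coefficient is unsurprising.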
Allison, P. (2002). Missing data. Sage Publications.
Jacob, B., & Levitt, S. (2003). Rotten apples: An investigation of the prevalence and predictors of teacher cheating. The Quarterly Journal of Economics, 118(3), 843–877.
Cameron, C., & Miller, D. (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources, 50(2), 317–372.
Campbell, D. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90.
Carrasco, A., Bogolasky, F., Flores, C., Gutiérrez, G., & Martin, E. S. (2014). Selección de Estudiantes y Desigualdad Educacional En Chile: ¿Qué Tan Coactiva Es La Regulación Que La Prohíbe? Fondo de Investigación y Desarrollo en Educación. MINEDUC.
Congreso de Chile. (2008). Ley de Subvención Escolar Preferencial. http://www.leychile.cl/Navegar?idNorma=269001. Accessed 2 Jan 2019.
Downey, D. B., von Hippel, P. T., & Hughes, M. (2008). Are ‘failing’ schools really failing? Using seasonal comparisons to evaluate school effectiveness. Sociology of Education, 81(3), 242–270. http://www.sociology.ohio-state.edu/people/ptv/publications/failing_schools.pdf.
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2014). Choosing the right growth measure. Education Next 14 (January): 67+.
Falabella, A. (2013). Accountability policy effects within school markets a study in three Chilean municipalities. University of London http://www.academia.edu/12249727/accountability_policy_effects_within_school_markets_a_study_in_three_chilean_municipalities. Accessed 15 Feb 2015.
Figlio, D., & Getzler, L. (2006). Accountability, ability and disability: Gaming the system? In Improving School Accountability, edited by Timothy J. Gronberg and Dennis W. Jansen (Vol. 14, pp. 35–49). Advances in Applied Microeconomics. Emerald Group Publishing Limited. https://doi.org/10.1016/S0278-0984(06)14002-X.
Hamilton, L. S., Stecher, B. M., & Klein, S. P. (2002). Making sense of test-based accountability in education. Edited by RAND. Rand Corporation.
Hofflinger, A. (2015). Do competition and accountability improve quality of education?: The Chilean case from 2002 to 2013. University of Texas at Austin: Dissertation, LBJ School of Public Affairs.
Hofflinger, A., & von Hippel, P. T. (2020). Does achievement rise fastest with school choice, school resources, or family resources? Chile from 2002 to 2013. Sociology of Education, 93, 132–152. https://doi.org/10.1177/0038040719899358.
Jacob, B. (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago public schools. Journal of Public Economics, 89(5–6), 761–796.
Jennings, J. (2012). The effects of accountability system design on teachers’ use of test score data. Teachers College Record, 114(11).
Jennings, J., & Corcoran, S. (2011). Beyond high stakes: Teacher effects on multiple educational outcomes. In Assessing Teacher Quality: Understanding Teacher Effects on Instruction and Achievement, edited by Sean Kelly, 77–96. Teachers College Press.
Kogan, V., Lavertu, S., & Peskowitz, Z. (2016). Performance federalism and local democracy: Theory and evidence from school tax referenda. American Journal of Political Science, 60(2), 418–435.
Koretz, D. (2003). Using multiple measures to address perverse incentives and score inflation. Educational Measurement: Issues and Practice, 22(2), 18–26.
Koretz, D. (2008). Measuring up: What educational testing really tells us. Harvard University Press.
Koretz, D., & Hamilton, L. S. (2006). Testing for accountability in K-12. In Educational measurement, edited by R. L Brennan (pp. 531–578). Westport: American Council on Education/Praeger.
Madaus, G., & Clarke, M. (2001). The adverse impact of high stakes testing on minority students: Evidence from 100 years of test data. In Raising Standards or Raising Barriers? Inequality and High Stakes Testing in Public Education, edited by Gary Orfield and Mindy Kornhaber, 51. The Century Foundation. http://files.eric.ed.gov/fulltext/ED450183.pdf. Accessed 10 Dec 2018.
Manzi, J., Bogolasky, F., Grau, V., Gutiérrez, G., & Volante, P. (2014). Análisis Sobre Valoraciones, Comprensión y Uso Del SIMCE Por Parte de Directores Escolares de Establecimientos Subvencionados. Santiago.
Mineduc. (2003). Reforma Constitucional Que Establece La Obligatoriedad y Gratuidad de La Educación Media.
Mineduc. (2008a). “Anexo I Resumen Ley de Subvención Escolar Preferencial.” Santiago, Chile. http://portales.mineduc.cl/usuarios/convivencia_escolar/doc/201103050058380.Anexo 1 Resumen Ley SEP.pdf. Accessed 5 Feb 2015.
Mineduc. (2008b). Manual de Uso de La Base de Datos SIMCE 2008 Para 4 Basico. Santiago: http://www.agenciaeducacion.cl/simce/bases-de-datos-nacionales/. Accessed 12 Feb 2015.
Mineduc. (2013). Resumen Prioritarios y Beneficiarios SEP. Bases de Datos., 2013 http://centroestudios.mineduc.cl/index.php?t=96&i=2&cc=2036&tm=2. Accessed 12 Feb 2015.
Mineduc. (2016). “Niveles de Logro.” http://www.agenciaeducacion.cl/biblioteca-digital/niveles-de-logro/. Accessed 5 Dec 2018.
Mizala, A., & Torche, F. (2013). ¿Logra La Subvención Escolar Preferencial Igualar Los Resultados Educativos? Espacio Público, 09, 1–36.
Mizala, A., & Torche, F. (2017). Means-tested school vouchers and educational achievement: Evidence from Chile’s universal voucher system. The Annals of the American Academy of Political and Social Science, 674(1), 163–183.
Mizala, A., & Urquiola, M. (2007). “School markets: The impact of information approximating schools’ effectiveness.” 13676. Cambridge.
Roman, M. (1999). “Usos Alternativos Del Simce: Padres, Directivos y Docentes.” 5. Santiago: CIDE.
Royston, P., Carlin, J., & White, I. (2009). Multiple imputation of missing values: New features for Mim. Stata Journal, 9(2), 252–264.
Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys (Vol. 81). Wiley.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman and Hall/CRC.
Torche, F. (2005). Privatization reform and inequality of educational opportunity. Sociology of Education, 78(4), 316–343.
von Hippel, P. T. (2013). Should a normal imputation model be modified to impute skewed variables? Sociological Methods & Research, 42(1), 105–138.
von Hippel, P. T. (2018). How many imputations do you need? A two-stage calculation using a quadratic rule. Sociological Methods & Research January, 004912411774730.
von Hippel, P. T., Scarpino, S., & Holas, I. (2016). Robust estimation of inequality from binned incomes. Sociological Methodology, 46(1).
Young, R., & Johnson, D. (2015). Handling missing values in longitudinal panel data with multiple imputation. Journal of Marriage and Family, 77(1), 277–294.
Cite this article
Hofflinger, A., von Hippel, P.T. Missing children: how Chilean schools evaded accountability by having low-performing students miss high-stakes tests. Educ Asse Eval Acc 32, 127–152 (2020). https://doi.org/10.1007/s11092-020-09318-8