
Comparing and assessing the consequences of two different approaches to measuring school effectiveness

Educational Assessment, Evaluation and Accountability

Abstract

Nations, states, and districts must choose among an array of approaches to measuring school effectiveness in implementing their accountability policies, and the choice can be difficult because different approaches yield different results. This study compares two approaches to computing school effectiveness: a “beating the odds” type approach and a “value-added” approach. We analyze the approaches using both administrative data and simulated data and identify the reasons why they produce different results. We find that differences are driven by a combination of factors related to modeling decisions as well as bias stemming from nonrandom assignment. Generally, we find that the value-added method provides a more defensible measure of school effectiveness based purely on test scores, but we note advantages and disadvantages of both approaches. This study highlights the consequences of several of the many choices policymakers face when selecting a methodology for measuring school effectiveness.

Notes

  1. The Every Student Succeeds Act (ESSA) of 2015 updated and replaced the No Child Left Behind (NCLB) Act of 2001. Although ESSA eased federal pressure on teacher evaluation, it continued to emphasize school-level accountability. The primary change to school-level accountability was the requirement to include an additional measure of school performance—such as chronic absenteeism—in addition to test scores. However, test scores remained an important part of the evaluation process. Thus, research into the best methods to evaluate schools is still needed, particularly given that school systems can differ greatly in the approach they take and that some persist in using less desirable methods.

  2. See https://ies.ed.gov/ncee/edlabs/projects/beating_the_odds.asp for a list of several studies using BTO-type methods.

  3. See, for example, the numerous studies investigating features of models such as the value-added, contextualized value-added, expected progress, and progress 8 models that have been implemented for accountability purposes in the UK (e.g., Wilson and Piebalga 2008; Leckie and Goldstein 2009; Dumay et al. 2014; Perry 2016; Leckie and Goldstein 2017), value-added approaches using Italian data (Agasisti and Minaya forthcoming), various permutations of value-added models applied to the Dutch context (Timmermans et al. 2011), value-added measures in Australia (Marks 2015), and a quantile value-added approach applied to the Chilean context (Page et al. 2017). The US literature on school-level effectiveness research is large; see Everson (2017) for a review.

  4. This problem with value-added can also extend to the measurement of teacher effects in the presence of test ceiling effects (see, for example, Koedel and Betts 2010).

  5. SBAC refers to tests developed and administered by the Smarter Balanced Assessment Consortium, and PARCC refers to tests developed and administered by the Partnership for Assessment of Readiness for College and Careers. These tests are aligned with the Common Core State Standards and are generally administered by computer.

  6. The model was specified in a document entitled “BTO Technical Overview 72315”, provided by the SCSC. A similar model has been in continued use.

  7. The BTO measure used the CCRPI Single Score without Challenge points. For schools that did not span grade clusters, this score was the Single Score minus Challenge points. For schools that spanned grade clusters, this score was the weighted average based on enrollment of each grade cluster’s CCRPI score without Challenge points. Enrollment by grade cluster was provided by the Georgia Department of Education’s Accountability Division (BTO Technical Overview 72315).

  8. Sass (2014, p. 10) reported that “for grades 3–8, the FAY was determined by the number of calendar days between the start of each school’s school year and the end of the state CRCT testing window. For grades 9–12, the FAY for each school was measured by the calendar days between the start and end of the school year. For each student, the school of longest attendance was determined based on individual attendance records. The total calendar days enrolled at the school of longest attendance was then determined. If a student’s calendar days of enrollment were at least 65 percent of the FAY, they were assigned to that school for the purposes of determining value-added school effects and mean or median school SGPs.”

  9. BTO Technical Overview 72315

  10. The BTO analysis was used by the Georgia Department of Education’s Charter School Division to evaluate all charter schools as of 2014 and in later years. We use data from two prior periods, 2012 and 2013, to investigate the alignment between the BTO and VAM approaches, given that those were the data made available to us.

  11. The SCSC partnered with the Governor’s Office of Student Achievement and Dr. Tim Sass, distinguished University Professor in the Department of Economics and the Andrew Young School of Policy Studies at Georgia State University, to evaluate the performance of all state charter schools during the 2012–2013 school year.

  12. The value-added model is described in detail in the document titled “Technical Appendix to The Performance of State Charter Schools in Georgia, 2012–2013.” The document was last accessed on February 28, 2018, at http://scsc.georgia.gov/2013-state-charter-school-value-add-performance-report.

  13. The ordinary least squares (OLS) regression method estimates the parameters of the value-added model by choosing the parameter values that minimize the sum of squared differences between the observed test score outcomes and the fitted values of those outcomes implied by the value-added model (an illustrative sketch of this objective appears after these notes).

  14. A school indicator variable is a binary variable that takes a value of 1 if a student attends a given school and a value of zero otherwise. As a simple example, if there are 100 schools, then 100 school indicators can be included in the model, one for each school, or 99 can be included if a constant is also included in the value-added model. The school effects are then the coefficients on these school indicator variables. Intuitively, a school effect gives the amount by which student achievement increases or decreases if students are assigned to a particular school, usually compared with some reference point, such as the average school in a district (an illustrative regression sketch appears after these notes).

  15. The tests were done grade by grade. We could not test all school districts in Georgia for sorting, because at least two schools per grade are needed in a district to test for differential sorting between schools. To analyze this complex enrollment process of students into schools, we use a multinomial logit (MNL) model, in which the probability that a student enrolls in a particular school j in a grade depends on a set of student characteristics: prior achievement scores in math or reading and indicators for free and reduced-price lunch status, female, Black, Hispanic, limited English proficiency, gifted, and disability status. In all of the grades, the multinomial logit regressions suggest that nonrandom selection takes place on the basis of at least some of these characteristics in all of the districts (a sketch of this type of model appears after these notes). Results are available upon request.

  16. For instance, school j will have 80 students in grade 3, 80 in grade 4, and 80 in grade 5, for a total of 240 students in any given year.

  17. This is referred to as a “geometric distributed lag” assumption (a sketch appears after these notes). See Guarino et al. (2015a, b, c) for a complete explanation of the assumptions underlying common education production function specifications. We explore an alternative data generating process as a sensitivity check, which is available upon request. Rather than assuming that past educational inputs are fully captured by the previous test score, A_{i(g − 1)}, we allow previous school inputs to fade out more slowly than a geometric rate would imply. The results are qualitatively similar, and our key point, that differences between the VAM and BTO estimators are greatest when there exists nonrandom grouping and sorting of students to schools, remains.

  18. In other words, λ represents the degree to which prior learning persists from one year to the next.

  19. See Authors (2015a, b, c, d) for a detailed explanation of this assumption and its derivation. We refer to this as the “common factor restriction.”

  20. Other work by the authors (Authors, unpublished) finds that when test scores are generated as in (13) such correlation—which seems realistic—is necessary to achieve data that conform to the parameter estimates derived from observed achievement distributions.

  21. This relationship was examined using a probit regression of free and reduced-price lunch status on the student’s lagged test score using 4th grade Georgia administrative data. The estimated coefficient on the lagged test score in this regression was approximately −0.5. In our simulations, we generated X_{it} using our probit model estimates. We first generated a latent variable based on the equation w_{it} = 0.2 − 0.5 A_{i2} + v_{it}, which was derived from our probit regressions and where v_{it} is a random draw from the standard normal distribution. We then formed X_{it} = 1(w_{it} > 0). (A simulation sketch combining this step with the sorting in note 23 appears after these notes.)

  22. Rough estimates of λ in real data cover a wide range of values. Andrabi et al. (2011) find persistence rates of .5 or lower in Pakistani schools.

  23. This was done in the same way as in Authors (2015), by sorting students on s_{it} = A_{i2} + v_{it} or s_{it} = X_{it} + v_{it}, as the case may be, where v_{it} is drawn from the standard normal distribution. This ensures that schools are not perfectly sorted by a student attribute.

  24. A Monte Carlo replication study is a study in which repeated draws are taken from a probability distribution to learn about the behavior of some estimator or statistic. In our case, we study the performance of the value-added and BTO estimators by making repeated draws to generate the data used to produce the estimates. By examining the performance of the estimators over several draws of the data, we avoid making inferences based on flukes due to sampling. (A skeleton of such a replication loop appears after these notes.)
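
As an illustrative sketch of the least-squares objective described in note 13, written in generic notation rather than the paper’s exact specification (y_i denotes a student’s test score outcome and ŷ_i(θ) the fitted value implied by the value-added model with parameter vector θ):

\hat{\theta}_{\text{OLS}} = \arg\min_{\theta} \sum_{i=1}^{N} \big( y_i - \hat{y}_i(\theta) \big)^2

Here θ collects the coefficients on the prior test score, student characteristics, and school indicator variables.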
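
The following minimal Python sketch illustrates the school indicator idea in note 14 on a tiny simulated data set. All variable names, the number of schools, and the numeric values are hypothetical and are not taken from the paper’s data or code; the sketch simply shows that the coefficients on the school dummies recover school effects relative to an omitted reference school.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 3 schools, 200 students, one prior score per student.
n = 200
df = pd.DataFrame({
    "school": rng.choice(["A", "B", "C"], size=n),
    "prior_score": rng.normal(size=n),
})
true_effects = {"A": 0.0, "B": 0.3, "C": -0.2}
df["score"] = (0.6 * df["prior_score"]
               + df["school"].map(true_effects)
               + rng.normal(scale=0.5, size=n))

# Build school indicator (dummy) variables, dropping one school as the
# reference category because the regression includes a constant.
dummies = pd.get_dummies(df["school"], prefix="school", drop_first=True)
X = np.column_stack([np.ones(n), df["prior_score"], dummies.to_numpy(dtype=float)])

# OLS: the coefficients on the school indicators are the estimated school
# effects relative to the omitted reference school.
coef, *_ = np.linalg.lstsq(X, df["score"].to_numpy(), rcond=None)
print(dict(zip(["const", "prior_score", *dummies.columns], coef.round(2))))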
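
Below is a minimal sketch of the kind of multinomial logit check described in note 15, written in Python with statsmodels on simulated data. The toy data-generating step and the two covariates used here are hypothetical stand-ins; the actual analysis used the full set of student characteristics listed in note 15.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical data: 500 students choosing among 3 schools within one district.
n = 500
df = pd.DataFrame({
    "prior_math": rng.normal(size=n),
    "frl": rng.integers(0, 2, size=n),   # free/reduced-price lunch indicator
})

# Enrollment that depends on student characteristics (nonrandom sorting).
latent = np.column_stack([
    rng.gumbel(size=n),
    0.8 * df["prior_math"] + rng.gumbel(size=n),
    -0.5 * df["frl"] + rng.gumbel(size=n),
])
df["school"] = latent.argmax(axis=1)

# Multinomial logit of school choice on student characteristics; significant
# coefficients indicate nonrandom selection into schools.
X = sm.add_constant(df[["prior_math", "frl"]].astype(float))
mnl = sm.MNLogit(df["school"], X).fit(disp=False)
print(mnl.summary())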
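
As a generic sketch of the geometric distributed lag assumption discussed in notes 17 and 18 (our own shorthand, not the paper’s exact equations): if I_{i,g-k} denotes the combined inputs student i received in grade g − k and their effect decays by a constant factor λ per grade, then

A_{ig} = \sum_{k \ge 0} \lambda^{k} I_{i,g-k} + u_{ig} \quad\Longrightarrow\quad A_{ig} = \lambda A_{i,g-1} + I_{ig} + \big( u_{ig} - \lambda u_{i,g-1} \big)

so the lagged score A_{i,g-1} absorbs the entire history of past inputs, and the composite error term reflects the common factor restriction mentioned in note 19.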
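
The next sketch illustrates, in Python, the two simulation steps described in notes 21 and 23: generating the free and reduced-price lunch indicator X_{it} from a probit-style latent variable, and sorting students to schools on a noisy index so that sorting is nonrandom but imperfect. The per-grade school size of 80 follows note 16 and the coefficients 0.2 and −0.5 follow note 21; the number of schools and other specifics are hypothetical choices for this sketch.

import numpy as np

rng = np.random.default_rng(2)

n_schools, school_size = 10, 80   # 80 students per grade as in note 16; 10 schools is hypothetical
n = n_schools * school_size

# Base (grade 2) achievement drawn from a normal distribution.
A_i2 = rng.normal(size=n)

# Note 21: probit-style generation of the FRL indicator X_it from the latent
# variable w_it = 0.2 - 0.5 * A_i2 + v_it, with v_it standard normal.
v = rng.normal(size=n)
w = 0.2 - 0.5 * A_i2 + v
X = (w > 0).astype(int)

# Note 23: nonrandom but imperfect sorting of students to schools. Students are
# ranked on the noisy index s_it = A_i2 + v_it and assigned to schools in blocks,
# so schools differ systematically in achievement without being perfectly sorted.
s = A_i2 + rng.normal(size=n)
order = np.argsort(s)
school_id = np.empty(n, dtype=int)
school_id[order] = np.repeat(np.arange(n_schools), school_size)

print(np.bincount(school_id))                                   # 80 students per school
print([round(A_i2[school_id == j].mean(), 2) for j in range(n_schools)])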
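
Finally, a skeleton of the Monte Carlo replication logic in note 24. The functions generate_data and estimate_school_effects below are hypothetical placeholders standing in for the paper’s data-generating process and for the VAM or BTO estimators; the sketch only shows the replication loop and the aggregation of results across repeated draws.

import numpy as np

rng = np.random.default_rng(3)

def generate_data(rng):
    # Hypothetical placeholder for the simulation's data-generating process.
    true_effects = np.array([-0.2, 0.0, 0.3])        # toy school effects
    school = rng.integers(0, 3, size=300)
    score = true_effects[school] + rng.normal(size=300)
    return school, score, true_effects

def estimate_school_effects(school, score):
    # Hypothetical placeholder estimator: mean outcome by school, centered.
    means = np.array([score[school == j].mean() for j in range(3)])
    return means - means.mean()

n_reps = 500
errors = []
for _ in range(n_reps):
    school, score, truth = generate_data(rng)
    estimates = estimate_school_effects(school, score)
    errors.append(estimates - (truth - truth.mean()))

# Averaging over many replications guards against conclusions driven by a
# single unlucky draw of the simulated data.
print("mean error by school:", np.round(np.mean(errors, axis=0), 3))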

References

  • Abe, Y., Weinstock, P., Chan, V., Meyers, C., Gerdeman, R. D., & Brandt, W. C. (2015). How methodology decisions affect the variability of schools identified as beating the odds (REL 2015–071.REV). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Midwest. Retrieved from http://www.ies.ed.gov/ncee/edlabs. Accessed 21 Oct 2019

  • Agasisti, T. & Minaya, V. (forthcoming). Evaluating the stability of school performance estimates for school choice: evidence for Italian primary schools. Fiscal Studies.

  • Andrabi, T., Das, J., Khwaja, A. I., & Zajonc, T. (2011). Do value-added estimates add value? Accounting for learning dynamics. American Economic Journal: Applied Economics, 29-54.

  • Bifulco, R., & Ladd, H. F. (2006). The impacts of charter schools on student achievement: evidence from North Carolina. Education Finance and Policy, 1(1), 50–90.

  • Booker, K., Gilpatric, S. M., Gronberg, T., & Jansen, D. (2007). The impact of charter school attendance on student performance. Journal of Public Economics, 91(5), 849–876.

  • Dumay, X., Coe, R., & Nkafu Anumendem, D. (2014). Stability over time of different methods of estimating school performance. School Effectiveness and School Improvement, 25(1), 64–82.

  • Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2014). Selecting growth measures for school and teacher evaluations: should proportionality matter? Education Policy, 1–36.

  • Everson, K. (2017). Value-added modeling and educational accountability: are we asking the right questions? Review of Educational Research, 87(1), 35–70.

  • Goldhaber, D., Walch, J., & Gabele, B. (2014). Does the model matter? Exploring the relationship between different student achievement-based teacher assessments. Statistics and Public Policy, 1(1), 28–39.

  • Guarino, C., Reckase, M., & Wooldridge, J. (2015a). Can value-added measures of teacher performance be trusted? Education Finance and Policy, 10(1), 117–156.

  • Guarino, C., Maxfield, M., Reckase, M., Thompson, P., & Wooldridge, J. (2015b). An evaluation of empirical Bayes’ estimation of value-added teacher performance measures. Journal of Educational and Behavioral Statistics, 40, 190–222.

  • Guarino, C., Reckase, M., Stacy, B., & Wooldridge, J. (2015c). A comparison of growth percentile and value-added models of teacher performance. Statistics and Public Policy, 2(1), e1034820. https://doi.org/10.1080/2330443X.2015.1034820.

  • Kane, T. J., & Staiger, D. O. (2008). Estimating teacher impacts on student achievement: an experimental evaluation (No. w14607). National Bureau of Economic Research.

  • Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Seattle, WA: Bill and Melinda Gates Foundation.

  • Koedel, C., & Betts, J. (2010). Value added to what? How a ceiling in the testing instrument influences value-added estimation. Education Finance and Policy, 5(1), 54–81.

  • Koon, S., Petscher, Y., & Foorman, B. R. (2014). Beating the odds: Finding schools exceeding achievement expectations with high-risk students (REL2014–032). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southeast. Retrieved from http://www.ies.ed.gov/ncee/edlabs.

  • Leckie, G., & Goldstein, H. (2009). The limitations of using school league tables to inform school choice. Journal of the Royal Statistical Society: Series A, 172(4), 835–851.

  • Leckie, G., & Goldstein, H. (2017). The evolution of school league tables in England 1992–2016: ‘Contextual value-added’, ‘expected progress’ and ‘progress 8’. British Educational Research Journal, 43(2), 193–212.

  • Lockwood, J. R., McCaffrey, D. F., Hamilton, L. S., Stecher, B., Le, V. N., & Martinez, J. F. (2007). The sensitivity of value-added teacher effect estimates to different mathematics achievement measures. Journal of Educational Measurement, 44(1), 47–67.

  • Marks, G. (2015). The size, stability, and consistency of school effects: evidence from Victoria. School Effectiveness and School Improvement, 26(3), 397–414.

  • McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67–101.

  • Meyers, C. V., & Wan, Y. (2016). A comparison of two methods of identifying beating-the-odds high schools in Puerto Rico (REL 2017–167). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast & Islands. Retrieved from http://www.ies.ed.gov/ncee/edlabs. Accessed 21 Oct 2019

  • Nye, B., Konstantopoulos, S., & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257.

  • Page, G., San Martin, E., Orellana, J., & Gonzalez, J. (2017). Exploring complete school effectiveness via quantile value-added. Journal of the Royal Statistical Society: Series A, 180(1), 315–340.

  • Partridge, M. A., & Koon, S. (2017). Beating the odds in Mississippi: identifying schools exceeding achievement expectations (REL 2017–213). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southeast. Retrieved from http://www.ies.ed.gov/ncee/edlabs.

  • Perry, T. (2016). English value-added measures: examining the limitations of school performance measurement. British Educational Research Journal, 42(6), 1056–1080.

  • Reardon, S., & Raudenbush, S. (2009). Assumptions of value-added models for estimating school effects. Education Finance and Policy, 4(4), 492–519.

  • Sass, T. R. (2006). Charter schools and student achievement in Florida. Education Finance and Policy, 1(1), 91–122.

  • Solmon, L., Paark, K., & Garcia, D. (2001). Does charter school attendance improve test scores? The Arizona results.

  • Stacy, B., Guarino, C., and Wooldridge, J. (2018). Does the Precision and Stability of Value-Added Estimates of Teacher Performance Depend on the Types of Students They Serve? Economics of Education Review, 64, 50–74.

  • Timmermans, A., Doolard, S., & de Wolf, I. (2011). Conceptual and empirical differences among various value-added models for accountability. School Effectiveness and School Improvement, 393–413.

  • Walsh, E. & Isenberg, E. (2015). How does a value-added model compare to the Colorado growth model? Statistics and Public Policy.

  • Wilson, D., & Piebalga, A. (2008). Performance measures, ranking and parental choice: an analysis of the English school league tables. International Public Management Journal, 11(3), 344–366.

  • Wooldridge, J. (2009). Introductory econometrics: A modern approach (4th ed.). Mason, OH: South-Western Cengage Learning.

Funding

We are grateful to the State Charter Schools Commission of Georgia and the Georgia Department of Education for funding the study that formed the basis for this paper.

Author information

Corresponding author

Correspondence to Cassandra M. Guarino.

Appendix

Table 5 Parameters chosen for the base score, A_{i2}, school effects, β_{it}, student fixed effect, c_i, and random error term, e_{it}. All variables based on the normal distribution.

About this article

Cite this article

Guarino, C.M., Stacy, B.W. & Wooldridge, J.M. Comparing and assessing the consequences of two different approaches to measuring school effectiveness. Educ Asse Eval Acc 31, 437–463 (2019). https://doi.org/10.1007/s11092-019-09308-5

