Validating “value added” in the primary grades: one district’s attempts to increase fairness and inclusivity in its teacher evaluation system

  • Audrey Amrein-Beardsley
  • Sarah Polasky
  • Jessica Holloway-Libell


One urban district in Arizona sought to use an alternative achievement test, the Northwest Evaluation Association’s (NWEA) Measures of Academic Progress for Primary Grades (MAP), to bring more value-added-ineligible teachers into the district’s growth and merit pay system. The goal was to make K-2 teachers more fairly and inclusively eligible for individual, teacher-level value-added scores and the differential merit pay bonuses tied to growth. At the request of district administrators, researchers examined whether the different tests to be used, along with their growth estimates, yielded similar output (i.e., concurrent-related evidence of validity). Researchers found the results to be chaotic, without underlying trend or order, which was disappointing for the district. Using the K-2 test for increased fairness and inclusivity was therefore deemed inappropriate. These findings might inform other districts’ examinations, particularly of this early childhood test.


Value-added; Growth; Teacher effectiveness; Fairness; Validity; Early childhood; Participatory research


Compliance with ethical standards

We, the authors, submit the following regarding our compliance with ethical standards for the research reported in this manuscript.

Conflict of interest

The authors declare that they have no competing interests.

Research involving human participants and/or animals

This research involved human subjects, but used only data already available at the district, collected and analyzed in line with Arizona State University’s Institutional Review Board (IRB) procedures (ruling: exempt).

Informed consent

None required.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Audrey Amrein-Beardsley (1), email author
  • Sarah Polasky (1)
  • Jessica Holloway-Libell (1, 2)

  1. Mary Lou Fulton Teachers College, Arizona State University, Tempe, USA
  2. College of Education, Kansas State University, Manhattan, USA
