Towards a framework for the validation of early childhood assessment systems

  • Jessica Goldstein
  • Jessica Kay Flake


American early childhood education is in the midst of drastic change. In recent years, states have begun overhauling their early childhood education systems in response to federal grant competitions, bringing an increased focus on assessment and accountability for early learning programs. The assessment of young children is fraught with challenges; psychometricians and educational researchers must work together with the early childhood community to develop these instruments. The purpose of this paper is to present a conceptual framework for the validation of such instrumentation and to examine its implications for early childhood educators. We formulate a validity argument for early childhood assessments, providing a pivotal link between validity theory and early education practice. Recommendations for the assessment field are also considered.


Keywords: Validity, Early childhood, Educational assessment



This research was supported in part by a contract from the Connecticut State Department of Education.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Educational Psychology, University of Connecticut, Storrs, USA
  2. Department of Psychology, Quantitative Methods, York University, Toronto, Canada
