Improving Comprehension Assessment for Middle and High School Students: Challenges and Opportunities

  • John Sabatini
  • Yaacov Petscher
  • Tenaha O’Reilly
  • Adrea Truckenmiller
Part of the Literacy Studies book series (LITS, volume 10)

Abstract

For decades, standardized reading comprehension tests have consisted of a series of passages and associated multiple-choice questions. Although such tests are widely used in and out of the classroom, there continues to be considerable disagreement about whether, or how, they add net value to efforts to advance educational progress in reading. This chapter begins with a review of features that characterize standardized reading assessments. In particular, we discuss how assessment designs and analytics reflect a balance of practical and measurement constraints. We then discuss how advances in the learning sciences, measurement, and electronic technologies have opened up the design space for a new generation of reading assessments. Drawing on this review, we end by presenting examples of prototype assessments that illustrate opportunities for enhancing the value and utility of reading assessments in the future.

Keywords

Assessment · Measurement · Computer adaptive testing

Notes

Acknowledgements

The research reported here was supported in part by the Institute of Education Sciences (IES), U.S. Department of Education, through Grant R305F100005 to the Educational Testing Service (ETS) as part of the Reading for Understanding Research (RfU) Initiative. The opinions expressed are those of the authors and do not represent the views of the Institute, the U.S. Department of Education, or ETS. We are extremely grateful to the IES and ETS for sponsoring and supporting this research. We would also like to thank Matthias von Davier, Michael Kane, and Don Powers for their intellectual insights and thoughtful comments, and Jennifer Lentini and Kim Fryer for their editorial assistance.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • John Sabatini (1)
  • Yaacov Petscher (2)
  • Tenaha O’Reilly (1)
  • Adrea Truckenmiller (2)

  1. Global Assessment, Educational Testing Service, Princeton, USA
  2. Florida Center for Reading Research, Florida State University, Tallahassee, USA
