Abstract
For decades, standardized reading comprehension tests have consisted of a series of passages and associated multiple-choice questions. Although such tests are widely used in and out of the classroom, considerable disagreement remains regarding how, or whether, they provide net value in advancing educational progress in reading. This chapter begins with a review of features that characterize standardized reading assessments. In particular, we discuss how assessment designs and analytics reflect a balance of practical and measurement constraints. We then discuss how advances in the learning sciences, measurement, and electronic technologies have opened up the design space for a new generation of reading assessments. Abstracting from this review, we end by presenting some examples of prototype assessments that illustrate opportunities for enhancing the value and utility of reading assessments in the future.
Notes
- 1.
- 2.
It is not that psychometrics cannot handle scores other than dichotomous ones; however, complexity increases, and efficiency in design and analysis typically decreases.
- 3.
It is worth noting that classical test theory makes several assumptions about the errors (Kline, 2005). First, T and E are assumed to be uncorrelated, meaning that an individual's errors, whether negative or positive, bear no systematic relation to the true score. Second, the error score on one form of the assessment (e.g., a set of three reading comprehension passages) is assumed to be uncorrelated with the error on a parallel form of the assessment (e.g., a set of three different reading comprehension passages). Third, the errors are assumed to be normally distributed, with the random errors around an individual's score averaging to zero. This means that at times the reading comprehension score may be high (such as when the student has particularly high self-efficacy or recalls the information well from a prior testing) or low (such as when the student skipped breakfast), but because the random errors are assumed to be normally distributed with mean zero, they will average out across testing periods.
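These assumptions can be illustrated with a small simulation: observed scores are generated as true score plus random error, and we check that the errors average to zero and are uncorrelated with the true scores. This is a minimal sketch; the true-score mean, error standard deviation, and sample size below are arbitrary illustrative choices, not values from any real assessment.

```python
import random

random.seed(42)

# Classical test theory: X = T + E, with true scores T and random
# errors E drawn independently of one another.
N = 100_000
true_scores = [random.gauss(50, 10) for _ in range(N)]
errors = [random.gauss(0, 4) for _ in range(N)]  # mean-zero random error
observed = [t + e for t, e in zip(true_scores, errors)]

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sx = mean([(x - mx) ** 2 for x in xs]) ** 0.5
    sy = mean([(y - my) ** 2 for y in ys]) ** 0.5
    return cov / (sx * sy)

print(round(mean(errors), 2))               # approximately 0
print(round(corr(true_scores, errors), 2))  # approximately 0: T and E uncorrelated
```

With a large sample, both quantities come out near zero, matching the first and third assumptions; the second assumption (uncorrelated errors across parallel forms) would hold by the same logic if a second, independently generated error vector were added.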
- 4.
In psychometrics, item independence is introduced as a purely statistical assumption, though it has practical implications for task design, as discussed later.
- 5.
It follows that if tests are strictly parallel, we can replace the covariance of true scores T and T′, COV(T, T′), by the variance of true scores, V(T); the CTT assumption of uncorrelated errors, COV(E, E′) = 0 = COV(T, E′), then gives us what we need: the correlation between parallel forms equals the ratio of true-score variance to observed-score variance, i.e., the reliability.
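The algebra behind this note can be written out explicitly. In the sketch below, X = T + E and X′ = T′ + E′ are strictly parallel forms, so T = T′ and V(X) = V(X′):

```latex
\begin{aligned}
\mathrm{Cov}(X, X') &= \mathrm{Cov}(T+E,\; T'+E') \\
&= \mathrm{Cov}(T,T') + \mathrm{Cov}(T,E') + \mathrm{Cov}(E,T') + \mathrm{Cov}(E,E') \\
&= V(T), \\[4pt]
\rho_{XX'} &= \frac{\mathrm{Cov}(X,X')}{\sqrt{V(X)\,V(X')}} = \frac{V(T)}{V(X)}.
\end{aligned}
```

The final ratio, true-score variance over observed-score variance, is the classical definition of reliability.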
- 6.
Technically, IRT models do not contain an error variable as a component of the model equations. They are based on a probability model for item-level variables and assume a latent variable. The standard error in IRT models is instead derived from the model assumptions, via what is known as the Fisher information inequality, or Cramér-Rao lower bound.
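As one concrete, hedged illustration of how a standard error arises from information rather than from an error term, the sketch below computes test information under a two-parameter logistic (2PL) model and the corresponding Cramér-Rao standard error of the ability estimate. The item parameters are invented for illustration and do not come from any real test.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model,
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information an item contributes at ability theta:
    I(theta) = a^2 * p * (1 - p) for the 2PL."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Illustrative (a, b) parameter pairs for a four-item test.
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.2)]

theta = 0.0
test_info = sum(item_information(theta, a, b) for a, b in items)
se = 1.0 / math.sqrt(test_info)  # Cramer-Rao lower bound on SD of theta-hat
print(round(test_info, 3), round(se, 3))
```

Note that the standard error depends on theta: each item is most informative near its difficulty b (where p = 0.5 and information peaks at a²/4), so ability is estimated more precisely where items are concentrated.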
- 7.
In classical test theory, methods of equating test forms are used to address these kinds of problems.
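As a minimal sketch of one such method, the code below implements linear (mean-sigma) equating, which places form-X scores on the form-Y scale by matching the two forms' means and standard deviations. The score data are fabricated for illustration only.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def linear_equate(x, form_x_scores, form_y_scores):
    """Map a form-X raw score x onto the form-Y scale:
    y = (sd_Y / sd_X) * (x - mean_X) + mean_Y."""
    mx, sx = mean(form_x_scores), sd(form_x_scores)
    my, sy = mean(form_y_scores), sd(form_y_scores)
    return (sy / sx) * (x - mx) + my

form_x = [12, 15, 18, 20, 22, 25, 28]  # harder form: lower raw scores
form_y = [15, 18, 21, 23, 25, 28, 31]  # easier form: higher raw scores
print(linear_equate(20.0, form_x, form_y))  # -> 23.0
```

Here the two forms differ only by a constant shift, so a raw 20 on the harder form maps to 23 on the easier form; fuller treatments of equating, scaling, and linking are given in Kolen and Brennan (2004) and Livingston (2004).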
- 8.
Interested readers should visit the CBAL website at: http://www.ets.org/research/topics/cbal/initiative
- 9.
Due to space limitations, we elaborate only on the RfU assessment project in this chapter. CBAL and RfU share many of the same underlying principles, and both incorporate innovative design techniques, including scenario-based tasks and assessments.
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36, 258–267.
Baker, E. L. (2013). The chimera of validity. Teachers College Record (090302), 115, 1–26.
Bennett, R. E. (2011a, June). Theory of action and educational assessment. Paper presented at the National Conference on Student Assessment, Orlando, FL.
Bennett, R. E. (2011b). CBAL: Results from piloting innovative K–12 assessments (Research report no. RR-11-23). Princeton, NJ: ETS.
Bennett, R. E., & Gitomer, D. H. (2009). Transforming K–12 assessment: Integrating accountability testing, formative assessment and professional support. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–62). New York, NY: Springer.
Berliner, D. C. (2011). Rational responses to high-stakes testing: The case of curriculum narrowing and the harm that follows. Cambridge Journal of Education, 41, 278–302.
Brennan, R. L. (Ed.). (2006). Educational measurement (4th ed.). Westport, CT: American Council on Education/Praeger.
Britt, M. A., & Rouet, J. F. (2012). Learning with multiple documents: Component skills and their acquisition. In M. J. Lawson & J. R. Kirby (Eds.), The quality of learning: Dispositions, instruction, and mental structures (pp. 276–314). Cambridge, UK: Cambridge University Press.
Cain, K., & Parrila, R. (2014). Introduction to the special issue. Theories of reading: What we have learned from two decades of scientific research. Scientific Studies of Reading, 18, 1–4.
Christ, T. J., & Hintze, J. M. (2007). Psychometric considerations when evaluating response to intervention. In S. R. Jimerson, M. K. Burns, & A. M. VanDerHeyden (Eds.), Handbook of response to intervention: The science and practice of assessment and intervention (pp. 99–105). New York, NY: Springer.
Christo, C. (2005). Critical characteristics of a three-tiered model applied to reading interventions. California School Psychologist, 10, 33–44.
Coiro, J. (2009). Rethinking reading assessment in a digital age: How is reading comprehension different and where do we turn now? Educational Leadership, 66, 59–63.
Coiro, J. (2011). Predicting reading comprehension on the Internet: Contributions of offline reading skills, online reading skills, and prior knowledge. Journal of Literacy Research, 43, 352–392.
Compton, D., Fuchs, D., Fuchs, L., & Bryant, J. (2006). Selecting at-risk readers in first grade for early intervention: A two-year longitudinal study of decision rules and procedures. Journal of Educational Psychology, 98, 394–409.
Deane, P., Sabatini, J., & O’Reilly, T. (2012). English language arts literacy framework. Princeton, NJ: ETS. Retrieved from http://elalp.cbalwiki.ets.org/Table+of+Contents
Educational Testing Service. (2002). ETS standards for quality and fairness. Princeton, NJ: Author. Retrieved from https://www.ets.org/s/about/pdf/standards.pdf
Educational Testing Service. (2013). Reading for understanding. Retrieved from http://www.ets.org/research/topics/reading_for_understanding/
Embretson, S., & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, DHHS. (2010). Developing early literacy: Report of the National Early Literacy Panel. Washington, DC: U.S. Government Printing Office.
Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person parameters. Educational and Psychological Measurement, 58, 357–381.
Foorman, B., Petscher, Y., & Schatschneider, C. (2013). FCRR reading assessment. Tallahassee, FL: Florida Center for Reading Research.
Fuchs, D., Compton, D., Fuchs, L., Bryant, J., & Davis, G. (2008). Making ‘secondary intervention’ work in a three-tier responsiveness-to-intervention model: Findings from the first-grade longitudinal reading study of the National Research Center on Learning Disabilities. Reading and Writing, 21, 413–436.
Fuchs, L. S., Fuchs, D., & Compton, D. L. (2010). Rethinking response to intervention at middle and high school. School Psychology Review, 39, 22–28.
Gearhart, M., & Herman, J. L. (1998). Portfolio assessment: Whose work is it? Issues in the use of classroom assignments for accountability. Educational Assessment, 5, 41–56.
Gil, L., Bråten, I., Vidal-Abarca, E., & Strømsø, H. I. (2010). Understanding and integrating multiple science texts: Summary tasks are sometimes better than argument tasks. Reading Psychology, 31, 30–68.
Goldman, S. (2012). Adolescent literacy: Learning and understanding content. Future of Children, 22, 73–88.
Goldman, S. R. (2004). Cognitive aspects of constructing meaning through and across multiple texts. In N. Shuart-Ferris & D. M. Bloome (Eds.), Uses of intertextuality in classroom and educational research (pp. 317–351). Greenwich, CT: Information Age Publishing.
Gordon Commission. (2013). To assess, to teach, to learn: A vision for the future of assessment. Retrieved from http://www.gordoncommission.org/rsc/pdfs/gordon_commission_technical_report.pdf
Gorin, J., & Mislevy, R. J. (2013, September). Inherent measurement challenges in the next generation science standards for both formative and summative assessment. Paper presented at the invitational research symposium on science assessment, Washington, DC.
Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110). Westport, CT: American Council on Education/Praeger.
Halverson, R. (2010). School formative feedback systems. Peabody Journal of Education, 85, 130–155.
Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12, 38–47.
Institute of Education Sciences. (2010). Reading for understanding initiative. Washington, DC: U. S. Department of Education. Retrieved from http://ies.ed.gov/ncer/projects/program.asp?ProgID=62
International Association for the Evaluation of Educational Achievement. (2013a). Progress in international reading literacy study 2016. Retrieved from http://www.iea.nl/?id=457
International Association for the Evaluation of Educational Achievement. (2013b). ePirls online reading 2016. Retrieved from http://www.iea.nl/fileadmin/user_upload/Studies/PIRLS_2016/ePIRLS_2016_Brochure.pdf
Jimerson, S. R., Burns, M. K., & VanDerHeyden, A. M. (2007). Handbook of response to intervention: The science and practice of assessment and intervention. Springfield, IL: Springer.
Kafer, K. (2002, December 1). High-poverty students excel with direct instruction. Heartlander Magazine. Retrieved from http://news.heartland.org/newspaper-article/2002/12/01/high-poverty-students-excel-direct-instruction
Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 18–64). Westport, CT: American Council on Education/Praeger.
Katz, S., & Lautenschlager, G. (2001). The contribution of passage and no-passage factors to item performance on the SAT reading task. Educational Assessment, 7, 165–176.
Kieffer, M. J., & Petscher, Y. (2013). Unique contributions of measurement error? Applying a bi-factor structural equation model to investigate the roles of morphological awareness and vocabulary knowledge in reading comprehension. Paper presented at the American Education Research Association, San Francisco, CA.
Kim, H. R., Zhang, J., & Stout, W. F. (1995). A new index of dimensionality—DETECT. Unpublished manuscript.
Kline, T. (2005). Psychological testing: A practical approach to design and evaluation. Thousand Oaks, CA: Sage Publications.
Klingner, J., & Edwards, P. (2006). Cultural considerations with response to intervention models. Reading Research Quarterly, 41, 108–117.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York, NY: Springer.
Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994a). The evolution of a portfolio program: The impact and quality of the Vermont program in its second year (1992–1993). Los Angeles, CA: UCLA, National Center for Research on Evaluation, Standards, and Student Testing.
Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994b). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13, 5–10.
Koretz, D., Stecher, S., Klein, D., McCaffrey, D., & Deibert, E. (1993). Can portfolios assess student performance and influence instruction? The 1991–1992 Vermont experience. Santa Monica, CA: RAND.
Lee, C. D., & Spratley, A. (2010). Reading in the disciplines: The challenges of adolescent literacy. New York, NY: Carnegie Corporation of New York.
Leu, D., Kinzer, C., Coiro, J., Castek, J., & Henry, L. (2013). New literacies: A dual-level theory of the changing nature of literacy, instruction, and assessment. In D. E. Alvermann, N. J. Unrau, & R. B. Ruddell (Eds.), Theoretical models and processes of reading (6th ed., pp. 1150–1181). Newark, DE: International Reading Association.
Lewis, D. M., Green, D. R., Mitzel, H. C., Baum, K., & Patz, R. J. (1998). The bookmark standard setting procedure: Methodology and recent implementations. Paper presented at the annual meeting of the National Council for Measurement in Education, San Diego, CA.
Livingston, S. A. (2004). Equating test scores (without IRT). Princeton, NJ: ETS.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
McCrudden, M. T., & Schraw, G. (2007). Relevance and goal-focusing in text processing. Educational Psychology Review, 19, 113–139.
McMurrer, J. (2008). Instructional time in elementary schools: A closer look at changes for specific subjects. Washington, DC: Center on Education Policy.
Messick, S. (1983). Assessment of children. In P. Mussen (Ed.), Handbook of child psychology, volume 1: History, theory, and methods (4th ed., pp. 477–526). New York, NY: Wiley.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: Macmillan.
Miller, M. D. (2002). Generalizability of performance-based assessments. Technical guidelines for performance assessment. Washington, DC: Council of Chief State School Officers.
Minarechová, M. (2012). Negative impacts of high-stakes testing. Journal of Pedagogy/Pedagogický Casopis, 3, 82–100.
Mislevy, R. J. (2006). Cognitive psychology and educational assessment. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 257–306). Westport, CT: American Council on Education/Praeger.
Mislevy, R. J. (2007). Validity by design. Educational Researcher, 36, 463–469.
Mislevy, R. J. (2008). How cognitive science challenges the educational measurement tradition. Measurement: Interdisciplinary Research and Perspectives, 6, 124.
Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning. In R. L. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 83–108). Charlotte, NC: Information Age Publishing.
Mislevy, R. J., & Haertel, G. (2006). Implications for evidence-centered design for educational assessment. Educational Measurement: Issues and Practice, 25, 6–20.
Mislevy, R. J., & Sabatini, J. P. (2012). How research on reading and research on assessment are transforming reading assessment (or if they aren’t, how they ought to). In J. Sabatini, E. Albro, & T. O’Reilly (Eds.), Measuring up: Advances in how we assess reading ability (pp. 119–134). Lanham, MD: Rowman & Littlefield Education.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.
Mullis, I. V. S., Martin, M. O., Kennedy, A. M., Trong, K. L., & Sainsbury, M. (2009). PIRLS 2011 assessment framework. Boston, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College. Retrieved from http://timssandpirls.bc.edu/pirls2011/downloads/PIRLS2011_Framework.pdf
National Governors Association Center for Best Practices, & Council of Chief State School Officers. (2010). Common Core State Standards for English language arts and literacy in history/social studies, science, and technical subjects. Retrieved from http://www.corestandards.org/assets/CCSSI_ELA%20Standards.pdf
National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Bethesda, MD: Author. Retrieved from https://www.nichd.nih.gov/publications/pubs/nrp/Pages/smallbook.aspx
Neill, M. (1997). Testing our children: A report card on state assessment systems. Retrieved from http://www.fairtest.org/testing-our-children-introduction
Nelson, H. (2013). Testing more, teaching less: What America’s obsession with student testing costs in money and lost instructional time. Washington, DC: American Federation of Teachers. Retrieved from http://www.aft.org/pdfs/teachers/testingmore2013.pdf
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
O’Reilly, T., & Sabatini, J. (2013). Reading for understanding: How performance moderators and scenarios impact assessment design (Research report no. RR-13-31). Princeton, NJ: ETS.
O’Reilly, T., Sabatini, J., Bruce, K., Pillarisetti, S., & McCormick, C. (2012). Middle school reading assessment: Measuring what matters under an RTI framework. Reading Psychology Special Issue: Response to Intervention, 33, 162–189.
Organisation for Economic Co-operation and Development. (2009a). PISA 2009 assessment framework: Key competencies in reading, mathematics and science. Paris, France: Author. Retrieved from http://www.oecd.org/pisa/pisaproducts/44455820.pdf
Organisation for Economic Co-operation and Development. (2009b). PIAAC literacy: A conceptual framework. Paris, France: Author. Retrieved from http://www.oecd-ilibrary.org/content/workingpaper/220348414075
Organisation for Economic Co-operation and Development. (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving, and financial literacy. Paris, France: Author.
Owens, E. (2013, November 18). Common Core critics celebrate National Don’t Send Your Child to School Day. Daily Caller. Retrieved from http://dailycaller.com/2013/11/18/common-core-critics-celebrate-national-dont-send-your-child-to-school-day/
Partnership for 21st Century Skills. (2004). Learning for the 21st century: A report and mile guide for 21st century skills. Washington, DC: Author. Retrieved from http://www.p21.org/storage/documents/P21_Report.pdf
Partnership for 21st Century Skills. (2008). 21st century skills and English map. Washington, DC: Author. Retrieved from http://www.p21.org/storage/documents/21st_century_skills_english_map.pdf
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Peng, L., Li, C., & Wan, X. (2012). A framework for optimising the cost and performance of concept testing. Journal of Marketing Management, 28, 1000–1013.
Perfetti, C., & Stafura, J. (2014). Word knowledge in a theory of reading comprehension. Scientific Studies of Reading, 18, 22–37.
Petress, K. (2006). Perils of current testing mandates. Journal of Instructional Psychology, 33, 80–82.
Petscher, Y. (2011, July). A comparison of methods for scoring multidimensional constructs unidimensionally in literacy research. Paper presented at the annual meeting of the society for the scientific study of reading, St. Pete Beach, FL.
Petscher, Y., & Schatschneider, C. (2012). Validating scores from new assessments: A comparison of classical test theory and item response theory. In G. Tenenbaum, R. Eklund, & A. Kamata (Eds.), Handbook of measurement in sport and exercise psychology (pp. 41–52). Champaign, IL: Human Kinetics.
Phelps, R. P. (2012). The effect of testing on student achievement, 1910–2010. International Journal of Testing, 12, 21–43.
Powers, D. E., & Wilson-Leung, S. (1995). Answering the new SAT reading comprehension questions without the passages. Journal of Educational Measurement, 32(2), 105–130.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47, 361–372.
Rijmen, F. (2011). Hierarchical factor item response theory models for PIRLS: Capturing cluster effects at multiple levels. In M. von Davier & D. Hastedt (Eds.), IERI monograph series: Issues and methodologies in large-scale assessments (Vol. 4, pp. 59–74). Hamburg, Germany: IERI.
Rupp, A., Ferne, T., & Choi, H. (2006). How assessing reading comprehension with multiple-choice questions shapes the construct: A cognitive processing perspective. Language Testing, 23, 441–474.
Sabatini, J., Albro, E., & O’Reilly, T. (2012). Measuring up: Advances in how we assess reading ability. Lanham, MD: Rowman & Littlefield Education.
Sabatini, J., Bruce, K., & Steinberg, J. (2013). SARA reading components tests, RISE form (Research report no. RR-13-08). Princeton, NJ: ETS.
Sabatini, J., & O’Reilly, T. (2013). Rationale for a new generation of reading comprehension assessments. In B. Miller, L. Cutting, & P. McCardle (Eds.), Unraveling the behavioral, neurobiological, and genetic components of reading comprehension (pp. 100–111). Baltimore, MD: Brookes Publishing.
Sabatini, J., O’Reilly, T., & Albro, E. (2012). Reaching an understanding: Innovations in how we view reading assessment. Lanham, MD: Rowman & Littlefield Education.
Sabatini, J., O’Reilly, T., & Deane, P. (2013). Preliminary reading literacy assessment framework: Foundation and rationale for assessment and system design (Research Report No. RR-13-30). Princeton, NJ: Educational Testing Service.
Sabatini, J., O’Reilly, T., Halderman, L., & Bruce, K. (2014). Integrating scenario-based and component reading skill measures to understand the reading behavior of struggling readers. Learning Disabilities Research & Practice, 29, 36–43.
Santelices, M. V., & Wilson, M. (2012). On the relationship between differential item functioning and item difficulty: An issue of methods? Item response theory approach to differential item functioning. Educational & Psychological Measurement, 72, 5–36.
Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1997). A splintered vision: An investigation of U.S. science and mathematics education. Dordrecht, The Netherlands: Kluwer.
Shanahan, C., Shanahan, T., & Misischia, C. (2011). Analysis of expert readers in three disciplines: History, mathematics, and chemistry. Journal of Literacy Research, 43, 393–429.
Shanahan, T., & Shanahan, C. (2008). Teaching disciplinary literacy to adolescents: Rethinking content-area literacy. Harvard Educational Review, 78, 40–59.
Shepard, L. A. (2013). Validity for what purpose? Teachers College Record (090307), 115, 1–12.
Siena College Research Institute. (2013). Siena College poll: Divided over Common Core, NYers say too much testing. Loudonville, NY: Author. Retrieved from http://www.siena.edu/uploadedfiles/home/parents_and_community/community_page/sri/sny_poll/SNY%20November%202013%20Poll%20Release%20--%20FINAL.pdf
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107–120.
Snow, R. E., & Lohman, D. F. (1989). Implications of cognitive psychology for educational measurement. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 263–331). New York, NY: Macmillan.
Spector, J. (2013). NY voters: Too much testing in schools. Retrieved from http://www.democratandchronicle.com/story/news/2013/11/18/ny-voters-too-much-testing-in-schools-/3634223/
Stecher, B., & Barron, S. (2001). Unintended consequences of test-based accountability when testing in ‘milepost’ grades. Educational Assessment, 7, 259–281.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Stout, W. F., Douglas, J., Junker, B., & Roussos, L. A. (1993). DIMTEST manual. Unpublished manuscript.
Tate, R. (2002). Test dimensionality. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation. Mahwah, NJ: Erlbaum.
Thompson, N. A., & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research, and Evaluation, 16. Retrieved from http://pareonline.net/pdf/v16n1.pdf
Tong, Y., & Kolen, M. J. (2010, May). IRT proficiency estimators and their impact. Paper presented at the annual conference of the National Council of Measurement in Education, Denver, CO.
U.S. Department of Education. (2009). Race to the top executive summary. Washington, DC: Author. Retrieved from http://www2.ed.gov/programs/racetothetop/executive-summary.pdf
van den Broek, P., Lorch, R. F., Jr., Linderholm, T., & Gustafson, M. (2001). The effects of readers’ goals on inference generation and memory for texts. Memory & Cognition, 29, 1081–1087.
van der Linden, W. J., & Glas, C. A. W. (Eds.). (2010). Elements of adaptive testing. New York, NY: Springer.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.
Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., & Mislevy, R. J. (2000). Computerized adaptive testing: A primer. New York, NY: Routledge.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 473–492.
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.
Wise, S. L., & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135–155.
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.
Yovanoff, P., & Tindal, G. (2007). Scaling early reading alternate assessments with statewide measures. Exceptional Children, 73, 184–201.
Acknowledgements
The research reported here was supported in part by the Institute of Education Sciences (IES), U.S. Department of Education, through Grant R305F100005 to the Educational Testing Service (ETS) as part of the Reading for Understanding Research (RfU) Initiative. The opinions expressed are those of the authors and do not represent views of the Institute, the U.S. Department of Education, or ETS. We are extremely grateful to the IES and ETS for sponsoring and supporting this research. We would also like to thank Matthias von Davier, Michael Kane, and Don Powers for their intellectual insights and thoughtful comments, and Jennifer Lentini and Kim Fryer for their editorial assistance.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Sabatini, J., Petscher, Y., O’Reilly, T., Truckenmiller, A. (2015). Improving Comprehension Assessment for Middle and High School Students: Challenges and Opportunities. In: Santi, K., Reed, D. (eds) Improving Reading Comprehension of Middle and High School Students. Literacy Studies, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-319-14735-2_6
DOI: https://doi.org/10.1007/978-3-319-14735-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14734-5
Online ISBN: 978-3-319-14735-2
eBook Packages: Humanities, Social Sciences and Law; Education (R0)