Improving Comprehension Assessment for Middle and High School Students: Challenges and Opportunities

Chapter in Improving Reading Comprehension of Middle and High School Students

Part of the book series: Literacy Studies (LITS, volume 10)

Abstract

For decades, standardized reading comprehension tests have consisted of a series of passages and associated multiple-choice questions. Although widely used in and out of the classroom, there continues to be considerable disagreement regarding how or whether such tests have net value in the service of advancing educational progress in reading. This chapter begins with a review of features that characterize standardized reading assessments. In particular, we discuss how assessment designs and analytics reflect a balance of practical and measurement constraints. We then discuss how advances in the learning sciences, measurement, and electronic technologies have opened up the design space for a new generation of reading assessments. Abstracting from this review, we end by presenting some examples of prototype assessments that reflect opportunities for enhancing the value and utility of reading assessments in the future.

Notes

  1. For those interested in a more complete and technically sophisticated treatment of measurement concepts, issues of ethical design and use, and modern-day advances, a library of measurement books is available (e.g., see AERA/APA/NCME, 1999; Brennan, 2006; ETS, 2002).

  2. It is not that psychometrics cannot handle scores other than dichotomous; however, the complexity increases, and efficiency in design and analysis typically decreases.

  3. It is worth noting that classical test theory makes several assumptions about the errors (Kline, 2005). First, T and E are expected to be uncorrelated, meaning that an individual's errors, whether negative or positive, do not maintain a systematic relation with the true score. Second, the error score on one form of the assessment (e.g., the three reading comprehension passages) is expected to be uncorrelated with the error on a parallel form of the assessment (e.g., a set of three different reading comprehension passages). Third, the errors are expected to be normally distributed, with the random errors around an individual's true score averaging zero. This means that at times the reading comprehension score may be high (e.g., when the student has particularly high self-efficacy or recalls the information well from a prior testing) or low (e.g., when the student skipped breakfast), but because the random errors are assumed to be normally distributed, their average across testing periods will be zero.
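
     A compact way to state these assumptions (our formalization in the note's notation, offered as a sketch rather than the chapter's own presentation; \( \mu_E \) denotes the mean of the errors):

     \[ X = T + E, \qquad \mu_E = 0, \qquad \mathrm{COV}(T, E) = 0, \qquad \mathrm{COV}(E, E') = 0, \]

     so that the observed-score variance decomposes as \( V(X) = V(T) + V(E) \).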

  4. In psychometrics, item independence is introduced as a purely statistical assumption, though it has practical implications for task design, as discussed later.
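
     Stated formally (a sketch of the standard local independence assumption; the notation is ours, not the chapter's): conditional on the latent ability \( \theta \), responses to the \( n \) items are independent,

     \[ P(X_1, X_2, \ldots, X_n \mid \theta) = \prod_{i=1}^{n} P(X_i \mid \theta), \]

     which is what makes item-by-item probability models tractable.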

  5. It follows, then, that if tests are strictly parallel, we can replace the covariance of true scores T and T′, COV(T, T′), by the variance of true scores, V(T); the CTT assumption of uncorrelated errors, COV(E, E′) = 0 = COV(T, E′), then gives us what we need.
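
     As a worked sketch of that step (our own elaboration in the note's notation): for parallel forms \( X = T + E \) and \( X' = T' + E' \) with \( T = T' \),

     \[ \mathrm{COV}(X, X') = \mathrm{COV}(T, T') + \mathrm{COV}(T, E') + \mathrm{COV}(E, T') + \mathrm{COV}(E, E') = V(T), \]

     and because strictly parallel forms have equal variances, the correlation between them equals \( V(T)/V(X) \), the reliability coefficient.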

  6. Technically, IRT models do not contain an error variable as a component of the model equations. They are based on a probability model for item-level variables and assume a latent variable. The standard error in IRT models is based on the assumptions we make about the model and on what is known as the Fisher information inequality, or Cramér-Rao lower bound.
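
     To illustrate (a sketch under the common two-parameter logistic model, which the note itself does not single out): with item response function \( P_i(\theta) = \{1 + \exp[-a_i(\theta - b_i)]\}^{-1} \), the item information is

     \[ I_i(\theta) = a_i^2\, P_i(\theta)\,[1 - P_i(\theta)], \]

     test information is the sum of \( I_i(\theta) \) over the administered items, and the Cramér-Rao inequality bounds the variance of an unbiased ability estimate by the reciprocal of that sum, so in practice \( \mathrm{SE}(\hat{\theta}) \approx 1 / \sqrt{\sum_i I_i(\hat{\theta})} \).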

  7. In classical test theory, methods of equating test forms are used to address these kinds of problems.
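
     For example (a sketch of linear equating, one common method; the note does not name a specific procedure): a score \( x \) on form X is placed on the scale of form Y by matching standardized scores,

     \[ y = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X), \]

     so that equated scores share the mean and standard deviation of form Y.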

  8. Interested readers should visit the CBAL website at: http://www.ets.org/research/topics/cbal/initiative

  9. Due to space limitations, we elaborate only on the RfU assessment project in this chapter. Both CBAL and RfU share many of the same underlying principles, and both incorporate innovative design techniques, including scenario-based tasks and assessments.

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

  • Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36, 258–267.

  • Baker, E. L. (2013). The chimera of validity. Teachers College Record (090302), 115, 1–26.

  • Bennett, R. E. (2011a, June). Theory of action and educational assessment. Paper presented at the National Conference on Student Assessment, Orlando, FL.

  • Bennett, R. E. (2011b). CBAL: Results from piloting innovative K–12 assessments (Research report no. RR-11-23). Princeton, NJ: ETS.

  • Bennett, R. E., & Gitomer, D. H. (2009). Transforming K–12 assessment: Integrating accountability testing, formative assessment and professional support. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–62). New York, NY: Springer.

  • Berliner, D. C. (2011). Rational responses to high-stakes testing: The case of curriculum narrowing and the harm that follows. Cambridge Journal of Education, 41, 278–302.

  • Brennan, R. L. (Ed.). (2006). Educational measurement (4th ed.). Westport, CT: American Council on Education/Praeger.

  • Britt, M. A., & Rouet, J. F. (2012). Learning with multiple documents: Component skills and their acquisition. In M. J. Lawson & J. R. Kirby (Eds.), The quality of learning: Dispositions, instruction, and mental structures (pp. 276–314). Cambridge, UK: Cambridge University Press.

  • Cain, K., & Parrila, R. (2014). Introduction to the special issue. Theories of reading: what we have learned from two decades of scientific research. Scientific Studies of Reading, 18, 1–4.

  • Christ, T. J., & Hintze, J. M. (2007). Psychometric considerations when evaluating response to intervention. In S. R. Jimerson, M. K. Burns, & A. M. VanDerHeyden (Eds.), Handbook of response to intervention: The science and practice of assessment and intervention (pp. 99–105). New York, NY: Springer.

  • Christo, C. (2005). Critical characteristics of a three-tiered model applied to reading interventions. California School Psychologist, 10, 33–44.

  • Coiro, J. (2009). Rethinking reading assessment in a digital age: How is reading comprehension different and where do we turn now? Educational Leadership, 66, 59–63.

  • Coiro, J. (2011). Predicting reading comprehension on the Internet: Contributions of offline reading skills, online reading skills, and prior knowledge. Journal of Literacy Research, 43, 352–392.

  • Compton, D., Fuchs, D., Fuchs, L., & Bryant, J. (2006). Selecting at-risk readers in first grade for early intervention: A two-year longitudinal study of decision rules and procedures. Journal of Educational Psychology, 98, 394–409.

  • Deane, P., Sabatini, J., & O’Reilly, T. (2012). English language arts literacy framework. Princeton, NJ: ETS. Retrieved from http://elalp.cbalwiki.ets.org/Table+of+Contents

  • Educational Testing Service. (2002). ETS standards for quality and fairness. Princeton, NJ: Author. Retrieved from https://www.ets.org/s/about/pdf/standards.pdf

  • Educational Testing Service. (2013). Reading for understanding. Retrieved from http://www.ets.org/research/topics/reading_for_understanding/

  • Embretson, S., & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

  • Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, DHHS. (2010). Developing early literacy: Report of the National Early Literacy Panel. Washington, DC: U.S. Government Printing Office.

  • Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person parameters. Educational and Psychological Measurement, 58, 357–381.

  • Foorman, B., Petscher, Y., & Schatschneider, C. (2013). FCRR reading assessment. Tallahassee, FL: Florida Center for Reading Research.

  • Fuchs, D., Compton, D., Fuchs, L., Bryant, J., & Davis, G. (2008). Making ‘secondary intervention’ work in a three-tier responsiveness-to-intervention model: Findings from the first-grade longitudinal reading study of the National Research Center on Learning Disabilities. Reading and Writing, 21, 413–436.

  • Fuchs, L. S., Fuchs, D., & Compton, D. L. (2010). Rethinking response to intervention at middle and high school. School Psychology Review, 39, 22–28.

  • Gearhart, M., & Herman, J. L. (1998). Portfolio assessment: Whose work is it? Issues in the use of classroom assignments for accountability. Educational Assessment, 5, 41–56.

  • Gil, L., Bråten, I., Vidal-Abarca, E., & Strømsø, H. I. (2010). Understanding and integrating multiple science texts: Summary tasks are sometimes better than argument tasks. Reading Psychology, 31, 30–68.

  • Goldman, S. (2012). Adolescent literacy: Learning and understanding content. Future of Children, 22, 73–88.

  • Goldman, S. R. (2004). Cognitive aspects of constructing meaning through and across multiple texts. In N. Shuart-Ferris & D. M. Bloome (Eds.), Uses of intertextuality in classroom and educational research (pp. 317–351). Greenwich, CT: Information Age Publishing.

  • Gordon Commission. (2013). To assess, to teach, to learn: a vision for the future of assessment. Retrieved from http://www.gordoncommission.org/rsc/pdfs/gordon_commission_technical_report.pdf

  • Gorin, J., & Mislevy, R. J. (2013, September). Inherent measurement challenges in the next generation science standards for both formative and summative assessment. Paper presented at the invitational research symposium on science assessment, Washington, DC.

  • Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110). Westport, CT: American Council on Education/Praeger.

  • Halverson, R. (2010). School formative feedback systems. Peabody Journal of Education, 85, 130–155.

  • Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12, 38–47.

  • Institute of Education Sciences. (2010). Reading for understanding initiative. Washington, DC: U. S. Department of Education. Retrieved from http://ies.ed.gov/ncer/projects/program.asp?ProgID=62

  • International Association for the Evaluation of Educational Achievement. (2013a). Progress in international reading literacy study 2016. Retrieved from http://www.iea.nl/?id=457

  • International Association for the Evaluation of Educational Achievement. (2013b). ePirls online reading 2016. Retrieved from http://www.iea.nl/fileadmin/user_upload/Studies/PIRLS_2016/ePIRLS_2016_Brochure.pdf

  • Jimerson, S. R., Burns, M. K., & VanDerHeyden, A. M. (2007). Handbook of response to intervention: The science and practice of assessment and intervention. Springfield, IL: Springer.

  • Kafer, K. (2002, December 1). High-poverty students excel with direct instruction. Heartlander Magazine. Retrieved from http://news.heartland.org/newspaper-article/2002/12/01/high-poverty-students-excel-direct-instruction

  • Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.

  • Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 18–64). Westport, CT: American Council on Education/Praeger.

  • Katz, S., & Lautenschlager, G. (2001). The contribution of passage and no-passage factors to item performance on the SAT reading task. Educational Assessment, 7, 165–176.

  • Kieffer, M. J., & Petscher, Y. (2013). Unique contributions of measurement error? Applying a bi-factor structural equation model to investigate the roles of morphological awareness and vocabulary knowledge in reading comprehension. Paper presented at the American Education Research Association, San Francisco, CA.

  • Kim, H. R., Zhang, J., & Stout, W. F. (1995). A new index of dimensionality—DETECT. Unpublished manuscript.

  • Kline, T. (2005). Psychological testing: A practical approach to design and evaluation. Thousand Oaks, CA: Sage Publications.

  • Klingner, J., & Edwards, P. (2006). Cultural considerations with response to intervention models. Reading Research Quarterly, 41, 108–117.

  • Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York, NY: Springer.

  • Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994a). The evolution of a portfolio program: The impact and quality of the Vermont program in its second year (1992–1993). Los Angeles, CA: UCLA, National Center for Research on Evaluation, Standards, and Student Testing.

  • Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994b). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13, 5–10.

  • Koretz, D., Stecher, S., Klein, D., McCaffrey, D., & Deibert, E. (1993). Can portfolios assess student performance and influence instruction? The 1991–1992 Vermont experience. Santa Monica, CA: RAND.

  • Lee, C. D., & Spratley, A. (2010). Reading in the disciplines: The challenges of adolescent literacy. New York, NY: Carnegie Corporation of New York.

  • Leu, D., Kinzer, C., Coiro, J., Castek, J., & Henry, L. (2013). New literacies: A dual-level theory of the changing nature of literacy, instruction, and assessment. In D. E. Alvermann, N. J. Unrau, & R. B. Ruddell (Eds.), Theoretical models and processes of reading (6th ed., pp. 1150–1181). Newark, DE: International Reading Association.

  • Lewis, D. M., Green, D. R., Mitzel, H. C., Baum, K., & Patz, R. J. (1998). The bookmark standard setting procedure: Methodology and recent implementations. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

  • Livingston, S. A. (2004). Equating test scores (without IRT). Princeton, NJ: ETS.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

  • McCrudden, M. T., & Schraw, G. (2007). Relevance and goal-focusing in text processing. Educational Psychology Review, 19, 113–139.

  • McMurrer, J. (2008). Instructional time in elementary schools: A closer look at changes for specific subjects. Washington, DC: Center on Education Policy.

  • Messick, S. (1983). Assessment of children. In P. Mussen (Ed.), Handbook of child psychology, volume 1: History, theory, and methods (4th ed., pp. 477–526). New York, NY: Wiley.

  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: Macmillan.

  • Miller, M. D. (2002). Generalizability of performance-based assessments. Technical guidelines for performance assessment. Washington, DC: Council of Chief State School Officers.

  • Minarechová, M. (2012). Negative impacts of high-stakes testing. Journal of Pedagogy/Pedagogický Casopis, 3, 82–100.

  • Mislevy, R. J. (2006). Cognitive psychology and educational assessment. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 257–306). Westport, CT: American Council on Education/Praeger.

  • Mislevy, R. J. (2007). Validity by design. Educational Researcher, 36, 463–469.

  • Mislevy, R. J. (2008). How cognitive science challenges the educational measurement tradition. Measurement: Interdisciplinary Research and Perspectives, 6, 124.

  • Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning. In R. L. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 83–108). Charlotte, NC: Information Age Publishing.

  • Mislevy, R. J., & Haertel, G. (2006). Implications for evidence-centered design for educational assessment. Educational Measurement: Issues and Practice, 25, 6–20.

  • Mislevy, R. J., & Sabatini, J. P. (2012). How research on reading and research on assessment are transforming reading assessment (or if they aren’t, how they ought to). In J. Sabatini, E. Albro, & T. O’Reilly (Eds.), Measuring up: Advances in how we assess reading ability (pp. 119–134). Lanham, MD: Rowman & Littlefield Education.

  • Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.

  • Mullis, I. V. S., Martin, M. O., Kennedy, A. M., Trong, K. L., & Sainsbury, M. (2009). PIRLS 2011 assessment framework. Boston, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College. Retrieved from http://timssandpirls.bc.edu/pirls2011/downloads/PIRLS2011_Framework.pdf

  • National Governors Association Center for Best Practices, & Council of Chief State School Officers. (2010). Common Core State Standards for English language arts and literacy in history/social studies, science, and technical subjects. Retrieved from http://www.corestandards.org/assets/CCSSI_ELA%20Standards.pdf

  • National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Bethesda, MD: Author. Retrieved from https://www.nichd.nih.gov/publications/pubs/nrp/Pages/smallbook.aspx

  • Neill, M. (1997). Testing our children: A report card on state assessment systems. Retrieved from http://www.fairtest.org/testing-our-children-introduction

  • Nelson, H. (2013). Testing more, teaching less: What America’s obsession with student testing costs in money and lost instructional time. Washington, DC: American Federation of Teachers. Retrieved from http://www.aft.org/pdfs/teachers/testingmore2013.pdf

  • Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.

  • O’Reilly, T., & Sabatini, J. (2013). Reading for understanding: How performance moderators and scenarios impact assessment design (Research report no. RR-13-31). Princeton, NJ: ETS.

  • O’Reilly, T., Sabatini, J., Bruce, K., Pillarisetti, S., & McCormick, C. (2012). Middle school reading assessment: Measuring what matters under an RTI framework. Reading Psychology Special Issue: Response to Intervention, 33, 162–189.

  • Organisation for Economic Co-operation and Development. (2009a). PISA 2009 assessment framework- key competencies in reading, mathematics and science. Paris, France: Author. Retrieved from http://www.oecd.org/pisa/pisaproducts/44455820.pdf

  • Organisation for Economic Co-operation and Development. (2009b). PIAAC literacy: A conceptual framework. Paris, France: Author. Retrieved from http://www.oecd-ilibrary.org/content/workingpaper/220348414075

  • Organisation for Economic Co-operation and Development. (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving, and financial literacy. Paris, France: Author.

  • Owens, E. (2013, November 18). Common Core critics celebrate National Don’t Send Your Child to School Day. Daily Caller. Retrieved from http://dailycaller.com/2013/11/18/common-core-critics-celebrate-national-dont-send-your-child-to-school-day/

  • Partnership for 21st Century Skills. (2004). Learning for the 21st century: A report and mile guide for 21st century skills. Washington, DC: Author. Retrieved from http://www.p21.org/storage/documents/P21_Report.pdf

  • Partnership for 21st Century Skills. (2008). 21st century skills and English map. Washington, DC: Author. Retrieved from http://www.p21.org/storage/documents/21st_century_skills_english_map.pdf

  • Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.

  • Peng, L., Li, C., & Wan, X. (2012). A framework for optimising the cost and performance of concept testing. Journal of Marketing Management, 28, 1000–1013.

  • Perfetti, C., & Stafura, J. (2014). Word knowledge in a theory of reading comprehension. Scientific Studies of Reading, 18, 22–37.

  • Petress, K. (2006). Perils of current testing mandates. Journal of Instructional Psychology, 33, 80–82.

  • Petscher, Y. (2011, July). A comparison of methods for scoring multidimensional constructs unidimensionally in literacy research. Paper presented at the annual meeting of the society for the scientific study of reading, St. Pete Beach, FL.

  • Petscher, Y., & Schatschneider, C. (2012). Validating scores from new assessments: A comparison of classical test theory and item response theory. In G. Tenenbaum, R. Eklund, & A. Kamata (Eds.), Handbook of measurement in sport and exercise psychology (pp. 41–52). Champaign, IL: Human Kinetics.

  • Phelps, R. P. (2012). The effect of testing on student achievement, 1910–2010. International Journal of Testing, 12, 21–43.

  • Powers, D. E., & Wilson-Leung, S. (1995). Answering the new SAT reading comprehension questions without the passages. Journal of Educational Measurement, 32(2), 105–130.

  • Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.

  • Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47, 361–372.

  • Rijmen, F. (2011). Hierarchical factor item response theory models for PIRLS: Capturing cluster effects at multiple levels. In M. von Davier & D. Hastedt (Eds.), IERI monograph series: Issues and methodologies in large-scale assessments (Vol. 4, pp. 59–74). Hamburg, Germany: IERI.

  • Rupp, A., Ferne, T., & Choi, H. (2006). How assessing reading comprehension with multiple-choice questions shapes the construct: A cognitive processing perspective. Language Testing, 23, 441–474.

  • Sabatini, J., Albro, E., & O’Reilly, T. (2012). Measuring up: Advances in how we assess reading ability. Lanham, MD: Rowman & Littlefield Education.

  • Sabatini, J., Bruce, K., & Steinberg, J. (2013). SARA reading components tests, RISE form (Research report no. RR-13-08). Princeton, NJ: ETS.

  • Sabatini, J., & O’Reilly, T. (2013). Rationale for a new generation of reading comprehension assessments. In B. Miller, L. Cutting, & P. McCardle (Eds.), Unraveling the behavioral, neurobiological, and genetic components of reading comprehension (pp. 100–111). Baltimore, MD: Brookes Publishing.

  • Sabatini, J., O’Reilly, T., & Albro, E. (2012). Reaching an understanding: Innovations in how we view reading assessment. Lanham, MD: Rowman & Littlefield Education.

  • Sabatini, J., O’Reilly, T., & Deane, P. (2013). Preliminary reading literacy assessment framework: Foundation and rationale for assessment and system design (Research Report No. RR-13-30). Princeton, NJ: Educational Testing Service.

  • Sabatini, J., O’Reilly, T., Halderman, L., & Bruce, K. (2014). Integrating scenario-based and component reading skill measures to understand the reading behavior of struggling readers. Learning Disabilities Research & Practice, 29, 36–43.

  • Santelices, M. V., & Wilson, M. (2012). On the relationship between differential item functioning and item difficulty: an issue of methods? Item response theory approach to differential item functioning. Educational & Psychological Measurement, 72, 5–36.

  • Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1997). A splintered vision: An investigation of U.S. science and mathematics education. Dordrecht, The Netherlands: Kluwer.

  • Shanahan, C., Shanahan, T., & Misischia, C. (2011). Analysis of expert readers in three disciplines: History, mathematics, and chemistry. Journal of Literacy Research, 43, 393–429.

  • Shanahan, T., & Shanahan, C. (2008). Teaching disciplinary literacy to adolescents: Rethinking content-area literacy. Harvard Educational Review, 78, 40–59.

  • Shepard, L. A. (2013). Validity for what purpose? Teachers College Record (090307), 115, 1–12.

  • Siena College Research Institute. (2013). Siena College poll: Divided over Common Core, NYers say too much testing. Loudonville, NY: Author. Retrieved from http://www.siena.edu/uploadedfiles/home/parents_and_community/community_page/sri/sny_poll/SNY%20November%202013%20Poll%20Release%20--%20FINAL.pdf

  • Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107–120.

  • Snow, R. E., & Lohman, D. F. (1989). Implications of cognitive psychology for educational measurement. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 263–331). New York, NY: Macmillan.

  • Spector, J. (2013). NY voters: Too much testing in schools. Retrieved from http://www.democratandchronicle.com/story/news/2013/11/18/ny-voters-too-much-testing-in-schools-/3634223/

  • Stecher, B., & Barron, S. (2001). Unintended consequences of test-based accountability when testing in ‘milepost’ grades. Educational Assessment, 7, 259–281.

  • Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.

  • Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.

  • Stout, W. F., Douglas, J., Junker, B., & Roussos, L. A. (1993). DIMTEST manual. Unpublished manuscript.

  • Tate, R. (2002). Test dimensionality. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation. Mahwah, NJ: Erlbaum.

  • Thompson, N. A., & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research, and Evaluation, 16. Retrieved from http://pareonline.net/pdf/v16n1.pdf

  • Tong, Y., & Kolen, M. J. (2010, May). IRT proficiency estimators and their impact. Paper presented at the annual conference of the National Council on Measurement in Education, Denver, CO.

  • U.S. Department of Education. (2009). Race to the top executive summary. Washington, DC: Author. Retrieved from http://www2.ed.gov/programs/racetothetop/executive-summary.pdf

  • van den Broek, P., Lorch, R. F., Jr., Linderholm, T., & Gustafson, M. (2001). The effects of readers’ goals on inference generation and memory for texts. Memory & Cognition, 29, 1081–1087.

  • van der Linden, W. J., & Glas, C. A. W. (Eds.). (2010). Elements of adaptive testing. New York, NY: Springer.

  • Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.

  • Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., & Mislevy, R. J. (2000). Computerized adaptive testing: A primer. New York, NY: Routledge.

  • Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 473–492.

  • Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.

  • Wise, S. L., & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135–155.

  • Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.

  • Yovanoff, P., & Tindal, G. (2007). Scaling early reading alternate assessments with statewide measures. Exceptional Children, 73, 184–201.


Acknowledgements

The research reported here was supported in part by the Institute of Education Sciences (IES), U.S. Department of Education, through Grant R305F100005 to the Educational Testing Service (ETS) as part of the Reading for Understanding Research (RfU) Initiative. The opinions expressed are those of the authors and do not represent the views of the Institute, the U.S. Department of Education, or ETS. We are extremely grateful to IES and ETS for sponsoring and supporting this research. We would also like to thank Matthias von Davier, Michael Kane, and Don Powers for their intellectual insights and thoughtful comments, and Jennifer Lentini and Kim Fryer for their editorial assistance.

Author information

Corresponding author

Correspondence to John Sabatini.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Sabatini, J., Petscher, Y., O’Reilly, T., Truckenmiller, A. (2015). Improving Comprehension Assessment for Middle and High School Students: Challenges and Opportunities. In: Santi, K., Reed, D. (eds) Improving Reading Comprehension of Middle and High School Students. Literacy Studies, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-319-14735-2_6
