Abstract
For decades, standardized reading comprehension tests have consisted of a series of passages and associated multiple-choice questions. Although such tests are widely used in and out of the classroom, considerable disagreement remains regarding how, or whether, they provide net value in advancing educational progress in reading. This chapter begins with a review of features that characterize standardized reading assessments. In particular, we discuss how assessment designs and analytics reflect a balance of practical and measurement constraints. We then discuss how advances in the learning sciences, measurement, and electronic technologies have opened up the design space for a new generation of reading assessments. Abstracting from this review, we end by presenting some examples of prototype assessments that illustrate opportunities for enhancing the value and utility of reading assessments in the future.
Notes
- 1.
- 2.
It is not that psychometrics cannot handle scores other than dichotomous ones; however, complexity increases, and efficiency in design and analysis typically decreases.
- 3.
It is worth noting that classical test theory makes several assumptions about the errors (Kline, 2005). First, T and E are assumed to be uncorrelated, meaning that an individual's errors, whether negative or positive, bear no systematic relation to the true score. Second, the error score on one form of the assessment (e.g., a set of three reading comprehension passages) is assumed to be uncorrelated with the error on a parallel form of the assessment (e.g., a set of three different reading comprehension passages). Third, the errors are assumed to be normally distributed, with the random errors around an individual's score averaging to zero. This means that at times the reading comprehension score may be high (such as when the student has particularly high self-efficacy or recalls the information well from a prior testing) or low (such as when the student skipped breakfast), but because the random errors are assumed to be normally distributed with mean zero, they will average out across testing periods.
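These assumptions can be illustrated with a small simulation: observed scores are generated as true score plus random error, and we check that the errors average to zero and are uncorrelated with the true scores. This is a minimal sketch; the true-score mean, error standard deviation, and sample size below are arbitrary illustrative choices, not values from any real assessment.

```python
import random

random.seed(42)

# Classical test theory: X = T + E, with true scores T and random
# errors E drawn independently of one another.
N = 100_000
true_scores = [random.gauss(50, 10) for _ in range(N)]
errors = [random.gauss(0, 4) for _ in range(N)]  # mean-zero random error
observed = [t + e for t, e in zip(true_scores, errors)]

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sx = mean([(x - mx) ** 2 for x in xs]) ** 0.5
    sy = mean([(y - my) ** 2 for y in ys]) ** 0.5
    return cov / (sx * sy)

print(round(mean(errors), 2))               # approximately 0
print(round(corr(true_scores, errors), 2))  # approximately 0: T and E uncorrelated
```

With a large sample, both quantities come out near zero, matching the first and third assumptions; the second assumption (uncorrelated errors across parallel forms) would hold by the same logic if a second, independently generated error vector were added.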
- 4.
In psychometrics, item independence is introduced as a purely statistical assumption, though it has practical implications for task design, as discussed later.
- 5.
It follows that if tests are strictly parallel, we can replace the covariance of true scores T and T′, COV(T, T′), by the variance of true scores, V(T); the CTT assumption of uncorrelated errors, COV(E, E′) = 0 = COV(T, E′), then gives us what we need: the correlation between parallel forms equals the ratio of true-score variance to observed-score variance, i.e., the reliability.
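The algebra behind this note can be written out explicitly. In the sketch below, X = T + E and X′ = T′ + E′ are strictly parallel forms, so T = T′ and V(X) = V(X′):

```latex
\begin{aligned}
\mathrm{Cov}(X, X') &= \mathrm{Cov}(T+E,\; T'+E') \\
&= \mathrm{Cov}(T,T') + \mathrm{Cov}(T,E') + \mathrm{Cov}(E,T') + \mathrm{Cov}(E,E') \\
&= V(T), \\[4pt]
\rho_{XX'} &= \frac{\mathrm{Cov}(X,X')}{\sqrt{V(X)\,V(X')}} = \frac{V(T)}{V(X)}.
\end{aligned}
```

The final ratio, true-score variance over observed-score variance, is the classical definition of reliability.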
- 6.
Technically, IRT models do not contain an error variable as a component of the model equations. They are based on a probability model for item-level variables and assume a latent variable. The standard error in IRT models is instead derived from the model assumptions, via what is known as the Fisher information inequality, or Cramér-Rao lower bound.
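As one concrete, hedged illustration of how a standard error arises from information rather than from an error term, the sketch below computes test information under a two-parameter logistic (2PL) model and the corresponding Cramér-Rao standard error of the ability estimate. The item parameters are invented for illustration and do not come from any real test.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model,
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information an item contributes at ability theta:
    I(theta) = a^2 * p * (1 - p) for the 2PL."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Illustrative (a, b) parameter pairs for a four-item test.
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.2)]

theta = 0.0
test_info = sum(item_information(theta, a, b) for a, b in items)
se = 1.0 / math.sqrt(test_info)  # Cramer-Rao lower bound on SD of theta-hat
print(round(test_info, 3), round(se, 3))
```

Note that the standard error depends on theta: each item is most informative near its difficulty b (where p = 0.5 and information peaks at a²/4), so ability is estimated more precisely where items are concentrated.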
- 7.
In classical test theory, methods of equating test forms are used to address these kinds of problems.
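As a minimal sketch of one such method, the code below implements linear (mean-sigma) equating, which places form-X scores on the form-Y scale by matching the two forms' means and standard deviations. The score data are fabricated for illustration only.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def linear_equate(x, form_x_scores, form_y_scores):
    """Map a form-X raw score x onto the form-Y scale:
    y = (sd_Y / sd_X) * (x - mean_X) + mean_Y."""
    mx, sx = mean(form_x_scores), sd(form_x_scores)
    my, sy = mean(form_y_scores), sd(form_y_scores)
    return (sy / sx) * (x - mx) + my

form_x = [12, 15, 18, 20, 22, 25, 28]  # harder form: lower raw scores
form_y = [15, 18, 21, 23, 25, 28, 31]  # easier form: higher raw scores
print(linear_equate(20.0, form_x, form_y))  # -> 23.0
```

Here the two forms differ only by a constant shift, so a raw 20 on the harder form maps to 23 on the easier form; fuller treatments of equating, scaling, and linking are given in Kolen and Brennan (2004) and Livingston (2004).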
- 8.
Interested readers should visit the CBAL website at: http://www.ets.org/research/topics/cbal/initiative
- 9.
Due to space limitations, we elaborate only on the RfU assessment project in this chapter. CBAL and RfU share many of the same underlying principles, and both incorporate innovative design techniques, including scenario-based tasks and assessments.
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36, 258–267.
Baker, E. L. (2013). The chimera of validity. Teachers College Record (090302), 115, 1–26.
Bennett, R. E. (2011a, June). Theory of action and educational assessment. Paper presented at the National Conference on Student Assessment, Orlando, FL.
Bennett, R. E. (2011b). CBAL: Results from piloting innovative K–12 assessments (Research report no. RR-11-23). Princeton, NJ: ETS.
Bennett, R. E., & Gitomer, D. H. (2009). Transforming K–12 assessment: Integrating accountability testing, formative assessment and professional support. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–62). New York, NY: Springer.
Berliner, D. C. (2011). Rational responses to high-stakes testing: The case of curriculum narrowing and the harm that follows. Cambridge Journal of Education, 41, 278–302.
Brennan, R. L. (Ed.). (2006). Educational measurement (4th ed.). Westport, CT: American Council on Education/Praeger.
Britt, M. A., & Rouet, J. F. (2012). Learning with multiple documents: Component skills and their acquisition. In M. J. Lawson & J. R. Kirby (Eds.), The quality of learning: Dispositions, instruction, and mental structures (pp. 276–314). Cambridge, UK: Cambridge University Press.
Cain, K., & Parrila, R. (2014). Introduction to the special issue. Theories of reading: What we have learned from two decades of scientific research. Scientific Studies of Reading, 18, 1–4.
Christ, T. J., & Hintze, J. M. (2007). Psychometric considerations when evaluating response to intervention. In S. R. Jimerson, M. K. Burns, & A. M. VanDerHeyden (Eds.), Handbook of response to intervention: The science and practice of assessment and intervention (pp. 99–105). New York, NY: Springer.
Christo, C. (2005). Critical characteristics of a three-tiered model applied to reading interventions. California School Psychologist, 10, 33–44.
Coiro, J. (2009). Rethinking reading assessment in a digital age: How is reading comprehension different and where do we turn now? Educational Leadership, 66, 59–63.
Coiro, J. (2011). Predicting reading comprehension on the Internet: Contributions of offline reading skills, online reading skills, and prior knowledge. Journal of Literacy Research, 43, 352–392.
Compton, D., Fuchs, D., Fuchs, L., & Bryant, J. (2006). Selecting at-risk readers in first grade for early intervention: A two-year longitudinal study of decision rules and procedures. Journal of Educational Psychology, 98, 394–409.
Deane, P., Sabatini, J., & O’Reilly, T. (2012). English language arts literacy framework. Princeton, NJ: ETS. Retrieved from http://elalp.cbalwiki.ets.org/Table+of+Contents
Educational Testing Service. (2002). ETS standards for quality and fairness. Princeton, NJ: Author. Retrieved from https://www.ets.org/s/about/pdf/standards.pdf
Educational Testing Service. (2013). Reading for understanding. Retrieved from http://www.ets.org/research/topics/reading_for_understanding/
Embretson, S., & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, DHHS. (2010). Developing early literacy: Report of the National Early Literacy Panel. Washington, DC: U.S. Government Printing Office.
Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person parameters. Educational and Psychological Measurement, 58, 357–381.
Foorman, B., Petscher, Y., & Schatschneider, C. (2013). FCRR reading assessment. Tallahassee, FL: Florida Center for Reading Research.
Fuchs, D., Compton, D., Fuchs, L., Bryant, J., & Davis, G. (2008). Making ‘secondary intervention’ work in a three-tier responsiveness-to-intervention model: Findings from the first-grade longitudinal reading study of the National Research Center on Learning Disabilities. Reading and Writing, 21, 413–436.
Fuchs, L. S., Fuchs, D., & Compton, D. L. (2010). Rethinking response to intervention at middle and high school. School Psychology Review, 39, 22–28.
Gearhart, M., & Herman, J. L. (1998). Portfolio assessment: Whose work is it? Issues in the use of classroom assignments for accountability. Educational Assessment, 5, 41–56.
Gil, L., Bråten, I., Vidal-Abarca, E., & Strømsø, H. I. (2010). Understanding and integrating multiple science texts: Summary tasks are sometimes better than argument tasks. Reading Psychology, 31, 30–68.
Goldman, S. (2012). Adolescent literacy: Learning and understanding content. Future of Children, 22, 73–88.
Goldman, S. R. (2004). Cognitive aspects of constructing meaning through and across multiple texts. In N. Shuart-Ferris & D. M. Bloome (Eds.), Uses of intertextuality in classroom and educational research (pp. 317–351). Greenwich, CT: Information Age Publishing.
Gordon Commission. (2013). To assess, to teach, to learn: A vision for the future of assessment. Retrieved from http://www.gordoncommission.org/rsc/pdfs/gordon_commission_technical_report.pdf
Gorin, J., & Mislevy, R. J. (2013, September). Inherent measurement challenges in the next generation science standards for both formative and summative assessment. Paper presented at the invitational research symposium on science assessment, Washington, DC.
Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110). Westport, CT: American Council on Education/Praeger.
Halverson, R. (2010). School formative feedback systems. Peabody Journal of Education, 85, 130–155.
Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12, 38–47.
Institute of Education Sciences. (2010). Reading for understanding initiative. Washington, DC: U. S. Department of Education. Retrieved from http://ies.ed.gov/ncer/projects/program.asp?ProgID=62
International Association for the Evaluation of Educational Achievement. (2013a). Progress in international reading literacy study 2016. Retrieved from http://www.iea.nl/?id=457
International Association for the Evaluation of Educational Achievement. (2013b). ePirls online reading 2016. Retrieved from http://www.iea.nl/fileadmin/user_upload/Studies/PIRLS_2016/ePIRLS_2016_Brochure.pdf
Jimerson, S. R., Burns, M. K., & VanDerHeyden, A. M. (2007). Handbook of response to intervention: The science and practice of assessment and intervention. Springfield, IL: Springer.
Kafer, K. (2002, December 1). High-poverty students excel with direct instruction. Heartlander Magazine. Retrieved from http://news.heartland.org/newspaper-article/2002/12/01/high-poverty-students-excel-direct-instruction
Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 18–64). Westport, CT: American Council on Education/Praeger.
Katz, S., & Lautenschlager, G. (2001). The contribution of passage and no-passage factors to item performance on the SAT reading task. Educational Assessment, 7, 165–176.
Kieffer, M. J., & Petscher, Y. (2013). Unique contributions of measurement error? Applying a bi-factor structural equation model to investigate the roles of morphological awareness and vocabulary knowledge in reading comprehension. Paper presented at the American Education Research Association, San Francisco, CA.
Kim, H. R., Zhang, J., & Stout, W. F. (1995). A new index of dimensionality—DETECT. Unpublished manuscript.
Kline, T. (2005). Psychological testing: A practical approach to design and evaluation. Thousand Oaks, CA: Sage Publications.
Klingner, J., & Edwards, P. (2006). Cultural considerations with response to intervention models. Reading Research Quarterly, 41, 108–117.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York, NY: Springer.
Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994a). The evolution of a portfolio program: The impact and quality of the Vermont program in its second year (1992–1993). Los Angeles, CA: UCLA, National Center for Research on Evaluation, Standards, and Student Testing.
Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994b). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13, 5–10.
Koretz, D., Stecher, S., Klein, D., McCaffrey, D., & Deibert, E. (1993). Can portfolios assess student performance and influence instruction? The 1991–1992 Vermont experience. Santa Monica, CA: RAND.
Lee, C. D., & Spratley, A. (2010). Reading in the disciplines: The challenges of adolescent literacy. New York, NY: Carnegie Corporation of New York.
Leu, D., Kinzer, C., Coiro, J., Castek, J., & Henry, L. (2013). New literacies: A dual-level theory of the changing nature of literacy, instruction, and assessment. In D. E. Alvermann, N. J. Unrau, & R. B. Ruddell (Eds.), Theoretical models and processes of reading (6th ed., pp. 1150–1181). Newark, DE: International Reading Association.
Lewis, D. M., Green, D. R., Mitzel, H. C., Baum, K., & Patz, R. J. (1998). The bookmark standard setting procedure: Methodology and recent implementations. Paper presented at the annual meeting of the National Council for Measurement in Education, San Diego, CA.
Livingston, S. A. (2004). Equating test scores (without IRT). Princeton, NJ: ETS.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
McCrudden, M. T., & Schraw, G. (2007). Relevance and goal-focusing in text processing. Educational Psychology Review, 19, 113–139.
McMurrer, J. (2008). Instructional time in elementary schools: A closer look at changes for specific subjects. Washington, DC: Center on Education Policy.
Messick, S. (1983). Assessment of children. In P. Mussen (Ed.), Handbook of child psychology, volume 1: History, theory, and methods (4th ed., pp. 477–526). New York, NY: Wiley.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: Macmillan.
Miller, M. D. (2002). Generalizability of performance-based assessments. Technical guidelines for performance assessment. Washington, DC: Council of Chief State School Officers.
Minarechová, M. (2012). Negative impacts of high-stakes testing. Journal of Pedagogy/Pedagogický Casopis, 3, 82–100.
Mislevy, R. J. (2006). Cognitive psychology and educational assessment. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 257–306). Westport, CT: American Council on Education/Praeger.
Mislevy, R. J. (2007). Validity by design. Educational Researcher, 36, 463–469.
Mislevy, R. J. (2008). How cognitive science challenges the educational measurement tradition. Measurement: Interdisciplinary Research and Perspectives, 6, 124.
Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning. In R. L. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 83–108). Charlotte, NC: Information Age Publishing.
Mislevy, R. J., & Haertel, G. (2006). Implications for evidence-centered design for educational assessment. Educational Measurement: Issues and Practice, 25, 6–20.
Mislevy, R. J., & Sabatini, J. P. (2012). How research on reading and research on assessment are transforming reading assessment (or if they aren’t, how they ought to). In J. Sabatini, E. Albro, & T. O’Reilly (Eds.), Measuring up: Advances in how we assess reading ability (pp. 119–134). Lanham, MD: Rowman & Littlefield Education.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.
Mullis, I. V. S., Martin, M. O., Kennedy, A. M., Trong, K. L., & Sainsbury, M. (2009). PIRLS 2011 assessment framework. Boston, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College. Retrieved from http://timssandpirls.bc.edu/pirls2011/downloads/PIRLS2011_Framework.pdf
National Governors Association Center for Best Practices, & Council of Chief State School Officers. (2010). Common Core State Standards for English language arts and literacy in history/social studies, science, and technical subjects. Retrieved from http://www.corestandards.org/assets/CCSSI_ELA%20Standards.pdf
National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Bethesda, MD: Author. Retrieved from https://www.nichd.nih.gov/publications/pubs/nrp/Pages/smallbook.aspx
Neill, M. (1997). Testing our children: A report card on state assessment systems. Retrieved from http://www.fairtest.org/testing-our-children-introduction
Nelson, H. (2013). Testing more, teaching less: What America’s obsession with student testing costs in money and lost instructional time. Washington, DC: American Federation of Teachers. Retrieved from http://www.aft.org/pdfs/teachers/testingmore2013.pdf
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
O’Reilly, T., & Sabatini, J. (2013). Reading for understanding: How performance moderators and scenarios impact assessment design (Research report no. RR-13-31). Princeton, NJ: ETS.
O’Reilly, T., Sabatini, J., Bruce, K., Pillarisetti, S., & McCormick, C. (2012). Middle school reading assessment: Measuring what matters under an RTI framework. Reading Psychology Special Issue: Response to Intervention, 33, 162–189.
Organisation for Economic Co-operation and Development. (2009a). PISA 2009 assessment framework: Key competencies in reading, mathematics and science. Paris, France: Author. Retrieved from http://www.oecd.org/pisa/pisaproducts/44455820.pdf
Organisation for Economic Co-operation and Development. (2009b). PIAAC literacy: A conceptual framework. Paris, France: Author. Retrieved from http://www.oecd-ilibrary.org/content/workingpaper/220348414075
Organisation for Economic Co-operation and Development. (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving, and financial literacy. Paris, France: Author.
Owens, E. (2013, November 18). Common Core critics celebrate National Don’t Send Your Child to School Day. Daily Caller. Retrieved from http://dailycaller.com/2013/11/18/common-core-critics-celebrate-national-dont-send-your-child-to-school-day/
Partnership for 21st Century Skills. (2004). Learning for the 21st century: A report and mile guide for 21st century skills. Washington, DC: Author. Retrieved from http://www.p21.org/storage/documents/P21_Report.pdf
Partnership for 21st Century Skills. (2008). 21st century skills and English map. Washington, DC: Author. Retrieved from http://www.p21.org/storage/documents/21st_century_skills_english_map.pdf
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Peng, L., Li, C., & Wan, X. (2012). A framework for optimising the cost and performance of concept testing. Journal of Marketing Management, 28, 1000–1013.
Perfetti, C., & Stafura, J. (2014). Word knowledge in a theory of reading comprehension. Scientific Studies of Reading, 18, 22–37.
Petress, K. (2006). Perils of current testing mandates. Journal of Instructional Psychology, 33, 80–82.
Petscher, Y. (2011, July). A comparison of methods for scoring multidimensional constructs unidimensionally in literacy research. Paper presented at the annual meeting of the society for the scientific study of reading, St. Pete Beach, FL.
Petscher, Y., & Schatschneider, C. (2012). Validating scores from new assessments: A comparison of classical test theory and item response theory. In G. Tenenbaum, R. Eklund, & A. Kamata (Eds.), Handbook of measurement in sport and exercise psychology (pp. 41–52). Champaign, IL: Human Kinetics.
Phelps, R. P. (2012). The effect of testing on student achievement, 1910–2010. International Journal of Testing, 12, 21–43.
Powers, D. E., & Wilson-Leung, S. (1995). Answering the new SAT reading comprehension questions without the passages. Journal of Educational Measurement, 32(2), 105–130.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47, 361–372.
Rijmen, F. (2011). Hierarchical factor item response theory models for PIRLS: Capturing cluster effects at multiple levels. In M. von Davier & D. Hastedt (Eds.), IERI monograph series: Issues and methodologies in large-scale assessments (Vol. 4, pp. 59–74). Hamburg, Germany: IERI.
Rupp, A., Ferne, T., & Choi, H. (2006). How assessing reading comprehension with multiple-choice questions shapes the construct: A cognitive processing perspective. Language Testing, 23, 441–474.
Sabatini, J., Albro, E., & O’Reilly, T. (2012). Measuring up: Advances in how we assess reading ability. Lanham, MD: Rowman & Littlefield Education.
Sabatini, J., Bruce, K., & Steinberg, J. (2013). SARA reading components tests, RISE form (Research report no. RR-13-08). Princeton, NJ: ETS.
Sabatini, J., & O’Reilly, T. (2013). Rationale for a new generation of reading comprehension assessments. In B. Miller, L. Cutting, & P. McCardle (Eds.), Unraveling the behavioral, neurobiological, and genetic components of reading comprehension (pp. 100–111). Baltimore, MD: Brookes Publishing.
Sabatini, J., O’Reilly, T., & Albro, E. (2012). Reaching an understanding: Innovations in how we view reading assessment. Lanham, MD: Rowman & Littlefield Education.
Sabatini, J., O’Reilly, T., & Deane, P. (2013). Preliminary reading literacy assessment framework: Foundation and rationale for assessment and system design (Research Report No. RR-13-30). Princeton, NJ: Educational Testing Service.
Sabatini, J., O’Reilly, T., Halderman, L., & Bruce, K. (2014). Integrating scenario-based and component reading skill measures to understand the reading behavior of struggling readers. Learning Disabilities Research & Practice, 29, 36–43.
Santelices, M. V., & Wilson, M. (2012). On the relationship between differential item functioning and item difficulty: An issue of methods? Item response theory approach to differential item functioning. Educational & Psychological Measurement, 72, 5–36.
Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1997). A splintered vision: An investigation of U.S. science and mathematics education. Dordrecht, The Netherlands: Kluwer.
Shanahan, C., Shanahan, T., & Misischia, C. (2011). Analysis of expert readers in three disciplines: History, mathematics, and chemistry. Journal of Literacy Research, 43, 393–429.
Shanahan, T., & Shanahan, C. (2008). Teaching disciplinary literacy to adolescents: Rethinking content-area literacy. Harvard Educational Review, 78, 40–59.
Shepard, L. A. (2013). Validity for what purpose? Teachers College Record (090307), 115, 1–12.
Siena College Research Institute. (2013). Siena College poll: Divided over Common Core, NYers say too much testing. Loudonville, NY: Author. Retrieved from http://www.siena.edu/uploadedfiles/home/parents_and_community/community_page/sri/sny_poll/SNY%20November%202013%20Poll%20Release%20--%20FINAL.pdf
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107–120.
Snow, R. E., & Lohman, D. F. (1989). Implications of cognitive psychology for educational measurement. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 263–331). New York, NY: Macmillan.
Spector, J. (2013). NY voters: Too much testing in schools. Retrieved from http://www.democratandchronicle.com/story/news/2013/11/18/ny-voters-too-much-testing-in-schools-/3634223/
Stecher, B., & Barron, S. (2001). Unintended consequences of test-based accountability when testing in ‘milepost’ grades. Educational Assessment, 7, 259–281.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Stout, W. F., Douglas, J., Junker, B., & Roussos, L. A. (1993). DIMTEST manual. Unpublished manuscript.
Tate, R. (2002). Test dimensionality. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation. Mahwah, NJ: Erlbaum.
Thompson, N. A., & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research, and Evaluation, 16. Retrieved from http://pareonline.net/pdf/v16n1.pdf
Tong, Y., & Kolen, M. J. (2010, May). IRT proficiency estimators and their impact. Paper presented at the annual conference of the National Council of Measurement in Education, Denver, CO.
U.S. Department of Education. (2009). Race to the top executive summary. Washington, DC: Author. Retrieved from http://www2.ed.gov/programs/racetothetop/executive-summary.pdf
van den Broek, P., Lorch, R. F., Jr., Linderholm, T., & Gustafson, M. (2001). The effects of readers’ goals on inference generation and memory for texts. Memory & Cognition, 29, 1081–1087.
van der Linden, W. J., & Glas, C. A. W. (Eds.). (2010). Elements of adaptive testing. New York, NY: Springer.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.
Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., & Mislevy, R. J. (2000). Computerized adaptive testing: A primer. New York, NY: Routledge.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 473–492.
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.
Wise, S. L., & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135–155.
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.
Yovanoff, P., & Tindal, G. (2007). Scaling early reading alternate assessments with statewide measures. Exceptional Children, 73, 184–201.
Acknowledgements
The research reported here was supported in part by the Institute of Education Sciences (IES), U.S. Department of Education, through Grant R305F100005 to the Educational Testing Service (ETS) as part of the Reading for Understanding Research (RfU) Initiative. The opinions expressed are those of the authors and do not represent views of the Institute, the U.S. Department of Education, or ETS. We are extremely grateful to the IES and ETS for sponsoring and supporting this research. We would also like to thank Matthias von Davier, Michael Kane, and Don Powers for their intellectual insights and thoughtful comments, and Jennifer Lentini and Kim Fryer for their editorial assistance.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Sabatini, J., Petscher, Y., O’Reilly, T., Truckenmiller, A. (2015). Improving Comprehension Assessment for Middle and High School Students: Challenges and Opportunities. In: Santi, K., Reed, D. (eds) Improving Reading Comprehension of Middle and High School Students. Literacy Studies, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-319-14735-2_6
DOI: https://doi.org/10.1007/978-3-319-14735-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14734-5
Online ISBN: 978-3-319-14735-2
eBook Packages: Humanities, Social Sciences and Law; Education (R0)