Educational Psychology Review, Volume 26, Issue 3, pp 403–424

Designing Reading Comprehension Assessments for Reading Interventions: How a Theoretically Motivated Assessment Can Serve as an Outcome Measure

  • Tenaha O’Reilly
  • Jonathan Weeks
  • John Sabatini
  • Laura Halderman
  • Jonathan Steinberg
Research into Practice


When designing a reading intervention, researchers and educators face a number of challenges related to the focus, intensity, and duration of the intervention. In this paper, we argue there is another fundamental challenge—the nature of the reading outcome measures used to evaluate the intervention. Many interventions fail to demonstrate significant improvements on standardized measures of reading comprehension. Although a number of factors may explain this phenomenon, an important one to consider is misalignment between the nature of the outcome assessment and the targets of the intervention. In this study, we present data on three theoretically driven summative reading assessments that were developed in consultation with a research and evaluation team conducting an intervention study. The reading intervention, Reading Apprenticeship, involved instructing teachers to use disciplinary strategies in three domains: literature, history, and science. Factor analyses and other psychometric analyses on data from over 12,000 high school students revealed that the assessments had adequate reliability, moderate correlations with state reading test scores and measures of background knowledge, a large general reading factor, and some preliminary evidence for separate, smaller factors specific to each form. In this paper, we describe the empirical work that motivated the assessments, the aims of the intervention, and the process used to develop the new assessments. Implications for intervention and assessment are discussed.


Keywords: Reading comprehension · Assessment · Outcome measures · Intervention · Disciplinary literacy



The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through grant R305F100005 to Educational Testing Service as part of the Reading for Understanding Research Initiative, in partnership with WestEd, IMPAQ International, and Empirical Education, Inc. The opinions expressed are those of the authors and do not represent the views of the Institute of Education Sciences, the U.S. Department of Education, or Educational Testing Service. We would also like to thank Cynthia Greenleaf and Ruth Schoenbach of WestEd, and Cheri Fancsali and the team at IMPAQ, for their partnership and support with school sample recruitment in this study; our Cognitively Based Assessment of, for, and as Learning (CBAL™) Initiative partners; the NAEP team for providing access to and use of released items; Kelly Bruce for technical support; Jennifer Lentini and Kim Fryer for editorial assistance; and Paul Deane, Jim Carlson, Shelby Haberman, Matthias von Davier, and anonymous reviewers for their thoughtful reviews and helpful comments.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Tenaha O’Reilly (1)
  • Jonathan Weeks (1)
  • John Sabatini (1)
  • Laura Halderman (1)
  • Jonathan Steinberg (1)

  1. Educational Testing Service, Research & Development, Princeton, NJ, USA
