
Improving the reliability of student scores from speeded assessments: an illustration of conditional item response theory using a computer-administered measure of vocabulary


Abstract

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate individual ability. As tests of passage and list fluency are adapted to computer-administered formats, accounting for individual differences in response times may become an increasingly feasible way to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores estimated under classical test theory and item response theory as compared to a conditional item response model that includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% under the conditional item response model, with greater improvements for students of average or high ability. Implications for measurement models of speeded assessments are discussed.
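For readers unfamiliar with the modeling approach, the sketch below shows one common way response time enters an item response model. This is a hedged illustration in the style of van der Linden's hierarchical speed-accuracy framework, not necessarily the exact conditional specification estimated in the article: accuracy follows a two-parameter logistic (2PL) model, log response times follow a lognormal model governed by a person speed parameter, and ability and speed are correlated, so observed response times act as collateral information that tightens the standard error of the ability estimate.

```latex
% Sketch of a joint response/response-time model (assumption:
% a van der Linden-style hierarchical framework; the article's
% exact conditional specification may differ).
%
% Accuracy: 2PL item response function with discrimination a_i
% and difficulty b_i
P(X_{ij} = 1 \mid \theta_j) =
  \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}
%
% Speed: lognormal response-time model with item time intensity
% \beta_i, item time discrimination \alpha_i, and person speed \tau_j
\ln T_{ij} \sim \mathcal{N}\!\big(\beta_i - \tau_j,\; \alpha_i^{-2}\big)
%
% Persons: ability and speed are drawn from a bivariate normal,
% so observed times carry information about \theta_j
(\theta_j, \tau_j)^{\top} \sim \mathcal{N}_2(\mu_P, \Sigma_P)
%
% Precision: the standard error of the ability estimate shrinks as
% total information grows, which is the mechanism behind the
% reliability gains reported in the abstract
SE(\hat{\theta}_j) = 1 \big/ \sqrt{I(\theta_j)}
```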


Notes

  1. This assumes that the pseudo-guessing parameter has a value of 0.
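As background for this footnote (a standard IRT identity, not text from the article): in the three-parameter logistic model, the pseudo-guessing parameter c_i sets the lower asymptote of the item response function, and fixing c_i = 0 reduces the model to the two-parameter logistic form.

```latex
% 3PL item response function; setting the pseudo-guessing
% parameter c_i to 0 recovers the 2PL model assumed above.
P(X_{ij} = 1 \mid \theta_j) =
  c_i + (1 - c_i)\,
  \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}
```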


Acknowledgments

This research was supported by the Institute of Education Sciences (R305F100005, R305A100301) and the National Institute of Child Health and Human Development (P50HD052120).

Author information


Corresponding author

Correspondence to Yaacov Petscher.


About this article


Cite this article

Petscher, Y., Mitchell, A.M. & Foorman, B.R. Improving the reliability of student scores from speeded assessments: an illustration of conditional item response theory using a computer-administered measure of vocabulary. Read Writ 28, 31–56 (2015). https://doi.org/10.1007/s11145-014-9518-z

