
Improving the reliability of student scores from speeded assessments: an illustration of conditional item response theory using a computer-administered measure of vocabulary


Abstract

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate individual ability. As tests of passage and list fluency are adapted to computer-administered formats, accounting for individual differences in response times may become an increasingly feasible way to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores estimated under classical test theory and item response theory as compared to a conditional item response model that includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% under the conditional item response model, with greater improvements for students of average or high ability. Implications for measurement models of speeded assessments are discussed.
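For readers unfamiliar with the modeling approach, the sketch below shows one common way response time enters an item response model. This is a hedged illustration in the style of van der Linden's hierarchical speed-accuracy framework, not necessarily the exact conditional specification estimated in the article: accuracy follows a two-parameter logistic (2PL) model, log response times follow a lognormal model governed by a person speed parameter, and ability and speed are correlated, so observed response times act as collateral information that tightens the standard error of the ability estimate.

```latex
% Sketch of a joint response/response-time model (assumption:
% a van der Linden-style hierarchical framework; the article's
% exact conditional specification may differ).
%
% Accuracy: 2PL item response function with discrimination a_i
% and difficulty b_i
P(X_{ij} = 1 \mid \theta_j) =
  \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}
%
% Speed: lognormal response-time model with item time intensity
% \beta_i, item time discrimination \alpha_i, and person speed \tau_j
\ln T_{ij} \sim \mathcal{N}\!\big(\beta_i - \tau_j,\; \alpha_i^{-2}\big)
%
% Persons: ability and speed are drawn from a bivariate normal,
% so observed times carry information about \theta_j
(\theta_j, \tau_j)^{\top} \sim \mathcal{N}_2(\mu_P, \Sigma_P)
%
% Precision: the standard error of the ability estimate shrinks as
% total information grows, which is the mechanism behind the
% reliability gains reported in the abstract
SE(\hat{\theta}_j) = 1 \big/ \sqrt{I(\theta_j)}
```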


Notes

  1. This assumes that the pseudo-guessing parameter has a value of 0.
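As background for this footnote (a standard IRT identity, not text from the article): in the three-parameter logistic model, the pseudo-guessing parameter c_i sets the lower asymptote of the item response function, and fixing c_i = 0 reduces the model to the two-parameter logistic form.

```latex
% 3PL item response function; setting the pseudo-guessing
% parameter c_i to 0 recovers the 2PL model assumed above.
P(X_{ij} = 1 \mid \theta_j) =
  c_i + (1 - c_i)\,
  \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}
```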


Acknowledgments

This research was supported by the Institute of Education Sciences (R305F100005, R305A100301) and the National Institute of Child Health and Human Development (P50HD052120).

Author information


Corresponding author

Correspondence to Yaacov Petscher.


About this article


Cite this article

Petscher, Y., Mitchell, A.M. & Foorman, B.R. Improving the reliability of student scores from speeded assessments: an illustration of conditional item response theory using a computer-administered measure of vocabulary. Read Writ 28, 31–56 (2015). https://doi.org/10.1007/s11145-014-9518-z

