Abstract
The recent “Every Student Succeeds Act” encourages schools to use innovative assessments to provide feedback about students’ mastery of grade-level content standards. Mastery of a skill requires the ability to complete the task not only accurately but also fluently. This paper offers new insight into using both response times and response accuracy to measure fluency within the cognitive diagnosis model (CDM) framework. Defining fluency as the highest level of a categorical latent attribute, a polytomous response accuracy model and two forms of response time models are proposed to infer fluency jointly. A Bayesian estimation approach is developed to calibrate the newly proposed models. These models were applied to data collected from a spatial rotation test. Results demonstrate that, compared with a traditional CDM using response accuracy only, the proposed joint models reveal more information regarding test takers’ spatial skills. A set of simulation studies was conducted to evaluate the accuracy of the model estimation algorithm and to illustrate the various degrees of model complexity.
Acknowledgments
This study was funded by the 2019 National Academy of Education/Spencer Postdoctoral Fellowship Program.
Appendices
Appendix I: Gibbs Algorithm for Parameter Updating
The proposed MCMC algorithm is used to sample from the posterior distribution of the model parameters. To do this, we first assigned initial values to all model parameters as follows:
- The initial population membership probabilities \({{\varvec{\pi }}}^{[0]}\) were randomly generated from \(\text{ Dirichlet }(1,1,\ldots ,1)\).
- The initial \({\varvec{\alpha }}^{[0]}\) were randomly sampled from the discrete uniform distribution over all possible patterns.
- The initial \(\phi _i^{[0]}\) was randomly generated from \(\text{ Uniform }(0,1)\).
- For the DINA model item parameters, we randomly generated \(g_j^{[0]},s_{1j}^{[0]},s_{2j}^{[0]}\) for each item from \(\text{ Uniform }(.1,.3)\).
- For the lognormal response time model parameters, we sampled \(a_j^{[0]} \sim \text{ Uniform }(2,4)\) and \(\gamma _j^{[0]}\sim N(3.45, .5^2)\).
- For response time model (2), the initial variance \({\sigma ^2}^{[0]}_\tau \) was generated from \(\text{ Uniform }(1,1.5)\). For each test taker i, \(\tau ^{[0]}_{i}\) was generated from \(N(0,{\sigma ^2}^{[0]}_\tau )\).
- For response time model (3), the initial covariance matrix \({{\varvec{\Sigma }}}_{\tau _0\tau _1}^{[0]}\) was set to the \(2\times 2\) identity matrix. For each test taker i, \((\tau _{0i}^{[0]}, \tau _{1i}^{[0]})\) was randomly sampled from \(MVN(\mathbf {0},{{\varvec{\Sigma }}}_{\tau _0\tau _1}^{[0]})\).
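The initialization above can be sketched in code. This is a minimal illustration, not the authors' implementation: the dimensions `N`, `J`, `K`, and the number of attribute patterns `C` are hypothetical, and attribute patterns are represented by their index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N test takers, J items, K binary attributes,
# C = number of possible attribute patterns.
N, J, K = 500, 20, 4
C = 2 ** K

# pi^[0] ~ Dirichlet(1, ..., 1).
pi = rng.dirichlet(np.ones(C))

# alpha^[0]: discrete uniform over the C possible patterns (index form).
alpha_idx = rng.integers(0, C, size=N)

# phi_i^[0] ~ Uniform(0, 1).
phi = rng.uniform(0.0, 1.0, size=N)

# DINA item parameters g_j, s_1j, s_2j ~ Uniform(.1, .3).
g = rng.uniform(0.1, 0.3, size=J)
s1 = rng.uniform(0.1, 0.3, size=J)
s2 = rng.uniform(0.1, 0.3, size=J)

# Lognormal response time model parameters.
a = rng.uniform(2.0, 4.0, size=J)       # time discrimination a_j
gamma = rng.normal(3.45, 0.5, size=J)   # time intensity gamma_j

# Response time model (2): scalar speed tau_i with variance sigma^2_tau.
sigma2_tau = rng.uniform(1.0, 1.5)
tau = rng.normal(0.0, np.sqrt(sigma2_tau), size=N)

# Response time model (3): bivariate speed (tau_0i, tau_1i) with the
# covariance matrix initialized to the 2x2 identity.
Sigma_tau = np.eye(2)
tau01 = rng.multivariate_normal(np.zeros(2), Sigma_tau, size=N)
```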
The following procedures were then used to update the parameters in the rth iteration of the MCMC chain.

(1) For \(i = 1,\ldots , N\), sample \({\varvec{\alpha }}_{i}^{[r+1]}\) from the multinomial distribution with probabilities \(\tilde{\pi }_{ic}\) given in Table 1.

(2) For response time model (2), for \(i = 1,\ldots , N\), update \(\tau _i^{[r+1]}\) using a Gibbs step based on the conditional distribution specified in Table 1.

(3) For response time model (3), for \(i=1,\ldots ,N\), update \((\tau _{0i}^{[r+1]},\tau _{1i}^{[r+1]})\) based on the conditional distribution specified in Table 1.

(4) Given the specified response time model, for \(i = 1,\ldots , N\), obtain \(\phi _i^{[r+1]}\) based on its conditional distribution in Table 1.

(5) For response time model (2), using the conditional distribution in Table 1 and \(\tau ^{[r+1]}_i\), update \({\sigma ^2}^{[r+1]}_{\tau }\) from the inverse gamma distribution.

(6) For response time model (3), based on the conditional distribution in Table 1 and \(({{\varvec{\tau }}}_0,{{\varvec{\tau }}}_{1})^{[r+1]}\), obtain \({{\varvec{\Sigma }}}^{[r+1]}_{\tau _0\tau _1}\) from the inverse Wishart distribution.

(7) Based on \({\varvec{\alpha }}^{[r+1]},\) update \({{\varvec{\pi }}}^{[r+1]}\) according to the Dirichlet distribution in Table 1.

(8) For \(j = 1,\ldots , J\), sample \(s_{2j}^{[r+1]}\) from the truncated beta distribution in Table 1, based on \({\varvec{\alpha }}^{[r+1]}, \mathbf{Y}, s^{[r]}_{1j}\), and \(g_j^{[r]}\); then sample \(s^{[r+1]}_{1j}\) based on \({\varvec{\alpha }}^{[r+1]}, \mathbf{Y}, g_j^{[r]}\), and \(s_{2j}^{[r+1]}\). Finally, update \(g_j^{[r+1]}\) using the truncated beta distribution based on \({\varvec{\alpha }}^{[r+1]}, \mathbf{Y}, s_{1j}^{[r+1]}\), and \(s_{2j}^{[r+1]}\).

(9) Given the specified response time model, for \(j = 1,\ldots , J\), sample \(a_j^{[r+1]}\) from the inverse gamma distribution given in Table 1, based on \(\varvec{L}, {{\varvec{\tau }}}^{[r+1]} (\text {or } ({{\varvec{\tau }}}_0,{{\varvec{\tau }}}_1)^{[r+1]}),{\varvec{\alpha }}^{[r+1]}, {{\varvec{\phi }}}^{[r]}\), and \(\gamma _j^{[r]}\); then sample \(\gamma _j^{[r+1]}\) from the normal distribution in Table 1, based on \(\varvec{L}, {{\varvec{\tau }}}^{[r+1]}(\text {or } ({{\varvec{\tau }}}_0,{{\varvec{\tau }}}_1)^{[r+1]}),{\varvec{\alpha }}^{[r+1]}, {{\varvec{\phi }}}^{[r]}\), and \(a_j^{[r+1]}\).
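A few of these steps can be illustrated with a short sketch. The code below is a hedged illustration of steps (1), (7), and (5) only, using standard conjugate full conditionals; the person-specific posterior probabilities `tilde_pi` are placeholders standing in for the \(\tilde{\pi }_{ic}\) of Table 1 rather than quantities computed from real data, and the inverse gamma hyperparameters `a0, b0` are assumed values, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions.
N, C = 200, 8

# Step (1): sample each alpha_i from a categorical (multinomial with one
# draw) distribution with person-specific probabilities tilde_pi[i, :].
tilde_pi = rng.dirichlet(np.ones(C), size=N)   # placeholder posteriors
alpha_idx = np.array([rng.choice(C, p=tilde_pi[i]) for i in range(N)])

# Step (7): with a Dirichlet(1, ..., 1) prior on pi, the full conditional
# given alpha is Dirichlet(1 + n_1, ..., 1 + n_C), n_c = pattern counts.
counts = np.bincount(alpha_idx, minlength=C)
pi_new = rng.dirichlet(1.0 + counts)

# Step (5): with an inverse gamma prior IG(a0, b0) on sigma^2_tau, the
# full conditional given tau_1, ..., tau_N is
# IG(a0 + N/2, b0 + sum(tau_i^2)/2).  An IG draw is obtained as the
# reciprocal of a gamma draw with the same shape and rate b.
a0, b0 = 1.0, 1.0                      # assumed hyperparameters
tau = rng.normal(0.0, 1.0, size=N)     # current speed draws (placeholder)
shape = a0 + N / 2.0
rate = b0 + 0.5 * np.sum(tau ** 2)
sigma2_tau_new = 1.0 / rng.gamma(shape, 1.0 / rate)
```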
Appendix II: Supplementary Simulation Results
We report the results for the four models when \(N=500, J=40\) and \(N=1000, J=20\) in this appendix. Tables 9 and 10 document the latent attribute profile classification results and the recovery of \(\phi _i\).
For the recovery of the latent speed parameters, the correlations between the estimated and true \(\tau _{i}\) for models 1 and 2 were always above 0.95, and performance was comparable across all settings. For models 3 and 4, the correlations between the estimated and true \(\tau _{0i}\) were similar to those for models 1 and 2, and the correlations between the estimated and true \(\tau _{1i}\) ranged from 0.552 to 0.933. Model 4 performed better than model 3 across the different settings.
For models 1 and 2, the RMSEs of the estimated \(\sigma _{\tau }\) were below 0.05, and the relative bias was always below 0.08. Both models performed better under the more idealized \(\phi \) condition and were comparable under the two item conditions. For models 3 and 4, the absolute bias of \(\sigma _{00}\) ranged from 0.017 to 0.051 for model 3 and from 0.012 to 0.038 for model 4. The absolute bias of \(\sigma _{01}\) (resp. \(\sigma _{11}\)) ranged from 0.013 to 0.062 (resp. 0.015–0.017) for model 3 and from 0.035 to 0.046 (resp. 0.012–0.016) for model 4. The estimation of \(\sigma _{00}\) and \(\sigma _{11}\) improved when N increased, whereas the estimation of \(\sigma _{01}\) improved when J increased. The response time model parameters \(\gamma _j\) and \(a_j\) were estimated consistently well across all simulation conditions. The correlation between the estimated and true \(a_j\) was greater than 0.98 in all situations, and the correlation between the estimated and true \(\gamma _j\) was always greater than 0.95. For both parameters, 99% of the relative absolute deviations were smaller than 0.2 in all situations.
The RMSEs of the response model item parameters ranged from 0.021 to 0.070 for models 1 and 2 and from 0.021 to 0.042 for models 3 and 4. Given the same \(\phi _i\) generation condition, the estimation of all item parameters was better under the more idealized true item parameter condition, that is, \((g, s_1, s_2)=(0.1, 0.2, 0.1)\). Given the same true item parameter generation condition, the estimation of item parameters was better under the more idealized \(\phi _i\) generation condition. All four models performed better with the larger number of items J despite the smaller sample size N.
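The recovery metrics reported above (RMSE, relative bias, and the correlation between estimated and true parameters) can be computed as follows. This is a generic sketch on simulated stand-in values, not the paper's simulation code: the "true" \(\gamma _j\) values and the noise level of the estimates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical true time-intensity parameters gamma_j and noisy
# estimates, standing in for one simulation replication.
true_gamma = rng.normal(3.45, 0.5, size=100)
est_gamma = true_gamma + rng.normal(0.0, 0.1, size=100)

# Root mean squared error of the estimates.
rmse = np.sqrt(np.mean((est_gamma - true_gamma) ** 2))

# Relative bias: mean estimation error relative to the mean true value.
relative_bias = np.mean(est_gamma - true_gamma) / np.mean(true_gamma)

# Correlation between estimated and true parameters.
corr = np.corrcoef(true_gamma, est_gamma)[0, 1]
```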
Appendix III: Model Fit of Four Simplified Fluency Models (\(\phi _i=\phi \))
See Table 11.
Cite this article
Wang, S., Chen, Y. Using Response Times and Response Accuracy to Measure Fluency Within Cognitive Diagnosis Models. Psychometrika 85, 600–629 (2020). https://doi.org/10.1007/s11336-020-09717-2