Accurate Assessment via Process Data

  • Theory and Methods
  • Psychometrika

Abstract

Accurate assessment of a student’s ability is the key task of a test. Assessments based on final responses are the standard. As infrastructure advances, substantially more information is observed. One such instance is process data, which are collected by computer-based interactive items and contain a student’s detailed interactive processes. In this paper, we show, both theoretically and with simulated and empirical data, that appropriately including such information in the assessment substantially improves assessment precision.

References

  • AERA, APA, and NCME. (2014). Standards for educational and psychological testing. American Educational Research Association American Psychological Association.

  • Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater V.2. The Journal of Technology, Learning and Assessment, 4(3). Retrieved from https://ejournals.bc.edu/index.php/jtla/article/view/1650

  • Bejar, I. I., Mislevy, R. J., & Zhang, M. (2016). Automated scoring with validity in mind. In A. A. Rupp & J. P. Leighton (Eds.), The Wiley handbook of cognition and assessment (pp. 226–246). https://doi.org/10.1002/9781118956588.ch10

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Addison-Wesley.

  • Blackwell, D. (1947). Conditional expectation and unbiased sequential estimation. The Annals of Mathematical Statistics, 18(1), 105–110.

  • Bolsinova, M., & Tijmstra, J. (2018). Improving precision of ability estimation: Getting more from response times. British Journal of Mathematical and Statistical Psychology, 71(1), 13–38.

  • Casella, G., & Berger, R. L. (2002). Statistical inference (Vol. 2). Duxbury.

  • Clauser, B. E., Harik, P., & Clyman, S. G. (2000). The generalizability of scores for a performance assessment scored with a computer-automated scoring system. Journal of Educational Measurement, 37(3), 245–261.

  • Evanini, K., Heilman, M., Wang, X., & Blanchard, D. (2015). Automated scoring for the TOEFL Junior® comprehensive writing and speaking test. ETS Research Report Series, 2015(1), 1–11.

  • Fife, J. H. (2013). Automated scoring of mathematics tasks in the Common Core era: Enhancements to m-rater in support of CBAL™ mathematics and the Common Core assessments. ETS Research Report Series, 2013(2), i–35.

  • Foltz, P. W., Laham, D., & Landauer, T. K. (1999). Automated essay scoring: Applications to educational technology. In B. Collis & R. Oliver (Eds.), Proceedings of EdMedia + Innovate Learning 1999 (pp. 939–944). Association for the Advancement of Computing in Education (AACE).

  • Frey, A., Spoden, C., Goldhammer, F., & Wenzel, S. F. C. (2018). Response time-based treatment of omitted responses in computer-based testing. Behaviormetrika, 45(2), 505–526.

  • He, Q., Veldkamp, B. P., Glas, C. A., & de Vries, T. (2017). Automated assessment of patients’ self-narratives for posttraumatic stress disorder screening using natural language processing and text mining. Assessment, 24(2), 157–172.

  • He, Q., Veldkamp, B. P., Glas, C. A., & Van Den Berg, S. M. (2019). Combining text mining of long constructed responses and item-based measures: A hybrid test design to screen for posttraumatic stress disorder (PTSD). Frontiers in Psychology, 10, 2358.

  • He, Q., & von Davier, M. (2016). Analyzing process data from problem-solving items with N-grams: Insights from a computer-based large-scale assessment. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 750–777). IGI Global. https://doi.org/10.4018/978-1-4666-9441-5.ch029

  • Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67. https://doi.org/10.1080/00401706.1970.10488634

  • Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1/2), 81–93.

  • Kim, J. K., & Nicewander, W. A. (1993). Ability estimation for conventional tests. Psychometrika, 58(4), 587–599.

  • LaMar, M. M. (2018). Markov decision process measurement model. Psychometrika, 83(1), 67–88.

  • Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses (3rd ed.). Springer.

  • Liu, H., Liu, Y., & Li, M. (2018). Analysis of process data of PISA 2012 computer-based problem solving: Application of the modified multilevel mixture IRT model. Frontiers in Psychology, 9, 1372.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Routledge.

  • Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5–11.

  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. ETS Research Report Series, 1992(1), i–30.

  • OECD. (2012). Literacy, numeracy and problem solving in technology-rich environments: Framework for the OECD Survey of Adult Skills. OECD Publishing.

  • Page, E. B. (1966). The imminence of grading essays by computer. The Phi Delta Kappan, 47(5), 238–243.

  • Qiao, X., & Jiao, H. (2018). Data mining techniques in analyzing process data: A didactic. Frontiers in Psychology, 9, 2231.

  • Rasch, G. (1960). Probabilistic models for some intelligence and achievement tests. Danish Institute for Educational Research.

  • Rose, N., von Davier, M., & Nagengast, B. (2017). Modeling omitted and not-reached items in IRT models. Psychometrika, 82(3), 795–819.

  • Rudner, L. M., Garcia, V., & Welch, C. (2006). An evaluation of IntelliMetric™ essay scoring system. The Journal of Technology, Learning and Assessment, 4(4). Retrieved from https://ejournals.bc.edu/index.php/jtla/article/view/1651

  • Rupp, A. A. (2018). Designing, evaluating, and deploying automated scoring systems with validity in mind: Methodological design decisions. Applied Measurement in Education, 31(3), 191–214.

  • Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.

  • Schleicher, A. (2008). PIAAC: A new strategy for assessing adult competencies. International Review of Education, 54(5–6), 627–650.

  • Tang, X., Wang, Z., Liu, J., & Ying, Z. (2021a). An exploratory analysis of the latent structure of process data via action sequence autoencoders. British Journal of Mathematical and Statistical Psychology, 74(1), 1–33.

  • Tang, X., Zhang, S., Wang, Z., Liu, J., & Ying, Z. (2021b). Procdata: An R package for process data analysis. Psychometrika, 86(4), 1058–1083.

  • Tang, X., Wang, Z., He, Q., Liu, J., & Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85(2), 378–397.

  • Tikhonov, A. N., & Arsenin, V. Y. (1977). Solutions of ill-posed problems (pp. 1–30). New York.

  • Ulitzsch, E., von Davier, M., & Pohl, S. (2020). A hierarchical latent response model for inferences about examinee engagement in terms of guessing and item-level non-response. British Journal of Mathematical and Statistical Psychology, 73, 83–112.

  • van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287.

  • von Davier, M., Sinharay, S., Oranje, A., & Beaton, A. (2006). The statistical procedures used in National Assessment of Educational Progress: Recent developments and future directions. Handbook of Statistics, 26, 1039–1055.

  • Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., & Mislevy, R. J. (2000). Computerized adaptive testing: A primer. Routledge.

  • Xu, H., Fang, G., Chen, Y., Liu, J., & Ying, Z. (2018). Latent class analysis of recurrent events in problem-solving items. Applied Psychological Measurement, 0146621617748325.

  • Zumbo, B. D., & Hubley, A. M. (2017). Understanding and investigating response processes in validation research (Vol 26). Springer.

Acknowledgements

This research was supported in part by NSF Grants SES-1826540, SES-2119938, DMS-2015417 and 1633360. The authors would like to thank Educational Testing Service for providing the data.

Author information

Corresponding author

Correspondence to Jingchen Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs of Theorem 1 and Theorem 2

To prove Theorem 1, we establish the following lemma.

Lemma 1

Let X be a nonconstant random variable, and let \(f(\cdot )\) and \(g(\cdot )\) be strictly increasing functions. Suppose that f(X) and g(X) have finite second moments. Then \({{\,\mathrm{Cov}\,}}\left( f(X), g(X) \right) >0\).

Proof of Lemma 1

Let Y be an independent copy of X, that is, a random variable independent of X with the same distribution. Expanding the product on the right-hand side and using this independence yields the identity

$$\begin{aligned} {{\,\mathrm{Cov}\,}}\left( f(X), g(X) \right) = \frac{1}{2} E \left[ \left( f(X) -f(Y) \right) \left( g(X) - g(Y) \right) \right] . \end{aligned}$$
(10)
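For completeness, identity (10) can be checked by a short expansion. It uses only that X and Y are independent with the same distribution, so that \(E[f(X)g(Y)] = E[f(Y)g(X)] = E[f(X)]E[g(X)]\) and \(E[f(Y)g(Y)] = E[f(X)g(X)]\); the finite second moments guarantee that every term exists:

$$\begin{aligned} E \left[ \left( f(X) -f(Y) \right) \left( g(X) - g(Y) \right) \right]&= E[f(X)g(X)] - E[f(X)g(Y)] - E[f(Y)g(X)] + E[f(Y)g(Y)] \\&= 2 E[f(X)g(X)] - 2 E[f(X)] E[g(X)] \\&= 2 {{\,\mathrm{Cov}\,}}\left( f(X), g(X) \right) . \end{aligned}$$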

For any x and y, \( (f(x) -f(y) ) (g(x) - g(y))\ge 0\), with equality if and only if \(x=y\), because f and g are strictly increasing. Since X is nonconstant, \(P(X\not =Y)>0\), so the right-hand side of equation (10) must be positive. \(\square \)
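As a numerical sanity check of Lemma 1, the sketch below computes both sides of identity (10) exactly for a small discrete distribution. The support points, probabilities, and the choices \(f(x)=x^3\) and \(g(x)=e^x\) are illustrative assumptions, not taken from the paper; any strictly increasing pair would serve.

```python
import math
from itertools import product


def cov_direct(values, probs, f, g):
    """Cov(f(X), g(X)) computed as E[f(X) g(X)] - E[f(X)] E[g(X)]."""
    ef = sum(p * f(x) for x, p in zip(values, probs))
    eg = sum(p * g(x) for x, p in zip(values, probs))
    efg = sum(p * f(x) * g(x) for x, p in zip(values, probs))
    return efg - ef * eg


def cov_via_copy(values, probs, f, g):
    """Right-hand side of identity (10), with Y an independent copy of X."""
    pairs = list(zip(values, probs))
    total = 0.0
    for (x, px), (y, py) in product(pairs, repeat=2):
        # Joint probability of (X, Y) = (x, y) is px * py by independence.
        total += px * py * (f(x) - f(y)) * (g(x) - g(y))
    return 0.5 * total


def f(x):
    return x ** 3  # strictly increasing on the real line


g = math.exp  # strictly increasing

# Hypothetical nonconstant discrete distribution for X.
values = [-1.0, 0.0, 0.5, 2.0]
probs = [0.1, 0.4, 0.3, 0.2]

lhs = cov_direct(values, probs, f, g)
rhs = cov_via_copy(values, probs, f, g)
print(f"direct covariance: {lhs:.6f}, via identity (10): {rhs:.6f}")
```

Both quantities agree up to floating-point error and are positive, as the lemma asserts. Replacing `values` by a single point makes X constant, and both sides then equal zero.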

Proof of Theorem 1

By Assumption A2 (local independence),

$$\begin{aligned} T_{{\mathbf {X}}_{-j}} = E \left[ {{\hat{\theta }}}_{Y_j} | {\mathbf {X}}_{-j}\right] = E \left[ E \left[ {{\hat{\theta }}}_{Y_j} | {\mathbf {X}}_{-j} ,\theta \right] | {\mathbf {X}}_{-j} \right] = E \left[ E \left[ {{\hat{\theta }}}_{Y_j} | \theta \right] | \mathbf{X}_{-j} \right] = E \left[ m_j (\theta ) | {\mathbf {X}}_{-j} \right] . \end{aligned}$$

Due to Assumption A3 (exponential family), the posterior distribution of \(\theta \) given \({\mathbf {X}}_{-j}\) depends on \(\mathbf{X}_{-j}\) only through the sufficient statistic \(T_j({\mathbf {X}}_{-j})\). In fact,

$$\begin{aligned} T_{{\mathbf {X}}_{-j}} = E \left[ m_j (\theta ) | {\mathbf {X}}_{-j} \right] = G_j(T_j({\mathbf {X}}_{-j})), \end{aligned}$$

where \(G_j(t) =E \left[ m_j (\theta ) |T_j({\mathbf {X}}_{-j})=t\right] \). Furthermore, by making use of the exponential family form in Assumption A3 and the simple exchange of order of differentiation and integration, we can show that

$$\begin{aligned} G_j'(t) = {{\,\mathrm{Cov}\,}}\left[ m_j(\theta ),\eta _j(\theta ) | T_j(\mathbf{X}_{-j})=t \right] . \end{aligned}$$

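Since both \(m_j\) and \(\eta _j\) are strictly monotone, Lemma 1, applied to the conditional distribution of \(\theta \) given \(T_j({\mathbf {X}}_{-j})=t\) (with \(m_j\) or \(\eta _j\) replaced by its negative where it is decreasing), implies that \(G_j'(t)\) is strictly positive for all t or strictly negative for all t and, therefore, \(G_j\) is strictly monotone. In other words, there is a one-to-one mapping between \(T_{{\mathbf {X}}_{-j}}\) and \(T_j({\mathbf {X}}_{-j})\). \(\square \)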
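The covariance formula for \(G_j'(t)\) used in the proof of Theorem 1 can be sketched as follows. Assumption A3 is stated in the main text and not reproduced in this appendix, so the sketch assumes one plausible form of it: as a function of \(\theta \), the likelihood of \({\mathbf {X}}_{-j}\) is proportional to \(\exp \{ \eta _j(\theta ) T_j({\mathbf {X}}_{-j}) - A_j(\theta ) \}\). The prior density \(\pi \), the log-normalizer \(A_j\), and the weight \(w(\theta ) = \pi (\theta ) e^{-A_j(\theta )}\) are notation introduced only for this illustration. Under that form,

$$\begin{aligned} G_j(t) = \frac{\int m_j(\theta ) w(\theta ) e^{\eta _j(\theta ) t} \, d\theta }{\int w(\theta ) e^{\eta _j(\theta ) t} \, d\theta } , \end{aligned}$$

and differentiating under the integral sign with the quotient rule gives

$$\begin{aligned} G_j'(t)&= \frac{\int m_j(\theta ) \eta _j(\theta ) w(\theta ) e^{\eta _j(\theta ) t} \, d\theta }{\int w(\theta ) e^{\eta _j(\theta ) t} \, d\theta } - \frac{\int m_j(\theta ) w(\theta ) e^{\eta _j(\theta ) t} \, d\theta }{\int w(\theta ) e^{\eta _j(\theta ) t} \, d\theta } \cdot \frac{\int \eta _j(\theta ) w(\theta ) e^{\eta _j(\theta ) t} \, d\theta }{\int w(\theta ) e^{\eta _j(\theta ) t} \, d\theta } \\&= E \left[ m_j(\theta ) \eta _j(\theta ) | T_j({\mathbf {X}}_{-j})=t \right] - E \left[ m_j(\theta ) | T_j({\mathbf {X}}_{-j})=t \right] E \left[ \eta _j(\theta ) | T_j({\mathbf {X}}_{-j})=t \right] \\&= {{\,\mathrm{Cov}\,}}\left[ m_j(\theta ),\eta _j(\theta ) | T_j({\mathbf {X}}_{-j})=t \right] . \end{aligned}$$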

Proof of Theorem 2

From Theorem 1, we know that \(T_{{\mathbf {X}}_{-j}}\) is a sufficient statistic for \(\theta \) based on \({\mathbf {X}}_{-j}\), for each j. Since \({{\hat{\theta }}}_{{\mathbf {Y}}}\) is a function of \({\mathbf {Y}}\) and \(\sigma ({\mathbf {Y}}_{-j}) \subseteq \sigma ({\mathbf {X}}_{-j})\), the conditional distribution of \({{\hat{\theta }}}_{{\mathbf {Y}}}\) given \((T_{{\mathbf {X}}_{-j}}, Y_j)\) is free of \(\theta \). Therefore, we have \(E[{{\hat{\theta }}}_{{\mathbf {Y}}} | T_{{\mathbf {X}}_{-j}}, Y_j, \theta ] = E[{{\hat{\theta }}}_{{\mathbf {Y}}} | T_{{\mathbf {X}}_{-j}}, Y_j ] = {{\hat{\theta }}}_{{\mathbf {X}}_{-j}}\). It follows from the Rao–Blackwell theorem (Casella & Berger, 2002) that \({{\hat{\theta }}}_{{\mathbf {X}}_{-j}}\) reduces the conditional variance and

$$\begin{aligned} E[({{\hat{\theta }}}_{{\mathbf {X}}_{-j}} - \theta )^2 | \theta ] \le E[({{\hat{\theta }}}_{{\mathbf {Y}}} - \theta )^2 | \theta ] \end{aligned}$$

holds for every j and \(\theta \).
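The Rao–Blackwell step can be illustrated on a toy model. The sketch below is not the paper's setting: it assumes i.i.d. Bernoulli(p) observations, a crude estimator equal to the first observation, and the sum of the observations as the sufficient statistic. The function name `rao_blackwell_mse` and the default of four observations are hypothetical choices made for this illustration. All quantities are computed exactly by enumeration.

```python
from collections import defaultdict
from itertools import product


def rao_blackwell_mse(p, n=4):
    """Exact MSEs of a crude estimator of p and its Rao-Blackwellized version.

    Toy model (an assumption of this sketch): X_1, ..., X_n i.i.d.
    Bernoulli(p), crude estimator X_1, sufficient statistic
    T = X_1 + ... + X_n.
    """
    outcomes = list(product([0, 1], repeat=n))
    prob = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in outcomes}

    # E[X_1 | T = t], obtained by enumeration rather than by formula.
    num = defaultdict(float)
    den = defaultdict(float)
    for x in outcomes:
        num[sum(x)] += prob[x] * x[0]
        den[sum(x)] += prob[x]
    cond_mean = {t: num[t] / den[t] for t in den}

    mse_crude = sum(prob[x] * (x[0] - p) ** 2 for x in outcomes)
    mse_rb = sum(prob[x] * (cond_mean[sum(x)] - p) ** 2 for x in outcomes)
    return mse_crude, mse_rb, cond_mean


results = {p: rao_blackwell_mse(p) for p in (0.2, 0.5, 0.8)}
for p, (mse_crude, mse_rb, _) in results.items():
    print(f"p={p}: crude MSE={mse_crude:.4f}, Rao-Blackwellized MSE={mse_rb:.4f}")
```

In this toy model the conditional mean given T = t works out to t/n for every p, which is what sufficiency means here: the conditioning step needs no knowledge of the parameter. The conditioned estimator's mean squared error is never larger than the crude one's, mirroring the inequality above.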

By the Cauchy–Schwarz inequality, we get

$$\begin{aligned} E[({{\hat{\theta }}}_{{\mathbf {X}}} - \theta )^2 | \theta ] \le E \left[ \frac{1}{J} \sum _{j=1}^J ({{\hat{\theta }}}_{{\mathbf {X}}_{-j}} - \theta )^2 \Big | \theta \right] \le E[({{\hat{\theta }}}_{{\mathbf {Y}}} - \theta )^2 | \theta ] . \end{aligned}$$

\(\square \)
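The first inequality in the last display can be made explicit. The sketch assumes, as the averaging in the display suggests, that \({{\hat{\theta }}}_{{\mathbf {X}}}\) is the average \(\frac{1}{J} \sum _{j=1}^J {{\hat{\theta }}}_{{\mathbf {X}}_{-j}}\); its definition is given in the main text and not in this appendix. Writing \(a_j = {{\hat{\theta }}}_{{\mathbf {X}}_{-j}} - \theta \), a shorthand introduced here, and applying the Cauchy–Schwarz inequality to the vectors \((a_1, \ldots , a_J)\) and \((1, \ldots , 1)\) gives

$$\begin{aligned} ({{\hat{\theta }}}_{{\mathbf {X}}} - \theta )^2 = \left( \frac{1}{J} \sum _{j=1}^J a_j \right) ^2 \le \frac{1}{J^2} \cdot J \sum _{j=1}^J a_j^2 = \frac{1}{J} \sum _{j=1}^J a_j^2 . \end{aligned}$$

Taking conditional expectations given \(\theta \) yields the first inequality. The second follows by averaging the Rao–Blackwell bound for each j over \(j = 1, \ldots , J\).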

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, S., Wang, Z., Qi, J. et al. Accurate Assessment via Process Data. Psychometrika 88, 76–97 (2023). https://doi.org/10.1007/s11336-022-09880-8

  • DOI: https://doi.org/10.1007/s11336-022-09880-8
