Abstract
Accurate assessment of a student’s ability is the key task of a test. Assessments based on final responses are the standard approach. As testing infrastructure advances, substantially more information can be observed. One such instance is process data, which are collected by computer-based interactive items and record a student’s detailed interaction with the item. In this paper, we show theoretically, and with both simulated and empirical data, that appropriately incorporating such information into the assessment substantially improves the precision of ability estimation.
References
AERA, APA, & NCME. (2014). Standards for educational and psychological testing. American Educational Research Association.
Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater V.2. The Journal of Technology, Learning and Assessment, 4(3). Retrieved from https://ejournals.bc.edu/index.php/jtla/article/view/1650
Bejar, I. I., Mislevy, R. J., & Zhang, M. (2016). Automated scoring with validity in mind. In A. A. Rupp & J. P. Leighton (Eds.), The Wiley handbook of cognition and assessment (pp. 226–246). https://doi.org/10.1002/9781118956588.ch10
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Addison-Wesley.
Blackwell, D. (1947). Conditional expectation and unbiased sequential estimation. The Annals of Mathematical Statistics, 18(1), 105–110.
Bolsinova, M., & Tijmstra, J. (2018). Improving precision of ability estimation: Getting more from response times. British Journal of Mathematical and Statistical Psychology, 71(1), 13–38.
Casella, G., & Berger, R. L. (2002). Statistical inference (Vol. 2). Duxbury.
Clauser, B. E., Harik, P., & Clyman, S. G. (2000). The generalizability of scores for a performance assessment scored with a computer-automated scoring system. Journal of Educational Measurement, 37(3), 245–261.
Evanini, K., Heilman, M., Wang, X., & Blanchard, D. (2015). Automated scoring for the TOEFL Junior® comprehensive writing and speaking test. ETS Research Report Series, 2015(1), 1–11.
Fife, J. H. (2013). Automated scoring of mathematics tasks in the common core era: Enhancements to m-rater in support of CBAL\(^{\rm TM}\) mathematics and the Common Core assessments. ETS Research Report Series, 2013(2), i–35.
Foltz, P. W., Laham, D., & Landauer, T. K. (1999). Automated essay scoring: Applications to educational technology. In B. Collis & R. Oliver (Eds.), Proceedings of EdMedia + Innovate Learning 1999 (pp. 939–944). Association for the Advancement of Computing in Education (AACE).
Frey, A., Spoden, C., Goldhammer, F., & Wenzel, S. F. C. (2018). Response time-based treatment of omitted responses in computer-based testing. Behaviormetrika, 45(2), 505–526.
He, Q., Veldkamp, B. P., Glas, C. A., & de Vries, T. (2017). Automated assessment of patients’ self-narratives for posttraumatic stress disorder screening using natural language processing and text mining. Assessment, 24(2), 157–172.
He, Q., Veldkamp, B. P., Glas, C. A., & van den Berg, S. M. (2019). Combining text mining of long constructed responses and item-based measures: A hybrid test design to screen for posttraumatic stress disorder (PTSD). Frontiers in Psychology, 10, 2358.
He, Q., & von Davier, M. (2016). Analyzing process data from problem-solving items with N-grams: Insights from a computer-based large-scale assessment. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 750–777). IGI Global. https://doi.org/10.4018/978-1-4666-9441-5.ch029
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67. https://doi.org/10.1080/00401706.1970.10488634
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1/2), 81–93.
Kim, J. K., & Nicewander, W. A. (1993). Ability estimation for conventional tests. Psychometrika, 58(4), 587–599.
LaMar, M. M. (2018). Markov decision process measurement model. Psychometrika, 83(1), 67–88.
Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses (3rd ed.). Springer.
Liu, H., Liu, Y., & Li, M. (2018). Analysis of process data of PISA 2012 computer-based problem solving: Application of the modified multilevel mixture IRT model. Frontiers in Psychology, 9, 1372.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Routledge.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5–11.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. ETS Research Report Series, 1992(1), i–30.
OECD. (2012). Literacy, numeracy and problem solving in technology-rich environments: Framework for the OECD survey of adult skills. OECD Publishing.
Page, E. B. (1966). The imminence of grading essays by computer. The Phi Delta Kappan, 47(5), 238–243.
Qiao, X., & Jiao, H. (2018). Data mining techniques in analyzing process data: A didactic. Frontiers in Psychology, 9, 2231.
Rasch, G. (1960). Probabilistic models for some intelligence and achievement tests. Danish Institute for Educational Research.
Rose, N., von Davier, M., & Nagengast, B. (2017). Modeling omitted and not-reached items in IRT models. Psychometrika, 82(3), 795–819.
Rudner, L. M., Garcia, V., & Welch, C. (2006). An evaluation of IntelliMetric\(^{\rm TM}\) essay scoring system. The Journal of Technology, Learning and Assessment, 4(4). Retrieved from https://ejournals.bc.edu/index.php/jtla/article/view/1651
Rupp, A. A. (2018). Designing, evaluating, and deploying automated scoring systems with validity in mind: Methodological design decisions. Applied Measurement in Education, 31(3), 191–214.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Schleicher, A. (2008). PIAAC: A new strategy for assessing adult competencies. International Review of Education, 54(5–6), 627–650.
Tang, X., Wang, Z., Liu, J., & Ying, Z. (2021a). An exploratory analysis of the latent structure of process data via action sequence autoencoders. British Journal of Mathematical and Statistical Psychology, 74(1), 1–33.
Tang, X., Zhang, S., Wang, Z., Liu, J., & Ying, Z. (2021b). ProcData: An R package for process data analysis. Psychometrika, 86(4), 1058–1083.
Tang, X., Wang, Z., He, Q., Liu, J., & Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85(2), 378–397.
Tikhonov, A. N., & Arsenin, V. Y. (1977). Solutions of ill-posed problems. V. H. Winston & Sons.
Ulitzsch, E., von Davier, M., & Pohl, S. (2020). A hierarchical latent response model for inferences about examinee engagement in terms of guessing and item-level non-response. British Journal of Mathematical and Statistical Psychology, 73, 83–112.
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287–308.
von Davier, M., Sinharay, S., Oranje, A., & Beaton, A. (2006). The statistical procedures used in National Assessment of Educational Progress: Recent developments and future directions. Handbook of Statistics, 26, 1039–1055.
Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., & Mislevy, R. J. (2000). Computerized adaptive testing: A primer. Routledge.
Xu, H., Fang, G., Chen, Y., Liu, J., & Ying, Z. (2018). Latent class analysis of recurrent events in problem-solving items. Applied Psychological Measurement. https://doi.org/10.1177/0146621617748325
Zumbo, B. D., & Hubley, A. M. (2017). Understanding and investigating response processes in validation research (Vol. 26). Springer.
Acknowledgements
This research was supported in part by NSF Grants SES-1826540, SES-2119938, DMS-2015417 and 1633360. The authors would like to thank Educational Testing Service for providing the data.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Proofs of Theorem 1 and Theorem 2
To prove Theorem 1, we establish the following lemma.
Lemma 1
Let X be a nonconstant random variable, and \(f(\cdot )\) and \(g(\cdot )\) be strictly increasing functions. Suppose that f(X) and g(X) have finite second moments. Then, \({{\,\mathrm{Cov}\,}}\left( f(X), g(X) \right) >0\) .
Proof of lemma 1
Let Y be an independent and identically distributed (i.i.d.) copy of X. It is straightforward to verify the identity
\[ 2{{\,\mathrm{Cov}\,}}\left( f(X), g(X) \right) = E\left[ \left( f(X) - f(Y)\right) \left( g(X) - g(Y)\right) \right] . \]
Clearly, for any x and y, \( (f(x) -f(y) ) (g(x) - g(y))\ge 0\), with equality if and only if \(x=y\), because f and g are strictly increasing. Since X is nonconstant, \(P(X\not =Y)>0\), so the right-hand side of the identity above must be positive. \(\square \)
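Lemma 1 admits a quick Monte Carlo illustration (a sketch only, not part of the formal argument; the transforms \(f=\arctan\) and \(g=\tanh\) and the normal distribution for X are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                 # a nonconstant random variable X
y = rng.normal(size=n)                 # an i.i.d. copy Y of X

f = np.arctan                          # a strictly increasing function f
g = np.tanh                            # a strictly increasing function g

# Lemma 1: Cov(f(X), g(X)) > 0 for strictly increasing f and g.
cov = np.cov(f(x), g(x))[0, 1]
print(cov > 0)                         # True

# The i.i.d.-copy identity behind the proof:
# 2 Cov(f(X), g(X)) = E[(f(X) - f(Y)) (g(X) - g(Y))];
# the two Monte Carlo estimates below should closely agree.
lhs = 2 * cov
rhs = np.mean((f(x) - f(y)) * (g(x) - g(y)))
```

Any other nonconstant distribution for X and any other pair of strictly increasing transforms would serve equally well.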
Proof of Theorem 1
By Assumption A2 (local independence),
Due to Assumption A3 (exponential family), the posterior distribution of \(\theta \) given \({\mathbf {X}}_{-j}\) depends on \(\mathbf{X}_{-j}\) only through the sufficient statistic \(T_j({\mathbf {X}}_{-j})\). In fact,
where \(G_j(t) =E \left[ m_j (\theta ) |T_j({\mathbf {X}}_{-j})=t\right] \). Furthermore, by making use of the exponential family form in Assumption A3 and a standard interchange of the order of differentiation and integration, we can show that
Since both \(m_j\) and \(\eta _j\) are strictly monotone, Lemma 1 implies that \(G_j'(t)\) is strictly positive or negative for all t and, therefore, \(G_j\) is strictly monotone. In other words, there is a one-to-one mapping between \(T_{{\mathbf {X}}_j}\) and \(T_j({\mathbf {X}}_{-j})\). \(\square \)
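The sufficiency argument used above can be checked numerically. The sketch below (an illustration only, assuming a Rasch model with a standard normal prior and hypothetical item difficulties) verifies that the posterior of \(\theta \) depends on the response pattern only through the total score, the sufficient statistic in this exponential family:

```python
import numpy as np

# Hypothetical item difficulties for a 4-item Rasch model.
b = np.array([-1.0, 0.0, 0.5, 1.5])
grid = np.linspace(-4, 4, 801)            # grid approximation of theta
prior = np.exp(-grid ** 2 / 2)            # standard normal prior (unnormalized)

def posterior_mean(pattern):
    # Rasch success probabilities at each grid point (shape: 801 x 4).
    p = 1 / (1 + np.exp(-(grid[:, None] - b)))
    # Likelihood of the observed response pattern.
    lik = np.prod(np.where(pattern, p, 1 - p), axis=1)
    w = prior * lik
    return np.sum(grid * w) / np.sum(w)

# Two different response patterns with the same total score (= 2).
m1 = posterior_mean(np.array([1, 1, 0, 0], dtype=bool))
m2 = posterior_mean(np.array([0, 0, 1, 1], dtype=bool))
print(abs(m1 - m2) < 1e-8)                # True: identical posteriors
```

The pattern-specific factor \(\exp (-\sum _j x_j b_j)\) cancels in the posterior, so any two patterns with equal total score yield the same posterior, exactly as the sufficiency argument requires.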
Proof of Theorem 2
From Theorem 1, we know that \(T_{\mathbf{X}_{-j}}\) is a sufficient statistic of \({\mathbf {X}}_{-j}\) for each j. Since \({{\hat{\theta }}}_{{\mathbf {Y}}}\) is a function of \({\mathbf {Y}}\) and \(\sigma ({\mathbf {Y}}_{-j}) \subseteq \sigma ({\mathbf {X}}_{-j})\), the conditional distribution \({{\hat{\theta }}}_{{\mathbf {Y}}} | T_{\mathbf{X}_{-j}}, Y_j\) is free of \(\theta \). Therefore, we have \(E[{{\hat{\theta }}}_{{\mathbf {Y}}} | T_{{\mathbf {X}}_{-j}}, Y_j, \theta ] = E[{{\hat{\theta }}}_{{\mathbf {Y}}} | T_{{\mathbf {X}}_{-j}}, Y_j ] = \hat{\theta }_{{\mathbf {X}}_{-j}}.\) It follows from the well-known Rao–Blackwell theorem (Casella & Berger, 2002) that \({{\hat{\theta }}}_{{\mathbf {X}}_{-j}}\) reduces the conditional variance and
holds for every j and \(\theta \).
By the Cauchy–Schwarz inequality, we obtain
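The Rao–Blackwell variance reduction invoked in the proof of Theorem 2 can be illustrated with a minimal simulation (a toy Bernoulli example for illustration only, not the latent trait setting of the paper): conditioning a crude unbiased estimator on a sufficient statistic yields an estimator with no larger variance.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 20, 50_000

# reps independent samples of size n from Bernoulli(theta).
x = rng.binomial(1, theta, size=(reps, n))

# Crude unbiased estimator of theta: the first observation alone.
crude = x[:, 0].astype(float)

# Rao-Blackwellized estimator: E[X_1 | sum(X)] = sum(X)/n, the sample mean,
# since the total sum(X) is sufficient for theta.
rb = x.mean(axis=1)

var_crude, var_rb = crude.var(), rb.var()
print(var_crude > var_rb)  # True: conditioning reduces variance
```

Here the variance drops from roughly \(\theta (1-\theta )\) to \(\theta (1-\theta )/n\), a concrete instance of the inequality established by the Rao–Blackwell theorem.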
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Zhang, S., Wang, Z., Qi, J. et al. Accurate Assessment via Process Data. Psychometrika 88, 76–97 (2023). https://doi.org/10.1007/s11336-022-09880-8