Abstract
Response process data from computer-based problem-solving items describe respondents’ problem-solving processes as sequences of actions. Such data provide a valuable source for understanding respondents’ problem-solving behaviors. Recently, data-driven feature extraction methods have been developed to compress the information in unstructured process data into relatively low-dimensional features. Although the extracted features can be used as covariates in regression or other models to understand respondents’ response behaviors, the results are often hard to interpret because the relationship between the extracted features and the original response process is often not explicitly defined. In this paper, we propose a statistical model for describing response processes and how they vary across respondents. The proposed model assumes that a response process follows a hidden Markov model given the respondent’s latent traits. The structure of hidden Markov models resembles problem-solving processes, with the hidden states interpreted as problem-solving subtasks or stages. Incorporating the latent traits in hidden Markov models enables us to characterize the heterogeneity of response processes across respondents in a parsimonious and interpretable way. We demonstrate the performance of the proposed model through simulation experiments and case studies of PISA process data.
Data Availability
The dataset analyzed in the current study is available at https://www.oecd.org/pisa/pisaproducts/database-cbapisa2012.htm.
References
Binkley, M., Erstad, O., Herman, J., Raizen, S., Ripley, M., Miller-Ricci, M., & Rumble, M. (2012). Defining twenty-first century skills. In Assessment and teaching of 21st century skills (pp. 17–66). Springer.
Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6(1), 76–90.
Cappé, O., Moulines, E., & Ryden, T. (2005). Inference in hidden Markov models. Springer. https://doi.org/10.1007/0-387-28982-8
Chen, Y. (2020). A continuous-time dynamic choice measurement model for problem-solving process data. Psychometrika, 85(4), 1052–1075.
Chen, Y., Li, X., Liu, J., & Ying, Z. (2019a). Statistical analysis of complex problem-solving process data: An event history analysis approach. Frontiers in Psychology, 10, 486.
Chen, Y., Li, X., & Zhang, S. (2019b). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84(1), 124–146.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory (2nd ed.). Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B, 39(1), 1–22.
Eddelbuettel, D., & François, R. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40, 1–18.
Eichmann, B., Greiff, S., Naumann, J., Brandhuber, L., & Goldhammer, F. (2020). Exploring behavioural patterns during complex problem-solving. Journal of Computer Assisted Learning, 36(6), 933–956.
Fletcher, R. (1970). A new approach to variable metric algorithms. The Computer Journal, 13(3), 317–322.
Giner, G., Chen, L., Hu, Y., Dunn, P., Phipson, B., & Chen, Y. (2023). statmod: Statistical modeling [Computer software manual]. Retrieved from https://cran.r-project.org/package=statmod
Goldfarb, D. (1970). A family of variable-metric methods derived by variational means. Mathematics of Computation, 24(109), 23–26.
Greiff, S., Niepel, C., Scherer, R., & Martin, R. (2016). Understanding students’ performance in a computer-based assessment of complex problem solving: An analysis of behavioral data from computer-generated log files. Computers in Human Behavior, 61, 36–46. https://doi.org/10.1016/j.chb.2016.02.095
Greiff, S., Wüstenberg, S., & Avvisati, F. (2015). Computer-generated log-file analyses as a window into students’ minds? A showcase study based on the PISA 2012 assessment of problem solving. Computers & Education, 91, 92–105.
Han, Y., Liu, H., & Ji, F. (2021). A sequential response model for analyzing process data on technology-based problem-solving tasks. Multivariate Behavioral Research, 57, 960.
He, Q., & von Davier, M. (2016). Analyzing process data from problem-solving items with n-grams: Insights from a computer-based large-scale assessment. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 749–776). Information Science Reference. https://doi.org/10.4018/978-1-4666-9441-5.ch029
He, Q., Liao, D., & Jiao, H. (2019). Clustering behavioral patterns using process data in PIAAC problem-solving items. In Theoretical and practical advances in computer-based educational measurement (pp. 189–212). Springer.
Herborn, K., Mustafić, M., & Greiff, S. (2017). Mapping an experiment-based assessment of collaborative behavior onto collaborative problem solving in PISA 2015: A cluster analysis approach for collaborator profiles. Journal of Educational Measurement, 54(1), 103–122.
Liang, K., Tu, D., & Cai, Y. (2022). Using process data to improve classification accuracy of cognitive diagnosis model. Multivariate Behavioral Research. https://doi.org/10.1080/00273171.2022.2157788
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Routledge.
McCullagh, P., & Nelder, J. (2018). Generalized linear models. Routledge.
OECD. (2014). PISA 2012 results: Creative problem solving: Students’ skills in tackling real-life problems (Vol. 5). OECD Publishing. https://doi.org/10.1787/9789264208070-en
R Core Team. (2023). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
Rabiner, L., & Juang, B. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1), 4–16.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Shanno, D. F. (1970). Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, 24(111), 647–656.
Stadler, M., Fischer, F., & Greiff, S. (2019). Taking a closer look: An exploratory analysis of successful and unsuccessful strategy use in complex problems. Frontiers in Psychology, 10, 777.
Tang, X., Wang, Z., He, Q., Liu, J., & Ying, Z. (2020). Automatic feature construction for process data using multidimensional scaling. Psychometrika, 85, 378–397.
Tang, X., Wang, Z., Liu, J., & Ying, Z. (2021). An exploration of process data by action sequence autoencoder. British Journal of Mathematical and Statistical Psychology, 74, 1–33.
Ulitzsch, E., He, Q., & Pohl, S. (2022a). Using sequence mining techniques for understanding incorrect behavioral patterns on interactive tasks. Journal of Educational and Behavioral Statistics, 47(1), 3–35.
Ulitzsch, E., Ulitzsch, V., He, Q., & Lüdtke, O. (2022b). A machine learning-based procedure for leveraging clickstream data to investigate early predictability of failure on interactive tasks. Behavior Research Methods, 55, 1392.
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269.
von Davier, M., Khorramdel, L., He, Q., Shin, H. J., & Chen, H. (2019). Developments in psychometric population models for technology-based large-scale assessments: An overview of challenges and opportunities. Journal of Educational and Behavioral Statistics, 44(6), 671–705.
Wang, Z., Tang, X., Liu, J., & Ying, Z. (2022). Subtask analysis of process data through a predictive model. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12290
Xiao, Y., He, Q., Veldkamp, B., & Liu, H. (2021). Exploring latent states of problem-solving competence using hidden Markov model on process data. Journal of Computer Assisted Learning, 37(5), 1232–1247.
Xu, H., Fang, G., & Ying, Z. (2020). A latent topic model with Markov transition for process data. British Journal of Mathematical and Statistical Psychology, 73(3), 474–505.
Zhan, P., & Qiao, X. (2022). Diagnostic classification analysis of problem-solving competence using process data: An item expansion method. Psychometrika, 87, 1529.
Zhang, S., Wang, Z., Qi, J., Liu, J., & Ying, Z. (2023). Accurate assessment via process data. Psychometrika, 88, 76–97. https://doi.org/10.1007/s11336-022-09880-8
Ethics declarations
Conflict of interest
The author has no conflicts of interest to declare that are relevant to the content of this article.
Additional information
This research was funded by National Science Foundation Grant DMS-2310664.
Appendices
Appendix A LHMM Likelihood Computation
The likelihood for a set of response processes \({\mathcal {Y}}_n\) following an LHMM is
$$\begin{aligned} L\left( \varvec{\eta }\mid {\mathcal {Y}}_n\right) = \prod _{i = 1}^n L_i\left( \varvec{\eta }\mid \varvec{y}^{(i)}\right) . \end{aligned}$$
We demonstrate here how to compute \(L_i\left( \varvec{\eta }\mid \varvec{y}^{(i)}\right) = P\left( \varvec{Y}^{(i)} = \varvec{y}^{(i)} \mid \varvec{\eta }\right) \). For notational simplicity, the superscripts and subscripts denoting different respondents are suppressed hereafter. We first explain how to compute \(f(\varvec{\eta }, \theta ) = P(\varvec{Y} = \varvec{y} \mid \varvec{\eta }, \theta )\) given \((\varvec{\eta }, \theta )\) and then how to numerically integrate \(\phi (\theta )f(\varvec{\eta }, \theta )\) with respect to \(\theta \) to obtain \(L(\varvec{\eta }\mid \varvec{y})\).
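For \(k = 1, \ldots , K\) and \(t = 1, \ldots , T\), define the forward probability
$$\begin{aligned} \alpha _t(k \mid \theta ) = P(Y_1 = y_1, \ldots , Y_t = y_t, S_t = k \mid \varvec{\eta }, \theta ), \end{aligned} \tag{A1}$$
where \(S_t\) denotes the hidden state at step \(t\).
Given \(\varvec{\eta }\) and \(\theta \), we can obtain \(f(\varvec{\eta }, \theta )\) from the forward probabilities \(\alpha _T(k \mid \theta )\) since \(f(\varvec{\eta }, \theta ) = \sum _{k = 1}^K \alpha _T(k \mid \theta )\). According to HMM assumptions (1–4), it is easy to verify \(\alpha _1(k \mid \theta ) = \pi _k (\theta ) q_{k, y_1}(\theta )\) and, for \(t = 2, \ldots , T\),
$$\begin{aligned} \alpha _t(k \mid \theta ) = \sum _{l = 1}^K \alpha _{t-1}(l \mid \theta ) p_{lk}(\theta ) q_{k, y_t}(\theta ), \end{aligned} \tag{A2}$$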
where \(\pi _k(\theta )\), \(p_{kl}(\theta )\), and \(q_{kj}(\theta )\) are defined in (5–7). Therefore, \(\alpha _T(k\mid \theta )\) can be computed by first calculating \(\alpha _1(k \mid \theta )\) and then applying (A2) recursively.
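Besides the forward probabilities, one can also define the backward probability of the remaining actions given the hidden state \(S_t\) at step \(t\),
$$\begin{aligned} \beta _t(k \mid \theta ) = P(Y_{t+1} = y_{t+1}, \ldots , Y_T = y_T \mid S_t = k, \varvec{\eta }, \theta ). \end{aligned} \tag{A3}$$
Setting \(\beta _T(k \mid \theta ) = 1\), we have the recursive relation, for \(t = 1, \ldots , T-1\),
$$\begin{aligned} \beta _t(k \mid \theta ) = \sum _{l = 1}^K p_{kl}(\theta ) q_{l, y_{t+1}}(\theta ) \beta _{t+1}(l \mid \theta ). \end{aligned} \tag{A4}$$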
Although computing \(f(\varvec{\eta }, \theta )\) does not require the backward probabilities, we still compute them when evaluating the likelihood because they, together with the forward probabilities, are essential components for computing the derivatives of the likelihood function. See Appendix B for details.
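As a rough illustration of the two recursions for a single fixed value of \(\theta \), the following Python sketch computes both sets of probabilities for a toy model. The function name `forward_backward`, the array layout, and the numerical values are assumptions made for this example and are not taken from the paper. No rescaling is applied, so for long sequences the probabilities may underflow and a scaled or log-space variant would be needed.

```python
import numpy as np

def forward_backward(pi, P, Q, y):
    """Forward and backward probabilities at one fixed theta.

    pi: (K,) initial state probabilities
    P:  (K, K) transition matrix, P[k, l] = p_{kl}
    Q:  (K, J) state-action probabilities, Q[k, j] = q_{kj}
    y:  list of observed action indices in 0..J-1
    """
    T, K = len(y), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))                      # last row stays 1: beta_T(k) = 1
    alpha[0] = pi * Q[:, y[0]]                  # alpha_1(k) = pi_k q_{k, y_1}
    for t in range(1, T):                       # forward recursion
        alpha[t] = (alpha[t - 1] @ P) * Q[:, y[t]]
    for t in range(T - 2, -1, -1):              # backward recursion
        beta[t] = P @ (Q[:, y[t + 1]] * beta[t + 1])
    return alpha, beta

# Toy values (hypothetical): K = 2 states, J = 3 actions.
pi = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3], [0.2, 0.8]])
Q = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
y = [0, 1, 2, 2]

alpha, beta = forward_backward(pi, P, Q, y)
f = alpha[-1].sum()                             # f = sum_k alpha_T(k)
products = (alpha * beta).sum(axis=1)           # sum_k alpha_t(k) beta_t(k), one value per t
```

A useful consistency check is that \(\sum _k \alpha _t(k)\beta _t(k)\) equals \(f\) at every step \(t\), so every entry of `products` should match `f`.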
Given that \(f(\varvec{\eta }, \theta )\) is computable, we can approximate
$$\begin{aligned} L(\varvec{\eta }\mid \varvec{y}) = \int \phi (\theta ) f(\varvec{\eta }, \theta ) \, d\theta \end{aligned}$$
using Gaussian–Hermite quadrature by \(\frac{1}{\sqrt{\pi }} \sum _{u = 1}^U w_u f(\varvec{\eta }, \sqrt{2}x_u)\), where \(\phi \) is the standard normal density, \(x_1, \ldots , x_U\) are U quadrature points, and \(w_1, \ldots , w_U\) are the associated weights. The quadrature points and the corresponding weights for a given U can be computed based on the Hermite polynomials. We use the function gauss.quad in the R package statmod for this purpose.
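The quadrature rule can be tried out on functions whose normal expectations are known in closed form. The sketch below uses NumPy's `numpy.polynomial.hermite.hermgauss` in place of statmod's gauss.quad (a substitution made for this illustration); the helper name `normal_expectation` and the choice of 21 nodes are assumptions. The factor \(\sqrt{2}\) and the division by \(\sqrt{\pi }\) come from the change of variable \(\theta = \sqrt{2}x\), which turns the standard normal density into the weight \(e^{-x^2}\) used by the Hermite rule.

```python
import numpy as np

def normal_expectation(g, U=21):
    """Approximate E[g(theta)] for theta ~ N(0, 1) with U Gauss-Hermite nodes."""
    # Nodes and weights for the weight function exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(U)
    return np.sum(w * g(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

# Two checks with known answers: E[theta^2] = 1 and E[exp(theta)] = exp(1/2).
second_moment = normal_expectation(lambda th: th ** 2)
mgf_at_one = normal_expectation(np.exp)
```

In the likelihood computation, `g` would be the map \(\theta \mapsto f(\varvec{\eta }, \theta )\) for a fixed \(\varvec{\eta }\).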
The algorithm for computing the likelihood function for LHMM is summarized in Algorithm 1.
Algorithm 1
(LHMM likelihood computation) The likelihood function \(L(\varvec{\eta }\mid \varvec{y})\) for a response process \(\varvec{y}\) following LHMM is computed in the following steps.
1. Obtain Gaussian–Hermite quadrature points \(x_1, \ldots , x_U\) and the associated weights \(w_1, \ldots , w_U\).
2. For \(u = 1, \ldots , U\), compute \(f(\varvec{\eta }, \sqrt{2}x_u)\) as follows.
   (a) Compute \(\alpha _1(k \mid \sqrt{2}x_u) = \pi _k(\sqrt{2}x_u) q_{k, y_1}(\sqrt{2}x_u)\) and set \(\beta _T(k \mid \sqrt{2}x_u) = 1\) for \(k = 1, \ldots , K\).
   (b) For \(t = 2, \ldots , T\) and \(k = 1, \ldots , K\), compute
       $$\begin{aligned} \alpha _t(k \mid \sqrt{2}x_u) = \sum _{l = 1}^K \alpha _{t-1} (l \mid \sqrt{2}x_u) p_{lk}(\sqrt{2}x_u)q_{k, y_t}(\sqrt{2}x_u) \end{aligned}$$
       and
       $$\begin{aligned} \beta _{T-t+1}(k \mid \sqrt{2} x_u) = \sum _{l = 1}^K p_{kl}(\sqrt{2}x_u) q_{l,y_{T-t+2}}(\sqrt{2}x_u) \beta _{T-t+2}(l \mid \sqrt{2}x_u). \end{aligned}$$
   (c) Compute \(f(\varvec{\eta }, \sqrt{2}x_u) = \sum _{k = 1}^K \alpha _T(k\mid \sqrt{2}x_u)\).
3. Compute \(L(\varvec{\eta }\mid \varvec{y}) = \frac{1}{\sqrt{\pi }}\sum _{u=1}^U w_u f(\varvec{\eta }, \sqrt{2}x_u)\).
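The steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's implementation: the dependence of \(\pi _k\), \(p_{kl}\), and \(q_{kj}\) on \(\theta \) in (5–7) is not reproduced in this section, so the hypothetical `params_at` uses a softmax of intercept plus slope times \(\theta \) as a stand-in, and the dictionary keys are invented for the example. Only the forward pass is computed because the backward probabilities are not needed for the likelihood value itself, and NumPy's `hermgauss` replaces statmod's gauss.quad.

```python
import itertools
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def params_at(theta, eta):
    """Hypothetical theta-dependent parameters (stand-in for the paper's (5-7))."""
    pi = softmax(eta["pi_int"] + eta["pi_slope"] * theta)
    P = softmax(eta["P_int"] + eta["P_slope"] * theta)    # each row sums to 1
    Q = softmax(eta["Q_int"] + eta["Q_slope"] * theta)    # each row sums to 1
    return pi, P, Q

def conditional_likelihood(y, theta, eta):
    """f(eta, theta) = P(Y = y | eta, theta) via the forward recursion (step 2)."""
    pi, P, Q = params_at(theta, eta)
    alpha = pi * Q[:, y[0]]
    for action in y[1:]:
        alpha = (alpha @ P) * Q[:, action]
    return alpha.sum()

def marginal_likelihood(y, eta, U=21):
    """L(eta | y): steps 1 and 3, integrating over theta ~ N(0, 1)."""
    x, w = np.polynomial.hermite.hermgauss(U)
    f = np.array([conditional_likelihood(y, np.sqrt(2.0) * xu, eta) for xu in x])
    return np.sum(w * f) / np.sqrt(np.pi)

# Toy parameters (hypothetical): K = 2 states, J = 3 actions, length T = 4.
rng = np.random.default_rng(0)
K, J, T = 2, 3, 4
eta = {
    "pi_int": rng.normal(size=K), "pi_slope": rng.normal(size=K),
    "P_int": rng.normal(size=(K, K)), "P_slope": rng.normal(size=(K, K)),
    "Q_int": rng.normal(size=(K, J)), "Q_slope": rng.normal(size=(K, J)),
}

# Sanity check: in this toy setup the likelihoods of all J**T sequences sum to one.
total = sum(marginal_likelihood(list(seq), eta)
            for seq in itertools.product(range(J), repeat=T))
```

The sum-to-one check holds here only because the toy model treats every sequence as having the same fixed length; it is meant as a test of the code, not as a statement about the item data.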
Appendix B Gradient of LHMM Log-Likelihood Function
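For a given element \(\eta \) in \(\varvec{\eta }\),
$$\begin{aligned} \frac{\partial \log L(\varvec{\eta }\mid {\mathcal {Y}}_n)}{\partial \eta } = \sum _{i = 1}^n \frac{1}{L_i(\varvec{\eta }\mid \varvec{y}^{(i)})} \frac{\partial L_i(\varvec{\eta }\mid \varvec{y}^{(i)})}{\partial \eta }. \end{aligned}$$
The algorithm for calculating \(L_i(\varvec{\eta }\mid \varvec{y}^{(i)})\) is presented in Appendix A. We explain here how to compute \(\frac{\partial L_i(\varvec{\eta }\mid \varvec{y}^{(i)})}{\partial \eta }\). The superscripts and subscripts denoting different respondents are suppressed hereafter to simplify notation. Let \(f(\varvec{\eta }, \theta ) = P(\varvec{Y} = \varvec{y} \mid \varvec{\eta },\theta )\). Then
$$\begin{aligned} \frac{\partial L(\varvec{\eta }\mid \varvec{y})}{\partial \eta } = \int \phi (\theta ) \frac{\partial f(\varvec{\eta }, \theta )}{\partial \eta } \, d\theta . \end{aligned} \tag{A5}$$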
If \(\frac{\partial f(\varvec{\eta }, \theta )}{\partial \eta }\) is computable given \((\varvec{\eta }, \theta )\), then the integral on the right-hand side of (A5) can be approximated using Gaussian–Hermite quadrature in the same way as for the likelihood function. In the remainder of this appendix, we focus on deriving \(\frac{\partial f(\varvec{\eta }, \theta )}{\partial \eta }\). In the following calculations, the initial state probabilities \(\pi _k\), the state transition probabilities \(p_{kl}\), and the state-action probabilities \(q_{kj}\) all depend on \(\theta \) as defined in (5–7). To simplify notation, we do not explicitly write them as functions of \(\theta \).
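First, consider the derivatives of f with respect to \(\pi _k\), \(p_{kl}\), and \(q_{kj}\). Define \(\varvec{\alpha }_t = (\alpha _t(1), \ldots , \alpha _t(K))^\top \) and \(\varvec{\beta }_t = (\beta _t(1), \ldots , \beta _t(K))^\top \) where \(\alpha _t(k)\) and \(\beta _t(k)\) are the forward and backward probabilities defined in (A1) and (A3), respectively. Then, the relationships in (A2) and (A4) can be expressed compactly as
$$\begin{aligned} \varvec{\alpha }_t^\top = \varvec{\alpha }_{t-1}^\top \varvec{P} \tilde{\varvec{Q}}_t \quad \text {and} \quad \varvec{\beta }_t = \varvec{P} \tilde{\varvec{Q}}_{t+1} \varvec{\beta }_{t+1}, \end{aligned}$$
where \(\varvec{P}\) is the state transition probability matrix and \(\tilde{\varvec{Q}}_t = {{\,\textrm{diag}\,}}\{q_{1, y_t}, \ldots , q_{K, y_t}\}\). Recursively applying the above relationship, we get
$$\begin{aligned} f = \varvec{\alpha }_T^\top \varvec{1} = \varvec{\pi }^\top \tilde{\varvec{Q}}_1 \varvec{P} \tilde{\varvec{Q}}_2 \cdots \varvec{P} \tilde{\varvec{Q}}_T \varvec{1}, \end{aligned}$$
where \(\varvec{1}\) is a column vector of K ones. Let x denote a generic element of \(\varvec{\pi }\), \(\varvec{P}\) or \(\varvec{Q}\). Then, by the product rule,
$$\begin{aligned} \frac{\partial f}{\partial x} = \left( \frac{\partial \varvec{\pi }}{\partial x}\right) ^\top \tilde{\varvec{Q}}_1 \varvec{\beta }_1 + \varvec{\pi }^\top \frac{\partial \tilde{\varvec{Q}}_1}{\partial x} \varvec{\beta }_1 + \sum _{t = 2}^T \varvec{\alpha }_{t-1}^\top \frac{\partial \varvec{P}}{\partial x} \tilde{\varvec{Q}}_t \varvec{\beta }_t + \sum _{t = 2}^T \varvec{\alpha }_{t-1}^\top \varvec{P} \frac{\partial \tilde{\varvec{Q}}_t}{\partial x} \varvec{\beta }_t. \end{aligned}$$
Replacing x with \(\pi _k\), \(p_{kl}\), and \(q_{kj}\) and simplifying the expression, we obtain
$$\begin{aligned} \frac{\partial f}{\partial \pi _k}&= q_{k, y_1} \beta _1(k), \\ \frac{\partial f}{\partial p_{kl}}&= \sum _{t = 2}^T \alpha _{t-1}(k) q_{l, y_t} \beta _t(l), \\ \frac{\partial f}{\partial q_{kj}}&= \mathbb {1}\{y_1 = j\} \pi _k \beta _1(k) + \sum _{t = 2}^T \mathbb {1}\{y_t = j\} \sum _{l = 1}^K \alpha _{t-1}(l) p_{lk} \beta _t(k). \end{aligned} \tag{A6}$$
According to the chain rule,
$$\begin{aligned} \frac{\partial f}{\partial \eta } = \sum _{k = 1}^K \frac{\partial f}{\partial \pi _k} \frac{\partial \pi _k}{\partial \eta } + \sum _{k = 1}^K \sum _{l = 1}^K \frac{\partial f}{\partial p_{kl}} \frac{\partial p_{kl}}{\partial \eta } + \sum _{k = 1}^K \sum _{j} \frac{\partial f}{\partial q_{kj}} \frac{\partial q_{kj}}{\partial \eta }, \end{aligned} \tag{A7}$$
where the last sum runs over all actions j.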
Combining (A6) and (A7) gives \(\frac{\partial f}{\partial \eta }\) for \(\eta = \tau _k, \mu _k, a_{kl}, b_{kl}, c_{kj}, d_{kj}\).
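The derivatives of \(f\) with respect to the entries of \(\varvec{\pi }\), \(\varvec{P}\), and \(\varvec{Q}\) can be checked numerically. The Python sketch below computes them from the forward and backward probabilities at one fixed \(\theta \) and compares a few entries against central finite differences. The entries are treated as free variables (the sum-to-one constraints are ignored, as in a partial derivative), the function names and toy values are assumptions made for this example, and the further chain-rule step to \(\eta \) is omitted because it depends on the parametrization in (5–7).

```python
import numpy as np

def forward_backward(pi, P, Q, y):
    T, K = len(y), len(pi)
    alpha, beta = np.zeros((T, K)), np.ones((T, K))
    alpha[0] = pi * Q[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * Q[:, y[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (Q[:, y[t + 1]] * beta[t + 1])
    return alpha, beta

def likelihood(pi, P, Q, y):
    alpha, _ = forward_backward(pi, P, Q, y)
    return alpha[-1].sum()

def partials(pi, P, Q, y):
    """Partial derivatives of f with respect to every entry of pi, P and Q."""
    alpha, beta = forward_backward(pi, P, Q, y)
    d_pi = Q[:, y[0]] * beta[0]                       # q_{k,y_1} beta_1(k)
    d_P = np.zeros_like(P)
    d_Q = np.zeros_like(Q)
    d_Q[:, y[0]] += pi * beta[0]                      # contribution of the first action
    for t in range(1, len(y)):
        # alpha_{t-1}(k) q_{l,y_t} beta_t(l), accumulated over t
        d_P += np.outer(alpha[t - 1], Q[:, y[t]] * beta[t])
        # sum_l alpha_{t-1}(l) p_{lk} beta_t(k), added to the column of action y_t
        d_Q[:, y[t]] += (alpha[t - 1] @ P) * beta[t]
    return d_pi, d_P, d_Q

def central_diff(fun, X, idx, h=1e-6):
    """Central finite difference of fun with respect to the entry X[idx]."""
    Xp, Xm = X.copy(), X.copy()
    Xp[idx] += h
    Xm[idx] -= h
    return (fun(Xp) - fun(Xm)) / (2.0 * h)

# Toy values (hypothetical): K = 2 states, J = 3 actions.
pi = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3], [0.2, 0.8]])
Q = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
y = [0, 1, 2, 2, 1]

d_pi, d_P, d_Q = partials(pi, P, Q, y)
fd_pi = central_diff(lambda v: likelihood(v, P, Q, y), pi, 0)
fd_P = central_diff(lambda M: likelihood(pi, M, Q, y), P, (0, 1))
fd_Q = central_diff(lambda M: likelihood(pi, P, M, y), Q, (1, 2))
```

Agreement between `d_pi[0]`, `d_P[0, 1]`, `d_Q[1, 2]` and their finite-difference counterparts gives a quick guard against indexing mistakes before the quadrature over \(\theta \) is added.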
Appendix C Viterbi Algorithm
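Let \(\varvec{y}\) be a sequence following the LHMM with parameters \(\varvec{\eta }\) and latent trait \(\theta \). The most probable hidden state sequence \(\hat{\varvec{s}}\) can be found using the Viterbi algorithm. For \(k = 1, \ldots , K\) and \(t = 2, \ldots , T\), define
$$\begin{aligned} v_t(k) = \max _{s_1, \ldots , s_{t-1}} P(S_1 = s_1, \ldots , S_{t-1} = s_{t-1}, S_t = k, Y_1 = y_1, \ldots , Y_t = y_t \mid \varvec{\eta }, \theta ), \end{aligned}$$
where \(S_t\) denotes the hidden state at step \(t\).
According to HMM assumptions (1)–(4), we have the recursive relation
$$\begin{aligned} v_t(k) = \max _{l} v_{t-1}(l) p_{lk}(\theta ) q_{k, y_t}(\theta ), \end{aligned}$$
where \(v_1(k) = \pi _k(\theta ) q_{k,y_1}(\theta )\). Let
$$\begin{aligned} u_t(k) = \mathop {\textrm{argmax}}\limits _{l} v_{t-1}(l) p_{lk}(\theta ) q_{k, y_t}(\theta ). \end{aligned}$$
After computing \(v_t(k)\) and \(u_t(k)\) for \(k=1, \ldots , K\) and \(t=2, \ldots , T\) sequentially, the most probable hidden state sequence can be obtained by backtracing:
$$\begin{aligned} {\hat{s}}_T = \mathop {\textrm{argmax}}\limits _k v_T(k), \qquad {\hat{s}}_t = u_{t+1}({\hat{s}}_{t+1}) \quad \text {for } t = T-1, \ldots , 1. \end{aligned}$$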
The algorithm is summarized in Algorithm 2.
Algorithm 2
(Viterbi Algorithm) The most probable hidden state sequence \(\hat{\varvec{s}}\) for a response process \(\varvec{y}\) following the LHMM with latent trait \(\theta \) is obtained in the following steps.
1. For \(k = 1, \ldots , K\), compute \(v_1(k) = \pi _k(\theta ) q_{k, y_1}(\theta )\).
2. For \(t = 2, \ldots , T\),
   (a) Compute \(w_t(l, k) = v_{t-1}(l) p_{lk}(\theta ) q_{k, y_t}(\theta )\) for \(k,l = 1, \ldots , K\);
   (b) Record \(v_t(k) = \max _{l} w_t(l, k)\) and \(u_t(k) = \mathop {\textrm{argmax}}\limits _{l} w_t(l, k)\) for \(k = 1, \ldots , K\).
3. Obtain \(\hat{\varvec{s}}\) by backtracing:
   (a) \({\hat{s}}_T = \mathop {\textrm{argmax}}\limits _k v_T(k)\);
   (b) For \(t = T-1, \ldots , 1\), set \({\hat{s}}_t = u_{t+1}({\hat{s}}_{t+1})\).
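The steps of Algorithm 2 can be sketched in Python for a fixed \(\theta \), with the probabilities already evaluated at that \(\theta \). The function names, array layout, and toy values are assumptions made for this illustration. In the backtracing step, each earlier state is read from the stored back-pointer of the state that follows it. The sketch works with raw probabilities; for long sequences, summing log-probabilities instead would avoid underflow.

```python
import itertools
import numpy as np

def viterbi(pi, P, Q, y):
    """Most probable hidden state path and its joint probability."""
    T, K = len(y), len(pi)
    v = np.zeros((T, K))
    u = np.zeros((T, K), dtype=int)                # back-pointers; row 0 is unused
    v[0] = pi * Q[:, y[0]]                         # step 1
    for t in range(1, T):                          # step 2
        # w[l, k] = v_{t-1}(l) p_{lk} q_{k, y_t}
        w = v[t - 1][:, None] * P * Q[:, y[t]][None, :]
        v[t] = w.max(axis=0)
        u[t] = w.argmax(axis=0)
    s = np.zeros(T, dtype=int)                     # step 3: backtracing
    s[-1] = v[-1].argmax()
    for t in range(T - 2, -1, -1):
        s[t] = u[t + 1, s[t + 1]]
    return s, v[-1].max()

def joint_prob(pi, P, Q, y, s):
    """P(S = s, Y = y) for one candidate state path s."""
    p = pi[s[0]] * Q[s[0], y[0]]
    for t in range(1, len(y)):
        p *= P[s[t - 1], s[t]] * Q[s[t], y[t]]
    return p

# Toy values (hypothetical): K = 2 states, J = 3 actions.
pi = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3], [0.2, 0.8]])
Q = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
y = [0, 1, 2, 2, 0]

path, best_prob = viterbi(pi, P, Q, y)
# Brute-force check over all K**T state paths.
brute_best = max(joint_prob(pi, P, Q, y, s)
                 for s in itertools.product(range(len(pi)), repeat=len(y)))
```

Comparing `best_prob` and the joint probability of `path` with `brute_best` confirms on small examples that the backtraced path is a maximizer.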
Appendix D Estimated LHMM Parameters in Case Studies
Tables 3 and 4 present the LHMM parameter estimates for the CC item and the TICKET item, respectively.
Appendix E True Parameters in Simulation Studies
Table 5 presents the parameters of LHMM for generating the action sequences in the simulation study. The values are chosen so that the resulting state transition and state-action probability curves are similar to those obtained in the TICKET item.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, X. A Latent Hidden Markov Model for Process Data. Psychometrika (2023). https://doi.org/10.1007/s11336-023-09938-1
DOI: https://doi.org/10.1007/s11336-023-09938-1