Abstract
Hidden Markov models (HMMs) have been applied in many domains, and the identifiability of HMMs has accordingly drawn considerable research attention. The classical identifiability conditions established in earlier studies are too strong for practical analysis. In this paper, we propose generic identifiability conditions for discrete-time HMMs with a finite state space. In addition, recent studies of cognitive diagnosis models (CDMs) have applied first-order HMMs to track changes in attributes related to learning. Applying CDMs, however, requires a known \(\varvec{Q}\) matrix to specify the underlying structure between latent attributes and items, and identifiability constraints on the model parameters must also be imposed. We propose generic identifiability constraints for our restricted HMM and then estimate the model parameters, including the \(\varvec{Q}\) matrix, within a Bayesian framework. We present Monte Carlo simulation results to support our conclusions and apply the developed model to a real dataset.
References
Allman, E. S., Matias, C., & Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37(6A), 3099–3132.
Baras, J. S., & Finesso, L. (1992). Consistent estimation of the order of hidden Markov chains. In T. E. Duncan & B. Pasik-Duncan (Eds.), Stochastic theory and adaptive control (pp. 26–39). Berlin & Heidelberg: Springer.
Blasiak, S., & Rangwala, H. (2011). A hidden Markov model variant for sequence classification. In Proceedings of the twenty-second international joint conference on artificial intelligence - volume two (pp. 1192–1197). AAAI Press.
Bonhomme, S., Jochmans, K., & Robin, J. M. (2016). Estimating multivariate latent-structure models. The Annals of Statistics, 44(2), 540–563.
Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.
Chen, Y., Culpepper, S., Chen, Y., & Douglas, J. (2018). Bayesian estimation of the DINA Q matrix. Psychometrika, 83(1), 89–108.
Chen, Y., Culpepper, S. A., & Liang, F. (2020). A sparse latent class model for cognitive diagnosis. Psychometrika, 85, 121–153.
Chen, Y., Culpepper, S., Wang, S., & Douglas, J. (2018). A hidden Markov model for learning trajectories in cognitive diagnosis with application to spatial rotation skills. Applied Psychological Measurement, 42(1), 5–23.
Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.
Chen, Y., Liu, Y., Culpepper, S. A., & Chen, Y. (2021). Inferring the number of attributes for the exploratory DINA model. Psychometrika, 86(1), 30–64.
Chiu, C. Y., Douglas, J., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74, 633–665.
Cox, D. A., Little, J., & O’Shea, D. (2015). Ideals, varieties, and algorithms. New York: Springer.
Crouse, M. S., Nowak, R. D., & Baraniuk, R. G. (1998). Wavelet-based statistical signal processing using hidden Markov models. IEEE Transactions on Signal Processing, 46(4), 886–902.
Culpepper, S. A. (2015). Bayesian estimation of the DINA model with Gibbs sampling. Journal of Educational and Behavioral Statistics, 40(5), 454–476.
De La Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.
Gu, Y., & Xu, G. (2021). Sufficient and necessary conditions for the identifiability of the Q-matrix. Statistica Sinica, 31, 449–472.
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321.
Heller, J., & Wickelmaier, F. (2013). Minimum discrepancy estimation in probabilistic knowledge structures. Electronic Notes in Discrete Mathematics, 42, 49–56.
Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.
Khatri, C. G., & Rao, C. R. (1968). Solutions to some functional equations and their applications to characterization of probability distributions. Sankhya: The Indian Journal of Statistics, Series A, 30(2), 167–180.
Kruskal, J. (1977). Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications, 18, 95–138.
Lathauwer, L. D., Moor, B. D., & Vandewalle, J. (2004). Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition. SIAM Journal on Matrix Analysis and Applications, 26, 295–327.
Marsaglia, G., & Styan, G. P. H. (1974). Equalities and inequalities for ranks of matrices. Linear and Multilinear Algebra, 2(3), 269–292.
Paz, A. (1971). Stochastic sequential machines. In A. Paz (Ed.), Introduction to probabilistic automata (pp. 1–66). Academic Press.
Petrie, T. (1969). Probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 40(1), 97–115.
Sipos, I. R., Ceffer, A., & Levendovszky, J. (2017). Parallel optimization of sparse portfolios with AR-HMMs. Computational Economics, 49, 563–578.
Von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
Xu, G. (2017). Identifiability of restricted latent class models with binary responses. The Annals of Statistics, 45(2), 675–707.
Funding
The authors gratefully acknowledge the financial support of the NSF Grant Nos. SES-1758631, SES-1951057, and SES 21-50628.
Appendices
Appendix A: Proof of Part (a) of Theorem 2 (\(T\ge 3\) case)
We begin by introducing some basic terminology and facts from algebraic geometry.
Definition 3
(Cox, Little, and O’Shea (2015)) An algebraic variety V is defined as the simultaneous zero-set of a finite collection of multivariate polynomials \(\{f_i\}_{i=1}^{n}\subset {\mathbb {C}}[x_1,\ldots ,x_k ]\),
$$V=\left\{ \varvec{x}\in {\mathbb {C}}^k:\ f_i(\varvec{x})=0 \text { for all } i=1,\ldots ,n\right\} .$$
Here \({\mathbb {C}}[x_1,\ldots ,x_k ]\) denotes the set of all polynomials in \(x_1,\ldots ,x_k\) with coefficients in \({\mathbb {C}}\), and \({\mathbb {C}}^k\) is the set of k-tuples of complex numbers.
Lemma 1
(Allman et al. (2009)) A variety is all of \({\mathbb {C}}^k\) only when all \(f_i\) are 0; otherwise, a variety is called a proper subvariety and must be of dimension less than k, and of Lebesgue measure 0 in \({\mathbb {C}}^k\).
Remark 7
In Lemma 1, analogous statements still hold if we replace \({\mathbb {C}}^k\) by \({\mathbb {R}}^k\).
To show generic identifiability of the model parameters, it suffices to prove that all nonidentifiable parameter choices lie within a proper subvariety and thus, by Lemma 1, form a set of Lebesgue measure zero.
Proposition 1
\(rank(\varvec{B})=rank(\varvec{\omega })=2^K\) if and only if \(rank(\varvec{B} \cdot \varvec{\omega })=2^K\).
Proof
By Sylvester’s rank inequality (Marsaglia & Styan, 1974), we have
$$rank(\varvec{B})+rank(\varvec{\omega })-2^K\le rank(\varvec{B} \cdot \varvec{\omega })\le \min \{rank(\varvec{B}),\ rank(\varvec{\omega })\},$$
so the proposition holds. \(\square \)
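As a quick numerical illustration of Proposition 1, the following sketch draws random matrices of the stated shapes and checks both Sylvester’s rank inequality and the equivalence. The sizes \(J=3\), \(K=2\) and the uniform sampling are illustrative assumptions, not taken from the paper:

```python
# Numerical check of Proposition 1 via Sylvester's rank inequality.
# B stands in for the 2^J x 2^K emission matrix, omega for the 2^K x 2^K
# transition matrix; sizes and sampling are illustrative.
import numpy as np

rng = np.random.default_rng(0)
J, K = 3, 2
B = rng.random((2**J, 2**K))
omega = rng.random((2**K, 2**K))

r_B = np.linalg.matrix_rank(B)
r_w = np.linalg.matrix_rank(omega)
r_Bw = np.linalg.matrix_rank(B @ omega)

# Sylvester: rank(B) + rank(omega) - 2^K <= rank(B @ omega) <= min of ranks
assert r_B + r_w - 2**K <= r_Bw <= min(r_B, r_w)
# The "if and only if" of Proposition 1
assert (r_Bw == 2**K) == (r_B == 2**K and r_w == 2**K)
```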
By Proposition 1 and part (a) of Theorem 1, we only need to show that \(rank(\varvec{B}\cdot \varvec{\omega })=2^K\) holds almost everywhere in \(\varvec{\Omega }_{\varvec{\omega }, \varvec{B}}=\{(\varvec{\omega },\varvec{B}):\ \varvec{\omega }\in \varvec{\Omega }(\varvec{\omega }),\ \varvec{B}\in \varvec{\Omega }(\varvec{B}) \text { and } rank(\varvec{B})=2^K\}\).
Let M be a subset of \(\{1,\ldots ,2^J\}\) with \(2^K\) elements, and let \([\varvec{B} \cdot \varvec{\omega }]_M\) denote the minor of the submatrix of \(\varvec{B} \cdot \varvec{\omega }\) formed by the rows with indices in M. Let
$$f(\varvec{B}, \varvec{\omega })=\sum _{M}([\varvec{B} \cdot \varvec{\omega }]_M)^2$$
denote the sum of all squared minors of order \(2^K\) of the matrix \(\varvec{B} \cdot \varvec{\omega }\).
Since \(f(\varvec{B}, \varvec{\omega })\) is a polynomial function of \(\varvec{B}\) and \(\varvec{\omega }\), and the rank of \(\varvec{B}\cdot \varvec{\omega }\) is the maximal order of a nonzero minor of \(\varvec{B}\cdot \varvec{\omega }\), by Proposition 1 we can write the zero set of \(f(\varvec{B}, \varvec{\omega })\) as
$$\varvec{Z}_f=\{(\varvec{\omega },\varvec{B})\in \varvec{\Omega }_{\varvec{\omega }, \varvec{B}}:\ f(\varvec{B}, \varvec{\omega })=0\}.$$
In the following, we will show that \(f(\varvec{B}, \varvec{\omega })\) is not a constant zero function.
Proposition 2
If \(rank(\varvec{B})=2^K\), then there exists some nonsingular \(\varvec{\omega }\), such that \(f(\varvec{B}, \varvec{\omega })\ne 0\).
Proof
Given a full column rank \(\varvec{B}\), there must exist a nonzero minor of order \(2^K\) in \(\varvec{B}\). Without loss of generality, we assume that the first \(2^K\) rows of \(\varvec{B}\), denoted by \(\varvec{B}^*\), satisfy \(\det (\varvec{B}^*)\ne 0\); then, \(\det (\varvec{B}^*)\) is a nonzero minor of order \(2^K\). Let \(\varvec{B}= (\varvec{B}^{*\top },\varvec{B}'^\top )^{\top }\). In order to show that \(f(\varvec{B}, \varvec{\omega })\ne 0\) for some nonsingular \(\varvec{\omega }\), it is enough to show that \(\varvec{B} \cdot \varvec{\omega }\) has full column rank for some specific choice of nonsingular \(\varvec{\omega }\), since that will establish that some minors of order \(2^K\) of \(\varvec{B} \cdot \varvec{\omega }\) are nonzero polynomials in the entries of \(\varvec{B}\) and \(\varvec{\omega }\).
For any nonsingular \(\varvec{\omega }\), we have
$$\varvec{B} \cdot \varvec{\omega }=\begin{pmatrix} \varvec{B}^{*}\varvec{\omega }\\ \varvec{B}'\varvec{\omega }\end{pmatrix}.$$
Since \(\det (\varvec{B}^*)\) is a nonzero minor of \(\varvec{B}\) and \(\varvec{\omega }\) is a nonsingular matrix, then \(rank(\varvec{B}^*\cdot \varvec{\omega })=2^K\). Therefore, \(\det (\varvec{B}^*\cdot \varvec{\omega })\) is a nonzero minor of \(\varvec{B} \cdot \varvec{\omega }\), which implies \(rank(\varvec{B} \cdot \varvec{\omega })=2^K\) and \(f(\varvec{B}, \varvec{\omega })\ne 0\). \(\square \)
Therefore, by Lemma 1, the zero set \(\varvec{Z}_f\) has measure zero within \(\varvec{\Omega }_{\varvec{\omega }, \varvec{B}}\). The HMM with \(T\ge 3\) is generically identified.
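The measure-zero argument can also be checked empirically: drawing column-stochastic \(\varvec{B}\) and \(\varvec{\omega }\) at random (here from Dirichlet distributions, an assumption of this sketch rather than the paper’s setup), \(rank(\varvec{B}\cdot \varvec{\omega })=2^K\) in every trial:

```python
# Monte Carlo sanity check: randomly drawn column-stochastic (B, omega)
# pairs give a full-rank product with probability one. The sizes and the
# Dirichlet sampling scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
J, K = 3, 2
n_trials = 200
full_rank = sum(
    np.linalg.matrix_rank(
        rng.dirichlet(np.ones(2**J), size=2**K).T      # B: 2^J x 2^K, columns sum to 1
        @ rng.dirichlet(np.ones(2**K), size=2**K).T    # omega: 2^K x 2^K
    ) == 2**K
    for _ in range(n_trials)
)
assert full_rank == n_trials   # no draw landed in the measure-zero set
```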
Appendix B: Proof of Part (a) of Theorems 5 and 6 (\(T\ge 3\) case)
We first show that if emission matrix \(\varvec{B}\) is identified, then parameters \(\varvec{s}\), \(\varvec{g}\), and \(\varvec{Q}\) in a restricted HMM can also be identified.
Proposition 3
For any \(\varvec{B}, \varvec{B}^\prime \in \varvec{\Omega }(\varvec{B})\), \(\varvec{s},\varvec{s}^\prime \in (0,1)^J\), \(\varvec{g},\varvec{g}^\prime \in (0,1)^J\) and \(\varvec{Q},\varvec{Q}^\prime \in \{0,1\}^{J\times K}\), we have
$$\varvec{B}=\varvec{B}^\prime \ \Longrightarrow \ (\varvec{s}, \varvec{g}, \varvec{Q})=(\varvec{s}^\prime , \varvec{g}^\prime , \varvec{Q}^\prime ).$$
Proof
It suffices to show that given \(\varvec{B}=\varvec{B}^\prime \), we must have \((\varvec{s}, \varvec{g}, \varvec{Q})=(\varvec{s}^\prime , \varvec{g}^\prime , \varvec{Q}^\prime )\).
For \(j\in \{1,2,\ldots ,J\}\), let \(\varvec{D}_j\) be the matrix such that \(\varvec{D}_j\varvec{B}\) and \(\varvec{D}_j\varvec{B}'\) reduce to the \(2\times 2^K\) matrices of conditional probabilities for \(Y_j\) given \(\varvec{\alpha }_t\). For instance, the second row of \(\varvec{D}_j\varvec{B}\) collects \(P(Y_j=1 \mid \varvec{\alpha }_t)\) across attribute profiles:
$$\left( (1-s_j)^{\eta _{j0}}g_j^{1-\eta _{j0}},\ \ldots ,\ (1-s_j)^{\eta _{j,2^K-1}}g_j^{1-\eta _{j,2^K-1}}\right) ,$$
where \(\eta _{jc}={\mathcal {I}}\left( \varvec{\alpha }_{t}^{\top } {\varvec{q}}_{j} \ge {\varvec{q}}_{j}^{\top } {\varvec{q}}_{j},\ \varvec{\alpha }_t^\top \varvec{v}=c\right) \), \(c=0,\ldots ,2^K-1\). Note that \(\eta _{j,2^K-1}=\eta _{j,2^K-1}'=1\), and the assumption that \(\varvec{q}_j\ne \varvec{0}\) and \(\varvec{q}_j'\ne \varvec{0}\) implies \(\eta _{j0}=\eta _{j0}'=0\). Therefore, \(\varvec{D}_j\varvec{B}=\varvec{D}_j\varvec{B}'\) implies that \(g_j=g_j'\) and \(s_j=s_j'\). Also, for \(c\in \{1,\dots ,2^K-2\}\) we have
$$(1-s_j)^{\eta _{jc}}g_j^{1-\eta _{jc}}=(1-s_j)^{\eta _{jc}'}g_j^{1-\eta _{jc}'},$$
and \(g_j\ne 1-s_j\) implies that \(\eta _{jc}=\eta _{jc}'\) for all c, so \(\varvec{q}_j=\varvec{q}_j'\). \(\square \)
The emission matrix \(\varvec{B}\) is of size \(2^J \times 2^K\), and we use \(\varvec{B}_{ \varvec{y}_t, \varvec{\alpha }_t}\) to denote the element in the row with response pattern \(\varvec{y}_t\) (the \(\varvec{y}_t\)-th row) and the column with attribute profile \(\varvec{\alpha }_t\) (the \(\varvec{\alpha }_t\)-th column), so \(\varvec{B}_{ \varvec{y}_t, \varvec{\alpha }_t}\) is the emission probability
$$\varvec{B}_{ \varvec{y}_t, \varvec{\alpha }_t}=P(\varvec{Y}_t=\varvec{y}_t \mid \varvec{\alpha }_t)=\prod _{j=1}^{J}\theta _{j,\varvec{\alpha }_t}^{y_{jt}}\left( 1-\theta _{j,\varvec{\alpha }_t}\right) ^{1-y_{jt}},$$
where \(\theta _{j,\varvec{\alpha }_t}=(1-s_{j})^{\eta _{jt}} g_{j}^{\left( 1-\eta _{jt}\right) }\) and \(\eta _{jt}={\mathcal {I}}\left( \varvec{\alpha }_{t}^{\top } {\varvec{q}}_{j} \ge {\varvec{q}}_{j}^{\top } {\varvec{q}}_{j}\right) \).
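For concreteness, the emission matrix just defined can be assembled directly from \((\varvec{Q}, \varvec{s}, \varvec{g})\). The sketch below uses an illustrative \(3\times 2\) \(\varvec{Q}\) matrix and parameter values (the function name and numbers are not from the paper):

```python
# Build the 2^J x 2^K DINA emission matrix B from (Q, s, g).
# B[y, a] = P(Y = y | alpha = a); rows/columns are in lexicographic order.
import itertools
import numpy as np

def emission_matrix(Q, s, g):
    J, K = Q.shape
    alphas = list(itertools.product([0, 1], repeat=K))   # 2^K attribute profiles
    ys = list(itertools.product([0, 1], repeat=J))       # 2^J response patterns
    B = np.empty((2**J, 2**K))
    for col, a in enumerate(alphas):
        a = np.array(a)
        # eta_j = 1 iff alpha has every attribute required by item j
        eta = (Q @ a >= Q.sum(axis=1)).astype(int)
        theta = (1 - s)**eta * g**(1 - eta)              # P(Y_j = 1 | alpha)
        for row, y in enumerate(ys):
            y = np.array(y)
            B[row, col] = np.prod(theta**y * (1 - theta)**(1 - y))
    return B

Q = np.array([[1, 0], [0, 1], [1, 1]])
s = np.array([0.1, 0.2, 0.1])
g = np.array([0.2, 0.1, 0.3])
B = emission_matrix(Q, s, g)
assert np.allclose(B.sum(axis=0), 1.0)   # each column is a probability distribution
```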
As mentioned in Sect. 2.3, we have a bipartition of the set \({\mathbb {J}}=\{1,2,\ldots ,J\}\) into two disjoint, nonempty subsets \({\mathbb {J}}_1=\{1,2,\ldots ,K\}\) and \({\mathbb {J}}_2=\{K+1,\ldots ,J\}\). Then, let \(\varvec{Y}_t=(\varvec{Y}_{t}^{{\mathbb {J}}_1\top },\varvec{Y}_{t}^{{\mathbb {J}}_2\top })^{\top }\), where \(\varvec{Y}_{t}^{{\mathbb {J}}_1}=(Y_{1t},\ldots , Y_{Kt})^{\top }\) and \(\varvec{Y}_{t}^{{\mathbb {J}}_2}=(Y_{(K+1)t},\ldots , Y_{Jt})^{\top }\). Assuming that the \(\varvec{Q}\) matrix has the form shown in condition (A1), let
$$\varvec{Q}=\begin{pmatrix} \varvec{Q}^{{\mathbb {J}}_1}\\ \varvec{Q}^{{\mathbb {J}}_2}\end{pmatrix},$$
and without loss of generality, let \(\varvec{Q}^{{\mathbb {J}}_1}=\varvec{I}_{K}\) and \(\varvec{Q}^{{\mathbb {J}}_2}=\varvec{Q}^{*}\). Then, the emission probability can be decomposed into two parts, since the components of \(\varvec{Y}_t\) are independent given profile \(\varvec{\alpha }_t\):
$$P(\varvec{Y}_t=\varvec{y}_t \mid \varvec{\alpha }_t, \varvec{Q}, \varvec{s}, \varvec{g})=P(\varvec{Y}_t^{{\mathbb {J}}_1}=\varvec{y}_t^{{\mathbb {J}}_1} \mid \varvec{\alpha }_t, \varvec{I}_K, \varvec{s}, \varvec{g})\, P(\varvec{Y}_t^{{\mathbb {J}}_2}=\varvec{y}_t^{{\mathbb {J}}_2} \mid \varvec{\alpha }_t, \varvec{Q}^{*}, \varvec{s}, \varvec{g}).$$
Similarly, the emission matrix \(\varvec{B}\) can be decomposed into two parts. Let \(\varvec{B}^{{\mathbb {J}}_1}\) be a matrix of size \(2^K \times 2^K\) whose \(\varvec{y}_t^{{\mathbb {J}}_1}\)-th row and \(\varvec{\alpha }_t\)-th column element is \(P(\varvec{Y}_t^{{\mathbb {J}}_1}=\varvec{y}_t^{{\mathbb {J}}_1} \vert \varvec{\alpha }_t, \varvec{I}_K, \varvec{s}, \varvec{g})\), and let \(\varvec{B}^{{\mathbb {J}}_2}\) be a matrix of size \(2^{J-K} \times 2^K\) whose \(\varvec{y}_t^{{\mathbb {J}}_2}\)-th row and \(\varvec{\alpha }_t\)-th column element is \(P(\varvec{Y}_t^{{\mathbb {J}}_2}=\varvec{y}_t^{{\mathbb {J}}_2} \vert \varvec{\alpha }_t, \varvec{Q}^{*}, \varvec{s}, \varvec{g})\). Therefore, the emission matrix \(\varvec{B}\) can be decomposed as
$$\varvec{B}=\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2},$$
where \(\odot \) denotes the column-wise tensor product, which is defined next.
Definition 4
(Khatri–Rao product; Khatri and Rao (1968)) Given matrices \(\varvec{U} \in {\mathbb {R}}^{m_1 \times n}\) and \(\varvec{V} \in {\mathbb {R}}^{m_2 \times n}\) with columns \(\varvec{u}_{1},\ldots ,\varvec{u}_{n}\) and \(\varvec{v}_{1},\ldots ,\varvec{v}_{n}\), respectively, their Khatri–Rao tensor product, denoted by \(\varvec{U} \odot \varvec{V}\), is the matrix of size \((m_1 m_2) \times n\)
$$\varvec{U} \odot \varvec{V}=\left( \varvec{u}_{1} \otimes \varvec{v}_{1},\ \ldots ,\ \varvec{u}_{n} \otimes \varvec{v}_{n}\right) .$$
Remark 8
If \({\textbf{u}}\) and \({\textbf{v}}\) are vectors, then the Khatri–Rao product and Kronecker product are identical, i.e., \({\textbf{u}} \odot {\textbf{v}}={\textbf{u}} \otimes {\textbf{v}}\).
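A direct implementation of Definition 4 makes Remark 8 easy to verify (a sketch; recent SciPy versions also provide `scipy.linalg.khatri_rao`):

```python
# Column-wise (Khatri-Rao) product of Definition 4: column i of the result
# is the Kronecker product of column i of U with column i of V.
import numpy as np

def khatri_rao(U, V):
    assert U.shape[1] == V.shape[1], "U and V must have the same number of columns"
    return np.column_stack([np.kron(U[:, i], V[:, i]) for i in range(U.shape[1])])

U = np.arange(6.0).reshape(2, 3)
V = np.arange(9.0).reshape(3, 3)
W = khatri_rao(U, V)
assert W.shape == (6, 3)   # (m1 * m2) x n

# Remark 8: for single-column matrices (vectors), Khatri-Rao equals Kronecker
u = np.array([[1.0], [2.0]])
v = np.array([[3.0], [4.0], [5.0]])
assert np.allclose(khatri_rao(u, v), np.kron(u, v))
```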
We can represent \(\varvec{B}^{{\mathbb {J}}_1}\) as the Kronecker product of K \(2\times 2\) sub-matrices (Chen, Culpepper, & Liang, 2020):
$$\varvec{B}^{{\mathbb {J}}_1}=\bigotimes _{j=1}^{K}\varvec{B}^{{\mathbb {J}}_1}_j,\quad \varvec{B}^{{\mathbb {J}}_1}_j=\begin{pmatrix} 1-g_j & s_j\\ g_j & 1-s_j\end{pmatrix},$$
where the columns of \(\varvec{B}^{{\mathbb {J}}_1}_j\) correspond to \(\alpha _j=0,1\) and the rows to \(Y_j=0,1\). The condition ‘\(g_j\ne 1-s_j\)’ in Theorem 5 implies that \(rank(\varvec{B}^{{\mathbb {J}}_1}_j)=2\) for all j, since \(\det (\varvec{B}^{{\mathbb {J}}_1}_j)=1-g_j-s_j\). Then, by the rank property of the Kronecker product, we have \(rank(\varvec{B}^{{\mathbb {J}}_1})=\prod _{j=1}^{K}rank( \varvec{B}^{{\mathbb {J}}_1}_j)=2^K\), which implies that \(\varvec{B}^{{\mathbb {J}}_1}\) is a full rank matrix.
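This rank argument can be checked numerically; in the sketch below the \(2\times 2\) factors are assumed to have columns indexed by \(\alpha _j=0,1\) and rows by \(Y_j=0,1\), with illustrative parameter values:

```python
# B^{J1} as a Kronecker product of K 2x2 factors; full rank iff every
# factor is nonsingular, i.e., g_j != 1 - s_j for every j.
from functools import reduce
import numpy as np

def factor(s_j, g_j):
    # columns: alpha_j = 0, 1; rows: Y_j = 0, 1
    return np.array([[1 - g_j, s_j],
                     [g_j, 1 - s_j]])

K = 3
s = [0.1, 0.2, 0.3]
g = [0.2, 0.25, 0.15]
B_J1 = reduce(np.kron, [factor(sj, gj) for sj, gj in zip(s, g)])
assert np.linalg.matrix_rank(B_J1) == 2**K   # g_j != 1 - s_j for every j

# If g_j = 1 - s_j for some j, that factor is singular and the rank drops
B_bad = reduce(np.kron, [factor(0.1, 0.9), factor(0.2, 0.25), factor(0.3, 0.15)])
assert np.linalg.matrix_rank(B_bad) < 2**K
```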
For the decomposition in Eq. (B4), we have
$$\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2}=\begin{pmatrix} \varvec{B}^{{\mathbb {J}}_1}\varvec{D}_{1}(\varvec{B}^{{\mathbb {J}}_2})\\ \vdots \\ \varvec{B}^{{\mathbb {J}}_1}\varvec{D}_{2^{J-K}}(\varvec{B}^{{\mathbb {J}}_2})\end{pmatrix},$$
where \(\varvec{D}_{k}(\varvec{B}^{{\mathbb {J}}_2})\) denotes the diagonal matrix with the k-th row of \(\varvec{B}^{{\mathbb {J}}_2}\) on its diagonal. Here \(\varvec{D}_{1}(\varvec{B}^{{\mathbb {J}}_2})\) has full rank since \(s_j\), \(1-s_j\), \(g_j\), \(1-g_j\) are nonzero, which implies that
$$rank(\varvec{B}^{{\mathbb {J}}_1}\varvec{D}_{1}(\varvec{B}^{{\mathbb {J}}_2}))=2^K;$$
then \(\det (\varvec{B}^{{\mathbb {J}}_1}\varvec{D}_{1}(\varvec{B}^{{\mathbb {J}}_2}))\) is a nonzero minor of order \(2^K\) of \(\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2}\), so we have
$$rank(\varvec{B})=rank(\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2})=2^K.$$
Also \(\pi _c>0\) for all c in Theorem 5. Therefore, the strict identifiability condition in Theorem 1 is satisfied, and the restricted HMM with \(T\ge 3\) is identified. This completes the proof of part (a) of Theorem 5.
Without the condition ‘\(g_j\ne 1-s_j\)’, \(\varvec{B}\) still has full column rank unless there exists at least one \(j^*\) such that \(g_{j^*} = 1-s_{j^*}\). The dimension of this exceptional set is less than the dimension of \(\varvec{\Omega }(\varvec{\pi }, \varvec{\omega }, \varvec{s}, \varvec{g}, \varvec{Q})\), so it has Lebesgue measure zero. Therefore, the generic identifiability condition in Theorem 2 is satisfied, and the restricted HMM with \(T\ge 3\) is generically identified. This completes the proof of part (a) of Theorem 6.
Appendix C: Proof of Part (b) of Theorems 1 and 2 (\(T=2\) case)
The proof is based on Kruskal (1977) for the uniqueness of three-way arrays and its application to the identifiability conditions of three-variate latent class models discussed in Allman et al. (2009).
We start by representing the marginal distribution of \((\varvec{Y}_1, \varvec{Y}_2)^\top \) as a three-way array, decomposing \(\varvec{Y}_2\) into two parts as shown in Eq. (B3).
As shown in Bonhomme et al. (2016), we let \(\varvec{A}=\varvec{B}\cdot diag(\varvec{\pi })\cdot \varvec{\omega }\cdot diag(\varvec{\pi })^{-1}\) denote the distribution of \(\varvec{Y}_1\) given values of \(\varvec{\alpha }_2\) (the attribute profile at time point 2). Then, identifiability is equivalent to the uniqueness of the decomposition of the following tensor (Kruskal, 1977):
$$\varvec{T}=\sum _{\varvec{\alpha }_2}\tilde{\varvec{A}}_{\varvec{\alpha }_2}\otimes \varvec{B}^{{\mathbb {J}}_1}_{\varvec{\alpha }_2}\otimes \varvec{B}^{{\mathbb {J}}_2}_{\varvec{\alpha }_2},$$
where \(\varvec{A}_{\varvec{\alpha }_2}\), \(\varvec{B}^{{\mathbb {J}}_1}_{\varvec{\alpha }_2}\), \(\varvec{B}^{{\mathbb {J}}_2}_{\varvec{\alpha }_2}\) are the \(\varvec{\alpha }_2\)-th columns of \(\varvec{A}\), \(\varvec{B}^{{\mathbb {J}}_1}\), \(\varvec{B}^{{\mathbb {J}}_2}\), respectively, and \(\tilde{\varvec{A}}_{\varvec{\alpha }_2}=\pi _{\varvec{\alpha }_2} \varvec{A}_{\varvec{\alpha }_2}\).
Next, we give the definition of Kruskal rank and state the theorem in Kruskal (1977) for our setting.
Definition 5
For a matrix \(\varvec{M}\), the Kruskal rank of \(\varvec{M}\), denoted \(rank_K(\varvec{M})\), is the largest number I such that every set of I columns of \(\varvec{M}\) is linearly independent.
Remark 9
Compared with the rank of a matrix \(\varvec{M}\), we have \(rank_K(\varvec{M})\le rank(\varvec{M})\). If \(\varvec{M}\) has full column rank, then the equality holds.
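Kruskal rank can be computed by brute force straight from Definition 5; the sketch below is fine for the small matrices in these examples, though the cost grows combinatorially in the number of columns:

```python
# Brute-force Kruskal rank: largest I such that every set of I columns
# is linearly independent (Definition 5).
import itertools
import numpy as np

def kruskal_rank(M, tol=1e-10):
    n = M.shape[1]
    for i in range(n, 0, -1):
        if all(np.linalg.matrix_rank(M[:, list(cols)], tol=tol) == i
               for cols in itertools.combinations(range(n), i)):
            return i
    return 0

M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
assert kruskal_rank(M) == 2   # every pair of columns is independent

M2 = np.array([[1.0, 2.0, 0.0],
               [0.0, 0.0, 1.0]])
assert kruskal_rank(M2) == 1  # columns 1 and 2 are dependent

# Remark 9: Kruskal rank never exceeds the ordinary rank
assert kruskal_rank(M) <= np.linalg.matrix_rank(M)
```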
Theorem 7
(Kruskal (1977)) If
$$rank_K(\tilde{\varvec{A}})+rank_K(\varvec{B}^{{\mathbb {J}}_1})+rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2 \cdot 2^K +2, \qquad \mathrm{(C3)}$$
then the tensor decomposition of \(\varvec{T}\) is unique up to simultaneous permutation and rescaling of the rows.
Since \(\varvec{\pi }\) has all positive entries, we have \(rank_K(\tilde{\varvec{A}})=rank_K(\varvec{A})\). Moreover, \(\varvec{A}\), \(\varvec{B}^{{\mathbb {J}}_1}\) and \(\varvec{B}^{{\mathbb {J}}_2}\) are all stochastic matrices with column sum 1, so the decomposition of the tensor \(\varvec{T}\) is unique up to state label swapping if condition (C3) in Theorem 7 is satisfied.
Bonhomme et al. (2016) established strict identifiability of HMMs for \(T>2\). We next establish sufficient conditions for the identifiability of the restricted HMM with \(T=2\). Since \(\varvec{A}=\varvec{B}\cdot diag(\varvec{\pi })\cdot \varvec{\omega }\cdot diag(\varvec{\pi })^{-1}\), the rank conditions on \(\varvec{B}^{{\mathbb {J}}_1}\) and \(\varvec{\omega }\) imply that \(\varvec{A}\) also has full column rank \(2^K\). Therefore, given \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) and \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\), the HMM with \(T=2\) is identified by Theorem 7. This completes the proof of part (b) of Theorem 1.
Following a similar idea to the proof of Theorem 2 in Appendix A, we need to show that \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\), \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\) and \(rank(\varvec{\omega })=2^K\) hold almost everywhere in the parameter space \(\varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}\), which implies that the restricted HMM with \(T=2\) is generically identified.
Let \(f^\prime (\varvec{B}, \varvec{\omega })=\sum _{M}([\varvec{B} \cdot \varvec{\omega }]_M)^2:\ \varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}\rightarrow {\mathbb {R}}\); then the zero set of \(f^\prime (\varvec{B}, \varvec{\omega })\) is
$$\varvec{Z}_f^\prime =\{(\varvec{\omega },\varvec{B})\in \varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}:\ f^\prime (\varvec{B}, \varvec{\omega })=0\}.$$
As shown in Appendix B, \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) and \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\) imply \(rank(\varvec{B})=rank(\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2})=2^K\). By Proposition 2, we know that \(f^\prime (\varvec{B}, \varvec{\omega })\) is not the zero function. Then, by Lemma 1, the zero set \(\varvec{Z}_f^\prime \) has measure zero within \(\varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}\). So the restricted HMM with \(T= 2\) is generically identified. This completes the proof of part (b) of Theorem 2.
Appendix D: Proof of Part (b) of Theorems 5 and 6 (\(T=2\) case)
We first introduce the following two propositions.
Proposition 4
\(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) if and only if \(\varvec{Q}^{{\mathbb {J}}_1}=\varvec{I}_K\) and \(g_j\ne 1-s_j\) for \(j\in \{1,\dots ,K\}\).
Proof
According to Eq. (B5), \(\varvec{Q}^{{\mathbb {J}}_1}=\varvec{I}_K\) and \(g_j\ne 1-s_j\) for \(j\in \{1,\dots ,K\}\) imply \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\). On the other hand, \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) implies that the columns of \(\varvec{B}^{{\mathbb {J}}_1}\) are distinct. The \(\varvec{\alpha }^\top \varvec{v}=c\) column of \(\varvec{B}^{{\mathbb {J}}_1}\) is
$$\varvec{B}^{{\mathbb {J}}_1}_{c}=\bigotimes _{j=1}^{K}\begin{pmatrix} 1-\theta _{jc}\\ \theta _{jc}\end{pmatrix},\quad \theta _{jc}=(1-s_{j})^{\eta _{jc}} g_{j}^{1-\eta _{jc}},$$
where \(\eta _{jc}={\mathcal {I}}\left( \varvec{\alpha }_{t}^{\top } {\varvec{q}}_{j} \ge {\varvec{q}}_{j}^{\top } {\varvec{q}}_{j},\ \varvec{\alpha }_t^\top \varvec{v}=c\right) \), \(c=0,\ldots ,2^K-1\). Therefore, for \(k=1,\ldots ,K\), a full-rank \(\varvec{B}^{{\mathbb {J}}_1}\) requires \(\varvec{B}^{{\mathbb {J}}_1}_{0}\ne \varvec{B}^{{\mathbb {J}}_1}_{\varvec{e}_k^\top \varvec{v}}\), which holds only if there exists at least one row \(\varvec{q}_j\) in \(\varvec{Q}^{{\mathbb {J}}_1}\) satisfying \(\varvec{q}_j=\varvec{e}_k\) with \(g_j\ne 1-s_j\). Since \(\varvec{Q}^{{\mathbb {J}}_1}\) has only K rows, we must have \(\varvec{Q}^{{\mathbb {J}}_1}=\varvec{I}_K\) after a permutation of its rows, and \(g_j\ne 1-s_j\) for all \(j\in \{1,\dots ,K\}\). \(\square \)
Proposition 5
\(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\) if and only if \(\varvec{Q}^{{\mathbb {J}}_2}\) contains at least one \(\varvec{I}_K\) after a row permutation.
Proof
Similar to the proof in Appendix B, if \(\varvec{Q}^{{\mathbb {J}}_2}\) contains at least one \(\varvec{I}_K\) after a row permutation, then \(\varvec{B}^{{\mathbb {J}}_2}\) has full column rank, so \(rank_K(\varvec{B}^{{\mathbb {J}}_2})=2^K\ge 2\).
Given \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\), every two columns of \(\varvec{B}^{{\mathbb {J}}_2}\) are linearly independent according to Definition 5. Assume that for some \(k\in \{1,\ldots ,K\}\) there does not exist a row in \(\varvec{Q}^{{\mathbb {J}}_2}\) satisfying \(\varvec{q}_j=\varvec{e}_k\); then we would have \(\varvec{B}^{{\mathbb {J}}_2}_{0}=\varvec{B}^{{\mathbb {J}}_2}_{\varvec{e}_k^\top \varvec{v}}\), which contradicts the condition \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\). Therefore, \(\varvec{Q}^{{\mathbb {J}}_2}\) must contain at least one \(\varvec{I}_K\) after a row permutation. \(\square \)
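The “only if” direction of Proposition 5 can be checked numerically: if no row of \(\varvec{Q}^{{\mathbb {J}}_2}\) equals \(\varvec{e}_k\), the columns of \(\varvec{B}^{{\mathbb {J}}_2}\) for \(\varvec{\alpha }=\varvec{0}\) and \(\varvec{\alpha }=\varvec{e}_k\) coincide. The helper below recomputes DINA emission probabilities; the matrices and values are illustrative:

```python
# Compare emission-matrix columns of B^{J2} for two attribute profiles.
import itertools
import numpy as np

def column(Q, s, g, alpha):
    """P(Y = y | alpha) for all response patterns y, in lexicographic order."""
    eta = (Q @ alpha >= Q.sum(axis=1)).astype(int)   # DINA ideal responses
    theta = (1 - s)**eta * g**(1 - eta)
    return np.array([
        np.prod(theta**np.array(y) * (1 - theta)**(1 - np.array(y)))
        for y in itertools.product([0, 1], repeat=Q.shape[0])
    ])

s = np.array([0.1, 0.2]); g = np.array([0.2, 0.3])

# No row of Q2 equals e_1 = (1, 0): the two columns coincide
Q2 = np.array([[0, 1], [1, 1]])
c0 = column(Q2, s, g, np.array([0, 0]))
c1 = column(Q2, s, g, np.array([1, 0]))
assert np.allclose(c0, c1)

# With a row e_1 present, the two columns differ
Q2b = np.array([[1, 0], [0, 1]])
assert not np.allclose(column(Q2b, s, g, np.array([0, 0])),
                       column(Q2b, s, g, np.array([1, 0])))
```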
In Proposition 3, we showed that if the emission matrix \(\varvec{B}\) is identified, then the parameters \(\varvec{s}\), \(\varvec{g}\) and \(\varvec{Q}\) in the restricted HMM are also identified. Then, by Propositions 4 and 5, the conditions of part (b) of Theorems 1 and 2 are satisfied, which proves part (b) of Theorems 5 and 6 for \(T=2\).
Appendix E: Gibbs Sampling Step in Algorithm 1
The full conditional distributions of the parameters are shown as follows. For the attribute profiles \(\varvec{\alpha }_{it}\), at time point \(t=1\), given \(\varvec{\alpha }_{i2} =\varvec{\alpha }_{c2}\),
For \(1<t<T\), given \(\varvec{\alpha }_{i,t-1}=\varvec{\alpha }_{c,t-1}\) and \(\varvec{\alpha }_{i,t+1}=\varvec{\alpha }_{c',t+1}\),
At time point \(t=T\), given \(\varvec{\alpha }_{i,T-1}=\varvec{\alpha }_{c,T-1}\),
For other parameters, we have
Details about some of the prior and posterior distributions of the parameters shown above can be found in Chen, Culpepper, Wang, and Douglas (2018).
Appendix F: Proof of Theorems 3 and 4
Proof
To prove part (a) of Theorem 3, we apply Theorem 1, which requires \(rank(\varvec{B})=2^K\). In Appendix B, we decomposed the matrix \(\varvec{B}\) into two parts: \(\varvec{B}=\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2}\). Since the emission probabilities in \(\varvec{B}\) are all positive due to the CDF \(\Psi (\cdot )\), it is sufficient to show that \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\). With condition (B1), we have \(\varvec{D} = (\varvec{1}_K, \varvec{I}_K, \varvec{0})\), which implies a DINA model with K skills, K items and \(\varvec{Q} = \varvec{I}_K\). Then, similar to Eq. (B5), we can rewrite the first part of the emission matrix \(\varvec{B}^{{\mathbb {J}}_1}\) as
$$\varvec{B}^{{\mathbb {J}}_1}=\bigotimes _{j=1}^{K}\varvec{B}^{{\mathbb {J}}_1}_j,\quad \varvec{B}^{{\mathbb {J}}_1}_j=\begin{pmatrix} 1-\Psi (\beta _{j,0}) & 1-\Psi (\beta _{j,0}+\beta _{j,j})\\ \Psi (\beta _{j,0}) & \Psi (\beta _{j,0}+\beta _{j,j})\end{pmatrix}.$$
Under condition (B1), we have \(\Psi (\beta _{j,0}) \ne \Psi (\beta _{j,0}+\beta _{j,j})\), which implies \(rank(\varvec{B}^{{\mathbb {J}}_1})=\prod _{j=1}^{K}rank( \varvec{B}^{{\mathbb {J}}_1}_j)=2^K\), so part (a) of Theorem 3 holds based on part (a) of Theorem 1. Also, part (a) of Theorem 4 can be proved similarly using the argument above and part (a) of Theorem 2.
To prove part (b) of Theorem 3, we again apply Theorem 1, which requires \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) and \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\). According to the proof above, we have \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) under condition (B1). Under condition (B2), we have \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\), so part (b) of Theorem 3 holds based on part (b) of Theorem 1. By a similar argument and part (b) of Theorem 2, part (b) of Theorem 4 holds under conditions (B1)–(B2). \(\square \)
Appendix G: Proof of Remark 1
We start by representing the marginal distribution of \((\varvec{Y}_1, \varvec{Y}_2)^\top \) as a three-way array, decomposing \(\varvec{Y}_1\) into two parts as shown in Eq. (B3).
As shown in Bonhomme et al. (2016), we let \(\varvec{A}^*=\varvec{B}\cdot \varvec{\omega }^\top \) denote the distribution of \(\varvec{Y}_2\) given values of \(\varvec{\alpha }_1\) (the attribute profile at time point 1). Then, identifiability is equivalent to the uniqueness of the decomposition of the following tensor (Kruskal, 1977):
$$\varvec{T}=\sum _{\varvec{\alpha }_1}\tilde{\varvec{B}}^{{\mathbb {J}}_1}_{\varvec{\alpha }_1}\otimes \varvec{B}^{{\mathbb {J}}_2}_{\varvec{\alpha }_1}\otimes \varvec{A}^{*}_{\varvec{\alpha }_1},$$
where \(\varvec{A}^{*}_{\varvec{\alpha }_1}\), \(\varvec{B}^{{\mathbb {J}}_1}_{\varvec{\alpha }_1}\), \(\varvec{B}^{{\mathbb {J}}_2}_{\varvec{\alpha }_1}\) are the \(\varvec{\alpha }_1\)-th columns of \(\varvec{A}^{*}\), \(\varvec{B}^{{\mathbb {J}}_1}\), \(\varvec{B}^{{\mathbb {J}}_2}\), respectively, and \(\tilde{\varvec{B}}^{{\mathbb {J}}_1}_{\varvec{\alpha }_1}=\pi _{1,\varvec{\alpha }_1} \varvec{B}^{{\mathbb {J}}_1}_{\varvec{\alpha }_1}\). Since \(\varvec{\pi }_1\) has all positive entries, we have \(rank_K(\tilde{\varvec{B}}^{{\mathbb {J}}_1})=rank_K(\varvec{B}^{{\mathbb {J}}_1})\). Moreover, \(\varvec{A}^{*}\), \(\varvec{B}^{{\mathbb {J}}_1}\) and \(\varvec{B}^{{\mathbb {J}}_2}\) are all stochastic matrices with column sum 1, so by Theorem 7, the decomposition of the tensor \(\varvec{T}\) is unique up to state label swapping if \(rank_K(\varvec{A}^{*})+rank_K(\varvec{B}^{{\mathbb {J}}_1})+rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2 \cdot 2^K +2\) holds.
We next establish sufficient conditions for the identifiability of the restricted HMM with \(T=2\). Since \(\varvec{A}^{*}=\varvec{B}\cdot \varvec{\omega }^\top \), the rank conditions on \(\varvec{B}^{{\mathbb {J}}_1}\) and \(\varvec{\omega }\) imply that \(\varvec{A}^{*}\) also has full column rank \(2^K\). Therefore, given \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) and \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\), the HMM with \(T=2\) is strictly identified by Theorem 7.
Similar to the proof of Theorem 2 in Appendix A, in order to prove that the restricted HMM with \(T = 2\) is generically identified, we need to show that \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\), \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\) and \(rank(\varvec{\omega })=2^K\) hold almost everywhere in \(\varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}\), which is already proved in Appendix C.
Appendix H: Problem Set of Elementary Probability Theory
The two sets of questions from R package pks (Heller & Wickelmaier, 2013) are shown as follows.
1.1 The First Set of Questions
1. A box contains 30 marbles in the following colors: 8 red, 10 black, 12 yellow. What is the probability that a randomly drawn marble is yellow?
2. A bag contains 5-cent, 10-cent, and 20-cent coins. The probability of drawing a 5-cent coin is 0.35, that of drawing a 10-cent coin is 0.25, and that of drawing a 20-cent coin is 0.40. What is the probability that the coin randomly drawn is not a 5-cent coin?
3. A bag contains 5-cent, 10-cent, and 20-cent coins. The probability of drawing a 5-cent coin is 0.20, that of drawing a 10-cent coin is 0.45, and that of drawing a 20-cent coin is 0.35. What is the probability that the coin randomly drawn is a 5-cent coin or a 20-cent coin?
4. In a school, 40% of the pupils are boys and 80% of the pupils are right-handed. Suppose that gender and handedness are independent. What is the probability of randomly selecting a right-handed boy?
5. Given a standard deck containing 32 different cards, what is the probability of not drawing a heart?
6. A box contains 20 marbles in the following colors: 4 white, 14 green, 2 red. What is the probability that a randomly drawn marble is not white?
7. A box contains 10 marbles in the following colors: 2 yellow, 5 blue, 3 red. What is the probability that a randomly drawn marble is yellow or blue?
8. What is the probability of obtaining an even number by throwing a dice?
9. Given a standard deck containing 32 different cards, what is the probability of drawing a 4 in a black suit?
10. A box contains marbles that are red or yellow, small or large. The probability of drawing a red marble is 0.70, the probability of drawing a small marble is 0.40. Suppose that the color of the marbles is independent of their size. What is the probability of randomly drawing a small marble that is not red?
11. In a garage there are 50 cars, 20 are black and 10 are diesel powered. Suppose that the color of the cars is independent of the kind of fuel. What is the probability that a randomly selected car is not black and it is diesel powered?
12. A box contains 20 marbles, 10 marbles are red, 6 are yellow and 4 are black. 12 marbles are small and 8 are large. Suppose that the color of the marbles is independent of their size. What is the probability of randomly drawing a small marble that is yellow or red?
1.2 The Second Set of Questions
1. A box contains 30 marbles in the following colors: 10 red, 14 yellow, 6 green. What is the probability that a randomly drawn marble is green?
2. A bag contains 5-cent, 10-cent, and 20-cent coins. The probability of drawing a 5-cent coin is 0.25, that of drawing a 10-cent coin is 0.60, and that of drawing a 20-cent coin is 0.15. What is the probability that the coin randomly drawn is not a 5-cent coin?
3. A bag contains 5-cent, 10-cent, and 20-cent coins. The probability of drawing a 5-cent coin is 0.35, that of drawing a 10-cent coin is 0.20, and that of drawing a 20-cent coin is 0.45. What is the probability that the coin randomly drawn is a 5-cent coin or a 20-cent coin?
4. In a school, 70% of the pupils are girls and 10% of the pupils are left-handed. Suppose that gender and handedness are independent. What is the probability of randomly selecting a left-handed girl?
5. Given a standard deck containing 32 different cards, what is the probability of not drawing a club?
6. A box contains 20 marbles in the following colors: 6 yellow, 10 red, 4 green. What is the probability that a randomly drawn marble is not yellow?
7. A box contains 10 marbles in the following colors: 5 blue, 3 red, 2 green. What is the probability that a randomly drawn marble is red or blue?
8. What is the probability of obtaining an odd number by throwing a dice?
9. Given a standard deck containing 32 different cards, what is the probability of drawing a 10 in a red suit?
10. A box contains marbles that are red or yellow, small or large. The probability of drawing a green marble is 0.40, the probability of drawing a large marble is 0.20. Suppose that the color of the marbles is independent of their size. What is the probability of randomly drawing a large marble that is not green?
11. In a garage there are 50 cars, 15 are white and 20 are diesel powered. Suppose that the color of the cars is independent of the kind of fuel. What is the probability that a randomly selected car is not white and it is diesel powered?
12. A box contains 20 marbles, 8 marbles are white, 4 are green and 8 are red. 15 marbles are small and 5 are large. Suppose that the color of the marbles is independent of their size. What is the probability of randomly drawing a large marble that is white or green?
Liu, Y., Culpepper, S.A. & Chen, Y. Identifiability of Hidden Markov Models for Learning Trajectories in Cognitive Diagnosis. Psychometrika 88, 361–386 (2023). https://doi.org/10.1007/s11336-023-09904-x