Abstract
Hidden Markov models (HMMs) have been applied in many domains, and the identifiability of HMMs has accordingly drawn considerable research attention. The classical identifiability conditions established in earlier studies are too strong for practical analysis. In this paper, we propose generic identifiability conditions for discrete-time HMMs with a finite state space. In addition, recent studies of cognitive diagnosis models (CDMs) have applied first-order HMMs to track changes in attributes related to learning. Applying CDMs, however, requires a known \(\varvec{Q}\) matrix to specify the underlying structure between latent attributes and items, and identifiability constraints on the model parameters must also be imposed. We propose generic identifiability constraints for our restricted HMM and then estimate the model parameters, including the \(\varvec{Q}\) matrix, within a Bayesian framework. We present Monte Carlo simulation results to support our conclusions and apply the developed model to a real dataset.
References
Allman, E. S., Matias, C., & Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37(6A), 3099–3132.
Baras, J. S., & Finesso, L. (1992). Consistent estimation of the order of hidden Markov chains. In T. E. Duncan & B. Pasik-Duncan (Eds.), Stochastic theory and adaptive control (pp. 26–39). Berlin & Heidelberg: Springer.
Blasiak, S., & Rangwala, H. (2011). A hidden Markov model variant for sequence classification. In Proceedings of the twenty-second international joint conference on artificial intelligence - volume two (pp. 1192–1197). AAAI Press.
Bonhomme, S., Jochmans, K., & Robin, J. M. (2016). Estimating multivariate latent-structure models. The Annals of Statistics, 44(2), 540–563.
Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.
Chen, Y., Culpepper, S., Chen, Y., & Douglas, J. (2018). Bayesian estimation of the DINA Q matrix. Psychometrika, 83(1), 89–108.
Chen, Y., Culpepper, S. A., & Liang, F. (2020). A sparse latent class model for cognitive diagnosis. Psychometrika, 85, 121–153.
Chen, Y., Culpepper, S., Wang, S., & Douglas, J. (2018). A hidden Markov model for learning trajectories in cognitive diagnosis with application to spatial rotation skills. Applied Psychological Measurement, 42(1), 5–23.
Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.
Chen, Y., Liu, Y., Culpepper, S. A., & Chen, Y. (2021). Inferring the number of attributes for the exploratory DINA model. Psychometrika, 86(1), 30–64.
Chiu, C. Y., Douglas, J., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74, 633–665.
Cox, D. A., Little, J., & O’Shea, D. (2015). Ideals, varieties, and algorithms. New York: Springer.
Crouse, M. S., Nowak, R. D., & Baraniuk, R. G. (1998). Wavelet-based statistical signal processing using hidden Markov models. IEEE Transactions on Signal Processing, 46(4), 886–902.
Culpepper, S. A. (2015). Bayesian estimation of the DINA model with Gibbs sampling. Journal of Educational and Behavioral Statistics, 40(5), 454–476.
De La Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.
Gu, Y., & Xu, G. (2021). Sufficient and necessary conditions for the identifiability of the Q-matrix. Statistica Sinica, 31, 449–472.
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321.
Heller, J., & Wickelmaier, F. (2013). Minimum discrepancy estimation in probabilistic knowledge structures. Electronic Notes in Discrete Mathematics, 42, 49–56.
Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.
Khatri, C. G., & Rao, C. R. (1968). Solutions to some functional equations and their applications to characterization of probability distributions. Sankhya: The Indian Journal of Statistics, Series A, 30(2), 167–180.
Kruskal, J. (1977). Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications, 18, 95–138.
Lathauwer, L. D., Moor, B. D., & Vandewalle, J. (2004). Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition. SIAM Journal on Matrix Analysis and Applications, 26, 295–327.
Marsaglia, G., & Styan, G. P. H. (1974). Equalities and inequalities for ranks of matrices. Linear and Multilinear Algebra, 2(3), 269–292.
Paz, A. (1971). Stochastic sequential machines. In A. Paz (Ed.), Introduction to probabilistic automata (pp. 1–66). Academic Press.
Petrie, T. (1969). Probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 40(1), 97–115.
Sipos, I. R., Ceffer, A., & Levendovszky, J. (2017). Parallel optimization of sparse portfolios with AR-HMMs. Computational Economics, 49, 563–578.
Von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
Xu, G. (2017). Identifiability of restricted latent class models with binary responses. The Annals of Statistics, 45(2), 675–707.
Funding
The authors gratefully acknowledge the financial support of the NSF Grant Nos. SES-1758631, SES-1951057, and SES 21-50628.
Appendices
Appendix A: Proof of Part (a) of Theorem 2 (\(T\ge 3\) case)
We begin by introducing some basic terminology and facts from algebraic geometry.
Definition 3
(Cox, Little, and O’Shea (2015)) An algebraic variety V is defined as the simultaneous zero-set of a finite collection of multivariate polynomials \(\{f_i\}_{i=1}^{n}\subset {\mathbb {C}}[x_1,\ldots ,x_k ]\),
$$V=\left\{ \varvec{x}\in {\mathbb {C}}^k:\ f_i(\varvec{x})=0 \text { for all } i=1,\ldots ,n\right\} .$$
Here \({\mathbb {C}}[x_1,\ldots ,x_k ]\) denotes the set of all polynomials in \(x_1,\ldots ,x_k\) with coefficients in \({\mathbb {C}}\), and \({\mathbb {C}}^k\) is the set of k-tuples of complex numbers.
Lemma 1
(Allman et al. (2009)) A variety is all of \({\mathbb {C}}^k\) only when all \(f_i\) are 0; otherwise, a variety is called a proper subvariety and must be of dimension less than k, and of Lebesgue measure 0 in \({\mathbb {C}}^k\).
Remark 7
In Lemma 1, analogous statements still hold if we replace \({\mathbb {C}}^k\) by \({\mathbb {R}}^k\).
To show generic identifiability of the model parameters, it suffices to prove that all nonidentifiable parameter choices lie within a proper subvariety and thus, by Lemma 1, form a set of Lebesgue measure zero.
Proposition 1
\(rank(\varvec{B})=rank(\varvec{\omega })=2^K\) if and only if \(rank(\varvec{B} \cdot \varvec{\omega })=2^K\).
Proof
By Sylvester’s rank inequality (Marsaglia & Styan, 1974), we have
$$rank(\varvec{B})+rank(\varvec{\omega })-2^K\le rank(\varvec{B} \cdot \varvec{\omega })\le \min \{rank(\varvec{B}),\ rank(\varvec{\omega })\},$$
so the proposition holds. \(\square \)
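As a quick numerical illustration of Proposition 1, the following sketch draws random matrices of the stated shapes and checks both Sylvester’s rank inequality and the equivalence. The sizes \(J=3\), \(K=2\) and the uniform sampling are illustrative assumptions, not taken from the paper:

```python
# Numerical check of Proposition 1 via Sylvester's rank inequality.
# B stands in for the 2^J x 2^K emission matrix, omega for the 2^K x 2^K
# transition matrix; sizes and sampling are illustrative.
import numpy as np

rng = np.random.default_rng(0)
J, K = 3, 2
B = rng.random((2**J, 2**K))
omega = rng.random((2**K, 2**K))

r_B = np.linalg.matrix_rank(B)
r_w = np.linalg.matrix_rank(omega)
r_Bw = np.linalg.matrix_rank(B @ omega)

# Sylvester: rank(B) + rank(omega) - 2^K <= rank(B @ omega) <= min of ranks
assert r_B + r_w - 2**K <= r_Bw <= min(r_B, r_w)
# The "if and only if" of Proposition 1
assert (r_Bw == 2**K) == (r_B == 2**K and r_w == 2**K)
```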
By Proposition 1 and part (a) of Theorem 1, we only need to show that \(rank(\varvec{B}\cdot \varvec{\omega })=2^K\) holds almost everywhere in \(\varvec{\Omega }_{\varvec{\omega }, \varvec{B}}=\{(\varvec{\omega },\varvec{B}):\ \varvec{\omega }\in \varvec{\Omega }(\varvec{\omega }),\ \varvec{B}\in \varvec{\Omega }(\varvec{B}) \text { and } rank(\varvec{B})=2^K\}\).
Let M be a subset of \(\{1,\ldots ,2^J\}\) with \(2^K\) elements, and let \([\varvec{B} \cdot \varvec{\omega }]_M\) denote the minor of the submatrix of \(\varvec{B} \cdot \varvec{\omega }\) formed by the rows with indices in M. Let
$$f(\varvec{B}, \varvec{\omega })=\sum _{M}([\varvec{B} \cdot \varvec{\omega }]_M)^2$$
denote the sum of all squared minors of order \(2^K\) of the matrix \(\varvec{B} \cdot \varvec{\omega }\).
Since \(f(\varvec{B}, \varvec{\omega })\) is a polynomial function of \(\varvec{B}\) and \(\varvec{\omega }\), and the rank of \(\varvec{B}\cdot \varvec{\omega }\) is the maximal order of a nonzero minor of \(\varvec{B}\cdot \varvec{\omega }\), by Proposition 1 we can write the zero set of \(f(\varvec{B}, \varvec{\omega })\) as
$$\varvec{Z}_f=\{(\varvec{\omega },\varvec{B})\in \varvec{\Omega }_{\varvec{\omega }, \varvec{B}}:\ f(\varvec{B}, \varvec{\omega })=0\}.$$
In the following, we will show that \(f(\varvec{B}, \varvec{\omega })\) is not a constant zero function.
Proposition 2
If \(rank(\varvec{B})=2^K\), then there exists some nonsingular \(\varvec{\omega }\), such that \(f(\varvec{B}, \varvec{\omega })\ne 0\).
Proof
Given a full column rank \(\varvec{B}\), there must exist a nonzero minor of order \(2^K\) in \(\varvec{B}\). Without loss of generality, we assume that the first \(2^K\) rows of \(\varvec{B}\), denoted by \(\varvec{B}^*\), satisfy \(\det (\varvec{B}^*)\ne 0\); then, \(\det (\varvec{B}^*)\) is a nonzero minor of order \(2^K\). Let \(\varvec{B}= (\varvec{B}^{*\top },\varvec{B}'^\top )^{\top }\). In order to show that \(f(\varvec{B}, \varvec{\omega })\ne 0\) for some nonsingular \(\varvec{\omega }\), it is enough to show that \(\varvec{B} \cdot \varvec{\omega }\) has full column rank for some specific choice of nonsingular \(\varvec{\omega }\), since that will establish that some minors of order \(2^K\) of \(\varvec{B} \cdot \varvec{\omega }\) are nonzero polynomials in the entries of \(\varvec{B}\) and \(\varvec{\omega }\).
For any nonsingular \(\varvec{\omega }\), we have
$$\varvec{B} \cdot \varvec{\omega }=\begin{pmatrix} \varvec{B}^{*}\varvec{\omega }\\ \varvec{B}'\varvec{\omega }\end{pmatrix}.$$
Since \(\det (\varvec{B}^*)\) is a nonzero minor of \(\varvec{B}\) and \(\varvec{\omega }\) is a nonsingular matrix, then \(rank(\varvec{B}^*\cdot \varvec{\omega })=2^K\). Therefore, \(\det (\varvec{B}^*\cdot \varvec{\omega })\) is a nonzero minor of \(\varvec{B} \cdot \varvec{\omega }\), which implies \(rank(\varvec{B} \cdot \varvec{\omega })=2^K\) and \(f(\varvec{B}, \varvec{\omega })\ne 0\). \(\square \)
Therefore, by Lemma 1, the zero set \(\varvec{Z}_f\) has measure zero within \(\varvec{\Omega }_{\varvec{\omega }, \varvec{B}}\). The HMM with \(T\ge 3\) is generically identified.
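The measure-zero argument can also be checked empirically: drawing column-stochastic \(\varvec{B}\) and \(\varvec{\omega }\) at random (here from Dirichlet distributions, an assumption of this sketch rather than the paper’s setup), \(rank(\varvec{B}\cdot \varvec{\omega })=2^K\) in every trial:

```python
# Monte Carlo sanity check: randomly drawn column-stochastic (B, omega)
# pairs give a full-rank product with probability one. The sizes and the
# Dirichlet sampling scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
J, K = 3, 2
n_trials = 200
full_rank = sum(
    np.linalg.matrix_rank(
        rng.dirichlet(np.ones(2**J), size=2**K).T      # B: 2^J x 2^K, columns sum to 1
        @ rng.dirichlet(np.ones(2**K), size=2**K).T    # omega: 2^K x 2^K
    ) == 2**K
    for _ in range(n_trials)
)
assert full_rank == n_trials   # no draw landed in the measure-zero set
```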
Appendix B: Proof of Part (a) of Theorems 5 and 6 (\(T\ge 3\) case)
We first show that if emission matrix \(\varvec{B}\) is identified, then parameters \(\varvec{s}\), \(\varvec{g}\), and \(\varvec{Q}\) in a restricted HMM can also be identified.
Proposition 3
For any \(\varvec{B}, \varvec{B}^\prime \in \varvec{\Omega }(\varvec{B})\), \(\varvec{s},\varvec{s}^\prime \in (0,1)^J\), \(\varvec{g},\varvec{g}^\prime \in (0,1)^J\) and \(\varvec{Q},\varvec{Q}^\prime \in \{0,1\}^{J\times K}\), we have
$$\varvec{B}=\varvec{B}^\prime \ \Longrightarrow \ (\varvec{s}, \varvec{g}, \varvec{Q})=(\varvec{s}^\prime , \varvec{g}^\prime , \varvec{Q}^\prime ).$$
Proof
It suffices to show that given \(\varvec{B}=\varvec{B}^\prime \), we must have \((\varvec{s}, \varvec{g}, \varvec{Q})=(\varvec{s}^\prime , \varvec{g}^\prime , \varvec{Q}^\prime )\).
For \(j\in \{1,2,\ldots ,J\}\), let \(\varvec{D}_j\) be the matrix such that \(\varvec{D}_j\varvec{B}\) and \(\varvec{D}_j\varvec{B}'\) reduce to the \(2\times 2^K\) matrices of conditional probabilities for \(Y_j\) given \(\varvec{\alpha }_t\). For instance, the second row of \(\varvec{D}_j\varvec{B}\) collects \(P(Y_j=1 \mid \varvec{\alpha }_t)\) across attribute profiles:
$$\left( (1-s_j)^{\eta _{j0}}g_j^{1-\eta _{j0}},\ \ldots ,\ (1-s_j)^{\eta _{j,2^K-1}}g_j^{1-\eta _{j,2^K-1}}\right) ,$$
where \(\eta _{jc}={\mathcal {I}}\left( \varvec{\alpha }_{t}^{\top } {\varvec{q}}_{j} \ge {\varvec{q}}_{j}^{\top } {\varvec{q}}_{j},\ \varvec{\alpha }_t^\top \varvec{v}=c\right) \), \(c=0,\ldots ,2^K-1\). Note that \(\eta _{j,2^K-1}=\eta _{j,2^K-1}'=1\), and the assumption that \(\varvec{q}_j\ne \varvec{0}\) and \(\varvec{q}_j'\ne \varvec{0}\) implies \(\eta _{j0}=\eta _{j0}'=0\). Therefore, \(\varvec{D}_j\varvec{B}=\varvec{D}_j\varvec{B}'\) implies that \(g_j=g_j'\) and \(s_j=s_j'\). Also, for \(c\in \{1,\dots ,2^K-2\}\) we have
$$(1-s_j)^{\eta _{jc}}g_j^{1-\eta _{jc}}=(1-s_j)^{\eta _{jc}'}g_j^{1-\eta _{jc}'},$$
and \(g_j\ne 1-s_j\) implies that \(\eta _{jc}=\eta _{jc}'\) for all c, so \(\varvec{q}_j=\varvec{q}_j'\). \(\square \)
The emission matrix \(\varvec{B}\) is of size \(2^J \times 2^K\), and we use \(\varvec{B}_{ \varvec{y}_t, \varvec{\alpha }_t}\) to denote the element in the row with response pattern \(\varvec{y}_t\) (the \(\varvec{y}_t\)-th row) and the column with attribute profile \(\varvec{\alpha }_t\) (the \(\varvec{\alpha }_t\)-th column), so \(\varvec{B}_{ \varvec{y}_t, \varvec{\alpha }_t}\) is the emission probability
$$\varvec{B}_{ \varvec{y}_t, \varvec{\alpha }_t}=P(\varvec{Y}_t=\varvec{y}_t \mid \varvec{\alpha }_t)=\prod _{j=1}^{J}\theta _{j,\varvec{\alpha }_t}^{y_{jt}}\left( 1-\theta _{j,\varvec{\alpha }_t}\right) ^{1-y_{jt}},$$
where \(\theta _{j,\varvec{\alpha }_t}=(1-s_{j})^{\eta _{jt}} g_{j}^{\left( 1-\eta _{jt}\right) }\) and \(\eta _{jt}={\mathcal {I}}\left( \varvec{\alpha }_{t}^{\top } {\varvec{q}}_{j} \ge {\varvec{q}}_{j}^{\top } {\varvec{q}}_{j}\right) \).
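For concreteness, the emission matrix just defined can be assembled directly from \((\varvec{Q}, \varvec{s}, \varvec{g})\). The sketch below uses an illustrative \(3\times 2\) \(\varvec{Q}\) matrix and parameter values (the function name and numbers are not from the paper):

```python
# Build the 2^J x 2^K DINA emission matrix B from (Q, s, g).
# B[y, a] = P(Y = y | alpha = a); rows/columns are in lexicographic order.
import itertools
import numpy as np

def emission_matrix(Q, s, g):
    J, K = Q.shape
    alphas = list(itertools.product([0, 1], repeat=K))   # 2^K attribute profiles
    ys = list(itertools.product([0, 1], repeat=J))       # 2^J response patterns
    B = np.empty((2**J, 2**K))
    for col, a in enumerate(alphas):
        a = np.array(a)
        # eta_j = 1 iff alpha has every attribute required by item j
        eta = (Q @ a >= Q.sum(axis=1)).astype(int)
        theta = (1 - s)**eta * g**(1 - eta)              # P(Y_j = 1 | alpha)
        for row, y in enumerate(ys):
            y = np.array(y)
            B[row, col] = np.prod(theta**y * (1 - theta)**(1 - y))
    return B

Q = np.array([[1, 0], [0, 1], [1, 1]])
s = np.array([0.1, 0.2, 0.1])
g = np.array([0.2, 0.1, 0.3])
B = emission_matrix(Q, s, g)
assert np.allclose(B.sum(axis=0), 1.0)   # each column is a probability distribution
```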
As mentioned in Sect. 2.3, we have a bipartition of the set \({\mathbb {J}}=\{1,2,\ldots ,J\}\) into two disjoint, nonempty subsets \({\mathbb {J}}_1=\{1,2,\ldots ,K\}\) and \({\mathbb {J}}_2=\{K+1,\ldots ,J\}\). Then, let \(\varvec{Y}_t=(\varvec{Y}_{t}^{{\mathbb {J}}_1\top },\varvec{Y}_{t}^{{\mathbb {J}}_2\top })^{\top }\), where \(\varvec{Y}_{t}^{{\mathbb {J}}_1}=(Y_{1t},\ldots , Y_{Kt})^{\top }\) and \(\varvec{Y}_{t}^{{\mathbb {J}}_2}=(Y_{(K+1)t},\ldots , Y_{Jt})^{\top }\). Assuming that the \(\varvec{Q}\) matrix has the form shown in condition (A1), let
$$\varvec{Q}=\begin{pmatrix} \varvec{Q}^{{\mathbb {J}}_1}\\ \varvec{Q}^{{\mathbb {J}}_2}\end{pmatrix},$$
and without loss of generality, let \(\varvec{Q}^{{\mathbb {J}}_1}=\varvec{I}_{K}\) and \(\varvec{Q}^{{\mathbb {J}}_2}=\varvec{Q}^{*}\). Then, the emission probability can be decomposed into two parts, since the components of \(\varvec{Y}_t\) are independent given profile \(\varvec{\alpha }_t\):
$$P(\varvec{Y}_t=\varvec{y}_t \mid \varvec{\alpha }_t, \varvec{Q}, \varvec{s}, \varvec{g})=P(\varvec{Y}_t^{{\mathbb {J}}_1}=\varvec{y}_t^{{\mathbb {J}}_1} \mid \varvec{\alpha }_t, \varvec{I}_K, \varvec{s}, \varvec{g})\, P(\varvec{Y}_t^{{\mathbb {J}}_2}=\varvec{y}_t^{{\mathbb {J}}_2} \mid \varvec{\alpha }_t, \varvec{Q}^{*}, \varvec{s}, \varvec{g}).$$
Similarly, the emission matrix \(\varvec{B}\) can be decomposed into two parts. Let \(\varvec{B}^{{\mathbb {J}}_1}\) be a matrix of size \(2^K \times 2^K\) whose \(\varvec{y}_t^{{\mathbb {J}}_1}\)-th row and \(\varvec{\alpha }_t\)-th column element is \(P(\varvec{Y}_t^{{\mathbb {J}}_1}=\varvec{y}_t^{{\mathbb {J}}_1} \vert \varvec{\alpha }_t, \varvec{I}_K, \varvec{s}, \varvec{g})\), and let \(\varvec{B}^{{\mathbb {J}}_2}\) be a matrix of size \(2^{J-K} \times 2^K\) whose \(\varvec{y}_t^{{\mathbb {J}}_2}\)-th row and \(\varvec{\alpha }_t\)-th column element is \(P(\varvec{Y}_t^{{\mathbb {J}}_2}=\varvec{y}_t^{{\mathbb {J}}_2} \vert \varvec{\alpha }_t, \varvec{Q}^{*}, \varvec{s}, \varvec{g})\). Therefore, the emission matrix \(\varvec{B}\) can be decomposed as
$$\varvec{B}=\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2},$$
where \(\odot \) denotes the column-wise tensor product, which is defined next.
Definition 4
(Khatri–Rao product; Khatri and Rao (1968)) Given matrices \(\varvec{U} \in {\mathbb {R}}^{m_1 \times n}\) and \(\varvec{V} \in {\mathbb {R}}^{m_2 \times n}\) with columns \(\varvec{u}_{1},\ldots ,\varvec{u}_{n}\) and \(\varvec{v}_{1},\ldots ,\varvec{v}_{n}\), respectively, their Khatri–Rao tensor product, denoted by \(\varvec{U} \odot \varvec{V}\), is the matrix of size \((m_1 m_2) \times n\)
$$\varvec{U} \odot \varvec{V}=\left( \varvec{u}_{1} \otimes \varvec{v}_{1},\ \ldots ,\ \varvec{u}_{n} \otimes \varvec{v}_{n}\right) .$$
Remark 8
If \({\textbf{u}}\) and \({\textbf{v}}\) are vectors, then the Khatri–Rao product and Kronecker product are identical, i.e., \({\textbf{u}} \odot {\textbf{v}}={\textbf{u}} \otimes {\textbf{v}}\).
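A direct implementation of Definition 4 makes Remark 8 easy to verify (a sketch; recent SciPy versions also provide `scipy.linalg.khatri_rao`):

```python
# Column-wise (Khatri-Rao) product of Definition 4: column i of the result
# is the Kronecker product of column i of U with column i of V.
import numpy as np

def khatri_rao(U, V):
    assert U.shape[1] == V.shape[1], "U and V must have the same number of columns"
    return np.column_stack([np.kron(U[:, i], V[:, i]) for i in range(U.shape[1])])

U = np.arange(6.0).reshape(2, 3)
V = np.arange(9.0).reshape(3, 3)
W = khatri_rao(U, V)
assert W.shape == (6, 3)   # (m1 * m2) x n

# Remark 8: for single-column matrices (vectors), Khatri-Rao equals Kronecker
u = np.array([[1.0], [2.0]])
v = np.array([[3.0], [4.0], [5.0]])
assert np.allclose(khatri_rao(u, v), np.kron(u, v))
```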
We can represent \(\varvec{B}^{{\mathbb {J}}_1}\) as the Kronecker product of K \(2\times 2\) sub-matrices (Chen, Culpepper, & Liang, 2020):
$$\varvec{B}^{{\mathbb {J}}_1}=\bigotimes _{j=1}^{K}\varvec{B}^{{\mathbb {J}}_1}_j,\quad \varvec{B}^{{\mathbb {J}}_1}_j=\begin{pmatrix} 1-g_j & s_j\\ g_j & 1-s_j\end{pmatrix},$$
where the columns of \(\varvec{B}^{{\mathbb {J}}_1}_j\) correspond to \(\alpha _j=0,1\) and the rows to \(Y_j=0,1\). The condition ‘\(g_j\ne 1-s_j\)’ in Theorem 5 implies that \(rank(\varvec{B}^{{\mathbb {J}}_1}_j)=2\) for all j, since \(\det (\varvec{B}^{{\mathbb {J}}_1}_j)=1-g_j-s_j\). Then, by the rank property of the Kronecker product, we have \(rank(\varvec{B}^{{\mathbb {J}}_1})=\prod _{j=1}^{K}rank( \varvec{B}^{{\mathbb {J}}_1}_j)=2^K\), which implies that \(\varvec{B}^{{\mathbb {J}}_1}\) is a full rank matrix.
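This rank argument can be checked numerically; in the sketch below the \(2\times 2\) factors are assumed to have columns indexed by \(\alpha _j=0,1\) and rows by \(Y_j=0,1\), with illustrative parameter values:

```python
# B^{J1} as a Kronecker product of K 2x2 factors; full rank iff every
# factor is nonsingular, i.e., g_j != 1 - s_j for every j.
from functools import reduce
import numpy as np

def factor(s_j, g_j):
    # columns: alpha_j = 0, 1; rows: Y_j = 0, 1
    return np.array([[1 - g_j, s_j],
                     [g_j, 1 - s_j]])

K = 3
s = [0.1, 0.2, 0.3]
g = [0.2, 0.25, 0.15]
B_J1 = reduce(np.kron, [factor(sj, gj) for sj, gj in zip(s, g)])
assert np.linalg.matrix_rank(B_J1) == 2**K   # g_j != 1 - s_j for every j

# If g_j = 1 - s_j for some j, that factor is singular and the rank drops
B_bad = reduce(np.kron, [factor(0.1, 0.9), factor(0.2, 0.25), factor(0.3, 0.15)])
assert np.linalg.matrix_rank(B_bad) < 2**K
```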
For the decomposition in Eq. (B4), we have
$$\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2}=\begin{pmatrix} \varvec{B}^{{\mathbb {J}}_1}\varvec{D}_{1}(\varvec{B}^{{\mathbb {J}}_2})\\ \vdots \\ \varvec{B}^{{\mathbb {J}}_1}\varvec{D}_{2^{J-K}}(\varvec{B}^{{\mathbb {J}}_2})\end{pmatrix},$$
where \(\varvec{D}_{k}(\varvec{B}^{{\mathbb {J}}_2})\) denotes the diagonal matrix with the k-th row of \(\varvec{B}^{{\mathbb {J}}_2}\) on its diagonal. Here \(\varvec{D}_{1}(\varvec{B}^{{\mathbb {J}}_2})\) has full rank since \(s_j\), \(1-s_j\), \(g_j\), \(1-g_j\) are nonzero, which implies that
$$rank(\varvec{B}^{{\mathbb {J}}_1}\varvec{D}_{1}(\varvec{B}^{{\mathbb {J}}_2}))=2^K;$$
then \(\det (\varvec{B}^{{\mathbb {J}}_1}\varvec{D}_{1}(\varvec{B}^{{\mathbb {J}}_2}))\) is a nonzero minor of order \(2^K\) of \(\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2}\), so we have
$$rank(\varvec{B})=rank(\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2})=2^K.$$
Also \(\pi _c>0\) for all c in Theorem 5. Therefore, the strict identifiability condition in Theorem 1 is satisfied, and the restricted HMM with \(T\ge 3\) is identified. This completes the proof of part (a) of Theorem 5.
Without the condition ‘\(g_j\ne 1-s_j\)’, \(\varvec{B}\) still has full column rank unless there exists at least one \(j^*\) such that \(g_{j^*} = 1-s_{j^*}\). The dimension of this exceptional set is less than the dimension of \(\varvec{\Omega }(\varvec{\pi }, \varvec{\omega }, \varvec{s}, \varvec{g}, \varvec{Q})\), so it has Lebesgue measure zero. Therefore, the generic identifiability condition in Theorem 2 is satisfied, and the restricted HMM with \(T\ge 3\) is generically identified. This completes the proof of part (a) of Theorem 6.
Appendix C: Proof of Part (b) of Theorems 1 and 2 (\(T=2\) case)
The proof is based on Kruskal (1977) for the uniqueness of three-way arrays and its application to the identifiability conditions of three-variate latent class models discussed in Allman et al. (2009).
We start by representing the marginal distribution of \((\varvec{Y}_1, \varvec{Y}_2)^\top \) as a three-way array, decomposing \(\varvec{Y}_2\) into two parts as shown in Eq. (B3).
As shown in Bonhomme et al. (2016), we let \(\varvec{A}=\varvec{B}\cdot diag(\varvec{\pi })\cdot \varvec{\omega }\cdot diag(\varvec{\pi })^{-1}\) denote the distribution of \(\varvec{Y}_1\) given values of \(\varvec{\alpha }_2\) (the attribute profile at time point 2). Then, identifiability is equivalent to the uniqueness of the decomposition of the following tensor (Kruskal, 1977):
$$\varvec{T}=\sum _{\varvec{\alpha }_2}\tilde{\varvec{A}}_{\varvec{\alpha }_2}\otimes \varvec{B}^{{\mathbb {J}}_1}_{\varvec{\alpha }_2}\otimes \varvec{B}^{{\mathbb {J}}_2}_{\varvec{\alpha }_2},$$
where \(\varvec{A}_{\varvec{\alpha }_2}\), \(\varvec{B}^{{\mathbb {J}}_1}_{\varvec{\alpha }_2}\), \(\varvec{B}^{{\mathbb {J}}_2}_{\varvec{\alpha }_2}\) are the \(\varvec{\alpha }_2\)-th columns of \(\varvec{A}\), \(\varvec{B}^{{\mathbb {J}}_1}\), \(\varvec{B}^{{\mathbb {J}}_2}\), respectively, and \(\tilde{\varvec{A}}_{\varvec{\alpha }_2}=\pi _{\varvec{\alpha }_2} \varvec{A}_{\varvec{\alpha }_2}\).
Next, we give the definition of Kruskal rank and state the theorem in Kruskal (1977) for our setting.
Definition 5
For a matrix \(\varvec{M}\), the Kruskal rank of \(\varvec{M}\), denoted \(rank_K(\varvec{M})\), is the largest number I such that every set of I columns of \(\varvec{M}\) is linearly independent.
Remark 9
Compared with the rank of a matrix \(\varvec{M}\), we have \(rank_K(\varvec{M})\le rank(\varvec{M})\). If \(\varvec{M}\) has full column rank, then the equality holds.
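Kruskal rank can be computed by brute force straight from Definition 5; the sketch below is fine for the small matrices in these examples, though the cost grows combinatorially in the number of columns:

```python
# Brute-force Kruskal rank: largest I such that every set of I columns
# is linearly independent (Definition 5).
import itertools
import numpy as np

def kruskal_rank(M, tol=1e-10):
    n = M.shape[1]
    for i in range(n, 0, -1):
        if all(np.linalg.matrix_rank(M[:, list(cols)], tol=tol) == i
               for cols in itertools.combinations(range(n), i)):
            return i
    return 0

M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
assert kruskal_rank(M) == 2   # every pair of columns is independent

M2 = np.array([[1.0, 2.0, 0.0],
               [0.0, 0.0, 1.0]])
assert kruskal_rank(M2) == 1  # columns 1 and 2 are dependent

# Remark 9: Kruskal rank never exceeds the ordinary rank
assert kruskal_rank(M) <= np.linalg.matrix_rank(M)
```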
Theorem 7
(Kruskal (1977)) If
$$rank_K(\tilde{\varvec{A}})+rank_K(\varvec{B}^{{\mathbb {J}}_1})+rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2 \cdot 2^K +2, \qquad \mathrm{(C3)}$$
then the tensor decomposition of \(\varvec{T}\) is unique up to simultaneous permutation and rescaling of the rows.
Since \(\varvec{\pi }\) has all positive entries, we have \(rank_K(\tilde{\varvec{A}})=rank_K(\varvec{A})\). Moreover, \(\varvec{A}\), \(\varvec{B}^{{\mathbb {J}}_1}\) and \(\varvec{B}^{{\mathbb {J}}_2}\) are all stochastic matrices with column sum 1, so the decomposition of the tensor \(\varvec{T}\) is unique up to state label swapping if condition (C3) in Theorem 7 is satisfied.
Bonhomme et al. (2016) established strict identifiability of HMMs for \(T>2\). We next establish sufficient conditions for the identifiability of the restricted HMM with \(T=2\). Since \(\varvec{A}=\varvec{B}\cdot diag(\varvec{\pi })\cdot \varvec{\omega }\cdot diag(\varvec{\pi })^{-1}\), the rank conditions on \(\varvec{B}^{{\mathbb {J}}_1}\) and \(\varvec{\omega }\) imply that \(\varvec{A}\) also has full column rank \(2^K\). Therefore, given \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) and \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\), the HMM with \(T=2\) is identified by Theorem 7. This completes the proof of part (b) of Theorem 1.
Following a similar idea to the proof of Theorem 2 in Appendix A, we need to show that \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\), \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\) and \(rank(\varvec{\omega })=2^K\) hold almost everywhere in the parameter space \(\varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}\), which implies that the restricted HMM with \(T=2\) is generically identified.
Let \(f^\prime (\varvec{B}, \varvec{\omega })=\sum _{M}([\varvec{B} \cdot \varvec{\omega }]_M)^2:\ \varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}\rightarrow {\mathbb {R}}\); then the zero set of \(f^\prime (\varvec{B}, \varvec{\omega })\) is
$$\varvec{Z}_f^\prime =\{(\varvec{\omega },\varvec{B})\in \varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}:\ f^\prime (\varvec{B}, \varvec{\omega })=0\}.$$
As shown in Appendix B, \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) and \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\) imply \(rank(\varvec{B})=rank(\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2})=2^K\). By Proposition 2, we know that \(f^\prime (\varvec{B}, \varvec{\omega })\) is not the zero function. Then, by Lemma 1, the zero set \(\varvec{Z}_f^\prime \) has measure zero within \(\varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}\). So the restricted HMM with \(T= 2\) is generically identified. This completes the proof of part (b) of Theorem 2.
Appendix D: Proof of Part (b) of Theorems 5 and 6 (\(T=2\) case)
We first introduce the following two propositions.
Proposition 4
\(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) if and only if \(\varvec{Q}^{{\mathbb {J}}_1}=\varvec{I}_K\) and \(g_j\ne 1-s_j\) for \(j\in \{1,\dots ,K\}\).
Proof
According to Eq. (B5), \(\varvec{Q}^{{\mathbb {J}}_1}=\varvec{I}_K\) and \(g_j\ne 1-s_j\) for \(j\in \{1,\dots ,K\}\) imply \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\). On the other hand, \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) implies that the columns of \(\varvec{B}^{{\mathbb {J}}_1}\) are distinct. The \(\varvec{\alpha }^\top \varvec{v}=c\) column of \(\varvec{B}^{{\mathbb {J}}_1}\) is
$$\varvec{B}^{{\mathbb {J}}_1}_{c}=\bigotimes _{j=1}^{K}\begin{pmatrix} 1-\theta _{jc}\\ \theta _{jc}\end{pmatrix},\quad \theta _{jc}=(1-s_{j})^{\eta _{jc}} g_{j}^{1-\eta _{jc}},$$
where \(\eta _{jc}={\mathcal {I}}\left( \varvec{\alpha }_{t}^{\top } {\varvec{q}}_{j} \ge {\varvec{q}}_{j}^{\top } {\varvec{q}}_{j},\ \varvec{\alpha }_t^\top \varvec{v}=c\right) \), \(c=0,\ldots ,2^K-1\). Therefore, for \(k=1,\ldots ,K\), a full-rank \(\varvec{B}^{{\mathbb {J}}_1}\) requires \(\varvec{B}^{{\mathbb {J}}_1}_{0}\ne \varvec{B}^{{\mathbb {J}}_1}_{\varvec{e}_k^\top \varvec{v}}\), which holds only if there exists at least one row \(\varvec{q}_j\) in \(\varvec{Q}^{{\mathbb {J}}_1}\) satisfying \(\varvec{q}_j=\varvec{e}_k\) with \(g_j\ne 1-s_j\). Since \(\varvec{Q}^{{\mathbb {J}}_1}\) has only K rows, we must have \(\varvec{Q}^{{\mathbb {J}}_1}=\varvec{I}_K\) after a permutation of its rows, and \(g_j\ne 1-s_j\) for all \(j\in \{1,\dots ,K\}\). \(\square \)
Proposition 5
\(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\) if and only if \(\varvec{Q}^{{\mathbb {J}}_2}\) contains at least one \(\varvec{I}_K\) after a row permutation.
Proof
Similar to the proof in Appendix B, if \(\varvec{Q}^{{\mathbb {J}}_2}\) contains at least one \(\varvec{I}_K\) after a row permutation, then \(\varvec{B}^{{\mathbb {J}}_2}\) has full column rank, so \(rank_K(\varvec{B}^{{\mathbb {J}}_2})=2^K\ge 2\).
Given \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\), every two columns of \(\varvec{B}^{{\mathbb {J}}_2}\) are linearly independent according to Definition 5. Assume that for some \(k\in \{1,\ldots ,K\}\) there does not exist a row in \(\varvec{Q}^{{\mathbb {J}}_2}\) satisfying \(\varvec{q}_j=\varvec{e}_k\); then we would have \(\varvec{B}^{{\mathbb {J}}_2}_{0}=\varvec{B}^{{\mathbb {J}}_2}_{\varvec{e}_k^\top \varvec{v}}\), which contradicts the condition \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\). Therefore, \(\varvec{Q}^{{\mathbb {J}}_2}\) must contain at least one \(\varvec{I}_K\) after a row permutation. \(\square \)
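The “only if” direction of Proposition 5 can be checked numerically: if no row of \(\varvec{Q}^{{\mathbb {J}}_2}\) equals \(\varvec{e}_k\), the columns of \(\varvec{B}^{{\mathbb {J}}_2}\) for \(\varvec{\alpha }=\varvec{0}\) and \(\varvec{\alpha }=\varvec{e}_k\) coincide. The helper below recomputes DINA emission probabilities; the matrices and values are illustrative:

```python
# Compare emission-matrix columns of B^{J2} for two attribute profiles.
import itertools
import numpy as np

def column(Q, s, g, alpha):
    """P(Y = y | alpha) for all response patterns y, in lexicographic order."""
    eta = (Q @ alpha >= Q.sum(axis=1)).astype(int)   # DINA ideal responses
    theta = (1 - s)**eta * g**(1 - eta)
    return np.array([
        np.prod(theta**np.array(y) * (1 - theta)**(1 - np.array(y)))
        for y in itertools.product([0, 1], repeat=Q.shape[0])
    ])

s = np.array([0.1, 0.2]); g = np.array([0.2, 0.3])

# No row of Q2 equals e_1 = (1, 0): the two columns coincide
Q2 = np.array([[0, 1], [1, 1]])
c0 = column(Q2, s, g, np.array([0, 0]))
c1 = column(Q2, s, g, np.array([1, 0]))
assert np.allclose(c0, c1)

# With a row e_1 present, the two columns differ
Q2b = np.array([[1, 0], [0, 1]])
assert not np.allclose(column(Q2b, s, g, np.array([0, 0])),
                       column(Q2b, s, g, np.array([1, 0])))
```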
In Proposition 3, we showed that if the emission matrix \(\varvec{B}\) is identified, then the parameters \(\varvec{s}\), \(\varvec{g}\) and \(\varvec{Q}\) in the restricted HMM are also identified. Then, by Propositions 4 and 5, the conditions of part (b) of Theorems 1 and 2 are satisfied, which proves part (b) of Theorems 5 and 6 for \(T=2\).
Appendix E: Gibbs Sampling Step in Algorithm 1
The full conditional distributions of the parameters are shown as follows. For the attribute profiles \(\varvec{\alpha }_{it}\), at time point \(t=1\), given \(\varvec{\alpha }_{i2} =\varvec{\alpha }_{c2}\),
For \(1<t<T\), given \(\varvec{\alpha }_{i,t-1}=\varvec{\alpha }_{c,t-1}\) and \(\varvec{\alpha }_{i,t+1}=\varvec{\alpha }_{c',t+1}\),
At time point \(t=T\), given \(\varvec{\alpha }_{i,T-1}=\varvec{\alpha }_{c,T-1}\),
For other parameters, we have
Details about some of the prior and posterior distributions of the parameters shown above can be found in Chen, Culpepper, Wang, and Douglas (2018).
Appendix F: Proof of Theorems 3 and 4
Proof
To prove part (a) of Theorem 3, we apply Theorem 1, which requires \(rank(\varvec{B})=2^K\). In Appendix B, we decomposed the matrix \(\varvec{B}\) into two parts: \(\varvec{B}=\varvec{B}^{{\mathbb {J}}_1}\odot \varvec{B}^{{\mathbb {J}}_2}\). Since the emission probabilities in \(\varvec{B}\) are all positive due to the CDF \(\Psi (\cdot )\), it is sufficient to show that \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\). With condition (B1), we have \(\varvec{D} = (\varvec{1}_K, \varvec{I}_K, \varvec{0})\), which implies a DINA model with K skills, K items and \(\varvec{Q} = \varvec{I}_K\). Then, similar to Eq. (B5), we can rewrite the first part of the emission matrix \(\varvec{B}^{{\mathbb {J}}_1}\) as
$$\varvec{B}^{{\mathbb {J}}_1}=\bigotimes _{j=1}^{K}\varvec{B}^{{\mathbb {J}}_1}_j,\quad \varvec{B}^{{\mathbb {J}}_1}_j=\begin{pmatrix} 1-\Psi (\beta _{j,0}) & 1-\Psi (\beta _{j,0}+\beta _{j,j})\\ \Psi (\beta _{j,0}) & \Psi (\beta _{j,0}+\beta _{j,j})\end{pmatrix}.$$
Under condition (B1), we have \(\Psi (\beta _{j,0}) \ne \Psi (\beta _{j,0}+\beta _{j,j})\), which implies \(rank(\varvec{B}^{{\mathbb {J}}_1})=\prod _{j=1}^{K}rank( \varvec{B}^{{\mathbb {J}}_1}_j)=2^K\), so part (a) of Theorem 3 holds based on part (a) of Theorem 1. Also, part (a) of Theorem 4 can be proved similarly using the argument above and part (a) of Theorem 2.
To prove part (b) of Theorem 3, we again apply Theorem 1, which requires \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) and \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\). According to the proof above, we have \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) under condition (B1). Under condition (B2), we have \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\), so part (b) of Theorem 3 holds based on part (b) of Theorem 1. By a similar argument and part (b) of Theorem 2, part (b) of Theorem 4 holds under conditions (B1)–(B2). \(\square \)
Appendix G: Proof of Remark 1
We start by representing the marginal distribution of \((\varvec{Y}_1, \varvec{Y}_2)^\top \) as a three-way array, decomposing \(\varvec{Y}_1\) into two parts as shown in Eq. (B3).
As shown in Bonhomme et al. (2016), we let \(\varvec{A}^*=\varvec{B}\cdot \varvec{\omega }^\top \) denote the distribution of \(\varvec{Y}_2\) given values of \(\varvec{\alpha }_1\) (the attribute profile at time point 1). Then, identifiability is equivalent to the uniqueness of the decomposition of the following tensor (Kruskal, 1977):
$$\varvec{T}=\sum _{\varvec{\alpha }_1}\tilde{\varvec{B}}^{{\mathbb {J}}_1}_{\varvec{\alpha }_1}\otimes \varvec{B}^{{\mathbb {J}}_2}_{\varvec{\alpha }_1}\otimes \varvec{A}^{*}_{\varvec{\alpha }_1},$$
where \(\varvec{A}^{*}_{\varvec{\alpha }_1}\), \(\varvec{B}^{{\mathbb {J}}_1}_{\varvec{\alpha }_1}\), \(\varvec{B}^{{\mathbb {J}}_2}_{\varvec{\alpha }_1}\) are the \(\varvec{\alpha }_1\)-th columns of \(\varvec{A}^{*}\), \(\varvec{B}^{{\mathbb {J}}_1}\), \(\varvec{B}^{{\mathbb {J}}_2}\), respectively, and \(\tilde{\varvec{B}}^{{\mathbb {J}}_1}_{\varvec{\alpha }_1}=\pi _{1,\varvec{\alpha }_1} \varvec{B}^{{\mathbb {J}}_1}_{\varvec{\alpha }_1}\). Since \(\varvec{\pi }_1\) has all positive entries, we have \(rank_K(\tilde{\varvec{B}}^{{\mathbb {J}}_1})=rank_K(\varvec{B}^{{\mathbb {J}}_1})\). Moreover, \(\varvec{A}^{*}\), \(\varvec{B}^{{\mathbb {J}}_1}\) and \(\varvec{B}^{{\mathbb {J}}_2}\) are all stochastic matrices with column sum 1, so by Theorem 7, the decomposition of the tensor \(\varvec{T}\) is unique up to state label swapping if \(rank_K(\varvec{A}^{*})+rank_K(\varvec{B}^{{\mathbb {J}}_1})+rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2 \cdot 2^K +2\) holds.
We next establish sufficient conditions for the identifiability of the restricted HMM with \(T=2\). Since \(\varvec{A}^{*}=\varvec{B}\cdot \varvec{\omega }^\top \), the rank conditions on \(\varvec{B}^{{\mathbb {J}}_1}\) and \(\varvec{\omega }\) imply that \(\varvec{A}^{*}\) also has full column rank \(2^K\). Therefore, given \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\) and \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\), the HMM with \(T=2\) is strictly identified by Theorem 7.
Similar to the proof of Theorem 2 in Appendix A, in order to prove that the restricted HMM with \(T = 2\) is generically identified, we need to show that \(rank(\varvec{B}^{{\mathbb {J}}_1})=2^K\), \(rank_K(\varvec{B}^{{\mathbb {J}}_2})\ge 2\) and \(rank(\varvec{\omega })=2^K\) hold almost everywhere in \(\varvec{\Omega }^\prime _{\varvec{\omega }, \varvec{B}}\), which is already proved in Appendix C.
Appendix H: Problem Set of Elementary Probability Theory
The two sets of questions from R package pks (Heller & Wickelmaier, 2013) are shown as follows.
1.1 The First Set of Questions
1. A box contains 30 marbles in the following colors: 8 red, 10 black, 12 yellow. What is the probability that a randomly drawn marble is yellow?
2. A bag contains 5-cent, 10-cent, and 20-cent coins. The probability of drawing a 5-cent coin is 0.35, that of drawing a 10-cent coin is 0.25, and that of drawing a 20-cent coin is 0.40. What is the probability that the coin randomly drawn is not a 5-cent coin?
3. A bag contains 5-cent, 10-cent, and 20-cent coins. The probability of drawing a 5-cent coin is 0.20, that of drawing a 10-cent coin is 0.45, and that of drawing a 20-cent coin is 0.35. What is the probability that the coin randomly drawn is a 5-cent coin or a 20-cent coin?
4. In a school, 40% of the pupils are boys and 80% of the pupils are right-handed. Suppose that gender and handedness are independent. What is the probability of randomly selecting a right-handed boy?
5. Given a standard deck containing 32 different cards, what is the probability of not drawing a heart?
6. A box contains 20 marbles in the following colors: 4 white, 14 green, 2 red. What is the probability that a randomly drawn marble is not white?
7. A box contains 10 marbles in the following colors: 2 yellow, 5 blue, 3 red. What is the probability that a randomly drawn marble is yellow or blue?
8. What is the probability of obtaining an even number by throwing a dice?
9. Given a standard deck containing 32 different cards, what is the probability of drawing a 4 in a black suit?
10. A box contains marbles that are red or yellow, small or large. The probability of drawing a red marble is 0.70, the probability of drawing a small marble is 0.40. Suppose that the color of the marbles is independent of their size. What is the probability of randomly drawing a small marble that is not red?
11. In a garage there are 50 cars, 20 are black and 10 are diesel powered. Suppose that the color of the cars is independent of the kind of fuel. What is the probability that a randomly selected car is not black and it is diesel powered?
12. A box contains 20 marbles, 10 marbles are red, 6 are yellow and 4 are black. 12 marbles are small and 8 are large. Suppose that the color of the marbles is independent of their size. What is the probability of randomly drawing a small marble that is yellow or red?
1.2 The Second Set of Questions
1. A box contains 30 marbles in the following colors: 10 red, 14 yellow, 6 green. What is the probability that a randomly drawn marble is green?
2. A bag contains 5-cent, 10-cent, and 20-cent coins. The probability of drawing a 5-cent coin is 0.25, that of drawing a 10-cent coin is 0.60, and that of drawing a 20-cent coin is 0.15. What is the probability that the coin randomly drawn is not a 5-cent coin?
3. A bag contains 5-cent, 10-cent, and 20-cent coins. The probability of drawing a 5-cent coin is 0.35, that of drawing a 10-cent coin is 0.20, and that of drawing a 20-cent coin is 0.45. What is the probability that the coin randomly drawn is a 5-cent coin or a 20-cent coin?
4. In a school, 70% of the pupils are girls and 10% of the pupils are left-handed. Suppose that gender and handedness are independent. What is the probability of randomly selecting a left-handed girl?
5. Given a standard deck containing 32 different cards, what is the probability of not drawing a club?
6. A box contains 20 marbles in the following colors: 6 yellow, 10 red, 4 green. What is the probability that a randomly drawn marble is not yellow?
7. A box contains 10 marbles in the following colors: 5 blue, 3 red, 2 green. What is the probability that a randomly drawn marble is red or blue?
8. What is the probability of obtaining an odd number by throwing a dice?
9. Given a standard deck containing 32 different cards, what is the probability of drawing a 10 in a red suit?
10. A box contains marbles that are red or yellow, small or large. The probability of drawing a green marble is 0.40, the probability of drawing a large marble is 0.20. Suppose that the color of the marbles is independent of their size. What is the probability of randomly drawing a large marble that is not green?
11. In a garage there are 50 cars, 15 are white and 20 are diesel powered. Suppose that the color of the cars is independent of the kind of fuel. What is the probability that a randomly selected car is not white and it is diesel powered?
12. A box contains 20 marbles, 8 marbles are white, 4 are green and 8 are red. 15 marbles are small and 5 are large. Suppose that the color of the marbles is independent of their size. What is the probability of randomly drawing a large marble that is white or green?
Liu, Y., Culpepper, S.A. & Chen, Y. Identifiability of Hidden Markov Models for Learning Trajectories in Cognitive Diagnosis. Psychometrika 88, 361–386 (2023). https://doi.org/10.1007/s11336-023-09904-x