Abstract
In a hidden Markov process, two different transition matrices for the hidden and observed variables may yield the same stochastic behavior of the observed variables. Since such transition matrices cannot be distinguished in practice, we need to identify them and regard them as equivalent. We address the local equivalence problem of the hidden Markov process by using its geometrical structure. To this end, we introduce a mathematical concept that expresses the Markov process, and formulate its exponential family by using generators. The above equivalence problem is then formulated as an equivalence problem for generators. Taking this equivalence problem into account, we derive concrete parametrizations in several natural cases.
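The equivalence phenomenon described in the abstract can be checked numerically. The sketch below is a hypothetical example (the matrices are chosen for illustration, not taken from the text): `W[y]` plays the role of the transition matrices \(W_y\), the observed probability is \(\langle u_{\mathcal{X}}| W_{y_k} \cdots W_{y_1} |P\rangle\), and relabeling the hidden states by a permutation \(T\) produces a different transition-matrix family with identical observed statistics.

```python
import itertools
import numpy as np

# Illustrative parametrization: hidden alphabet X = {0, 1}, observed
# alphabet Y = {0, 1}.  W[y][x, x'] is the joint probability of emitting y
# while moving x' -> x; sum_y W[y] is column-stochastic.
W = {
    0: np.array([[0.5, 0.1],
                 [0.2, 0.3]]),
    1: np.array([[0.1, 0.4],
                 [0.2, 0.2]]),
}
p = np.array([0.6, 0.4])  # initial hidden distribution

# Relabel the hidden states by the permutation T: a different family of
# transition matrices, but the observed process is statistically identical.
T = np.array([[0.0, 1.0],
              [1.0, 0.0]])
W2 = {y: T @ Wy @ T.T for y, Wy in W.items()}
p2 = T @ p

def prob(W, p, ys):
    """P(Y_1 = y_1, ..., Y_k = y_k) = <u_X| W_{y_k} ... W_{y_1} |p>."""
    v = p.copy()
    for y in ys:
        v = W[y] @ v
    return v.sum()

# The two parametrizations cannot be distinguished from observed data.
for ys in itertools.product([0, 1], repeat=3):
    assert np.isclose(prob(W, p, ys), prob(W2, p2, ys))
```

The check uses only length-3 observation sequences, but the same identity holds for every length, since \(\langle u_{\mathcal{X}}|T = \langle u_{\mathcal{X}}|\) for a permutation \(T\).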
Notes
A \(\mathcal{Y}\)-indexed transition matrix on \(\mathcal{X}\) can be regarded as the classical version of a measuring instrument in the quantum setting [9, 10], which describes the quantum measuring process. The recent paper [13] characterizes quantum hidden Markov processes by using measuring instruments.
References
Ito, H., Amari, S.-I., Kobayashi, K.: Identifiability of hidden Markov information sources and their minimum degrees of freedom. IEEE Trans. Inf. Theory 38(2), 324–333 (1992)
Amari, S., Nagaoka, H.: Methods of information geometry. Oxford University Press, Oxford (2000)
Nakagawa, K., Kanaya, F.: On the converse theorem in statistical hypothesis testing for Markov chains. IEEE Trans. Inf. Theory 39(2), 629–633 (1993)
Nagaoka, H.: The exponential family of Markov chains and its information geometry. In: Proceedings of the 28th Symposium on Information Theory and its Applications (SITA2005), Okinawa, 20–23 Nov 2005 (2005)
Hayashi, M., Watanabe, S.: Information geometry approach to parameter estimation in Markov Chains. Ann. Stat. 44(4), 1495–1535 (2016)
Amari, S.: \(\alpha \)-divergence is unique, belonging to both \(f\)-divergence and Bregman divergence classes. IEEE Trans. Inf. Theory 55(11), 4925–4931 (2009)
Bregman, L.: The relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Comput. Math. Phys. USSR 7, 200–217 (1967)
Watanabe, S., Hayashi, M.: Finite-length analysis on tail probability for Markov chain and application to simple hypothesis testing. Ann. Appl. Probab. 27(2), 811–845 (2017)
Hayashi, M.: Quantum Information Theory. Graduate Texts in Physics, Springer, New York (2017)
Ozawa, M.: Quantum measuring processes of continuous observables. J. Math. Phys. 25, 79 (1984)
Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Springer, New York (1998)
Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Undergraduate Texts in Mathematics. Springer, New York (1960)
Hayashi, M., Yoshida, Y.: Asymptotic and non-asymptotic analysis for a hidden Markovian process with a quantum hidden system. J. Phys. A: Math. Theor. 51(33), 335304 (2018)
Seneta, E.: Non-negative Matrices and Markov Chains, 2nd edn. Springer, New York (1981)
Hayashi, M.: Information geometry approach to parameter estimation in hidden Markov model. arXiv:1705.06040 (2017)
Acknowledgements
The author is very grateful to Professor Takafumi Kanamori, Professor Vincent Y. F. Tan, and Dr. Wataru Kumagai for helpful discussions and comments. The work reported here was supported in part by the JSPS Grant-in-Aid for Scientific Research (B) No. 16KT0017 and (A) No. 17H01280, the Okawa Research Grant, and the Kayamori Foundation of Informational Science Advancement.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
Proof of Theorem 5
It is enough to discuss the one-parameter case. Since \(\mathbf{(B2)} \Rightarrow \mathbf{(B3)}\) is trivial, we will show only \(\mathbf{(B1)} \Rightarrow \mathbf{(B2)}\) and \(\mathbf{(B3)} \Rightarrow \mathbf{(B1)}\).
\(\mathbf{(B1)} \Rightarrow \mathbf{(B2)}\) Assume (B1). There exist \(A \in \mathcal{M}(\mathcal{V}_\mathcal{X})\) and \((B_{y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(A|P\rangle =0\), \(A^T|u_{\mathcal{X}}\rangle =0 \), \(B_y (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\), \((\sum _y B_y)^T |u_{\mathcal{X}}\rangle =0 \), and \(\frac{d }{d \theta } W_{\theta ,y}|_{{\theta }=0}=B_y+[W_y,A]\) for any \(y \in \mathcal{Y}\). Then,
where (a) follows from the fact that the image of \(B_y\) is included in \(\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\), and (b) follows from the properties of A. So, we obtain (B2).
\(\mathbf{(B3)} \Rightarrow \mathbf{(B1)}\) Assume (B3). We define \(\mathbf {W}_{\theta ,y}':= \mathbf {W}_{y}+ \theta \frac{d}{d\theta }\mathbf {W}_{\theta ,y}|_{\theta =0}\). So, we have \(P^{k_{\mathbf {W}}+k_{(P,\mathbf {W})}+1}[\mathbf {W}_{{\theta }}']\cdot P =P^{k_{\mathbf {W}}+k_{(P,\mathbf {W})}+1}[\mathbf {W}]\cdot P \) and
Theorem 4 guarantees that the pair of \(\mathbf {W}\) and P is equivalent to the pair of \(\mathbf {W}_\theta '\) and P. Thus, Theorem 4 guarantees that there exist an invertible map \(T_\theta \) on \(\mathcal{V}_{\mathcal{X}}\) and an element \((B_{\theta ,y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(T_\theta P=P\), \(B_{\theta ,y} (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\) and \(\mathbf {W}_{\theta ,y}'= T_\theta ^{-1}(W_y+B_{\theta ,y})T_\theta \).
Now, taking the derivative at \(\theta =0\), we have \(\frac{d}{d\theta }\mathbf {W}_{\theta ,y}'|_{\theta =0}=[W_y,A]+B_y\), where \(A:=\frac{d}{d\theta }T_\theta |_{\theta =0}\) and \(B_y:=\frac{d}{d\theta }B_{\theta ,y}|_{\theta =0}\). The condition \(T_\theta P=P\) implies that
Using (101), we have
Since the relation \((\mathbf {W}_{\theta ,y})^T|u_{\mathcal{X}}\rangle =|u_{\mathcal{X}}\rangle \) implies \((\sum _y \frac{d}{d\theta }\mathbf {W}_{\theta ,y}|_{\theta =0})^T|u_{\mathcal{X}}\rangle =0\), we have \((\sum _y ([W_y,A]+B_y))^T |u_{\mathcal{X}}\rangle =0\). Since \(B_y^T |u_{\mathcal{X}}\rangle =0\), we have \([\sum _y W_y,A]^T |u_{\mathcal{X}}\rangle =0\). So, \((\sum _y W_y)^T A^T |u_{\mathcal{X}}\rangle = A^T(\sum _y W_y)^T |u_{\mathcal{X}}\rangle = A^T |u_{\mathcal{X}}\rangle \). That is, \(A^T |u_{\mathcal{X}}\rangle \) is an eigenvector of \((\sum _y W_y)^T\) with eigenvalue 1. So, \(A^T |u_{\mathcal{X}}\rangle \) is written as \(c |u_{\mathcal{X}}\rangle \) with a constant c, i.e.,
Now, we calculate \( \frac{d }{d \theta } P^{k}[\mathbf {W}_{\theta }]\cdot P (y_1, \ldots , y_k)|_{{\theta }=0}\) by the same discussion as for (100). So, we have
where (a) follows from (102) and (104). Since \(\langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P\rangle >0\) and the LHS is zero, we have \(c=0\). Thus, we obtain (B1).
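The step above hinges on 1 being a simple eigenvalue of \((\sum_y W_y)^T\) with eigenvector \(u_{\mathcal{X}}\), which holds when \(\sum_y W_y\) is irreducible. A minimal numerical check of this fact, with an illustrative column-stochastic matrix not taken from the text:

```python
import numpy as np

# M plays the role of sum_y W_y: irreducible and column-stochastic, so the
# eigenvalue 1 of M^T is simple and its eigenvector is proportional to the
# all-ones vector u_X.  This is what forces A^T |u_X> = c |u_X> above.
M = np.array([[0.6, 0.5],
              [0.4, 0.5]])          # columns sum to 1; eigenvalues 1 and 0.1
vals, vecs = np.linalg.eig(M.T)
i = np.argmin(np.abs(vals - 1.0))   # locate the eigenvalue 1
v = np.real(vecs[:, i])
v = v / v[0]                        # normalize the first entry to 1
print(v)                            # proportional to (1, 1)
```

Simplicity of the eigenvalue 1 is what makes the constant \(c\) well defined: every other eigenvalue of \(M^T\) has modulus strictly less than 1.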
Proof of Theorem 6
It is enough to discuss the one-parameter case. Since \(\mathbf{(C2)} \Rightarrow \mathbf{(C3)}\) is trivial, we will show only \(\mathbf{(C1)} \Rightarrow \mathbf{(C2)}\) and \(\mathbf{(C3)} \Rightarrow \mathbf{(C1)}\).
\(\mathbf{(C1)} \Rightarrow \mathbf{(C2)}\) Assume (C1). There exist a real number \(c\in \mathbb {R}\), \(A \in \mathcal{M}(\mathcal{V}_\mathcal{X})\), and \((B_{y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that
for any \(y \in \mathcal{Y}\). Define the vector \(Q:= \frac{d}{d\theta } P_{\mathbf {W}_{\theta }} |_{{\theta }=0}\). Since
we have
where (a) and (b) follow from (110) and its derivative, respectively.
That is,
Since \(\Big (\sum _{y \in \mathcal{Y}} W_y\Big ) \Big (\sum _{y \in \mathcal{Y}} B_y\Big ) | P_{\mathbf {W}} \rangle =0\), we have
Due to the uniqueness of the eigenvector of \(\bigg (\sum _{y \in \mathcal{Y}} W_y\bigg )\) with eigenvalue 1, we have
with a constant \(c' \in \mathbb {R}\).
Since
we have
where (a), (b), (c), and (d) follow from (106), (114), (109), and (115), respectively.
Similar to (100), we have
Here, (a) follows from a derivation similar to (100); that is, we need to take care of the derivative of \(|P_{\mathbf {W}_\theta } \rangle \). (b) follows from (106) and (114), and (c) from (116). So, we obtain (C2).
\(\mathbf{(C3)} \Rightarrow \mathbf{(C1)}\) Assume (C3). We define \(\mathbf {W}_{\theta ,y}'\) in the same way as the proof of Theorem 5. So, similar to the proof of Theorem 5, there exist an invertible map \(T_\theta \) on \(\mathcal{V}_{\mathcal{X}}\) and \((B_{\theta ,y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(T_\theta P_{\mathbf {W}}=P_{\mathbf {W}_\theta '}\), \(B_{\theta ,y} (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\) and \(\mathbf {W}_{\theta ,y}'= T_\theta ^{-1}(W_y+B_{\theta ,y})T_\theta \).
Choosing A and \(B_y\) in the same way as the proof of Theorem 5, we have
Then, in the same way as the proof of Theorem 5, we obtain (104). Thus, we obtain (C1).
Proofs of Lemmas 11 and 10
To show Lemma 11, we prepare Lemma 18.
Lemma 18
Let \(\mathcal{V}_1\) be the direct sum space \(\mathcal{V}_2+\mathcal{V}_3\) of two vector spaces \(\mathcal{V}_2\) and \(\mathcal{V}_3\) with the condition \(\mathcal{V}_2\cap \mathcal{V}_3=\{0\}\). Let \(\mathcal{V}_4\) (\(\mathcal{V}_5\)) be a subspace of \(\mathcal{V}_2\) (\(\mathcal{V}_3\)). Assume that a linear map \(\alpha _1\) (\(\alpha _2\)) from \(\mathcal{V}_6\) to \(\mathcal{V}_2\) (\(\mathcal{V}_3\)) satisfies that (1) \(\alpha _1(\mathcal{V}_1)\cap \mathcal{V}_4 = \{0\} \) and (2) \(\alpha _2(\mathrm{Ker}\,\alpha _1)\cap \mathcal{V}_5 = \{0\} \). Define \(\alpha _3(v):= \alpha _1(v)+\alpha _2(v)\). Then, \(\alpha _3(\mathcal{V}_1)\cap (\mathcal{V}_4+\mathcal{V}_5)= \{0\}\).
Proof of Lemma 18
Suppose that \(\alpha _3(v_1)=v_4+v_5\) for \(v_1 \in \mathcal{V}_1\), \(v_4 \in \mathcal{V}_4\), and \(v_5 \in \mathcal{V}_5\). Since \(\alpha _1(v_1)-v_4 \in \mathcal{V}_2\) equals \(v_5-\alpha _2(v_1) \in \mathcal{V}_3\) and \(\mathcal{V}_2\cap \mathcal{V}_3=\{0\}\), we have \(\alpha _1(v_1)=v_4\) and \(\alpha _2(v_1)=v_5\). Condition (1) implies that \(v_4=\alpha _1(v_1)=0\). So, \(v_1 \in \mathrm{Ker}\,\alpha _1\). Then, Condition (2) and \(\alpha _2(v_1)=v_5\) yield that \(v_5=\alpha _2(v_1)=0\). Hence \(\alpha _3(v_1)=v_4+v_5=0\), which is the desired statement.\(\square \)
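Lemma 18 can be illustrated on a toy instance; the subspaces and maps below are chosen for illustration only (they are not from the text), with the ambient space \(\mathbb{R}^4\), \(\mathcal{V}_2 = \mathrm{span}(e_1,e_2)\), \(\mathcal{V}_3 = \mathrm{span}(e_3,e_4)\), and \(\alpha_1,\alpha_2\) given by the matrices `M1`, `M2` acting on \(\mathcal{V}_1 = \mathbb{R}^2\).

```python
import numpy as np

def inter_dim(U, V):
    """dim(span U ∩ span V) = rank U + rank V - rank [U V] (columns span)."""
    r = np.linalg.matrix_rank
    return r(U) + r(V) - r(np.hstack([U, V]))

M1 = np.array([[1., 0.], [1., 0.], [0., 0.], [0., 0.]])  # alpha_1 into V2
M2 = np.array([[0., 0.], [0., 0.], [0., 1.], [1., 0.]])  # alpha_2 into V3
V4 = np.array([[0.], [1.], [0.], [0.]])                  # V4, a subspace of V2
V5 = np.array([[0.], [0.], [0.], [1.]])                  # V5, a subspace of V3

ker1 = np.array([[0.], [1.]])          # Ker alpha_1 = span((0, 1))
assert inter_dim(M1, V4) == 0          # condition (1) of Lemma 18
assert inter_dim(M2 @ ker1, V5) == 0   # condition (2) of Lemma 18

M3 = M1 + M2                               # alpha_3 = alpha_1 + alpha_2
print(inter_dim(M3, np.hstack([V4, V5])))  # 0, as the lemma asserts
```

The rank identity in `inter_dim` is the standard dimension formula \(\dim(U\cap V) = \dim U + \dim V - \dim(U+V)\).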
Proof of Lemma 11
Now, we check that the space spanned by \(\hat{g}_1, \ldots , \hat{g}_{l_2+l_3} \) has intersection \(\{0\}\) with \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_{P_{\mathbf {W}}}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). For this purpose, we make some preparations. We choose the matrix \(\tilde{A}\) as the diagonal matrix with diagonal entries f(x). So, we have \(W_y(x|x')(f(x)-f(x'))=[ W_y,\tilde{A}]\). We can restrict the function f so that \(\sum _x f(x)=0\). Since \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) = \{ (f(x)-f(x')+c)_{x,x'} \}\) and \(\langle u_{\mathcal{X}}| \tilde{A} |u_{\mathcal{X}}\rangle =0\), we have
To prove the above claim, it is sufficient to show that a nonzero element of the space spanned by \(\hat{g}_1, \ldots , \hat{g}_{l_2+l_3} \) is not contained in the space \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). If a nonzero element were contained in this space, its matrix components with \(y=y_0,y_1\) would be given as those of an element of the space \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). To exclude this possibility, we regard \(\bar{g}_{j,y_0}\) and \(\bar{g}_{j,y_1}\) as elements of \(\mathcal{G}(\{y_0,y_1\},\mathcal{X}^2)\). Then, due to (119), it is sufficient to show that the space spanned by \(\bar{g}_{1,y_0}, \ldots , \bar{g}_{d,y_0}\), \(\bar{g}_{1,y_1}, \ldots , \bar{g}_{d^2-d,y_1}\) has intersection \(\{0\}\) with the space \(\{(\alpha _{y_0}(A),\alpha _{y_1}(A))\mid \langle u_{\mathcal{X}}| A|u_{\mathcal{X}}\rangle =0\}\). To show this statement, we apply Lemma 18 to the case when \(\mathcal{V}_2\) and \(\mathcal{V}_3\) are the set of traceless matrices, \(\mathcal{V}_4\) is the space spanned by \(\bar{g}_{1,y_0}, \ldots , \bar{g}_{d,y_0}\), \(\mathcal{V}_5\) is the space spanned by \(\bar{g}_{1,y_1}, \ldots , \bar{g}_{d^2-d,y_1}\), \(\alpha _1\) is the map \(\alpha _{y_0}\), and \(\alpha _2\) is the map \(\alpha _{y_1}\). Since \(\alpha _{y_0}\) is injective on \(\{ A \in \mathcal{M}(\mathcal{V}_\mathcal{X}) \mid A^T |u_{\mathcal{X}}\rangle =0 \}\), whose dimension is the same as that of the image of \(\alpha _{y_0}\), due to the construction of \(g_{j,y_0}\), we find that the map \(\alpha _{y_0}\) satisfies the condition for \(\alpha _1\). So, we obtain the desired statement. \(\square \)
Proof of Lemma 10
Assume the condition \([W_{y_0},A]=0\). Then, \(A^T\) needs to have common eigenvectors with \(W_{y_0}\). Due to the condition \(a^j \ne 0\) for any j, the eigenspace of \(A^T\) containing \(u_{\mathcal{X}}\) needs to be the whole space. So, \(A^T\) is zero, which implies condition (1) of Condition E3.
Let A be an element of the kernel of the map \(A\mapsto ([W_{y_0},A],[W_{y_1},A])\). Then, an eigenspace of \(A^T\) is spanned by a subset of \(\{f_i\}\). It is also spanned by a subset of \(\{f_i'\}\). To satisfy both conditions, the eigenspace needs to be the whole space. So, \(A^T\) is zero, which implies condition (2) of Condition E3.\(\square \)
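The commutant argument behind both conditions can be illustrated numerically. The sketch below uses a matrix chosen for illustration (not from the text): when \(W\) has distinct eigenvalues, the solution space of \([W,A]=0\) is \(n\)-dimensional, and intersecting it with \(\{A : A^T u = 0\}\) for a vector \(u\) overlapping every eigenvector of \(W\) leaves only \(A=0\).

```python
import numpy as np

n = 3
W = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.6, 0.2],
              [0.2, 0.2, 0.7]])       # eigenvalues 1, 0.5, 0.3 (distinct)

# [W, A] = 0 as a linear system in vec(A) (column-major convention):
# (I (x) W - W^T (x) I) vec(A) = 0.
K = np.kron(np.eye(n), W) - np.kron(W.T, np.eye(n))
commutant_dim = np.sum(np.linalg.svd(K, compute_uv=False) < 1e-10)

# Add the constraint A^T u = 0, i.e. u^T A = 0, i.e. (I (x) u^T) vec(A) = 0.
# u = (1, 2, 3) has nonzero overlap with every eigenvector of this W.
u = np.array([1.0, 2.0, 3.0])
C = np.kron(np.eye(n), u[None, :])
joint_dim = np.sum(np.linalg.svd(np.vstack([K, C]), compute_uv=False) < 1e-10)

print(commutant_dim, joint_dim)      # 3 and 0
```

The commutant dimension 3 equals \(n\) because the commutant of a matrix with distinct eigenvalues consists exactly of polynomials in that matrix.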
Cite this article
Hayashi, M. Local equivalence problem in hidden Markov model. Info. Geo. 2, 1–42 (2019). https://doi.org/10.1007/s41884-019-00016-z