Local equivalence problem in hidden Markov model

Research Paper, published in Information Geometry

Abstract

In a hidden Markov process, two different transition matrices for the hidden and observed variables may yield the same stochastic behavior of the observed variables. Since such transition matrices cannot be distinguished, in practice we need to identify them and regard them as equivalent. We address the equivalence problem for hidden Markov processes in a local neighborhood by using the geometrical structure of hidden Markov processes. For this aim, we introduce a mathematical concept to express a Markov process, and formulate its exponential family by using generators. The above equivalence problem is then formulated as an equivalence problem for generators. Taking this equivalence problem into account, we derive concrete parametrizations in several natural cases.


Notes

  1. A \(\mathcal{Y}\)-indexed transition matrix on \(\mathcal{X}\) can be regarded as the classical version of the measuring instrument in the quantum setting [9, 10], which describes a quantum measuring process. The recent paper [13] characterizes quantum hidden Markov processes by using measuring instruments.

  2. For the Perron–Frobenius eigenvalue and Perron–Frobenius eigenvector, see [11, Theorem 3.1] and [14].

References

  1. Ito, H., Amari, S.-I., Kobayashi, K.: Identifiability of hidden Markov information sources and their minimum degrees of freedom. IEEE Trans. Inf. Theory 38(2), 324–333 (1992)

  2. Amari, S., Nagaoka, H.: Methods of Information Geometry. Oxford University Press, Oxford (2000)

  3. Nakagawa, K., Kanaya, F.: On the converse theorem in statistical hypothesis testing for Markov chains. IEEE Trans. Inf. Theory 39(2), 629–633 (1993)

  4. Nagaoka, H.: The exponential family of Markov chains and its information geometry. In: Proceedings of the 28th Symposium on Information Theory and its Applications (SITA2005), Okinawa, 20–23 Nov 2005

  5. Hayashi, M., Watanabe, S.: Information geometry approach to parameter estimation in Markov chains. Ann. Stat. 44(4), 1495–1535 (2016)

  6. Amari, S.: \(\alpha \)-divergence is unique, belonging to both \(f\)-divergence and Bregman divergence classes. IEEE Trans. Inf. Theory 55(11), 4925–4931 (2009)

  7. Bregman, L.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)

  8. Watanabe, S., Hayashi, M.: Finite-length analysis on tail probability for Markov chain and application to simple hypothesis testing. Ann. Appl. Probab. 27(2), 811–845 (2017)

  9. Hayashi, M.: Quantum Information Theory. Graduate Texts in Physics. Springer, New York (2017)

  10. Ozawa, M.: Quantum measuring processes of continuous observables. J. Math. Phys. 25, 79–87 (1984)

  11. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Springer, New York (1998)

  12. Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Undergraduate Texts in Mathematics. Springer, New York (1960)

  13. Hayashi, M., Yoshida, Y.: Asymptotic and non-asymptotic analysis for a hidden Markovian process with a quantum hidden system. J. Phys. A: Math. Theor. 51(33), 335304 (2018)

  14. Seneta, E.: Non-negative Matrices and Markov Chains, 2nd edn. Springer, New York (1981)

  15. Hayashi, M.: Information geometry approach to parameter estimation in hidden Markov model. arXiv:1705.06040 (2017)

Acknowledgements

The author is very grateful to Professor Takafumi Kanamori, Professor Vincent Y. F. Tan, and Dr. Wataru Kumagai for helpful discussions and comments. The work reported here was supported in part by the JSPS Grant-in-Aid for Scientific Research (B) No. 16KT0017 and (A) No. 17H01280, the Okawa Research Grant, and the Kayamori Foundation of Informational Science Advancement.

Author information

Correspondence to Masahito Hayashi.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.


Appendices

Proof of Theorem 5

It is enough to discuss the one-parameter case. Since \(\mathbf{(B2)} \Rightarrow \mathbf{(B3)}\) is trivial, we will show only \(\mathbf{(B1)} \Rightarrow \mathbf{(B2)}\) and \(\mathbf{(B3)} \Rightarrow \mathbf{(B1)}\).

\(\mathbf{(B1)} \Rightarrow \mathbf{(B2)}\) Assume (B1). There exist \(A \in \mathcal{M}(\mathcal{V}_\mathcal{X})\) and \((B_{y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(A|P\rangle =0\), \(A^T|u_{\mathcal{X}}\rangle =0 \), \(B_y (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\), \((\sum _y B_y)^T |u_{\mathcal{X}}\rangle =0 \), and \(\frac{d }{d \theta } W_{\theta ,y}|_{{\theta }=0}=B_y+[W_y,A]\) for any \(y \in \mathcal{Y}\). Then,

$$\begin{aligned}&\frac{d }{d \theta } P^{k}[\mathbf {W}_{\varvec{\theta }}]\cdot P (y_1, \ldots , y_k)|_{{\theta }=0} =\frac{d }{d \theta } (\langle u_{\mathcal{X}}| W_{\theta ,y_k}W_{\theta ,y_{k-1}} \ldots W_{\theta ,y_1} |P\rangle )|_{{\theta }=0} \nonumber \\&\quad = \left\langle u_{\mathcal{X}}\left| \left( \frac{d }{d \theta } W_{\theta ,y_k}|_{{\theta }=0}\right) W_{y_{k-1}} \ldots W_{y_1} \right| P\right\rangle \nonumber \\&\qquad + \left\langle u_{\mathcal{X}}\left| W_{y_k} \left( \frac{d }{d \theta } W_{\theta ,y_{k-1}}|_{{\theta }=0}\right) \ldots W_{y_1} \right| P\right\rangle \nonumber \\&\qquad + \cdots + \left\langle u_{\mathcal{X}}\left| W_{y_k} W_{y_{k-1}} \ldots \left( \frac{d }{d \theta } W_{\theta ,y_1}|_{{\theta }=0}\right) \right| P\right\rangle \end{aligned}$$
(98)
$$\begin{aligned}&= \langle u_{\mathcal{X}}| B_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P\rangle + \langle u_{\mathcal{X}}| W_{y_k} B_{y_{k-1}} \ldots W_{y_1} |P\rangle \nonumber \\&\qquad + \cdots + \langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots B_{y_1} |P\rangle \nonumber \\&\qquad + \langle u_{\mathcal{X}}| [W_{y_k},A] W_{y_{k-1}} \ldots W_{y_1} |P\rangle + \langle u_{\mathcal{X}}| W_{y_k} [W_{y_{k-1}},A] \ldots W_{y_1} |P\rangle \nonumber \\&\qquad + \cdots + \langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots [W_{y_1},A] |P\rangle \end{aligned}$$
(99)
$$\begin{aligned}&{\mathop {=}\limits ^{(a)}} -\langle u_{\mathcal{X}}| A W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P\rangle +\langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1}A |P\rangle {\mathop {=}\limits ^{(b)}} 0, \end{aligned}$$
(100)

where (a) follows from the fact that the image of \(B_y\) is included in \(\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\), combined with the telescoping cancellation of the commutator terms stated below, and (b) follows from the properties of A. So, we obtain (B2).
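Here, the commutator terms cancel via the telescoping identity

$$\begin{aligned} \sum _{i=1}^{k} W_{y_k} \ldots W_{y_{i+1}} [W_{y_i},A] W_{y_{i-1}} \ldots W_{y_1} = W_{y_k} W_{y_{k-1}} \ldots W_{y_1} A - A\, W_{y_k} W_{y_{k-1}} \ldots W_{y_1} , \end{aligned}$$

which follows because adjacent summands cancel pairwise, leaving only the two boundary terms.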

\(\mathbf{(B3)} \Rightarrow \mathbf{(B1)}\) Assume (B3). We define \(\mathbf {W}_{\theta ,y}':= \mathbf {W}_{y}+ \theta \frac{d}{d\theta }\mathbf {W}_{\theta ,y}|_{\theta =0}\). Then, we have \(P^{k_{\mathbf {W}}+k_{(P,\mathbf {W})}+1}[\mathbf {W}_{{\theta }}']\cdot P =P^{k_{\mathbf {W}}+k_{(P,\mathbf {W})}+1}[\mathbf {W}]\cdot P \) and

$$\begin{aligned} \lim _{\theta \rightarrow 0} \frac{\mathbf {W}_{\theta ,y}-\mathbf {W}_{\theta ,y}'}{\theta }=0 . \end{aligned}$$
(101)

Theorem 4 guarantees that the pair of \(\mathbf {W}\) and P is equivalent to the pair of \(\mathbf {W}_\theta '\) and P; hence, by the same theorem, there exist an invertible map \(T_\theta \) on \(\mathcal{V}_{\mathcal{X}}\) and an element \((B_{\theta ,y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(T_\theta P=P\), \(B_{\theta ,y} (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\), and \(\mathbf {W}_{\theta ,y}'= T_\theta ^{-1}(W_y+B_{\theta ,y})T_\theta \).

Now, taking the derivative at \(\theta =0\), we have \(\frac{d}{d\theta }\mathbf {W}_{\theta ,y}'|_{\theta =0}=[W_y,A]+B_y\), where \(A:=\frac{d}{d\theta }T_\theta |_{\theta =0}\) and \(B_y:=\frac{d}{d\theta }B_{\theta ,y}|_{\theta =0}\). The condition \(T_\theta P=P\) implies that

$$\begin{aligned} A|P\rangle =0. \end{aligned}$$
(102)

Using (101), we have

$$\begin{aligned} \frac{d}{d\theta }\mathbf {W}_{\theta ,y}|_{\theta =0} =[W_y,A]+B_y . \end{aligned}$$
(103)

Since the relation \((\sum _y \mathbf {W}_{\theta ,y})^T|u_{\mathcal{X}}\rangle =|u_{\mathcal{X}}\rangle \) implies \((\sum _y \frac{d}{d\theta }\mathbf {W}_{\theta ,y}|_{\theta =0})^T|u_{\mathcal{X}}\rangle =0\), we have \((\sum _y [W_y,A]+B_y)^T |u_{\mathcal{X}}\rangle =0\). Since \((\sum _y B_y)^T |u_{\mathcal{X}}\rangle =0\), we have \( ([\sum _y W_y,A])^T |u_{\mathcal{X}}\rangle =0\). So, \((\sum _y W_y)^T A^T |u_{\mathcal{X}}\rangle = A^T(\sum _y W_y)^T |u_{\mathcal{X}}\rangle = A^T |u_{\mathcal{X}}\rangle \). That is, \(A^T |u_{\mathcal{X}}\rangle \) is an eigenvector of \((\sum _y W_y)^T\) with eigenvalue 1. By the uniqueness of this eigenvector, \(A^T |u_{\mathcal{X}}\rangle \) is written as \(c |u_{\mathcal{X}}\rangle \) with a constant c, i.e.,

$$\begin{aligned} A^T |u_{\mathcal{X}}\rangle =c |u_{\mathcal{X}}\rangle . \end{aligned}$$
(104)

Now, we calculate \( \frac{d }{d \theta } P^{k}[\mathbf {W}_{\varvec{\theta }}]\cdot P (y_1, \ldots , y_k)|_{{\theta }=0}\) by the same argument as in (100). Then, we have

$$\begin{aligned}&\frac{d }{d \theta } P^{k}[\mathbf {W}_{\varvec{\theta }}]\cdot P (y_1, \ldots , y_k)|_{{\theta }=0} \nonumber \\&\qquad = -\langle u_{\mathcal{X}}| A W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P\rangle + \langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1}A |P\rangle \nonumber \\&\qquad {\mathop {=}\limits ^{(a)}} -c \langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P\rangle , \end{aligned}$$
(105)

where (a) follows from (102) and (104). Since \(\langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P\rangle >0\) and the LHS is zero by (B3), we have \(c=0\). Thus, we obtain (B1).
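The direction \(\mathbf{(B1)} \Rightarrow \mathbf{(B2)}\) can also be checked numerically. The following is a minimal sketch, not part of the paper: the two-state model, the choice \(B_y=0\), and all variable names are illustrative assumptions. It perturbs a toy \(\mathcal{Y}\)-indexed transition matrix by \(\theta [W_y,A]\) with \(A|P\rangle =0\) and \(A^T|u_{\mathcal{X}}\rangle =0\), and confirms that the first derivative of every output probability vanishes.

```python
import numpy as np
import itertools

rng = np.random.default_rng(0)

# Toy Y-indexed transition matrices on X = {0,1}: entries W[y, x, x'] = W_y(x|x'),
# normalized so that sum_y W_y is column-stochastic.
W = rng.random((2, 2, 2))
W /= W.sum(axis=(0, 1), keepdims=True)

u = np.ones(2)                                  # |u_X>

# Stationary distribution P: Perron eigenvector of sum_y W_y for eigenvalue 1.
vals, vecs = np.linalg.eig(W.sum(axis=0))
P = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
P /= P.sum()

# A = v w^T with sum(v) = 0 (hence A^T u = 0) and w.P = 0 (hence A P = 0); B_y = 0.
v = np.array([1.0, -1.0])
w = np.array([P[1], -P[0]])
A = np.outer(v, w)

def prob(Ws, y_seq, p0):
    """<u_X| W_{y_k} ... W_{y_1} |p0>, the probability of the sequence (y_1, ..., y_k)."""
    for y in y_seq:
        p0 = Ws[y] @ p0
    return u @ p0

eps = 1e-6
Wp = np.array([W[y] + eps * (W[y] @ A - A @ W[y]) for y in range(2)])  # W_y + eps [W_y, A]

for ys in itertools.product(range(2), repeat=3):
    deriv = (prob(Wp, ys, P) - prob(W, ys, P)) / eps
    print(ys, f"{deriv:+.2e}")                  # all O(eps): the first derivative vanishes
```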

Proof of Theorem 6

It is enough to discuss the one-parameter case. Since \(\mathbf{(C2)} \Rightarrow \mathbf{(C3)}\) is trivial, we will show only \(\mathbf{(C1)} \Rightarrow \mathbf{(C2)}\) and \(\mathbf{(C3)} \Rightarrow \mathbf{(C1)}\).

\(\mathbf{(C1)} \Rightarrow \mathbf{(C2)}\) Assume (C1). There exist a real number \(c\in \mathbb {R}\), \(A \in \mathcal{M}(\mathcal{V}_\mathcal{X})\), and \((B_{y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that

$$\begin{aligned} A^T|u_{\mathcal{X}}\rangle =c |u_{\mathcal{X}}\rangle , \end{aligned}$$
(106)
$$\begin{aligned} B_y (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}], \end{aligned}$$
(107)
$$\begin{aligned} \left( \sum _y B_y\right) ^T |u_{\mathcal{X}}\rangle =0 \end{aligned}$$
(108)
$$\begin{aligned} \frac{d }{d \theta } W_{\theta ,y}|_{{\theta }=0} =B_y+[W_y,A] \end{aligned}$$
(109)

for any \(y \in \mathcal{Y}\). Define the vector \(Q:= \frac{d}{d\theta } P_{\mathbf {W}_{\theta }} |_{{\theta }=0}\). Since

$$\begin{aligned} \left( \sum _{y \in \mathcal{Y}} W_{\theta ,y}\right) | P_{\mathbf {W}_{\theta }} \rangle =| P_{\mathbf {W}_{\theta }} \rangle , \end{aligned}$$
(110)

we have

$$\begin{aligned}&\left( \sum _{y \in \mathcal{Y}} B_y\right) | P_{\mathbf {W}} \rangle +\left( \sum _{y \in \mathcal{Y}} W_y\right) A | P_{\mathbf {W}} \rangle -A | P_{\mathbf {W}} \rangle + \left( \sum _{y \in \mathcal{Y}} W_{y}\right) | Q \rangle \nonumber \\&\quad {\mathop {=}\limits ^{(a)}} \Big ( \Big (\sum _{y \in \mathcal{Y}} B_y\Big ) +\Big ( \Big (\sum _{y \in \mathcal{Y}} W_y\Big )A -A\Big (\sum _{y \in \mathcal{Y}} W_y\Big )\Big ) \Big ) | P_{\mathbf {W}} \rangle + \Big (\sum _{y \in \mathcal{Y}} W_{y}\Big ) | Q \rangle \nonumber \\&\quad = \Big (\sum _{y \in \mathcal{Y}} B_y+[W_y,A]\Big ) | P_{\mathbf {W}} \rangle + \Big (\sum _{y \in \mathcal{Y}} W_{y}\Big ) | Q \rangle \nonumber \\&\quad {\mathop {=}\limits ^{(b)}}| Q \rangle , \end{aligned}$$
(111)

where (a) and (b) follow from (110) and its derivative, respectively.

That is,

$$\begin{aligned} \Big (\sum _{y \in \mathcal{Y}} W_y\Big ) \Big (| Q \rangle + A |P_{\mathbf {W}} \rangle \Big ) = \Big (| Q \rangle + A |P_{\mathbf {W}} \rangle \Big ) -\Big (\sum _{y \in \mathcal{Y}} B_y\Big ) | P_{\mathbf {W}} \rangle . \end{aligned}$$
(112)

Since \(\Big (\sum _{y \in \mathcal{Y}} W_y\Big ) \Big (\sum _{y \in \mathcal{Y}} B_y\Big ) | P_{\mathbf {W}} \rangle =0\), we have

$$\begin{aligned}&\bigg (\sum _{y \in \mathcal{Y}} W_y\bigg ) \Bigg (| Q \rangle + A |P_{\mathbf {W}} \rangle -\bigg (\sum _{y \in \mathcal{Y}} B_y\bigg ) | P_{\mathbf {W}} \rangle \Bigg ) \nonumber \\&\quad = \Bigg (| Q \rangle + A |P_{\mathbf {W}} \rangle -\bigg (\sum _{y \in \mathcal{Y}} B_y\bigg ) | P_{\mathbf {W}} \rangle \Bigg ). \end{aligned}$$
(113)

Due to the uniqueness of the eigenvector of \(\bigg (\sum _{y \in \mathcal{Y}} W_y\bigg )\) with eigenvalue 1, we have

$$\begin{aligned} | Q \rangle + A |P_{\mathbf {W}} \rangle -\bigg (\sum _{y \in \mathcal{Y}} B_y\bigg ) | P_{\mathbf {W}} \rangle = c' |P_{\mathbf {W}} \rangle \end{aligned}$$
(114)

with a constant \(c' \in \mathbb {R}\).

Since

$$\begin{aligned} \sum _{y \in \mathcal{Y}} \langle u_{\mathcal{X}}| W_{\theta ,y}|P_{\mathbf {W}_{\theta }} \rangle =1, \end{aligned}$$
(115)

we have

$$\begin{aligned}&c '-c \nonumber \\&\quad = c '\left\langle u_{\mathcal{X}}\left| \left( \sum _{y \in \mathcal{Y}} W_{y}\right) \right| P_{\mathbf {W}} \right\rangle -c\langle u_{\mathcal{X}} |P_{\mathbf {W}} \rangle \nonumber \\&\quad {\mathop {=}\limits ^{(a)}} c '\left\langle u_{\mathcal{X}}\left| \left( \sum _{y \in \mathcal{Y}} W_{y}\right) \right| P_{\mathbf {W}} \right\rangle -\langle u_{\mathcal{X}}| A |P_{\mathbf {W}} \rangle \nonumber \\&\quad = \left\langle u_{\mathcal{X}}| \left( \sum _{y \in \mathcal{Y}} W_{y}\right) ( c' |P_{\mathbf {W}} \rangle -A |P_{\mathbf {W}} \rangle +\left( \sum _{y \in \mathcal{Y}} B_y\right) | P_{\mathbf {W}} \rangle \right) \nonumber \\&\quad {\mathop {=}\limits ^{(b)}} \left\langle u_{\mathcal{X}}| \left( \sum _{y \in \mathcal{Y}} W_{y}\right) |Q\right\rangle \nonumber \\&\quad = \langle u_{\mathcal{X}}| A|P_{\mathbf {W}} \rangle -\langle u_{\mathcal{X}}| A|P_{\mathbf {W}} \rangle + \left\langle u_{\mathcal{X}}| \left( \sum _{y \in \mathcal{Y}} W_{y}\right) |Q\right\rangle \nonumber \\&\quad = \left\langle u_{\mathcal{X}}| \Bigg (\bigg (\sum _{y \in \mathcal{Y}} W_y\bigg )A - A \bigg (\sum _{y \in \mathcal{Y}} W_y\bigg ) \Bigg ) |P_{\mathbf {W}} \right\rangle + \left\langle u_{\mathcal{X}}| \sum _{y \in \mathcal{Y}} W_{y} |Q\right\rangle \nonumber \\&\quad = \left\langle u_{\mathcal{X}}\left| \sum _{y \in \mathcal{Y}} B_y+[W_y,A] \right| P_{\mathbf {W}} \right\rangle + \left\langle u_{\mathcal{X}}\left| \sum _{y \in \mathcal{Y}} W_{y} \right| Q\right\rangle \nonumber \\&\quad {\mathop {=}\limits ^{(c)}} \left\langle u_{\mathcal{X}}| \sum _{y \in \mathcal{Y}} \frac{d}{d\theta }W_{\theta ,y}|_{{\theta }=0}|P_{\mathbf {W}} \right\rangle + \left\langle u_{\mathcal{X}}| \sum _{y \in \mathcal{Y}} W_{y} \Big |\frac{d}{d\theta } P_{\mathbf {W}_{\theta }} |_{{\theta }=0}\right\rangle \nonumber \\&\quad {\mathop {=}\limits ^{(d)}}0 , \end{aligned}$$
(116)

where (a), (b), (c), and (d) follow from (106), (114), (109), and (115), respectively.

Similar to (100), we have

$$\begin{aligned}&\frac{d }{d \theta } P^{k}[\mathbf {W}_{\varvec{\theta }}]\cdot P_{\mathbf {W}_\theta } (y_1, \ldots , y_k)|_{{\theta }=0} \nonumber \\&\quad {\mathop {=}\limits ^{(a)}} -\langle u_{\mathcal{X}}| A W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P_{\mathbf {W}}\rangle + \langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1}A |P_{\mathbf {W}}\rangle \nonumber \\&\qquad +\langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |Q\rangle \nonumber \\&\quad {\mathop {=}\limits ^{(b)}} -c \langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P_{\mathbf {W}}\rangle + \langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1}A |P_{\mathbf {W}}\rangle \nonumber \\&\qquad +\langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1} \Bigg ( c '|P_{\mathbf {W}} \rangle -A |P_{\mathbf {W}} \rangle +\bigg (\sum _{y \in \mathcal{Y}} B_y\bigg ) | P_{\mathbf {W}} \rangle \Bigg ) \nonumber \\&\quad {\mathop {=}\limits ^{(c)}} 0. \end{aligned}$$
(117)

Here, (a) follows from a derivation similar to (100); in contrast to (100), however, we also have to take the derivative of \(|P_{\mathbf {W}_\theta } \rangle \) into account, which produces the term containing \(|Q\rangle \). (b) follows from (106) and (114), and (c) follows from (116) together with the fact that \((\sum _{y \in \mathcal{Y}} B_y) | P_{\mathbf {W}} \rangle \) belongs to \(\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\). So, we obtain (C2).
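The mechanism behind (117) can also be seen without differentiation in the special case \(B_{\theta ,y}=0\); this is an illustrative special case, not the general statement. If \(\mathbf {W}_{\theta ,y}'=T_\theta ^{-1}W_y T_\theta \) with \(T_\theta ^T|u_{\mathcal{X}}\rangle =\lambda _\theta |u_{\mathcal{X}}\rangle \), then \(T_\theta ^{-1}|P_{\mathbf {W}}\rangle \) is, after normalization, the stationary vector of \(\sum _y \mathbf {W}_{\theta ,y}'\), and

$$\begin{aligned} \langle u_{\mathcal{X}}| \mathbf {W}_{\theta ,y_k}' \ldots \mathbf {W}_{\theta ,y_1}' |P_{\mathbf {W}_{\theta }'}\rangle = \frac{\langle u_{\mathcal{X}}| T_\theta ^{-1} W_{y_k} \ldots W_{y_1} |P_{\mathbf {W}}\rangle }{\langle u_{\mathcal{X}}| T_\theta ^{-1} |P_{\mathbf {W}}\rangle } = \langle u_{\mathcal{X}}| W_{y_k} \ldots W_{y_1} |P_{\mathbf {W}}\rangle , \end{aligned}$$

because \(\langle u_{\mathcal{X}}| T_\theta ^{-1}=\lambda _\theta ^{-1}\langle u_{\mathcal{X}}|\). Hence the output distribution is exactly invariant along such a similarity transform.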

\(\mathbf{(C3)} \Rightarrow \mathbf{(C1)}\) Assume (C3). We define \(\mathbf {W}_{\theta ,y}'\) in the same way as in the proof of Theorem 5. Then, similarly to that proof, Theorem 4 guarantees that there exist an invertible map \(T_\theta \) on \(\mathcal{V}_{\mathcal{X}}\) and \((B_{\theta ,y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(T_\theta P_{\mathbf {W}}=P_{\mathbf {W}_\theta '}\), \(B_{\theta ,y} (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\), and \(\mathbf {W}_{\theta ,y}'= T_\theta ^{-1}(W_y+B_{\theta ,y})T_\theta \).

Choosing A and \(B_y\) in the same way as in the proof of Theorem 5, we have

$$\begin{aligned} \frac{d}{d\theta }\mathbf {W}_{\theta ,y}|_{\theta =0}= [W_y,A]+B_y . \end{aligned}$$
(118)

Then, in the same way as in the proof of Theorem 5, we obtain (104), which is exactly the condition (106). Thus, we obtain (C1).
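As with Theorem 5, the stationary case can be probed numerically. The following sketch is again not part of the paper; the two-state model, the choice \(B_y=0\), and the constant \(c=0.7\) are illustrative assumptions. It uses a generator with \(A^T|u_{\mathcal{X}}\rangle =c|u_{\mathcal{X}}\rangle \) for \(c\ne 0\), recomputes the stationary distribution of the perturbed model, and confirms that the derivative in (117) vanishes.

```python
import numpy as np
import itertools

rng = np.random.default_rng(1)
u = np.ones(2)                                  # |u_X>

# Toy model as before: W[y, x, x'] = W_y(x|x') with sum_y W_y column-stochastic.
W = rng.random((2, 2, 2))
W /= W.sum(axis=(0, 1), keepdims=True)

def stationary(Ws):
    """Perron eigenvector of sum_y W_y for eigenvalue 1, normalized to a distribution."""
    vals, vecs = np.linalg.eig(Ws.sum(axis=0))
    p = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return p / p.sum()

def prob(Ws, y_seq, p0):
    """<u_X| W_{y_k} ... W_{y_1} |p0>."""
    for y in y_seq:
        p0 = Ws[y] @ p0
    return u @ p0

# A with A^T u = c u for c != 0 (condition (106)); B_y = 0.
c, M = 0.7, rng.random((2, 2))
A = M - np.outer(u, M.sum(axis=0) - c) / 2      # every column of A sums to c

eps = 1e-6
Wp = np.array([W[y] + eps * (W[y] @ A - A @ W[y]) for y in range(2)])  # W_y + eps [W_y, A]

for ys in itertools.product(range(2), repeat=3):
    deriv = (prob(Wp, ys, stationary(Wp)) - prob(W, ys, stationary(W))) / eps
    print(ys, f"{deriv:+.2e}")                  # O(eps): the derivative in (117) vanishes
```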

Proofs of Lemmas 11 and 10

To show Lemma 11, we prepare Lemma 18.

Lemma 18

Let \(\mathcal{V}_1\) be the direct sum space \(\mathcal{V}_2+\mathcal{V}_3\) of two vector spaces \(\mathcal{V}_2\) and \(\mathcal{V}_3\) with the condition \(\mathcal{V}_2\cap \mathcal{V}_3=\{0\}\). Let \(\mathcal{V}_4\) (\(\mathcal{V}_5\)) be a subspace of \(\mathcal{V}_2\) (\(\mathcal{V}_3\)). Assume that linear maps \(\alpha _1\) and \(\alpha _2\) from \(\mathcal{V}_6\) to \(\mathcal{V}_2\) and \(\mathcal{V}_3\), respectively, satisfy (1) \(\alpha _1(\mathcal{V}_6)\cap \mathcal{V}_4 = \{0\} \) and (2) \(\alpha _2(\mathrm{Ker}\,\alpha _1)\cap \mathcal{V}_5 = \{0\} \). Define \(\alpha _3(v):= \alpha _1(v)+\alpha _2(v)\). Then, \(\alpha _3(\mathcal{V}_6)\cap (\mathcal{V}_4+\mathcal{V}_5)= \{0\}\).

Proof of Lemma 18

Assume that \(\alpha _3(v_1) \in \mathcal{V}_4+\mathcal{V}_5\) for \(v_1 \in \mathcal{V}_6\). Since \(\alpha _1(v_1) \in \mathcal{V}_2\), \(\alpha _2(v_1) \in \mathcal{V}_3\), and \(\mathcal{V}_2\cap \mathcal{V}_3=\{0\}\), the decomposition is unique, so there exist \(v_4 \in \mathcal{V}_4\) and \(v_5 \in \mathcal{V}_5\) such that \(\alpha _1(v_1)=v_4\) and \(\alpha _2(v_1)=v_5\). Condition (1) implies that \(\alpha _1(v_1)=0\). So, \(v_1 \in \mathrm{Ker}\,\alpha _1\). Condition (2) and \(\alpha _2(v_1)=v_5\) yield that \(\alpha _2(v_1)=v_5=0\). Hence \(\alpha _3(v_1)=0\), which is the desired statement.\(\square \)

Proof of Lemma 11

Now, we check that the space spanned by \(\hat{g}_1, \ldots , \hat{g}_{l_2+l_3} \) has intersection \(\{0\}\) with \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_{P_{\mathbf {W}}}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). For this purpose, we make some preparations. We choose the matrix \(\tilde{A}\) as the diagonal matrix with diagonal entries f(x). Then, we have \((W_y(x|x')(f(x)-f(x')))_{x,x'}=[ W_y,\tilde{A}]\). We can restrict the function f so that \(\sum _x f(x)=0\). Since \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) = \{ (f(x)-f(x')+c)_{x,x'} \}\) and \(\langle u_{\mathcal{X}}| \tilde{A} |u_{\mathcal{X}}\rangle =0\), we have

$$\begin{aligned} \mathcal{L}_{2,\mathbf {W}}+ \mathbf {W}_* \mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) = \{ (c W_y)_{y \in \mathcal{Y}}\}+ \{(\alpha _{y}(A))_{y \in \mathcal{Y}} | \langle u_{\mathcal{X}}| A|u_{\mathcal{X}}\rangle =0 \}. \end{aligned}$$
(119)

To prove the above claim, it is sufficient to show that no nonzero element of the space spanned by \(\hat{g}_1, \ldots , \hat{g}_{l_2+l_3} \) is contained in the space \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). If a nonzero element were contained in this space, its matrix components with \(y=y_0,y_1\) would be given as those of an element of the space \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). To exclude this possibility, we regard \(\bar{g}_{j,y_0}\) and \(\bar{g}_{j,y_1}\) as elements of \(\mathcal{G}(\{y_0,y_1\},\mathcal{X}^2)\). Then, due to (119), it is sufficient to show that the space spanned by \(\bar{g}_{1,y_0}, \ldots , \bar{g}_{d,y_0}\), \(\bar{g}_{1,y_1}, \ldots , \bar{g}_{d^2-d,y_1}\) has intersection \(\{0\}\) with the space \(\{(\alpha _{y_0}(A),\alpha _{y_1}(A))| \langle u_{\mathcal{X}}| A|u_{\mathcal{X}}\rangle =0\}\). To show this statement, we apply Lemma 18 to the case where \(\mathcal{V}_2\) and \(\mathcal{V}_3\) are the sets of traceless matrices, \(\mathcal{V}_4\) is the space spanned by \(\bar{g}_{1,y_0}, \ldots , \bar{g}_{d,y_0}\), \(\mathcal{V}_5\) is the space spanned by \(\bar{g}_{1,y_1}, \ldots , \bar{g}_{d^2-d,y_1}\), \(\alpha _1\) is the map \(\alpha _{y_0}\), and \(\alpha _2\) is the map \(\alpha _{y_1}\). Since \(\alpha _{y_0}\) is injective on \(\{ A \in \mathcal{M}(\mathcal{V}_\mathcal{X}) | A^T |u_{\mathcal{X}}\rangle =0 \}\), whose dimension is the same as that of the image of \(\alpha _{y_0}\), due to the construction of \(g_{j,y_0}\), we find that the map \(\alpha _{y_0}\) satisfies the condition for \(\alpha _1\). So, we obtain the desired statement.    \(\square \)

Proof of Lemma 10

Assume the condition \([W_{y_0},A]=0\). Then, \(A^T\) needs to have common eigenvectors with \(W_{y_0}\). Due to the condition \(a^j \ne 0\) for any j, the eigenspace of \(A^T\) containing \(u_{\mathcal{X}}\) needs to be the whole space. So, \(A^T\) is zero, which implies condition (1) of Condition E3.

Let A be an element of the kernel of the map \(A\mapsto ([W_{y_0},A],[W_{y_1},A])\). Then, any eigenspace of \(A^T\) is spanned by a subset of \(\{f_i\}\); it is also spanned by a subset of \(\{f_i'\}\). To realize both conditions, the eigenspace needs to be the whole space. So, \(A^T\) is zero, which implies condition (2) of Condition E3.\(\square \)
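The triviality of this kernel can also be checked numerically for generic matrices. The sketch below is not part of the paper; it uses random \(W_{y_0}, W_{y_1}\) rather than the structured matrices of Lemma 10. It vectorizes the map \(A\mapsto ([W_{y_0},A],[W_{y_1},A])\) and computes its kernel dimension, which is generically 1, spanned by the identity; since the identity does not satisfy \(A^T |u_{\mathcal{X}}\rangle =0\), the kernel intersects that subspace only in 0.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
W0, W1 = rng.random((d, d)), rng.random((d, d))   # generic (illustrative) matrices

# Build the matrix of A -> ([W0, A], [W1, A]) in the standard basis E_ij of M(V_X).
cols = []
for i in range(d):
    for j in range(d):
        E = np.zeros((d, d))
        E[i, j] = 1.0
        cols.append(np.concatenate([(W0 @ E - E @ W0).ravel(),
                                    (W1 @ E - E @ W1).ravel()]))
L = np.stack(cols, axis=1)                        # shape (2*d*d, d*d)

# Kernel dimension via singular values: generically 1 (the scalar matrices c*I).
s = np.linalg.svd(L, compute_uv=False)
print("kernel dimension:", int(np.sum(s < 1e-10 * s[0])))   # -> 1
```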
