Abstract
In a hidden Markov process, two different transition matrices for the hidden and observed variables may yield the same stochastic behavior of the observed variables. Since such transition matrices cannot be distinguished in practice, we need to identify them and regard them as equivalent. We address the local equivalence problem of the hidden Markov process by using its geometrical structure. To this end, we introduce a mathematical concept that expresses the Markov process, and formulate its exponential family by using generators. The above equivalence problem is then formulated as an equivalence problem for generators. Taking this equivalence problem into account, we derive concrete parametrizations in several natural cases.
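The equivalence phenomenon described in the abstract can be checked numerically. The sketch below is a hypothetical example (the matrices are chosen for illustration, not taken from the text): `W[y]` plays the role of the transition matrices \(W_y\), the observed probability is \(\langle u_{\mathcal{X}}| W_{y_k} \cdots W_{y_1} |P\rangle\), and relabeling the hidden states by a permutation \(T\) produces a different transition-matrix family with identical observed statistics.

```python
import itertools
import numpy as np

# Illustrative parametrization: hidden alphabet X = {0, 1}, observed
# alphabet Y = {0, 1}.  W[y][x, x'] is the joint probability of emitting y
# while moving x' -> x; sum_y W[y] is column-stochastic.
W = {
    0: np.array([[0.5, 0.1],
                 [0.2, 0.3]]),
    1: np.array([[0.1, 0.4],
                 [0.2, 0.2]]),
}
p = np.array([0.6, 0.4])  # initial hidden distribution

# Relabel the hidden states by the permutation T: a different family of
# transition matrices, but the observed process is statistically identical.
T = np.array([[0.0, 1.0],
              [1.0, 0.0]])
W2 = {y: T @ Wy @ T.T for y, Wy in W.items()}
p2 = T @ p

def prob(W, p, ys):
    """P(Y_1 = y_1, ..., Y_k = y_k) = <u_X| W_{y_k} ... W_{y_1} |p>."""
    v = p.copy()
    for y in ys:
        v = W[y] @ v
    return v.sum()

# The two parametrizations cannot be distinguished from observed data.
for ys in itertools.product([0, 1], repeat=3):
    assert np.isclose(prob(W, p, ys), prob(W2, p2, ys))
```

The check uses only length-3 observation sequences, but the same identity holds for every length, since \(\langle u_{\mathcal{X}}|T = \langle u_{\mathcal{X}}|\) for a permutation \(T\).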
Notes
A \(\mathcal{Y}\)-indexed transition matrix on \(\mathcal{X}\) can be regarded as the classical version of a measuring instrument in the quantum setting [9, 10], which describes the quantum measuring process. The recent paper [13] characterizes quantum hidden Markov processes by using measuring instruments.
References
Ito, H., Amari, S.-I., Kobayashi, K.: Identifiability of hidden Markov information sources and their minimum degrees of freedom. IEEE Trans. Inf. Theory 38(2), 324–333 (1992)
Amari, S., Nagaoka, H.: Methods of information geometry. Oxford University Press, Oxford (2000)
Nakagawa, K., Kanaya, F.: On the converse theorem in statistical hypothesis testing for Markov chains. IEEE Trans. Inf. Theory 39(2), 629–633 (1993)
Nagaoka, H.: The exponential family of Markov chains and its information geometry. In: Proceedings of the 28th Symposium on Information Theory and its Applications (SITA2005), Okinawa, 20–23 Nov 2005 (2005)
Hayashi, M., Watanabe, S.: Information geometry approach to parameter estimation in Markov Chains. Ann. Stat. 44(4), 1495–1535 (2016)
Amari, S.: \(\alpha \)-divergence is unique, belonging to both \(f\)-divergence and Bregman divergence classes. IEEE Trans. Inf. Theory 55(11), 4925–4931 (2009)
Bregman, L.: The relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Comput. Math. Phys. USSR 7, 200–217 (1967)
Watanabe, S., Hayashi, M.: Finite-length analysis on tail probability for Markov chain and application to simple hypothesis testing. Ann. Appl. Probab. 27(2), 811–845 (2017)
Hayashi, M.: Quantum Information Theory. Graduate Texts in Physics, Springer, New York (2017)
Ozawa, M.: Quantum measuring processes of continuous observables. J. Math. Phys. 25, 79 (1984)
Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Springer, New York (1998)
Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Undergraduate Texts in Mathematics. Springer, New York (1960)
Hayashi, M., Yoshida, Y.: Asymptotic and non-asymptotic analysis for a hidden Markovian process with a quantum hidden system. J. Phys. A: Math. Theor. 51(33), 335304 (2018)
Seneta, E.: Non-negative Matrices and Markov Chains, 2nd edn. Springer, New York (1981)
Hayashi, M.: Information geometry approach to parameter estimation in hidden Markov model. arXiv:1705.06040 (2017)
Acknowledgements
The author is very grateful to Professor Takafumi Kanamori, Professor Vincent Y. F. Tan, and Dr. Wataru Kumagai for helpful discussions and comments. The work reported here was supported in part by the JSPS Grant-in-Aid for Scientific Research (B) No. 16KT0017 and (A) No. 17H01280, the Okawa Research Grant, and the Kayamori Foundation of Informational Science Advancement.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
Proof of Theorem 5
It is enough to discuss the one-parameter case. Since \(\mathbf{(B2)} \Rightarrow \mathbf{(B3)}\) is trivial, we will show only \(\mathbf{(B1)} \Rightarrow \mathbf{(B2)}\) and \(\mathbf{(B3)} \Rightarrow \mathbf{(B1)}\).
\(\mathbf{(B1)} \Rightarrow \mathbf{(B2)}\) Assume (B1). There exist \(A \in \mathcal{M}(\mathcal{V}_\mathcal{X})\) and \((B_{y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(A|P\rangle =0\), \(A^T|u_{\mathcal{X}}\rangle =0 \), \(B_y (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\), \((\sum _y B_y)^T |u_{\mathcal{X}}\rangle =0 \), and \(\frac{d }{d \theta } W_{\theta ,y}|_{{\theta }=0}=B_y+[W_y,A]\) for any \(y \in \mathcal{Y}\). Then,
where (a) follows from the fact that the image of \(B_y\) is included in \(\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\), and (b) follows from the properties of A. So, we obtain (B2).
\(\mathbf{(B3)} \Rightarrow \mathbf{(B1)}\) Assume (B3). We define \(\mathbf {W}_{\theta ,y}':= \mathbf {W}_{y}+ \theta \frac{d}{d\theta }\mathbf {W}_{\theta ,y}|_{\theta =0}\). So, we have \(P^{k_{\mathbf {W}}+k_{(P,\mathbf {W})}+1}[\mathbf {W}_{{\theta }}']\cdot P =P^{k_{\mathbf {W}}+k_{(P,\mathbf {W})}+1}[\mathbf {W}]\cdot P \) and
Theorem 4 guarantees that the pair of \(\mathbf {W}\) and P is equivalent to the pair of \(\mathbf {W}_\theta '\) and P. Thus, Theorem 4 guarantees that there exist an invertible map \(T_\theta \) on \(\mathcal{V}_{\mathcal{X}}\) and an element \((B_{\theta ,y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(T_\theta P=P\), \(B_{\theta ,y} (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\) and \(\mathbf {W}_{\theta ,y}'= T_\theta ^{-1}(W_y+B_{\theta ,y})T_\theta \).
Now, taking the derivative at \(\theta =0\), we have \(\frac{d}{d\theta }\mathbf {W}_{\theta ,y}'|_{\theta =0}=[W_y,A]+B_y\), where \(A:=\frac{d}{d\theta }T_\theta |_{\theta =0}\) and \(B_y:=\frac{d}{d\theta }B_{\theta ,y}|_{\theta =0}\). The condition \(T_\theta P=P\) implies that
Using (101), we have
Since the relation \((\mathbf {W}_{\theta ,y})^T|u_{\mathcal{X}}\rangle =|u_{\mathcal{X}}\rangle \) implies \((\sum _y \frac{d}{d\theta }\mathbf {W}_{\theta ,y}|_{\theta =0})^T|u_{\mathcal{X}}\rangle =0\), we have \((\sum _y ([W_y,A]+B_y))^T |u_{\mathcal{X}}\rangle =0\). Since \(B_y^T |u_{\mathcal{X}}\rangle =0\), we have \([\sum _y W_y,A]^T |u_{\mathcal{X}}\rangle =0\). So, \((\sum _y W_y)^T A^T |u_{\mathcal{X}}\rangle = A^T(\sum _y W_y)^T |u_{\mathcal{X}}\rangle = A^T |u_{\mathcal{X}}\rangle \). That is, \(A^T |u_{\mathcal{X}}\rangle \) is an eigenvector of \((\sum _y W_y)^T\) with eigenvalue 1. So, \(A^T |u_{\mathcal{X}}\rangle \) is written as \(c |u_{\mathcal{X}}\rangle \) with a constant c, i.e.,
Now, we calculate \( \frac{d }{d \theta } P^{k}[\mathbf {W}_{\theta }]\cdot P (y_1, \ldots , y_k)|_{{\theta }=0}\) by the same discussion as for (100). So, we have
where (a) follows from (102) and (104). Since \(\langle u_{\mathcal{X}}| W_{y_k} W_{y_{k-1}} \ldots W_{y_1} |P\rangle >0\) and the LHS is zero, we have \(c=0\). Thus, we obtain (B1).
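The step above hinges on 1 being a simple eigenvalue of \((\sum_y W_y)^T\) with eigenvector \(u_{\mathcal{X}}\), which holds when \(\sum_y W_y\) is irreducible. A minimal numerical check of this fact, with an illustrative column-stochastic matrix not taken from the text:

```python
import numpy as np

# M plays the role of sum_y W_y: irreducible and column-stochastic, so the
# eigenvalue 1 of M^T is simple and its eigenvector is proportional to the
# all-ones vector u_X.  This is what forces A^T |u_X> = c |u_X> above.
M = np.array([[0.6, 0.5],
              [0.4, 0.5]])          # columns sum to 1; eigenvalues 1 and 0.1
vals, vecs = np.linalg.eig(M.T)
i = np.argmin(np.abs(vals - 1.0))   # locate the eigenvalue 1
v = np.real(vecs[:, i])
v = v / v[0]                        # normalize the first entry to 1
print(v)                            # proportional to (1, 1)
```

Simplicity of the eigenvalue 1 is what makes the constant \(c\) well defined: every other eigenvalue of \(M^T\) has modulus strictly less than 1.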
Proof of Theorem 6
It is enough to discuss the one-parameter case. Since \(\mathbf{(C2)} \Rightarrow \mathbf{(C3)}\) is trivial, we will show only \(\mathbf{(C1)} \Rightarrow \mathbf{(C2)}\) and \(\mathbf{(C3)} \Rightarrow \mathbf{(C1)}\).
\(\mathbf{(C1)} \Rightarrow \mathbf{(C2)}\) Assume (C1). There exist a real number \(c\in \mathbb {R}\), \(A \in \mathcal{M}(\mathcal{V}_\mathcal{X})\), and \((B_{y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that
for any \(y \in \mathcal{Y}\). Define the vector \(Q:= \frac{d}{d\theta } P_{\mathbf {W}_{\theta }} |_{{\theta }=0}\). Since
we have
where (a) and (b) follow from (110) and its derivative, respectively.
That is,
Since \(\Big (\sum _{y \in \mathcal{Y}} W_y\Big ) \Big (\sum _{y \in \mathcal{Y}} B_y\Big ) | P_{\mathbf {W}} \rangle =0\), we have
Due to the uniqueness of the eigenvector of \(\bigg (\sum _{y \in \mathcal{Y}} W_y\bigg )\) with eigenvalue 1, we have
with a constant \(c' \in \mathbb {R}\).
Since
we have
where (a), (b), (c), and (d) follow from (106), (114), (109), and (115), respectively.
Similar to (100), we have
Here, (a) follows from a derivation similar to (100); that is, we need to take care of the derivative of \(|P_{\mathbf {W}_\theta } \rangle \). (b) follows from (106) and (114), and (c) from (116). So, we obtain (C2).
\(\mathbf{(C3)} \Rightarrow \mathbf{(C1)}\) Assume (C3). We define \(\mathbf {W}_{\theta ,y}'\) in the same way as the proof of Theorem 5. So, similar to the proof of Theorem 5, there exist an invertible map \(T_\theta \) on \(\mathcal{V}_{\mathcal{X}}\) and \((B_{\theta ,y})_{y\in \mathcal{Y}} \in \mathcal{L}_{2,\mathbf {W}}\) such that \(T_\theta P_{\mathbf {W}}=P_{\mathbf {W}_\theta '}\), \(B_{\theta ,y} (\mathcal{V}^{k_{(P,\mathbf {W})}}(P) +\mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}] ) \subset \mathrm{Ker}\,P^{k_{\mathbf {W}}}[\mathbf {W}]\) and \(\mathbf {W}_{\theta ,y}'= T_\theta ^{-1}(W_y+B_{\theta ,y})T_\theta \).
Choosing A and \(B_y\) in the same way as the proof of Theorem 5, we have
Then, in the same way as the proof of Theorem 5, we obtain (104). Thus, we obtain (C1).
Proofs of Lemmas 11 and 10
To show Lemma 11, we prepare Lemma 18.
Lemma 18
Let \(\mathcal{V}_1\) be the direct sum space \(\mathcal{V}_2+\mathcal{V}_3\) of two vector spaces \(\mathcal{V}_2\) and \(\mathcal{V}_3\) with the condition \(\mathcal{V}_2\cap \mathcal{V}_3=\{0\}\). Let \(\mathcal{V}_4\) (\(\mathcal{V}_5\)) be a subspace of \(\mathcal{V}_2\) (\(\mathcal{V}_3\)). Assume that a linear map \(\alpha _1\) (\(\alpha _2\)) from \(\mathcal{V}_6\) to \(\mathcal{V}_2\) (\(\mathcal{V}_3\)) satisfies that (1) \(\alpha _1(\mathcal{V}_1)\cap \mathcal{V}_4 = \{0\} \) and (2) \(\alpha _2(\mathrm{Ker}\,\alpha _1)\cap \mathcal{V}_5 = \{0\} \). Define \(\alpha _3(v):= \alpha _1(v)+\alpha _2(v)\). Then, \(\alpha _3(\mathcal{V}_1)\cap (\mathcal{V}_4+\mathcal{V}_5)= \{0\}\).
Proof of Lemma 18
Suppose that \(\alpha _3(v_1)=v_4+v_5\) for \(v_1 \in \mathcal{V}_1\), \(v_4 \in \mathcal{V}_4\), and \(v_5 \in \mathcal{V}_5\). Since \(\alpha _1(v_1)-v_4 \in \mathcal{V}_2\) equals \(v_5-\alpha _2(v_1) \in \mathcal{V}_3\) and \(\mathcal{V}_2\cap \mathcal{V}_3=\{0\}\), we have \(\alpha _1(v_1)=v_4\) and \(\alpha _2(v_1)=v_5\). Condition (1) implies that \(v_4=\alpha _1(v_1)=0\). So, \(v_1 \in \mathrm{Ker}\,\alpha _1\). Then, Condition (2) and \(\alpha _2(v_1)=v_5\) yield that \(v_5=\alpha _2(v_1)=0\). Hence \(\alpha _3(v_1)=v_4+v_5=0\), which is the desired statement.\(\square \)
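Lemma 18 can be illustrated on a toy instance; the subspaces and maps below are chosen for illustration only (they are not from the text), with the ambient space \(\mathbb{R}^4\), \(\mathcal{V}_2 = \mathrm{span}(e_1,e_2)\), \(\mathcal{V}_3 = \mathrm{span}(e_3,e_4)\), and \(\alpha_1,\alpha_2\) given by the matrices `M1`, `M2` acting on \(\mathcal{V}_1 = \mathbb{R}^2\).

```python
import numpy as np

def inter_dim(U, V):
    """dim(span U ∩ span V) = rank U + rank V - rank [U V] (columns span)."""
    r = np.linalg.matrix_rank
    return r(U) + r(V) - r(np.hstack([U, V]))

M1 = np.array([[1., 0.], [1., 0.], [0., 0.], [0., 0.]])  # alpha_1 into V2
M2 = np.array([[0., 0.], [0., 0.], [0., 1.], [1., 0.]])  # alpha_2 into V3
V4 = np.array([[0.], [1.], [0.], [0.]])                  # V4, a subspace of V2
V5 = np.array([[0.], [0.], [0.], [1.]])                  # V5, a subspace of V3

ker1 = np.array([[0.], [1.]])          # Ker alpha_1 = span((0, 1))
assert inter_dim(M1, V4) == 0          # condition (1) of Lemma 18
assert inter_dim(M2 @ ker1, V5) == 0   # condition (2) of Lemma 18

M3 = M1 + M2                               # alpha_3 = alpha_1 + alpha_2
print(inter_dim(M3, np.hstack([V4, V5])))  # 0, as the lemma asserts
```

The rank identity in `inter_dim` is the standard dimension formula \(\dim(U\cap V) = \dim U + \dim V - \dim(U+V)\).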
Proof of Lemma 11
Now, we check that the space spanned by \(\hat{g}_1, \ldots , \hat{g}_{l_2+l_3} \) has intersection \(\{0\}\) with \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_{P_{\mathbf {W}}}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). For this purpose, we make some preparations. We choose the matrix \(\tilde{A}\) as the diagonal matrix with diagonal entries f(x). So, we have \(W_y(x|x')(f(x)-f(x'))=[ W_y,\tilde{A}]\). We can restrict the function f so that \(\sum _x f(x)=0\). Since \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) = \{ (f(x)-f(x')+c)_{x,x'} \}\) and \(\langle u_{\mathcal{X}}| \tilde{A} |u_{\mathcal{X}}\rangle =0\), we have
To prove the above claim, it is sufficient to show that a nonzero element of the space spanned by \(\hat{g}_1, \ldots , \hat{g}_{l_2+l_3} \) is not contained in the space \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). If a nonzero element were contained in this space, its matrix components with \(y=y_0,y_1\) would be given as those of an element of the space \(\mathcal{N}((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}}) +\mathcal{N}_2((\mathcal{Y}\times \mathcal{X}^{2})_{\mathbf {W}})\). To exclude this possibility, we regard \(\bar{g}_{j,y_0}\) and \(\bar{g}_{j,y_1}\) as elements of \(\mathcal{G}(\{y_0,y_1\},\mathcal{X}^2)\). Then, due to (119), it is sufficient to show that the space spanned by \(\bar{g}_{1,y_0}, \ldots , \bar{g}_{d,y_0}\), \(\bar{g}_{1,y_1}, \ldots , \bar{g}_{d^2-d,y_1}\) has intersection \(\{0\}\) with the space \(\{(\alpha _{y_0}(A),\alpha _{y_1}(A))\mid \langle u_{\mathcal{X}}| A|u_{\mathcal{X}}\rangle =0\}\). To show this statement, we apply Lemma 18 to the case when \(\mathcal{V}_2\) and \(\mathcal{V}_3\) are the set of traceless matrices, \(\mathcal{V}_4\) is the space spanned by \(\bar{g}_{1,y_0}, \ldots , \bar{g}_{d,y_0}\), \(\mathcal{V}_5\) is the space spanned by \(\bar{g}_{1,y_1}, \ldots , \bar{g}_{d^2-d,y_1}\), \(\alpha _1\) is the map \(\alpha _{y_0}\), and \(\alpha _2\) is the map \(\alpha _{y_1}\). Since \(\alpha _{y_0}\) is injective on \(\{ A \in \mathcal{M}(\mathcal{V}_\mathcal{X}) \mid A^T |u_{\mathcal{X}}\rangle =0 \}\), whose dimension is the same as that of the image of \(\alpha _{y_0}\), due to the construction of \(g_{j,y_0}\), we find that the map \(\alpha _{y_0}\) satisfies the condition for \(\alpha _1\). So, we obtain the desired statement. \(\square \)
Proof of Lemma 10
Assume the condition \([W_{y_0},A]=0\). Then, \(A^T\) needs to have common eigenvectors with \(W_{y_0}\). Due to the condition \(a^j \ne 0\) for any j, the eigenspace of \(A^T\) containing \(u_{\mathcal{X}}\) needs to be the whole space. So, \(A^T\) is zero, which implies condition (1) of Condition E3.
Let A be an element of the kernel of the map \(A\mapsto ([W_{y_0},A],[W_{y_1},A])\). Then, an eigenspace of \(A^T\) is spanned by a subset of \(\{f_i\}\). It is also spanned by a subset of \(\{f_i'\}\). To satisfy both conditions, the eigenspace needs to be the whole space. So, \(A^T\) is zero, which implies condition (2) of Condition E3.\(\square \)
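The commutant argument behind both conditions can be illustrated numerically. The sketch below uses a matrix chosen for illustration (not from the text): when \(W\) has distinct eigenvalues, the solution space of \([W,A]=0\) is \(n\)-dimensional, and intersecting it with \(\{A : A^T u = 0\}\) for a vector \(u\) overlapping every eigenvector of \(W\) leaves only \(A=0\).

```python
import numpy as np

n = 3
W = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.6, 0.2],
              [0.2, 0.2, 0.7]])       # eigenvalues 1, 0.5, 0.3 (distinct)

# [W, A] = 0 as a linear system in vec(A) (column-major convention):
# (I (x) W - W^T (x) I) vec(A) = 0.
K = np.kron(np.eye(n), W) - np.kron(W.T, np.eye(n))
commutant_dim = np.sum(np.linalg.svd(K, compute_uv=False) < 1e-10)

# Add the constraint A^T u = 0, i.e. u^T A = 0, i.e. (I (x) u^T) vec(A) = 0.
# u = (1, 2, 3) has nonzero overlap with every eigenvector of this W.
u = np.array([1.0, 2.0, 3.0])
C = np.kron(np.eye(n), u[None, :])
joint_dim = np.sum(np.linalg.svd(np.vstack([K, C]), compute_uv=False) < 1e-10)

print(commutant_dim, joint_dim)      # 3 and 0
```

The commutant dimension 3 equals \(n\) because the commutant of a matrix with distinct eigenvalues consists exactly of polynomials in that matrix.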
Cite this article
Hayashi, M. Local equivalence problem in hidden Markov model. Info. Geo. 2, 1–42 (2019). https://doi.org/10.1007/s41884-019-00016-z