
On the \(\alpha \)-lazy version of Markov chains in estimation and testing problems


Abstract

Given access to a single long trajectory generated by an unknown irreducible Markov chain M, we simulate an \(\alpha \)-lazy version of M, which is ergodic. This enables us to generalize, in a way that allows fully empirical inference, recent results on estimation and identity testing that were stated for ergodic Markov chains. In particular, our approach shows that the pseudo spectral gap introduced by Paulin (Electron J Probab 20:32, 2015) and defined for ergodic Markov chains may be given a meaning already in the case of irreducible, but possibly periodic, Markov chains.


Data availability statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

References

  • Batu T, Fischer E, Fortnow L, Kumar R, Rubinfeld R, White P (2001) Testing random variables for independence and identity. In: Proceedings 42nd IEEE symposium on foundations of computer science. IEEE, pp 442–451

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge


  • Bui A, Sohier D (2007) How to compute times of random walks based distributed algorithms. Fund Inform 80(4):363–378


  • Chan SO, Ding Q, Li SH (2021) Learning and testing irreducible Markov chains via the \( k \)-cover time. In: Algorithmic learning theory. PMLR, pp 458–480

  • Cherapanamjeri Y, Bartlett PL (2019) Testing symmetric Markov chains without hitting. In: Conference on learning theory. PMLR, pp 758–785

  • Daskalakis C, Dikkala N, Gravin N (2018) Testing symmetric Markov chains from a single trajectory. In: Conference on learning theory. PMLR, pp 385–409

  • Ding J, Lee JR, Peres Y (2011) Cover times, blanket times, and majorizing measures. In: Proceedings of the forty-third annual ACM symposium on theory of computing, pp 61–70

  • Feige U, Rabinovich Y (2003) Deterministic approximation of the cover time. Random Struct Algorithms 23(1):1–22


  • Feige U, Zeitouni O (2009) Deterministic approximation for the cover time of trees. arXiv preprint arXiv:0909.2005

  • Fill JA (1991) Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. Ann Appl Probab 1(1):62–87

  • Fried S, Wolfer G (2022) Identity testing of reversible Markov chains. In: International conference on artificial intelligence and statistics. PMLR, pp 798–817

  • Han Y, Jiao J, Weissman T (2015) Minimax estimation of discrete distributions. In: 2015 IEEE international symposium on information theory (ISIT). IEEE, pp 2291–2295

  • Hao Y, Orlitsky A, Pichapati V (2018) On learning Markov chains. arXiv preprint arXiv:1810.11754

  • Hermon J (2016) Maximal inequalities and mixing times. PhD thesis, UC Berkeley

  • Horn RA, Johnson CR (2012) Matrix analysis. Cambridge University Press, Cambridge


  • Kamath S, Orlitsky A, Pichapati D, Suresh AT (2015) On learning distributions from their samples. In: Conference on learning theory. PMLR, pp 1066–1100

  • Lalley SP (2009) Convergence rates of Markov chains. Lecture notes, available online: http://galton.uchicago.edu/~lalley/Courses/313ConvergenceRates.pdf

  • Levin DA, Peres Y (2017) Markov chains and mixing times, vol 107. American Mathematical Society, Providence


  • Marshall AW, Olkin I, Arnold BC (1979) Inequalities: theory of majorization and its applications, vol 143. Springer, New York


  • Montenegro R, Tetali P (2006) Mathematical aspects of mixing times in Markov chains. Found Trends Theor Comput Sci 1(3):237–354


  • Orlitsky A, Suresh AT (2015) Competitive distribution estimation: Why is Good-Turing good. In: NIPS, pp 2143–2151

  • Paulin D (2015) Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electron J Probab 20:32


  • Valiant G, Valiant P (2017) An automatic inequality prover and instance optimal identity testing. SIAM J Comput 46(1):429–455


  • Wolfer G, Kontorovich A (2019) Estimating the mixing time of ergodic Markov chains. In: Proceedings of the thirty-second conference on learning theory. Proceedings of Machine Learning Research, vol 99. PMLR, pp 3120–3159

  • Wolfer G, Kontorovich A (2020) Minimax testing of identity to a reference ergodic Markov chain. In: Proceedings of the twenty-third international conference on artificial intelligence and statistics. Proceedings of Machine Learning Research, vol 108. PMLR, pp 191–201

  • Wolfer G, Kontorovich A (2021) Statistical estimation of ergodic Markov chain kernel over discrete state space. Bernoulli 27(1):532–553


  • Wolfer G, Kontorovich A (2022) Improved estimation of relaxation time in non-reversible Markov chains. arXiv preprint arXiv:2209.00175


Acknowledgements

We thank the anonymous referee for the careful reading of the manuscript and for the insightful suggestions that helped us significantly improve this work. In particular, the referee showed us the proof of Theorem 4.6. We are also grateful to Geoffrey Wolfer for posing to us the problem of extending the results of Wolfer and Kontorovich (2019, 2020, 2021) to irreducible Markov chains and for many helpful discussions.

Author information

Corresponding author

Correspondence to Sela Fried.

Ethics declarations

Conflict of interest

We declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Research Supported by the Israel Science Foundation (ISF) through grant No. 1456/18 and by the European Research Council grant No. 949707.

Appendix

1.1 \(\ell _1\)-projection on \(\Delta _d\)

In Example 3.2 (b) it was necessary to project a matrix onto \({\mathcal {M}}_d\) with respect to \(||\cdot ||_\infty \). Since this norm treats each row separately, projecting onto \({\mathcal {M}}_d\) is equivalent to projecting each row of the matrix onto \(\Delta _d\) with respect to the \(\ell _1\) norm. Thus, we need to solve (at most) d optimization problems of the following form: for \(x\in {\mathbb {R}}^d\), find an \(\ell _1\)-projection \(P_{\Delta _d}(x)\) of x onto \(\Delta _d\) (cf. Boyd and Vandenberghe 2004, p. 397):

$$\begin{aligned} P_{\Delta _d}(x)=\mathop {\mathrm {arg\,min}}\limits _{y\in \Delta _d}\{||y-x||_1\}, \end{aligned}$$
(11)

where

$$\begin{aligned} ||z||_1 = \sum _{i\in [d]}|z(i)|,\; \forall z\in {\mathbb {R}}^d. \end{aligned}$$

Notice that, in contrast to \(||\cdot ||_p\) for \(p>1\), optimization problem (11) has, in general, infinitely many solutions, and we understand \(P_{\Delta _d}(x)\) to be the set of all such solutions.

The following lemma seems to be well known, but we were not able to find a reference. Only parts (a) and (b), and the case \(|S|\le 1\), are relevant for our needs.

Lemma 5.1

Let \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\). Denote

$$\begin{aligned} S=\{i\in [d]\mid x_i<0\} \text{ and } s = \sum _{i\in [d]\setminus S}x_i. \end{aligned}$$

If \(S=[d]\) then choose any \(y\in \Delta _d\). Otherwise, define \(y=(y_1,\ldots ,y_d)\in \Delta _d\) as follows: Set \(y_i = 0\) for every \(i\in S\). Now, for every \(i\in [d]\setminus S\):

(a) If \(s=1\), set \(y_i = x_i\).

(b) If \(s<1\), choose any \(y_i \ge x_i\) such that \(\sum _{j\in [d]\setminus S} y_j= 1\).

(c) If \(s>1\), choose any \(y_i \le x_i\) such that \(\sum _{j\in [d]\setminus S} y_j= 1\).

Then \(y\in P_{\Delta _d}(x)\).

Proof

First, assume that \(S=[d]\) and let \(y'=(y'_1,\ldots ,y'_d)\in \Delta _d\). For each \(i\in [d]\), denote \(\varepsilon _i = y'_i - y_i\). Notice that \(\sum _{i=1}^d\varepsilon _i = 0\). Since \(x_i<0\le y_i, y'_i\) for every \(i\in [d]\), we have

$$\begin{aligned} \sum _{i=1}^d|y'_i-x_i|= \sum _{i=1}^d(y'_i-x_i) = \sum _{i=1}^d(y_i+\varepsilon _i-x_i) =\sum _{i=1}^d(y_i-x_i) +\sum _{i=1}^d\varepsilon _i =\sum _{i=1}^d|y_i-x_i|. \end{aligned}$$

Now, assume that \(S\ne [d]\) and let \(z=(z_1,\ldots ,z_d)\in \Delta _d\). We consider each of the three possibilities for s separately:

(a) Without loss of generality, there exists \(k\in [d]\) such that \(x_i < 0\) for every \(1\le i\le k\) and \(x_i\ge 0\) for every \(k+1\le i\le d\). We have

$$\begin{aligned} \sum _{i=1}^d|z_i-x_i|&=\sum _{i=1}^{k}|z_i-x_i|+\sum _{i=k+1}^{d}|z_i-x_i|\\&\ge \sum _{i=1}^{k}|0-x_i|+\sum _{i=k+1}^{d}|x_i-x_i|\\&=\sum _{i=1}^{k}|y_i-x_i|+\sum _{i=k+1}^{d}|y_i-x_i|\\&= \sum _{i=1}^d|y_i-x_i|. \end{aligned}$$
(b) Without loss of generality, there exist \(1\le k\le l\le m\le d\) such that

    $$\begin{aligned} x_i<0=y_i\le z_i,&\;\;\forall 1\le i\le k \\ 0\le z_i\le x_i\le y_i,&\;\;\forall k+1\le i\le l \\ 0\le x_i\le z_i\le y_i,&\;\;\forall l+1\le i\le m \\ 0\le x_i\le y_i\le z_i,&\;\;\forall m+1\le i\le d . \end{aligned}$$

    Then,

    $$\begin{aligned} \sum _{i=1}^{d}|z_i-x_i| =&\sum _{i=1}^{d}|y_i-x_i|+\sum _{i=1}^{k}(z_i-y_i) +\sum _{i=k+1}^{l}\left( (y_i-z_i)-2(y_i-x_i)\right) -\\&\sum _{i=l+1}^{m}(y_i-z_i)+\sum _{i=m+1}^{d}(z_i-y_i). \end{aligned}$$

    Thus, it suffices to show that

    $$\begin{aligned} \sum _{i=1}^{k}(z_i-y_i)+\sum _{i=k+1}^{l}\left( (y_i-z_i)-2(y_i-x_i)\right) -\sum _{i=l+1}^{m}(y_i-z_i)+\sum _{i=m+1}^{d}(z_i-y_i)\ge 0. \end{aligned}$$
    (12)

    Indeed,

    $$\begin{aligned} {}(12)&\iff \overbrace{\sum _{i=1}^{d}z_i}^{=1}+\sum _{i=k+1}^{l}\left( -2(-x_i+z_i)\right) -\overbrace{\sum _{i=k+1}^{d}y_i}^{=1}\ge 0 \iff \sum _{i=k+1}^{l}(x_i-z_i)\ge 0 \end{aligned}$$

    and, by assumption, \(x_i\ge z_i\) for every \(k+1\le i\le l\).

(c) Similar to the previous case.

\(\square \)
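To make the construction concrete, the following is a minimal Python sketch of one particular minimizer \(y\in P_{\Delta _d}(x)\) in the spirit of Lemma 5.1. The function name and the specific way the deficit (case (b)) or surplus (case (c)) is distributed are illustrative choices of ours, since the lemma allows infinitely many; applying the function to each row of a matrix yields the projection onto \({\mathcal {M}}_d\) needed in Example 3.2 (b).

```python
import numpy as np

def l1_project_simplex(x):
    """One concrete l1-projection of x onto the simplex, following Lemma 5.1."""
    x = np.asarray(x, dtype=float)
    if np.all(x < 0):                    # S = [d]: any point of the simplex is a minimizer
        return np.full(x.size, 1.0 / x.size)
    y = np.where(x < 0, 0.0, x)          # y_i = 0 on S, y_i = x_i off S
    s = y.sum()                          # s = sum of x_i over [d] \ S
    if s < 1:                            # case (b): add the deficit to one coordinate
        y[np.argmax(y)] += 1.0 - s       # keeps y_i >= x_i off S
    elif s > 1:                          # case (c): shrink proportionally
        y = y / s                        # keeps 0 <= y_i <= x_i off S and restores sum 1
    return y
```

For instance, \(x=(-0.2,\,0.8)\) is mapped to \((0,\,1)\), at \(\ell _1\)-distance \(0.4\) from x, which is the minimal possible distance.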

An ordinary triangle inequality trick would have introduced an additional factor of 2 in the sample complexity in Example 3.2 (b). The following lemma shows that this may be avoided:

Lemma 5.2

Let \(y=(y_1,\ldots ,y_d)\in \Delta _d\) and let \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) be such that \(\sum _{i\in [d]}x_i=1\). Let \(\varepsilon >0\) and assume that \(||x-y||_1<\varepsilon \). Then there exists \(x'=(x'_1,\ldots ,x'_d)\in P_{\Delta _d}(x)\) such that \(||x'-y||_1<\varepsilon \).

Proof

If \(x_1,\ldots ,x_d\ge 0\) then \(x\in \Delta _d\) and we may take \(x'=x\). Otherwise, assume without loss of generality that there is \(1\le k\le d\) such that \(x_i<0\) for \(1\le i\le k\) and \(x_i\ge 0\) for \(k+1\le i\le d\). Let \(k+1\le l\le d\) be minimal such that \(\sum _{i=1}^l x_i\ge 0\) (such an l must exist since \(\sum _{i\in [d]}x_i=1\)). Define \(x'=(x'_1,\ldots ,x'_d)\in \Delta _d\) as follows: For \(1\le i\le d\) let

$$\begin{aligned} x'_i= {\left\{ \begin{array}{ll} 0, &{} \quad \text {if } 1\le i\le l-1; \\ \sum _{j=1}^l x_j, &{} \quad \text {if } i=l; \\ x_i, &{} \quad \text {if } l+1\le i\le d. \end{array}\right. } \end{aligned}$$

We have

$$\begin{aligned} \sum _{i=1}^d|x'_i-y_i|&=\sum _{i=1}^{l-1}y_i + |\sum _{i=1}^l x_i- y_l| + \sum _{i=l+1}^d|x_i-y_i| \\&\le \sum _{i=1}^{l-1}y_i - \sum _{i=1}^{l-1}x_i + \sum _{i=l}^d|x_i-y_i|\\&\le \sum _{i=1}^d|x_i-y_i|<\varepsilon . \end{aligned}$$

Moreover, \(x'\in P_{\Delta _d}(x)\) by part (c) of Lemma 5.1.

\(\square \)
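A minimal Python sketch of the specific minimizer \(x'\) constructed in this proof, assuming (as in the lemma) that \(\sum _i x_i=1\); the reordering that puts the negative coordinates first, done without loss of generality in the proof, is carried out explicitly here, and the function name is ours.

```python
import numpy as np

def project_as_in_lemma_5_2(x):
    """The specific l1-projection x' built in the proof of Lemma 5.2 (sum(x) == 1)."""
    x = np.asarray(x, dtype=float)
    if np.all(x >= 0):
        return x.copy()                        # x already lies in the simplex
    order = np.argsort(x >= 0, kind="stable")  # negative coordinates first
    z = x[order]                               # fancy indexing returns a copy
    csum = np.cumsum(z)
    l = int(np.argmax(csum >= 0))              # minimal index with partial sum >= 0
    z[:l] = 0.0                                # x'_i = 0 before index l
    z[l] = csum[l]                             # absorb the partial sum into x'_l
    xp = np.empty_like(x)
    xp[order] = z                              # undo the reordering
    return xp
```

On random inputs with \(\sum _i x_i=1\) one can check numerically that \(||x'-y||_1\le ||x-y||_1\) for randomly drawn \(y\in \Delta _d\), in line with the proof.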

1.2 Proof of Lemma 2.1

Since \(\lambda _2\ge \lambda _d\) and, by assumption, \(\lambda _2\le -\lambda _d\), necessarily \(\lambda _d<0\). Now, we notice that the second largest and the smallest eigenvalues of \({\mathcal {L}}_\alpha (M)\) are given by \(\alpha +(1-\alpha )\lambda _{2}\) and \(\alpha +(1-\alpha )\lambda _{d}\), respectively. We have

$$\begin{aligned}&\gamma _\star (M)\le \gamma _\star ({\mathcal {L}}_\alpha (M))&\iff \\&\max \{\alpha +(1-\alpha )\lambda _{2}, |\alpha +(1-\alpha )\lambda _{d}|\}\le \max \{\lambda _2, |\lambda _d|\} = -\lambda _d&\iff \\&{\left\{ \begin{array}{ll} \alpha +(1-\alpha )\lambda _{2} \le -\lambda _d, &{}\\ |\alpha +(1-\alpha )\lambda _{d}|\le -\lambda _d&{} \end{array}\right. }&\iff \\&{\left\{ \begin{array}{ll} \alpha +(1-\alpha )\lambda _{2} \le -\lambda _d, &{}\\ \alpha +(1-\alpha )\lambda _{d}\le -\lambda _d,&{}\\ \alpha +(1-\alpha )\lambda _{d}\ge \lambda _d&{} \end{array}\right. }&\iff \\&{\left\{ \begin{array}{ll} \alpha \le \frac{-\lambda _{d}-\lambda _{2}}{1-\lambda _{2}}, &{}\\ \alpha \le \frac{-2\lambda _{d}}{1-\lambda _{d}},&{}\\ \alpha (1-\lambda _{d})\ge 0.&{} \end{array}\right. } \end{aligned}$$
(13)

It is easily seen that, since \(1>\lambda _2\ge \lambda _d>-1\), we have \(\frac{-\lambda _{d}-\lambda _{2}}{1-\lambda _{2}}\le \frac{-2\lambda _{d}}{1-\lambda _{d}}\) and that the third inequality in (13) holds trivially. Thus,

$$\begin{aligned} (13) \iff \alpha \in \left[ 0,\frac{-\lambda _{d}-\lambda _{2}}{1-\lambda _{2}}\right] , \end{aligned}$$

which proves the first assertion.

Turning to the second assertion, we have

$$\begin{aligned}&\mathop {\mathrm {arg\,max}}\limits _{\alpha \in [0,1]}\{\gamma _\star ({\mathcal {L}}_\alpha (M))\}=\\&\mathop {\mathrm {arg\,min}}\limits _{\alpha \in [0,1]}\max \{\alpha +(1-\alpha )\lambda _{2}, |\alpha +(1-\alpha )\lambda _{d}|\}=\\&\mathop {\mathrm {arg\,min}}\limits _{\alpha \in [0,1]}{\left\{ \begin{array}{ll}\max \{\alpha +(1-\alpha )\lambda _{2}, -\alpha -(1-\alpha )\lambda _{d}\},&{}\text {if }\alpha<\frac{-\lambda _{d}}{1-\lambda _{d}};\\ \max \{\alpha +(1-\alpha )\lambda _{2}, \alpha +(1-\alpha )\lambda _{d}\},&{}\text {if }\alpha \ge \frac{-\lambda _{d}}{1-\lambda _{d}}\end{array}\right. }=\\&\mathop {\mathrm {arg\,min}}\limits _{\alpha \in [0,1]}{\left\{ \begin{array}{ll}\max \{\alpha +(1-\alpha )\lambda _{2}, -\alpha -(1-\alpha )\lambda _{d}\},&{}\text {if }\alpha<\frac{-\lambda _{d}}{1-\lambda _{d}};\\ \alpha +(1-\alpha )\lambda _{2},&{}\text {if }\alpha \ge \frac{-\lambda _{d}}{1-\lambda _{d}}\end{array}\right. }=\\&\mathop {\mathrm {arg\,min}}\limits _{\alpha \in [0,1]}{\left\{ \begin{array}{ll}\max \{\alpha (1-\lambda _{2})+\lambda _{2}, \alpha (\lambda _{d}-1)-\lambda _{d}\},&{}\text {if }\alpha<\frac{-\lambda _{d}}{1-\lambda _{d}};\\ \alpha (1-\lambda _{2})+\lambda _{2},&{}\text {if }\alpha \ge \frac{-\lambda _{d}}{1-\lambda _{d}}\end{array}\right. }= \\&\mathop {\mathrm {arg\,min}}\limits _{\alpha \in [0,1]}{\left\{ \begin{array}{ll} \alpha (\lambda _{d}-1)-\lambda _{d},&{}\text {if }\alpha<\frac{-\lambda _{d}-\lambda _{2}}{2-\lambda _{d}-\lambda _{2}};\\ \alpha (1-\lambda _{2})+\lambda _{2},&{}\text {if }\frac{-\lambda _{d}-\lambda _{2}}{2-\lambda _{d}-\lambda _{2}}\le \alpha<\frac{-\lambda _{d}}{1-\lambda _{d}};\\ \alpha (1-\lambda _{2})+\lambda _{2},&{}\text {if }\alpha \ge \frac{-\lambda _{d}}{1-\lambda _{d}}\end{array}\right. }=\\&\mathop {\mathrm {arg\,min}}\limits _{\alpha \in [0,1]}{\left\{ \begin{array}{ll} \alpha (\lambda _{d}-1)-\lambda _{d},&{}\text {if }\alpha <\frac{-\lambda _{d}-\lambda _{2}}{2-\lambda _{d}-\lambda _{2}};\\ \alpha (1-\lambda _{2})+\lambda _{2},&{}\text {if }\frac{-\lambda _{d}-\lambda _{2}}{2-\lambda _{d}-\lambda _{2}}\le \alpha \end{array}\right. }=\\&\frac{-\lambda _{d}-\lambda _{2}}{2-\lambda _{d}-\lambda _{2}}. \end{aligned}$$

Finally, for this \(\alpha \), we have

$$\begin{aligned} \gamma _\star ({\mathcal {L}}_\alpha (M))=\frac{2-2\lambda _{2}}{2-\lambda _{d}-\lambda _{2}}. \end{aligned}$$

\(\square \)
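As a sanity check, the following hypothetical numerical experiment (the eigenvalues \(\lambda _2=0.3\) and \(\lambda _d=-0.8\) are chosen by us so that \(\lambda _2\le -\lambda _d\)) confirms by grid search that \(1-\max \{\alpha +(1-\alpha )\lambda _{2}, |\alpha +(1-\alpha )\lambda _{d}|\}\) is maximized at \(\alpha =\frac{-\lambda _{d}-\lambda _{2}}{2-\lambda _{d}-\lambda _{2}}\):

```python
import numpy as np

l2, ld = 0.3, -0.8                 # lambda_2, lambda_d with lambda_2 <= -lambda_d
alpha = np.linspace(0.0, 1.0, 100001)

# spectral gap of the alpha-lazy chain, as in the proof of Lemma 2.1
gap = 1.0 - np.maximum(alpha + (1 - alpha) * l2,
                       np.abs(alpha + (1 - alpha) * ld))

a_star = (-ld - l2) / (2 - ld - l2)             # claimed maximizer
print(alpha[np.argmax(gap)], a_star)            # both ~ 0.2
print(gap.max(), (2 - 2 * l2) / (2 - ld - l2))  # both ~ 0.56
```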

1.3 Proof of Lemma 4.5

Let \(t_0=\max \left\{ \left\lceil \frac{1}{1-\alpha }\right\rceil \left\lceil \ln \frac{2}{\varepsilon }\right\rceil ,t_{\textsf{mix}}\left( M,\frac{\varepsilon }{2}\right) \right\} \) and let \(t=2\left\lceil \frac{1}{1-\alpha }\right\rceil t_0\). Denote by \(\pi \) the stationary distribution of M. Then, for every \(i\in [d]\), we have

$$\begin{aligned}&\left| \left| ({\mathcal {L}}_\alpha (M))^t(i,\cdot ) - \pi \right| \right| _{\textsf{TV}}\\&=\left| \left| \sum _{n=0}^t \left( {\begin{array}{c}t\\ n\end{array}}\right) \alpha ^{t-n}(1-\alpha )^nM^n(i,\cdot )-\pi \right| \right| _{\textsf{TV}}\\&\le \sum _{n=0}^{t_0} \left( {\begin{array}{c}t\\ n\end{array}}\right) \alpha ^{t-n}(1-\alpha )^n\overbrace{||M^n(i,\cdot )-\pi ||_{\textsf{TV}}}^{\le 1}+ \sum _{n=t_0+1}^t \left( {\begin{array}{c}t\\ n\end{array}}\right) \alpha ^{t-n}(1-\alpha )^n||M^n(i,\cdot )-\pi ||_{\textsf{TV}}\\&\le \sum _{n=0}^{t_0} \left( {\begin{array}{c}t\\ n\end{array}}\right) \alpha ^{t-n}(1-\alpha )^n+||M^{t_0+1}(i,\cdot )-\pi ||_{\textsf{TV}}\overbrace{\sum _{n=t_0+1}^t \left( {\begin{array}{c}t\\ n\end{array}}\right) \alpha ^{t-n}(1-\alpha )^n}^{\le 1}\\&\le {\mathbb {P}}(Y\ge t-t_0) + ||M^{t_0+1}(i,\cdot )-\pi ||_{\textsf{TV}}, \end{aligned}$$

where \(Y\sim \text {Binomial}(t, \alpha )\) counts the lazy steps, and where, in the second inequality, we used that \(||M^n(i,\cdot )-\pi ||_{\textsf{TV}}\) is non-increasing in n (e.g., Lalley 2009, Proposition 7). Since \(t_0\ge t_{\textsf{mix}}\left( M,\frac{\varepsilon }{2}\right) \), we have

$$\begin{aligned} ||M^{t_0+1}(i,\cdot )-\pi ||_{\textsf{TV}}\le \frac{\varepsilon }{2} \end{aligned}$$

and, since \(t_0\ge \left\lceil \frac{1}{1-\alpha }\right\rceil \left\lceil \ln \frac{2}{\varepsilon }\right\rceil \), by Hoeffding’s inequality,

$$\begin{aligned} {\mathbb {P}}(Y\ge t-t_0)\le \exp \left( -\frac{2t_{0}^{2}}{t}\right) \le \frac{\varepsilon }{2}. \end{aligned}$$

The assertion regarding \(t_{\textsf{mix}}({\mathcal {L}}_\alpha (M))\) follows from the general assertion regarding \(t_{\textsf{mix}}({\mathcal {L}}_\alpha (M), \varepsilon )\), together with the inequality

$$\begin{aligned} t_{\textsf{mix}}(M,\varepsilon )\le \log _{2}\left( \frac{1}{\varepsilon }\right) t_{\textsf{mix}}(M) \end{aligned}$$

(cf. Levin and Peres 2017, (4.34)). \(\square \)
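A hypothetical numerical illustration of Lemma 4.5 on a small aperiodic chain (the chain, \(\alpha \) and \(\varepsilon \) below are arbitrary choices of ours): we compute \(t_{\textsf{mix}}(M,\varepsilon /2)\) by direct matrix powering, form \(t_0\) and \(t=2\left\lceil \frac{1}{1-\alpha }\right\rceil t_0\) as in the proof, and verify that the \(\alpha \)-lazy chain is \(\varepsilon \)-mixed at time t:

```python
import numpy as np
from math import ceil, log

M = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2 / 3, 1 / 3])            # stationary distribution of M
alpha, eps = 0.2, 0.1

def tv(P, t):                            # max_i || P^t(i, .) - pi ||_TV
    return np.abs(np.linalg.matrix_power(P, t) - pi).sum(axis=1).max() / 2

t_mix_M = next(t for t in range(1, 10**4) if tv(M, t) <= eps / 2)
t0 = max(ceil(1 / (1 - alpha)) * ceil(log(2 / eps)), t_mix_M)
t = 2 * ceil(1 / (1 - alpha)) * t0

L = alpha * np.eye(2) + (1 - alpha) * M  # the alpha-lazy version of M
print(tv(L, t) <= eps)                   # True, as the lemma guarantees
```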

1.4 Proof of inequality (8)

First, we prove that \(||I-\Pi ||_\pi \le 1\). We have

$$\begin{aligned} ||I-\Pi ||_\pi ^2&=\sup _{f\in {{\mathbb {R}}}^{d},\;||f||_{\pi }=1}||(I-\Pi )f^T||_{\pi }^{2}\\&=\sup _{f\in {{\mathbb {R}}}^{d},\;||f||_{\pi }=1}\sum _{i\in [d]}\left( f(i)-\pi f^T\right) ^{2}\pi (i)\\&=\sup _{f\in {{\mathbb {R}}}^{d},\;||f||_{\pi }=1}1-(\pi f^T)^{2}\le 1. \end{aligned}$$

Now, let \(M\in {\mathcal {M}}_d^{\text {irr}}\) with stationary distribution \(\pi \). Since \(\Pi = M\Pi \) and due to the sub-multiplicativity of \(||\cdot ||_\pi \), we have

$$\begin{aligned} ||M-\Pi ||_{\pi }=||M-M\Pi ||_{\pi }=||M(I-\Pi )||_{\pi }\le ||M||_{\pi }||I-\Pi ||_{\pi }. \end{aligned}$$

Thus, it remains to prove that \(||M||_\pi \le 1\). Using Jensen’s inequality, we have

$$\begin{aligned} ||M||_\pi ^2&=\sup _{f\in {{\mathbb {R}}}^{d},\;||f||_{\pi }=1}||Mf^T||_{\pi }^{2}\\&=\sup _{f\in {{\mathbb {R}}}^{d},\;||f||_{\pi }=1}\sum _{i\in [d]}\left( \sum _{j\in [d]}M(i,j)f(j)\right) ^{2}\pi (i)\\&\le \sup _{f\in {{\mathbb {R}}}^{d},\;||f||_{\pi }=1}\sum _{i\in [d]}\sum _{j\in [d]}M(i,j)f(j)^{2}\pi (i)\\&=\sup _{f\in {{\mathbb {R}}}^{d},\;||f||_{\pi }=1}\sum _{j\in [d]}\left( \overbrace{\sum _{i\in [d]}\pi (i)M(i,j)}^{=\pi (j)}\right) f(j)^{2}=1. \end{aligned}$$

Finally, \(||M^*||_\pi \le 1\) holds since the time reversal \(M^*\) of M also has \(\pi \) as stationary distribution (e.g., Levin and Peres 2017, Proposition 1.23). \(\square \)
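The norms appearing here are easy to evaluate numerically: with the row-vector convention used above, \(||A||_\pi =||D^{1/2}AD^{-1/2}||_2\), where \(D=\text {diag}(\pi )\) and \(||\cdot ||_2\) is the spectral norm. The following is a hypothetical sanity check of inequality (8) on a random strictly positive (hence irreducible) chain; the tolerance guards against floating-point round-off:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
M = rng.random((d, d)) + 0.01            # strictly positive entries: irreducible
M /= M.sum(axis=1, keepdims=True)        # rows sum to 1

# stationary distribution: left eigenvector of M for eigenvalue 1
w, V = np.linalg.eig(M.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

Pi = np.outer(np.ones(d), pi)            # every row of Pi equals pi
D, D_inv = np.diag(np.sqrt(pi)), np.diag(1 / np.sqrt(pi))

def norm_pi(A):                          # operator norm on l2(pi)
    return np.linalg.norm(D @ A @ D_inv, 2)

tol = 1e-12
print(norm_pi(M - Pi) <= 1 + tol)        # True: inequality (8)
print(norm_pi(M) <= 1 + tol, norm_pi(np.eye(d) - Pi) <= 1 + tol)  # True True
```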


Cite this article

Fried, S. On the \(\alpha \)-lazy version of Markov chains in estimation and testing problems. Stat Inference Stoch Process 26, 413–435 (2023). https://doi.org/10.1007/s11203-022-09283-7

