Abstract
Given access to a single long trajectory generated by an unknown irreducible Markov chain M, we simulate an \(\alpha \)-lazy version of M which is ergodic. This enables us to generalize recent results on estimation and identity testing that were stated for ergodic Markov chains, in a way that allows fully empirical inference. In particular, our approach shows that the pseudo spectral gap introduced by Paulin (Electron J Probab 20:32, 2015) and defined for ergodic Markov chains may already be given a meaning in the case of irreducible but possibly periodic Markov chains.
Data availability statement
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
References
Batu T, Fischer E, Fortnow L, Kumar R, Rubinfeld R, White P (2001) Testing random variables for independence and identity. In: Proceedings 42nd IEEE symposium on foundations of computer science. IEEE, pp 442–451
Boyd S, Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Bui A, Sohier D (2007) How to compute times of random walks based distributed algorithms. Fund Inform 80(4):363–378
Chan SO, Ding Q, Li SH (2021) Learning and testing irreducible Markov chains via the \( k \)-cover time. In: Algorithmic learning theory. PMLR, pp 458–480
Cherapanamjeri Y, Bartlett PL (2019) Testing symmetric Markov chains without hitting. In: Conference on learning theory. PMLR, pp 758–785
Daskalakis C, Dikkala N, Gravin N (2018) Testing symmetric Markov chains from a single trajectory. In: Conference on learning theory. PMLR, pp 385–409
Ding J, Lee JR, Peres Y (2011) Cover times, blanket times, and majorizing measures. In: Proceedings of the forty-third annual ACM symposium on theory of computing, pp 61–70
Feige U, Rabinovich Y (2003) Deterministic approximation of the cover time. Random Struct Algorithms 23(1):1–22
Feige U, Zeitouni O (2009) Deterministic approximation for the cover time of trees. arXiv preprint arXiv:0909.2005
Fill JA (1991) Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. Ann Appl Probab 1(1):62–87
Fried S, Wolfer G (2022) Identity testing of reversible Markov chains. In: International conference on artificial intelligence and statistics. PMLR, pp 798–817
Han Y, Jiao J, Weissman T (2015) Minimax estimation of discrete distributions. In: 2015 IEEE international symposium on information theory (ISIT). IEEE, pp 2291–2295
Hao Y, Orlitsky A, Pichapati V (2018) On learning Markov chains. arXiv preprint arXiv:1810.11754
Hermon J (2016) Maximal inequalities and mixing times. PhD thesis, UC Berkeley
Horn RA, Johnson CR (2012) Matrix analysis. Cambridge University Press, Cambridge
Kamath S, Orlitsky A, Pichapati D, Suresh AT (2015) On learning distributions from their samples. In: Conference on learning theory. PMLR, pp 1066–1100
Lalley SP (2009) Convergence rates of Markov chains. Lecture notes, available online: http://galton.uchicago.edu/~lalley/Courses/313ConvergenceRates.pdf (2012)
Levin DA, Peres Y (2017) Markov chains and mixing times, vol 107. American Mathematical Society, Providence
Marshall AW, Olkin I, Arnold BC (1979) Inequalities: theory of majorization and its applications, vol 143. Springer, New York
Montenegro R, Tetali P (2006) Mathematical aspects of mixing times in Markov chains. Found Trends Theor Comput Sci 1(3):237–354
Orlitsky A, Suresh AT (2015) Competitive distribution estimation: Why is Good-Turing good. In: NIPS, pp 2143–2151
Paulin D (2015) Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electron J Probab 20:32
Valiant G, Valiant P (2017) An automatic inequality prover and instance optimal identity testing. SIAM J Comput 46(1):429–455
Wolfer G, Kontorovich A (2019) Estimating the mixing time of ergodic Markov chains. In: Proceedings of the thirty-second conference on learning theory, volume 99 of Proceedings of Machine Learning Research. PMLR, pp 3120–3159
Wolfer G, Kontorovich A (2020) Minimax testing of identity to a reference ergodic Markov chain. In: Proceedings of the twenty-third international conference on artificial intelligence and statistics, volume 108 of Proceedings of Machine Learning Research. PMLR, pp 191–201
Wolfer G, Kontorovich A (2021) Statistical estimation of ergodic Markov chain kernel over discrete state space. Bernoulli 27(1):532–553
Wolfer G, Kontorovich A (2022) Improved estimation of relaxation time in non-reversible Markov chains. arXiv preprint arXiv:2209.00175
Acknowledgements
We thank the anonymous referee for the careful reading of the manuscript and for the insightful suggestions that helped us significantly improve this work. In particular, the referee showed us the proof of Theorem 4.6. We are also grateful to Geoffrey Wolfer for posing to us the problem of extending the results of Wolfer and Kontorovich (2019, 2020, 2021) to irreducible Markov chains and for many helpful discussions.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The author declares that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Research Supported by the Israel Science Foundation (ISF) through grant No. 1456/18 and by the European Research Council grant No. 949707.
Appendix
1.1 \(\ell _1\)-projection on \(\Delta _d\)
In Example 3.2 (b) it was necessary to project a matrix on \({\mathcal {M}}_d\) with respect to \(||\cdot ||_\infty \). Since, in this norm, each row is considered separately, projecting on \({\mathcal {M}}_d\) is equivalent to projecting each of the rows of the matrix on \(\Delta _d\), with respect to the \(\ell _1\) norm. Thus, we need to solve (at most) d optimization problems of the following form: For \(x\in {\mathbb {R}}^d\), find an \(\ell _1\)-projection \(P_{\Delta _d}(x)\) of x on \(\Delta _d\) (cf. Boyd et al. 2004, p. 397):
$$\begin{aligned} \text {minimize}\;||y-x||_1\;\;\text {subject to}\;y\in \Delta _d, \end{aligned}$$(11)
where
$$\begin{aligned} ||y-x||_1=\sum _{i=1}^d|y_i-x_i|. \end{aligned}$$
Notice that, in contrast to \(||\cdot ||_p\) for \(p>1\), optimization problem (11) has, in general, infinitely many solutions and we will understand \(P_{\Delta _d}(x)\) as the set of all such solutions.
The following lemma seems to be well known, but we were not able to find a reference. Only parts (a) and (b), and the case \(|S|\le 1\), are relevant for our needs.
Lemma 5.1
Let \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\). Denote
$$\begin{aligned} S=\{i\in [d]:x_i<0\}\quad \text {and}\quad s=\sum _{i\in [d]\setminus S}x_i. \end{aligned}$$
If \(S=[d]\) then choose any \(y\in \Delta _d\). Otherwise, define \(y=(y_1,\ldots ,y_d)\in \Delta _d\) as follows: Set \(y_i = 0\) for every \(i\in S\). Now, for every \(i\in [d]\setminus S\):
(a) If \(s=1\) then set \(y_i = x_i\).

(b) If \(s<1\) then choose any \(y_i \ge x_i\) such that \(\sum _{j\in [d]\setminus S} y_j= 1\).

(c) If \(s>1\) then choose any \(y_i \le x_i\) such that \(\sum _{j\in [d]\setminus S} y_j= 1\).
Then \(y\in P_{\Delta _d}(x)\).
Proof
First, assume that \(S=[d]\) and let \(y'=(y'_1,\ldots ,y'_d)\in \Delta _d\). For each \(i\in [d]\), denote \(\varepsilon _i = y'_i - y_i\). Notice that \(\sum _{i=1}^d\varepsilon _i = 0\). We have
$$\begin{aligned} \sum _{i=1}^d|y'_i-x_i|=\sum _{i=1}^d(y_i+\varepsilon _i-x_i)=\sum _{i=1}^d(y_i-x_i)+\sum _{i=1}^d\varepsilon _i=\sum _{i=1}^d|y_i-x_i|, \end{aligned}$$
where we used that \(x_i<0\) for every \(i\in [d]\), so that \(|z-x_i|=z-x_i\) for every \(z\ge 0\).
Now, assume that \(S\ne [d]\) and let \(z=(z_1,\ldots ,z_d)\in \Delta _d\). We consider each of the three possibilities for s separately:
(a) Without loss, there exists \(k\in [d]\) such that \(x_i < 0\) for every \(1\le i\le k\) and \(x_i\ge 0\) for every \(k+1\le i\le d\). We have
$$\begin{aligned} \sum _{i=1}^d|z_i-x_i|&=\sum _{i=1}^{k}|z_i-x_i|+\sum _{i=k+1}^{d}|z_i-x_i|\\&\ge \sum _{i=1}^{k}|0-x_i|+\sum _{i=k+1}^{d}|x_i-x_i|\\&=\sum _{i=1}^{k}|y_i-x_i|+\sum _{i=k+1}^{d}|y_i-x_i|\\&= \sum _{i=1}^d|y_i-x_i|. \end{aligned}$$
(b) Without loss, there exist \(1\le k\le l\le m\le d\) such that
$$\begin{aligned} x_i<0=y_i\le z_i,&\;\;\forall 1\le i\le k \\ 0\le z_i\le x_i\le y_i,&\;\;\forall k+1\le i\le l \\ 0\le x_i\le z_i\le y_i,&\;\;\forall l+1\le i\le m \\ 0\le x_i\le y_i\le z_i,&\;\;\forall m+1\le i\le d . \end{aligned}$$
Then,
$$\begin{aligned} \sum _{i=1}^{d}|z_i-x_i| =&\sum _{i=1}^{d}|y_i-x_i|+\sum _{i=1}^{k}(z_i-y_i) +\sum _{i=k+1}^{l}\left( (y_i-z_i)-2(y_i-x_i)\right) -\\&\sum _{i=l+1}^{m}(y_i-z_i)+\sum _{i=m+1}^{d}(z_i-y_i). \end{aligned}$$
Thus, it suffices to show that
$$\begin{aligned} \sum _{i=1}^{k}(z_i-y_i)+\sum _{i=k+1}^{l}\left( (y_i-z_i)-2(y_i-x_i)\right) -\sum _{i=l+1}^{m}(y_i-z_i)+\sum _{i=m+1}^{d}(z_i-y_i)\ge 0. \end{aligned}$$(12)
Indeed,
$$\begin{aligned} {}(12)&\iff \overbrace{\sum _{i=1}^{d}z_i}^{=1}+\sum _{i=k+1}^{l}\left( -2(-x_i+z_i)\right) -\overbrace{\sum _{i=k+1}^{d}y_i}^{=1}\ge 0 \iff \sum _{i=k+1}^{l}(x_i-z_i)\ge 0 \end{aligned}$$
and, by assumption, \(x_i\ge z_i\) for every \(k+1\le i\le l\).
(c) Similar to the previous case.
\(\square \)
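The construction of Lemma 5.1 is straightforward to implement. The following Python sketch is our own illustration (the function name is ours, not from the paper); it returns one element of \(P_{\Delta _d}(x)\), resolving the freedom in cases (b) and (c) by raising a single coordinate, respectively shrinking all coordinates proportionally:

```python
import numpy as np

def l1_project_to_simplex(x):
    """Return one l1-projection of x onto the simplex Delta_d,
    following the case analysis of Lemma 5.1."""
    x = np.asarray(x, dtype=float)
    neg = x < 0                        # the set S = {i : x_i < 0}
    if neg.all():                      # S = [d]: every point of Delta_d is a projection
        return np.full(x.size, 1.0 / x.size)
    y = np.where(neg, 0.0, x)          # set y_i = 0 for every i in S
    s = y.sum()                        # s = sum of x_i over [d] \ S
    if s < 1.0:                        # case (b): raise one coordinate, keeping y_i >= x_i
        y[int(np.argmax(~neg))] += 1.0 - s
    elif s > 1.0:                      # case (c): shrink proportionally, keeping 0 <= y_i <= x_i
        y /= s
    return y                           # case (a): s = 1 requires no change
```

For instance, \(x=(-0.2,\,0.5,\,0.3)\) is mapped to \((0,\,0.7,\,0.3)\), at \(\ell _1\)-distance 0.4 from x, which is optimal.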
An ordinary triangle-inequality argument would have introduced an additional factor of 2 in the sample complexity in Example 3.2 (b). The following lemma shows that this can be avoided:
Lemma 5.2
Let \(y=(y_1,\ldots ,y_d)\in \Delta _d\) and let \(x=(x_1,\ldots ,x_d)\in {\mathbb {R}}^d\) be such that \(||x||_1=1\). Let \(\varepsilon >0\) and assume that \(||x-y||_1<\varepsilon \). Then there exists \(x'=(x'_1,\ldots ,x'_d)\in P_{\Delta _d}(x)\) such that \(||x'-y||_1<\varepsilon \).
Proof
If \(x_1,\ldots ,x_d\ge 0\) then \(x\in \Delta _d\) and we may take \(x'=x\). Otherwise, assume without loss of generality that there is \(1\le k\le d\) such that \(x_i<0\) for \(1\le i\le k\) and \(x_i\ge 0\) for \(k+1\le i\le d\). Let \(k+1\le l\le d\) be minimal such that \(\sum _{i=1}^l x_i\ge 0\) (such an l must exist since \(||x||_1=1\)). Define \(x'=(x'_1,\ldots ,x'_d)\in \Delta _d\) as follows: For \(1\le i\le d\) let
We have \(||x'-y||_1\le ||x-y||_1<\varepsilon \).
\(\square \)
1.2 Proof of Lemma 2.1
Since \(\lambda _2\ge \lambda _d\) and, by assumption, \(\lambda _2\le -\lambda _d\), necessarily \(\lambda _d<0\). Now, we notice that the second largest and the smallest eigenvalues of \({\mathcal {L}}_\alpha (M)\) are given by \(\alpha +(1-\alpha )\lambda _{2}\) and \(\alpha +(1-\alpha )\lambda _{d}\), respectively. We have
It is easily seen that, since \(1>\lambda _2\ge \lambda _d>-1\), we have \(\frac{-\lambda _{d}-\lambda _{2}}{1-\lambda _{2}}\le \frac{-2\lambda _{d}}{1-\lambda _{d}}\) and that the third inequality in (13) holds trivially. Thus,
which proves the first assertion.
Turning to the second assertion, we have
Finally, for this \(\alpha \), we have
\(\square \)
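Since \({\mathcal {L}}_\alpha (M)=\alpha I+(1-\alpha )M\), every eigenvalue \(\lambda \) of M is mapped to \(\alpha +(1-\alpha )\lambda \), which is what the proof above exploits. A quick numerical sanity check (our own illustration, not part of the proof), on the two-state periodic chain whose spectral gap vanishes:

```python
import numpy as np

# The two-state periodic chain has eigenvalues 1 and -1, hence spectral gap 0.
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])
alpha = 0.5
L = alpha * np.eye(2) + (1 - alpha) * M   # the alpha-lazy version of M

print(sorted(np.linalg.eigvals(M)))   # [-1.0, 1.0]
print(sorted(np.linalg.eigvals(L)))   # [0.0, 1.0]: lambda -> alpha + (1 - alpha) * lambda
```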
1.3 Proof of Lemma 4.5
Let \(t_0=\max \left\{ \left\lceil \frac{1}{1-\alpha }\right\rceil \left\lceil \ln \frac{2}{\varepsilon }\right\rceil ,t_{\textsf{mix}}\left( M,\frac{\varepsilon }{2}\right) \right\} \) and let \(t=2\left\lceil \frac{1}{1-\alpha }\right\rceil t_0\). Denote by \(\pi \) the stationary distribution of M. Then, for every \(i\in [d]\), we have
where \(Y\sim \text {Binomial}(t, \alpha )\) and where, in the second inequality, we used that the total variation distance to stationarity is non-increasing in the number of steps (e.g., Lalley 2009, Proposition 7). Since \(t_0\ge t_{\textsf{mix}}\left( M,\frac{\varepsilon }{2}\right) \), we have
and, since \(t_0\ge \left\lceil \frac{1}{1-\alpha }\right\rceil \left\lceil \ln \frac{2}{\varepsilon }\right\rceil \), by Hoeffding’s inequality,
The assertion regarding \(t_{\textsf{mix}}({\mathcal {L}}_\alpha (M))\) follows from the general assertion regarding \(t_{\textsf{mix}}({\mathcal {L}}_\alpha (M), \varepsilon )\), together with the inequality
(cf. Levin and Peres 2017, (4.34)). \(\square \)
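Lemma 4.5 is what makes the \(\alpha \)-lazy device fully empirical: a trajectory of \({\mathcal {L}}_\alpha (M)\) can be produced from a single trajectory of M by flipping an \(\alpha \)-coin at every step. The sketch below shows one natural way to do this (the function name and interface are ours, for illustration only):

```python
import numpy as np

def alpha_lazy_trajectory(traj, alpha, seed=None):
    """Simulate a trajectory of the alpha-lazy chain L_alpha(M) from a
    single trajectory traj = (X_0, X_1, ...) of M: at every step, stay
    put with probability alpha, otherwise consume the next step of traj."""
    rng = np.random.default_rng(seed)
    lazy = [traj[0]]
    i = 0                              # current position in the original trajectory
    while i + 1 < len(traj):
        if rng.random() < alpha:       # lazy step: self-loop
            lazy.append(lazy[-1])
        else:                          # genuine transition of M
            i += 1
            lazy.append(traj[i])
    return lazy
```

Note that t steps of the lazy chain consume only \(t-Y\) transitions of the original trajectory, where \(Y\sim \text {Binomial}(t,\alpha )\) counts the self-loops; this is exactly the bookkeeping in the proof above.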
1.4 Proof of inequality (8)
First, we prove that \(||I-\Pi ||_\pi \le 1\). For every \(f\in {\mathbb {R}}^d\) we have
$$\begin{aligned} ||(I-\Pi )f||_\pi ^2=\sum _{i=1}^d\pi _i\left( f_i-\sum _{j=1}^d\pi _jf_j\right) ^2=\sum _{i=1}^d\pi _if_i^2-\left( \sum _{j=1}^d\pi _jf_j\right) ^2\le ||f||_\pi ^2. \end{aligned}$$
Now, let \(M\in {\mathcal {M}}_d^{\text {irr}}\) with stationary distribution \(\pi \). Since \(\Pi = M\Pi \) and due to the sub-multiplicativity of \(||\cdot ||_\pi \), we have
Thus, it remains to prove that \(||M||_\pi \le 1\). Using Jensen's inequality, for every \(f\in {\mathbb {R}}^d\) we have
$$\begin{aligned} ||Mf||_\pi ^2=\sum _{i=1}^d\pi _i\left( \sum _{j=1}^dM_{ij}f_j\right) ^2\le \sum _{i=1}^d\pi _i\sum _{j=1}^dM_{ij}f_j^2=\sum _{j=1}^d\pi _jf_j^2=||f||_\pi ^2, \end{aligned}$$
where the equality before last uses \(\pi M=\pi \).
Finally, \(||M^*||_\pi \le 1\) holds since the time reversal \(M^*\) of M also has \(\pi \) as stationary distribution (e.g., Levin and Peres 2017, Proposition 1.23). \(\square \)
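The bounds established above can also be checked numerically. Assuming, as in Paulin (2015), that \(||\cdot ||_\pi \) denotes the operator norm induced by \(\ell ^2(\pi )\), it coincides with the spectral norm of \(DAD^{-1}\) with \(D=\text {diag}(\sqrt{\pi })\). A small sketch of ours, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((4, 4))
M /= M.sum(axis=1, keepdims=True)            # a random row-stochastic matrix

# Stationary distribution: left Perron eigenvector of M, normalized to sum 1.
w, V = np.linalg.eig(M.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

d = np.sqrt(pi)
def pi_norm(A):
    """||A||_pi, computed as the spectral norm of D A D^{-1}, D = diag(sqrt(pi))."""
    return np.linalg.norm(d[:, None] * A / d[None, :], 2)

Pi = np.outer(np.ones(4), pi)                # every row of Pi equals pi
print(pi_norm(M), pi_norm(np.eye(4) - Pi))   # both are at most 1 (up to rounding)
```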
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fried, S. On the \(\alpha \)-lazy version of Markov chains in estimation and testing problems. Stat Inference Stoch Process 26, 413–435 (2023). https://doi.org/10.1007/s11203-022-09283-7