Convergence rate bounds for iterative random functions using one-shot coupling
Abstract

One-shot coupling is a method of bounding the convergence rate between two copies of a Markov chain in total variation distance, which was first introduced in Roberts and Rosenthal (Stoch Process Appl 99:195–208, 2002) and generalized in Madras and Sezer (Bernoulli 16:882–908, 2010). The method is divided into two parts: the contraction phase, when the chains converge in expected distance, and the coalescing phase, which occurs at the last iteration, when there is an attempt to couple. One-shot coupling does not require the use of any exogenous variables like a drift function or a minorization constant. In this paper, we summarize the one-shot coupling method into the One-Shot Coupling Theorem. We then apply the theorem to two families of Markov chains: the random functional autoregressive process and the autoregressive conditional heteroscedastic process. We provide multiple examples of how the theorem can be used on various models including ones in high dimensions. These examples illustrate how the theorem’s conditions can be verified in a straightforward way. The one-shot coupling method appears to generate tight geometric convergence rate bounds.


Notes

  1. The trigonometric identities used are \(2\cos \mu \sin \upsilon = \sin (\mu +\upsilon )-\sin (\mu -\upsilon )\) and \(\cos (\mu + \upsilon )= \cos \mu \cos \upsilon +\sin \mu \sin \upsilon \) where \(\mu ,\upsilon \in {\mathbb {R}}\)


Acknowledgements

We thank the referees for their many excellent comments and suggestions.

Author information

Corresponding author

Correspondence to Sabrina Sixta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Propositions related to the properties of total variation distance

Proof of Proposition 2.2

Let \({\mathcal {A}}\) be the sigma field of \({\mathcal {X}}\) and \({\mathcal {B}}\) be the sigma field of \({\mathcal {Y}}\).

First note that \(f^{-1}({\mathcal {B}})=\{f^{-1}(B): B\in {\mathcal {B}}\}={\mathcal {A}}\):

  • \(f^{-1}({\mathcal {B}})\subset {\mathcal {A}}\): For \(B\in {\mathcal {B}}\), \(f^{-1}(B)\in {\mathcal {A}}\) by measurability.

  • \({\mathcal {A}} \subset f^{-1}({\mathcal {B}})\): Let \(A\in {\mathcal {A}}\). Then \(f(A)\in {\mathcal {B}}\) and \(f^{-1}(f(A))\in f^{-1}({\mathcal {B}})\) by definition. By invertibility, \(f^{-1}(f(A))=A\) and so \(A \in f^{-1}({\mathcal {B}})\).

The equality in Eq. 2 can then be proven as follows,

$$\begin{aligned}&\Vert {\mathcal {L}}(f(X))-{\mathcal {L}}(f(X'))\Vert \\&\quad =\sup _{B\in {\mathcal {B}}}|P(f(X)\in B)-P(f(X')\in B)|\\&\quad = \sup _{B\in {\mathcal {B}}}|P(X\in f^{-1}(B))-P(X'\in f^{-1}(B))|\\&\quad = \sup _{A\in {\mathcal {A}}}|P(X\in A)-P(X'\in A)| \hbox { since}\ f^{-1}({\mathcal {B}})={\mathcal {A}}\\&\quad =\Vert {\mathcal {L}}(X)-{\mathcal {L}}(X')\Vert \end{aligned}$$

\(\square \)

Proof of Proposition 2.3

$$\begin{aligned}&\Vert {\mathcal {L}}(X)-{\mathcal {L}}(X')\Vert = \sup _{A\in {\mathcal {B}}} |P(X\in A)-P(X'\in A)|\\&\quad = \sup _{A\in {\mathcal {B}}}|\int _{{\mathcal {Y}}} P(X\in A\mid y)-P(X'\in A\mid y)\mu (dy)|\\&\quad \le \sup _{A\in {\mathcal {B}}}\int _{{\mathcal {Y}}} |P(X\in A\mid y)-P(X'\in A\mid y)|\mu (dy) \\&\qquad \text { by Jensen's inequality}\\&\quad \le \int _{{\mathcal {Y}}} \sup _{A\in {\mathcal {B}}} |P(X\in A\mid y)-P(X'\in A\mid y)|\mu (dy) \\&\quad \le E\left[ \Vert {\mathcal {L}}(X\mid Y)-{\mathcal {L}}(X'\mid Y)\Vert \right] \end{aligned}$$

\(\square \)
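As a quick numerical illustration of Proposition 2.3 (a sketch added for intuition, not part of the proof), the Python snippet below compares the marginal total variation distance with the averaged conditional total variation distance for a simple two-component mixture; the mixing law \(Y\sim \text {Bernoulli}(1/2)\) and the normal conditional laws are choices made only for this example.

```python
# Illustrative check of Proposition 2.3 (assumed toy example, not from the paper):
# Y ~ Bernoulli(1/2), X | Y=y ~ N(y, 1), X' | Y=y ~ N(y + 1, 1).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f_X  = lambda x: 0.5 * norm.pdf(x, 0, 1) + 0.5 * norm.pdf(x, 1, 1)  # density of X
f_Xp = lambda x: 0.5 * norm.pdf(x, 1, 1) + 0.5 * norm.pdf(x, 2, 1)  # density of X'

tv_marginal, _ = quad(lambda x: 0.5 * abs(f_X(x) - f_Xp(x)), -np.inf, np.inf)

# For every y, ||L(X | Y=y) - L(X' | Y=y)|| is the TV between N(0,1) and N(1,1),
# which equals 2 * Phi(1/2) - 1, so the right-hand side of the proposition is:
tv_conditional_avg = 2 * norm.cdf(0.5) - 1

print(tv_marginal, tv_conditional_avg)  # the first value is smaller, as the proposition asserts
```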

Proof of Proposition 2.4

To prove this, we use the concept of maximal coupling over the coordinates. By maximal coupling, for \(i\in \{1,\ldots , d\}\) there exist random variables \(X_{i,n}^M, X_{i,n}^{'M}\) such that \(X_{i,n}\overset{d}{=}X_{i,n}^M\) and \(X'_{i,n}\overset{d}{=}X_{i,n}^{'M}\) and

$$\begin{aligned} \Vert {\mathcal {L}}(X_{i,n})-{\mathcal {L}}(X'_{i,n})\Vert =P(X_{i,n}^M\ne X_{i,n}^{'M}) \end{aligned}$$

(see Proposition 3g of Roberts and Rosenthal (2004) or Section 2 of Böttcher (2017)).

Further, there exists a unique product measure such that for any \(A_1, \ldots A_d \in {{\mathcal {B}}}\), \(P(\cap _{i=1}^d [X_{i,n}^M\in A_i])=\prod _{i=1}^d P(X_{i,n}^M\in A_i)\) (Theorem 18.2 of Billingsley (2012)). For the unique product measure, the following equality holds,

$$\begin{aligned}&P(\cap _{i=1}^d X_{i,n}^M \in A_i)=\prod _{i=1}^d P(X_{i,n}^M\in A_i)\\&\quad =\prod _{i=1}^d P(X_{i,n}\in A_i) =P(\cap _{i=1}^d X_{i,n} \in A_i) \end{aligned}$$

And so by uniqueness, for \(A\in {{\mathcal {B}}}^{\text {d}}\), \(P(X_n^M\in A)=P(X_n\in A)\). By definition, this means that \(\vec {X}_{n}\overset{d}{=} \vec {X}_n^M\), which implies that \((\vec {X}_n^M, \vec {X}_n^{'M})\in {\mathcal {C}}(\vec {X}_n,\vec {X}'_n)\), the set of all couplings of \(\vec {X}_n,\vec {X}'_n\).

We now use \(\vec {X}_n^M, \vec {X}_n^{'M}\) to prove Eq. 3.

$$\begin{aligned}&\Vert {\mathcal {L}}(\vec {X}_n)-{\mathcal {L}}(\vec {X}'_n)\Vert \\&\quad = \inf _{\vec {Y},\vec {Y}'\in {\mathcal {C}}(\vec {X}_n,\vec {X}'_n)}P(\vec {Y}\ne \vec {Y}')\, \text {by Eq. 2.4 of [27]}\\&\quad \le P(\vec {X}_n^M\ne \vec {X}_n^{'M})\\&\quad = P(\cup _{i=1}^d [X_{i,n}^M\ne X_{i,n}^{'M}])\\&\quad \le \sum _{i=1}^d P(X_{i,n}^M\ne X_{i,n}^{'M}) \quad \text {by subadditivity}\\&\quad \le d A r^n \end{aligned}$$

\(\square \)

Appendix B: Lemmas related to the Sideways Theorem

The following are lemmas and corresponding proofs and corollaries related to the Sideways Theorem (4.2).

1.1 Lemmas providing an upper bound on the integral difference between a function and a corresponding shift

The following lemmas are used in the proof of Lemma 4.3.

Lemma B.1

For any invertible, continuous function \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) where the codomain is \(f({\mathbb {R}})=(a,b)\) and \(\Delta >0\),

$$\begin{aligned} \int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx=(b-a)\Delta \end{aligned}$$

Proof

Since f is invertible and continuous, it is strictly monotone (Lemma 3.8 of Hairer and Wanner (2008)). Assume that f is strictly increasing. The integral can be written as follows,

$$\begin{aligned}&\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx = \int _{{\mathbb {R}}}f(x+\Delta )-f(x)dx\\&\quad = \int _{{\mathbb {R}}}\int _a^bI_{f(x)<y<f(x+\Delta )}dy dx \\&\quad =\int _{{\mathbb {R}}}\int _a^bI_{f^{-1}(y)-\Delta<x<f^{-1}(y)}dy dx \\&\quad =\int _a^b \int _{{\mathbb {R}}}I_{f^{-1}(y)-\Delta<x<f^{-1}(y)}dx dy \;\text {by Fubini's Theorem} \\&\quad =\int _a^b \Delta dy \\&\quad = (b-a)\Delta \end{aligned}$$

If f is strictly decreasing apply the transform \(h(x)=a+b-f(x)\). The function h is a strictly increasing invertible function with codomain (a, b) and so using the previous result for increasing functions,

$$\begin{aligned}&\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx\\&\quad = \int _{{\mathbb {R}}}|h(x+\Delta )-h(x)|dx = (b-a)\Delta \end{aligned}$$

\(\square \)
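As a small numerical check of Lemma B.1 (an illustration added here, not part of the original proof), take f to be the logistic function, which is continuous and invertible with image \((a,b)=(0,1)\); the integral should then equal \((b-a)\Delta =\Delta \).

```python
# Numerical check of Lemma B.1 with an assumed example function (logistic).
import numpy as np
from scipy.integrate import quad

def f(x):
    return 1.0 / (1.0 + np.exp(-x))  # strictly increasing, image (0, 1)

delta = 0.7
integral, _ = quad(lambda x: abs(f(x + delta) - f(x)), -np.inf, np.inf)
print(integral, delta)  # both are approximately 0.7, matching (b - a) * Delta
```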

Lemma B.2

Let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a continuous function that is invertible over the set (c, d) and is constant on each component of \((c,d)^C\). Further suppose that the codomain is \(f({\mathbb {R}})=(a,b)\). Then for \(\Delta >0\), we get that

$$\begin{aligned} \int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx=(b-a)\Delta \end{aligned}$$

Proof

Assume that f is an increasing function and so \(f(c)=a\), \(f(d)=b\) and \(|f(x+\Delta )-f(x)|=f(x+\Delta )-f(x)\).

Let \(0<\epsilon <(d-c)/2\) and define

$$\begin{aligned} g_{\epsilon }(x) = \left\{ \begin{array}{ll} (f(c+\epsilon )-a)e^{x-c-\epsilon }+a &{} \text {if } x\in (-\infty ,c+\epsilon ]\\ f(x) &{} \text {if } x\in (c+\epsilon ,d-\epsilon ]\\ (f(d-\epsilon )-b)e^{d-\epsilon -x}+b &{} \text {if } x\in (d-\epsilon ,\infty )\\ \end{array} \right. \end{aligned}$$

Note that \(g_{\epsilon }(x)\) is continuous, invertible, an increasing function and the codomain is (a, b). By Lemma B.1 for each \(\epsilon >0\)

$$\begin{aligned} \int _{{\mathbb {R}}}g_{\epsilon }(x+\Delta )-g_{\epsilon }(x) dx = (b-a)\Delta \end{aligned}$$

Further, for all \(x\in {\mathbb {R}}\), \(\lim _{\epsilon \rightarrow 0}g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)=f(x+\Delta )-f(x)\) and so \(g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)\) converges pointwise to \(f(x+\Delta )-f(x)\). Next, for \(0<\epsilon <(d-c)/2\), \(|g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)|<b-a\) and so the functions \(g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)\) are uniformly bounded. The above statements allow us to apply the dominated convergence theorem (Theorem 16.5 of Billingsley (2012)) and so

$$\begin{aligned}&\int _{{\mathbb {R}}}f(x+\Delta )-f(x)dx \\&\quad = \lim _{\epsilon \rightarrow 0} \int _{{\mathbb {R}}}g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)dx = (b-a)\Delta \end{aligned}$$

If f is strictly decreasing apply the transform \(h(x)=a+b-f(x)\). The function h is a strictly increasing invertible function with codomain (a, b) and so using the previous result for increasing functions,

$$\begin{aligned}&\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx= \int _{{\mathbb {R}}}|h(x+\Delta )\\&\quad -h(x)|dx = (b-a)\Delta \end{aligned}$$

\(\square \)

Lemma B.3

Let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a continuous function with the following properties:

  • the codomain is (0, K)

  • \((m_1, m_2, \ldots , m_M)\) are the local maxima and minima points

  • \(\lim _{x\rightarrow \infty }f(x)=0\) and \(\lim _{x\rightarrow -\infty }f(x)=0\)

Further suppose that \(\Delta < \min _{i=2,\ldots , M}\{m_i-m_{i-1}\}\). Then

$$\begin{aligned} \int _{{\mathbb {R}}}|f(x-\Delta )-f(x)|dx \le K(M+1)\Delta \end{aligned}$$

Proof

Since \(\Delta < \min _{i=2,\ldots , M}\{m_i-m_{i-1}\}\), we have that \(m_1-\Delta<m_1<m_2-\Delta<\ldots <m_M\). Let \(I_1,\ldots , I_M\) be the intersection points, i.e., the points where \(f(I_i)=f(I_i+\Delta )\).

Show that \(m_i-\Delta<I_i<m_i\): Suppose that \(m_i\) is a local maximum point. Let \(g(x)=f(x+\Delta )\). Within the interval \((m_i-\Delta ,m_i)\), \(f'(x)>0\) and \(g'(x)<0\) by assumption. This implies that \(f(m_i-\Delta )<f(m_i)\) and \(g(m_i-\Delta )>g(m_i)\) by the Mean Value Theorem. Further since \(g(m_i-\Delta )=f(m_i)\) we have that \(g(m_i-\Delta )>f(m_i-\Delta )\) and \(g(m_i)<f(m_i)\).

Let \(h(x)=g(x)-f(x)\). Then \(h(m_i-\Delta )>0\) and \(h(m_i)<0\) further h is a strictly decreasing function over \((m_i-\Delta ,m_i)\) since \(g,-f\) are strictly decreasing functions over the same interval. So by the intermediate value theorem, there exists an \(\xi \in (m_i-\Delta ,m_i)\) such that \(h(\xi )=0\) or \(f(\xi )=g(\xi )=f(\xi +\Delta )\). Further by injectivity, \(\xi \) is unique. Let \(I_i=\xi \). A similar proof can be given for when \(m_i\) is a local minimum.

Show that \(\int _{I_i}^{I_{i+1}}|f(x+\Delta )-f(x)|dx\le K\Delta \): Note first that \(m_i-\Delta<I_i<m_i<m_{i+1}-\Delta<I_{i+1}<m_{i+1}\). Further, define

$$\begin{aligned} f_i(x) = \left\{ \begin{array}{ll} f(m_i) &{} \text {if } x\in (-\infty ,m_i]\\ f(x) &{} \text {if } x\in (m_i,m_{i+1}]\\ f(m_{i+1}) &{} \text {if } x\in (m_{i+1},\infty )\\ \end{array} \right. \end{aligned}$$

Note that over the interval \((m_i,m_{i+1}]\), the function f is either a strictly increasing or a strictly decreasing function.

$$\begin{aligned}&\int _{I_i}^{I_{i+1}}|f(x+\Delta )-f(x)|dx \\&=\int _{I_i}^{m_i}|f(x+\Delta )-f(x)|dx \\&\quad + \int _{m_i}^{m_{i+1}-\Delta }|f(x+\Delta )-f(x)|dx \\&\quad + \int _{m_{i+1}-\Delta }^{I_{i+1}}|f(x+\Delta )-f(x)|dx\\&\le \int _{I_i}^{m_i}|f(x+\Delta )-f(m_i)|dx \\&\quad + \int _{m_i}^{m_{i+1}-\Delta }|f(x+\Delta )-f(x)|dx \\&\quad + \int _{m_{i+1}-\Delta }^{I_{i+1}}|f(m_{i+1})-f(x)|dx\\&= \int _{I_i}^{m_i}|f_i(x+\Delta )-f_i(x)|dx \\&\quad + \int _{m_i}^{m_{i+1}-\Delta }|f_i(x+\Delta )-f_i(x)|dx \\&\quad \quad + \int _{m_{i+1}-\Delta }^{I_{i+1}}|f_i(x+\Delta )-f_i(x)|dx\\&= \int _{I_i}^{I_{i+1}}|f_i(x+\Delta )-f_i(x)|dx \\&\le \int _{m_i-\Delta }^{m_{i+1}}|f_i(x+\Delta )-f_i(x)|dx \\&= \int _{{\mathbb {R}}}|f_i(x+\Delta )-f_i(x)|dx \\&= |f(m_i)-f(m_{i+1})|\Delta \le K\Delta \end{aligned}$$

The last equality is a result of Lemma B.2.

By similar reasoning, it can be shown that

$$\begin{aligned}&\int _{-\infty }^{I_{1}}|f(x+\Delta )-f(x)|dx\le K\Delta \\&\quad \int _{I_{M}}^{\infty }|f(x+\Delta )-f(x)|dx\le K\Delta \end{aligned}$$

Finally note that the intersection points partition \({\mathbb {R}}\) into \(M+1\) subsets and so

$$\begin{aligned} \int _{{\mathbb {R}}}|f(x-\Delta )-f(x)|dx \le K(M+1)\Delta \end{aligned}$$

\(\square \)
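The following Python sketch (an added illustration, not part of the proof) checks the bound of Lemma B.3 on an assumed bimodal normal-mixture density, for which the local extrema are the two modes and the antimode between them, so \(M=3\), and \(\Delta \) is taken smaller than the smallest gap between consecutive extrema.

```python
# Numerical check of Lemma B.3 on an assumed two-component normal mixture.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def f(x):
    return 0.5 * norm.pdf(x, -2, 1) + 0.5 * norm.pdf(x, 2, 1)

grid = np.linspace(-10, 10, 20001)
K = f(grid).max()   # numerical approximation of sup f
M = 3               # two local maxima (near +/-2) and one local minimum (at 0)
delta = 0.5         # smaller than the roughly 2-unit gaps between consecutive extrema

integral, _ = quad(lambda x: abs(f(x - delta) - f(x)), -np.inf, np.inf)
print(integral, K * (M + 1) * delta)  # the integral stays below K * (M + 1) * Delta
```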

1.1.1 Proof of Lemma 4.3

Lemma 4.3 represents the coalescing condition for the Sideways Theorem 4.2.

Proof of Lemma 4.3

Set \(\theta _{1,n}=\theta '_{1,n}\). Define

$$\begin{aligned} \Delta =g(\theta _{1,n},X_{n-1})-g(\theta _{1,n},X'_{n-1}) \end{aligned}$$

Let \(f_{X_n},f_{X'_n}\) be the density functions for \(X_{n},X_{n}'\), respectively, and \(f_{\theta _{2,n}}, f_{\theta _{2,n}+\Delta }\) be the density functions for \(\theta _{2,n}, \theta _{2,n}+\Delta \).

Suppose that \(\Delta ,X_{n-1}, X'_{n-1} \in {\mathbb {R}}\) are known and so,

$$\begin{aligned} X_n&=g(\theta _{1,n},X_{n-1})+\theta _{2,n} \implies \theta _{2,n}\\&=X_n-g(\theta _{1,n},X_{n-1}) \\ X'_n&=g(\theta _{1,n},X'_{n-1})+\theta '_{2,n} \implies \theta '_{2,n}-\Delta \\&=X'_n-g(\theta _{1,n},X_{n-1}) \end{aligned}$$

We know that \(\theta _{2,n}\overset{d}{=}\theta '_{2,n}\) and in general \(\Delta \), \(\theta _{1,n}\) are random variables, so

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \nonumber \\&\quad \le E_{\theta _{1,n}, \Delta }\left[ \Vert {\mathcal {L}}(X_n\mid \theta _{1,n}, \Delta )-{\mathcal {L}}(X'_n\mid \theta _{1,n}, \Delta )\Vert \right] \nonumber \\&\qquad \text {by Proposition} 2.3 \end{aligned}$$
(B1)
$$\begin{aligned}&\quad = E_{\theta _{1,n}, \Delta }\left[ \Vert {\mathcal {L}}(\theta _{2,n}\mid \theta _{1,n})-{\mathcal {L}}(\theta _{2,n}-\Delta \mid \theta _{1,n})\Vert \right] \nonumber \\&\qquad \text {by Proposition}\, 2.2 \end{aligned}$$
(B2)

By the assumptions in the theorem, the density of \(\theta _{2,n}\) is continuous with M extrema points and has a codomain that is in (0, K). Let \((m_1, m_2,\ldots , m_M)\) be the local extrema points where \(m_i<m_j\) if \(i<j\), and let \(L\le \min _{2\le i\le M}\{m_i-m_{i-1}\}\) be a lower bound on the distance between two consecutive local extrema points. So, continuing from the inequality B1 and by the definition of total variation, Eq. 1,

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \\&\quad \le E_{\theta _{1,n}}\left[ E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}}|f_{\theta _{2,n}}(x\mid \theta _{1,n})-f_{\theta _{2,n}-\Delta }(x\mid \theta _{1,n})|dx\right] \right] \\&\quad = E_{\theta _{1,n}}\left[ E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}}|f_{\theta _{2,n}}(x\mid \theta _{1,n})-f_{\theta _{2,n}}(x+\Delta \mid \theta _{1,n})|dx\right] \right] \\&\quad = E_{\theta _{1,n}}\left[ E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}}|f_{\theta _{2,n}}(x\mid \theta _{1,n})-f_{\theta _{2,n}}(x+\Delta \mid \theta _{1,n})|dxI_{\Delta <L}\right] \right] \\&\quad \quad + E_{\theta _{1,n}}\left[ E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}}|f_{\theta _{2,n}}(x\mid \theta _{1,n})-f_{\theta _{2,n}}(x+\Delta \mid \theta _{1,n})|dxI_{\Delta>L}\right] \right] \\&\quad \le \frac{1}{2}E_{\theta _{1,n}}\left[ E_\Delta \left[ K(M+1)| \Delta | \right] \right] + P_\Delta (\mid \Delta \mid >L)\\&\quad \le \frac{K(M+1)}{2}E_\Delta \left[ | \Delta | \right] +\frac{E_{\Delta }[\mid \Delta \mid ]}{L} \end{aligned}$$

The second last inequality is a result of Lemma B.3 and the last inequality follows from Markov's inequality. The coalescing condition is thus satisfied as follows with \(C=\frac{K(M+1)}{2} +\frac{I_{M>1}}{L}\),

$$\begin{aligned}&\Vert {\mathcal {L}}(X_{n+1})-{\mathcal {L}}(X'_{n+1})\Vert \\&\quad \le C E[|g(\theta _{1,n},X_{n-1})-g(\theta _{1,n},X'_{n-1})|]\\&\quad =CE[|g(\theta _{1,n},X_{n-1}) + \theta _{2,n}\\&\quad -(g(\theta _{1,n},X'_{n-1}) +\theta _{2,n})|]\\&\quad = C E[|X_n-X'_n|] \end{aligned}$$

\(\square \)
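For intuition, the bound used above can be checked numerically in a simple case (an added illustration under assumed inputs): when \(\theta _{2,n}\) is standard normal, \(K=1/\sqrt{2\pi }\) and \(M=1\), while the total variation distance between \(N(0,1)\) and \(N(\Delta ,1)\) is known in closed form.

```python
# Check of the coalescing-type bound for a standard normal theta_2 (assumed example).
import numpy as np
from scipy.stats import norm

delta = 0.5
exact_tv = 2 * norm.cdf(delta / 2) - 1                      # ||N(0,1) - N(delta,1)||
bound = (1 / np.sqrt(2 * np.pi)) * (1 + 1) * delta / 2      # K * (M + 1) * delta / 2
print(exact_tv, bound)  # approximately 0.197 vs 0.199: the bound is nearly tight
```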

Appendix C: Lemmas for random-functional autoregressive process examples

1.1 Proof of Lemma 4.5

Proof of Lemma 4.5

First note that

$$\begin{aligned}&E[|X_{n+2}-X'_{n+2}| \mid X_n=x,X'_n=y] \\&\quad = E\left[ \bigg |g\left( \frac{1}{2}(x -\sin x)+Z_n\right) -g\left( \frac{1}{2}(y -\sin y)+Z_n\right) \bigg |\right] \\&\quad = \frac{1}{2}E\left[ \bigg |\frac{1}{2}(x-y +\sin y -\sin x)\right. \\&\qquad \left. + \sin \left( \frac{1}{2}(y -\sin y)+Z_n\right) -\sin \left( \frac{1}{2}(x -\sin x)+Z_n\right) \bigg |\right] \\&\quad = \frac{1}{2}E\left[ |g(x,y) + G(x,y)|\right] \end{aligned}$$

where \(g(x,y)=\frac{1}{2}(x-y +\sin y -\sin x)\) and \(G(x,y)=\sin \left( \frac{1}{2}(y -\sin y)+Z_n\right) -\sin \left( \frac{1}{2}(x -\sin x)+Z_n\right) \). By trigonometric identities (see Note 1), for \(k(x,y)=\frac{x+y-\sin y - \sin x}{4}\) and \(h(x,y)=\frac{y-x+\sin x - \sin y}{4}\),

$$\begin{aligned} G(x,y)&= 2\cos \left( \frac{x+y-\sin y - \sin x}{4} + Z_n\right) \\&\quad \sin \left( \frac{y-x+\sin x - \sin y}{4}\right) \\&= 2\cos \left( k(x,y) + Z_n\right) \sin h(x,y) \\&= 2\sin h(x,y) \left( \cos Z_n \cos k(x,y)\right. \\&\quad \left. + \sin Z_n \sin k(x,y)\right) \end{aligned}$$

And so,

$$\begin{aligned}&E[|X_{n+2}-X'_{n+2}|\mid X_n=x,X'_n=y]\\&= \frac{1}{2}E\left[ |g(x,y) + 2\sin h(x,y) \left( \cos Z_n \cos k(x,y) + \sin Z_n \sin k(x,y)\right) |\right] \\&\le \frac{1}{2}\sqrt{E\left[ \left( g(x,y) + 2\sin h(x,y) \left( \cos Z_n \cos k(x,y) + \sin Z_n \sin k(x,y)\right) \right) ^2\right] } \\&= \frac{1}{2}\sqrt{g(x,y)^2 + 4\frac{g(x,y) \sin h(x,y)\cos k(x,y)}{e^{1/2}} + 2\sin ^2 h(x,y) \left( 1+\frac{\cos ^2k(x,y) - \sin ^2 k(x,y)}{e^{2}}\right) }\\&= \frac{1}{\sqrt{2}}\sqrt{2h(x,y)^2 - 4\frac{h(x,y) \sin h(x,y)\cos k(x,y)}{e^{1/2}} + \sin ^2 h(x,y) \left( 1+\frac{\cos ^2k(x,y) - \sin ^2 k(x,y)}{e^2}\right) } \end{aligned}$$

\(\square \)

1.2 Proof of lemmas used in Theorem 4.8

To prove the first part of this theorem, we apply the de-initialization technique, which shows how the convergence rate of a Markov chain can be bounded above by the convergence rate of a simpler Markov chain that retains sufficient information about the Markov chain of interest. The concept of de-initialization and a proposition that bounds total variation are provided below.

Definition C.1

(De-initialization) Let \(\{X_n\}_{n\ge 1}\) be a Markov chain. A Markov chain \(\{Y_n\}_{n\ge 1}\) is a de-initialization of \(\{X_n\}_{n\ge 1}\) if for each \(n\ge 1\)

$$\begin{aligned} {\mathcal {L}}(X_n \mid X_0,Y_n)={\mathcal {L}}(X_n\mid Y_n) \end{aligned}$$

Proposition C.2

(Theorem 1 of Roberts and Rosenthal (2001)) Let \(\{Y_n\}_{n\ge 1}\) be a de-initialization of \(\{X_n\}_{n\ge 1}\). Then for any two initial distributions \(X_0\sim \mu \) and \(X'_0\sim \mu '\),

$$\begin{aligned} \Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \le \Vert {\mathcal {L}}(Y_n)-{\mathcal {L}}(Y'_n)\Vert \end{aligned}$$

Proof of Lemma 4.9

Note that \(\beta _{n}= {\tilde{\beta }} + \sigma _{n-1} Z_n\), where \(Z_n\sim N_p(0, A^{-1})\), can be written as a random function of \(\sigma ^2_n\). Substituting \(\beta _n\), \(\sigma ^2_{n}\) can then be written as a random function of its previous value for independent \(Z^2_{n}\sim \chi ^2(p)\) and \(G_n \sim \Gamma (\frac{k+p}{2},1)\),

$$\begin{aligned} \sigma ^2_{n}=\frac{Z^2_{n}}{C}\frac{C}{2G_n}\sigma ^2_{n-1}+\frac{C}{2G_n} \end{aligned}$$

Let \(X_n=\frac{Z^2_{n}}{C}\), \(Y_n=\frac{C}{2G_n}\). We can rewrite \(\sigma ^2_{n}=X_nY_n\sigma ^2_{n-1}+Y_n\) where \(X_n\sim \Gamma \left( \frac{p}{2}, \frac{C}{2}\right) \) and \(Y_n\sim \Gamma ^{-1}\left( \frac{k+p}{2}, \frac{C}{2}\right) \). Using the notation from the Sideways Theorem 4.2, \(\theta _{1,n}=X_nY_n\) and \(\theta _{2,n}=Y_n\).

Since \(\beta _{n}\) can be written as a random function of \(\sigma ^2_n\),

$$\begin{aligned} {\mathcal {L}}(\beta _n,\sigma ^2_n\mid \beta _0,\sigma ^2_0,\sigma ^2_n)={\mathcal {L}}(\beta _n,\sigma ^2_n\mid \sigma ^2_n) \end{aligned}$$

and so \(\sigma ^2_n\) is a de-initialization of \((\beta _n,\sigma ^2_n)\). By Proposition C.2,

$$\begin{aligned} \Vert {\mathcal {L}}(\beta _n,\sigma ^2_n)-{\mathcal {L}}(\beta '_n, \sigma ^{'2}_n)\Vert \le \Vert {\mathcal {L}}(\sigma ^2_n)-{\mathcal {L}}(\sigma ^{'2}_n)\Vert \end{aligned}$$

We are thus interested in evaluating the convergence rate of \(\sigma ^2_{n}\) to bound the convergence rate of \((\beta _{n},\sigma ^2_{n})\).

To interpret this in another way, if \(\sigma ^2_n\) couples then the distribution of \(\beta _n\) is the same for both iterations, so it is automatically coupled. An alternative proof can be made using the results from Liu et al. (1994). \(\square \)
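To make the random-function representation above concrete, here is a minimal simulation sketch (illustrative only; the values of p, k, C and the two starting points are placeholders, not values from the paper) of two copies of \(\sigma ^2_n=X_nY_n\sigma ^2_{n-1}+Y_n\) driven by shared randomness, as in the contraction phase.

```python
# Simulation sketch of sigma2_n = X_n * Y_n * sigma2_{n-1} + Y_n with shared randomness.
# X_n ~ Gamma(p/2, rate C/2) and Y_n ~ InverseGamma((k+p)/2, C/2); p, k, C are placeholders.
import numpy as np

rng = np.random.default_rng(0)
p, k, C = 3, 5, 2.0
sigma2, sigma2_prime = 100.0, 0.01   # two copies started far apart

for n in range(50):
    x = rng.gamma(shape=p / 2, scale=2.0 / C)     # Z_n^2 / C with Z_n^2 ~ chi^2(p)
    y = (C / 2) / rng.gamma(shape=(k + p) / 2)    # C / (2 G_n) with G_n ~ Gamma((k+p)/2, 1)
    sigma2 = x * y * sigma2 + y
    sigma2_prime = x * y * sigma2_prime + y

print(abs(sigma2 - sigma2_prime))   # the coupled copies contract toward each other
```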

Proof of Lemma 4.10

By Lemma 4.9, \(\theta _{1,n}=X_n Y_n\) and so,

$$\begin{aligned} K&=E[|\theta _{1,n}|] = E[X_n Y_n]= E[X_n]E[Y_n]\\&\quad = \frac{p}{C}\frac{C}{k+p-2}= \frac{p}{k+p-2}\\ \end{aligned}$$

\(\square \)

Proof of Lemma 4.11

Calculate the conditional density of \(\theta _{2,n}\mid \theta _{1,n}\): We remove the subscript n from the random variables. Let X, Y be as described in Lemma 4.9. Since the random variables are independent, the joint density is the product of the densities.

$$\begin{aligned} f_{X,Y}(x,y)= & {} \frac{(C/2)^{p/2}}{\Gamma (p/2)}x^{p/2-1}e^{-xC/2}\nonumber \\&\quad \frac{(C/2)^{(k+p)/2}}{\Gamma ((k+p)/2)}y^{-(k+p)/2-1}e^{-\frac{C/2}{y}} \end{aligned}$$
(C3)

Then \((\theta _1, \theta _2)=(XY,Y)\) is a transformation with the Jacobian \(|J|=\theta _2^{-1}\) and the density written as follows,

$$\begin{aligned} f_{\theta _1,\theta _2}(\theta _1,\theta _2)&= f_{X,Y}\left( \frac{\theta _1}{\theta _2},\theta _2\right) \theta _2^{-1}\\&=\frac{(C/2)^{p/2}}{\Gamma (p/2)}\left( \frac{\theta _1}{\theta _2}\right) ^{p/2-1}e^{-\frac{\theta _1}{\theta _2}C/2}\\&\quad \frac{(C/2)^{(k+p)/2}}{\Gamma ((k+p)/2)}\theta _2^{-(k+p)/2-1}e^{-\frac{C/2}{\theta _2}}\theta _2^{-1} \end{aligned}$$

Next \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is proportional to \(f_{\theta _1,\theta _2}(\theta _1,\theta _2)\) and so we can derive the conditional density of \(\theta _2\) as follows,

$$\begin{aligned} f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)&\propto f_{\theta _1,\theta _2}(\theta _1,\theta _2) \end{aligned}$$
(C4)
$$\begin{aligned}&\propto \theta _2^{1-p/2}e^{-\frac{1}{\theta _2}\theta _1C/2} \theta _2^{-(k+p)/2-1}e^{-\frac{1}{\theta _2}C/2}\theta _2^{-1} \end{aligned}$$
(C5)
$$\begin{aligned}&= \theta _2^{-(p/2+(k+p)/2)-1}e^{-\frac{1}{\theta _2}(\theta _1+1)C/2} \end{aligned}$$
(C6)

This is proportional to an inverse gamma distribution and so, \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{k+2p}{2}, (\theta _1+1)C/2\right) \). Since the conditional density is an inverse gamma distribution, the number of modes is \(M=1\) and the density function is continuous.

Calculate the maximum value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\): Fig. 5 shows how the maximum value of the density increases as the scale, \((\theta _1+1)C/2\), decreases when the shape, \(\frac{k+2p}{2}\), is fixed. It can also be seen from equation C4 that the density function \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is maximized when \(\theta _1=0\), since the normalizing constant is then largest. This means that \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) reaches its maximum height when \(\theta _1=0\), and so we evaluate \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1=0)\) at \(\theta _2= \frac{C}{k+2p+2}\), the mode (Section 5.3 of Hoff (2009)).

$$\begin{aligned} K&= f_{\theta _2\mid \theta _1}\left( \frac{C}{k+2p+2}\mid \theta _1=0\right) \\&=\frac{(C/2)^{\frac{k+2p}{2}}}{\Gamma (\frac{k+2p}{2})}y^{-\frac{k+2p}{2}-1}e^{-\frac{C/2}{y}}\mid _{y=\frac{C}{k+2p+2}}\\&=\frac{(C/2)^{\frac{k+2p}{2}}}{\Gamma (\frac{k+2p}{2})}\left( \frac{C}{k+2p+2}\right) ^{-\frac{k+2p}{2}-1}e^{-\frac{k+2p+2}{2}}\\&=\frac{(C/2)^{\frac{k+2p}{2}}}{\Gamma (\frac{k+2p}{2})}\left( \frac{k+2p+2}{C}\right) ^{\frac{k+2p}{2}+1}e^{-\frac{k+2p+2}{2}} \end{aligned}$$

And so,

$$\begin{aligned} K&= \frac{(C/2)^{\frac{k+2p}{2}}}{\Gamma (\frac{k+2p}{2})}\left( \frac{k+2p+2}{C}\right) ^{\frac{k+2p}{2}+1}e^{-\frac{k+2p+2}{2}} \end{aligned}$$
(C7)

\(\square \)
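As a quick numerical check of the closed form C7 (an added illustration; the values of k, p, C below are placeholders), K should equal the maximum of the \(\Gamma ^{-1}\left( \frac{k+2p}{2}, C/2\right) \) density, i.e., its value at the mode \(\frac{C}{k+2p+2}\).

```python
# Check that the closed form (C7) matches the inverse-gamma density at its mode.
import numpy as np
from scipy.stats import invgamma
from scipy.special import gamma as gamma_fn

k, p, C = 5, 3, 2.0                      # placeholder parameter values
alpha, beta = (k + 2 * p) / 2, C / 2     # shape and scale of theta_2 | theta_1 = 0

mode = C / (k + 2 * p + 2)               # = beta / (alpha + 1)
density_at_mode = invgamma.pdf(mode, a=alpha, scale=beta)

closed_form = (beta ** alpha / gamma_fn(alpha)) \
    * ((k + 2 * p + 2) / C) ** (alpha + 1) * np.exp(-(k + 2 * p + 2) / 2)

print(density_at_mode, closed_form)      # the two values agree
```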

Fig. 5 Inverse gamma density when \(\alpha =100\) and \(\beta =1,10,100\)

1.3 Proof of lemmas used in Theorem 4.14

Proof of Lemma 4.15

The iteration \(\tau ^{-1}_{n+1}\) can be written as a function of its previous value, \(\tau ^{-1}_{n}\) since \(\mu _{n+1} = {\bar{y}} + Z_{n+1}/\sqrt{J \tau _{n}}\).

$$\begin{aligned} \tau ^{-1}_{n+1} = \frac{Z^2_{n+1}}{S}\frac{S}{2G_{n+1}}\tau ^{-1}_{n} + \frac{S}{2G_{n+1}} \end{aligned}$$
(C8)

Next we can rewrite \(\tau ^{-1}_{n}=X_nY_n \tau ^{-1}_{n-1}+Y_n\) where \(X_n=\frac{Z^2_{n}}{S}\sim \Gamma \left( \frac{1}{2}, \frac{S}{2}\right) \) and \(Y_n=\frac{S}{2G_{n}}\sim \Gamma ^{-1}\left( \frac{J+2}{2}, \frac{S}{2}\right) \).

Since \((\mu _{n},\tau ^{-1}_{n})\) can be written as a random function of \(\tau ^{-1}_n\),

$$\begin{aligned} {\mathcal {L}}(\mu _n,\tau ^{-1}_n\mid \mu _0,\tau ^{-1}_0,\tau ^{-1}_n)={\mathcal {L}}(\mu _n,\tau ^{-1}_n\mid \tau ^{-1}_n) \end{aligned}$$

and \(\tau ^{-1}_n\) is a de-initialization of \((\mu _n,\tau ^{-1}_n)\). Further, by Proposition C.2,

$$\begin{aligned} \Vert {\mathcal {L}}(\mu _n,\tau ^{-1}_n)-{\mathcal {L}}(\mu '_n, \tau ^{'-1}_n)\Vert \le \Vert {\mathcal {L}}(\tau ^{-1}_n)-{\mathcal {L}}(\tau ^{'-1}_n)\Vert \end{aligned}$$

To interpret this in another way, if \(\tau _n\) couples then the distribution of \(\mu _n\) is the same for both iterations, so it is automatically coupled. An alternative proof can be made using the results from Liu et al. (1994). \(\square \)

Proof of Lemma 4.16

By Lemma 4.15, \(\theta _{1,n}=X_nY_n\) and so by Corollary 4.6

$$\begin{aligned} D= E[|\theta _{1,n}|] = E[X_nY_n]= E[X_n]E[Y_n]= \frac{1}{S}\frac{S}{J}= \frac{1}{J} \end{aligned}$$

\(\square \)

Proof of Lemma 4.17

To find M and K and show that the conditional density is continuous, we (a) show that \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{J-1}{2}, (\theta _1+1)S/2\right) \), which directly implies that the conditional distribution is continuous and \(M=1\), and (b) find the value of K.

(a) Calculate the conditional density of \(\theta _{2,n}\mid \theta _{1,n}\): For simplicity, we remove the subscript n from the random variables. Let X, Y be as described in Lemma 4.15. Since the random variables are independent, the joint density is the product of the densities.

$$\begin{aligned} f_{X,Y}(x,y)= & {} \frac{(S/2)^{1/2}}{\Gamma (1/2)}x^{1/2-1}e^{-xS/2} \frac{(S/2)^{(J+2)/2}}{\Gamma ((J+2)/2)}\nonumber \\&y^{-(J+2)/2-1}e^{-\frac{S/2}{y}} \end{aligned}$$
(C9)

Then \((\theta _1, \theta _2)=(XY,Y)\) is a transformation with the Jacobian \(|J|=\theta _2^{-1}\) and the density written as follows,

$$\begin{aligned} f_{\theta _1,\theta _2}(\theta _1,\theta _2)&= f_{X,Y}\left( \frac{\theta _1}{\theta _2},\theta _2\right) \theta _2^{-1}\\&=\frac{(S/2)^{1/2}}{\Gamma (1/2)}\left( \frac{\theta _1}{\theta _2}\right) ^{1/2-1}e^{-\frac{\theta _1}{\theta _2}S/2}\\&\quad \frac{(S/2)^{(J+2)/2}}{\Gamma ((J+2)/2)}\theta _2^{-(J+2)/2-1}e^{-\frac{S/2}{\theta _2}}\theta _2^{-1} \end{aligned}$$

Next \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is proportional to \(f_{\theta _1,\theta _2}(\theta _1,\theta _2)\) and so we can derive the conditional density of \(\theta _2\) as follows,

$$\begin{aligned} f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)&\propto f_{\theta _1,\theta _2}(\theta _1,\theta _2) \end{aligned}$$
(C10)
$$\begin{aligned}&\propto \theta _2^{1-1/2}e^{-\frac{1}{\theta _2}\theta _1S/2} \theta _2^{-(J+2)/2-1}e^{-\frac{1}{\theta _2}S/2}\theta _2^{-1} \end{aligned}$$
(C11)
$$\begin{aligned}&= \theta _2^{-(1/2+(J+2)/2)-1}e^{-\frac{1}{\theta _2}(\theta _1+1)S/2} \end{aligned}$$
(C12)
$$\begin{aligned}&= \theta _2^{-(J-1)/2-1}e^{-\frac{1}{\theta _2}(\theta _1+1)S/2} \end{aligned}$$
(C13)

This is proportional to an inverse gamma distribution and so, \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{J-1}{2}, (\theta _1+1)S/2\right) \). We know that the inverse gamma distribution is continuous and unimodal, so \(M=1\).

(b) Calculate the maximum value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) : Similar to Fig. 5 of Example 4.7, \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) reaches its maximum height when \(\theta _1=0\). It can also be shown from equation C10 that the density function of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is maximized when \(\theta _1=0\) since the normalizing constant will be the largest. So the largest value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) will occur when \(\theta _1=0\). To find the maximum conditional distribution, we find the value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1=0)\) evaluated at \(\theta _2= \frac{S}{J+1}\), the mode (see Section 5.3 of Hoff (2009)).

$$\begin{aligned} K&= f_{\theta _2\mid \theta _1}\left( \frac{S}{J+1}\mid \theta _1=0\right) \\&=\frac{(S/2)^{\frac{J-1}{2}}}{\Gamma (\frac{J-1}{2})}y^{-\frac{J-1}{2}-1}e^{-\frac{S/2}{y}}\mid _{y=\frac{S}{J+1}}\\&=\frac{(S/2)^{\frac{J-1}{2}}}{\Gamma (\frac{J-1}{2})}\left( \frac{S}{J+1}\right) ^{-\frac{J+1}{2}}e^{-\frac{J+1}{2}} \end{aligned}$$

And so,

$$\begin{aligned} K&= \frac{(S/2)^{\frac{J-1}{2}}}{\Gamma (\frac{J-1}{2})}\left( \frac{S}{J+1}\right) ^{-\frac{J+1}{2}}e^{-\frac{J+1}{2}} \end{aligned}$$
(C14)

\(\square \)

Proof of Lemma 3.5

By the property of the stationary distribution, if \(\sigma ^2_{n-1}\sim \pi \) then \(\sigma ^2_{n}\sim \pi \), and so the lemma follows from the following computation.

$$\begin{aligned} E_{\sigma ^2_n \sim \pi }[V(\sigma ^2_n)]= & {} E_{\sigma ^2_{n-1}\sim \pi }[E[V(\sigma ^2_n)\mid \sigma ^2_{n-1}]]\\\le & {} E_{\sigma ^2_{n-1} \sim \pi }[\lambda V(\sigma ^2_{n-1}) +b] \\= & {} \lambda E_{\sigma ^2_{n} \sim \pi }[ V(\sigma ^2_{n})]+b\\ \end{aligned}$$

\(\square \)

Proof of Lemma 4.19

Let \(\lambda = 0.6583702\), \(h=-0.5248723\) and \(b=106.3874\), then

$$\begin{aligned}&E[V(\sigma ^2_n)\mid \sigma ^2_{n-1}] \\&= E[(\sigma ^2_n-h)^2\mid \sigma ^2_{n-1}]\\&=E[(\sigma ^2_n)^2-2h \sigma ^2_n + h^2\mid \sigma ^2_{n-1}]\\&=E[(X_nY_n \sigma ^2_{n-1} + Y_n)^2-2h (X_nY_n \sigma ^2_{n-1} + Y_n) + h^2\mid \sigma ^2_{n-1}]\\&=E[Y_n^2](E[X_n^2](\sigma ^2_{n-1})^2 + 2E[X_n]\sigma ^2_{n-1} + 1)\\&\quad -2h (E[X_n] E[Y_n] \sigma ^2_{n-1} + E[Y_n]) + h^2\\&=E[Y_n^2]E[X_n^2](\sigma ^2_{n-1})^2 + 2E[X_n]E[Y_n^2]\sigma ^2_{n-1} \\&\quad + E[Y_n^2]-2hE[X_n] E[Y_n] \sigma ^2_{n-1} -2hE[Y_n] + h^2\\&=E[Y_n^2]E[X_n^2](\sigma ^2_{n-1})^2 + 2E[X_n](E[Y_n^2]\\&\quad -h E[Y_n])\sigma ^2_{n-1} + E[Y_n^2]-2hE[Y_n] + h^2\\&=0.6583702(\sigma ^2_{n-1})^2 + 0.6911206\sigma ^2_{n-1} + 107.3691\\&=\lambda (\sigma ^2_{n-1})^2 + 2\lambda h\sigma ^2_{n-1} + \lambda h^2 +b\\&=\lambda (\sigma ^2_{n-1}+h)^2 +b \\ \end{aligned}$$

\(\square \)

1.4 Proof of Theorem 4.23

Proof of Theorem 4.23

This example uses a modified version of the Sideways Theorem 4.2 to find an upper bound on the convergence rate. We will also use Proposition 2.2, which states that the total variation between two random variables is equal to the total variation of any invertible transformation of the same two random variables.

Let \(\vec {X}_n, \vec {X}'_n \in {\mathbb {R}}^2\) be two copies of the autoregressive normal process as defined in Example 4.22. Then for \(\vec {Z}_n\sim N(\vec {0},I_d)\),

$$\begin{aligned} \vec {X}_n=A\vec {X}_{n-1}+\Sigma _d\vec {Z}_n, \qquad \vec {X}'_n=A\vec {X}'_{n-1}+\Sigma _d\vec {Z}'_n \end{aligned}$$

We apply the one-shot coupling method to bound the total variation distance. For \(n<N\) set \(\vec {Z}_n=\vec {Z} '_n\).

Suppose \(X_0, X'_0\) are known and define

$$\begin{aligned} \Delta = \Vert \Sigma ^{-1}_d A^n (\vec {X}_{0}-\vec {X}'_{0})\Vert _2 \end{aligned}$$

Decompose \(A=P D P^{-1}\), where D is the corresponding diagonal matrix of eigenvalues, \(\lambda _i\) is the ith eigenvalue of A, and \(\Vert \cdot \Vert _2\) denotes the Frobenius norm. Then \(\Delta \) is bounded above as follows,

$$\begin{aligned} \Delta&= \Vert \Sigma ^{-1}_d A^n (\vec {X}_{0}-\vec {X}'_{0})\Vert _2 \\&=\Vert \Sigma ^{-1}_d P D^n P^{-1} (\vec {X}_{0}-\vec {X}'_{0})\Vert _2\\&\le \Vert \Sigma ^{-1}_d\Vert _2 \cdot \Vert P|_2 \Vert D^n\Vert _2 \Vert P^{-1}\Vert _2 \Vert \vec {X}_{0}-\vec {X}'_{0}\Vert _2 \\&\quad \text {by Lemma 1.2.7 of}\,48\\&\le \Vert \Sigma ^{-1}_d\Vert _2 \cdot \Vert P\Vert _2 \Vert P^{-1}\Vert _2 \Vert \vec {X}_{0}-\vec {X}'_{0}\Vert _2 \sqrt{\sum _{i=1}^d \mid \lambda _i\mid ^{2n}}\\&\le \Vert \Sigma ^{-1}_d\Vert _2 \cdot \Vert P\Vert _2 \Vert P^{-1}\Vert _2 \Vert \vec {X}_{0}-\vec {X}'_{0}\Vert _2 \sqrt{d} \max _{1\le i\le d}\mid \lambda _i\mid ^n \end{aligned}$$

For now assume that \(X_0, X'_0\) are known and note that \(\Sigma ^{-1}_d\) is an invertible transform. We bound the total variation distance as follows by applying two invertible transforms on the Markov chain and using the fact that \(\vec {Z}_{m}=\vec {Z}'_m, m < N\).

$$\begin{aligned}&\Vert {\mathcal {L}}(\vec {X}_N)-{\mathcal {L}}(\vec {X}'_N)\Vert \\&\le E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\vec {X}_N)-{\mathcal {L}}(\vec {X}'_N)\Vert \right] \\&\quad \quad \text {by Proposition}\, 2.3\\&= E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\Sigma ^{-1}_d\vec {X}_N)-{\mathcal {L}}(\Sigma ^{-1}_d\vec {X}'_N)\Vert \right] \\&\quad \quad \text {by Proposition}\, 2.2\\&= E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\Sigma ^{-1}_d A \vec {X}_{N-1} +\vec {Z}_N)-{\mathcal {L}}(\Sigma ^{-1}_dA \vec {X}'_{N-1} +\vec {Z} '_N)\Vert \right] \\&= E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\Sigma ^{-1}_d A^N \vec {X}_0 +\vec {Z}_N)-{\mathcal {L}}(\Sigma ^{-1}_d A^N \vec {X}'_0 +\vec {Z} '_N)\Vert \right] \\&\quad \text {by Proposition}\, 2.2\\&=E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\vec {Z}_N +\Sigma ^{-1}_dA^N (\vec {X}_{0} -\vec {X}'_{0}))-{\mathcal {L}}(\vec {Z} '_N)\Vert \right] \\&= \Vert {\mathcal {L}}(\vec {Z}_N +\Sigma ^{-1}_dA^N (\vec {X}_{0} -\vec {X}'_{0}))-{\mathcal {L}}(\vec {Z} '_N)\Vert \end{aligned}$$

There exists a rotation matrix \(R\in {\mathbb {R}}^{d\times d}\) such that

$$\begin{aligned} R[\Sigma ^{-1}_dA (\vec {X}_n -\vec {X}'_n)]= & {} (\Vert \Sigma ^{-1}_dA (\vec {X}_n -\vec {X}'_n)\Vert _2,0,\ldots 0)\\= & {} (\Delta ,0,\ldots 0) \end{aligned}$$

(see Aggarwal 2020). By properties of rotations, R is orthogonal, so \(R^T =R^{-1}\) and \(RZ_n \sim N(0,RI_d R^T)=N(0,I_d)\sim Z_n\). In other words, \(RZ_n \overset{d}{=} Z_n \overset{d}{=} Z'_n\). Thus, continuing the above equality,

$$\begin{aligned}&\Vert {\mathcal {L}}(\vec {X}_n)-{\mathcal {L}}(\vec {X}'_n)\Vert \\&\quad \le \Vert {\mathcal {L}}(\vec {Z}_n +\Sigma ^{-1}_dA^n (\vec {X}_{0} -\vec {X}'_{0}))-{\mathcal {L}}(\vec {Z} '_n)\Vert \\&\quad = \Vert {\mathcal {L}}(R[\vec {Z}_n +\Sigma ^{-1}_dA (\vec {X}_n -\vec {X}'_n)])-{\mathcal {L}}(R\vec {Z} '_n)\Vert&\text {by Proposition}\, 2.2\\&\quad =\Vert {\mathcal {L}}(\vec {Z}_n +(\Delta ,0,\ldots 0))-{\mathcal {L}}(\vec {Z} _n)\Vert \end{aligned}$$

Next, suppose that \(X_0, X'_0\) are unknown. Then, the inequality stated in Eq. 12 is shown as follows,

$$\begin{aligned}&\Vert {\mathcal {L}}(\vec {X}_n)-{\mathcal {L}}(\vec {X}'_n)\Vert \\&\le E_{\Delta }\left[ \Vert {\mathcal {L}}(\vec {Z}_n +(\Delta ,0,\ldots 0))-{\mathcal {L}}(\vec {Z} _n)\Vert \right] \quad \text {by Proposition}\, 2.3\\&= E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}^d} \frac{e^{ -\sum _{i=2}^d y_i^2/2}}{(2\pi )^{d/2}} \left| e^{-y_1^2/2}-e^{-(y_1-\Delta )^2/2}\right| d\vec {y} \right] \\&= E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}} \left| \frac{1}{\sqrt{2\pi }}e^{-y_1^2/2 }-\frac{1}{\sqrt{2\pi }}e^{-(y_1-\Delta )^2/2}\right| dy_1 \right] \\&= E_{\Delta }[\Vert {\mathcal {L}}(Z_{1,n}+\Delta )-{\mathcal {L}}(Z_{1,n})\Vert ]\\&\le \frac{1}{\sqrt{2\pi }} E[\Delta ] \quad \text {by Lemma}\, B.3\\&\le \sqrt{\frac{d}{2\pi }} \Vert \Sigma ^{-1}_d\Vert _2 \cdot \Vert P\Vert _2 \Vert P^{-1}\Vert _2 E[\Vert \vec {X}_{0}-\vec {X}'_{0}\Vert _2] \max _{1\le i\le d}\mid \lambda _i\mid ^n \end{aligned}$$

\(\square \)
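To see how the final bound behaves, the following sketch evaluates it for an assumed two-dimensional example (the matrices A and \(\Sigma _d\), the horizon n, and the value used for \(E[\Vert \vec {X}_0-\vec {X}'_0\Vert _2]\) are placeholders chosen only for illustration).

```python
# Sketch: evaluate sqrt(d/(2*pi)) * ||Sigma^{-1}||_F * ||P||_F * ||P^{-1}||_F
#                  * E||X_0 - X'_0||_2 * (max_i |lambda_i|)^n  for a toy AR(1) model.
import numpy as np

d, n = 2, 25
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])           # assumed stable matrix: all |lambda_i| < 1
Sigma = np.diag([1.0, 2.0])          # assumed noise scaling matrix Sigma_d

eigvals, P = np.linalg.eig(A)        # A = P diag(eigvals) P^{-1}
fro = lambda M: np.linalg.norm(M, 'fro')
expected_initial_gap = 3.0           # placeholder for E||X_0 - X'_0||_2

bound = (np.sqrt(d / (2 * np.pi)) * fro(np.linalg.inv(Sigma)) * fro(P)
         * fro(np.linalg.inv(P)) * expected_initial_gap
         * np.max(np.abs(eigvals)) ** n)
print(bound)                         # decays geometrically in n at rate max_i |lambda_i|
```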

Appendix D: Lemmas for ARCH process examples

1.1 Proof of lemmas used in Theorem 5.3

Proof of Lemma 5.4

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the LARCH process. For fixed \(n\ge 1\), let \(Z_n=Z'_n\) and so,

$$\begin{aligned} E[|X_n-X'_n|]&= E[|(\beta _0+\beta _1 X_{n-1})Z_n-(\beta _0+\beta _1 X'_{n-1})Z_n|]\\&\le \beta _1 E[|Z_n|] E[|X_{n-1}-X'_{n-1}|] \end{aligned}$$

Since \(Z_n\overset{d}{=} Z_0>0\) a.s., the geometric convergence rate is \(D=\beta _1 E[Z_0]\). \(\square \)
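A short simulation sketch of this contraction step is given below (an added illustration; the parameter values and the choice \(Z_n\sim \text {Exp}(1)\), so that \(E[Z_0]=1\), are assumptions made only for the example): two LARCH chains driven by the same innovations contract in expected distance at rate \(D=\beta _1 E[Z_0]\).

```python
# Coupled simulation of two LARCH chains sharing the innovations Z_n (assumed example).
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 0.5
n_steps, n_paths = 10, 200_000

x = np.full(n_paths, 10.0)           # first copy of the chain
xp = np.full(n_paths, 0.0)           # second copy, so |X_0 - X'_0| = 10
for n in range(n_steps):
    z = rng.exponential(1.0, size=n_paths)        # shared innovation, Z_n = Z'_n
    x, xp = (beta0 + beta1 * x) * z, (beta0 + beta1 * xp) * z

print(np.mean(np.abs(x - xp)))                    # Monte Carlo estimate of E|X_n - X'_n|
print(10 * (beta1 * 1.0) ** n_steps)              # the bound 10 * D**n, with D = beta1 * E[Z_0]
```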

Proof of Lemma 5.5

For a fixed \(n\ge 0\), suppose that \(Z_{n+1}, Z'_{n+1}\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation.

$$\begin{aligned} \Vert {\mathcal {L}}(X_{n+1})-{\mathcal {L}}(X'_{n+1})\Vert&\le E[\Vert {\mathcal {L}}((\beta _0+\beta _1 X_{n})Z_{n+1})\\&\quad -{\mathcal {L}}((\beta _0+\beta _1 X'_{n})Z_{n+1})\Vert ] \end{aligned}$$

Note that \(Z_{n+1}\) and \(Z'_{n+1}\) are used interchangeably in the total variation distance since \(Z_{n+1}\overset{d}{=}Z'_{n+1}\). Let \(Y_{n}=\beta _0+\beta _1 X_{n}\), \(Y'_{n}=\beta _0+\beta _1 X'_{n}\), \(\Delta =Y'_{n}-Y_{n}\), and \(\Delta '=\frac{\Delta }{Y_{n}}\). WLOG \(Y'_{n}>Y_{n}\) so that \(\Delta , \Delta '>0\). Then,

$$\begin{aligned}&\Vert {\mathcal {L}}(X_{n+1})-{\mathcal {L}}(X'_{n+1})\Vert \\&\quad \le E[\Vert {\mathcal {L}}(Y_{n}Z_{n+1})-{\mathcal {L}}(Y'_{n}Z_{n+1})\Vert ]\quad \text {by Proposition}\, 2.3\\&\quad =E[\Vert {\mathcal {L}}(Y_{n}Z_{n+1})-{\mathcal {L}}((Y_{n}+\Delta )Z_{n+1})\Vert ]\\&\quad =E[\Vert {\mathcal {L}}(Z_{n+1})-{\mathcal {L}}((1+\Delta ')Z_{n+1})\Vert ]\quad \text {by Proposition}\, 2.2\\&\quad =E[\Vert {\mathcal {L}}(\log (Z_{n+1}))-{\mathcal {L}}(\log (1+\Delta ')+\log (Z_{n+1}))\Vert ]\\&\qquad \text {by Proposition}\, 2.2\\&\quad \le \frac{M+1}{2}\sup _x e^x f_{Z_n}(e^x)E[\log (1+\Delta ')]\\&\quad \le \frac{M+1}{2}\sup _x e^x f_{Z_n}(e^x)\frac{E[|\Delta |]}{\beta _0}\\&\quad = \frac{M+1}{2}\sup _x e^x f_{Z_n}(e^x)\frac{\beta _1E[|X_n-X'_n|]}{\beta _0} \end{aligned}$$

The second last inequality is by Lemma B.3. See the proof of Lemma 4.3 for more details. The last inequality is by the Mean Value Theorem. \(\square \)

1.2 Proof of lemmas used in Theorem 5.8

Proof of Lemma 5.9

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the asymmetric ARCH process.

For a fixed \(n\ge 1\), let \(Z_n=Z'_n\) and so,

$$\begin{aligned}&E[|X_n-X'_n|]\\&\quad = E\left[ |\sqrt{(aX_{n-1}+b)^2+c^2}Z_n-\sqrt{(aX'_{n-1}+b)^2+c^2}Z_n|\right] \\&\quad = E\left[ |\sqrt{(aX_{n-1}+b)^2+c^2}-\sqrt{(aX'_{n-1}+b)^2+c^2}|\right] \\&\qquad E[|Z_n|] \end{aligned}$$

Note that the derivative of \(f(x)=\sqrt{(ax+b)^2+c^2}\) is

$$\begin{aligned} |f'(x)|=|\frac{a(ax+b)}{\sqrt{(ax+b)^2+c^2}}|\le \frac{|a(ax+b)|}{\sqrt{(ax+b)^2}}=|a| \end{aligned}$$
(D15)

and so,

$$\begin{aligned} E[|X_n-X'_n|]&\le |a| E[|Z_n|] E[|X_{n-1}-X'_{n-1}|] \end{aligned}$$

Thus, the geometric convergence rate is \(D=|a| E[|Z_0|]\). \(\square \)

Proof of Lemma 5.10

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the asymmetric ARCH process.

For \(n\ge 1\), \(Z_n, Z'_n\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation with respect to \(X_{n-1},X'_{n-1}, Z_n, Z'_n\).

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \\&\quad \le E\left[ \Vert {\mathcal {L}}(\sqrt{(aX_{n-1}+b)^2+c^2}Z_n)\right. \\&\qquad \left. -{\mathcal {L}}(\sqrt{(aX'_{n-1}+b)^2+c^2}Z'_n)\Vert \right] \end{aligned}$$

Let \(Y_{n-1}=\sqrt{(aX_{n-1}+b)^2+c^2}\) and \(Y'_{n-1} =\sqrt{(aX'_{n-1}+b)^2+c^2}\), \(\Delta =Y'_{n-1}-Y_{n-1}\) and \(\Delta '=\frac{\Delta }{Y_{n-1}}\). WLOG, \(Y'_{n-1}<Y_{n-1}\), so \(-1< \Delta ' <0\), because \(Y_{n-1},Y'_{n-1}>0\) and

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \le E[\Vert {\mathcal {L}}(Y_{n-1}Z_n)-{\mathcal {L}}(Y'_{n-1}Z_n)\Vert ]\\&\quad = E[\Vert {\mathcal {L}}(Y_{n-1}Z_n)-{\mathcal {L}}((Y_{n-1}+\Delta )Z_n)\Vert ] \\&\quad \text {by Proposition}\, 2.2\\&\quad = E[\Vert {\mathcal {L}}(Z_n)-{\mathcal {L}}((1+\Delta ')Z_n)\Vert ] \quad \text {by Proposition}\, 2.2\\&\quad \le E\left[ \sup _{x} 1-\frac{\pi _{Z_n}(x)}{\pi _{(1+\Delta ')Z_n}(x)}\right] \\&\quad \text {by Lemma 6.16 of [24]} \end{aligned}$$

Let the density of \(Z_n\) be \(\pi _{Z_n}(x)\), then \(\pi _{(1+\Delta ')Z_n}(x)= \frac{1}{1+\Delta '}\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) \).

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \le E\left[ \sup _{x} 1-(1+\Delta ')\frac{\pi _{Z_n}(x)}{\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) }\right] \\&\quad \le E[\sup _{x} 1-(1+\Delta ')]\\&\quad = E[|\Delta '|]\\&\quad \le \frac{E[|Y_{n-1}-Y'_{n-1}|]}{c}\,\, \hbox { since}\ Y_{n-1}\ge c\\&\quad \le \frac{|a|}{c}E[|X_{n-1}-X'_{n-1}|] \\&\qquad \text {by equation} \,D15 \end{aligned}$$

The second inequality is by assumption \(\pi _{Z_n}(x){\ge }\pi _{Z_n}\Big (\frac{x}{1+\Delta '}\Big )\).

\(\square \)

1.3 Proof of lemmas used in Theorem 5.13

Proof of Lemma 5.14

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the GARCH process. For \(n\ge 2\), let \(Z_n=Z'_n\). First note that,

$$\begin{aligned} E[|X_n-X'_n|]&= E[|\sigma _nZ_n -\sigma '_n Z_n|]= E[|\sigma _n -\sigma '_n| |Z_n|]\nonumber \\&=E[|\sigma _n -\sigma '_n|]E[ |Z_n|] \end{aligned}$$
(D16)

Next, we find an upper bound on \(E[|\sigma _n-\sigma '_n|]\) by first noting that \(\sigma ^2_n=\alpha ^2+(\beta ^2 Z^2_{n-1}+\gamma ^2)\sigma ^2_{n-1}\) by substitution.

$$\begin{aligned}&E[|\sigma _n-\sigma '_n|] \\&\quad = E\bigg [|\sqrt{\alpha ^2+(\beta ^2 Z^2_{n-1}+\gamma ^2)\sigma ^2_{n-1}} \\&\qquad -\sqrt{\alpha ^2+(\beta ^2 Z^2_{n-1}+\gamma ^2)\sigma ^{'2}_{n-1}}|\bigg ] \\&\quad \le E\bigg [\sqrt{\beta ^2 Z^2_{n-1}+\gamma ^2}\bigg ] E\bigg [|\sigma _{n-1}-\sigma ^{'}_{n-1}|\bigg ]\\&\quad =E\bigg [\sqrt{\beta ^2 Z^2_{n-1}+\gamma ^2}\bigg ] \frac{E[|X_{n-1}-X'_{n-1}|]}{E[|Z_{n-1}|]} \end{aligned}$$

The above inequality is by taking the maximum of the derivative and the last equality is a result of Eq. D16. Finally, substituting \(E[|\sigma _n-\sigma '_n|]\) into Eq. D16,

$$\begin{aligned}&E[|X_n-X'_n|]\\&\quad \le E[\sqrt{\beta ^2 Z^2_{n-1}+\gamma ^2}] \frac{E[|X_{n-1}-X'_{n-1}|]}{E[|Z_{n-1}|]}E[|Z_n|]\\&\quad = E[\sqrt{\beta ^2 Z^2_{n-1}+\gamma ^2}] E[|X_{n-1}-X'_{n-1}|]\\&\quad \le \sqrt{\beta ^2 E[Z_0^2]+\gamma ^2} E[|X_{n-1}-X'_{n-1}|] \\&\qquad \text {by Jensen's inequality} \end{aligned}$$

Thus, the geometric convergence rate is \(D=\sqrt{\beta ^2 E[Z_0^2]+\gamma ^2}\). \(\square \)
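For intuition, the contraction rate can be checked by simulation. The sketch below (an added illustration; the parameter values, starting points, and the choice of standard normal innovations are assumptions chosen only so that \(D<1\)) couples two GARCH chains through shared \(Z_n\) and compares the empirical one-step contraction of \(E[|X_n-X'_n|]\) with \(D=\sqrt{\beta ^2 E[Z_0^2]+\gamma ^2}\).

```python
# Coupled simulation of two GARCH chains sharing the innovations Z_n (assumed example).
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, gamma = 1.0, 0.4, 0.5
D = np.sqrt(beta ** 2 * 1.0 + gamma ** 2)        # E[Z_0^2] = 1 for standard normal Z

n_steps, n_paths = 10, 200_000
x, s = np.full(n_paths, 5.0), np.full(n_paths, 1.0)      # (X_0, sigma_0) for copy 1
xp, sp = np.full(n_paths, -5.0), np.full(n_paths, 3.0)   # (X'_0, sigma'_0) for copy 2

gaps = []
for n in range(n_steps):
    z = rng.standard_normal(n_paths)                     # shared innovation, Z_n = Z'_n
    s = np.sqrt(alpha ** 2 + beta ** 2 * x ** 2 + gamma ** 2 * s ** 2)
    sp = np.sqrt(alpha ** 2 + beta ** 2 * xp ** 2 + gamma ** 2 * sp ** 2)
    x, xp = s * z, sp * z
    gaps.append(np.mean(np.abs(x - xp)))

ratios = np.array(gaps[1:]) / np.array(gaps[:-1])
print(ratios.max(), D)   # each empirical one-step factor stays at or below D (up to noise)
```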

Proof of Lemma 5.15

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the GARCH process.

For \(n\ge 2\), suppose that \(Z_n, Z'_n\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation.

$$\begin{aligned} \Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert&\le E[\Vert {\mathcal {L}}(\sigma _nZ_n)-{\mathcal {L}}(\sigma '_n Z_n)\Vert ] \end{aligned}$$

Let \(\Delta =\sigma '_{n}-\sigma _{n}\) and \(\Delta '=\frac{\Delta }{\sigma _{n}}\). WLOG, \(\sigma '_{n}<\sigma _{n}\), so \(\Delta , \Delta ' <0\) because \(\sigma _{n},\sigma '_{n}>0\) and

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \\&\quad = E[\Vert {\mathcal {L}}(\sigma _{n}Z_n)-{\mathcal {L}}((\sigma _{n}+\Delta )Z_n)\Vert ]\\&\qquad \text {by Proposition}\, 2.2\\&\quad = E[\Vert {\mathcal {L}}(Z_n)-{\mathcal {L}}((1+\Delta ')Z_n)\Vert ]\\&\qquad \text {by Proposition}\, 2.2\\&\quad \le E\left[ \sup _{x} 1-\frac{\pi _{Z_n}(x)}{\pi _{(1+\Delta ')Z_n}(x)}\right] \\&\qquad \text {by Lemma 6.16 of [24]} \end{aligned}$$

Let the density of \(Z_n\) be \(\pi _{Z_n}(x)\), then \(\pi _{(1+\Delta ')Z_n}(x)= \frac{1}{1+\Delta '}\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) \).

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \\&\quad \le E\left[ \sup _{x} 1-(1+\Delta ')\frac{\pi _{Z_n}(x)}{\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) }\right] \\&\quad \le E[\sup _{x} 1-(1+\Delta ')]\\&\qquad \hbox { by assumption}\ \pi _{Z_n}(x)\ge \pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) \\&\quad = E[|\Delta '|]\\&\quad \le \frac{E[|\sigma '_{n}-\sigma _{n}|]}{\alpha }\hbox { since}\ \sigma _{n}\ge \alpha \\&\quad \le \frac{D}{\alpha E[|Z_{n-1}|]} E[|X_{n-1}-X'_{n-1}|] \\&\quad \quad \text {by the inequality in the proof of Lemma 5.14} \end{aligned}$$

\(\square \)

Proof of Lemma 5.16

$$\begin{aligned}&E[|X_1 - X'_1|] \\&= |\sigma _1 - \sigma '_1| E[|Z_1|] \qquad \qquad \text {by Eq.}~\mathrm{D16}\\&= |\sqrt{\alpha ^2 +\beta ^2 X_0^{2} + \gamma ^2 \sigma _0^{2}} - \sqrt{\alpha ^2 +\beta ^2 X_0^{'2} + \gamma ^2 \sigma _0^{'2}}| E[|Z_1|] \\&\le \sqrt{|(\alpha ^2 +\beta ^2 X_0^{2} + \gamma ^2 \sigma _0^{2}) - (\alpha ^2 +\beta ^2 X_0^{'2} + \gamma ^2 \sigma _0^{'2})|} E[|Z_1|] \\&\qquad \text {since } |\sqrt{x}-\sqrt{y}|=\sqrt{(\sqrt{x}-\sqrt{y})^2} = \sqrt{x+y-2\sqrt{x}\sqrt{y}} \le \sqrt{|x-y|} \\&\le \sqrt{\beta ^2 |X_0^{2}-X_0^{'2}| + \gamma ^2 |\sigma _0^{2} - \sigma _0^{'2}|} E[|Z_0|] \end{aligned}$$

\(\square \)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sixta, S., Rosenthal, J.S. Convergence rate bounds for iterative random functions using one-shot coupling. Stat Comput 32, 71 (2022). https://doi.org/10.1007/s11222-022-10134-x