Convergence rate bounds for iterative random functions using one-shot coupling
Abstract

One-shot coupling is a method of bounding the convergence rate between two copies of a Markov chain in total variation distance, which was first introduced in Roberts and Rosenthal (Stoch Process Appl 99:195–208, 2002) and generalized in Madras and Sezer (Bernoulli 16:882–908, 2010). The method is divided into two parts: the contraction phase, when the chains converge in expected distance, and the coalescing phase, which occurs at the last iteration, when there is an attempt to couple. One-shot coupling does not require the use of any exogenous variables like a drift function or a minorization constant. In this paper, we summarize the one-shot coupling method into the One-Shot Coupling Theorem. We then apply the theorem to two families of Markov chains: the random functional autoregressive process and the autoregressive conditional heteroscedastic process. We provide multiple examples of how the theorem can be used on various models including ones in high dimensions. These examples illustrate how the theorem’s conditions can be verified in a straightforward way. The one-shot coupling method appears to generate tight geometric convergence rate bounds.


Notes

  1. The trigonometric identities used are \(2\cos \mu \sin \upsilon = \sin (\mu +\upsilon )-\sin (\mu -\upsilon )\) and \(\cos (\mu + \upsilon )= \cos \mu \cos \upsilon +\sin \mu \sin \upsilon \) where \(\mu ,\upsilon \in {\mathbb {R}}\)


Acknowledgements

We thank the referees for their many excellent comments and suggestions.

Author information

Corresponding author

Correspondence to Sabrina Sixta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Propositions related to the properties of total variation distance

Proof of Proposition 2.2

Let \({\mathcal {A}}\) be the sigma field of \({\mathcal {X}}\) and \({\mathcal {B}}\) be the sigma field of \({\mathcal {Y}}\).

First note that \(f^{-1}({\mathcal {B}})=\{f^{-1}(B): B\in {\mathcal {B}}\}={\mathcal {A}}\):

  • \(f^{-1}({\mathcal {B}})\subset {\mathcal {A}}\): For \(B\in {\mathcal {B}}\), \(f^{-1}(B)\in {\mathcal {A}}\) by measurability.

  • \({\mathcal {A}} \subset f^{-1}({\mathcal {B}})\): Let \(A\in {\mathcal {A}}\). Then \(f(A)\in {\mathcal {B}}\) and \(f^{-1}(f(A))\in f^{-1}({\mathcal {B}})\) by definition. By invertibility, \(f^{-1}(f(A))=A\) and so \(A \in f^{-1}({\mathcal {B}})\).

The equality in Eq. 2 can then be proven as follows,

$$\begin{aligned}&\Vert {\mathcal {L}}(f(X))-{\mathcal {L}}(f(X'))\Vert \\&\quad =\sup _{B\in {\mathcal {B}}}|P(f(X)\in B)-P(f(X')\in B)|\\&\quad = \sup _{B\in {\mathcal {B}}}|P(X\in f^{-1}(B))-P(X'\in f^{-1}(B))|\\&\quad = \sup _{A\in {\mathcal {A}}}|P(X\in A)-P(X'\in A)| \hbox { since}\ f^{-1}({\mathcal {B}})={\mathcal {A}}\\&\quad =\Vert {\mathcal {L}}(X)-{\mathcal {L}}(X')\Vert \end{aligned}$$

\(\square \)

Proof of Proposition 2.3

$$\begin{aligned}&\Vert {\mathcal {L}}(X)-{\mathcal {L}}(X')\Vert = \sup _{A\in {\mathcal {B}}} |P(X\in A)-P(X'\in A)|\\&\quad = \sup _{A\in {\mathcal {B}}}|\int _{{\mathcal {Y}}} P(X\in A\mid y)-P(X'\in A\mid y)\mu (dy)|\\&\quad \le \sup _{A\in {\mathcal {B}}}\int _{{\mathcal {Y}}} |P(X\in A\mid y)-P(X'\in A\mid y)|\mu (dy) \\&\qquad \text { by Jensen's inequality}\\&\quad \le \int _{{\mathcal {Y}}} \sup _{A\in {\mathcal {B}}} |P(X\in A\mid y)-P(X'\in A\mid y)|\mu (dy) \\&\quad \le E\left[ \Vert {\mathcal {L}}(X\mid Y)-{\mathcal {L}}(X'\mid Y)\Vert \right] \end{aligned}$$

\(\square \)
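As a quick numerical illustration of Proposition 2.3 (a sketch added for intuition, not part of the proof), the Python snippet below compares the marginal total variation distance with the averaged conditional total variation distance for a simple two-component mixture; the mixing law \(Y\sim \text {Bernoulli}(1/2)\) and the normal conditional laws are choices made only for this example.

```python
# Illustrative check of Proposition 2.3 (assumed toy example, not from the paper):
# Y ~ Bernoulli(1/2), X | Y=y ~ N(y, 1), X' | Y=y ~ N(y + 1, 1).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f_X  = lambda x: 0.5 * norm.pdf(x, 0, 1) + 0.5 * norm.pdf(x, 1, 1)  # density of X
f_Xp = lambda x: 0.5 * norm.pdf(x, 1, 1) + 0.5 * norm.pdf(x, 2, 1)  # density of X'

tv_marginal, _ = quad(lambda x: 0.5 * abs(f_X(x) - f_Xp(x)), -np.inf, np.inf)

# For every y, ||L(X | Y=y) - L(X' | Y=y)|| is the TV between N(0,1) and N(1,1),
# which equals 2 * Phi(1/2) - 1, so the right-hand side of the proposition is:
tv_conditional_avg = 2 * norm.cdf(0.5) - 1

print(tv_marginal, tv_conditional_avg)  # the first value is smaller, as the proposition asserts
```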

Proof of Proposition 2.4

To prove this, we use the concept of maximal coupling over the coordinates. By maximal coupling, for \(i\in \{1,\ldots , d\}\) there exist random variables \(X_{i,n}^M, X_{i,n}^{'M}\) such that \(X_{i,n}\overset{d}{=}X_{i,n}^M\) and \(X'_{i,n}\overset{d}{=}X_{i,n}^{'M}\) and

$$\begin{aligned} \Vert {\mathcal {L}}(X_{i,n})-{\mathcal {L}}(X'_{i,n})\Vert =P(X_{i,n}^M\ne X_{i,n}^{'M}) \end{aligned}$$

(see Proposition 3g of Roberts and Rosenthal (2004) or Section 2 of Böttcher (2017)).

Further, there exists a unique product measure such that for any \(A_1, \ldots A_d \in {{\mathcal {B}}}\), \(P(\cap _{i=1}^d [X_{i,n}^M\in A_i])=\prod _{i=1}^d P(X_{i,n}^M\in A_i)\) (Theorem 18.2 of Billingsley (2012)). For the unique product measure, the following equality holds,

$$\begin{aligned}&P(\cap _{i=1}^d X_{i,n}^M \in A_i)=\prod _{i=1}^d P(X_{i,n}^M\in A_i)\\&\quad =\prod _{i=1}^d P(X_{i,n}\in A_i) =P(\cap _{i=1}^d X_{i,n} \in A_i) \end{aligned}$$

And so by uniqueness, for \(A\in {{\mathcal {B}}}^{\text {d}}\), \(P(X_n^M\in A)=P(X_n\in A)\). By definition, this means that \(\vec {X}_{n}\overset{d}{=} \vec {X}_n^M\), which implies that \((\vec {X}_n^M, \vec {X}_n^{'M})\in {\mathcal {C}}(\vec {X}_n,\vec {X}'_n)\), the set of all couplings of \(\vec {X}_n,\vec {X}'_n\).

We now use \(\vec {X}_n^M, \vec {X}_n^{'M}\) to prove Eq. 3.

$$\begin{aligned}&\Vert {\mathcal {L}}(\vec {X}_n)-{\mathcal {L}}(\vec {X}'_n)\Vert \\&\quad = \inf _{\vec {Y},\vec {Y}'\in {\mathcal {C}}(\vec {X}_n,\vec {X}'_n)}P(\vec {Y}\ne \vec {Y}')\, \text {by Eq. 2.4 of [27]}\\&\quad \le P(\vec {X}_n^M\ne \vec {X}_n^{'M})\\&\quad = P(\cup _{i=1}^d [X_{i,n}^M\ne X_{i,n}^{'M}])\\&\quad \le \sum _{i=1}^d P(X_{i,n}^M\ne X_{i,n}^{'M}) \quad \text {by subadditivity}\\&\quad \le d A r^n \end{aligned}$$

\(\square \)

Appendix B: Lemmas related to the Sideways Theorem

The following are lemmas and corresponding proofs and corollaries related to the Sideways Theorem (4.2).

1.1 Lemmas providing an upper bound on the integral difference between a function and a corresponding shift

The following lemmas are used in the proof of Lemma 4.3.

Lemma B.1

For any invertible, continuous function \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) where the codomain is \(f({\mathbb {R}})=(a,b)\) and \(\Delta >0\),

$$\begin{aligned} \int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx=(b-a)\Delta \end{aligned}$$

Proof

Since f is invertible and continuous, it is strictly monotone (Lemma 3.8 of Hairer and Wanner (2008)). Assume that f is strictly increasing. The integral can be written as follows,

$$\begin{aligned}&\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx = \int _{{\mathbb {R}}}f(x+\Delta )-f(x)dx\\&\quad = \int _{{\mathbb {R}}}\int _a^bI_{f(x)<y<f(x+\Delta )}dy dx \\&\quad =\int _{{\mathbb {R}}}\int _a^bI_{f^{-1}(y)-\Delta<x<f^{-1}(y)}dy dx \\&\quad =\int _a^b \int _{{\mathbb {R}}}I_{f^{-1}(y)-\Delta<x<f^{-1}(y)}dx dy \;\text {by Fubini's Theorem} \\&\quad =\int _a^b \Delta dy \\&\quad = (b-a)\Delta \end{aligned}$$

If f is strictly decreasing apply the transform \(h(x)=a+b-f(x)\). The function h is a strictly increasing invertible function with codomain (a, b) and so using the previous result for increasing functions,

$$\begin{aligned}&\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx\\&\quad = \int _{{\mathbb {R}}}|h(x+\Delta )-h(x)|dx = (b-a)\Delta \end{aligned}$$

\(\square \)
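As a small numerical check of Lemma B.1 (an illustration added here, not part of the original proof), take f to be the logistic function, which is continuous and invertible with image \((a,b)=(0,1)\); the integral should then equal \((b-a)\Delta =\Delta \).

```python
# Numerical check of Lemma B.1 with an assumed example function (logistic).
import numpy as np
from scipy.integrate import quad

def f(x):
    return 1.0 / (1.0 + np.exp(-x))  # strictly increasing, image (0, 1)

delta = 0.7
integral, _ = quad(lambda x: abs(f(x + delta) - f(x)), -np.inf, np.inf)
print(integral, delta)  # both are approximately 0.7, matching (b - a) * Delta
```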

Lemma B.2

Let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a continuous function that is invertible over the set (c, d) and is constant on each component of \((c,d)^C\). Further suppose that the codomain is \(f({\mathbb {R}})=(a,b)\). Then for \(\Delta >0\), we get that

$$\begin{aligned} \int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx=(b-a)\Delta \end{aligned}$$

Proof

Assume that f is an increasing function and so \(f(c)=a\), \(f(d)=b\) and \(|f(x+\Delta )-f(x)|=f(x+\Delta )-f(x)\).

Let \(0<\epsilon <(d-c)/2\) and define

$$\begin{aligned} g_{\epsilon }(x) = \left\{ \begin{array}{ll} (f(c+\epsilon )-a)e^{x-c-\epsilon }+a &{} \text {if } x\in (-\infty ,c+\epsilon ]\\ f(x) &{} \text {if } x\in (c+\epsilon ,d-\epsilon ]\\ (f(d-\epsilon )-b)e^{d-\epsilon -x}+b &{} \text {if } x\in (d-\epsilon ,\infty )\\ \end{array} \right. \end{aligned}$$

Note that \(g_{\epsilon }(x)\) is continuous, invertible, an increasing function and the codomain is (a, b). By Lemma B.1 for each \(\epsilon >0\)

$$\begin{aligned} \int _{{\mathbb {R}}}g_{\epsilon }(x+\Delta )-g_{\epsilon }(x) dx = (b-a)\Delta \end{aligned}$$

Further, for all \(x\in {\mathbb {R}}\), \(\lim _{\epsilon \rightarrow 0}g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)=f(x+\Delta )-f(x)\) and so \(g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)\) converges pointwise to \(f(x+\Delta )-f(x)\). Next, for \(0<\epsilon <(d-c)/2\), \(|g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)|<b-a\) and so the functions \(g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)\) are uniformly bounded. The above statements allow us to apply the dominated convergence theorem (Theorem 16.5 of Billingsley (2012)) and so

$$\begin{aligned}&\int _{{\mathbb {R}}}f(x+\Delta )-f(x)dx \\&\quad = \lim _{\epsilon \rightarrow 0} \int _{{\mathbb {R}}}g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)dx = (b-a)\Delta \end{aligned}$$

If f is strictly decreasing apply the transform \(h(x)=a+b-f(x)\). The function h is a strictly increasing invertible function with codomain (a, b) and so using the previous result for increasing functions,

$$\begin{aligned}&\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|dx= \int _{{\mathbb {R}}}|h(x+\Delta )\\&\quad -h(x)|dx = (b-a)\Delta \end{aligned}$$

\(\square \)

Lemma B.3

Let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a continuous function with the following properties:

  • the codomain is (0, K)

  • \((m_1, m_2, \ldots , m_M)\) are the local maxima and minima points

  • \(\lim _{x\rightarrow \infty }f(x)=0\) and \(\lim _{x\rightarrow -\infty }f(x)=0\)

Further suppose that \(\Delta < \min _{i=2,\ldots , M}\{m_i-m_{i-1}\}\). Then

$$\begin{aligned} \int _{{\mathbb {R}}}|f(x-\Delta )-f(x)|dx \le K(M+1)\Delta \end{aligned}$$

Proof

Since \(\Delta < \min _{i=2,\ldots , M}\{m_i-m_{i-1}\}\), we have that \(m_1-\Delta<m_1<m_2-\Delta<\ldots <m_M\). Let \(I_1,\ldots , I_M\) be the intersection points, i.e., the points where \(f(I_i)=f(I_i+\Delta )\).

Show that \(m_i-\Delta<I_i<m_i\): Suppose that \(m_i\) is a local maximum point. Let \(g(x)=f(x+\Delta )\). Within the interval \((m_i-\Delta ,m_i)\), \(f'(x)>0\) and \(g'(x)<0\) by assumption. This implies that \(f(m_i-\Delta )<f(m_i)\) and \(g(m_i-\Delta )>g(m_i)\) by the Mean Value Theorem. Further since \(g(m_i-\Delta )=f(m_i)\) we have that \(g(m_i-\Delta )>f(m_i-\Delta )\) and \(g(m_i)<f(m_i)\).

Let \(h(x)=g(x)-f(x)\). Then \(h(m_i-\Delta )>0\) and \(h(m_i)<0\) further h is a strictly decreasing function over \((m_i-\Delta ,m_i)\) since \(g,-f\) are strictly decreasing functions over the same interval. So by the intermediate value theorem, there exists an \(\xi \in (m_i-\Delta ,m_i)\) such that \(h(\xi )=0\) or \(f(\xi )=g(\xi )=f(\xi +\Delta )\). Further by injectivity, \(\xi \) is unique. Let \(I_i=\xi \). A similar proof can be given for when \(m_i\) is a local minimum.

Show that \(\int _{I_i}^{I_{i+1}}|f(x+\Delta )-f(x)|dx\le K\Delta \): Note first that \(m_i-\Delta<I_i<m_i<m_{i+1}-\Delta<I_{i+1}<m_{i+1}\). Further, define

$$\begin{aligned} f_i(x) = \left\{ \begin{array}{ll} f(m_i) &{} \text {if } x\in (-\infty ,m_i]\\ f(x) &{} \text {if } x\in (m_i,m_{i+1}]\\ f(m_{i+1}) &{} \text {if } x\in (m_{i+1},\infty )\\ \end{array} \right. \end{aligned}$$

Note that over the interval \((m_i,m_{i+1}]\), the function f is either a strictly increasing or a strictly decreasing function.

$$\begin{aligned}&\int _{I_i}^{I_{i+1}}|f(x+\Delta )-f(x)|dx \\&=\int _{I_i}^{m_i}|f(x+\Delta )-f(x)|dx \\&\quad + \int _{m_i}^{m_{i+1}-\Delta }|f(x+\Delta )-f(x)|dx \\&\quad + \int _{m_{i+1}-\Delta }^{I_{i+1}}|f(x+\Delta )-f(x)|dx\\&\le \int _{I_i}^{m_i}|f(x+\Delta )-f(m_i)|dx \\&\quad + \int _{m_i}^{m_{i+1}-\Delta }|f(x+\Delta )-f(x)|dx \\&\quad + \int _{m_{i+1}-\Delta }^{I_{i+1}}|f(m_{i+1})-f(x)|dx\\&= \int _{I_i}^{m_i}|f_i(x+\Delta )-f_i(x)|dx \\&\quad + \int _{m_i}^{m_{i+1}-\Delta }|f_i(x+\Delta )-f_i(x)|dx \\&\quad \quad + \int _{m_{i+1}-\Delta }^{I_{i+1}}|f_i(x+\Delta )-f_i(x)|dx\\&= \int _{I_i}^{I_{i+1}}|f_i(x+\Delta )-f_i(x)|dx \\&\le \int _{m_i-\Delta }^{m_{i+1}}|f_i(x+\Delta )-f_i(x)|dx \\&= \int _{{\mathbb {R}}}|f_i(x+\Delta )-f_i(x)|dx \\&= |f(m_i)-f(m_{i+1})|\Delta \le K\Delta \end{aligned}$$

The last equality is a result of Lemma B.2.

By similar reasoning, it can be shown that

$$\begin{aligned}&\int _{-\infty }^{I_{1}}|f(x+\Delta )-f(x)|dx\le K\Delta \\&\quad \int _{I_{M}}^{\infty }|f(x+\Delta )-f(x)|dx\le K\Delta \end{aligned}$$

Finally note that the intersection points partition \({\mathbb {R}}\) into \(M+1\) subsets and so

$$\begin{aligned} \int _{{\mathbb {R}}}|f(x-\Delta )-f(x)|dx \le K(M+1)\Delta \end{aligned}$$

\(\square \)
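The following Python sketch (an added illustration, not part of the proof) checks the bound of Lemma B.3 on an assumed bimodal normal-mixture density, for which the local extrema are the two modes and the antimode between them, so \(M=3\), and \(\Delta \) is taken smaller than the smallest gap between consecutive extrema.

```python
# Numerical check of Lemma B.3 on an assumed two-component normal mixture.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def f(x):
    return 0.5 * norm.pdf(x, -2, 1) + 0.5 * norm.pdf(x, 2, 1)

grid = np.linspace(-10, 10, 20001)
K = f(grid).max()   # numerical approximation of sup f
M = 3               # two local maxima (near +/-2) and one local minimum (at 0)
delta = 0.5         # smaller than the roughly 2-unit gaps between consecutive extrema

integral, _ = quad(lambda x: abs(f(x - delta) - f(x)), -np.inf, np.inf)
print(integral, K * (M + 1) * delta)  # the integral stays below K * (M + 1) * Delta
```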

1.1.1 Proof of Lemma 4.3

Lemma 4.3 represents the coalescing condition for the Sideways Theorem 4.2.

Proof of Lemma 4.3

Set \(\theta _{1,n}=\theta '_{1,n}\). Define

$$\begin{aligned} \Delta =g(\theta _{1,n},X_{n-1})-g(\theta _{1,n},X'_{n-1}) \end{aligned}$$

Let \(f_{X_n},f_{X'_n}\) be the density functions for \(X_{n},X_{n}'\), respectively, and \(f_{\theta _{2,n}}, f_{\theta _{2,n}+\Delta }\) be the density functions for \(\theta _{2,n}, \theta _{2,n}+\Delta \).

Suppose that \(\Delta ,X_{n-1}, X'_{n-1} \in {\mathbb {R}}\) are known and so,

$$\begin{aligned} X_n&=g(\theta _{1,n},X_{n-1})+\theta _{2,n} \implies \theta _{2,n}\\&=X_n-g(\theta _{1,n},X_{n-1}) \\ X'_n&=g(\theta _{1,n},X'_{n-1})+\theta '_{2,n} \implies \theta '_{2,n}-\Delta \\&=X'_n-g(\theta _{1,n},X_{n-1}) \end{aligned}$$

We know that \(\theta _{2,n}\overset{d}{=}\theta '_{2,n}\) and in general \(\Delta \), \(\theta _{1,n}\) are random variables, so

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \nonumber \\&\quad \le E_{\theta _{1,n}, \Delta }\left[ \Vert {\mathcal {L}}(X_n\mid \theta _{1,n}, \Delta )-{\mathcal {L}}(X'_n\mid \theta _{1,n}, \Delta )\Vert \right] \nonumber \\&\qquad \text {by Proposition} 2.3 \end{aligned}$$
(B1)
$$\begin{aligned}&\quad = E_{\theta _{1,n}, \Delta }\left[ \Vert {\mathcal {L}}(\theta _{2,n}\mid \theta _{1,n})-{\mathcal {L}}(\theta _{2,n}-\Delta \mid \theta _{1,n})\Vert \right] \nonumber \\&\qquad \text {by Proposition}\, 2.2 \end{aligned}$$
(B2)

By the assumptions in the theorem, the density of \(\theta _{2,n}\) is continuous with M extrema points and has a codomain that is in (0, K). Let \((m_1, m_2,\ldots , m_M)\) be the local extrema points where \(m_i<m_j\) if \(i<j\), and let \(L\le \min _{2\le i\le M}\{m_i-m_{i-1}\}\) be a lower bound on the distance between two consecutive local extrema points. So, continuing from the inequality B1 and by the definition of total variation, Eq. 1,

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \\&\quad \le E_{\theta _{1,n}}\left[ E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}}|f_{\theta _{2,n}}(x\mid \theta _{1,n})-f_{\theta _{2,n}-\Delta }(x\mid \theta _{1,n})|dx\right] \right] \\&\quad = E_{\theta _{1,n}}\left[ E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}}|f_{\theta _{2,n}}(x\mid \theta _{1,n})-f_{\theta _{2,n}}(x+\Delta \mid \theta _{1,n})|dx\right] \right] \\&\quad = E_{\theta _{1,n}}\left[ E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}}|f_{\theta _{2,n}}(x\mid \theta _{1,n})-f_{\theta _{2,n}}(x+\Delta \mid \theta _{1,n})|dxI_{\Delta <L}\right] \right] \\&\quad \quad + E_{\theta _{1,n}}\left[ E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}}|f_{\theta _{2,n}}(x\mid \theta _{1,n})-f_{\theta _{2,n}}(x+\Delta \mid \theta _{1,n})|dxI_{\Delta>L}\right] \right] \\&\quad \le \frac{1}{2}E_{\theta _{1,n}}\left[ E_\Delta \left[ K(M+1)| \Delta | \right] \right] + P_\Delta (\mid \Delta \mid >L)\\&\quad \le \frac{K(M+1)}{2}E_\Delta \left[ | \Delta | \right] +\frac{E_{\Delta }[\mid \Delta \mid ]}{L} \end{aligned}$$

The second last inequality is a result of Lemma B.3 and the last inequality follows from Markov's inequality. The coalescing condition is thus satisfied as follows with \(C=\frac{K(M+1)}{2} +\frac{I_{M>1}}{L}\),

$$\begin{aligned}&\Vert {\mathcal {L}}(X_{n+1})-{\mathcal {L}}(X'_{n+1})\Vert \\&\quad \le C E[|g(\theta _{1,n},X_{n-1})-g(\theta _{1,n},X'_{n-1})|]\\&\quad =CE[|g(\theta _{1,n},X_{n-1}) + \theta _{2,n}\\&\quad -(g(\theta _{1,n},X'_{n-1}) +\theta _{2,n})|]\\&\quad = C E[|X_n-X'_n|] \end{aligned}$$

\(\square \)
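For intuition, the bound used above can be checked numerically in a simple case (an added illustration under assumed inputs): when \(\theta _{2,n}\) is standard normal, \(K=1/\sqrt{2\pi }\) and \(M=1\), while the total variation distance between \(N(0,1)\) and \(N(\Delta ,1)\) is known in closed form.

```python
# Check of the coalescing-type bound for a standard normal theta_2 (assumed example).
import numpy as np
from scipy.stats import norm

delta = 0.5
exact_tv = 2 * norm.cdf(delta / 2) - 1                      # ||N(0,1) - N(delta,1)||
bound = (1 / np.sqrt(2 * np.pi)) * (1 + 1) * delta / 2      # K * (M + 1) * delta / 2
print(exact_tv, bound)  # approximately 0.197 vs 0.199: the bound is nearly tight
```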

Appendix C: Lemmas for random-functional autoregressive process examples

1.1 Proof of Lemma 4.5

Proof of Lemma 4.5

First note that

$$\begin{aligned}&E[|X_{n+2}-X'_{n+2}| \mid X_n=x,X'_n=y] \\&\quad = E\left[ \bigg |g\left( \frac{1}{2}(x -\sin x)+Z_n\right) -g\left( \frac{1}{2}(y -\sin y)+Z_n\right) \bigg |\right] \\&\quad = \frac{1}{2}E\left[ \bigg |\frac{1}{2}(x-y +\sin y -\sin x)\right. \\&\qquad \left. + \sin \left( \frac{1}{2}(y -\sin y)+Z_n\right) -\sin \left( \frac{1}{2}(x -\sin x)+Z_n\right) \bigg |\right] \\&\quad = \frac{1}{2}E\left[ |g(x,y) + G(x,y)|\right] \end{aligned}$$

where \(g(x,y)=\frac{1}{2}(x-y +\sin y -\sin x)\) and \(G(x,y)=\sin \left( \frac{1}{2}(y -\sin y)+Z_n\right) -\sin \left( \frac{1}{2}(x -\sin x)+Z_n\right) \). By trigonometric identities (see Note 1), for \(k(x,y)=\frac{x+y-\sin y - \sin x}{4}\) and \(h(x,y)=\frac{y-x+\sin x - \sin y}{4}\),

$$\begin{aligned} G(x,y)&= 2\cos \left( \frac{x+y-\sin y - \sin x}{4} + Z_n\right) \\&\quad \sin \left( \frac{y-x+\sin x - \sin y}{4}\right) \\&= 2\cos \left( k(x,y) + Z_n\right) \sin h(x,y) \\&= 2\sin h(x,y) \left( \cos Z_n \cos k(x,y)\right. \\&\quad \left. + \sin Z_n \sin k(x,y)\right) \end{aligned}$$

And so,

$$\begin{aligned}&E[|X_{n+2}-X'_{n+2}|\mid X_n=x,X'_n=y]\\&= \frac{1}{2}E\left[ |g(x,y) + 2\sin h(x,y) \left( \cos Z_n \cos k(x,y) + \sin Z_n \sin k(x,y)\right) |\right] \\&\le \frac{1}{2}\sqrt{E\left[ \left( g(x,y) + 2\sin h(x,y) \left( \cos Z_n \cos k(x,y) + \sin Z_n \sin k(x,y)\right) \right) ^2\right] } \\&= \frac{1}{2}\sqrt{g(x,y)^2 + 4\frac{g(x,y) \sin h(x,y)\cos k(x,y)}{e^{1/2}} + 2\sin ^2 h(x,y) \left( 1+\frac{\cos ^2k(x,y) - \sin ^2 k(x,y)}{e^{2}}\right) }\\&= \frac{1}{\sqrt{2}}\sqrt{2h(x,y)^2 - 4\frac{h(x,y) \sin h(x,y)\cos k(x,y)}{e^{1/2}} + \sin ^2 h(x,y) \left( 1+\frac{\cos ^2k(x,y) - \sin ^2 k(x,y)}{e^2}\right) } \end{aligned}$$

\(\square \)

1.2 Proof of lemmas used in Theorem 4.8

To prove the first part of this theorem, we apply the de-initialization technique, which shows how the convergence rate of a Markov chain can be bounded above by the convergence rate of a simpler Markov chain that retains sufficient information about the Markov chain of interest. The concept of de-initialization and a proposition that bounds total variation are provided below.

Definition C.1

(De-initialization) Let \(\{X_n\}_{n\ge 1}\) be a Markov chain. A Markov chain \(\{Y_n\}_{n\ge 1}\) is a de-initialization of \(\{X_n\}_{n\ge 1}\) if for each \(n\ge 1\)

$$\begin{aligned} {\mathcal {L}}(X_n \mid X_0,Y_n)={\mathcal {L}}(X_n\mid Y_n) \end{aligned}$$

Proposition C.2

(Theorem 1 of Roberts and Rosenthal (2001)) Let \(\{Y_n\}_{n\ge 1}\) be a de-initialization of \(\{X_n\}_{n\ge 1}\). Then for any two initial distributions \(X_0\sim \mu \) and \(X'_0\sim \mu '\),

$$\begin{aligned} \Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \le \Vert {\mathcal {L}}(Y_n)-{\mathcal {L}}(Y'_n)\Vert \end{aligned}$$

Proof of Lemma 4.9

Note that \(\beta _{n}= {\tilde{\beta }} + \sigma _{n-1} Z_n\), where \(Z_n\sim N_p(0, A^{-1})\), can be written as a random function of \(\sigma ^2_n\). Substituting \(\beta _n\), \(\sigma ^2_{n}\) can then be written as a random function of its previous value for independent \(Z^2_{n}\sim \chi ^2(p)\) and \(G_n \sim \Gamma (\frac{k+p}{2},1)\),

$$\begin{aligned} \sigma ^2_{n}=\frac{Z^2_{n}}{C}\frac{C}{2G_n}\sigma ^2_{n-1}+\frac{C}{2G_n} \end{aligned}$$

Let \(X_n=\frac{Z^2_{n}}{C}\), \(Y_n=\frac{C}{2G_n}\). We can rewrite \(\sigma ^2_{n}=X_nY_n\sigma ^2_{n-1}+Y_n\) where \(X_n\sim \Gamma \left( \frac{p}{2}, \frac{C}{2}\right) \) and \(Y_n\sim \Gamma ^{-1}\left( \frac{k+p}{2}, \frac{C}{2}\right) \). Using the notation from the Sideways Theorem 4.2, \(\theta _{1,n}=X_nY_n\) and \(\theta _{2,n}=Y_n\).

Since \(\beta _{n}\) can be written as a random function of \(\sigma ^2_n\),

$$\begin{aligned} {\mathcal {L}}(\beta _n,\sigma ^2_n\mid \beta _0,\sigma ^2_0,\sigma ^2_n)={\mathcal {L}}(\beta _n,\sigma ^2_n\mid \sigma ^2_n) \end{aligned}$$

and so \(\sigma ^2_n\) is a de-initialization of \((\beta _n,\sigma ^2_n)\). By Proposition C.2,

$$\begin{aligned} \Vert {\mathcal {L}}(\beta _n,\sigma ^2_n)-{\mathcal {L}}(\beta '_n, \sigma ^{'2}_n)\Vert \le \Vert {\mathcal {L}}(\sigma ^2_n)-{\mathcal {L}}(\sigma ^{'2}_n)\Vert \end{aligned}$$

We are thus interested in evaluating the convergence rate of \(\sigma ^2_{n}\) to bound the convergence rate of \((\beta _{n},\sigma ^2_{n})\).

To interpret this in another way, if \(\sigma ^2_n\) couples then the distribution of \(\beta _n\) is the same for both iterations, so it is automatically coupled. An alternative proof can be made using the results from Liu et al. (1994). \(\square \)
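To make the random-function representation above concrete, here is a minimal simulation sketch (illustrative only; the values of p, k, C and the two starting points are placeholders, not values from the paper) of two copies of \(\sigma ^2_n=X_nY_n\sigma ^2_{n-1}+Y_n\) driven by shared randomness, as in the contraction phase.

```python
# Simulation sketch of sigma2_n = X_n * Y_n * sigma2_{n-1} + Y_n with shared randomness.
# X_n ~ Gamma(p/2, rate C/2) and Y_n ~ InverseGamma((k+p)/2, C/2); p, k, C are placeholders.
import numpy as np

rng = np.random.default_rng(0)
p, k, C = 3, 5, 2.0
sigma2, sigma2_prime = 100.0, 0.01   # two copies started far apart

for n in range(50):
    x = rng.gamma(shape=p / 2, scale=2.0 / C)     # Z_n^2 / C with Z_n^2 ~ chi^2(p)
    y = (C / 2) / rng.gamma(shape=(k + p) / 2)    # C / (2 G_n) with G_n ~ Gamma((k+p)/2, 1)
    sigma2 = x * y * sigma2 + y
    sigma2_prime = x * y * sigma2_prime + y

print(abs(sigma2 - sigma2_prime))   # the coupled copies contract toward each other
```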

Proof of Lemma 4.10

By Lemma 4.9, \(\theta _{1,n}=X_n Y_n\) and so,

$$\begin{aligned} K&=E[|\theta _{1,n}|] = E[X_n Y_n]= E[X_n]E[Y_n]\\&\quad = \frac{p}{C}\frac{C}{k+p-2}= \frac{p}{k+p-2}\\ \end{aligned}$$

\(\square \)

Proof of Lemma 4.11

Calculate the conditional density of \(\theta _{2,n}\mid \theta _{1,n}\): We remove the subscript n from the random variables. Let X, Y be as described in Lemma 4.9. Since the random variables are independent, the joint density is the product of the densities.

$$\begin{aligned} f_{X,Y}(x,y)= & {} \frac{(C/2)^{p/2}}{\Gamma (p/2)}x^{p/2-1}e^{-xC/2}\nonumber \\&\quad \frac{(C/2)^{(k+p)/2}}{\Gamma ((k+p)/2)}y^{-(k+p)/2-1}e^{-\frac{C/2}{y}} \end{aligned}$$
(C3)

Then \((\theta _1, \theta _2)=(XY,Y)\) is a transformation with the Jacobian \(|J|=\theta _2^{-1}\) and the density written as follows,

$$\begin{aligned} f_{\theta _1,\theta _2}(\theta _1,\theta _2)&= f_{X,Y}\left( \frac{\theta _1}{\theta _2},\theta _2\right) \theta _2^{-1}\\&=\frac{(C/2)^{p/2}}{\Gamma (p/2)}\left( \frac{\theta _1}{\theta _2}\right) ^{p/2-1}e^{-\frac{\theta _1}{\theta _2}C/2}\\&\quad \frac{(C/2)^{(k+p)/2}}{\Gamma ((k+p)/2)}\theta _2^{-(k+p)/2-1}e^{-\frac{C/2}{\theta _2}}\theta _2^{-1} \end{aligned}$$

Next \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is proportional to \(f_{\theta _1,\theta _2}(\theta _1,\theta _2)\) and so we can derive the conditional density of \(\theta _2\) as follows,

$$\begin{aligned} f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)&\propto f_{\theta _1,\theta _2}(\theta _1,\theta _2) \end{aligned}$$
(C4)
$$\begin{aligned}&\propto \theta _2^{1-p/2}e^{-\frac{1}{\theta _2}\theta _1C/2} \theta _2^{-(k+p)/2-1}e^{-\frac{1}{\theta _2}C/2}\theta _2^{-1} \end{aligned}$$
(C5)
$$\begin{aligned}&= \theta _2^{-(p/2+(k+p)/2)-1}e^{-\frac{1}{\theta _2}(\theta _1+1)C/2} \end{aligned}$$
(C6)

This is proportional to an inverse gamma distribution and so, \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{k+2p}{2}, (\theta _1+1)C/2\right) \). Since the conditional density is an inverse gamma distribution, the number of modes is \(M=1\) and the density function is continuous.

Calculate the maximum value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\): Fig. 5 shows how the maximum value of the density increases as the scale, \((\theta _1+1)C/2\), decreases when the shape, \(\frac{k+2p}{2}\), is fixed. It can also be seen from equation C4 that the density function \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is maximized when \(\theta _1=0\), since the normalizing constant is then largest. This means that \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) reaches its maximum height when \(\theta _1=0\), and so we evaluate \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1=0)\) at \(\theta _2= \frac{C}{k+2p+2}\), the mode (Section 5.3 of Hoff (2009)).

$$\begin{aligned} K&= f_{\theta _2\mid \theta _1}\left( \frac{C}{k+2p+2}\mid \theta _1=0\right) \\&=\frac{(C/2)^{\frac{k+2p}{2}}}{\Gamma (\frac{k+2p}{2})}y^{-\frac{k+2p}{2}-1}e^{-\frac{C/2}{y}}\mid _{y=\frac{C}{k+2p+2}}\\&=\frac{(C/2)^{\frac{k+2p}{2}}}{\Gamma (\frac{k+2p}{2})}\left( \frac{C}{k+2p+2}\right) ^{-\frac{k+2p}{2}-1}e^{-\frac{k+2p+2}{2}}\\&=\frac{(C/2)^{\frac{k+2p}{2}}}{\Gamma (\frac{k+2p}{2})}\left( \frac{k+2p+2}{C}\right) ^{\frac{k+2p}{2}+1}e^{-\frac{k+2p+2}{2}} \end{aligned}$$

And so,

$$\begin{aligned} K&= \frac{(C/2)^{\frac{k+2p}{2}}}{\Gamma (\frac{k+2p}{2})}\left( \frac{k+2p+2}{C}\right) ^{\frac{k+2p}{2}+1}e^{-\frac{k+2p+2}{2}} \end{aligned}$$
(C7)

\(\square \)
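As a quick numerical check of the closed form C7 (an added illustration; the values of k, p, C below are placeholders), K should equal the maximum of the \(\Gamma ^{-1}\left( \frac{k+2p}{2}, C/2\right) \) density, i.e., its value at the mode \(\frac{C}{k+2p+2}\).

```python
# Check that the closed form (C7) matches the inverse-gamma density at its mode.
import numpy as np
from scipy.stats import invgamma
from scipy.special import gamma as gamma_fn

k, p, C = 5, 3, 2.0                      # placeholder parameter values
alpha, beta = (k + 2 * p) / 2, C / 2     # shape and scale of theta_2 | theta_1 = 0

mode = C / (k + 2 * p + 2)               # = beta / (alpha + 1)
density_at_mode = invgamma.pdf(mode, a=alpha, scale=beta)

closed_form = (beta ** alpha / gamma_fn(alpha)) \
    * ((k + 2 * p + 2) / C) ** (alpha + 1) * np.exp(-(k + 2 * p + 2) / 2)

print(density_at_mode, closed_form)      # the two values agree
```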

Fig. 5 Inverse gamma density when \(\alpha =100\) and \(\beta =1,10,100\)

1.3 Proof of lemmas used in Theorem 4.14

Proof of Lemma 4.15

The iteration \(\tau ^{-1}_{n+1}\) can be written as a function of its previous value, \(\tau ^{-1}_{n}\) since \(\mu _{n+1} = {\bar{y}} + Z_{n+1}/\sqrt{J \tau _{n}}\).

$$\begin{aligned} \tau ^{-1}_{n+1} = \frac{Z^2_{n+1}}{S}\frac{S}{2G_{n+1}}\tau ^{-1}_{n} + \frac{S}{2G_{n+1}} \end{aligned}$$
(C8)

Next we can rewrite \(\tau ^{-1}_{n}=X_nY_n \tau ^{-1}_{n-1}+Y_n\) where \(X_n=\frac{Z^2_{n}}{S}\sim \Gamma \left( \frac{1}{2}, \frac{S}{2}\right) \) and \(Y_n=\frac{S}{2G_{n}}\sim \Gamma ^{-1}\left( \frac{J+2}{2}, \frac{S}{2}\right) \).

Since \((\mu _{n},\tau ^{-1}_{n})\) can be written as a random function of \(\tau ^{-1}_n\),

$$\begin{aligned} {\mathcal {L}}(\mu _n,\tau ^{-1}_n\mid \mu _0,\tau ^{-1}_0,\tau ^{-1}_n)={\mathcal {L}}(\mu _n,\tau ^{-1}_n\mid \tau ^{-1}_n) \end{aligned}$$

and \(\tau ^{-1}_n\) is a de-initialization of \((\mu _n,\tau ^{-1}_n)\). Further, by Proposition C.2,

$$\begin{aligned} \Vert {\mathcal {L}}(\mu _n,\tau ^{-1}_n)-{\mathcal {L}}(\mu '_n, \tau ^{'-1}_n)\Vert \le \Vert {\mathcal {L}}(\tau ^{-1}_n)-{\mathcal {L}}(\tau ^{'-1}_n)\Vert \end{aligned}$$

To interpret this in another way, if \(\tau _n\) couples then the distribution of \(\mu _n\) is the same for both iterations, so it is automatically coupled. An alternative proof can be made using the results from Liu et al. (1994). \(\square \)

Proof of Lemma 4.16

By Lemma 4.15, \(\theta _{1,n}=X_nY_n\) and so by Corollary 4.6

$$\begin{aligned} D= E[|\theta _{1,n}|] = E[X_nY_n]= E[X_n]E[Y_n]= \frac{1}{S}\frac{S}{J}= \frac{1}{J} \end{aligned}$$

\(\square \)

Proof of Lemma 4.17

To find M and K and show that the conditional density is continuous, we (a) show that \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{J-1}{2}, (\theta _1+1)S/2\right) \), which directly implies that the conditional distribution is continuous and \(M=1\), and (b) find the value of K.

(a) Calculate the conditional density of \(\theta _{2,n}\mid \theta _{1,n}\): For simplicity, we remove the subscript n from the random variables. Let X, Y be as described in Lemma 4.15. Since the random variables are independent, the joint density is the product of the densities.

$$\begin{aligned} f_{X,Y}(x,y)= & {} \frac{(S/2)^{1/2}}{\Gamma (1/2)}x^{1/2-1}e^{-xS/2} \frac{(S/2)^{(J+2)/2}}{\Gamma ((J+2)/2)}\nonumber \\&y^{-(J+2)/2-1}e^{-\frac{S/2}{y}} \end{aligned}$$
(C9)

Then \((\theta _1, \theta _2)=(XY,Y)\) is a transformation with the Jacobian \(|J|=\theta _2^{-1}\) and the density written as follows,

$$\begin{aligned} f_{\theta _1,\theta _2}(\theta _1,\theta _2)&= f_{X,Y}\left( \frac{\theta _1}{\theta _2},\theta _2\right) \theta _2^{-1}\\&=\frac{(S/2)^{1/2}}{\Gamma (1/2)}\left( \frac{\theta _1}{\theta _2}\right) ^{1/2-1}e^{-\frac{\theta _1}{\theta _2}S/2}\\&\quad \frac{(S/2)^{(J+2)/2}}{\Gamma ((J+2)/2)}\theta _2^{-(J+2)/2-1}e^{-\frac{S/2}{\theta _2}}\theta _2^{-1} \end{aligned}$$

Next \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is proportional to \(f_{\theta _1,\theta _2}(\theta _1,\theta _2)\) and so we can derive the conditional density of \(\theta _2\) as follows,

$$\begin{aligned} f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)&\propto f_{\theta _1,\theta _2}(\theta _1,\theta _2) \end{aligned}$$
(C10)
$$\begin{aligned}&\propto \theta _2^{1-1/2}e^{-\frac{1}{\theta _2}\theta _1S/2} \theta _2^{-(J+2)/2-1}e^{-\frac{1}{\theta _2}S/2}\theta _2^{-1} \end{aligned}$$
(C11)
$$\begin{aligned}&= \theta _2^{-(1/2+(J+2)/2)-1}e^{-\frac{1}{\theta _2}(\theta _1+1)S/2} \end{aligned}$$
(C12)
$$\begin{aligned}&= \theta _2^{-(J-1)/2-1}e^{-\frac{1}{\theta _2}(\theta _1+1)S/2} \end{aligned}$$
(C13)

This is proportional to an inverse gamma distribution and so, \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{J-1}{2}, (\theta _1+1)S/2\right) \). We know that the inverse gamma distribution is continuous and unimodal, so \(M=1\).

(b) Calculate the maximum value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) : Similar to Fig. 5 of Example 4.7, \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) reaches its maximum height when \(\theta _1=0\). It can also be shown from equation C10 that the density function of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is maximized when \(\theta _1=0\) since the normalizing constant will be the largest. So the largest value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) will occur when \(\theta _1=0\). To find the maximum conditional distribution, we find the value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1=0)\) evaluated at \(\theta _2= \frac{S}{J+1}\), the mode (see Section 5.3 of Hoff (2009)).

$$\begin{aligned} K&= f_{\theta _2\mid \theta _1}\left( \frac{S}{J+1}\mid \theta _1=0\right) \\&=\frac{(S/2)^{\frac{J-1}{2}}}{\Gamma (\frac{J-1}{2})}y^{-\frac{J-1}{2}-1}e^{-\frac{S/2}{y}}\mid _{y=\frac{S}{J+1}}\\&=\frac{(S/2)^{\frac{J-1}{2}}}{\Gamma (\frac{J-1}{2})}\left( \frac{S}{J+1}\right) ^{-\frac{J+1}{2}}e^{-\frac{J+1}{2}} \end{aligned}$$

And so,

$$\begin{aligned} K&= \frac{(S/2)^{\frac{J-1}{2}}}{\Gamma (\frac{J-1}{2})}\left( \frac{S}{J+1}\right) ^{-\frac{J+1}{2}}e^{-\frac{J+1}{2}} \end{aligned}$$
(C14)

\(\square \)

Proof of Lemma 3.5

By the property of the stationary distribution, if \(\sigma ^2_{n-1}\sim \pi \) then \(\sigma ^2_{n}\sim \pi \), and so the lemma follows from the following computation.

$$\begin{aligned} E_{\sigma ^2_n \sim \pi }[V(\sigma ^2_n)]= & {} E_{\sigma ^2_{n-1}\sim \pi }[E[V(\sigma ^2_n)\mid \sigma ^2_{n-1}]]\\\le & {} E_{\sigma ^2_{n-1} \sim \pi }[\lambda V(\sigma ^2_{n-1}) +b] \\= & {} \lambda E_{\sigma ^2_{n} \sim \pi }[ V(\sigma ^2_{n})]+b\\ \end{aligned}$$

\(\square \)

Proof of Lemma 4.19

Let \(\lambda = 0.6583702\), \(h=-0.5248723\) and \(b=106.3874\), then

$$\begin{aligned}&E[V(\sigma ^2_n)\mid \sigma ^2_{n-1}] \\&= E[(\sigma ^2_n-h)^2\mid \sigma ^2_{n-1}]\\&=E[(\sigma ^2_n)^2-2h \sigma ^2_n + h^2\mid \sigma ^2_{n-1}]\\&=E[(X_nY_n \sigma ^2_{n-1} + Y_n)^2-2h (X_nY_n \sigma ^2_{n-1} + Y_n) + h^2\mid \sigma ^2_{n-1}]\\&=E[Y_n^2](E[X_n^2](\sigma ^2_{n-1})^2 + 2E[X_n]\sigma ^2_{n-1} + 1)\\&\quad -2h (E[X_n] E[Y_n] \sigma ^2_{n-1} + E[Y_n]) + h^2\\&=E[Y_n^2]E[X_n^2](\sigma ^2_{n-1})^2 + 2E[X_n]E[Y_n^2]\sigma ^2_{n-1} \\&\quad + E[Y_n^2]-2hE[X_n] E[Y_n] \sigma ^2_{n-1} -2hE[Y_n] + h^2\\&=E[Y_n^2]E[X_n^2](\sigma ^2_{n-1})^2 + 2E[X_n](E[Y_n^2]\\&\quad -h E[Y_n])\sigma ^2_{n-1} + E[Y_n^2]-2hE[Y_n] + h^2\\&=0.6583702(\sigma ^2_{n-1})^2 + 0.6911206\sigma ^2_{n-1} + 107.3691\\&=\lambda (\sigma ^2_{n-1})^2 + 2\lambda h\sigma ^2_{n-1} + \lambda h^2 +b\\&=\lambda (\sigma ^2_{n-1}+h)^2 +b \\ \end{aligned}$$

\(\square \)

1.4 Proof of Theorem 4.23

Proof of Theorem 4.23

This example uses a modified version of the Sideways Theorem 4.2 to find an upper bound on the convergence rate. We will also use Proposition 2.2, which states that the total variation between two random variables is equal to the total variation of any invertible transformation of the same two random variables.

Let \(\vec {X}_n, \vec {X}'_n \in {\mathbb {R}}^2\) be two copies of the autoregressive normal process as defined in Example 4.22. Then for \(\vec {Z}_n\sim N(\vec {0},I_d)\),

$$\begin{aligned} \vec {X}_n=A\vec {X}_{n-1}+\Sigma _d\vec {Z}_n, \qquad \vec {X}'_n=A\vec {X}'_{n-1}+\Sigma _d\vec {Z}'_n \end{aligned}$$

We apply the one-shot coupling method to bound the total variation distance. For \(n<N\) set \(\vec {Z}_n=\vec {Z} '_n\).

Suppose \(X_0, X'_0\) are known and define

$$\begin{aligned} \Delta = \Vert \Sigma ^{-1}_d A^n (\vec {X}_{0}-\vec {X}'_{0})\Vert _2 \end{aligned}$$

Decompose \(A=P D P^{-1}\), where D is the corresponding diagonal matrix of eigenvalues, \(\lambda _i\) is the ith eigenvalue of A, and \(\Vert \cdot \Vert _2\) denotes the Frobenius norm. Then \(\Delta \) is bounded above as follows,

$$\begin{aligned} \Delta&= \Vert \Sigma ^{-1}_d A^n (\vec {X}_{0}-\vec {X}'_{0})\Vert _2 \\&=\Vert \Sigma ^{-1}_d P D^n P^{-1} (\vec {X}_{0}-\vec {X}'_{0})\Vert _2\\&\le \Vert \Sigma ^{-1}_d\Vert _2 \cdot \Vert P|_2 \Vert D^n\Vert _2 \Vert P^{-1}\Vert _2 \Vert \vec {X}_{0}-\vec {X}'_{0}\Vert _2 \\&\quad \text {by Lemma 1.2.7 of}\,48\\&\le \Vert \Sigma ^{-1}_d\Vert _2 \cdot \Vert P\Vert _2 \Vert P^{-1}\Vert _2 \Vert \vec {X}_{0}-\vec {X}'_{0}\Vert _2 \sqrt{\sum _{i=1}^d \mid \lambda _i\mid ^{2n}}\\&\le \Vert \Sigma ^{-1}_d\Vert _2 \cdot \Vert P\Vert _2 \Vert P^{-1}\Vert _2 \Vert \vec {X}_{0}-\vec {X}'_{0}\Vert _2 \sqrt{d} \max _{1\le i\le d}\mid \lambda _i\mid ^n \end{aligned}$$

For now assume that \(X_0, X'_0\) are known and note that \(\Sigma ^{-1}_d\) is an invertible transform. We bound the total variation distance as follows by applying two invertible transforms on the Markov chain and using the fact that \(\vec {Z}_{m}=\vec {Z}'_m, m < N\).

$$\begin{aligned}&\Vert {\mathcal {L}}(\vec {X}_N)-{\mathcal {L}}(\vec {X}'_N)\Vert \\&\le E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\vec {X}_N)-{\mathcal {L}}(\vec {X}'_N)\Vert \right] \\&\quad \quad \text {by Proposition}\, 2.3\\&= E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\Sigma ^{-1}_d\vec {X}_N)-{\mathcal {L}}(\Sigma ^{-1}_d\vec {X}'_N)\Vert \right] \\&\quad \quad \text {by Proposition}\, 2.2\\&= E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\Sigma ^{-1}_d A \vec {X}_{N-1} +\vec {Z}_N)-{\mathcal {L}}(\Sigma ^{-1}_dA \vec {X}'_{N-1} +\vec {Z} '_N)\Vert \right] \\&= E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\Sigma ^{-1}_d A^N \vec {X}_0 +\vec {Z}_N)-{\mathcal {L}}(\Sigma ^{-1}_d A^N \vec {X}'_0 +\vec {Z} '_N)\Vert \right] \\&\quad \text {by Proposition}\, 2.2\\&=E_{\{\vec {Z}_m\}_{m<N}}\left[ \Vert {\mathcal {L}}(\vec {Z}_N +\Sigma ^{-1}_dA^N (\vec {X}_{0} -\vec {X}'_{0}))-{\mathcal {L}}(\vec {Z} '_N)\Vert \right] \\&= \Vert {\mathcal {L}}(\vec {Z}_N +\Sigma ^{-1}_dA^N (\vec {X}_{0} -\vec {X}'_{0}))-{\mathcal {L}}(\vec {Z} '_N)\Vert \end{aligned}$$

There exists a rotation matrix \(R\in {\mathbb {R}}^{d\times d}\) such that

$$\begin{aligned} R[\Sigma ^{-1}_dA (\vec {X}_n -\vec {X}'_n)]= & {} (\Vert \Sigma ^{-1}_dA (\vec {X}_n -\vec {X}'_n)\Vert _2,0,\ldots 0)\\= & {} (\Delta ,0,\ldots 0) \end{aligned}$$

(see Aggarwal 2020). By properties of rotations, R is orthogonal, so \(R^T =R^{-1}\) and \(RZ_n \sim N(0,RI_d R^T)=N(0,I_d)\sim Z_n\). In other words, \(RZ_n \overset{d}{=} Z_n \overset{d}{=} Z'_n\). Thus, continuing the above equality,

$$\begin{aligned}&\Vert {\mathcal {L}}(\vec {X}_n)-{\mathcal {L}}(\vec {X}'_n)\Vert \\&\quad \le \Vert {\mathcal {L}}(\vec {Z}_n +\Sigma ^{-1}_dA^n (\vec {X}_{0} -\vec {X}'_{0}))-{\mathcal {L}}(\vec {Z} '_n)\Vert \\&\quad = \Vert {\mathcal {L}}(R[\vec {Z}_n +\Sigma ^{-1}_dA (\vec {X}_n -\vec {X}'_n)])-{\mathcal {L}}(R\vec {Z} '_n)\Vert&\text {by Proposition}\, 2.2\\&\quad =\Vert {\mathcal {L}}(\vec {Z}_n +(\Delta ,0,\ldots 0))-{\mathcal {L}}(\vec {Z} _n)\Vert \end{aligned}$$

Next, suppose that \(X_0, X'_0\) are unknown. Then, the inequality stated in Eq. 12 is shown as follows,

$$\begin{aligned}&\Vert {\mathcal {L}}(\vec {X}_n)-{\mathcal {L}}(\vec {X}'_n)\Vert \\&\le E_{\Delta }\left[ \Vert {\mathcal {L}}(\vec {Z}_n +(\Delta ,0,\ldots 0))-{\mathcal {L}}(\vec {Z} _n)\Vert \right] \quad \text {by Proposition}\, 2.3\\&= E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}^d} \frac{e^{ -\sum _{i=2}^d y_i^2/2}}{(2\pi )^{d/2}} \left| e^{-y_1^2/2}-e^{-(y_1-\Delta )^2/2}\right| d\vec {y} \right] \\&= E_{\Delta }\left[ \frac{1}{2}\int _{{\mathbb {R}}} \left| \frac{1}{\sqrt{2\pi }}e^{-y_1^2/2 }-\frac{1}{\sqrt{2\pi }}e^{-(y_1-\Delta )^2/2}\right| dy_1 \right] \\&= E_{\Delta }[\Vert {\mathcal {L}}(Z_{1,n}+\Delta )-{\mathcal {L}}(Z_{1,n})\Vert ]\\&\le \frac{1}{\sqrt{2\pi }} E[\Delta ] \quad \text {by Lemma}\, B.3\\&\le \sqrt{\frac{d}{2\pi }} \Vert \Sigma ^{-1}_d\Vert _2 \cdot \Vert P\Vert _2 \Vert P^{-1}\Vert _2 E[\Vert \vec {X}_{0}-\vec {X}'_{0}\Vert _2] \max _{1\le i\le d}\mid \lambda _i\mid ^n \end{aligned}$$

\(\square \)
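To see how the final bound behaves, the following sketch evaluates it for an assumed two-dimensional example (the matrices A and \(\Sigma _d\), the horizon n, and the value used for \(E[\Vert \vec {X}_0-\vec {X}'_0\Vert _2]\) are placeholders chosen only for illustration).

```python
# Sketch: evaluate sqrt(d/(2*pi)) * ||Sigma^{-1}||_F * ||P||_F * ||P^{-1}||_F
#                  * E||X_0 - X'_0||_2 * (max_i |lambda_i|)^n  for a toy AR(1) model.
import numpy as np

d, n = 2, 25
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])           # assumed stable matrix: all |lambda_i| < 1
Sigma = np.diag([1.0, 2.0])          # assumed noise scaling matrix Sigma_d

eigvals, P = np.linalg.eig(A)        # A = P diag(eigvals) P^{-1}
fro = lambda M: np.linalg.norm(M, 'fro')
expected_initial_gap = 3.0           # placeholder for E||X_0 - X'_0||_2

bound = (np.sqrt(d / (2 * np.pi)) * fro(np.linalg.inv(Sigma)) * fro(P)
         * fro(np.linalg.inv(P)) * expected_initial_gap
         * np.max(np.abs(eigvals)) ** n)
print(bound)                         # decays geometrically in n at rate max_i |lambda_i|
```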

Appendix D: Lemmas for ARCH process examples

1.1 Proof of lemmas used in Theorem 5.3

Proof of Lemma 5.4

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the LARCH process. For fixed \(n\ge 1\), let \(Z_n=Z'_n\) and so,

$$\begin{aligned} E[|X_n-X'_n|]&= E[|(\beta _0+\beta _1 X_{n-1})Z_n-(\beta _0+\beta _1 X'_{n-1})Z_n|]\\&\le \beta _1 E[|Z_n|] E[|X_{n-1}-X'_{n-1}|] \end{aligned}$$

Since \(Z_n\overset{d}{=} Z_0>0\) a.s., the geometric convergence rate is \(D=\beta _1 E[Z_0]\). \(\square \)
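A short simulation sketch of this contraction step is given below (an added illustration; the parameter values and the choice \(Z_n\sim \text {Exp}(1)\), so that \(E[Z_0]=1\), are assumptions made only for the example): two LARCH chains driven by the same innovations contract in expected distance at rate \(D=\beta _1 E[Z_0]\).

```python
# Coupled simulation of two LARCH chains sharing the innovations Z_n (assumed example).
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 0.5
n_steps, n_paths = 10, 200_000

x = np.full(n_paths, 10.0)           # first copy of the chain
xp = np.full(n_paths, 0.0)           # second copy, so |X_0 - X'_0| = 10
for n in range(n_steps):
    z = rng.exponential(1.0, size=n_paths)        # shared innovation, Z_n = Z'_n
    x, xp = (beta0 + beta1 * x) * z, (beta0 + beta1 * xp) * z

print(np.mean(np.abs(x - xp)))                    # Monte Carlo estimate of E|X_n - X'_n|
print(10 * (beta1 * 1.0) ** n_steps)              # the bound 10 * D**n, with D = beta1 * E[Z_0]
```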

Proof of Lemma 5.5

For a fixed \(n\ge 0\), suppose that \(Z_{n+1}, Z'_{n+1}\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation.

$$\begin{aligned} \Vert {\mathcal {L}}(X_{n+1})-{\mathcal {L}}(X'_{n+1})\Vert&\le E[\Vert {\mathcal {L}}((\beta _0+\beta _1 X_{n})Z_{n+1})\\&\quad -{\mathcal {L}}((\beta _0+\beta _1 X'_{n})Z_{n+1})\Vert ] \end{aligned}$$

Note that \(Z_{n+1}\) and \(Z'_{n+1}\) are used interchangeably in the total variation distance since \(Z_{n+1}\overset{d}{=}Z'_{n+1}\). Let \(Y_{n}=\beta _0+\beta _1 X_{n}\), \(Y'_{n}=\beta _0+\beta _1 X'_{n}\), \(\Delta =Y'_{n}-Y_{n}\), and \(\Delta '=\frac{\Delta }{Y_{n}}\). WLOG \(Y'_{n}>Y_{n}\) so that \(\Delta , \Delta '>0\). Then,

$$\begin{aligned}&\Vert {\mathcal {L}}(X_{n+1})-{\mathcal {L}}(X'_{n+1})\Vert \\&\quad \le E[\Vert {\mathcal {L}}(Y_{n}Z_{n+1})-{\mathcal {L}}(Y'_{n}Z_{n+1})\Vert ]\quad \text {by Proposition}\, 2.3\\&\quad =E[\Vert {\mathcal {L}}(Y_{n}Z_{n+1})-{\mathcal {L}}((Y_{n}+\Delta )Z_{n+1})\Vert ]\\&\quad =E[\Vert {\mathcal {L}}(Z_{n+1})-{\mathcal {L}}((1+\Delta ')Z_{n+1})\Vert ]\quad \text {by Proposition}\, 2.2\\&\quad =E[\Vert {\mathcal {L}}(\log (Z_{n+1}))-{\mathcal {L}}(\log (1+\Delta ')+\log (Z_{n+1}))\Vert ]\\&\qquad \text {by Proposition}\, 2.2\\&\quad \le \frac{M+1}{2}\sup _x e^x f_{Z_n}(e^x)E[\log (1+\Delta ')]\\&\quad \le \frac{M+1}{2}\sup _x e^x f_{Z_n}(e^x)\frac{E[|\Delta |]}{\beta _0}\\&\quad = \frac{M+1}{2}\sup _x e^x f_{Z_n}(e^x)\frac{\beta _1E[|X_n-X'_n|]}{\beta _0} \end{aligned}$$

The second last inequality is by Lemma B.3. See the proof of Lemma 4.3 for more details. The last inequality is by the Mean Value Theorem. \(\square \)

1.2 Proof of lemmas used in Theorem 5.8

Proof of Lemma 5.9

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the asymmetric ARCH process.

For a fixed \(n\ge 1\), let \(Z_n=Z'_n\) and so,

$$\begin{aligned}&E[|X_n-X'_n|]\\&\quad = E\left[ |\sqrt{(aX_{n-1}+b)^2+c^2}Z_n-\sqrt{(aX'_{n-1}+b)^2+c^2}Z_n|\right] \\&\quad = E\left[ |\sqrt{(aX_{n-1}+b)^2+c^2}-\sqrt{(aX'_{n-1}+b)^2+c^2}|\right] \\&\qquad E[|Z_n|] \end{aligned}$$

Note that the derivative of \(f(x)=\sqrt{(ax+b)^2+c^2}\) is

$$\begin{aligned} |f'(x)|=|\frac{a(ax+b)}{\sqrt{(ax+b)^2+c^2}}|\le \frac{|a(ax+b)|}{\sqrt{(ax+b)^2}}=|a| \end{aligned}$$
(D15)

and so,

$$\begin{aligned} E[|X_n-X'_n|]&\le |a| E[|Z_n|] E[|X_{n-1}-X'_{n-1}|] \end{aligned}$$

Thus, the geometric convergence rate is \(D=|a| E[|Z_0|]\). \(\square \)

Proof of Lemma 5.10

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the asymmetric ARCH process.

For \(n\ge 1\), \(Z_n, Z'_n\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation with respect to \(X_{n-1},X'_{n-1}, Z_n, Z'_n\).

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \\&\quad \le E\left[ \Vert {\mathcal {L}}(\sqrt{(aX_{n-1}+b)^2+c^2}Z_n)\right. \\&\qquad \left. -{\mathcal {L}}(\sqrt{(aX'_{n-1}+b)^2+c^2}Z'_n)\Vert \right] \end{aligned}$$

Let \(Y_{n-1}=\sqrt{(aX_{n-1}+b)^2+c^2}\) and \(Y'_{n-1} =\sqrt{(aX'_{n-1}+b)^2+c^2}\), \(\Delta =Y'_{n-1}-Y_{n-1}\) and \(\Delta '=\frac{\Delta }{Y_{n-1}}\). WLOG, \(Y'_{n-1}<Y_{n-1}\), so \(-1< \Delta ' <0\), because \(Y_{n-1},Y'_{n-1}>0\) and

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \le E[\Vert {\mathcal {L}}(Y_{n-1}Z_n)-{\mathcal {L}}(Y'_{n-1}Z_n)\Vert ]\\&\quad = E[\Vert {\mathcal {L}}(Y_{n-1}Z_n)-{\mathcal {L}}((Y_{n-1}+\Delta )Z_n)\Vert ] \\&\quad \text {by Proposition}\, 2.2\\&\quad = E[\Vert {\mathcal {L}}(Z_n)-{\mathcal {L}}((1+\Delta ')Z_n)\Vert ] \quad \text {by Proposition}\, 2.2\\&\quad \le E\left[ \sup _{x} 1-\frac{\pi _{Z_n}(x)}{\pi _{(1+\Delta ')Z_n}(x)}\right] \\&\quad \text {by Lemma 6.16 of [24]} \end{aligned}$$

Let the density of \(Z_n\) be \(\pi _{Z_n}(x)\), then \(\pi _{(1+\Delta ')Z_n}(x)= \frac{1}{1+\Delta '}\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) \).

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \le E\left[ \sup _{x} 1-(1+\Delta ')\frac{\pi _{Z_n}(x)}{\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) }\right] \\&\quad \le E[\sup _{x} 1-(1+\Delta ')]\\&\quad = E[|\Delta '|]\\&\quad \le \frac{E[|Y_{n-1}-Y'_{n-1}|]}{c}\,\, \hbox { since}\ Y_{n-1}\ge c\\&\quad \le \frac{|a|}{c}E[|X_{n-1}-X'_{n-1}|] \\&\qquad \text {by equation} \,D15 \end{aligned}$$

The second inequality is by assumption \(\pi _{Z_n}(x){\ge }\pi _{Z_n}\Big (\frac{x}{1+\Delta '}\Big )\).

\(\square \)

1.3 Proof of lemmas used in Theorem 5.13

Proof of Lemma 5.14

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the GARCH process. For \(n\ge 2\), let \(Z_n=Z'_n\). First note that,

$$\begin{aligned} E[|X_n-X'_n|]&= E[|\sigma _nZ_n -\sigma '_n Z_n|]= E[|\sigma _n -\sigma '_n| |Z_n|]\nonumber \\&=E[|\sigma _n -\sigma '_n|]E[ |Z_n|] \end{aligned}$$
(D16)

Next, we find an upper bound on \(E[|\sigma _n-\sigma '_n|]\) by first noting that \(\sigma ^2_n=\alpha ^2+(\beta ^2 Z^2_{n-1}+\gamma ^2)\sigma ^2_{n-1}\) by substitution.

$$\begin{aligned}&E[|\sigma _n-\sigma '_n|] \\&\quad = E\bigg [|\sqrt{\alpha ^2+(\beta ^2 Z^2_{n-1}+\gamma ^2)\sigma ^2_{n-1}} \\&\qquad -\sqrt{\alpha ^2+(\beta ^2 Z^2_{n-1}+\gamma ^2)\sigma ^{'2}_{n-1}}|\bigg ] \\&\quad \le E\bigg [\sqrt{\beta ^2 Z^2_{n-1}+\gamma ^2}\bigg ] E\bigg [|\sigma _{n-1}-\sigma ^{'}_{n-1}|\bigg ]\\&\quad =E\bigg [\sqrt{\beta ^2 Z^2_{n-1}+\gamma ^2}\bigg ] \frac{E[|X_{n-1}-X'_{n-1}|]}{E[|Z_{n-1}|]} \end{aligned}$$

The above inequality is by taking the maximum of the derivative and the last equality is a result of Eq. D16. Finally, substituting \(E[|\sigma _n-\sigma '_n|]\) into Eq. D16,

$$\begin{aligned}&E[|X_n-X'_n|]\\&\quad \le E[\sqrt{\beta ^2 Z^2_{n-1}+\gamma ^2}] \frac{E[|X_{n-1}-X'_{n-1}|]}{E[|Z_{n-1}|]}E[|Z_n|]\\&\quad = E[\sqrt{\beta ^2 Z^2_{n-1}+\gamma ^2}] E[|X_{n-1}-X'_{n-1}|]\\&\quad \le \sqrt{\beta ^2 E[Z_0^2]+\gamma ^2} E[|X_{n-1}-X'_{n-1}|] \\&\qquad \text {by Jensen's inequality} \end{aligned}$$

Thus, the geometric convergence rate is \(D=\sqrt{\beta ^2 E[Z_0^2]+\gamma ^2}\). \(\square \)
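For intuition, the contraction rate can be checked by simulation. The sketch below (an added illustration; the parameter values, starting points, and the choice of standard normal innovations are assumptions chosen only so that \(D<1\)) couples two GARCH chains through shared \(Z_n\) and compares the empirical one-step contraction of \(E[|X_n-X'_n|]\) with \(D=\sqrt{\beta ^2 E[Z_0^2]+\gamma ^2}\).

```python
# Coupled simulation of two GARCH chains sharing the innovations Z_n (assumed example).
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, gamma = 1.0, 0.4, 0.5
D = np.sqrt(beta ** 2 * 1.0 + gamma ** 2)        # E[Z_0^2] = 1 for standard normal Z

n_steps, n_paths = 10, 200_000
x, s = np.full(n_paths, 5.0), np.full(n_paths, 1.0)      # (X_0, sigma_0) for copy 1
xp, sp = np.full(n_paths, -5.0), np.full(n_paths, 3.0)   # (X'_0, sigma'_0) for copy 2

gaps = []
for n in range(n_steps):
    z = rng.standard_normal(n_paths)                     # shared innovation, Z_n = Z'_n
    s = np.sqrt(alpha ** 2 + beta ** 2 * x ** 2 + gamma ** 2 * s ** 2)
    sp = np.sqrt(alpha ** 2 + beta ** 2 * xp ** 2 + gamma ** 2 * sp ** 2)
    x, xp = s * z, sp * z
    gaps.append(np.mean(np.abs(x - xp)))

ratios = np.array(gaps[1:]) / np.array(gaps[:-1])
print(ratios.max(), D)   # each empirical one-step factor stays at or below D (up to noise)
```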

Proof of Lemma 5.15

Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the GARCH process.

For \(n\ge 2\), suppose that \(Z_n, Z'_n\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation.

$$\begin{aligned} \Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert&\le E[\Vert {\mathcal {L}}(\sigma _nZ_n)-{\mathcal {L}}(\sigma '_n Z_n)\Vert ] \end{aligned}$$

Let \(\Delta =\sigma '_{n}-\sigma _{n}\) and \(\Delta '=\frac{\Delta }{\sigma _{n}}\). WLOG, \(\sigma '_{n}<\sigma _{n}\), so \(\Delta , \Delta ' <0\) because \(\sigma _{n},\sigma '_{n}>0\) and

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \\&\quad = E[\Vert {\mathcal {L}}(\sigma _{n}Z_n)-{\mathcal {L}}((\sigma _{n}+\Delta )Z_n)\Vert ]\\&\qquad \text {by Proposition}\, 2.2\\&\quad = E[\Vert {\mathcal {L}}(Z_n)-{\mathcal {L}}((1+\Delta ')Z_n)\Vert ]\\&\qquad \text {by Proposition}\, 2.2\\&\quad \le E\left[ \sup _{x} 1-\frac{\pi _{Z_n}(x)}{\pi _{(1+\Delta ')Z_n}(x)}\right] \\&\qquad \text {by Lemma 6.16 of [24]} \end{aligned}$$

Let the density of \(Z_n\) be \(\pi _{Z_n}(x)\), then \(\pi _{(1+\Delta ')Z_n}(x)= \frac{1}{1+\Delta '}\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) \).

$$\begin{aligned}&\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert \\&\quad \le E\left[ \sup _{x} 1-(1+\Delta ')\frac{\pi _{Z_n}(x)}{\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) }\right] \\&\quad \le E[\sup _{x} 1-(1+\Delta ')]\\&\qquad \hbox { by assumption}\ \pi _{Z_n}(x)\ge \pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) \\&\quad = E[|\Delta '|]\\&\quad \le \frac{E[|\sigma '_{n}-\sigma _{n}|]}{\alpha }\hbox { since}\ \sigma _{n}\ge \alpha \\&\quad \le \frac{D}{\alpha E[|Z_{n-1}|]} E[|X_{n-1}-X'_{n-1}|] \\&\quad \quad \text {by the inequality in the proof of Lemma 5.14} \end{aligned}$$

\(\square \)

Proof of Lemma 5.16

$$\begin{aligned}&E[|X_1 - X'_1|] \\&= |\sigma _1 - \sigma '_1| E[|Z_1|] \qquad \qquad \text {by Eq.}~\mathrm{D16}\\&= |\sqrt{\alpha ^2 +\beta ^2 X_0^{2} + \gamma ^2 \sigma _0^{2}} - \sqrt{\alpha ^2 +\beta ^2 X_0^{'2} + \gamma ^2 \sigma _0^{'2}}| E[|Z_1|] \\&\le \sqrt{|(\alpha ^2 +\beta ^2 X_0^{2} + \gamma ^2 \sigma _0^{2}) - (\alpha ^2 +\beta ^2 X_0^{'2} + \gamma ^2 \sigma _0^{'2})|} E[|Z_1|] \\&\qquad \text {since } |\sqrt{x}-\sqrt{y}|=\sqrt{(\sqrt{x}-\sqrt{y})^2} = \sqrt{x+y-2\sqrt{x}\sqrt{y}} \le \sqrt{|x-y|} \\&\le \sqrt{\beta ^2 |X_0^{2}-X_0^{'2}| + \gamma ^2 |\sigma _0^{2} - \sigma _0^{'2}|} E[|Z_0|] \end{aligned}$$

\(\square \)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sixta, S., Rosenthal, J.S. Convergence rate bounds for iterative random functions using one-shot coupling. Stat Comput 32, 71 (2022). https://doi.org/10.1007/s11222-022-10134-x