Abstract
One-shot coupling is a method of bounding the convergence rate between two copies of a Markov chain in total variation distance. It was first introduced in Roberts and Rosenthal (Stoch Process Appl 99:195–208, 2002) and generalized in Madras and Sezer (Bernoulli 16:882–908, 2010). The method is divided into two parts: the contraction phase, during which the chains converge in expected distance, and the coalescing phase, which occurs at the last iteration, when there is an attempt to couple. One-shot coupling does not require the use of any exogenous variables such as a drift function or a minorization constant. In this paper, we summarize the one-shot coupling method into the One-Shot Coupling Theorem. We then apply the theorem to two families of Markov chains: the random functional autoregressive process and the autoregressive conditional heteroscedastic process. We provide multiple examples of how the theorem can be used on various models, including ones in high dimensions. These examples illustrate how the theorem’s conditions can be verified in a straightforward way. The one-shot coupling method appears to generate tight geometric convergence rate bounds.




Notes
The trigonometric identities used are \(2\cos \mu \sin \upsilon = \sin (\mu +\upsilon )-\sin (\mu -\upsilon )\) and \(\cos (\mu + \upsilon )= \cos \mu \cos \upsilon -\sin \mu \sin \upsilon \), where \(\mu ,\upsilon \in {\mathbb {R}}\).
References
Aggarwal, C.: Linear Algebra and Optimization for Machine Learning: A Textbook. Springer, New York (2020). https://doi.org/10.1007/978-3-030-40344-7
Baxendale, P.H.: Renewal theory and computable convergence rates for geometrically ergodic Markov chains. Ann. Appl. Probab. 15(1B), 700–738 (2005). https://doi.org/10.1214/105051604000000710
Billingsley, P.: Probability and Measure, Anniversary edn. Wiley Series in Probability and Statistics. Wiley, New York (2012)
Böttcher, B.: Markovian Maximal Coupling of Markov Processes (2017)
Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting, 2nd edn. Springer, New York (2002). https://doi.org/10.1007/978-3-319-29854-2
Diaconis, P., Freedman, D.: Iterated random functions. SIAM Rev. 41(1), 45–76 (1999). https://doi.org/10.1137/S0036144598338446
Doukhan, P.: Stochastic Models for Time Series, 1st edn. Springer, New York (2018). https://doi.org/10.1007/978-3-319-76938-7
Durmus, A., Moulines, É.: Quantitative bounds of convergence for geometrically ergodic Markov chain in the Wasserstein distance with application to the Metropolis Adjusted Langevin Algorithm. Stat. Comput. 25, 5–19 (2015). https://doi.org/10.1007/s11222-014-9511-z
Dyer, M., Goldberg, L.A., Jerrum, M., Martin, R.: Markov chain comparison. Probab. Surveys 3, 89–111 (2006). https://doi.org/10.1214/154957806000000041
Gelman, A., Rubin, D.B.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7(4), 457–472 (1992). https://doi.org/10.1214/ss/1177011136
Geyer, C.J.: Introduction to Markov Chain Monte Carlo, pp. 1–46. Chapman and Hall/CRC, New York (2011). https://doi.org/10.1201/b10905
Gibbs, A.L.: Convergence in the Wasserstein metric for Markov chain Monte Carlo algorithms with applications to image restoration. Stoch. Model. 20(4), 473–492 (2004). https://doi.org/10.1081/STM-200033117
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. / Revue Internationale de Statistique 70(3), 419–435 (2002). https://doi.org/10.2307/1403865
Guibourg, D., Hervé, L., Ledoux, J.: Quasi-compactness of Markov kernels on weighted-supremum spaces and geometrical ergodicity (2012)
Hairer, E., Wanner, G.: Analysis by Its History. Springer, New York (2008). https://doi.org/10.1007/978-0-387-77036-9
Hobert, J.P., Jones, G.L.: Honest Exploration of Intractable Probability Distributions via Markov Chain Monte Carlo. Stat. Sci. 16(4), 312–334 (2001). https://doi.org/10.1214/ss/1015346317
Hoff, P.D.: A First Course in Bayesian Statistical Methods. Springer, New York (2009). https://doi.org/10.1007/978-0-387-92407-6
Jacob, P.E.: Lecture notes for couplings and Monte Carlo. Available at https://sites.google.com/site/pierrejacob/cmclectures?authuser=0 (2021/09/17)
Jerison, D.: The drift and minorization method for reversible Markov chains. PhD thesis, Stanford University (2016)
Jin, Z., Hobert, J.P.: Dimension free convergence rates for Gibbs samplers for Bayesian linear mixed models (2021)
Jin, R., Tan, A.: Central limit theorems for Markov chains based on their convergence rates in Wasserstein distance. arXiv:2002.09427 Statistics Theory (2020)
Jones, G.L.: On the Markov chain central limit theorem. Probab. Surv. 1, 299–320 (2004). https://doi.org/10.1214/154957804100000051
Jovanovski, O.: Convergence bound in total variation for an image restoration model. Stat. Probab. Lett. 90, 11–16 (2014). https://doi.org/10.1016/j.spl.2014.03.007
Jovanovski, O., Madras, N.: Convergence rates for a hierarchical Gibbs sampler. Bernoulli 23(1), 603–625 (2017). https://doi.org/10.3150/15-BEJ758
Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times, 2nd edn. American Mathematical Society, Providence, RI (2017). https://doi.org/10.1090/mbk/107
Liu, J.S., Wong, W.H., Kong, A.: Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81(1), 27–40 (1994). https://doi.org/10.1093/biomet/81.1.27
Madras, N., Sezer, D.: Quantitative bounds for Markov chain convergence: Wasserstein and total variation distances. Bernoulli 16(3), 882–908 (2010). https://doi.org/10.2307/25735016
Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer, London (1993). https://doi.org/10.1007/978-1-4471-3267-7
Nummelin, E.: A splitting technique for Harris recurrent chains. Z. Wahrscheinlichkeitstheorie und Verw. Geb. 43, 309–318 (1978). https://doi.org/10.1007/BF00534764
Pillai, N.S., Smith, A.: Kac’s walk on \(n\)-sphere mixes in \(n\log n\) steps. Ann. Appl. Probab. 27(1), 631–650 (2017). https://doi.org/10.1214/16-AAP1214
Qin, Q., Hobert, J.P.: Geometric convergence bounds for Markov chains in Wasserstein distance based on generalized drift and contraction conditions (2021)
Qin, Q., Hobert, J.P.: Wasserstein-based methods for convergence complexity analysis of MCMC with applications (2020)
Rajaratnam, B., Sparks, D.: MCMC-Based inference in the era of big data: a fundamental analysis of the convergence complexity of high-dimensional chains (2015)
Reiss, R.-D.: Approximation of product measures with an application to order statistics. Ann. Probab. 9(2), 335–341 (1981). https://doi.org/10.1214/aop/1176994477
Roberts, G.O., Rosenthal, J.S.: Markov chains and de-initializing processes. Scand. J. Stat. 28(3), 489–504 (2001). https://doi.org/10.1111/1467-9469.00250
Roberts, G.O., Rosenthal, J.S.: One-shot coupling for certain stochastic recursive sequences. Stoch. Process. Appl. 99, 195–208 (2002). https://doi.org/10.1016/S0304-4149(02)00096-0
Roberts, G.O., Rosenthal, J.S.: General state space Markov chains and MCMC algorithms. Probab. Surv. 1, 20–71 (2004). https://doi.org/10.1214/154957804100000024
Rosenthal, J.S.: Convergence rates for Markov chains. SIAM Rev. 37(3), 387–405 (1995). https://doi.org/10.1137/1037083
Rosenthal, J.S.: Minorization conditions and convergence rates for Markov chain Monte Carlo. J. Am. Stat. Assoc. 90(430), 558–566 (1995). https://doi.org/10.2307/2291067
Rosenthal, J.S.: Analysis of the Gibbs sampler for a model related to James–Stein estimators. Stat. Comput. 6, 269–275 (1996). https://doi.org/10.1007/BF00140871
Rosenthal, J.S.: Faithful couplings of Markov chains: now equals forever. Adv. Appl. Math. 18(3), 372–381 (1997). https://doi.org/10.1006/aama.1996.0515
Rosenthal, J.S.: A First Look at Rigorous Probability Theory, 2nd edn. World Scientific, Singapore (2016). https://doi.org/10.1142/6300
Saloff-Coste, L.: Lectures on finite Markov chains, pp. 301–413. Springer, Berlin, Heidelberg (1997). https://doi.org/10.1007/BFb0092621
Smeets, L., van de Schoot, R.: R regression Bayesian (using brms) (2019). www.rensvandeschoot.com/tutorials/r-linear-regression-bayesian-using-brms/ Accessed 2021-06-03
Steinsaltz, D.: Locally contractive iterated function systems. Ann. Probab. 27(4), 1952–1979 (1999). https://doi.org/10.1214/aop/1022874823
Tan, A., Jones, G.L., Hobert, J.P.: On the geometric ergodicity of two-variable Gibbs samplers. Inst. Math. Stat. Collect. 10, 25–42 (2013). https://doi.org/10.1214/12-IMSCOLL1002
van de Schoot, R., Yerkes, M.A., Mouw, J.M., Sonneveld, H.: What took them so long? Explaining PhD delays among doctoral candidates. PLoS ONE 8(7), e68839 (2013). https://doi.org/10.1371/journal.pone.0068839
Yang, J., Rosenthal, J.S.: Complexity results for MCMC derived from quantitative bounds (2019)
Acknowledgements
We thank the referees for their many excellent comments and suggestions.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix A: Propositions related to the properties of total variation distance
Proof of Proposition 2.2
Let \({\mathcal {A}}\) be the sigma field of \({\mathcal {X}}\) and \({\mathcal {B}}\) be the sigma field of \({\mathcal {Y}}\).
First note that \(f^{-1}({\mathcal {B}})=\{f^{-1}(B): B\in {\mathcal {B}}\}={\mathcal {A}}\):
-
\(f^{-1}({\mathcal {B}})\subset {\mathcal {A}}\): For \(B\in {\mathcal {B}}\), \(f^{-1}(B)\in {\mathcal {A}}\) by measurability.
-
\({\mathcal {A}} \subset f^{-1}({\mathcal {B}})\): Let \(A\in {\mathcal {A}}\). Then \(f(A)\in {\mathcal {B}}\) and \(f^{-1}(f(A))\in f^{-1}({\mathcal {B}})\) by definition. By invertibility, \(f^{-1}(f(A))=A\) and so \(A \in f^{-1}({\mathcal {B}})\).
The equality in Eq. 2 can then be proven as follows,
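In outline (a sketch of the standard argument, writing \(X, X'\) for random variables with the two laws and using \(f^{-1}({\mathcal {B}})={\mathcal {A}}\) established above):
\[
\Vert {\mathcal {L}}(f(X))-{\mathcal {L}}(f(X'))\Vert _{TV} = \sup _{B\in {\mathcal {B}}}|P(f(X)\in B)-P(f(X')\in B)| = \sup _{B\in {\mathcal {B}}}|P(X\in f^{-1}(B))-P(X'\in f^{-1}(B))| = \sup _{A\in {\mathcal {A}}}|P(X\in A)-P(X'\in A)| = \Vert {\mathcal {L}}(X)-{\mathcal {L}}(X')\Vert _{TV}.
\]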
\(\square \)
Proof of Proposition 2.3
\(\square \)
Proof of Proposition 2.4
To prove this, we use the concept of maximal coupling over the coordinates. By maximal coupling, for \(i\in \{1,\ldots , d\}\) there exist random variables \(X_{i,n}^M, X_{i,n}^{'M}\) such that \(X_{i,n}\overset{d}{=}X_{i,n}^M\) and \(X'_{i,n}\overset{d}{=}X_{i,n}^{'M}\) and
(see Proposition 3g of Roberts and Rosenthal (2004) or Section 2 of Böttcher (2017)).
Further, there exists a unique product measure such that for any \(A_1, \ldots A_d \in {{\mathcal {B}}}\), \(P(\cap _{i=1}^d [X_{i,n}^M\in A_i])=\prod _{i=1}^d P(X_{i,n}^M\in A_i)\) (Theorem 18.2 of Billingsley (2012)). For the unique product measure, the following equality holds,
And so by uniqueness, for \(A\in {{\mathcal {B}}}^{\text {d}}\), \(P(X_n^M\in A)=P(X_n\in A)\). By definition, this means that \(\vec {X}_{n}\overset{d}{=} \vec {X}_n^M\), which implies that \((\vec {X}_n^M, \vec {X}_n^{'M})\in {\mathcal {C}}(\vec {X}_n,\vec {X}'_n)\), the set of all couplings of \(\vec {X}_n,\vec {X}'_n\).
We now use \(\vec {X}_n^M, \vec {X}_n^{'M}\) to prove Eq. 3.
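A sketch of the final bound, assuming Eq. 3 is the coordinate-wise bound on the total variation distance of the vectors:
\[
\Vert {\mathcal {L}}(\vec {X}_n)-{\mathcal {L}}(\vec {X}'_n)\Vert _{TV} \le P(\vec {X}_n^M\ne \vec {X}_n^{'M}) \le \sum _{i=1}^d P(X_{i,n}^M\ne X_{i,n}^{'M}) = \sum _{i=1}^d \Vert {\mathcal {L}}(X_{i,n})-{\mathcal {L}}(X'_{i,n})\Vert _{TV},
\]
where the first inequality is the coupling inequality applied to \((\vec {X}_n^M, \vec {X}_n^{'M})\in {\mathcal {C}}(\vec {X}_n,\vec {X}'_n)\), the second is a union bound, and the equality holds because each coordinate pair is maximally coupled.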
\(\square \)
Appendix B: Lemmas related to the Sideways Theorem
The following lemmas, corollaries, and corresponding proofs relate to the Sideways Theorem (Theorem 4.2).
1.1 Lemmas providing an upper bound on the integral difference between a function and a corresponding shift
The following lemmas are used in the proof of Lemma 4.3.
Lemma B.1
For any invertible, continuous function \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) where the codomain is \(f({\mathbb {R}})=(a,b)\) and \(\Delta >0\),
Proof
Since f is invertible and continuous, it is strictly monotone (Lemma 3.8 of Hairer and Wanner (2008)). Assume that f is strictly increasing. The integral can be written as follows,
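A sketch via Fubini's theorem, assuming the lemma's conclusion is \(\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|\,dx\le (b-a)\Delta \):
\[
\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|\,dx = \int _{{\mathbb {R}}}\int _a^b {\mathbf {1}}[f(x)<y\le f(x+\Delta )]\,dy\,dx = \int _a^b \big (f^{-1}(y)-(f^{-1}(y)-\Delta )\big )\,dy = (b-a)\Delta ,
\]
since, for increasing f and each \(y\in (a,b)\), \(f(x)<y\le f(x+\Delta )\) if and only if \(f^{-1}(y)-\Delta \le x<f^{-1}(y)\).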
If f is strictly decreasing apply the transform \(h(x)=a+b-f(x)\). The function h is a strictly increasing invertible function with codomain (a, b) and so using the previous result for increasing functions,
\(\square \)
Lemma B.2
Let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a continuous function that is invertible over the set (c, d) and is a constant function over \((c,d)^C\). Further suppose that the codomain is \(f({\mathbb {R}})=(a,b)\). Then for \(\Delta >0\), we get that
Proof
Assume that f is an increasing function and so \(f(c)=a\), \(f(d)=b\) and \(|f(x+\Delta )-f(x)|=f(x+\Delta )-f(x)\).
Let \(0<\epsilon <(d-c)/2\) and define
Note that \(g_{\epsilon }(x)\) is continuous, invertible, and increasing, and its codomain is (a, b). By Lemma B.1, for each \(\epsilon >0\)
Further, for all \(x\in {\mathbb {R}}\), \(\lim _{\epsilon \rightarrow 0}g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)=f(x+\Delta )-f(x)\), so \(g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)\) converges pointwise to \(f(x+\Delta )-f(x)\). Next, for \(0<\epsilon <(d-c)/2\), \(|g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)|<2|b|\), so the functions \(g_{\epsilon }(x+\Delta )-g_{\epsilon }(x)\) are uniformly bounded. The above statements allow us to apply the dominated convergence theorem (Theorem 16.5 of Billingsley (2012)), and so
If f is strictly decreasing apply the transform \(h(x)=a+b-f(x)\). The function h is a strictly increasing invertible function with codomain (a, b) and so using the previous result for increasing functions,
\(\square \)
Lemma B.3
Let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a continuous function with the following properties:
-
the codomain is (0, K)
-
\((m_1, m_2, \ldots , m_M)\) are the local maxima and minima points
-
\(\lim _{x\rightarrow \infty }f(x)=0\) and \(\lim _{x\rightarrow -\infty }f(x)=0\)
Further suppose that \(\Delta < \max _{i=2,\ldots , M}\{m_i-m_{i-1}\}\). Then
Proof
Since \(\Delta < \max _{i=2,\ldots , M}\{m_i-m_{i-1}\}\), we have that \(m_1-\Delta<m_1<m_2-\Delta<\ldots <m_M\). Let \(I_1,\ldots , I_M\) be the intersection points, that is, the points where \(f(I_i)=f(I_i+\Delta )\).
Show that \(m_i-\Delta<I_i<m_i\): Suppose that \(m_i\) is a local maximum point. Let \(g(x)=f(x+\Delta )\). Within the interval \((m_i-\Delta ,m_i)\), \(f'(x)>0\) and \(g'(x)<0\) by assumption. This implies that \(f(m_i-\Delta )<f(m_i)\) and \(g(m_i-\Delta )>g(m_i)\) by the Mean Value Theorem. Further since \(g(m_i-\Delta )=f(m_i)\) we have that \(g(m_i-\Delta )>f(m_i-\Delta )\) and \(g(m_i)<f(m_i)\).
Let \(h(x)=g(x)-f(x)\). Then \(h(m_i-\Delta )>0\) and \(h(m_i)<0\); further, h is a strictly decreasing function over \((m_i-\Delta ,m_i)\) since \(g,-f\) are strictly decreasing functions over the same interval. So by the intermediate value theorem, there exists a \(\xi \in (m_i-\Delta ,m_i)\) such that \(h(\xi )=0\) or \(f(\xi )=g(\xi )=f(\xi +\Delta )\). Further, by injectivity, \(\xi \) is unique. Let \(I_i=\xi \). A similar proof can be given for when \(m_i\) is a local minimum.
Show that \(\int _{I_i}^{I_{i+1}}|f(x+\Delta )-f(x)|dx\le K\Delta \): Note first that \(m_i-\Delta<I_i<m_i<m_{i+1}-\Delta<I_{i+1}<m_{i+1}\); further, define
Note that over the interval \((m_i,m_{i+1}]\), the function f is either a strictly increasing or a strictly decreasing function.
The last equality is a result of Lemma B.2.
By similar reasoning, it can be shown that
Finally note that the intersection points partition \({\mathbb {R}}\) into \(M+1\) subsets and so
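With the conventions \(I_0=-\infty \) and \(I_{M+1}=\infty \), the conclusion presumably reads
\[
\int _{{\mathbb {R}}}|f(x+\Delta )-f(x)|\,dx = \sum _{i=0}^{M}\int _{I_i}^{I_{i+1}}|f(x+\Delta )-f(x)|\,dx \le (M+1)K\Delta .
\]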
\(\square \)
1.1.1 Proof of Lemma 4.3
Lemma 4.3 represents the coalescing condition for the Sideways Theorem 4.2.
Proof of Lemma 4.3
Set \(\theta _{1,n}=\theta '_{1,n}\). Define
Let \(f_{X_n},f_{X'_n}\) be the density functions for \(X_{n},X_{n}'\), respectively, and \(f_{\theta _{2,n}}, f_{\theta _{2,n}+\Delta }\) be the density functions for \(\theta _{2,n}, \theta _{2,n}+\Delta \).
Suppose that \(\Delta ,X_{n-1}, X'_{n-1} \in {\mathbb {R}}\) are known and so,
We know that \(\theta _{2,n}\overset{d}{=}\theta '_{2,n}\) and in general \(\Delta \), \(\theta _{1,n}\) are random variables, so
By the assumptions in the theorem, the density of \(\theta _{2,n}\) is continuous with M extrema points and has codomain contained in (0, K). Let \((m_1, m_2,\ldots , m_M)\) be the local extrema points, where \(m_i<m_j\) if \(i<j\), and let \(L\le \max _{2\le i\le M}\{m_i-m_{i-1}\}\) be a lower bound on the maximum distance between two consecutive local extrema points. So, continuing from the inequality B1 and by the definition of total variation, Eq. 1,
The second last inequality is a result of Lemma B.3. The coalescing condition is thus satisfied as follows with \(C=\frac{K(M+1)}{2} +\frac{I_{M>1}}{L}\),
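A sketch of how the two cases presumably combine, with \(\Delta \) as defined at the start of the proof:
\[
\Vert {\mathcal {L}}(X_n)-{\mathcal {L}}(X'_n)\Vert _{TV} \le E\left[ \tfrac{K(M+1)}{2}|\Delta |\,{\mathbf {1}}[|\Delta |<L] + {\mathbf {1}}[|\Delta |\ge L]\right] \le \left( \tfrac{K(M+1)}{2}+\tfrac{I_{M>1}}{L}\right) E[|\Delta |] = C\,E[|\Delta |],
\]
using \(1\le |\Delta |/L\) on the event \(\{|\Delta |\ge L\}\), which can only occur when \(M>1\).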
\(\square \)
Appendix C: Lemmas for random-functional autoregressive process examples
1.1 Proof of Lemma 4.5
Proof of Lemma 4.5
First note that
where \(g(x,y)=\frac{1}{2}(x-y +\sin y -\sin x)\) and \(G(x,y)=\sin \left( \frac{1}{2}(y -\sin y)+Z_n\right) -\sin \left( \frac{1}{2}(x -\sin x)+Z_n\right) \). By the trigonometric identities of Footnote 1, with \(k(x,y)=\frac{x+y-\sin y - \sin x}{4}\) and \(h(x,y)=\frac{y-x+\sin x - \sin y}{4}\),
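the difference \(G\) factors as follows (a direct application of the first identity of Footnote 1 with \(\mu =k(x,y)+Z_n\) and \(\upsilon =h(x,y)\)):
\[
G(x,y) = 2\cos \big (k(x,y)+Z_n\big )\sin \big (h(x,y)\big ).
\]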
And so,
\(\square \)
1.2 Proof of lemmas used in Theorem 4.8
To prove the first part of this theorem, we apply the de-initialization technique, which shows how the convergence rate of a Markov chain can be bounded above by the convergence rate of a simpler Markov chain that retains sufficient information about the chain of interest. The concept of de-initialization and a proposition that bounds total variation are provided below.
Definition C.1
(De-initialization) Let \(\{X_n\}_{n\ge 1}\) be a Markov chain. A Markov chain \(\{Y_n\}_{n\ge 1}\) is a de-initialization of \(\{X_n\}_{n\ge 1}\) if for each \(n\ge 1\)
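(presumably the conditional-independence condition of Roberts and Rosenthal (2001)):
\[
{\mathcal {L}}(X_n\mid X_0, Y_n) = {\mathcal {L}}(X_n\mid Y_n).
\]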
Proposition C.2
(Theorem 1 of Roberts and Rosenthal (2001)) Let \(\{Y_n\}_{n\ge 1}\) be a de-initialization of \(\{X_n\}_{n\ge 1}\). Then, for any two initial distributions \(X_0\sim \mu \) and \(X'_0\sim \mu '\),
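In the notation used here, the bound reads
\[
\Vert {\mathcal {L}}(X_n\mid X_0\sim \mu )-{\mathcal {L}}(X_n\mid X'_0\sim \mu ')\Vert _{TV} \le \Vert {\mathcal {L}}(Y_n\mid X_0\sim \mu )-{\mathcal {L}}(Y_n\mid X'_0\sim \mu ')\Vert _{TV}.
\]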
Proof of Lemma 4.9
Note that \(\beta _{n}= {\tilde{\beta }} + \sigma _{n-1} Z_n, Z_n\sim N_p(0, A^{-1})\) can be written as a random function of \(\sigma ^2_n\). Substituting \(\beta _n\), \(\sigma ^2_{n}\) can then be written as a random function of its previous value for independent \(Z^2_{n}\sim \chi ^2(p)\) and \(G_n \sim \Gamma (\frac{k+p}{2},1)\),
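In explicit form, the recursion presumably reads
\[
\sigma ^2_{n} = \frac{Z^2_{n}\,\sigma ^2_{n-1} + C}{2G_n} = \frac{Z^2_{n}}{C}\cdot \frac{C}{2G_n}\,\sigma ^2_{n-1} + \frac{C}{2G_n},
\]
which matches the definitions of \(X_n\) and \(Y_n\) given next.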
Let \(X_n=\frac{Z^2_{n}}{C}\), \(Y_n=\frac{C}{2G_n}\). We can rewrite \(\sigma ^2_{n}=X_nY_n\sigma ^2_{n-1}+Y_n\), where \(X_n\sim \Gamma \left( \frac{p}{2}, \frac{C}{2}\right) \) and \(Y_n\sim \Gamma ^{-1}\left( \frac{k+p}{2}, \frac{C}{2}\right) \). Using the notation from the Sideways Theorem 4.2, \(\theta _{1,n}=X_nY_n\) and \(\theta _{2,n}=Y_n\).
Since \(\beta _{n}\) can be written as a random function of \(\sigma ^2_n\),
and so \(\sigma ^2_n\) is a de-initialization of \((\beta _n,\sigma ^2_n)\). By Proposition C.2,
We are thus interested in evaluating the convergence rate of \(\sigma ^2_{n}\) to bound the convergence rate of \((\beta _{n},\sigma ^2_{n})\).
To interpret this in another way, if \(\sigma ^2_n\) couples, then the distribution of \(\beta _n\) is the same for both chains, so \(\beta _n\) is automatically coupled. An alternative proof can be given using the results from Liu et al. (1994). \(\square \)
Proof of Lemma 4.10
By Lemma 4.9, \(\theta _{1,n}=X_n Y_n\) and so,
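Assuming the contraction factor in the Sideways Theorem is \(D=E[\theta _{1,n}]\), independence of \(X_n\) and \(Y_n\) gives (a sketch)
\[
E[\theta _{1,n}] = E[X_n]\,E[Y_n] = \frac{p/2}{C/2}\cdot \frac{C/2}{\frac{k+p}{2}-1} = \frac{p}{k+p-2},
\]
using the means of the \(\Gamma \left( \frac{p}{2},\frac{C}{2}\right) \) and \(\Gamma ^{-1}\left( \frac{k+p}{2},\frac{C}{2}\right) \) distributions (the latter finite when \(k+p>2\)).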
\(\square \)
Proof of Lemma 4.11
Calculate the conditional density of \(\theta _{2,n}\mid \theta _{1,n}\): We remove the subscript n on the random variables. Let X, Y be as described in Lemma 4.9. Since the random variables are independent, the joint density is the product of the densities.
Then \((\theta _1, \theta _2)=(XY,Y)\) is a transformation with Jacobian \(|J|=\theta _2^{-1}\), and the joint density is written as follows,
Next \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is proportional to \(f_{\theta _1,\theta _2}(\theta _1,\theta _2)\) and so we can derive the conditional density of \(\theta _2\) as follows,
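A sketch of this computation, using the densities of \(X\sim \Gamma \left( \frac{p}{2},\frac{C}{2}\right) \) and \(Y\sim \Gamma ^{-1}\left( \frac{k+p}{2},\frac{C}{2}\right) \) and the Jacobian above:
\[
f_{\theta _1,\theta _2}(\theta _1,\theta _2) \propto \left( \frac{\theta _1}{\theta _2}\right) ^{\frac{p}{2}-1} e^{-\frac{C\theta _1}{2\theta _2}}\,\theta _2^{-\frac{k+p}{2}-1} e^{-\frac{C}{2\theta _2}}\,\theta _2^{-1}, \qquad f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1) \propto \theta _2^{-\frac{k+2p}{2}-1}\exp \left( -\frac{(\theta _1+1)C}{2\theta _2}\right) .
\]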
This is proportional to an inverse gamma distribution and so, \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{k+2p}{2}, (\theta _1+1)C/2\right) \). Since the conditional density is an inverse gamma distribution, the number of modes is \(M=1\) and the density function is continuous.
Calculate the maximum value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\): Fig. 5 shows how the maximum value of the density increases as the scale, \((\theta _1+1)C/2\), decreases when the shape, \(\frac{k+2p}{2}\), is fixed. It can also be shown from Eq. C4 that the density function \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is maximized when \(\theta _1=0\), since the normalizing constant will then be the largest. This means that \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) reaches its maximum height when \(\theta _1=0\), and so we find the value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) evaluated at \(\theta _2= \frac{C}{k+2p+2}\), the mode (Section 5.3 of Hoff (2009)).
And so,
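Writing \(\alpha =\frac{k+2p}{2}\) and \(\beta =\frac{C}{2}\), evaluating the \(\Gamma ^{-1}(\alpha ,\beta )\) density at its mode \(\theta _2=\frac{\beta }{\alpha +1}=\frac{C}{k+2p+2}\) gives (a sketch; K is presumably this value)
\[
K = \frac{(\alpha +1)^{\alpha +1}e^{-(\alpha +1)}}{\beta \,\Gamma (\alpha )} = \frac{2}{C\,\Gamma \left( \frac{k+2p}{2}\right) }\left( \frac{k+2p+2}{2}\right) ^{\frac{k+2p+2}{2}} e^{-\frac{k+2p+2}{2}}.
\]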
\(\square \)
1.3 Proof of lemmas used in Theorem 4.14
Proof of Lemma 4.15
The iteration \(\tau ^{-1}_{n+1}\) can be written as a function of its previous value, \(\tau ^{-1}_{n}\) since \(\mu _{n+1} = {\bar{y}} + Z_{n+1}/\sqrt{J \tau _{n}}\).
Next we can rewrite \(\tau ^{-1}_{n}=X_nY_n \tau ^{-1}_{n-1}+Y_n\), where \(X_n=\frac{Z^2_{n}}{S}\sim \Gamma \left( \frac{1}{2}, \frac{S}{2}\right) \) and \(Y_n=\frac{S}{2G_{n}}\sim \Gamma ^{-1}\left( \frac{J+2}{2}, \frac{S}{2}\right) \).
Since \((\mu _{n},\tau ^{-1}_{n})\) can be written as a random function of \(\tau ^{-1}_n\),
and \(\tau ^{-1}_n\) is a de-initialization of \((\mu _n,\tau ^{-1}_n)\). Further, by Proposition C.2,
To interpret this in another way, if \(\tau _n\) couples, then the distribution of \(\mu _n\) is the same for both chains, so \(\mu _n\) is automatically coupled. An alternative proof can be given using the results from Liu et al. (1994). \(\square \)
Proof of Lemma 4.16
By Lemma 4.15, \(\theta _{1,n}=X_nY_n\) and so by Corollary 4.6
\(\square \)
Proof of Lemma 4.17
To find M, K and show that the conditional density is continuous, we (a) show that \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{J-1}{2}, (\theta _1+1)S/2\right) \), which directly implies that the conditional distribution is continuous and \(M=1\), and (b) find the value of K.
(a) Calculate the conditional density of \(\theta _{2,n}\mid \theta _{1,n}\): For simplicity, we remove the subscript n on the random variables. Let X, Y be as described in Lemma 4.15. Since the random variables are independent, the joint density is the product of the densities.
Then \((\theta _1, \theta _2)=(XY,Y)\) is a transformation with Jacobian \(|J|=\theta _2^{-1}\), and the joint density is written as follows,
Next \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is proportional to \(f_{\theta _1,\theta _2}(\theta _1,\theta _2)\) and so we can derive the conditional density of \(\theta _2\) as follows,
This is proportional to an inverse gamma distribution and so, \(\theta _2\mid \theta _1 \sim \Gamma ^{-1}\left( \frac{J-1}{2}, (\theta _1+1)S/2\right) \). We know that the inverse gamma distribution is continuous and unimodal, so \(M=1\).
(b) Calculate the maximum value of \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\): As in Fig. 5 of Example 4.7, it can be shown from Eq. C10 that \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1)\) is maximized when \(\theta _1=0\), since the normalizing constant will then be the largest. To find the maximum of the conditional density, we evaluate \(f_{\theta _2\mid \theta _1}(\theta _2\mid \theta _1=0)\) at \(\theta _2= \frac{S}{J+1}\), the mode (see Section 5.3 of Hoff (2009)).
And so,
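Writing \(\alpha =\frac{J-1}{2}\) and \(\beta =\frac{S}{2}\), evaluating the \(\Gamma ^{-1}(\alpha ,\beta )\) density at its mode \(\theta _2=\frac{\beta }{\alpha +1}=\frac{S}{J+1}\) gives (a sketch; K is presumably this value)
\[
K = \frac{(\alpha +1)^{\alpha +1}e^{-(\alpha +1)}}{\beta \,\Gamma (\alpha )} = \frac{2}{S\,\Gamma \left( \frac{J-1}{2}\right) }\left( \frac{J+1}{2}\right) ^{\frac{J+1}{2}} e^{-\frac{J+1}{2}}.
\]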
\(\square \)
Proof of Lemma 3.5
By the defining property of the stationary distribution, if \(\sigma ^2_{n-1}\sim \pi \) then \(\sigma ^2_{n}\sim \pi \), and so the lemma follows from the following computation.
\(\square \)
Proof of 4.19
Let \(\lambda = 0.6583702\), \(h=-0.5248723\) and \(b=106.3874\), then
\(\square \)
1.4 Proof of Theorem 4.23
Proof of Theorem 4.23
This example uses a modified version of the Sideways Theorem 4.2 to find an upper bound on the convergence rate. We will also use Proposition 2.2, which states that the total variation between two random variables is equal to the total variation of any invertible transformation of the same two random variables.
Let \(\vec {X}_n, \vec {X}'_n \in {\mathbb {R}}^2\) be two copies of the autoregressive normal process as defined in Example 4.22. Then for \(\vec {Z}_n\sim N(\vec {0},I_d)\),
We apply the one-shot coupling method to bound the total variation distance. For \(n<N\) set \(\vec {Z}_n=\vec {Z} '_n\).
Suppose \(X_0, X'_0\) are known and define
Decompose \(A=P D P^{-1}\), where D is the corresponding diagonal matrix, \(\lambda _i\) is the ith eigenvalue of A, and \(\Vert \cdot \Vert _2\) denotes the Frobenius norm. Then \(\Delta \) is bounded above as follows,
For now assume that \(X_0, X'_0\) are known and note that \(\Sigma ^{-1}_d\) is an invertible transform. We bound the total variation distance as follows by applying two invertible transforms on the Markov chain and using the fact that \(\vec {Z}_{m}=\vec {Z}'_m, m < N\).
There exists a rotation matrix \(R\in {\mathbb {R}}^{d\times d}\) such that
(see Aggarwal 2020). By the properties of rotation matrices, R is orthogonal, so \(R^T =R^{-1}\) and \(RZ_n \sim N(0,RI_d R^T)=N(0,I_d)\). In other words, \(RZ_n \overset{d}{=} Z_n \overset{d}{=} Z'_n\). Thus, continuing the above equality,
Next, suppose that \(X_0, X'_0\) are unknown. Then, the inequality stated in Eq. 12 is shown as follows,
\(\square \)
Appendix D: Lemmas for ARCH process examples
1.1 Proof of lemmas used in Theorem 5.3
Proof of Lemma 5.4
Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the LARCH process. For fixed \(n\ge 1\), let \(Z_n=Z'_n\) and so,
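Assuming the LARCH recursion takes the form \(X_n=(\beta _0+\beta _1 X_{n-1})Z_n\) (consistent with the definition of \(Y_n\) in the proof of Lemma 5.5 below), a sketch of the contraction step is
\[
E[|X_{n}-X'_{n}|] = E\big [Z_{n}\,\beta _1|X_{n-1}-X'_{n-1}|\big ] = \beta _1 E[Z_0]\,E[|X_{n-1}-X'_{n-1}|],
\]
using the independence of \(Z_n\) from \((X_{n-1},X'_{n-1})\) and \(Z_n>0\) a.s.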
Since \(Z_n\overset{d}{=} Z_0>0\) a.s., the geometric convergence rate is \(D=\beta _1 E[Z_0]\). \(\square \)
Proof of Lemma 5.5
For a fixed \(n\ge 0\), suppose that \(Z_{n+1}, Z'_{n+1}\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation.
Note that \(Z_{n+1}\) and \(Z'_{n+1}\) are used interchangeably in the total variation distance since \(Z_{n+1}\overset{d}{=}Z'_{n+1}\). Let \(Y_{n}=\beta _0+\beta _1 X_{n}\), \(Y'_{n}=\beta _0+\beta _1 X'_{n}\), \(\Delta =Y'_{n}-Y_{n}\), and \(\Delta '=\frac{\Delta }{Y_{n}}\). WLOG \(Y'_{n}>Y_{n}\) so that \(\Delta , \Delta '>0\). Then,
The second last inequality is by Lemma B.3. See the proof of Lemma 4.3 for more details. The last inequality is by the Mean Value Theorem. \(\square \)
1.2 Proof of lemmas used in Theorem 5.8
Proof of Lemma 5.9
Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the asymmetric ARCH process.
For a fixed \(n\ge 1\), let \(Z_n=Z'_n\) and so,
Note that the derivative of \(f(x)=\sqrt{(ax+b)^2+c^2}\) is
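namely, by direct computation,
\[
f'(x) = \frac{a(ax+b)}{\sqrt{(ax+b)^2+c^2}}, \qquad \text{so that } |f'(x)|\le |a| \text{ for all } x,
\]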
and so,
Thus, the geometric convergence rate is \(D=|a| E[|Z_0|]\). \(\square \)
Proof of Lemma 5.10
Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the asymmetric ARCH process.
For \(n\ge 1\), \(Z_n, Z'_n\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation with respect to \(X_{n-1},X'_{n-1}, Z_n, Z'_n\).
Let \(Y_{n-1}=\sqrt{(aX_{n-1}+b)^2+c^2}\) and \(Y'_{n-1} =\sqrt{(aX'_{n-1}+b)^2+c^2}\), \(\Delta =Y'_{n-1}-Y_{n-1}\), and \(\Delta '=\frac{\Delta }{Y_{n-1}}\). WLOG, \(Y'_{n-1}<Y_{n-1}\), so \(-1< \Delta ' <0\), because \(Y_{n-1},Y'_{n-1}>0\) and
Let the density of \(Z_n\) be \(\pi _{Z_n}(x)\), then \(\pi _{(1+\Delta ')Z_n}(x)= \frac{1}{1+\Delta '}\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) \).
The second inequality is by assumption \(\pi _{Z_n}(x){\ge }\pi _{Z_n}\Big (\frac{x}{1+\Delta '}\Big )\).
\(\square \)
1.3 Proof of lemmas used in Theorem 5.13
Proof of Lemma 5.14
Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the GARCH process. For \(n\ge 2\), let \(Z_n=Z'_n\). First note that,
Next, we find an upper bound on \(E[|\sigma _n-\sigma '_n|]\) by first noting that \(\sigma ^2_n=\alpha ^2+(\beta ^2 Z^2_{n-1}+\gamma ^2)\sigma ^2_{n-1}\) by substitution.
The above inequality follows by bounding the derivative by its maximum, and the last equality is a result of Eq. D16. Finally, substituting \(E[|\sigma _n-\sigma '_n|]\) into Eq. D16,
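a sketch of this substitution, assuming \(X_n=\sigma _n Z_n\) and that Eq. D16 reads \(E[|X_n-X'_n|]=E[|Z_0|]\,E[|\sigma _n-\sigma '_n|]\):
\[
E[|X_n-X'_n|] = E[|Z_0|]\,E[|\sigma _n-\sigma '_n|] \le E[|Z_0|]\sqrt{\beta ^2E[Z_0^2]+\gamma ^2}\;E[|\sigma _{n-1}-\sigma '_{n-1}|] = \sqrt{\beta ^2E[Z_0^2]+\gamma ^2}\;E[|X_{n-1}-X'_{n-1}|].
\]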
Thus, the geometric convergence rate is \(D=\sqrt{\beta ^2 E[Z_0^2]+\gamma ^2}\). \(\square \)
Proof of Lemma 5.15
Let \(\{X_n\}_{n\ge 1}\in {\mathbb {R}}\) and \(\{X'_n\}_{n\ge 1}\in {\mathbb {R}}\) be two copies of the GARCH process.
For \(n\ge 2\), suppose that \(Z_n, Z'_n\) are independent. By Proposition 2.3, the total variation distance between the two processes is bounded above by the expectation of the total variation.
Let \(\Delta =\sigma '_{n}-\sigma _{n}\) and \(\Delta '=\frac{\Delta }{\sigma _{n}}\). WLOG, \(\sigma '_{n}<\sigma _{n}\), so \(\Delta , \Delta ' <0\) because \(\sigma _{n},\sigma '_{n}>0\) and
Let the density of \(Z_n\) be \(\pi _{Z_n}(x)\), then \(\pi _{(1+\Delta ')Z_n}(x)= \frac{1}{1+\Delta '}\pi _{Z_n}\left( \frac{x}{1+\Delta '}\right) \).
\(\square \)
Proof of Lemma 5.16
\(\square \)