Appendix 1. Proofs of theoretical results
1.1 Appendix 1.1: Bounding approximate to centralized HMC
We first bound the distance in probability for the chain governing the update for \(\hat{\varvec{\omega }}^t\) in (7) and for \(\tilde{\varvec{\omega }}^t\) in (6).
Note that by construction, it always holds that,
$$\begin{aligned} \begin{array}{l} \tilde{\varvec{g}}^t = {\textbf{1}}_m\otimes \sum _i \nabla U(\tilde{\varvec{\omega }};X_i,Y_i)={\bar{G}}(\tilde{\varvec{\omega }}),\text { and} \\ \tilde{\aleph }^t = {\textbf{1}}_m\otimes \sum _i ({\textbf{p}}_i^t)^T\left( \nabla ^2_{\varvec{\omega }^2} U(\tilde{\varvec{\omega }};X_i,Y_i)\right) ({\textbf{p}}_i^t) \end{array} \end{aligned}$$
(16)
Thus the only cause of a discrepancy between the chains for \(\tilde{\varvec{\omega }}^t\) and \(\hat{\varvec{\omega }}^t\) is the truncation of the potential at the second order to compute the acceptance probability. In particular we know that the error in this case is simply the Taylor expansion error, which is bounded by,
$$\begin{aligned} \left| \frac{\epsilon ^3}{6} \frac{\partial ^3 U}{\partial \varvec{\omega }^3} [\varvec{p}^t][\varvec{p}^t][\varvec{p}^t]\right| \le \frac{\epsilon ^3 L_3 \Vert \varvec{p}^t\Vert ^3}{6} \end{aligned}$$
(17)
where \(L_3\) is given in Assumption 3.1.
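As a quick numerical illustration of the remainder bound (17), the following minimal sketch checks the second-order truncation error in one dimension. The potential \(U=\sin\) is a hypothetical stand-in chosen only because its third derivative is bounded by \(L_3=1\); it is not the potential used in the paper.

```python
import math

def taylor2(U, dU, d2U, w, step):
    """Second-order Taylor approximation of U(w + step) around w."""
    return U(w) + dU(w) * step + 0.5 * d2U(w) * step ** 2

# Hypothetical 1-D potential (an assumption for illustration): |U'''| <= L3 = 1.
U = math.sin
dU = math.cos
d2U = lambda w: -math.sin(w)
L3 = 1.0

def remainder_and_bound(w, p, eps):
    """Return (actual truncation error, bound eps^3 * L3 * |p|^3 / 6)."""
    step = eps * p
    err = abs(U(w + step) - taylor2(U, dU, d2U, w, step))
    bound = eps ** 3 * L3 * abs(p) ** 3 / 6.0
    return err, bound

# Evaluate on a small grid of positions, momenta and step sizes.
checks = [remainder_and_bound(w, p, eps)
          for w in (-1.0, 0.3, 2.0)
          for p in (-1.5, 0.5, 2.0)
          for eps in (0.01, 0.1, 0.5)]
```

Each pair in `checks` holds the observed error and the corresponding right-hand side of (17), so the cubic dependence on \(\epsilon\) can be inspected directly.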
Thus the discrepancy between \(\hat{\varvec{\omega }}\) and \(\tilde{\varvec{\omega }}\) arises only when a proposal is accepted in one chain and rejected in the other. The probability of this event is bounded via (17), and the resulting error is bounded by the size of the step, namely \(\epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\tilde{\varvec{\omega }}^t))\) (with \({\bar{G}}(\hat{\varvec{\omega }}^t)\) in the other case). Let us now prove the main result of this section. Following the notation of Algorithm 1, we write \(\tilde{\omega }^*\) and \({\hat{\omega }}^*\) for the proposed parameter samples obtained from the Euler update of \(\tilde{\omega }^t\) and \({\hat{\omega }}^t\), respectively.
Proof of Theorem 3.1
For notational brevity, we let,
$$\begin{aligned} \begin{array}{l} {\mathcal {M}}^t:= {\mathcal {M}}(\tilde{\varvec{\omega }}^{*},\tilde{\varvec{\omega }}^{t},\tilde{\aleph }^{t+1},u^t),\text { and}\\ \hat{{\mathcal {M}}}^t:= \hat{{\mathcal {M}}}(\hat{\varvec{\omega }}^{*},\hat{\varvec{\omega }}^{t},u^t) \end{array} \end{aligned}$$
Performing the recursion, we have that,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\hat{\varvec{\omega }}^{t+1}\right\| \le {\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{t}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{t})\right] {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t}-\hat{\varvec{\omega }}^{t}\right\| \\ \qquad +{\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{*}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{*})\right] {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\tilde{\varvec{\omega }}^t))+\tilde{\varvec{\omega }}^t-\hat{\varvec{\omega }}^t-\epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\hat{\varvec{\omega }}^t))\right\| \\ \qquad +{\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{*}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{t})\right] {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\tilde{\varvec{\omega }}^t))+\tilde{\varvec{\omega }}^t-\hat{\varvec{\omega }}^t\right\| \\ \qquad +{\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{t}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{*})\right] {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\hat{\varvec{\omega }}^t))+\hat{\varvec{\omega }}^t-\tilde{\varvec{\omega }}^t\right\| \end{array} \end{aligned}$$
i.e., we partition the discrepancy according to whether each chain accepts or rejects its proposal, and weight the discrepancy associated with each combination by its probability. Using the triangle inequality and the fact that the four events are mutually exclusive and exhaustive, so that their probabilities sum to one, we have,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\hat{\varvec{\omega }}^{t+1}\right\| \le {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t}-\hat{\varvec{\omega }}^{t}\right\| \\ \qquad + {\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{*}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{*})\right] {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\tilde{\varvec{\omega }}^t))-\epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\hat{\varvec{\omega }}^t))\right\| \\ \qquad +{\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{*}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{t})\right] {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\tilde{\varvec{\omega }}^t))\right\| \\ \qquad +{\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{t}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{*})\right] {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\hat{\varvec{\omega }}^t))\right\| \end{array} \end{aligned}$$
(18)
Now, let us bound the difference in the acceptance probabilities for the two chains as follows,
$$\begin{aligned} \begin{array}{l} {\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{*}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{t})\right] + {\mathbb {P}}\left[ ({\mathcal {M}}^t = \tilde{\varvec{\omega }}^{t}) \cap (\hat{{\mathcal {M}}}^t = \hat{\varvec{\omega }}^{*})\right] \\ = \left| \exp \{-\epsilon ^2\varvec{\aleph }^t -\epsilon ^2\Vert \varvec{g}^t\Vert ^2\}-\exp \{-H(\hat{\varvec{\omega }}^*,\varvec{p}^*)+H(\hat{\varvec{\omega }}^t,\varvec{p}^t)\} \right| \\ = \exp \left\{ -H(\hat{\varvec{\omega }}^*,\varvec{p}^*)+H(\hat{\varvec{\omega }}^t,\varvec{p}^t)\right\} \left| \exp \left\{ H(\hat{\varvec{\omega }}^*,\varvec{p}^*)-H(\hat{\varvec{\omega }}^t,\varvec{p}^t)-\epsilon ^2\varvec{\aleph }^t -\epsilon ^2\Vert \varvec{g}^t\Vert ^2\right\} -1\right| \\ \le e \left| H(\hat{\varvec{\omega }}^*,\varvec{p}^*)-H(\hat{\varvec{\omega }}^t,\varvec{p}^t)-\epsilon ^2\varvec{\aleph }^t -\epsilon ^2\Vert \varvec{g}^t\Vert ^2\right| \\ \le e \left[ \epsilon ^2 \left| \Vert \varvec{g}^t\Vert ^2-\Vert \hat{\varvec{g}}^t\Vert ^2\right| +\epsilon ^3 L_3 \Vert \varvec{p}^t\Vert ^2\right] \\ \le e \left[ \epsilon ^2 (\Vert \varvec{g}^t\Vert -\Vert \hat{\varvec{g}}^t\Vert )(\Vert \varvec{g}^t\Vert +\Vert \hat{\varvec{g}}^t\Vert -\Vert \hat{\varvec{g}}^t\Vert +\Vert \hat{\varvec{g}}^t\Vert )+\epsilon ^3 L_3 \Vert \varvec{p}^t\Vert ^2\right] \\ \le e \left[ 2 \epsilon ^2 L_2 \Vert \hat{\varvec{\omega }}^t-\varvec{\omega }^t\Vert U+\epsilon ^3 L_3 M^{(2)}\right] \end{array} \end{aligned}$$
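The first equality above relies on both chains sharing the same uniform draw \(u^t\): with acceptance probabilities \(a\) and \(b\), exactly one chain accepts precisely when \(u^t\) falls between them, which happens with probability \(|a-b|\). The following minimal Monte Carlo sketch illustrates this; the values of `a` and `b` and the helper name `disagreement_rate` are hypothetical and chosen only for illustration.

```python
import random

def disagreement_rate(a, b, n=200_000, seed=0):
    """Estimate P[exactly one of two chains accepts] under a shared uniform u."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n):
        u = rng.random()          # the same u is used for both accept/reject tests
        accept_first = u < a      # first chain accepts with probability a
        accept_second = u < b     # second chain accepts with probability b
        if accept_first != accept_second:
            mismatches += 1
    return mismatches / n

# Hypothetical acceptance probabilities of the two chains.
a, b = 0.70, 0.55
estimate = disagreement_rate(a, b)
```

With independent uniforms the disagreement probability would instead be \(a(1-b)+b(1-a)\), which is larger; sharing \(u^t\) is what makes the difference of acceptance probabilities the relevant quantity.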
We also have the following bound on the expected norm of the parameter update,
$$\begin{aligned} \begin{array}{rl} {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\tilde{\varvec{\omega }}^t))\right\| &{}\le \epsilon {\mathbb {E}}\left\| \varvec{p}^t\right\| +\epsilon ^2{\mathbb {E}}\left\| {\bar{G}}(\tilde{\varvec{\omega }}^t)-{\bar{G}}(\hat{\varvec{\omega }}^t)\right\| +\epsilon ^2{\mathbb {E}}\left\| {\bar{G}}(\hat{\varvec{\omega }}^t)\right\| \\ {} &{}\le \epsilon M^{(1)}+\epsilon ^2L_2 {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^t-\hat{\varvec{\omega }}^t\right\| +\epsilon ^2U_g \end{array} \end{aligned}$$
In the case of dual acceptance,
$$\begin{aligned} {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\tilde{\varvec{\omega }}^t))-\epsilon (\varvec{p}^t+\epsilon {\bar{G}}(\hat{\varvec{\omega }}^t))\right\| \le \epsilon ^2 L_2 {\mathbb {E}}\Vert \tilde{\varvec{\omega }}^t-\hat{\varvec{\omega }}^t\Vert \end{aligned}$$
Combining these last three bounds with (18), we finally have
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\hat{\varvec{\omega }}^{t+1}\right\| \le \left[ 1+\epsilon ^2 L_2+2e\epsilon ^3L_2 (M^{(1)}+\epsilon U_g)+e\epsilon ^5 L_2L_3 M^{(2)}\right] {\mathbb {E}} \Vert \tilde{\varvec{\omega }}^t-\hat{\varvec{\omega }}^t\Vert \\ \qquad + 2e\epsilon ^4 L_2^2 ({\mathbb {E}} \Vert \tilde{\varvec{\omega }}^t-\hat{\varvec{\omega }}^t\Vert )^2 + e\epsilon ^4 L_3 M^{(2)}(M^{(1)}+\epsilon U_g) \end{array} \end{aligned}$$
Thus, with \(\tilde{\varvec{\omega }}^0=\hat{\varvec{\omega }}^0\) we have that, for any \(\epsilon\), if \(T(\epsilon )\in {\mathbb {N}}\) is sufficiently small such that,
$$\begin{aligned} \begin{array}{l} A(\epsilon )^{T(\epsilon )} B(\epsilon ) \le 1,\\ A(\epsilon ):= 1+\epsilon ^2 L_2+2e\epsilon ^3L_2 (M^{(1)}+\epsilon U_g)+e\epsilon ^5 L_2L_3 M^{(2)}+2e\epsilon ^4 L_2^2,\\ B(\epsilon ):= e\epsilon ^4 L_3 M^{(2)}(M^{(1)}+\epsilon U_g) \end{array} \end{aligned}$$
we have that for \(t\le T(\epsilon )\),
$$\begin{aligned} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\hat{\varvec{\omega }}^{t+1}\right\| \le \sum \limits _{s=0}^t A(\epsilon )^s B(\epsilon ) \end{aligned}$$
(19)
\(\square\)
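The induction behind (19) can be illustrated numerically. The sketch below iterates the worst case of the one-step inequality (with equality, including the quadratic term) from \(\tilde{\varvec{\omega }}^0=\hat{\varvec{\omega }}^0\) and compares it with the geometric sum \(\sum _{s=0}^t A(\epsilon )^s B(\epsilon )\). All constant values are hypothetical, and the comparison is only run while the geometric sum stays below one, which is the regime in which the quadratic term can be absorbed into \(A(\epsilon )\).

```python
import math

def constants(eps, L2, L3, M1, M2, Ug):
    """Coefficients of the one-step recursion d' <= A0*d + C*d^2 + B."""
    e = math.e
    A0 = (1 + eps**2 * L2 + 2 * e * eps**3 * L2 * (M1 + eps * Ug)
          + e * eps**5 * L2 * L3 * M2)
    C = 2 * e * eps**4 * L2**2            # coefficient of the quadratic term
    B = e * eps**4 * L3 * M2 * (M1 + eps * Ug)
    return A0, C, B

def compare(eps, L2, L3, M1, M2, Ug, steps):
    """Return rows (t, worst-case error at t+1, sum_{s=0}^t A^s B)."""
    A0, C, B = constants(eps, L2, L3, M1, M2, Ug)
    A = A0 + C                            # A(eps) as defined before (19)
    d, bound, rows = 0.0, 0.0, []
    for t in range(steps):
        d = A0 * d + C * d * d + B        # recursion taken with equality
        bound = A * bound + B             # equals sum_{s=0}^t A^s B
        if bound > 1.0:                   # stop once the bound leaves [0, 1]
            break
        rows.append((t, d, bound))
    return rows

# Hypothetical constants, chosen only to make the example concrete.
rows = compare(eps=0.1, L2=1.0, L3=1.0, M1=1.0, M2=1.0, Ug=1.0, steps=200)
```

As long as the bound is at most one, the error \(d\) satisfies \(d^2\le d\), so the quadratic term is dominated by the extra \(2e\epsilon ^4L_2^2\) in \(A(\epsilon )\) and each row has `d <= bound`.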
1.2 Appendix 1.2: Consensus between decentralized and averaged HMC
Now we relate the process as generated by Algorithm 1 to the average dynamics as given by (5).
Proof of Theorem 3.2. Consider the recursion in expected L2 error.
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\Vert \bar{\varvec{\omega }}^{t+1}-\varvec{\omega }^{t+1}\Vert + {\mathbb {E}}\Vert \bar{\varvec{g}}^{t+1}-\varvec{g}^{t+1}\Vert + {\mathbb {E}}\Vert {{\bar{\aleph }}}^{t+1}-\aleph ^{t+1}\Vert \\ \le {\mathbb {E}}\left\| {\textbf{W}}{\mathcal {M}}({\varvec{\omega }}^t-\epsilon (\varvec{p}^t+\epsilon \varvec{g}^{t+1}),\varvec{\omega }^t,\aleph ^{t+1})-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T){\mathcal {M}}({\varvec{ \omega }}^t-\epsilon (\varvec{p}^t+\epsilon {\varvec{ g}}^{t+1}),{\varvec{\omega }}^t,\aleph ^{t+1})\right\| \\ \quad + {\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \left( \bar{\varvec{g}}^t-\varvec{g}^t\right) \right\| \\ \quad +{\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \left[ G( \varvec{\omega }^t)-G(\bar{\varvec{\omega }}^t)-G( \varvec{\omega }^{t-1})+G(\bar{\varvec{\omega }}^{t-1})\right] \right\| \\ \quad + {\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \left( {{\bar{\aleph }}}^t-\aleph ^t\right) \right\| \\ \quad +{\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \left[ H(\varvec{\omega }^t)-H(\bar{\varvec{\omega }}^t)-H(\varvec{\omega }^{t-1})+H(\bar{\varvec{\omega }}^{t-1})\right] \right\| \\ \le {\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \left( \bar{\varvec{g}}^t-\varvec{g}^t\right) \right\| + \epsilon ^2\Vert {\textbf{g}}^{t+1}-\bar{{\textbf{g}}}^{t+1}\Vert \\ \qquad +\epsilon {\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \varvec{p}^t\right\| \\ \quad +(1+L_2+L_3){\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \left( \bar{\varvec{\omega }}^t-\varvec{\omega }^t\right) \right\| \\ \quad 
+(L_2+L_3){\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \left( \bar{\varvec{\omega }}^{t-1}-\varvec{\omega }^{t-1}\right) \right\| \\ \quad + {\mathbb {E}}\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \left( {{\bar{\aleph }}}^t-\aleph ^t\right) \right\| \\ \le 2\beta {\mathbb {E}}\left\| \bar{\varvec{g}}^t-\varvec{g}^t\right\| +2\epsilon \beta {\mathbb {E}}\left\| \varvec{p}^t\right\| \\ \quad +2(1+L_2+L_3)\beta {\mathbb {E}}\left\| \bar{\varvec{\omega }}^t-\varvec{\omega }^t\right\| +2(L_2+L_3)\beta {\mathbb {E}}\left\| \left( \bar{\varvec{\omega }}^{t-1}-\varvec{\omega }^{t-1}\right) \right\| \\ \quad + 2\beta {\mathbb {E}}\left\| {{\bar{\aleph }}}^t-\aleph ^t\right\| \end{array} \end{aligned}$$
(20)
where we have used that \({\textbf{W}} \bar{\varvec{g}}^t = \frac{1}{m}\left( \varvec{I}\otimes \varvec{1} \varvec{1}^T\right) \varvec{g}^t= \frac{1}{m}\left( \varvec{I}\otimes \varvec{1} \varvec{1}^T\right) \bar{\varvec{g}}^t\), and the analogous identities for the other quantities, throughout, together with the bound \(\left\| \left( {\textbf{W}}-\frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\right) \right\| \le \beta\) (see, e.g., Di Lorenzo & Scutari (2016, Lemma 6)). In the last inequality we subtracted \(\epsilon ^2\Vert {\textbf{g}}^{t+1}-\bar{{\textbf{g}}}^{t+1}\Vert\) from both sides and lower bounded the left-hand side by half of its original value.
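To make the role of \(\beta\) concrete, the following minimal sketch applies one mixing step on a ring of \(m=5\) agents with scalar parameters and checks that the deviation from the network average contracts by at least the factor \(\beta\). The ring topology, the equal weights of \(1/3\), and the vector `x` are assumptions made purely for illustration; for this circulant matrix the eigenvalues are known in closed form, which is how `beta` is computed.

```python
import math

def ring_mixing_matrix(m):
    """Symmetric doubly stochastic W: each agent averages itself and its two ring neighbours."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] += 1.0 / 3.0
    return W

def matvec(W, x):
    """Plain matrix-vector product W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def deviation_norm(x):
    """Euclidean norm of x minus its average, i.e. the norm of (I - (1/m) 1 1^T) x."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((xi - mean) ** 2 for xi in x))

m = 5
W = ring_mixing_matrix(m)
# Eigenvalues of this circulant W are 1/3 + (2/3) cos(2 pi k / m), k = 0..m-1;
# beta is the largest magnitude among the non-consensus modes k != 0.
beta = max(abs(1 / 3 + 2 / 3 * math.cos(2 * math.pi * k / m)) for k in range(1, m))

x = [4.0, -1.0, 0.5, 2.0, -3.0]           # hypothetical local parameter values
before = deviation_norm(x)
after = deviation_norm(matvec(W, x))       # mixing preserves the mean, so this is ||(W - J) x||
```

Because \({\textbf{W}}\) preserves the average, `after` is the norm of \(({\textbf{W}}-\frac{1}{m}\varvec{1}\varvec{1}^T)x\), and it is at most `beta * before`.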
Now the recursion implies, using induction on the iterates and provided that \(2\beta (3+2L_2+2L_3)<1\) so that the geometric series converges, that
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\Vert \bar{\varvec{\omega }}^{t+1}-\varvec{\omega }^{t+1}\Vert + {\mathbb {E}}\Vert \bar{\varvec{g}}^{t+1}-\varvec{g}^{t+1}\Vert + {\mathbb {E}}\Vert {{\bar{\aleph }}}^{t+1}-\aleph ^{t+1}\Vert \\ \le \sum \limits _{s=0}^t \left( 2\beta (3+2L_2+2 L_3)\right) ^s\epsilon M^{(1)} \le \frac{\epsilon M^{(1)}}{1-2\beta (3+2L_2+2 L_3)} \end{array} \end{aligned}$$
\(\square\)
1.3 Appendix 1.3: Bounding averaged to approximate HMC
Proof of Theorem 5.3. We have,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{g}}^{t+1}-\bar{\varvec{g}}^{t+1}\right\| \le \\ {\mathbb {E}}\left\| \frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\left[ \tilde{\varvec{g}}^{t}-\bar{\varvec{g}}^t+G(\tilde{\varvec{\omega }}^t)-G(\bar{\varvec{\omega }}^t)+G(\bar{\varvec{\omega }}^t)-G({\varvec{\omega }}^t)\right. \right. \\ \qquad \left. \left. +G(\tilde{\varvec{\omega }}^{t-1})-G(\bar{\varvec{\omega }}^{t-1})+G(\bar{\varvec{\omega }}^{t-1})-G({\varvec{\omega }}^{t-1})\right] \right\| \\ \le {\mathbb {E}}\left\| \tilde{\varvec{g}}^{t}-\bar{\varvec{g}}^t\right\| + L_2 {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t}-\bar{\varvec{\omega }}^t\right\| +L_2{\mathbb {E}} \left\| {\varvec{\omega }}^{t}-\bar{\varvec{\omega }}^t\right\| \\ \qquad + L_2 {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t-1}-\bar{\varvec{\omega }}^{t-1}\right\| +L_2 {\mathbb {E}}\left\| {\varvec{\omega }}^{t-1}-\bar{\varvec{\omega }}^{t-1}\right\| \end{array} \end{aligned}$$
(21)
By the same argument we have,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{{\aleph }}^{t+1}-\bar{{\aleph }}^{t+1}\right\| \le {\mathbb {E}}\left\| \tilde{{\aleph }}^{t}-\bar{{\aleph }}^t\right\| + L_2{\mathbb {E}} \left\| \tilde{\varvec{\omega }}^{t}-\bar{\varvec{\omega }}^t\right\| +L_2 {\mathbb {E}}\left\| {\varvec{\omega }}^{t}-\bar{\varvec{\omega }}^t\right\| \\ \qquad + L_2{\mathbb {E}} \left\| \tilde{\varvec{\omega }}^{t-1}-\bar{\varvec{\omega }}^{t-1}\right\| +L_2 {\mathbb {E}} \left\| {\varvec{\omega }}^{t-1}-\bar{\varvec{\omega }}^{t-1}\right\| \end{array} \end{aligned}$$
(22)
Now we bound the difference in the parameters. In the first inequality below, we split the difference into the contribution of the old parameter values, the contribution of the update to the old parameter values, and the two cases in which one proposed parameter is accepted and the other is not.
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\bar{\varvec{\omega }}^{t+1}\right\| \le \\ {\mathbb {E}}\left\| \frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)\left[ {\mathcal {M}}(\varvec{\omega }^t-\epsilon (\varvec{p}^t+\epsilon \varvec{g}^{t+1}),\varvec{\omega }^t,\aleph ^{t+1},u^t) - {\mathcal {M}}(\tilde{\varvec{\omega }}^t-\epsilon (\varvec{p}^t+\epsilon \tilde{\varvec{g}}^{t+1}),\tilde{\varvec{\omega }}^t,{\tilde{\aleph }}^{t+1},u^t)\right] \right\| \\ \le {\mathbb {E}}\left\| \frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)(\varvec{\omega }^t-\tilde{\varvec{\omega }}^t)\right\| + \epsilon ^2{\mathbb {E}}\left\| \frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)(\varvec{g}^{t+1}-\tilde{\varvec{g}}^{t+1})\right\| \\ \quad +{\mathbb {P}}\left[ ({\mathcal {M}}(\varvec{\omega }^t-\epsilon (\varvec{p}^t+\epsilon \varvec{g}^{t+1}),\varvec{\omega }^t,\aleph ^{t+1},u^t) = \varvec{\omega }^t+\epsilon (\varvec{p}^t+\epsilon \varvec{g}^{t+1})) \right. \\ \qquad \left. \cap ({\mathcal {M}}(\tilde{\varvec{\omega }}^t-\epsilon (\varvec{p}^t+\epsilon \tilde{\varvec{g}}^{t+1}),\tilde{\varvec{\omega }}^t,{\tilde{\aleph }}^{t+1},u^t) = \tilde{\varvec{\omega }}^{t})\right] \\ \qquad \times {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon (\varvec{g}^{t+1}-\bar{\varvec{g}}^{t+1}+\bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}+\tilde{\varvec{g}}^{t+1}-\hat{\varvec{g}}^{t+1}+\hat{\varvec{g}}^{t+1}))\right\| \\ \qquad +{\mathbb {P}}\left[ ({\mathcal {M}}(\varvec{\omega }^t-\epsilon (\varvec{p}^t+\epsilon \varvec{g}^{t+1}),\varvec{\omega }^t,\aleph ^{t+1},u^t) = \varvec{\omega }^t) \right. \\ \qquad \left. \cap ({\mathcal {M}}(\tilde{\varvec{\omega }}^t-\epsilon (\varvec{p}^t+\epsilon \tilde{\varvec{g}}^{t+1}),\tilde{\varvec{\omega }}^t,{\tilde{\aleph }}^{t+1},u^t) = \tilde{\varvec{\omega }}^{t}+\epsilon (\varvec{p}^t+\epsilon \tilde{\varvec{g}}^{t+1}))\right] \\ \qquad \times {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon \tilde{\varvec{g}}^{t+1})\right\| \end{array}\end{aligned}$$
(23)
Clearly,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)(\varvec{\omega }^t-\tilde{\varvec{\omega }}^t)\right\| + \epsilon ^2{\mathbb {E}}\left\| \frac{1}{m}(\varvec{I}\otimes \varvec{1} \varvec{1}^T)(\varvec{g}^{t+1}-\tilde{\varvec{g}}^{t+1})\right\| \\ \qquad \le {\mathbb {E}}\left\| \bar{\varvec{\omega }}^t-\tilde{\varvec{\omega }}^t\right\| +\epsilon ^2{\mathbb {E}}\left\| \bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}\right\| \end{array}\end{aligned}$$
Now we must bound the discrepancy in the probability of acceptance. We proceed as in the proof of Theorem 3.1. We use \({\mathcal {M}}^t\) and \(\tilde{{\mathcal {M}}}^t\) as shorthand for \({\mathcal {M}}(\varvec{\omega }^t-\epsilon (\varvec{p}^t+\epsilon \varvec{g}^{t+1}),\varvec{\omega }^t,\aleph ^{t+1},u^t)\) and \({\mathcal {M}}(\tilde{\varvec{\omega }}^t-\epsilon (\varvec{p}^t+\epsilon \tilde{\varvec{g}}^{t+1}),\tilde{\varvec{\omega }}^t,{\tilde{\aleph }}^{t+1},u^t)\), respectively. We compute the bounds,
$$\begin{aligned} \begin{array}{l} {\mathbb {P}}\left[ ({\mathcal {M}}^t = \varvec{\omega }^{*}) \cap (\tilde{{\mathcal {M}}}^t = \tilde{\varvec{\omega }}^{t})\right] + {\mathbb {P}}\left[ ({\mathcal {M}}^t = {\varvec{\omega }}^{t}) \cap (\tilde{{\mathcal {M}}}^t = \tilde{\varvec{\omega }}^{*})\right] \\ = \left| \exp \{-\epsilon ^2\varvec{\aleph }^t -\epsilon ^2\Vert \varvec{g}^t\Vert ^2\}-\exp \{-\epsilon ^2\tilde{\varvec{\aleph }}^t -\epsilon ^2\Vert \tilde{\varvec{g}}^t\Vert ^2\} \right| \\ = \exp \left\{ -\epsilon ^2\varvec{\aleph }^t -\epsilon ^2\Vert \varvec{g}^t\Vert ^2\right\} \left| \exp \left\{ \epsilon ^2\varvec{\aleph }^t +\epsilon ^2\Vert \varvec{g}^t\Vert ^2-\epsilon ^2\tilde{\varvec{\aleph }}^t -\epsilon ^2\Vert \tilde{\varvec{g}}^t\Vert ^2\right\} -1\right| \\ \le e \epsilon ^2\left| \varvec{\aleph }^t +\Vert \varvec{g}^t\Vert ^2-\tilde{\varvec{\aleph }}^t -\Vert \tilde{\varvec{g}}^t\Vert ^2\right| \\ \le e \epsilon ^2 \left[ \left| \Vert \varvec{g}^t\Vert ^2-\Vert \tilde{\varvec{g}}^t\Vert ^2\right| +\Vert \varvec{\aleph }^t-\tilde{\varvec{\aleph }}^t\Vert \right] \\ \le e \epsilon ^2\left[ (\varvec{g}^t-\tilde{\varvec{g}}^t)^T(\varvec{g}^t+\tilde{\varvec{g}}^t-\tilde{\varvec{g}}^t+\tilde{\varvec{g}}^t-2\hat{\varvec{g}}^t+2\hat{\varvec{g}}^t)+\Vert \varvec{\aleph }^t-\tilde{\varvec{\aleph }}^t\Vert \right] \\ \le e \epsilon ^2\left[ \Vert \varvec{g}^t-\tilde{\varvec{g}}^t\Vert ^2+\Vert \varvec{g}^t-\tilde{\varvec{g}}^t\Vert (\Vert \tilde{\varvec{g}}^t-\hat{\varvec{g}}^t\Vert +2U_g)+\Vert \varvec{\aleph }^t-\tilde{\varvec{\aleph }}^t\Vert \right] \end{array} \end{aligned}$$
Next, we see that we have already prepared the expression to bound the following term,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon (\varvec{g}^{t+1}-\bar{\varvec{g}}^{t+1}+\bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}+\tilde{\varvec{g}}^{t+1}-\hat{\varvec{g}}^{t+1}+\hat{\varvec{g}}^{t+1}))\right\| \\ \qquad \le \epsilon \left( M^{(1)}+\epsilon \left( {\mathbb {E}}\left\| \varvec{g}^{t+1}-\bar{\varvec{g}}^{t+1}\right\| +{\mathbb {E}}\left\| \bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}\right\| +{\mathbb {E}}\left\| \tilde{\varvec{g}}^{t+1}-\hat{\varvec{g}}^{t+1}\right\| +U_g\right) \right) \end{array} \end{aligned}$$
We see that in the parentheses the terms \({\mathbb {E}}\left\| \varvec{g}^{t+1}-\bar{\varvec{g}}^{t+1}\right\|\) and \({\mathbb {E}}\left\| \tilde{\varvec{g}}^{t+1}-\hat{\varvec{g}}^{t+1}\right\|\) already have a bound due to the previous two Theorems. Next,
$$\begin{aligned} {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon \tilde{\varvec{g}}^{t+1})\right\| = {\mathbb {E}}\left\| \epsilon (\varvec{p}^t+\epsilon (\tilde{\varvec{g}}^{t+1}-\hat{\varvec{g}}^{t+1}+\hat{\varvec{g}}^{t+1}))\right\| \le \epsilon M^{(1)}+\epsilon ^2 U_g+\epsilon ^2 {\mathbb {E}}\Vert \tilde{\varvec{g}}^{t+1}-\hat{\varvec{g}}^{t+1}\Vert \end{aligned}$$
Putting all of these expressions together we finally get,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\bar{\varvec{\omega }}^{t+1}\right\| \\ \quad \le {\mathbb {E}}\left\| \bar{\varvec{\omega }}^t-\tilde{\varvec{\omega }}^t\right\| +\epsilon ^2{\mathbb {E}}\left\| \bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}\right\| \\ \qquad + e \epsilon ^3\left[ {\mathbb {E}}\Vert \varvec{g}^t-\tilde{\varvec{g}}^t\Vert ^2+{\mathbb {E}}\Vert \varvec{g}^t-\tilde{\varvec{g}}^t\Vert (K_i(\epsilon ,t)+2U_g)+K_c\right] \\ \qquad \times \left[ \left( M^{(1)}+\epsilon \left( K_c+{\mathbb {E}}\left\| \bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}\right\| +K_i(\epsilon ,t+1)+U_g\right) \right) +M^{(1)}+\epsilon U_g+\epsilon K_i(\epsilon ,t+1)\right] \end{array} \end{aligned}$$
Now suppose that there exists a \(T_2(\epsilon )\) such that for \(t\le T_2(\epsilon )\), it holds that \(\max \left\{ {\mathbb {E}}\left\| \bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}\right\| ,{\mathbb {E}}\Vert \varvec{g}^t-\tilde{\varvec{g}}^t\Vert \right\} \le 1\). It is clear that such a \(T_2(\epsilon )\) exists for sufficiently small \(\epsilon\), since the condition holds trivially for \(t=0\). We derive the exact requirement on \(T_2(\epsilon )\) in the sequel. For \(t\le T_2(\epsilon )\) we then have,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\bar{\varvec{\omega }}^{t+1}\right\| \\ \quad \le {\mathbb {E}}\left\| \bar{\varvec{\omega }}^t-\tilde{\varvec{\omega }}^t\right\| +\epsilon ^2{\mathbb {E}}\left\| \bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}\right\| \\ \qquad + e \epsilon ^3\left[ {\mathbb {E}}\Vert \bar{\varvec{g}}^t-\tilde{\varvec{g}}^t\Vert +K_c+K_i(\epsilon ,t)+2U_g+K_c\right] \\ \qquad \quad \times \left[ \left( M^{(1)}+\epsilon \left( K_c+1+K_i(\epsilon ,t+1)+U_g\right) \right) +M^{(1)}+\epsilon U_g+\epsilon K_i(\epsilon ,t+1)\right] \end{array} \end{aligned}$$
(24)
Finally, combining (21) with (22) and (24) we obtain,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\bar{\varvec{\omega }}^{t+1}\right\| +{\mathbb {E}}\left\| \tilde{\varvec{g}}^{t+1}-\bar{\varvec{g}}^{t+1}\right\| +{\mathbb {E}}\left\| \tilde{{\aleph }}^{t+1}-\bar{{\aleph }}^{t+1}\right\| \\ \quad \le {\mathbb {E}}\left\| \tilde{\varvec{g}}^{t}-\bar{\varvec{g}}^t\right\| + L_2 {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t}-\bar{\varvec{\omega }}^t\right\| +L_2K_c \\ \qquad \qquad \qquad \qquad + L_2 {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t-1}-\bar{\varvec{\omega }}^{t-1}\right\| +L_2 K_c \\ \qquad +{\mathbb {E}}\left\| \tilde{{\aleph }}^{t}-\bar{{\aleph }}^t\right\| + L_2{\mathbb {E}} \left\| \tilde{\varvec{\omega }}^{t}-\bar{\varvec{\omega }}^t\right\| +L_2 K_c \\ \qquad \qquad \qquad + L_2{\mathbb {E}} \left\| \tilde{\varvec{\omega }}^{t-1}-\bar{\varvec{\omega }}^{t-1}\right\| +L_2 K_c \\ \qquad +{\mathbb {E}}\left\| \bar{\varvec{\omega }}^t-\tilde{\varvec{\omega }}^t\right\| +\epsilon ^2{\mathbb {E}}\left\| \bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}\right\| \\ \qquad + e \epsilon ^3\left[ {\mathbb {E}}\Vert \bar{\varvec{g}}^t-\tilde{\varvec{g}}^t\Vert +K_c+K_i(\epsilon ,t)+2U_g+K_c\right] \\ \qquad \quad \times \left[ \left( M^{(1)}+\epsilon \left( K_c+1+K_i(\epsilon ,t+1)+U_g\right) \right) +M^{(1)}+\epsilon U_g+\epsilon K_i(\epsilon ,t+1)\right] \end{array} \end{aligned}$$
(25)
Let \({\bar{K}}(\epsilon ,t)=\left( M^{(1)}+\epsilon \left( K_c+1+K_i(\epsilon ,t+1)+U_g\right) \right) +M^{(1)}+\epsilon U_g+\epsilon K_i(\epsilon ,t+1)\). Next, note that the inequality implies a monotonically increasing bound, so we can apply \(\left\| {\varvec{\omega }}^{t-1}-\bar{\varvec{\omega }}^{t-1}\right\| \le \left\| {\varvec{\omega }}^{t}-\bar{\varvec{\omega }}^{t}\right\|\). Then, subtracting \(\epsilon ^2{\mathbb {E}}\left\| \bar{\varvec{g}}^{t+1}-\tilde{\varvec{g}}^{t+1}\right\|\) from both sides and dividing by \(1-\epsilon ^2\) we get,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t+1}-\bar{\varvec{\omega }}^{t+1}\right\| +{\mathbb {E}}\left\| \tilde{\varvec{g}}^{t+1}-\bar{\varvec{g}}^{t+1}\right\| +{\mathbb {E}}\left\| \tilde{{\aleph }}^{t+1}-\bar{{\aleph }}^{t+1}\right\| \\ \quad \le (1-\epsilon ^2)^{-1}\left[ (1+4L_2){\mathbb {E}}\left\| \tilde{\varvec{\omega }}^{t}-\bar{\varvec{\omega }}^{t}\right\| +(1+e\epsilon ^3 {\bar{K}}(\epsilon ,t)){\mathbb {E}}\left\| \tilde{\varvec{g}}^{t}-\bar{\varvec{g}}^{t}\right\| \right. \\ \qquad +\left. {\mathbb {E}}\left\| \tilde{{\aleph }}^{t}-\bar{{\aleph }}^{t}\right\| +3L_2 K_c+{\bar{K}}(\epsilon ,t)+e\epsilon ^3(3K_c+K_i(\epsilon ,t)+2U_g) \right] \end{array} \end{aligned}$$
Now, defining,
$$\begin{aligned} \begin{array}{l} A_2(\epsilon ,t):= (1-\epsilon ^2)^{-1}\left[ 1+4L_2+e\epsilon ^3 {\bar{K}}(\epsilon ,t)\right] \\ B_2(\epsilon ,t):= (1-\epsilon ^2)^{-1}\left[ 3L_2 K_c+{\bar{K}}(\epsilon ,t)+e\epsilon ^3(3K_c+K_i(\epsilon ,t)+2U_g)\right] \\ T_2(\epsilon ):= \max \left\{ T\in {\mathbb {N}}:\, \sum \limits _{t=0}^T \prod _{s=t+1}^T A_2(\epsilon ,s) B_2(\epsilon ,t) \le 1\right\} \end{array} \end{aligned}$$
We obtain the main result. \(\square\)
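The horizon \(T_2(\epsilon )\) defined above can be evaluated with a simple running recursion, since the sum \(S_T=\sum _{t=0}^T \prod _{s=t+1}^T A_2(\epsilon ,s) B_2(\epsilon ,t)\) satisfies \(S_T=A_2(\epsilon ,T)S_{T-1}+B_2(\epsilon ,T)\). The sketch below assumes \(A_2\ge 1\) and \(B_2\ge 0\), so that \(S_T\) is non-decreasing and the search can stop at the first violation. The function name `horizon` and the constant coefficients in the example are hypothetical and are not values derived from the paper.

```python
def horizon(A2, B2, t_max=10_000):
    """Largest T <= t_max with S_T <= 1, or None if even S_0 = B2(0) exceeds 1.

    A2 and B2 are callables of the time index; the step size eps is assumed
    to be fixed inside them. Assumes A2 >= 1 and B2 >= 0 so S_T is monotone.
    """
    total = 0.0
    best = None
    for T in range(t_max + 1):
        total = A2(T) * total + B2(T)   # S_T = A2(T) * S_{T-1} + B2(T)
        if total > 1.0:
            break
        best = T
    return best

# Hypothetical constant coefficients: here S_T = 1.1**(T + 1) - 1.
T2 = horizon(lambda t: 1.1, lambda t: 0.1)
```

In this toy case the condition \(1.1^{T+1}-1\le 1\) holds up to \(T=6\), so `T2` is 6. If \(B_2(\epsilon ,0)\) already exceeds one the set in the definition is empty and the function returns `None`.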
1.4 Appendix 1.4: Coupling and contraction outside a finite ball
In this Section we prove Theorem 3.4.
Proof
Let \({\mathcal {A}}_{\omega }\) and \({\mathcal {A}}_{\nu }\) be the acceptance events corresponding to the updates of \(\varvec{\omega }\) and \(\varvec{\nu }\), respectively. Noting that the error \(q\) is unknown and cannot be coupled in the two chains, we compute,
$$\begin{aligned} \begin{array}{l} \Vert \bar{{\mathcal {M}}}(\bar{\varvec{\omega }}+\epsilon (\bar{\varvec{p}}+\epsilon \nabla U({\bar{\omega }})),\bar{\varvec{\omega }},u,q_{\omega }) - \bar{{\mathcal {M}}}(\bar{\varvec{\nu }}+\epsilon (\bar{\varvec{p}}+\epsilon \nabla U({\bar{\nu }})),\bar{\varvec{\nu }},u, q_{\nu })\Vert ^2 \\ \le {\textbf{1}}_{{\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }} \Vert \bar{\varvec{\omega }}-\epsilon ^2 \nabla U(\bar{\varvec{\omega }})-\epsilon ^2 q_{\omega }- \bar{\varvec{\nu }}+\epsilon ^2 \nabla U(\bar{\varvec{\nu }})+\epsilon ^2 q_{\nu }\Vert ^2 \\ \qquad + {\textbf{1}}_{{\mathcal {A}}^c_{\omega }\cup {\mathcal {A}}^c_{\nu }}\left( \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ^2+ \max \left\{ \Vert \epsilon (\bar{\varvec{p}}+\epsilon \nabla U(\bar{\varvec{\omega }})+\epsilon q_{\omega })\Vert ^2,\Vert \epsilon (\bar{\varvec{p}}+\epsilon \nabla U(\bar{\varvec{\nu }})+\epsilon q_{\nu })\Vert ^2\right\} \right) \end{array} \end{aligned}$$
From Lemma 3.1 we recall that,
$$\begin{aligned} \Vert \bar{\varvec{\omega }}-\epsilon ^2 \nabla U(\bar{\varvec{\omega }})-\epsilon ^2 q_{\omega }- \bar{\varvec{\nu }}+\epsilon ^2 \nabla U(\bar{\varvec{\nu }})+\epsilon ^2 q_{\nu }\Vert ^2 \le (1-\epsilon ^2 K/4)\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ^2 \end{aligned}$$
On the other hand, by Assumption,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \max \left\{ \Vert \epsilon (\bar{\varvec{p}}+\epsilon \nabla U(\bar{\varvec{\omega }})+\epsilon q_{\omega })\Vert ^2,\Vert \epsilon (\bar{\varvec{p}}+\epsilon \nabla U(\bar{\varvec{\nu }})+\epsilon q_{\nu })\Vert ^2\right\} \vert {\mathcal {A}}_{\omega }^c\cup {\mathcal {A}}^c_{\nu }\right] \\ \le \epsilon ^2 {\mathbb {E}}[\Vert \bar{\varvec{p}}\Vert ^2\vert {\mathcal {A}}_{\omega }^c\cup {\mathcal {A}}^c_{\nu }] +\epsilon ^4 L_2 R_2+\epsilon ^4 K_T(\epsilon ) \end{array} \end{aligned}$$
but
$$\begin{aligned} {\mathbb {E}}[\Vert \bar{\varvec{p}}\Vert ^2\vert {\mathcal {A}}_{\omega }^c\cup {\mathcal {A}}^c_{\nu }]{\mathbb {P}}[{\mathcal {A}}_{\omega }^c\cup {\mathcal {A}}^c_{\nu }]\le {\mathbb {E}}\Vert \bar{\varvec{p}}\Vert ^2 =M^{(2)} \end{aligned}$$
Now we must bound \({\mathbb {P}}[{\mathcal {A}}_{\omega }^c\cup {\mathcal {A}}^c_{\nu }]\). We proceed,
$$\begin{aligned} \begin{array}{l} {\mathbb {P}}[{\mathcal {A}}_{\omega }^c\cup {\mathcal {A}}^c_{\nu }]\le 2 {\mathbb {P}}[{\mathcal {A}}_{\omega }^c] \le 1-\exp \left\{ -\epsilon ^2 {\bar{\aleph }}-\epsilon ^2\Vert \bar{\varvec{g}}\Vert ^2+\epsilon ^2 {\bar{\aleph }}-\epsilon ^2 \aleph +\epsilon ^2\Vert \bar{\varvec{g}}\Vert ^2-\epsilon ^2\Vert \varvec{g}\Vert ^2\right\} \\ \qquad \le \epsilon ^2 K_a(\epsilon ) \end{array} \end{aligned}$$
where \(K_a(\epsilon )\) depends on \(L_2\) and \(K_T(\epsilon )\), and the last step follows by reasoning similar to that in the proof of Theorem 3.1. Putting these bounds together, we get,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\Vert \bar{{\mathcal {M}}}(\bar{\varvec{\omega }}+\epsilon (\bar{\varvec{p}}+\epsilon \nabla U({\bar{\omega }})),\bar{\varvec{\omega }},u,q_{\omega }) - \bar{{\mathcal {M}}}(\bar{\varvec{\nu }}+\epsilon (\bar{\varvec{p}}+\epsilon \nabla U(\bar{\varvec{\nu }})),\bar{\varvec{\nu }},u, q_{\nu })\Vert ^2 \\ \quad \le (1-\epsilon ^2 K/4+\epsilon ^2 K_a(\epsilon )){\mathbb {E}}\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ^2 +\epsilon ^2 K_a(\epsilon )\left( \epsilon ^2 M^{(2)}+\epsilon ^4 L_2 R_2+\epsilon ^4 K_T(\epsilon )\right) \end{array} \end{aligned}$$
which satisfies the conclusion for sufficiently small \(\epsilon\) relative to \({\mathcal {R}}^2\le \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ^2/4\). \(\square\)
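The contraction factor \(1-\epsilon ^2K/4\) recalled from Lemma 3.1 can be illustrated in a simplified setting. The sketch below makes several assumptions that are stronger than those of the theorem: the potential is the quadratic \(U(\varvec{\omega })=\frac{K}{2}\Vert \varvec{\omega }\Vert ^2\) (so it is strongly convex everywhere, not only outside a ball), the error terms \(q_{\omega }\) and \(q_{\nu }\) are set to zero, both proposals are always accepted, and the two chains share the same momentum. The update follows the sign convention of the contracted expression above.

```python
import random

def proposal(w, p, eps, K):
    """One Euler-type step w + eps*p - eps^2 * grad U(w) for U(w) = K/2 * |w|^2."""
    return [wi + eps * pi - eps ** 2 * K * wi for wi, pi in zip(w, p)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

rng = random.Random(1)
eps, K = 0.1, 2.0                      # hypothetical step size and convexity constant
w, v = [3.0, -2.0], [-1.0, 4.0]        # hypothetical starting points of the two chains
ratios = []
for _ in range(50):
    p = [rng.gauss(0.0, 1.0) for _ in w]   # shared momentum (synchronous coupling)
    w_new, v_new = proposal(w, p, eps, K), proposal(v, p, eps, K)
    ratios.append(sq_dist(w_new, v_new) / sq_dist(w, v))
    w, v = w_new, v_new
```

In this idealized case the shared momentum cancels in the difference, each ratio equals \((1-\epsilon ^2K)^2\), and this is below \(1-\epsilon ^2K/4\) whenever \(\epsilon ^2K\le 7/4\). The theorem's extra term \(\epsilon ^2K_a(\epsilon )\) accounts for the rejections and errors that this sketch ignores.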
1.5 Appendix 1.5: Global coupling and contraction
Here we prove Theorem 3.5. The proof is based on the original proof of Bou-Rabee (2020, Theorem 2.4).
Proof
For \(\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert \ge 2{\mathcal {R}}\), we apply Theorem 3.4 directly and write,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) \right] \\ \quad \le \rho '\left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) {\mathbb {E}}\left| \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| -\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right| \\ \quad \le -\frac{1}{8}K\epsilon ^2 \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \le -\frac{1}{8} K\epsilon ^2\inf \limits _{r>0}\frac{r\rho '(r)}{\rho (r)} \\ \quad \le -\frac{1}{40}K\epsilon ^2 (1+{\mathcal {R}}/\epsilon )e^{-\frac{5{\mathcal {R}}}{2\epsilon }} \end{array} \end{aligned}$$
Define the event \({\mathcal {C}}:=\{\bar{\varvec{r}}-\bar{\varvec{p}}=\gamma (\bar{\varvec{\omega }}-\bar{\varvec{\nu }})\}\). Now for the case of \(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| <2{\mathcal {R}}\) we consider the exhaustive decomposition,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) \right] \\ \quad = {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) {\textbf{1}}( {\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}} )\right] \\ \qquad + {\mathbb {E}}\left[ \rho \left( \min (R_1,\left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| )\right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) {\textbf{1}}({\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}}^c)\right] \\ \qquad + {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \min (R_1,\left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| )\right) {\textbf{1}}( {\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}}^c)\right] \\ \qquad + {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) {\textbf{1}}( ({\mathcal {A}}^c_{\omega }\cap {\mathcal {A}}_{\nu })\cup ({\mathcal {A}}_{\omega }\cap {\mathcal {A}}^c_{\nu }))\right] \end{array} \end{aligned}$$
Now for the first expression, under the event \({\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}}\) it holds that,
$$\begin{aligned} \begin{array}{l} \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| =\left\| \varvec{L}(\bar{\varvec{\omega }},\varvec{p},\varvec{q}_{\omega })_1-\varvec{L}(\bar{\varvec{\nu }},\varvec{r},\varvec{q}_{\nu })_1\right\| \\ \quad = \left\| \bar{\varvec{\omega }}+\epsilon (\varvec{p}+\epsilon \nabla U(\bar{\varvec{\omega }})+\epsilon \varvec{q}_{\omega })-\bar{\varvec{\nu }}-\epsilon (\varvec{r}+\epsilon \nabla U(\bar{\varvec{\nu }})+\epsilon \varvec{q}_{\nu })\right\| \\ \quad = \left\| (1-\epsilon \gamma )(\bar{\varvec{\omega }}-\bar{\varvec{\nu }})+\epsilon ^2 (\nabla U(\bar{\varvec{\omega }})+\varvec{q}_{\omega })-\epsilon ^2 (\nabla U(\bar{\varvec{\nu }})+ \varvec{q}_{\nu })\right\| \\ \quad \le (1-\epsilon \gamma +\epsilon ^2 L_2)\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert +\epsilon ^2 K_T(\epsilon ) \end{array} \end{aligned}$$
and thus by the concavity of \(\rho\),
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) {\textbf{1}}( {\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}} )\right] \\ \quad \le \left[ (-\epsilon \gamma +\epsilon ^2 L_2)\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert +\epsilon ^2 K_T(\epsilon )\right] \rho '(\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ) [1-{\mathbb {P}}[{\mathcal {A}}^c_{\omega }]-{\mathbb {P}}[{\mathcal {A}}^c_{\nu }]-{\mathbb {P}}[{\mathcal {C}}^c]] \end{array} \end{aligned}$$
Now as in the proof of Theorem 3.4, we have that \(\max ({\mathbb {P}}[{\mathcal {A}}_{\omega }^c],{\mathbb {P}}[{\mathcal {A}}_{\nu }^c])\le \epsilon ^2 K_a(\epsilon )\). Next, recall that by construction and Bou-Rabee (2020, Lemma 3.7) together with \(\gamma {\mathcal {R}}\le 1/4\) we have that \({\mathbb {P}}[{\mathcal {C}}^c]\le \frac{\gamma \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert }{\sqrt{2\pi }}\le \frac{1}{4\sqrt{2\pi }}< \frac{1}{10}\). Thus, we can make \(\epsilon\) sufficiently small depending on \(\gamma\), \(L_2\) and the form of \(K_a(\epsilon )\) such that,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) {\textbf{1}}( {\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}} )\right] \\ \quad \le -\frac{3}{5}\epsilon \gamma \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert \rho '(\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert )+ \epsilon ^2 K_T(\epsilon )\rho '(\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ) \end{array} \end{aligned}$$
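The numerical constants in the bound on \({\mathbb {P}}[{\mathcal {C}}^c]\) can be checked directly. The sketch below assumes, as in the coupling of Bou-Rabee (2020), that \({\mathbb {P}}[{\mathcal {C}}^c]\) is the total variation distance between two unit-variance Gaussians whose means differ by \(d=\gamma \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert\); the function names are illustrative and not part of the paper.

```python
import math


def coupling_failure_prob(d: float) -> float:
    """Total variation distance between N(0, 1) and N(d, 1) for d >= 0.

    Assumption: this is the probability that a maximal coupling of the two
    momenta fails. It equals 2 * Phi(d / 2) - 1 = erf(d / (2 * sqrt(2))).
    """
    return math.erf(d / (2.0 * math.sqrt(2.0)))


def linear_bound(d: float) -> float:
    """The linear upper bound d / sqrt(2 * pi) used in the proof."""
    return d / math.sqrt(2.0 * math.pi)


# The exact failure probability never exceeds the linear bound.
for d in (0.01, 0.05, 0.1, 0.25, 0.5):
    assert coupling_failure_prob(d) <= linear_bound(d)

# The constant quoted in the text: 1 / (4 * sqrt(2 * pi)) is just below 1/10.
quoted_constant = 1.0 / (4.0 * math.sqrt(2.0 * math.pi))
assert quoted_constant < 0.1
```

The linear bound follows from \(\textrm{erf}(x)\le 2x/\sqrt{\pi }\) for \(x\ge 0\), so the check is only a sanity test of the constants, not a substitute for the cited lemma.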
Next, it holds that for \(s\le R_1\), \(\rho (s)-\rho (r)\le \frac{1}{a}\rho '(r)\) and thus by Bou-Rabee (2020, Lemma 3.7) and the definition of \(a=1/\epsilon\), we have that,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \rho \left( \min (R_1,\left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| )\right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) {\textbf{1}}({\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}}^c)\right] \\ \quad \le \frac{1}{a}\rho '(\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ){\mathbb {P}}[{\mathcal {C}}^c] < \frac{2}{5}\gamma \epsilon \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert \rho '(\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ) \end{array} \end{aligned}$$
Next, under \({\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}}^c\), by the definition of \(\varvec{r}\),
$$\begin{aligned} \begin{array}{l} \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| =\left\| \varvec{L}(\bar{\varvec{\omega }},\varvec{p},\varvec{q}_{\omega })_1-\varvec{L}(\bar{\varvec{\nu }},\varvec{r},\varvec{q}_{\nu })_1\right\| \\ \quad = \left\| \bar{\varvec{\omega }}+\epsilon (\varvec{p}+\epsilon \nabla U(\bar{\varvec{\omega }})+\epsilon \varvec{q}_{\omega })-\bar{\varvec{\nu }}-\epsilon (\varvec{r}+\epsilon \nabla U(\bar{\varvec{\nu }})+\epsilon \varvec{q}_{\nu })\right\| \\ \quad \le \left\| (1-\epsilon \gamma )(\bar{\varvec{\omega }}-\bar{\varvec{\nu }})+\epsilon ^2 (\nabla U(\bar{\varvec{\omega }})+\varvec{q}_{\omega })-\epsilon ^2 (\nabla U(\bar{\varvec{\nu }})+ \varvec{q}_{\nu })\right\| + \epsilon \Vert \bar{\varvec{p}}-\bar{\varvec{r}}\Vert \\ \quad \le (1-\epsilon \gamma /2)\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert +\epsilon ^2 K_T(\epsilon )+4\epsilon \left\| (\bar{\varvec{\omega }}-\bar{\varvec{\nu }})\cdot \bar{\varvec{p}}/\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert \right\| \end{array} \end{aligned}$$
Thus, using the fact that by construction \(R_1\ge \frac{5}{4}(1+\gamma \epsilon )\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert\) and the properties of the coupling (see the proof of Bou-Rabee (2020, Theorem 2.4)),
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \min (R_1,\left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| )\right) {\textbf{1}}( {\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}}^c)\right] \\ \quad \le \rho '(R_1){\mathbb {E}}\left[ \left( \left\| \varvec{L}(\bar{\varvec{\omega }},\varvec{p},\varvec{q}_{\omega })_1-\varvec{L}(\bar{\varvec{\nu }},\varvec{r},\varvec{q}_{\nu })_1\right\| -R_1\right) ^+{\textbf{1}}({\mathcal {A}}_{\omega }\cap {\mathcal {A}}_{\nu }\cap {\mathcal {C}}^c)\right] \\ \quad \le \left( \frac{5}{4}\gamma \epsilon \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert +\epsilon ^2 K_T(\epsilon )+4\epsilon M^{(1)}\right) \rho '(R_1) \\ \quad \le \left( \frac{5}{4}\gamma \epsilon \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert +\epsilon ^2 K_T(\epsilon )\right) e^{-a(R_1-2{\mathcal {R}})}\rho '(\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ) \\ \quad \le \frac{1}{20}\left( \frac{5}{4}\gamma \epsilon \Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert +\epsilon ^2 K_T(\epsilon )\right) \rho '(\Vert \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\Vert ) \end{array} \end{aligned}$$
Finally,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| -\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| {\textbf{1}}( ({\mathcal {A}}^c_{\omega }\cap {\mathcal {A}}_{\nu })\cup ({\mathcal {A}}_{\omega }\cap {\mathcal {A}}^c_{\nu }))\right] \\ \quad = {\mathbb {E}}\left[ \left\| \varvec{L}(\bar{\varvec{\omega }},\varvec{p},\varvec{q}_{\omega })_1-\bar{\varvec{\omega }}\right\| {\textbf{1}} ({\mathcal {A}}^c_{\omega }\cap {\mathcal {A}}_{\nu })\right] +{\mathbb {E}}\left[ \left\| \varvec{L}(\bar{\varvec{\nu }},\varvec{r},\varvec{q}_{\nu })_1-\bar{\varvec{\nu }}\right\| {\textbf{1}}({\mathcal {A}}_{\omega }\cap {\mathcal {A}}^c_{\nu })\right] \\ \quad \le \left[ \epsilon M^{(1)}+\epsilon ^2 R_2+\epsilon ^2 K_T(\epsilon )\right] \epsilon ^2 K_a(\epsilon ) \end{array} \end{aligned}$$
and so, again by the concavity of \(\rho (\cdot )\),
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) {\textbf{1}}( ({\mathcal {A}}^c_{\omega }\cap {\mathcal {A}}_{\nu })\cup ({\mathcal {A}}_{\omega }\cap {\mathcal {A}}^c_{\nu }))\right] \\ \quad \le \left[ \epsilon M^{(1)}+\epsilon ^2 R_2+\epsilon ^2 K_T(\epsilon )\right] \epsilon ^2 K_a(\epsilon ) \rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \end{array} \end{aligned}$$
Putting all of these bounds together, we obtain,
$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ \rho \left( \left\| \varvec{\Omega }(\bar{\varvec{\omega }},\bar{\varvec{\nu }})-\varvec{{\mathcal {V}}}(\bar{\varvec{\omega }},\bar{\varvec{\nu }})\right\| \right) -\rho \left( \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \right) \right] \\ \quad \le \left[ -\frac{3}{5}+\frac{2}{5}+\frac{5}{80}\right] \gamma \epsilon \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| )\\ \qquad +\left[ \epsilon ^2 K_T(\epsilon )+\frac{1}{20}\epsilon ^2 K_T(\epsilon )\right] \rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \\ \qquad + \left[ \epsilon M^{(1)}+\epsilon ^2 R_2+\epsilon ^2 K_T(\epsilon )\right] \epsilon ^2 K_a(\epsilon )\rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \\ \quad = -\frac{11}{80} \gamma \epsilon \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \\ \qquad + \epsilon ^2\left[ \frac{21}{20}K_T(\epsilon )+\epsilon M^{(1)}K_a(\epsilon )+\epsilon ^2 (R_2+K_T(\epsilon ))K_a(\epsilon )\right] \rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \end{array} \end{aligned}$$
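The rational coefficients in the combined bound can be verified with exact arithmetic. The following sketch only re-adds the constants collected from the four partial bounds above; the variable names are illustrative.

```python
from fractions import Fraction as F

# Coefficient of gamma * eps * ||omega - nu|| * rho'(||omega - nu||):
#   -3/5        from the event A_omega, A_nu, C,
#   +2/5        from the truncated term on C^c,
#   +(1/20)*(5/4) from the excess over R_1 on C^c.
contraction_coeff = F(-3, 5) + F(2, 5) + F(1, 20) * F(5, 4)

# Coefficient of eps^2 * K_T(eps) * rho'(||omega - nu||):
#   1 from the first term and 1/20 from the excess over R_1.
k_t_coeff = F(1) + F(1, 20)

assert contraction_coeff == F(-11, 80)
assert k_t_coeff == F(21, 20)
```

The net coefficient is negative, which is what yields the contraction in the final estimate.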
Noting that,
$$\begin{aligned} -\frac{11}{80} \gamma \epsilon \left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| \rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \le -\frac{11}{80} \gamma \epsilon \inf \limits _{r\le 2{\mathcal {R}}} \frac{r \rho '(r)}{\rho (r)} \rho (\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| )\le -\frac{11}{160} e^{-2{\mathcal {R}}/\epsilon } \rho (\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \end{aligned}$$
and
$$\begin{aligned} \rho '(\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \le \sup \frac{\rho '(r)}{\rho (r)}\rho (\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) = a\rho (\left\| \bar{\varvec{\omega }}-\bar{\varvec{\nu }}\right\| ) \end{aligned}$$
we obtain the final result. \(\square\)
Appendix 2. Experiment hyperparameters
1.1 Linear regression
We set the doubly stochastic matrix \({\textbf{W}} = \frac{1}{N_a}{\textbf{1}}_4\), where the number of agents is \(N_a = 4\) and \({\textbf{1}}_4\) is a \(4\times 4\) matrix of ones. We run the experiment over 9 seeds for \(T = 10^5\) iterations. Hardware: MacBook Pro with a 2.6 GHz 6-core Intel Core i7 processor and 16 GB of memory.
- Centralized HMC: \(\epsilon = 4\times 10^{-4}\), \(L = 1\), prior precision \(= 1.0\).
- Decentralized HMC: \(\epsilon = 4\times 10^{-4}\), prior precision \(= 1.0\). We switch off the MH step for the first \(10^3\) steps to ensure that the Taylor approximation is only applied from a point closer to the target distribution.
- Decentralized ULA: \(\epsilon = 3\times 10^{-7}\). Following the notation of Parayil et al. (2020): \(\beta _0 = 0.48,\delta _1 = 0.01, \delta _2 = 0.55, b_1 = 230, b_2 = 230\).
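As a minimal illustration of the fully connected mixing matrix \({\textbf{W}} = \frac{1}{N_a}{\textbf{1}}_4\) used here, the sketch below builds it and checks the doubly stochastic property (non-negative entries, every row and column summing to one). The helper names are hypothetical and are not taken from the experiment code.

```python
def fully_connected_mixing(n_agents):
    """Return W = (1 / n_agents) * ones(n_agents, n_agents) as nested lists."""
    w = 1.0 / n_agents
    return [[w] * n_agents for _ in range(n_agents)]


def is_doubly_stochastic(W, tol=1e-12):
    """Check non-negativity and that all row and column sums equal one."""
    n = len(W)
    nonneg = all(x >= 0.0 for row in W for x in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in W)
    cols_ok = all(
        abs(sum(W[i][j] for i in range(n)) - 1.0) <= tol for j in range(n)
    )
    return nonneg and rows_ok and cols_ok


def mix(W, values):
    """One consensus step: each agent takes a W-weighted average of all values."""
    return [sum(w_ij * v for w_ij, v in zip(row, values)) for row in W]


W = fully_connected_mixing(4)
assert is_doubly_stochastic(W)

# With this W, a single mixing step already yields the exact average.
mixed = mix(W, [1.0, 2.0, 3.0, 4.0])
assert mixed == [2.5, 2.5, 2.5, 2.5]
```

With a fully connected network every agent reaches the network average after one communication round, which is why this choice is the simplest setting for comparing against centralized HMC.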
1.2 Logistic regression
Partial observation. We set the doubly stochastic matrix \({\textbf{W}} = \frac{1}{N_a}{\textbf{1}}_4\), where the number of agents is \(N_a = 4\) and \({\textbf{1}}_4\) is a \(4\times 4\) matrix of ones. We run the experiment over 9 seeds for \(T = 8\times 10^3\) iterations. Hardware: GeForce RTX 2080 Ti.
- Centralized HMC: \(\epsilon = 0.001\), \(L = 1\), prior precision \(= 100.0\).
- Decentralized HMC: \(\epsilon = 5\times 10^{-4}\), prior precision \(= 100.0\). We switch off the MH step for the first \(2\times 10^3\) steps to ensure that the Taylor approximation is only applied from a point closer to the target distribution.
- Decentralized ULA: \(\epsilon = 1\times 10^{-5}\). Following the notation of Parayil et al. (2020): \(\beta _0 = 0.48,\delta _1 = 0.01, \delta _2 = 0.55, b_1 = 230, b_2 = 230\).
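The practice of switching off the MH step during an initial phase can be sketched as follows. This is a generic single-chain HMC with one leapfrog step (\(L = 1\)) on a toy standard normal target, not the decentralized algorithm of the paper; the function name, the toy potential, and all parameter names are assumptions made for illustration.

```python
import math
import random


def U(x):
    """Toy potential (assumption): standard normal target, U(x) = x^2 / 2."""
    return 0.5 * x * x


def grad_U(x):
    """Gradient of the toy potential."""
    return x


def hmc_with_delayed_mh(n_iters, n_no_mh, eps, x0=3.0, seed=0):
    """Run HMC with L = 1, accepting every proposal for the first n_no_mh steps."""
    rng = random.Random(seed)
    x = x0
    samples, accepted = [], 0
    for t in range(n_iters):
        p = rng.gauss(0.0, 1.0)  # resample momentum
        # One leapfrog step.
        p_half = p - 0.5 * eps * grad_U(x)
        x_new = x + eps * p_half
        p_new = p_half - 0.5 * eps * grad_U(x_new)
        if t < n_no_mh:
            # MH step switched off: always move to the proposal.
            accept = True
        else:
            # Standard MH test on the change in the Hamiltonian.
            dH = (U(x_new) + 0.5 * p_new**2) - (U(x) + 0.5 * p**2)
            accept = rng.random() < math.exp(min(0.0, -dH))
        if accept:
            x = x_new
            accepted += 1
        samples.append(x)
    return samples, accepted


samples, accepted = hmc_with_delayed_mh(n_iters=200, n_no_mh=50, eps=0.1)
```

In the decentralized setting the acceptance test additionally relies on the second-order Taylor approximation of the potential, so the initial phase without MH lets the chain approach the target region before that approximation is used.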
Ring network. We set the doubly stochastic matrix \({\textbf{W}} = ({\textbf{I}} + {\textbf{A}})\frac{1}{N_a}\), where the number of agents is \(N_a = 5\), \({\textbf{I}}\) is the identity matrix, and \({\textbf{A}}\) is the adjacency matrix of a ring-shaped graph. We run the experiment over 9 seeds for \(T = 1\times 10^4\) iterations. Hardware: GeForce RTX 2080 Ti.
- Centralized HMC: \(\epsilon = 0.001\), \(L = 1\), prior precision \(= 100.0\).
- Decentralized HMC: \(\epsilon = 0.003\), prior precision \(= 100.0\). We switch off the MH step for the first \(1\times 10^3\) steps to ensure that the Taylor approximation is only applied from a point closer to the target distribution.
- Decentralized ULA: \(\epsilon = 1\times 10^{-4}\). Following the notation of Parayil et al. (2020): \(\beta _0 = 0.48,\delta _1 = 0.01, \delta _2 = 0.55, b_1 = 230, b_2 = 230\).
1.3 Bayesian neural network
We set the doubly stochastic matrix \({\textbf{W}} = \frac{1}{N_a}{\textbf{1}}_2\), where the number of agents is \(N_a = 2\) and \({\textbf{1}}_2\) is a \(2\times 2\) matrix of ones. We run the experiment for \(T = 5\times 10^5\) iterations. Decentralized HMC: \(\epsilon = 7\times 10^{-5}\), prior precision \(= 10.0\). We switch off the MH step for the first \(2\times 10^3\) steps to ensure that the Taylor approximation is only applied from a point closer to the target distribution. Hardware: GeForce RTX 2080 Ti.