Appendix A: Inexact setting
A.1 Proof of Theorem 2.7
Lemma A.1
Let us introduce several definitions.
$$\begin{aligned} \tau _x = (\sigma _x^{-1}+1/2)^{-1}, \end{aligned}$$
(8)
$$\begin{aligned} \alpha _x=\mu _x, \end{aligned}$$
(9)
$$\begin{aligned} \beta _x=\min \left\{ \frac{1}{2L_y}, \frac{1}{2\eta _x L_{xy}^2} \right\} . \end{aligned}$$
(10)
Then the following inequality holds
$$\begin{aligned} \begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\Vert x^{k+1} - x^*\Vert ^2&\le \left( \frac{1}{\eta _x}-\mu _x-\beta _x \mu _{yx}^2 \right) \Vert x^k - x^*\Vert ^2\\&\quad +\left( \mu _x + L_x\sigma _x - \frac{1}{2\eta _x} \right) \mathbb {E}_{\xi _x^k, \xi _y^k}\Vert x^{k+1} - x^k\Vert ^2\\&\quad +B_g(y_g^k,y^*) -B_f(x_g^k,x^*) -\frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}B_f(x_f^{k+1},x^*)\\&\quad +\left( \frac{2}{\sigma _x} - 1 \right) B_f(x_f^k,x^*) -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle A^T(y_m^k-y^*), x^{k+1} - x^* \rangle \\&\quad +\delta _y(\xi ^{k-1}) + \left( \frac{4}{\sigma _x}+1 \right) \delta _x(\xi ^{k-1}) + \beta _x\sigma _g^2 + 2\eta _x\sigma _f^2. \end{aligned} \end{aligned}$$
(11)
Proof
The proof follows that of Lemma B.2 in Kovalev et al. (2020), extended to cover the inexact stochastic case. To this end, we replace several inequalities from the cited article with their counterparts for an inexact stochastic oracle.
Let \(B_f(a, b) = f(a)-f(b)-\langle \nabla f(b), a - b \rangle\) denote the Bregman divergence of \(f\); \(B_g(a, b)\) is defined analogously.
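As a brief reminder, the following standard facts about \(B_f\) are used implicitly below. They are stated under the assumption, taken from the setting of the lemma and not proved here, that \(f\) is differentiable and convex with strong convexity parameter \(\mu _x \ge 0\); the analogous bounds hold for \(B_g\) with \(\mu _y\).
$$\begin{aligned} B_f(a, b) \ge \frac{\mu _x}{2}\Vert a - b\Vert ^2 \ge 0, \qquad B_f(b, b) = 0. \end{aligned}$$
In particular, nonnegativity is what allows a term \(c\,B_g(y_g^k, y^*)\) with \(0 \le c \le 1\) to be bounded from above by \(B_g(y_g^k, y^*)\).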
In this analysis, we will need the following inequalities (the term \(\xi ^{k-1}\) is omitted).
$$\begin{aligned}{} & {} \frac{1}{2L_y}\mathbb {E}_{\xi _y^k}\Vert \nabla g_{\delta }(y_g^k, \xi _y^k)-\nabla g(y^*)\Vert ^2 \le B_g(y_g^k,y^*)+\delta _y+\frac{\sigma _g^2}{2L_y}, \end{aligned}$$
(12)
$$\begin{aligned}{} & {} \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) -\nabla f(x^*), x^{k+1} - x^* \rangle \nonumber \\{} & {} \quad \ge \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle - \eta _x\sigma _f^2, \end{aligned}$$
(13)
$$\begin{aligned}{} & {} \quad \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_f^{k+1} - x_g^k \rangle \ge B_f(x_f^{k+1},x^*) \nonumber \\{} & {} \quad -B_f(x_g^k,x^*)-\frac{L_x}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2-\delta _x, \end{aligned}$$
(14)
$$\begin{aligned}{} & {} \quad 2\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_g^k - x^* \rangle \ge 2B_f(x_g^k,x^*) + \mu _x\Vert x_g^k-x^*\Vert ^2 - 2\delta _x, \end{aligned}$$
(15)
$$\begin{aligned}{} & {} \quad \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_f^k - x_g^k \rangle \le B_f(x_f^k,x^*)-B_f(x_g^k, x^*)+\delta _x. \end{aligned}$$
(16)
Let us prove these inequalities.
Inequality (12). Using \(\mathbb {E}_{\xi _y^k}\nabla g_{\delta }(y_g^k, \xi _y^k) = \nabla g_{\delta }(y_g^k)\) and Theorem 1 of Devolder et al. (2014), we have
$$\begin{aligned} \frac{1}{2L_y}\mathbb {E}_{\xi _y^k}\Vert \nabla g_{\delta }(y_g^k, \xi _y^k)-\nabla g(y^*)\Vert ^2&= \frac{1}{2L_y}\mathbb {E}_{\xi _y^k}\Vert \nabla g_{\delta }(y_g^k, \xi _y^k)-\nabla g_{\delta }(y_g^k)\Vert ^2\\&\quad + \frac{1}{2L_y}\Vert \nabla g_{\delta }(y_g^k)-\nabla g(y^*)\Vert ^2 \\&\le g(y_g^k)-g(y^*)-\langle \nabla g(y^*), y_g^k - y^* \rangle + \delta _y + \frac{\sigma _g^2}{2L_y}\\&= B_g(y_g^k,y^*)+\delta _y+\frac{\sigma _g^2}{2L_y}. \end{aligned}$$
Here we choose the inexact oracle \((\hat{g}_{\delta }, \nabla \hat{g}_{\delta })\) so that it coincides with \((g_{\delta }, \nabla g_{\delta })\) at all points except \(y^*\), where it equals \((g(y^*), \nabla g(y^*))\).
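The first equality in the derivation of (12) is a bias-variance split. As a short sketch, write \(\zeta = \nabla g_{\delta }(y_g^k, \xi _y^k)-\nabla g_{\delta }(y_g^k)\) and \(c = \nabla g_{\delta }(y_g^k)-\nabla g(y^*)\) (both symbols are introduced here only for illustration), and assume, as the constant \(\sigma _g^2\) in (12) suggests, that \(\mathbb {E}_{\xi _y^k}\Vert \zeta \Vert ^2 \le \sigma _g^2\):
$$\begin{aligned} \mathbb {E}_{\xi _y^k}\Vert \zeta + c\Vert ^2&= \mathbb {E}_{\xi _y^k}\Vert \zeta \Vert ^2 + 2\langle \mathbb {E}_{\xi _y^k}\zeta , c \rangle + \Vert c\Vert ^2 \\&= \mathbb {E}_{\xi _y^k}\Vert \zeta \Vert ^2 + \Vert c\Vert ^2 \le \sigma _g^2 + \Vert c\Vert ^2 . \end{aligned}$$
The cross term vanishes because \(\mathbb {E}_{\xi _y^k}\zeta = 0\) and \(c\) does not depend on \(\xi _y^k\).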
Inequality (13):
$$\begin{aligned} \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f(x^*), x^{k+1} - x^* \rangle&= \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f_{\delta }(x_g^k), x^{k+1} - x^* \rangle \\ {}&\quad + \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle . \end{aligned}$$
Using Line (8) of Algorithm 1, we obtain
$$\begin{aligned} \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f(x^*), x^{k+1} - x^* \rangle&= \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f_{\delta }(x_g^k), -\eta _x \nabla f_{\delta }(x_g^k, \xi _x^k) \rangle \\&\quad + \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle \\&= \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle \\&\quad - \eta _x\mathbb {E}_{\xi _x^k}\Vert \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f_{\delta }(x_g^k)\Vert ^2\\&\ge \mathbb {E}_{\xi _x^k} \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle -\eta _x \sigma _f^2. \end{aligned}$$
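The replacement of \(x^{k+1} - x^*\) by \(-\eta _x \nabla f_{\delta }(x_g^k, \xi _x^k)\) in the noise term can be unpacked as follows. This is a sketch under the assumption that \(\xi _x^k\) is independent of \(\xi _y^k\) and of the past, so that Line (8) can be written as \(x^{k+1} = u^k - \eta _x \nabla f_{\delta }(x_g^k, \xi _x^k)\) with \(u^k\) not depending on \(\xi _x^k\); the symbols \(u^k\) and \(\zeta _x = \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f_{\delta }(x_g^k)\) are shorthand introduced only for this remark.
$$\begin{aligned} \mathbb {E}_{\xi _x^k}\langle \zeta _x, x^{k+1} - x^* \rangle&= \langle \mathbb {E}_{\xi _x^k}\zeta _x, u^k - x^* \rangle - \eta _x\mathbb {E}_{\xi _x^k}\langle \zeta _x, \zeta _x + \nabla f_{\delta }(x_g^k) \rangle \\&= -\eta _x\mathbb {E}_{\xi _x^k}\Vert \zeta _x\Vert ^2 - \eta _x\langle \mathbb {E}_{\xi _x^k}\zeta _x, \nabla f_{\delta }(x_g^k) \rangle \\&= -\eta _x\mathbb {E}_{\xi _x^k}\Vert \zeta _x\Vert ^2 \ge -\eta _x\sigma _f^2 . \end{aligned}$$
Both inner products with \(\mathbb {E}_{\xi _x^k}\zeta _x\) vanish because the noise has zero mean; the final bound uses the variance bound \(\mathbb {E}_{\xi _x^k}\Vert \zeta _x\Vert ^2 \le \sigma _f^2\).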
Inequality (14):
$$\begin{aligned} \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_f^{k+1} - x_g^k \rangle&\ge f(x_f^{k+1}) - f_{\delta }(x_g^k)-\frac{L_x}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2\\&\quad -\delta _x - \langle \nabla f(x^*), x_f^{k+1} - x_g^k \rangle \\&\ge f(x_f^{k+1}) - f(x_g^k)-\frac{L_x}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2\\&\quad -\delta _x - \langle \nabla f(x^*), x_f^{k+1} - x_g^k \rangle \\&= B_f(x_f^{k+1},x^*) - B_f(x_g^k,x^*)-\frac{L_x}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2-\delta _x. \end{aligned}$$
Inequality (15):
$$\begin{aligned} 2(f(x^*)-f(x_g^k))-2\langle \nabla f_{\delta }(x_g^k), x^*-x_g^k \rangle + 2\delta _x&\ge 2(f(x^*)-f_{\delta }(x_g^k))-2\langle \nabla f_{\delta }(x_g^k), x^*-x_g^k \rangle \\ {}&\ge \mu _x\Vert x_g^k-x^*\Vert ^2. \end{aligned}$$
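For completeness, the chain above yields (15) after a rearrangement. The following lines only move terms across the inequality and then subtract \(2\langle \nabla f(x^*), x_g^k - x^* \rangle\) from both sides; no additional assumption is used.
$$\begin{aligned} 2\langle \nabla f_{\delta }(x_g^k), x_g^k - x^* \rangle&\ge 2\left( f(x_g^k)-f(x^*)\right) + \mu _x\Vert x_g^k-x^*\Vert ^2 - 2\delta _x, \\ 2\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_g^k - x^* \rangle&\ge 2\left( f(x_g^k)-f(x^*) - \langle \nabla f(x^*), x_g^k - x^* \rangle \right) + \mu _x\Vert x_g^k-x^*\Vert ^2 - 2\delta _x \\&= 2B_f(x_g^k,x^*) + \mu _x\Vert x_g^k-x^*\Vert ^2 - 2\delta _x . \end{aligned}$$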
Inequality (16):
$$\begin{aligned} \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_f^k - x_g^k \rangle&\le f(x_f^k) - f_{\delta }(x_g^k)-\langle \nabla f(x^*), x_f^k - x_g^k \rangle \\&\le f(x_f^k) - f(x_g^k)-\langle \nabla f(x^*), x_f^k - x_g^k \rangle +\delta _x\\&= B_f(x_f^k,x^*)-B_f(x_g^k,x^*)+\delta _x. \end{aligned}$$
With these inequalities established, we turn to the main estimate. Using Line (8) of Algorithm 1, we get
$$\begin{aligned} \frac{1}{\eta _x}\left\| x^{k+1} - x^*\right\| ^2&= \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{2}{\eta _x}\langle x^{k+1} - x^k,x^{k+1} - x^*\rangle - \frac{1}{\eta _x}\left\| x^{k+1} - x^k\right\| ^2\\&=\frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + 2\alpha _x\langle x_g^k - x^k,x^{k+1}- x^*\rangle \\&\quad -2\beta _x\langle \textbf{A}^\top (\textbf{A}x^k - \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1})),x^{k+1} - x^*\rangle \\&\quad -2\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} - x^*\rangle - \frac{1}{\eta _x}\left\| x^{k+1} - x^k\right\| ^2. \end{aligned}$$
Using the parallelogram rule we get
$$\begin{aligned} \frac{1}{\eta _x}\left\| x^{k+1} - x^*\right\| ^2&= \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2\\&\quad +\alpha _x\left( \left\| x_g^k - x^*\right\| ^2 - \left\| x_g^k - x^{k+1}\right\| ^2 - \left\| x^{k} - x^*\right\| ^2+\left\| x^{k+1} - x^k\right\| ^2\right) \\&\quad -2\beta _x\langle \textbf{A}x^k - \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1}),\textbf{A}(x^{k+1} - x^*)\rangle \\&\quad -2\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} - x^*\rangle - \frac{1}{\eta _x}\left\| x^{k+1} - x^k\right\| ^2. \end{aligned}$$
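The parallelogram rule is used here in its four-point form. For reference, for arbitrary vectors \(a, b, c, d\) (generic placeholders, not quantities from the algorithm),
$$\begin{aligned} 2\langle a - b, c - d \rangle = \Vert a - d\Vert ^2 - \Vert a - c\Vert ^2 - \Vert b - d\Vert ^2 + \Vert b - c\Vert ^2 . \end{aligned}$$
Taking \(a = x_g^k\), \(b = x^k\), \(c = x^{k+1}\), \(d = x^*\) gives the \(\alpha _x\) term above. The same identity with \(a = \textbf{A}x^k\), \(b = \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1})\), \(c = \textbf{A}x^{k+1}\), \(d = \textbf{A}x^*\) handles the \(\beta _x\) term once \(\textbf{A}x^* = \nabla g(y^*)\) is available.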
Using the optimality condition \(\nabla g(y^*) = \textbf{A}x^*\), which follows from \(\nabla _y F(x^*, y^*) = 0\), and the parallelogram rule we get
$$\begin{aligned} \frac{1}{\eta _x}\left\| x^{k+1} - x^*\right\| ^2&= \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2\\&\quad +\alpha _x\left( \left\| x_g^k - x^*\right\| ^2 - \left\| x_g^k - x^{k+1}\right\| ^2 - \left\| x^{k} - x^*\right\| ^2\right. \\&\quad \left. +\left\| x^{k+1} - x^k\right\| ^2\right) \\&\quad +\beta _x\left( \left\| \textbf{A}(x^{k+1} - x^k)\right\| ^2 - \left\| \textbf{A}(x^k - x^*)\right\| ^2\right) \\&\quad +\beta _x\left( \left\| \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1}) - \nabla g(y^*)\right\| ^2 \right. \\&\left. \quad - \left\| \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1}) - \textbf{A}(x^{k+1})\right\| ^2\right) \\&\quad -2\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} - x^*\rangle \\&\quad - \frac{1}{\eta _x}\left\| x^{k+1} - x^k\right\| ^2. \end{aligned}$$
Using Assumption (1.7) and Eq. (12), we get
$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \alpha _x\left\| x_g^k - x^*\right\| ^2 - \alpha _x\left\| x^{k} - x^*\right\| ^2\\&\qquad +\alpha _x\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + \beta _xL_{xy}^2\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2\\&\qquad -\beta _x\mu _{yx}^2\left\| x^k - x^*\right\| ^2 \\&\qquad + 2\beta _xL_yB_g(y_g^k,y^*) + 2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2\\&\qquad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} - x^*\rangle \\&\qquad -\frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\&\quad =\left( \frac{1}{\eta _x} - \alpha _x- \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\qquad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\&\qquad +2\beta _xL_yB_g(y_g^k,y^*)\\&\qquad +\alpha _x\left\| x_g^k - x^*\right\| ^2 - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} -x^*\rangle \\&\qquad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2. \end{aligned}$$
Using the optimality condition \(\nabla f(x^*) + \textbf{A}^\top y^* = 0\), which follows from \(\nabla _x F(x^*, y^*) = 0\), together with Eq. (13), we get
$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + 2\beta _xL_yB_g(y_g^k,y^*)\\&\quad+\alpha _x\left\| x_g^k - x^*\right\| ^2 - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) - \nabla f(x^*),x^{k+1} - x^*\rangle \\&\quad-2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle + 2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2\\&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2\\&\quad +2\beta _xL_yB_g(y_g^k,y^*) + \alpha _x\left\| x_g^k - x^*\right\| ^2\\&\quad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x^{k+1}- x^k + x^k - x_g^k + x_g^k - x^*\rangle \\&\quad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle + 2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2. \end{aligned}$$
Using \(\mu _x\)-strong convexity of \(f\), Lines (6) and (10) of Algorithm 1, and Eq. (15), we get
$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad + \left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\&\quad +2\beta _xL_yB_g(y_g^k,y^*)\\&\quad + \alpha _x\left\| x_g^k - x^*\right\| ^2 - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k+1}- x_g^k\rangle \\&\quad + \frac{2(1-\tau _x)}{\tau _x}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k}- x_g^k\rangle - 2B_f(x_g^k,x^*) \\&\quad - \mu _x\left\| x_g^k -x^*\right\| ^2 + 2\delta _x(\xi ^{k-1}) - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\&\quad + 2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2 \\&= \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 \\&\quad + \left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2 \\&\quad + 2\beta _xL_yB_g(y_g^k,y^*) - 2B_f(x_g^k,x^*) \\&\quad - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k+1}- x_g^k\rangle \\&\quad + \frac{2(1-\tau _x)}{\tau _x}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k}- x_g^k\rangle \\&\quad - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+ 2\delta _x(\xi ^{k-1}). \end{aligned}$$
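The coefficients \(2/\sigma _x\) and \(2(1-\tau _x)/\tau _x\) come from rewriting the first two differences in the split \(x^{k+1}- x^k + x^k - x_g^k + x_g^k - x^*\). A sketch, assuming Lines (6) and (10) of Algorithm 1 have the form \(x_g^k = \tau _x x^k + (1-\tau _x)x_f^k\) and \(x_f^{k+1} = x_g^k + \sigma _x(x^{k+1}-x^k)\); the algorithm is not restated in this appendix, so these forms are inferred from the coefficients in the display above:
$$\begin{aligned} x^{k+1} - x^k = \frac{1}{\sigma _x}\left( x_f^{k+1} - x_g^k\right) , \qquad x^k - x_g^k = \frac{1-\tau _x}{\tau _x}\left( x_g^k - x_f^k\right) . \end{aligned}$$
The second identity follows from \(x^k - x_g^k = (1-\tau _x)(x^k - x_f^k)\) and \(x_g^k - x_f^k = \tau _x(x^k - x_f^k)\). The remaining difference \(x_g^k - x^*\) is handled by Eq. (15).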
Using Eq. (16), we get
$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2 \\ {}&\quad + 2\beta _xL_yB_g(y_g^k,y^*) - 2B_f(x_g^k,x^*) \\ {}&\quad - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k+1}- x_g^k\rangle \\ {}&\quad + \frac{2(1-\tau _x)}{\tau _x}\left( B_f(x_f^k,x^*) - B_f(x_g^k,x^*)\right) \\ {}&\quad - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\ {}&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+ 2\delta _x(\xi ^{k-1})+\frac{2(1-\tau _x)}{\tau _x}\delta _x(\xi ^{k-1}). \end{aligned}$$
Using Eq. (14), we get
$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2\\&\quad +2\beta _xL_yB_g(y_g^k,y^*) - 2B_f(x_g^k,x^*)\\&\quad -\frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left( B_f(x_f^{k+1},x^*) - B_f(x_g^k,x^*) - \frac{L_x}{2}\left\| x_f^{k+1} - x_g^k\right\| ^2\right) \\&\quad +\frac{2(1-\tau _x)}{\tau _x}\left( B_f(x_f^k,x^*) - B_f(x_g^k,x^*)\right) \\&\quad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+\left( \frac{2}{\tau _x}+\frac{2}{\sigma _x}\right) \delta _x(\xi ^{k-1}). \end{aligned}$$
Using Line (10) of Algorithm 1, we get
$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2\\&\quad +2\beta _xL_yB_g(y_g^k,y^*) - 2B_f(x_g^k,x^*)\\&\quad -\frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left( B_f(x_f^{k+1},x^*) - B_f(x_g^k,x^*) - \frac{L_x\sigma _x^2}{2}\left\| x^{k+1} - x^k\right\| ^2\right) \\&\quad +\frac{2(1-\tau _x)}{\tau _x}\left( B_f(x_f^k,x^*) - B_f(x_g^k,x^*)\right) \\&\quad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+\left( \frac{2}{\tau _x}+\frac{2}{\sigma _x}\right) \delta _x(\xi ^{k-1}). \end{aligned}$$
Collecting like terms, we get
$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( \beta _xL_{xy}^2 + \alpha _x + L_x\sigma _x -\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2 + 2\beta _xL_yB_g(y_g^k,y^*) + \left( \frac{2}{\sigma _x} - \frac{2}{\tau _x}\right) B_f(x_g^k,x^*) \\ {}&\quad - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}B_f(x_f^{k+1},x^*) + \left( \frac{2}{\tau _x} - 2\right) B_f(x_f^k,x^*) \\ {}&\quad - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\ {}&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+\left( \frac{2}{\tau _x}+\frac{2}{\sigma _x}\right) \delta _x(\xi ^{k-1}). \end{aligned}$$
Using the definitions of \(\tau _x\), \(\alpha _x\) and \(\beta _x\), we get
$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( \mu _x + L_x\sigma _x -\frac{1}{2\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + B_g(y_g^k,y^*) - B_f(x_g^k,x^*) \\ {}&\quad - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}B_f(x_f^{k+1},x^*) + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) \\ {}&\quad - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\ {}&\quad + \delta _y(\xi ^{k-1})+\left( \frac{4}{\sigma _x}+1 \right) \delta _x(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2. \end{aligned}$$
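The last step relies on a handful of elementary consequences of definitions (8), (9) and (10). They are collected here as a sketch for the reader's convenience, using additionally that \(B_g(y_g^k,y^*) \ge 0\) and \(\delta _y(\xi ^{k-1}) \ge 0\):
$$\begin{aligned} \frac{2}{\tau _x} = \frac{2}{\sigma _x} + 1&\;\Longrightarrow \; \frac{2}{\sigma _x} - \frac{2}{\tau _x} = -1, \quad \frac{2}{\tau _x} - 2 = \frac{2}{\sigma _x} - 1, \quad \frac{2}{\tau _x} + \frac{2}{\sigma _x} = \frac{4}{\sigma _x} + 1, \\ \alpha _x = \mu _x&\;\Longrightarrow \; (\alpha _x - \mu _x)\Vert x_g^k - x^*\Vert ^2 = 0, \\ \beta _x \le \frac{1}{2L_y}&\;\Longrightarrow \; 2\beta _xL_y \le 1, \\ \beta _x \le \frac{1}{2\eta _x L_{xy}^2}&\;\Longrightarrow \; \beta _xL_{xy}^2 - \frac{1}{\eta _x} \le -\frac{1}{2\eta _x}. \end{aligned}$$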
\(\square\)
Lemma A.2
Let us introduce several definitions.
$$\begin{aligned} \tau _y = (\sigma _y^{-1}+1/2)^{-1}, \end{aligned}$$
(17)
$$\begin{aligned} \alpha _y=\mu _y, \end{aligned}$$
(18)
$$\begin{aligned} \beta _y=\min \left\{ \frac{1}{2L_x}, \frac{1}{2\eta _y L_{xy}^2} \right\} . \end{aligned}$$
(19)
Then the following inequality holds
$$\begin{aligned} \begin{aligned} \frac{1}{\eta _y}\mathbb {E}_{\xi _x^k, \xi _y^k}\Vert y^{k+1} - y^*\Vert ^2&\le \left( \frac{1}{\eta _y}-\mu _y-\beta _y \mu _{xy}^2 \right) \Vert y^k - y^*\Vert ^2 \\ {}&\quad + \left( \mu _y + L_y\sigma _y - \frac{1}{2\eta _y} \right) \mathbb {E}_{\xi _x^k, \xi _y^k}\Vert y^{k+1} - y^k\Vert ^2 \\ {}&\quad +B_f(x_g^k,x^*)-B_g(y_g^k,y^*)-\frac{2}{\sigma _y}\mathbb {E}_{\xi _x^k, \xi _y^k}B_g(y_f^{k+1},y^*) \\ {}&\quad + \left( \frac{2}{\sigma _y} - 1 \right) B_g(y_f^k,y^*) +2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle A(x^{k+1}-x^*), y^{k+1} - y^* \rangle \\ {}&\quad + \delta _x(\xi ^{k-1}) + \left( \frac{4}{\sigma _y}+1 \right) \delta _y(\xi ^{k-1}) + \beta _y\sigma _f^2 + 2\eta _y\sigma _g^2. \end{aligned} \end{aligned}$$
(20)
Proof
The proof is similar to the proof of the previous lemma. \(\square\)
Lemma A.3
Let \(\eta _x\) be defined as
$$\begin{aligned} \eta _x = \min \left\{ \frac{1}{4(\mu _x + L_x\sigma _x)}, \frac{\omega }{4L_{xy}} \right\} , \end{aligned}$$
and let \(\eta _y\) be defined as
$$\begin{aligned} \eta _y = \min \left\{ \frac{1}{4(\mu _y + L_y\sigma _y)}, \frac{1}{4L_{xy}\omega } \right\} , \end{aligned}$$
where \(\omega > 0\) is a parameter. Let \(\theta\) be defined as
$$\begin{aligned} \theta = \theta (\omega , \sigma _x, \sigma _y) = 1 - \max \{\rho _a(\omega , \sigma _x, \sigma _y),\rho _b(\omega , \sigma _x, \sigma _y),\rho _c(\omega , \sigma _x, \sigma _y),\rho _d(\omega , \sigma _x, \sigma _y)\}, \end{aligned}$$
where
\(\rho _a(\omega , \sigma _x, \sigma _y) = \left[ \max \left\{ \frac{4(\mu _x + L_x\sigma _x)}{\mu _x}, \frac{2}{\sigma _x}, \frac{4(\mu _y + L_y\sigma _y)}{\mu _{y}}, \frac{2}{\sigma _y}, \frac{4L_{xy}}{\mu _x\omega }, \frac{4L_{xy}\omega }{\mu _y} \right\} \right] ^{-1},\)
\(\rho _b(\omega , \sigma _x, \sigma _y) = \left[ \max \left\{ \frac{4(\mu _x + L_x\sigma _x)}{\mu _x}, \frac{2}{\sigma _x}, \frac{8L_x(\mu _y + L_y\sigma _y)}{\mu _{xy}^2}, \frac{2}{\sigma _y}, \frac{2L_{xy}^2}{\mu _{xy}^2}, \frac{8L_x L_{xy}\omega }{\mu _{xy}^2},\frac{4L_{xy}}{\mu _x\omega } \right\} \right] ^{-1},\)
\(\rho _c(\omega , \sigma _x, \sigma _y) = \left[ \max \left\{ \frac{4(\mu _y + L_y\sigma _y)}{\mu _y}, \frac{2}{\sigma _y}, \frac{8L_y(\mu _x + L_x\sigma _x)}{\mu _{yx}^2}, \frac{2}{\sigma _x}, \frac{2L_{xy}^2}{\mu _{yx}^2}, \frac{8L_y L_{xy}}{\mu _{yx}^2\omega },\frac{4L_{xy}\omega }{\mu _y} \right\} \right] ^{-1},\)
\(\rho _d(\omega , \sigma _x, \sigma _y) = \left[ \max \left\{ \frac{8L_y(\mu _x + L_x\sigma _x)}{\mu _{yx}^2}, \frac{2}{\sigma _x}, \frac{8L_x(\mu _y + L_y\sigma _y)}{\mu _{xy}^2}, \frac{2}{\sigma _y}, \frac{8L_y L_{xy}}{\mu _{yx}^2\omega },\frac{8L_x L_{xy}\omega }{\mu _{xy}^2},\frac{2L_{xy}^2}{\mu _{yx}^2},\frac{2L_{xy}^2}{\mu _{xy}^2} \right\} \right] ^{-1}.\)
Let \(\Psi ^k\) be the following Lyapunov function:
$$\begin{aligned} \begin{aligned} \Psi ^k&= \frac{1}{\eta _x}\Vert x^k - x^*\Vert ^2+\frac{1}{\eta _y}\Vert y^k - y^*\Vert ^2+\frac{2}{\sigma _x}B_f(x_f^k, x^*)+\frac{2}{\sigma _y}B_g(y_f^k, y^*) \\ {}&\quad +\frac{1}{4\eta _y}\Vert y^k - y^{k-1}\Vert ^2-2\langle y^k-y^{k-1}, A(x^k-x^*) \rangle . \end{aligned} \end{aligned}$$
(21)
Then, the following inequalities hold
$$\begin{aligned} \Psi ^k \ge \frac{3}{4\eta _x}\Vert x^k - x^*\Vert ^2 + \frac{1}{\eta _y}\Vert y^k - y^*\Vert ^2, \end{aligned}$$
(22)
$$\begin{aligned} \mathbb {E}\Psi ^{k+1}\le \theta \mathbb {E}\Psi ^k + \frac{4}{1-\theta }\left( \delta _x+\delta _y\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
(23)
Proof
The proof of this lemma is similar to the proof of Lemma B.4 in Kovalev et al. (2021).
Adding inequalities (11) and (20), we get
$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \left( \mu _x + L_x\sigma _x -\frac{1}{2\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + \left( \mu _y + L_y\sigma _y -\frac{1}{2\eta _y}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) \\ {}&\quad + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} - y_m^k,\textbf{A}(x^{k+1} - x^*)\rangle \\ {}&\quad + \left( 2 + \frac{4}{\sigma _x}\right) \delta _x(\xi ^{k-1})+\left( 2 + \frac{4}{\sigma _y}\right) \delta _y(\xi ^{k-1}) + (\beta _y + 2\eta _x)\sigma _f^2 + (\beta _x + 2\eta _y)\sigma _g^2, \end{aligned}$$
where \(\mathrm {(LHS)}\) is given as
$$\begin{aligned} \mathrm {(LHS)}&= \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2 + \frac{1}{\eta _y}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^*\right\| ^2 \\ {}&\quad + \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}B_f(x_f^{k+1},x^*) + \frac{2}{\sigma _y}\mathbb {E}_{\xi _x^k, \xi _y^k}B_g(y_f^{k+1},y^*). \end{aligned}$$
Using Line (6) of Algorithm 1, we get
$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \left( \mu _x + L_x\sigma _x -\frac{1}{2\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + \left( \mu _y + L_y\sigma _y -\frac{1}{2\eta _y}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k+1} - x^*)\rangle \\ {}&\quad + \left( 2 + \frac{4}{\sigma _x}\right) \delta _x(\xi ^{k-1})+\left( 2 + \frac{4}{\sigma _y}\right) \delta _y(\xi ^{k-1}) + (\beta _y + 2\eta _x)\sigma _f^2 + (\beta _x + 2\eta _y)\sigma _g^2. \end{aligned}$$
Using Assumption (1.7) we get
$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \left( \mu _x + L_x\sigma _x -\frac{1}{2\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + \left( \mu _y + L_y\sigma _y -\frac{1}{2\eta _y}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + 2\theta L_{xy}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^k - y^{k-1} \right\| \left\| x^{k+1} - x^k \right\| \\ {}&\quad + \left( 2 + \frac{4}{\sigma _x}\right) \delta _x(\xi ^{k-1})+\left( 2 + \frac{4}{\sigma _y}\right) \delta _y(\xi ^{k-1}) + (\beta _y + 2\eta _x)\sigma _f^2 + (\beta _x + 2\eta _y)\sigma _g^2. \end{aligned}$$
Using the definitions of \(\eta _x\) and \(\eta _y\) and the fact that \(\theta < 1\), we get
$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad - \frac{1}{4\eta _x} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{\theta }{2\sqrt{\eta _x\eta _y}}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^k - y^{k-1} \right\| \left\| x^{k+1} - x^k \right\| \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \left( \beta _y + \frac{\omega }{2L_{xy}}\right) \sigma _f^2 + \left( \beta _x + \frac{1}{2L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
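The coefficients in this display follow from the step-size choices in the statement of the lemma. A sketch of the elementary bounds involved, written for the \(x\) variables (the \(y\) versions are symmetric):
$$\begin{aligned} \eta _x \le \frac{1}{4(\mu _x + L_x\sigma _x)}&\;\Longrightarrow \; \mu _x + L_x\sigma _x - \frac{1}{2\eta _x} \le -\frac{1}{4\eta _x}, \\ \eta _x\eta _y \le \frac{\omega }{4L_{xy}}\cdot \frac{1}{4L_{xy}\omega } = \frac{1}{16L_{xy}^2}&\;\Longrightarrow \; 2\theta L_{xy} \le \frac{\theta }{2\sqrt{\eta _x\eta _y}}, \\ \eta _x \le \frac{\omega }{4L_{xy}}, \quad \eta _y \le \frac{1}{4L_{xy}\omega }&\;\Longrightarrow \; 2\eta _x \le \frac{\omega }{2L_{xy}}, \quad 2\eta _y \le \frac{1}{2L_{xy}\omega }. \end{aligned}$$
For the inexactness terms, each of \(\rho _a, \rho _b, \rho _c, \rho _d\) is at most \(\sigma _x/2\), so \(1-\theta \le \sigma _x/2\) and hence \(4/(1-\theta ) \ge 8/\sigma _x\). This dominates \(2 + 4/\sigma _x\) whenever \(\sigma _x \le 2\), for instance when \(\sigma _x \in (0,1]\); that range is an assumption of this remark and is not restated in the appendix. The same argument applies to \(\sigma _y\).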
Transforming this inequality we get
$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad - \frac{1}{4\eta _x} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{\theta }{4\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + \frac{\theta }{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \left( \beta _y + \frac{\omega }{2L_{xy}}\right) \sigma _f^2 + \left( \beta _x + \frac{1}{2L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
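The transformation above is Young's inequality applied to the product term. As a sketch, with the shorthand \(a = \Vert x^{k+1} - x^k\Vert\) and \(b = \Vert y^k - y^{k-1}\Vert\) (introduced here only for illustration):
$$\begin{aligned} 0 \le \left( \frac{a}{\sqrt{\eta _x}} - \frac{b}{\sqrt{\eta _y}}\right) ^2 \;\Longrightarrow \; \frac{ab}{\sqrt{\eta _x\eta _y}} \le \frac{a^2}{2\eta _x} + \frac{b^2}{2\eta _y} \;\Longrightarrow \; \frac{\theta }{2\sqrt{\eta _x\eta _y}}\,ab \le \frac{\theta }{4\eta _x}a^2 + \frac{\theta }{4\eta _y}b^2 . \end{aligned}$$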
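Since \(\theta < 1\), the two terms in \(\mathbb {E}_{\xi _x^k, \xi _y^k}\Vert x^{k+1} - x^k\Vert ^2\) combine with the nonpositive coefficient \((\theta - 1)/(4\eta _x)\) and can be dropped, which gives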
$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \frac{\theta }{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \left( \beta _y + \frac{\omega }{2L_{xy}}\right) \sigma _f^2 + \left( \beta _x + \frac{1}{2L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
Using the definition of \(\beta _x\) and \(\beta _y\) we get
$$\begin{aligned} \mathrm {(LHS)}&\le \left( 1 - \eta _x\mu _x - \min \left\{ \frac{\eta _x\mu _{yx}^2}{2L_y},\frac{\mu _{yx}^2}{2L_{xy}^2}\right\} \right) \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( 1 - \eta _y\mu _y - \min \left\{ \frac{\eta _y\mu _{xy}^2}{2L_x},\frac{\mu _{xy}^2}{2L_{xy}^2}\right\} \right) \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \frac{\theta }{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
Since \(a + b \ge \max \{a, b\}\) for nonnegative \(a\) and \(b\), we get
$$\begin{aligned} \mathrm {(LHS)}&\le \left( 1 - \max \left\{ \eta _x\mu _x, \min \left\{ \frac{\eta _x\mu _{yx}^2}{2L_y},\frac{\mu _{yx}^2}{2L_{xy}^2}\right\} \right\} \right) \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( 1 - \max \left\{ \eta _y\mu _y, \min \left\{ \frac{\eta _y\mu _{xy}^2}{2L_x},\frac{\mu _{xy}^2}{2L_{xy}^2}\right\} \right\} \right) \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \frac{\theta }{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
Using the definition of \(\theta\) we get
$$\begin{aligned} \mathrm {(LHS)}&\le \theta \left( \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 + \frac{1}{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 \right) \\ {}&\quad + \theta \left( -2\langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle + \frac{2}{\sigma _x}B_f(x_f^k,x^*) + \frac{2}{\sigma _y}B_g(y_f^k,y^*) \right) \\ {}&\quad - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
Taking the expectation over all random variables, rearranging, and using the definition of \(\Psi ^k\) together with the bounds \(\mathbb {E}\delta _x(\xi ^{k-1}) \le \delta _x\) and \(\mathbb {E}\delta _y(\xi ^{k-1}) \le \delta _y\), we get
$$\begin{aligned} \mathbb {E}\Psi ^{k+1}\le \theta \mathbb {E}\Psi ^k + \frac{4}{1-\theta }\left( \delta _x+\delta _y\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
Finally, using the definition of \(\Psi ^k\), \(\eta _x\) and \(\eta _y\) we get
$$\begin{aligned} \Psi ^k&\ge \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 + \frac{1}{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - 2\langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\ge \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 + \frac{1}{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - 2L_{xy}\left\| y^k -y^{k-1} \right\| \left\| x^{k} - x^* \right\| \\ {}&\ge \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 + \frac{1}{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - \frac{1}{2\sqrt{\eta _x\eta _y}}\left\| y^k -y^{k-1} \right\| \left\| x^{k} - x^* \right\| \\ {}&\ge \frac{3}{4\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2. \end{aligned}$$
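As a brief check of the last inequality in this chain (a sketch, with the shorthand \(a = \Vert y^k - y^{k-1}\Vert\) and \(b = \Vert x^k - x^*\Vert\) introduced only for this remark), expanding a square gives
$$\begin{aligned} 0 \le \left( \frac{a}{2\sqrt{\eta _y}} - \frac{b}{2\sqrt{\eta _x}}\right) ^2 = \frac{a^2}{4\eta _y} - \frac{ab}{2\sqrt{\eta _x\eta _y}} + \frac{b^2}{4\eta _x} \quad \Longrightarrow \quad \frac{a^2}{4\eta _y} - \frac{ab}{2\sqrt{\eta _x\eta _y}} \ge -\frac{b^2}{4\eta _x}. \end{aligned}$$
Adding \(\frac{1}{\eta _x}b^2\) to both sides yields the coefficient \(\frac{3}{4\eta _x}\). The step before it presumably relies on \(\eta _x \le \frac{\omega }{4L_{xy}}\) and \(\eta _y \le \frac{1}{4L_{xy}\omega }\), which give \(\eta _x\eta _y \le \frac{1}{16L_{xy}^2}\) and hence \(2L_{xy} \le \frac{1}{2\sqrt{\eta _x\eta _y}}\).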
\(\square\)
We now return to the proof of Theorem (2.7).
Let \(\Sigma ^2 \triangleq \left( \frac{1}{L_x}+\frac{\omega }{L_{xy}} \right) \sigma _f^2+\left( \frac{1}{L_y}+\frac{1}{L_{xy}\omega } \right) \sigma _g^2\). Then
$$\begin{aligned} \mathbb {E}\Psi ^k&\le \theta ^k \Psi ^0 + \left( \frac{4}{1-\theta }(\delta _x+\delta _y)+\frac{\Sigma ^2}{2}\right) (1 + \theta + \theta ^2 + \dots ) \\ {}&\le \theta ^k \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}, \end{aligned}$$
$$\begin{aligned} \theta ^k \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )} \ge \mathbb {E}\Psi ^k \ge \frac{3}{4\eta _x}\mathbb {E}\Vert x^k - x^*\Vert ^2 + \frac{1}{\eta _y}\mathbb {E}\Vert y^k - y^*\Vert ^2. \end{aligned}$$
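The upper bound on \(\mathbb {E}\Psi ^k\) used above can be obtained by unrolling the one-step recursion. The following sketch uses the shorthand \(c = \frac{4}{1-\theta }(\delta _x+\delta _y)+\frac{\Sigma ^2}{2}\), which is not part of the original notation, and assumes \(0 \le \theta < 1\):
$$\begin{aligned} \mathbb {E}\Psi ^k&\le \theta \mathbb {E}\Psi ^{k-1} + c \le \theta ^2\mathbb {E}\Psi ^{k-2} + c(1+\theta ) \le \dots \le \theta ^k\Psi ^0 + c\sum _{j=0}^{k-1}\theta ^j \\ {}&\le \theta ^k\Psi ^0 + \frac{c}{1-\theta } = \theta ^k\Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y) + \frac{\Sigma ^2}{2(1-\theta )}. \end{aligned}$$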
Using the definitions of \(\eta _x\) and \(\eta _y\), we get
$$\begin{aligned} \mathbb {E}\Vert x^k - x^*\Vert ^2 \le \frac{\omega }{3L_{xy}}\left( \theta ^k \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}\right) , \end{aligned}$$
(24)
$$\begin{aligned} \mathbb {E}\Vert y^k - y^*\Vert ^2 \le \frac{1}{4L_{xy}\omega }\left( \theta ^k \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}\right) . \end{aligned}$$
(25)
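The constants in (24) and (25) can be traced as follows. This is a sketch that assumes the step sizes satisfy \(\eta _x \le \frac{\omega }{4L_{xy}}\) and \(\eta _y \le \frac{1}{4L_{xy}\omega }\), i.e. the second entries of the minima in their definitions:
$$\begin{aligned} \mathbb {E}\Vert x^k - x^*\Vert ^2&\le \frac{4\eta _x}{3}\mathbb {E}\Psi ^k \le \frac{4}{3}\cdot \frac{\omega }{4L_{xy}}\mathbb {E}\Psi ^k = \frac{\omega }{3L_{xy}}\mathbb {E}\Psi ^k, \\ \mathbb {E}\Vert y^k - y^*\Vert ^2&\le \eta _y\mathbb {E}\Psi ^k \le \frac{1}{4L_{xy}\omega }\mathbb {E}\Psi ^k. \end{aligned}$$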
Moreover, for these definitions we know from Kovalev et al. (2021) that
$$\begin{aligned} \frac{1}{\rho _a}&\le 4 + 4\max \left\{ \sqrt{\frac{L_x}{\mu _x}}, \sqrt{\frac{L_y}{\mu _y}},\frac{L_{xy}}{\sqrt{\mu _x\mu _y}}\right\} \\&\quad \text { for } \omega = \sqrt{\frac{\mu _y}{\mu _x}}, \sigma _x = \sqrt{\frac{\mu _x}{2L_x}},\sigma _y = \sqrt{\frac{\mu _y}{2L_y}},\\ \frac{1}{\rho _b}&\le 4+8\max \left\{ \frac{\sqrt{L_xL_y}}{\mu _{xy}}, \frac{L_{xy}}{\mu _{xy}}\sqrt{\frac{L_x}{\mu _x}}, \frac{L_{xy}^2}{\mu _{xy}^2} \right\} \\&\quad \text { for } \omega = \sqrt{\frac{\mu _{xy}^2}{2\mu _xL_x}}, \sigma _x = \sqrt{\frac{\mu _x}{2L_x}},\sigma _y =\min \left\{ 1,\sqrt{\frac{\mu _{xy}^2}{4L_xL_y}}\right\} ,\\ \frac{1}{\rho _c}&\le 4+8\max \left\{ \frac{\sqrt{L_xL_y}}{\mu _{yx}}, \frac{L_{xy}}{\mu _{yx}}\sqrt{\frac{L_y}{\mu _y}}, \frac{L_{xy}^2}{\mu _{yx}^2} \right\} \\&\quad \text { for } \omega = \sqrt{\frac{2\mu _yL_y}{\mu _{yx}^2}},\sigma _x =\min \left\{ 1,\sqrt{\frac{\mu _{yx}^2}{4L_xL_y}}\right\} ,\sigma _y = \sqrt{\frac{\mu _y}{2L_y}},\\ \frac{1}{\rho _d}&\le 2+8\max \left\{ \frac{\sqrt{L_xL_y}L_{xy}}{\mu _{xy}\mu _{yx}}, \frac{L_{xy}^2}{\mu _{yx}^2}, \frac{L_{xy}^2}{\mu _{xy}^2} \right\} \\&\quad \text { for } \omega = \frac{\mu _{xy}}{\mu _{yx}}\sqrt{\frac{L_y}{L_x}}, \sigma _x = \min \left\{ 1,\sqrt{\frac{\mu _{yx}^2}{4L_xL_y}}\right\} ,\sigma _y =\min \left\{ 1,\sqrt{\frac{\mu _{xy}^2}{4L_xL_y}}\right\} , \\ \frac{1}{1-\theta }&= \min \{ \rho _a^{-1}, \rho _b^{-1}, \rho _c^{-1}, \rho _d^{-1}\}. \end{aligned}$$
Note that introducing batches and choosing \(\omega = \sqrt{\frac{\mu _y}{\mu _x}}, \sigma _x = \sqrt{\frac{\mu _x}{2L_x}},\sigma _y = \sqrt{\frac{\mu _y}{2L_y}}\) proves Theorem (2.7).
Rewriting the inequalities in the batch setting and assuming \(\delta _x=\delta _y=0\), we get
$$\begin{aligned} \mathbb {E}\Vert x^k - x^*\Vert ^2 \le \frac{\omega }{3L_{xy}}\left( \theta ^k \Psi ^0+\frac{1}{2(1-\theta )}\left( \left( \frac{1}{L_x}+\frac{\omega }{L_{xy}}\right) \frac{\sigma _f^2}{r_f}+\left( \frac{1}{L_y}+\frac{1}{L_{xy}\omega } \right) \frac{\sigma _g^2}{r_g}\right) \right) , \\ \mathbb {E}\Vert y^k - y^*\Vert ^2 \le \frac{1}{4L_{xy}\omega }\left( \theta ^k \Psi ^0+\frac{1}{2(1-\theta )}\left( \left( \frac{1}{L_x}+\frac{\omega }{L_{xy}}\right) \frac{\sigma _f^2}{r_f}+\left( \frac{1}{L_y}+\frac{1}{L_{xy}\omega }\right) \frac{\sigma _g^2}{r_g}\right) \right) . \end{aligned}$$
Therefore, we can estimate the number of algorithm iterations as \(N = \mathcal {O} \left( \frac{1}{1-\theta }\log {\frac{C}{\varepsilon }}\right)\), where \(C\) is polynomial and does not depend on \(\varepsilon\). Rewriting this, we obtain \(N = \mathcal {O} \left( \min \{ \rho _a^{-1}, \rho _b^{-1}, \rho _c^{-1}, \rho _d^{-1}\}\log {\frac{C}{\varepsilon }}\right) .\)
It is sufficient to take the batch sizes \(r_f = \Bigg \lceil \frac{\max \{\omega , \omega ^{-1}\}}{2L_{xy}(1-\theta )\varepsilon }\left( \frac{1}{L_x}+\frac{\omega }{L_{xy}}\right) \sigma _f^2\Bigg \rceil\), \(r_g = \Bigg \lceil \frac{\max \{\omega , \omega ^{-1}\}}{2L_{xy}(1-\theta )\varepsilon }\left( \frac{1}{L_y}+\frac{1}{L_{xy}\omega }\right) \sigma _g^2\Bigg \rceil .\)
Rewriting these with the selected constants, we get
$$\begin{aligned} r_f = \Bigg \lceil \max \left\{ \sqrt{\frac{L_x}{\mu _x}}, \sqrt{\frac{L_y}{\mu _y}},\frac{L_{xy}}{\sqrt{\mu _x\mu _y}}\right\} \frac{\mu }{2L_{xy}\sqrt{\mu _x\mu _y}\varepsilon }\left( \frac{1}{L_x} + \frac{1}{L_{xy}}\sqrt{\frac{\mu _y}{\mu _x}}\right) \sigma _f^2 \Bigg \rceil , \\ r_g = \Bigg \lceil \max \left\{ \sqrt{\frac{L_x}{\mu _x}}, \sqrt{\frac{L_y}{\mu _y}},\frac{L_{xy}}{\sqrt{\mu _x\mu _y}}\right\} \frac{\mu }{2L_{xy}\sqrt{\mu _x\mu _y}\varepsilon }\left( \frac{1}{L_y} + \frac{1}{L_{xy}}\sqrt{\frac{\mu _x}{\mu _y}}\right) \sigma _g^2\Bigg \rceil , \end{aligned}$$
where \(\mu = \max \{\mu _x, \mu _y\}\).
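For reference, the substitution behind these formulas can be sketched as follows for \(\omega = \sqrt{\frac{\mu _y}{\mu _x}}\). Reading the factor \(\frac{1}{1-\theta }\) as being replaced by the maximum from the bound on \(\rho _a^{-1}\), with absolute constants dropped, is our interpretation and not stated explicitly:
$$\begin{aligned} \max \{\omega , \omega ^{-1}\}&= \max \left\{ \sqrt{\frac{\mu _y}{\mu _x}}, \sqrt{\frac{\mu _x}{\mu _y}}\right\} = \frac{\max \{\mu _x, \mu _y\}}{\sqrt{\mu _x\mu _y}} = \frac{\mu }{\sqrt{\mu _x\mu _y}}, \\ \frac{\omega }{L_{xy}}&= \frac{1}{L_{xy}}\sqrt{\frac{\mu _y}{\mu _x}}, \qquad \frac{1}{L_{xy}\omega } = \frac{1}{L_{xy}}\sqrt{\frac{\mu _x}{\mu _y}}. \end{aligned}$$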
Appendix B: Decentralized setting
We obtain Algorithm 4 from Algorithm (2) by multiplying every line by \(\frac{{\textbf {1}}}{n}\), where \({\textbf {1}}\) is a column of ones.
Algorithm 4 shows what happens to the average values at the nodes.
Keeping the values \(X\) and \(Y\) in the neighborhood of \(\mathcal {C}(d_x)\) and \(\mathcal {C}(d_y)\) and using Lemma (2.1), the conditions of Theorem (2.7) hold.
Using Assumptions (1.5) and (1.6), we get
$$\begin{aligned} \begin{aligned} \mathbb {E}_{\xi _{x, k}}\Vert \nabla f_{\delta }(\overline{x}_g^k, \xi _{x, k}) - \nabla f_{\delta }(\overline{x}_g^k)\Vert ^2 \le \frac{\sum _{i=1}^n \sigma _{f, i}^2/r_{f, i}}{n^2} \triangleq \frac{\sigma _{F, r}^2}{n}, \\ \mathbb {E}_{\xi _{y, k}}\Vert \nabla g_{\delta }(\overline{y}_g^k, \xi _{y, k}) - \nabla g_{\delta }(\overline{y}_g^k)\Vert ^2 \le \frac{\sum _{i=1}^n \sigma _{g, i}^2/r_{g, i}}{n^2} \triangleq \frac{\sigma _{G,r}^2}{n}. \end{aligned} \end{aligned}$$
(26)
Let the number of Consensus iterations be sufficiently large to guarantee \(\mathbb {E}\Vert X^k - \overline{X^k}\Vert \le \sqrt{\delta '}\) and \(\mathbb {E}\Vert Y^k - \overline{Y^k}\Vert \le \sqrt{\delta '}.\)
We introduce the following definitions, which correspond to Lemma (2.1):
$$\begin{aligned} \delta _x = \frac{1}{2n}\left( \frac{L_{lx}^2}{L_{x}} + \frac{2L_{lx}^2}{\mu _{x}} + L_{lx} - \mu _{lx} \right) \delta ', \\ \delta _y = \frac{1}{2n}\left( \frac{L_{ly}^2}{L_{y}} + \frac{2L_{ly}^2}{\mu _{y}} + L_{ly} - \mu _{ly} \right) \delta ', \end{aligned}$$
\(\hat{L_x} = 2L_{x}\), \(\hat{L_y} = 2L_{y}\), \(\hat{\mu _x} = \mu _{x}/2\), \(\hat{\mu _y} = \mu _{y}/2.\)
Consider iteration \(k \ge 1\). Assuming that \(\mathbb {E}\Vert X^t - \overline{X^t}\Vert \le \sqrt{\delta '}\) and \(\mathbb {E}\Vert Y^t - \overline{Y^t}\Vert \le \sqrt{\delta '}\) for \(t=0,1,\dots ,k\), we are going to prove the same bounds for \(t=k+1\), using a constant number of consensus iterations.
Using Lines (10) and (6) of Algorithm 2, we get
$$\begin{aligned} X_g^k = \tau _x X^k + (1-\tau _x) X_f^k = (\tau _x+(1-\tau _x)\sigma _x) X^k - (1-\tau _x)\sigma _x X^{k-1} + (1-\tau _x)X_g^{k-1}. \end{aligned}$$
Define \(V^k = X_g^k-\sigma _x X^k\). Using \(X_g^0 = X^0\), we get \(V^0 = (1-\sigma _x) X^0\) and \(\Vert V^0-\overline{V^0}\Vert \le (1-\sigma _x)\sqrt{\delta '}.\) For \(k \ge 1\), the identity above gives
$$\begin{aligned} V^k&= (1-\sigma _x)\tau _x X^k + (1-\tau _x)V^{k-1}, \\ V^k-\overline{V^k}&= (1-\sigma _x)\tau _x \left( X^k-\overline{X^k} \right) + (1-\tau _x)\left( V^{k-1}-\overline{V^{k-1}} \right) , \\ \mathbb {E}\Vert V^k-\overline{V^k}\Vert&\le (1-\sigma _x)\tau _x\sqrt{\delta '} + (1-\tau _x)(1-\sigma _x)\sqrt{\delta '} = (1-\sigma _x)\sqrt{\delta '}. \end{aligned}$$
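The recursion for \(V^k\) in the first line can be verified by substituting the expression for \(X_g^k\) into the definition \(V^k = X_g^k - \sigma _x X^k\); a short sketch of the algebra:
$$\begin{aligned} V^k&= (\tau _x+(1-\tau _x)\sigma _x - \sigma _x) X^k + (1-\tau _x)\left( X_g^{k-1} - \sigma _x X^{k-1}\right) \\ {}&= \tau _x(1-\sigma _x) X^k + (1-\tau _x)V^{k-1}. \end{aligned}$$
The last line of the display then follows by induction: if \(\mathbb {E}\Vert V^{k-1}-\overline{V^{k-1}}\Vert \le (1-\sigma _x)\sqrt{\delta '}\), the two coefficients \(\tau _x\) and \(1-\tau _x\) sum to one and the same bound holds for \(V^k\).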
Let us now estimate \(X_f^k\), \(k\ge 1\). Using Line (10), we get
$$\begin{aligned} X_f^k = V^{k-1} + \sigma _x X^k, \\ \mathbb {E}\Vert X_f^k - \overline{X_f^k}\Vert \le \sqrt{\delta '}. \end{aligned}$$
Let us now estimate \(X_g^k\) and \(Y_m^k\). Using Lines (6) and (5), we get
$$\begin{aligned} \mathbb {E}\Vert X_g^k - \overline{X_g^k}\Vert \le \sqrt{\delta '}, \\ \mathbb {E}\Vert Y_m^k - \overline{Y_m^k}\Vert \le (1+2\theta )\sqrt{\delta '}. \end{aligned}$$
The estimates for \(Y_g^k\) and \(Y_f^k\) are similar.
Let us now estimate \(\mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert\). Using Line (8), we get
$$\begin{aligned} U^{k+1}-\overline{U^{k+1}}&=(1-\eta _x\alpha _x)\left( X^k-\overline{X^k}\right) +\eta _x\alpha _x\left( X_g^k-\overline{X_g^k}\right) \\ {}&\quad - \eta _x\beta _x A^T\left( A \left( X^k-\overline{X^k}\right) -\left( \nabla ^r G(Y_g^k, \xi _{y, k})-\overline{\nabla ^r G(Y_g^k, \xi _{y, k})}\right) \right) \\ {}&\quad - \eta _x\left( \left( \nabla ^r F(X_g^k, \xi _{x, k})-\overline{\nabla ^r F(X_g^k, \xi _{x, k})}\right) + A^T \left( Y_m^k-\overline{Y_m^k}\right) \right) . \end{aligned}$$
Using \(\eta _x\alpha _x \le 1\) and the previous estimates, we get
$$\begin{aligned} \mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert&\le (1-\eta _x\alpha _x)\sqrt{\delta '} + \eta _x\alpha _x\sqrt{\delta '} + \eta _x\beta _x L_{xy}^2\sqrt{\delta '} \\ {}&\quad + \eta _x\beta _x L_{xy}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert +\eta _x\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert +\eta _x L_{xy}(1+2\theta )\sqrt{\delta '} \\ {}&= (1+\eta _x\beta _x L_{xy}^2+\eta _x L_{xy}(1+2\theta ))\sqrt{\delta '}+\eta _x\beta _x L_{xy}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert \\ {}&\quad + \eta _x\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert . \end{aligned}$$
We now estimate \(\mathbb {E}\left\| \nabla ^r G(Y_g^k, \xi _{y, k}) \right\|\) and \(\mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) \right\|\).
$$\begin{aligned} \mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) \right\|&\le \mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) - \nabla F(X_g^k) \right\| + \mathbb {E}\left\| \nabla F(X_g^k) - \nabla F(\overline{X_g^k}) \right\| \\ {}&\quad + \mathbb {E}\Vert \nabla F(\overline{X_g^k}) - \nabla F(X^*)\Vert + \Vert \nabla F(X^*)\Vert \\ {}&\le \left( \mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) - \nabla F(X_g^k)\right\| ^2\right) ^{\frac{1}{2}} + L_{lx}\mathbb {E}\left\| X_g^k-\overline{X_g^k} \right\| \\ {}&\quad + L_{x}\mathbb {E}\left\| \overline{X_g^k}-X^* \right\| + \left\| \nabla F(X^*) \right\| \\ {}&\le \left( \sum _{i=1}^n\sigma _{f, i}^2/r_{f, i}\right) ^{\frac{1}{2}} + L_{lx}\sqrt{\delta '} + L_{x}\sqrt{n}\mathbb {E}\left\| \overline{x_g^k}-x^* \right\| + \left\| \nabla F(X^*) \right\| . \end{aligned}$$
Let us define \(M_x\)
$$\begin{aligned} M_x^2=\frac{\omega }{3L_{xy}}\left( \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )} \right) , \\ \Sigma ^2 = \left( \frac{1}{2L_{x}}+\frac{\omega }{L_{xy}} \right) \frac{\sigma _{F, r}^2}{n}+\left( \frac{1}{2L_{y}}+\frac{1}{L_{xy}\omega } \right) \frac{\sigma _{G, r}^2}{n}. \end{aligned}$$
We choose the same constants as in Eqs. (24) and (25) for Algorithm 4.
Now we are going to estimate \(\left\| \overline{x_g^k}-x^* \right\|\). From Eqs. (24) and (26) we know that
$$\begin{aligned}{} & {} \mathbb {E}\left\| x^k - x^*\right\| ^2 \le M_x^2,\\{} & {} \mathbb {E}\left\| x^k - x^* \right\| \le \sqrt{\mathbb {E}\left\| x^k - x^*\right\| ^2} \le M_x. \end{aligned}$$
Let \(k \ge 1\). Using Lines (10) and (6) of Algorithm 4, we get
$$\begin{aligned} \overline{x_g^k} = \tau _x\overline{x^k} + (1-\tau _x)\overline{x_f^k} = (\tau _x+(1-\tau _x)\sigma _x) \overline{x^k} - (1-\tau _x)\sigma _x \overline{x^{k-1}} + (1-\tau _x)\overline{x_g^{k-1}}. \end{aligned}$$
Let us define \(\overline{v^k} = \overline{x_g^k} - \sigma _x\overline{x^k}\) and \(v^* = (1-\sigma _x)x^*\). Since \(\overline{v^0} = (1-\sigma _x)\overline{x^0}\), we have \(\Vert \overline{v^0}-v^*\Vert \le (1-\sigma _x)M_x\). Moreover,
$$\begin{aligned} \overline{v^k} = \tau _x(1-\sigma _x)\overline{x^k}+(1-\tau _x)\overline{v^{k-1}}. \end{aligned}$$
First, we estimate \(\mathbb {E}\Vert \overline{v^k}-v^*\Vert\).
$$\begin{aligned} \mathbb {E}\Vert \overline{v^k}-v^*\Vert&\le \tau _x(1-\sigma _x)\mathbb {E}\Vert \overline{x^k}-x^*\Vert +(1-\tau _x)\mathbb {E}\Vert \overline{v^{k-1}}-v^*\Vert \\ {}&\le (\tau _x(1-\sigma _x)+(1-\tau _x)(1-\sigma _x))M_x=(1-\sigma _x)M_x. \end{aligned}$$
Using Line (10), we get
$$\begin{aligned} \overline{x_f^k}&= \overline{v^{k-1}} + \sigma _x\overline{x^k}, \\ \mathbb {E}\Vert \overline{x_f^k}-x^*\Vert&\le \mathbb {E}\Vert \overline{v^{k-1}}-v^*\Vert +\sigma _x\mathbb {E}\Vert \overline{x^k}-x^*\Vert \le (1-\sigma _x)M_x+\sigma _xM_x = M_x. \end{aligned}$$
Let’s estimate \(\mathbb {E}\Vert \overline{x_g^k}-x^*\Vert\). Using Line (5), we get
$$\begin{aligned} \mathbb {E}\Vert \overline{x_g^k}-x^*\Vert \le \tau _x\mathbb {E}\Vert \overline{x^k}-x^*\Vert +(1-\tau _x)\mathbb {E}\Vert \overline{x_f^k}-x^*\Vert \le M_x. \end{aligned}$$
Returning to \(\mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) \right\|\), we get
$$\begin{aligned} \mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) \right\| \le \sqrt{n\sigma _{F, r}^2} + L_{lx}\sqrt{\delta '} + L_{x}\sqrt{n}M_x + \left\| \nabla F(X^*) \right\| . \end{aligned}$$
Let’s define \(M_y\)
$$\begin{aligned} M_y^2=\frac{1}{4L_{xy}\omega }\left( \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )} \right) . \end{aligned}$$
Then we can estimate \(\mathbb {E}\left\| \nabla ^r G(Y_g^k, \xi _{y, k}) \right\|\) in a similar way
$$\begin{aligned} \mathbb {E}\left\| \nabla ^r G(Y_g^k, \xi _{y, k}) \right\| \le \sqrt{n\sigma _{G, r}^2} + L_{ly}\sqrt{\delta '} + L_{y}\sqrt{n}M_y + \left\| \nabla G(Y^*) \right\| . \end{aligned}$$
Lemma B.1
$$\begin{aligned} \max \left\{ \mathbb {E}\left\| U^{k+1}-\overline{U^{k+1}} \right\| , \mathbb {E}\left\| W^{k+1}-\overline{W^{k+1}} \right\| \right\} \le D, \end{aligned}$$
where
$$\begin{aligned}{} & {} D = \max \left\{ D_{x,1}\sqrt{\delta '}+D_{x,2}, D_{y,1}\sqrt{\delta '}+D_{y,2} \right\} , \end{aligned}$$
(27)
$$\begin{aligned}{} & {} D_{y, 2} =\frac{L_{xy}}{2\mu _{y}}D_{x,2}+ \frac{1}{2L_{xy}}\left( \sqrt{n\sigma _{F,r}^2} + L_{x}\sqrt{n}M_x+\Vert \nabla F(X^*)\Vert \right) \nonumber \\{} &\quad\quad\quad {} +\frac{1}{2\mu _{y}}\left( \sqrt{n\sigma _{G,r}^2} + L_{y}\sqrt{n}M_y+\Vert \nabla G(Y^*)\Vert \right) , \end{aligned}$$
(28)
$$\begin{aligned}{} & {} D_{y,1} =\frac{3}{2}+\frac{L_{xy}}{2\mu _{y}}D_{x,1}+\frac{L_{lx}}{2L_{xy}}+\frac{L_{ly}}{2\mu _{y}}, \end{aligned}$$
(29)
$$\begin{aligned}{} & {} D_{x,2}= \frac{1}{2L_{xy}}\left( \sqrt{n\sigma _{G,r}^2} + L_{y}\sqrt{n}M_y+\Vert \nabla G(Y^*)\Vert \right) \nonumber \\{} &\quad\quad\quad {} +\frac{1}{2\mu _{x}}\left( \sqrt{n\sigma _{F,r}^2} + L_{x}\sqrt{n}M_x+\Vert \nabla F(X^*)\Vert \right) , \end{aligned}$$
(30)
$$\begin{aligned}{} & {} D_{x,1} = \frac{3}{2}+\frac{L_{xy}}{2\mu _{x}}(1+2\theta )+\frac{L_{ly}}{2L_{xy}}+\frac{L_{lx}}{2\mu _{x}}, \end{aligned}$$
(31)
$$\begin{aligned}{} & {} M_x^2=\frac{\omega }{3L_{xy}}\left( \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}\right) , \nonumber \\{} & {} M_y^2=\frac{1}{4L_{xy}\omega }\left( \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}\right) , \end{aligned}$$
(32)
$$\begin{aligned}{} & {} \Sigma ^2 = \left( \frac{1}{2L_{x}}+\frac{\omega }{L_{xy}} \right) \frac{\sigma _{F, r}^2}{n} + \left( \frac{1}{2L_{y}}+\frac{1}{L_{xy}\omega }\right) \frac{\sigma _{G, r}^2}{n}. \end{aligned}$$
(33)
Proof
$$\begin{aligned} \mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert&\le (1+\eta _x\beta _x L_{xy}^2+\eta _x L_{xy}(1+2\theta ))\sqrt{\delta '} \\ {}&\quad + \eta _x\beta _x L_{xy}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert +\eta _x\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert . \end{aligned}$$
Using the definitions of \(\eta _x\) and \(\beta _x\) and the gradient estimates, we get
$$\begin{aligned} \begin{aligned} \mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert&\le \left( 1+\frac{1}{2}+\frac{L_{xy}}{4\hat{\mu _x}}(1+2\theta )\right) \sqrt{\delta '}+\frac{1}{2L_{xy}}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert \\ {}&\quad + \frac{1}{4\hat{\mu _x}}\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert \\ {}&\le \left( \frac{3}{2}+\frac{L_{xy}}{4\hat{\mu _x}}(1+2\theta )+\frac{L_{ly}}{2L_{xy}}+\frac{L_{lx}}{4\hat{\mu _x}}\right) \sqrt{\delta '} \\ {}&\quad + \frac{1}{2L_{xy}}\left( \sqrt{n\sigma _{G, r}^2} + L_{y}\sqrt{n}M_y+\Vert \nabla G(Y^*)\Vert \right) \\ {}&\quad + \frac{1}{4\hat{\mu _x}}\left( \sqrt{n\sigma _{F, r}^2} + L_{x}\sqrt{n}M_x+\Vert \nabla F(X^*)\Vert \right) = D_{x,1}\sqrt{\delta '}+D_{x,2}. \end{aligned} \end{aligned}$$
Let’s estimate \(\mathbb {E}\Vert W^{k+1}-\overline{W^{k+1}}\Vert\)
$$\begin{aligned} W^{k+1}-\overline{W^{k+1}}&=(1-\eta _y\alpha _y)\left( Y^k-\overline{Y^k}\right) +\eta _y\alpha _y\left( Y_g^k-\overline{Y_g^k}\right) \\ {}&\quad - \eta _y\beta _y A\left( A^T \left( Y^k-\overline{Y^k}\right) +\left( \nabla ^r F(X_g^k, \xi _{x, k})-\overline{\nabla ^r F(X_g^k, \xi _{x, k})}\right) \right) \\ {}&\quad - \eta _y\left( \left( \nabla ^r G(Y_g^k, \xi _{y, k})-\overline{\nabla ^r G(Y_g^k, \xi _{y, k})}\right) - A \left( U^{k+1}-\overline{U^{k+1}}\right) \right) . \end{aligned}$$
Using \(\eta _y\alpha _y \le 1\) and the previous estimates, we get
$$\begin{aligned} \mathbb {E}\Vert W^{k+1}-\overline{W^{k+1}}\Vert&\le (1-\eta _y\alpha _y)\sqrt{\delta '} + \eta _y\alpha _y\sqrt{\delta '} + \eta _y\beta _y L_{xy}^2\sqrt{\delta '} \\ {}&\quad + \eta _y\beta _y L_{xy}\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert \\ {}&\quad + \eta _y\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert +\eta _y L_{xy}\mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert \\ {}&= (1+\eta _y\beta _y L_{xy}^2)\sqrt{\delta '}+\eta _y L_{xy}\mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert \\ {}&\quad + \eta _y\beta _y L_{xy}\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert + \eta _y\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert . \end{aligned}$$
Using the definitions of \(\beta _y\) and \(\eta _y\) and the gradient estimates, we get
$$\begin{aligned} \mathbb {E}\Vert W^{k+1}-\overline{W^{k+1}}\Vert&\le \left( 1+\ \frac{1}{2}\right) \sqrt{\delta '}+\frac{L_{xy}}{4\hat{\mu _y}}\mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert +\frac{1}{2L_{xy}}\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert \\ {}&\quad + \frac{1}{4\hat{\mu _y}}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert \\ {}&\le \left( \frac{3}{2}+\frac{L_{lx}}{2L_{xy}}+\frac{L_{ly}}{4\hat{\mu _y}}+\frac{L_{xy}}{4\hat{\mu _y}}D_{x,1} \right) \sqrt{\delta '}+\frac{L_{xy}}{4\hat{\mu _y}}D_{x,2} \\ {}&\quad + \frac{1}{2L_{xy}}\left( \sqrt{n\sigma _{F, r}^2} + L_{x}\sqrt{n}M_x+\Vert \nabla F(X^*)\Vert \right) \\ {}&\quad + \frac{1}{4\hat{\mu _y}}\left( \sqrt{n\sigma _{G, r}^2} + L_{y}\sqrt{n}M_y+\Vert \nabla G(Y^*)\Vert \right) = D_{y,1}\sqrt{\delta '}+D_{y,2}. \end{aligned}$$
\(\square\)
Now let us estimate the number of communication steps T.
$$\begin{aligned} (1-\lambda )^{T/\tau }\max \{\mathbb {E}\Vert W^{k+1}-\overline{W^{k+1}}\Vert , \mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert \} \le \delta '. \end{aligned}$$
It would be sufficient to guarantee
$$\begin{aligned} (1-\lambda )^{T/\tau }D \le \delta '. \end{aligned}$$
This inequality follows from
$$\begin{aligned} T \ge \frac{\tau }{\lambda }\log \left( \frac{D}{\delta '}\right) . \end{aligned}$$
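To see why this choice of \(T\) suffices, one can use the elementary bound \(1-\lambda \le e^{-\lambda }\); the following is a sketch assuming \(\lambda \in (0,1]\) and \(D \ge \delta ' > 0\):
$$\begin{aligned} (1-\lambda )^{T/\tau }D \le \exp \left( -\frac{\lambda T}{\tau }\right) D \le \exp \left( -\log \left( \frac{D}{\delta '}\right) \right) D = \delta '. \end{aligned}$$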
Putting the proof together
Using Lemma (2.1), we get
$$\begin{aligned} \mathbb {E}\left\| \overline{x^k} - x^*\right\| ^2 \le \frac{\omega }{3L_{xy}}\left( \theta ^k\Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+ \frac{\Sigma ^2}{2(1-\theta )} \right) , \\ \mathbb {E}\left\| \overline{y^k} - y^*\right\| ^2 \le \frac{1}{4L_{xy}\omega }\left( \theta ^k\Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )} \right) . \end{aligned}$$
We introduce the notation
$$\begin{aligned} \nu = \max \left\{ \frac{1}{3L_{xy}}\omega , \frac{1}{4L_{xy}}\omega ^{-1} \right\} . \end{aligned}$$
Using the definition of \(\Psi ^k\), we get
$$\begin{aligned} \Psi ^0 = \frac{1}{\eta _x}\Vert x^0-x^*\Vert ^2+\frac{1}{\eta _y}\Vert y^0-y^*\Vert ^2 + \frac{2}{\sigma _x}B_f(x^0,x^*)+\frac{2}{\sigma _y}B_g(y^0,y^*), \end{aligned}$$
where \(\eta _x = \min \left\{ \frac{1}{4(\hat{\mu _x} + \hat{L_x}\sigma _x)}, \frac{\omega }{4L_{xy}} \right\}\), \(\eta _y = \min \left\{ \frac{1}{4(\hat{\mu _y} + \hat{L_y}\sigma _y)}, \frac{1}{4L_{xy}\omega } \right\} .\)
Rewriting these step sizes in terms of \(L_{lx}\), \(L_{x}\), \(L_{ly}\), \(L_{y}\), \(\mu _{lx}\), \(\mu _{x}\), \(\mu _{ly}\), \(\mu _{y}\), we get
\(\eta _x = \min \left\{ \frac{1}{2\mu _{x} + 8L_{x}\sigma _x}, \frac{\omega }{4L_{xy}} \right\}\), \(\eta _y = \min \left\{ \frac{1}{2\mu _{y} + 8L_{y}\sigma _y}, \frac{1}{4L_{xy}\omega } \right\} .\)
To bound the term containing \(\Psi ^0\), we require
$$\begin{aligned} \nu \theta ^k \Psi ^0 \le \frac{\varepsilon }{3}. \end{aligned}$$
It is sufficient to take \(N = k = \frac{1}{1-\theta }\log \left( \frac{3\Psi ^0\nu }{\varepsilon }\right)\).
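This iteration count can be checked with the bound \(\theta = 1-(1-\theta ) \le e^{-(1-\theta )}\); a sketch assuming \(0 \le \theta < 1\) and \(3\Psi ^0\nu \ge \varepsilon\):
$$\begin{aligned} \nu \theta ^k\Psi ^0 \le \nu \Psi ^0\exp \left( -(1-\theta )k\right) = \nu \Psi ^0\exp \left( -\log \left( \frac{3\Psi ^0\nu }{\varepsilon }\right) \right) = \frac{\varepsilon }{3}. \end{aligned}$$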
Finally, let us bound the remaining terms. For the inexactness terms it suffices to have
$$\begin{aligned} \delta _x, \delta _y \le \frac{(1-\theta )^2\varepsilon }{24\nu }. \end{aligned}$$
Define \(E\) as \(E=\frac{1}{2n}\max \left\{ \frac{L_{lx}^2}{L_{x}} + \frac{2L_{lx}^2}{\mu _{x}} + L_{lx} - \mu _{lx}, \frac{L_{ly}^2}{L_{y}} + \frac{2L_{ly}^2}{\mu _{y}} + L_{ly} - \mu _{ly} \right\} .\)
Using the definitions of \(\delta _x\) and \(\delta _y\), it suffices to take
$$\begin{aligned} \delta ' = \frac{(1-\theta )^2\varepsilon }{24E\nu }. \end{aligned}$$
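As a sanity check of this choice (a sketch using only the definitions of \(\delta _x\), \(\delta _y\), \(E\) and \(\nu\) given here), both inexactness levels are at most \(E\delta '\), so their total contribution to the error bound is
$$\begin{aligned} \delta _x, \delta _y \le E\delta ' = \frac{(1-\theta )^2\varepsilon }{24\nu } \quad \Longrightarrow \quad \nu \cdot \frac{4}{(1-\theta )^2}(\delta _x+\delta _y) \le \frac{4\nu }{(1-\theta )^2}\cdot \frac{2(1-\theta )^2\varepsilon }{24\nu } = \frac{\varepsilon }{3}. \end{aligned}$$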
Define \(F_x\) and \(F_y\) as \(F_x = \frac{\nu }{2n(1-\theta )}\left( \frac{1}{\hat{L_x}}+\frac{\omega }{L_{xy}} \right)\), \(F_y= \frac{\nu }{2n(1-\theta )}\left( \frac{1}{\hat{L_y}}+\frac{1}{L_{xy}\omega } \right) .\)
Using the definitions of \(\Sigma ^2\), \(\sigma _{F, r}^2\), \(\sigma _{G, r}^2\), we get that it is sufficient to take \(r_{f, i} = \Bigg \lceil \frac{6F_x\sigma _{f, i}^2}{\varepsilon }\Bigg \rceil\) and \(r_{g, i} = \Bigg \lceil \frac{6F_y\sigma _{g, i}^2}{\varepsilon }\Bigg \rceil\).
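The batch sizes can be checked as follows. This is a sketch that assumes \(\sigma _{f,i}^2, \sigma _{g,i}^2 > 0\) and uses \(\sigma _{F,r}^2 = \frac{1}{n}\sum _{i=1}^n \sigma _{f,i}^2/r_{f,i}\) from (26), and likewise for \(\sigma _{G,r}^2\). Since \(r_{f,i} \ge \frac{6F_x\sigma _{f,i}^2}{\varepsilon }\), each summand satisfies \(\sigma _{f,i}^2/r_{f,i} \le \frac{\varepsilon }{6F_x}\), and therefore
$$\begin{aligned} \nu \cdot \frac{\Sigma ^2}{2(1-\theta )} = F_x\sigma _{F,r}^2 + F_y\sigma _{G,r}^2 \le F_x\cdot \frac{\varepsilon }{6F_x} + F_y\cdot \frac{\varepsilon }{6F_y} = \frac{\varepsilon }{3}. \end{aligned}$$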
Finally, we obtain
$$\begin{aligned} N_{comm}&= NT =\mathcal {O}\left( \frac{1}{1-\theta }\tau \chi \log \left( \frac{\Psi ^0\nu }{\varepsilon }\right) \log \left( \frac{D'}{\varepsilon }\right) \right) ,\\ N_{comp}^i&= N(r_{f, i}+r_{g, i})\\&= 2N + \mathcal {O}\left( \frac{\max \{\omega , \omega ^{-1}\}}{nL_{xy}(1-\theta )^2\varepsilon }\left( \left( \frac{1}{L_{x}}+\frac{\omega }{L_{xy}} \right) \sigma _{f, i}^2 + \left( \frac{1}{L_{y}}+\frac{1}{L_{xy}\omega } \right) \sigma _{g, i}^2 \right) \log \left( \frac{\Psi ^0\nu }{\varepsilon }\right) \right) .\\ \frac{1}{1-\theta }&= \mathcal {O}\left( \max \left\{ \sqrt{\frac{L_{x}}{\mu _{x}}}, \sqrt{\frac{L_{y}}{\mu _{y}}},\frac{L_{xy}}{\sqrt{\mu _{x}\mu _{y}}}\right\} \right) ,\\ \omega&= \sqrt{\frac{\mu _{y}}{\mu _{x}}}, \sigma _x = \sqrt{\frac{\mu _{x}}{8L_{x}}},\sigma _y = \sqrt{\frac{\mu _{y}}{8L_{y}}},\\ \frac{1}{1-\theta }&= \mathcal {O}\left( \max \left\{ \frac{\sqrt{L_{x}L_{y}}}{\mu _{xy}}, \frac{L_{xy}}{\mu _{xy}}\sqrt{\frac{L_{x}}{\mu _{x}}}, \frac{L_{xy}^2}{\mu _{xy}^2} \right\} \right) , \\ \omega&= \sqrt{\frac{\mu _{xy}^2}{2\mu _{x}L_{x}}}, \sigma _x = \sqrt{\frac{\mu _{x}}{8L_{x}}},\sigma _y =\min \left\{ 1,\sqrt{\frac{\mu _{xy}^2}{16L_{x}L_{y}}}\right\} , \\ \frac{1}{1-\theta }&= \mathcal {O}\left( \max \left\{ \frac{\sqrt{L_{x}L_{y}}}{\mu _{yx}}, \frac{L_{xy}}{\mu _{yx}}\sqrt{\frac{L_{y}}{\mu _{y}}}, \frac{L_{xy}^2}{\mu _{yx}^2} \right\} \right) , \\ \omega&= \sqrt{\frac{2\mu _{y}L_{y}}{\mu _{yx}^2}}, \sigma _x =\min \left\{ 1,\sqrt{\frac{\mu _{yx}^2}{16L_{x}L_{y}}}\right\} ,\sigma _y = \sqrt{\frac{\mu _{y}}{8L_{y}}},\\ \frac{1}{1-\theta }&= \mathcal {O}\left( \max \left\{ \frac{\sqrt{L_{x}L_{y}}L_{xy}}{\mu _{xy}\mu _{yx}}, \frac{L_{xy}^2}{\mu _{yx}^2}, \frac{L_{xy}^2}{\mu _{xy}^2} \right\} \right) , \\ \omega&= \frac{\mu _{xy}}{\mu _{yx}}\sqrt{\frac{L_{y}}{L_{x}}}, \sigma _x = \min \left\{ 1,\sqrt{\frac{\mu _{yx}^2}{16L_{x}L_{y}}}\right\} ,\sigma _y =\min \left\{ 1,\sqrt{\frac{\mu _{xy}^2}{16L_{x}L_{y}}}\right\} .
\end{aligned}$$