
Decentralized saddle-point problems with different constants of strong convexity and strong concavity

  • Original Paper
  • Published in: Computational Management Science

Abstract

Large-scale saddle-point problems arise in machine learning tasks such as GANs and linear models with affine constraints. In this paper, we study distributed saddle-point problems with smooth, strongly-convex–strongly-concave objectives in which the composite terms corresponding to the min and max variables have different strong-convexity and strong-concavity parameters, coupled through a bilinear saddle-point term. We consider two types of first-order oracle: deterministic (returning the gradient) and stochastic (returning an unbiased stochastic gradient). Our method handles both cases and performs several consensus steps between oracle calls.


Notes

  1. Strictly speaking, we must set \(\mu _y = 0\), since \(g\equiv 0\). But without loss of generality, we can take \(\mu _y\) as small as \(\varepsilon\) via the regularization technique (Wang and Li 2020; Rogozin et al. 2021a).

  2. Kovalev et al. (2021) appears to be the most intricate of the approaches (Kovalev et al. 2021; Thekumparampil et al. 2022; Jin et al. 2022), since it relies on substantially new acceleration ideas rather than the standard ones (Nesterov 2018; Lin et al. 2020).

  3. This drawback may be eliminated over time, as was done in Kovalev and Gasnikov (2022) for the non-distributed setup.

References

  • Alkousa MS, Gasnikov AV, Dvinskikh DM, Kovalev DA, Stonyakin FS (2020) Accelerated methods for saddle-point problem. Computat Math Math Phys 60(11):1787–1809

  • Beznosikov A, Gorbunov E, Berard H, Loizou N (2022) Stochastic gradient descent-ascent: unified theory and new efficient methods. arXiv preprint arXiv:2202.07262

  • Beznosikov A, Rogozin A, Kovalev D, Gasnikov A (2021) Near-optimal decentralized algorithms for saddle point problems over time-varying networks. In: International conference on optimization and applications. Springer, pp 246–257

  • Beznosikov A, Samokhin V, Gasnikov A (2020) Distributed saddle-point problems: lower bounds, optimal and robust algorithms. arXiv preprint arXiv:2010.13112

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122

  • Devolder O, Glineur F, Nesterov Y (2014) First-order methods of smooth convex optimization with inexact oracle. Math Program 146(1):37–75

  • Du SS, Gidel G, Jordan MI, Li CJ (2022) Optimal extragradient-based bilinearly-coupled saddle-point optimization. arXiv preprint arXiv:2206.08573

  • Gasnikov A, Novitskii A, Novitskii V, Abdukhakimov F, Kamzolov D, Beznosikov A, Takáč M, Dvurechensky P, Gu B (2022) The power of first-order smooth optimization for black-box non-smooth problems. arXiv preprint arXiv:2201.12289

  • Gorbunov E, Berard H, Gidel G, Loizou N (2021) Stochastic extragradient: general analysis and improved rates. arXiv preprint arXiv:2111.08611

  • Gorbunov E, Rogozin A, Beznosikov A, Dvinskikh D, Gasnikov A (2020) Recent theoretical advances in decentralized distributed convex optimization. arXiv preprint arXiv:2011.13259

  • Ibrahim A, Azizian W, Gidel G, Mitliagkas I (2020) Linear lower bounds and conditioning of differentiable games. In: International conference on machine learning. PMLR, pp 4583–4593

  • Jin Y, Sidford A, Tian K (2022) Sharper rates for separable minimax and finite sum optimization via primal–dual extragradient methods. arXiv preprint arXiv:2202.04640

  • Kovalev D, Salim A, Richtárik P (2020) Optimal and practical algorithms for smooth and strongly convex decentralized optimization. Adv Neural Inf Process Syst 33:18342–18352

  • Kovalev D, Gasanov E, Gasnikov A, Richtarik P (2021) Lower bounds and optimal algorithms for smooth and strongly convex decentralized optimization over time-varying networks. Adv Neural Inf Process Syst 34:22325–22335

  • Kovalev D, Beznosikov A, Sadiev A, Persiianov M, Richtárik P, Gasnikov A (2022) Optimal algorithms for decentralized stochastic variational inequalities. arXiv preprint arXiv:2202.02771

  • Kovalev D, Gasnikov A (2022) The first optimal algorithm for smooth and strongly-convex-strongly-concave minimax optimization. arXiv preprint arXiv:2205.05653

  • Kovalev D, Gasnikov A, Richtárik P (2021) Accelerated primal-dual gradient method for smooth and convex–concave saddle-point problems with bilinear coupling. arXiv preprint arXiv:2112.15199

  • Kovalev D, Shulgin E, Richtárik P, Rogozin AV, Gasnikov A (2021) Adom: accelerated decentralized optimization method for time-varying networks. In: International conference on machine learning. PMLR, pp 5784–5793

  • Lan G (2020) First-order and stochastic optimization methods for machine learning. Springer, New York

  • Li H, Lin Z (2021) Accelerated gradient tracking over time-varying graphs for decentralized optimization. arXiv preprint arXiv:2104.02596

  • Lin Z, Li H, Fang C (2020) Accelerated optimization for machine learning. Springer, New York

  • Lin T, Jin C, Jordan MI (2020) Near-optimal algorithms for minimax optimization. In: Conference on learning theory. PMLR, pp 2738–2779

  • Luo L, Ye H (2022) Decentralized stochastic variance reduced extragradient method

  • Nemirovski A, Yudin D (1983) Problem complexity and method efficiency in optimization. Wiley, New York

  • Nesterov Y (2018) Lectures on convex optimization, vol 137. Springer, New York

  • Rogozin A, Beznosikov A, Dvinskikh D, Kovalev D, Dvurechensky P, Gasnikov A (2021) Decentralized distributed optimization for saddle point problems. arXiv preprint arXiv:2102.07758

  • Rogozin A, Bochko M, Dvurechensky P, Gasnikov A, Lukoshkin, V (2021) An accelerated method for decentralized distributed stochastic optimization over time-varying graphs. arXiv preprint arXiv:2103.15598

  • Rogozin A, Lukoshkin V, Gasnikov A, Kovalev D, Shulgin E (2021) Towards accelerated rates for distributed optimization over time-varying networks. In: International conference on optimization and applications. Springer, pp 258–272

  • Scaman K, Bach F, Bubeck S, Lee YT, Massoulié L (2017) Optimal algorithms for smooth and strongly convex distributed optimization in networks

  • Song Z, Shi L, Pu S, Yan M (2021) Optimal gradient tracking for decentralized optimization. arXiv preprint arXiv:2110.05282

  • Thekumparampil KK, He N, Oh S (2022) Lifted primal–dual method for bilinearly coupled smooth minimax optimization. arXiv preprint arXiv:2201.07427

  • Tian Y, Scutari G, Cao T, Gasnikov A (2021) Acceleration in distributed optimization under similarity. arXiv preprint arXiv:2110.12347

  • Tominin V, Tominin Y, Borodich E, Kovalev D, Gasnikov A, Dvurechensky P (2021) On accelerated methods for saddle-point problems with composite structure. arXiv preprint arXiv:2103.09344

  • Tseng P (2000) A modified forward–backward splitting method for maximal monotone mappings. SIAM J Control Optim 38(2):431–446

  • Wang Y, Li J (2020) Improved algorithms for convex–concave minimax optimization. Adv Neural Inf Process Syst 33:4800–4810

  • Yarmoshik D, Rogozin A, Khamisov O, Dvurechensky P, Gasnikov A et al (2022) Decentralized convex optimization under affine constraints for power systems control. arXiv preprint arXiv:2203.16686

  • Zhang X, Aybat NS, Gurbuzbalaban M (2021) Robust accelerated primal-dual methods for computing saddle points. arXiv preprint arXiv:2111.12743

  • Zhang J, Hong M, Zhang S (2019) On lower iteration complexity bounds for the saddle point problems. arXiv preprint arXiv:1912.07481

Author information

Corresponding author

Correspondence to Dmitry Metelev.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Inexact setting

A.1 Proof of Theorem 2.7

Lemma A.1

Let us introduce several definitions.

$$\begin{aligned} \tau _x = (\sigma _x^{-1}+1/2)^{-1}, \end{aligned}$$
(8)
$$\begin{aligned} \alpha _x=\mu _x, \end{aligned}$$
(9)
$$\begin{aligned} \beta _x=\min \left\{ \frac{1}{2L_y}, \frac{1}{2\eta _x L_{xy}^2} \right\} . \end{aligned}$$
(10)

Then the following inequality holds

$$\begin{aligned} \begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\Vert x^{k+1} - x^*\Vert ^2&\le \left( \frac{1}{\eta _x}-\mu _x-\beta _x \mu _{yx}^2 \right) \Vert x^k - x^*\Vert ^2\\&\quad +\left( \mu _x + L_x\sigma _x - \frac{1}{2\eta _x} \right) \mathbb {E}_{\xi _x^k, \xi _y^k}\Vert x^{k+1} - x^k\Vert ^2\\&\quad +B_g(y_g^k,y^*) -B_f(x_g^k,x^*) -\frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}B_f(x_f^{k+1},x^*)\\&\quad +\left( \frac{2}{\sigma _x} - 1 \right) B_f(x_f^k,x^*) -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle A^T(y_m^k-y^*), x^{k+1} - x^* \rangle \\&\quad +\delta _y + \left( \frac{4}{\sigma _x}+1 \right) \delta _x + \beta _x\sigma _g^2 + 2\eta _x\sigma _f^2. \end{aligned} \end{aligned}$$
(11)

Proof

The proof is similar to that of Lemma B.2 in Kovalev et al. (2020). However, we need to extend it to cover the inexact stochastic case. To do so, we replace several inequalities in the proof of the cited article with their counterparts for an inexact stochastic oracle.

Let \(B_f(a, b) = f(a)-f(b)-\langle \nabla f(b), a - b \rangle\); \(B_g(a, b)\) is defined analogously.

In this analysis, we will need the following inequalities (the argument \(\xi ^{k-1}\) is omitted for brevity).

$$\begin{aligned}{} & {} \frac{1}{2L_y}\mathbb {E}_{\xi _y^k}\Vert \nabla g_{\delta }(y_g^k, \xi _y^k)-\nabla g(y^*)\Vert ^2 \le B_g(y_g^k,y^*)+\delta _y+\frac{\sigma _g^2}{2L_y}, \end{aligned}$$
(12)
$$\begin{aligned}{} & {} \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k)\nonumber \\{} & {} \quad -\nabla f(x^*), x^{k+1} - x^* \rangle \ge \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle - \eta _x\sigma _f^2, \end{aligned}$$
(13)
$$\begin{aligned}{} & {} \quad \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_f^{k+1} - x_g^k \rangle \ge B_f(x_f^{k+1},x^*) \nonumber \\{} & {} \quad -B_f(x_g^k,x^*)-\frac{L_x}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2-\delta _x, \end{aligned}$$
(14)
$$\begin{aligned}{} & {} \quad 2\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_g^k - x^* \rangle \ge 2B_f(x_g^k,x^*) + \mu _x\Vert x_g^k-x^*\Vert ^2 - 2\delta _x, \end{aligned}$$
(15)
$$\begin{aligned}{} & {} \quad \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_f^k - x_g^k \rangle \le B_f(x_f^k,x^*)-B_f(x_g^k, x^*)+\delta _x. \end{aligned}$$
(16)

Let us prove these inequalities.

Inequality (12). Using \(\mathbb {E}_{\xi _y^k}\nabla g_{\delta }(y_g^k, \xi _y^k) = \nabla g_{\delta }(y_g^k)\) and Theorem 1 of Devolder et al. (2014), we have

$$\begin{aligned} \frac{1}{2L_y}\mathbb {E}_{\xi _y^k}\Vert \nabla g_{\delta }(y_g^k, \xi _y^k)-\nabla g(y^*)\Vert ^2&= \frac{1}{2L_y}\mathbb {E}_{\xi _y^k}\Vert \nabla g_{\delta }(y_g^k, \xi _y^k)-\nabla g_{\delta }(y_g^k)\Vert ^2\\&\quad + \frac{1}{2L_y}\Vert \nabla g_{\delta }(y_g^k)-\nabla g(y^*)\Vert ^2 \\&\le g(y_g^k)-g(y^*)-\langle \nabla g(y^*), y_g^k - y^* \rangle + \delta _y + \frac{\sigma _g^2}{2L_y}\\&= B_g(y_g^k,y^*)+\delta _y+\frac{\sigma _g^2}{2L_y}. \end{aligned}$$

We choose the inexact oracle \((\hat{g}_{\delta }, \nabla \hat{g}_{\delta })\) to coincide with \((g_{\delta }, \nabla g_{\delta })\) at all points except \(y^*\), where it equals \((g(y^*), \nabla g(y^*))\).
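For reference, the derivation above relies on the \((\delta , L)\)-oracle of Devolder et al. (2014): for every \(x\), the oracle returns a pair \((g_{\delta }(x), \nabla g_{\delta }(x))\) such that, for all \(y\),

$$\begin{aligned} 0 \le g(y) - g_{\delta }(x) - \langle \nabla g_{\delta }(x), y - x \rangle \le \frac{L_y}{2}\Vert y - x\Vert ^2 + \delta _y. \end{aligned}$$

The analogous property is assumed for \(f\) with constants \(L_x\) and \(\delta _x\).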

Inequality (13):

$$\begin{aligned} \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f(x^*), x^{k+1} - x^* \rangle&= \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f_{\delta }(x_g^k), x^{k+1} - x^* \rangle \\ {}&\quad + \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle . \end{aligned}$$

Using Line (8) of Algorithm 1 we obtain

$$\begin{aligned} \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f(x^*), x^{k+1} - x^* \rangle&= \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f_{\delta }(x_g^k), -\eta _x \nabla f_{\delta }(x_g^k, \xi _x^k) \rangle \\&\quad + \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle \\&= \mathbb {E}_{\xi _x^k}\langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle \\&\quad - \eta _x\mathbb {E}_{\xi _x^k}\Vert \nabla f_{\delta }(x_g^k, \xi _x^k) - \nabla f_{\delta }(x_g^k)\Vert ^2\\&\ge \mathbb {E}_{\xi _x^k} \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x^{k+1} - x^* \rangle -\eta _x \sigma _f^2. \end{aligned}$$

Inequality (14):

$$\begin{aligned} \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_f^{k+1} - x_g^k \rangle&\ge f(x_f^{k+1}) - f_{\delta }(x_g^k)-\frac{L_x}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2\\&\quad -\delta _x - \langle \nabla f(x^*), x_f^{k+1} - x_g^k \rangle \\&\ge f(x_f^{k+1}) - f(x_g^k)-\frac{L_x}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2\\&\quad -\delta _x - \langle \nabla f(x^*), x_f^{k+1} - x_g^k \rangle \\&= B_f(x_f^{k+1},x^*) - B_f(x_g^k,x^*)-\frac{L_x}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2-\delta _x. \end{aligned}$$

Inequality (15):

$$\begin{aligned} 2(f(x^*)-f(x_g^k))-2\langle \nabla f_{\delta }(x_g^k), x^*-x_g^k \rangle + 2\delta _x&\ge 2(f(x^*)-f_{\delta }(x_g^k))-2\langle \nabla f_{\delta }(x_g^k), x^*-x_g^k \rangle \\ {}&\ge \mu _x\Vert x_g^k-x^*\Vert ^2. \end{aligned}$$

Inequality (16):

$$\begin{aligned} \langle \nabla f_{\delta }(x_g^k) - \nabla f(x^*), x_f^k - x_g^k \rangle&\le f(x_f^k) - f_{\delta }(x_g^k)-\langle \nabla f(x^*), x_f^k - x_g^k \rangle \\&\le f(x_f^k) - f(x_g^k)-\langle \nabla f(x^*), x_f^k - x_g^k \rangle +\delta _x\\&= B_f(x_f^k,x^*)-B_f(x_g^k,x^*)+\delta _x. \end{aligned}$$

Using Line (8) of Algorithm 1 we get

$$\begin{aligned} \frac{1}{\eta _x}\left\| x^{k+1} - x^*\right\| ^2&= \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{2}{\eta _x}\langle x^{k+1} - x^k,x^{k+1} - x^*\rangle - \frac{1}{\eta _x}\left\| x^{k+1} - x^k\right\| ^2\\&=\frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + 2\alpha _x\langle x_g^k - x^k,x^{k+1}- x^*\rangle \\&\quad -2\beta _x\langle \textbf{A}^\top (\textbf{A}x^k - \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1})),x^{k+1} - x^*\rangle \\&\quad -2\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} - x^*\rangle - \frac{1}{\eta _x}\left\| x^{k+1} - x^k\right\| ^2. \end{aligned}$$

Using the parallelogram rule we get

$$\begin{aligned} \frac{1}{\eta _x}\left\| x^{k+1} - x^*\right\| ^2&= \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2\\&\quad +\alpha _x\left( \left\| x_g^k - x^*\right\| ^2 - \left\| x_g^k - x^{k+1}\right\| ^2 - \left\| x^{k} - x^*\right\| ^2+\left\| x^{k+1} - x^k\right\| ^2\right) \\&\quad -2\beta _x\langle \textbf{A}x^k - \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1}),\textbf{A}(x^{k+1} - x^*)\rangle \\&\quad -2\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} - x^*\rangle - \frac{1}{\eta _x}\left\| x^{k+1} - x^k\right\| ^2. \end{aligned}$$
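Here the parallelogram rule refers to the identity, valid for any vectors \(a, b, c, d\),

$$\begin{aligned} 2\langle a - b, c - d \rangle = \Vert a - d\Vert ^2 + \Vert b - c\Vert ^2 - \Vert a - c\Vert ^2 - \Vert b - d\Vert ^2, \end{aligned}$$

applied with \(a = x_g^k\), \(b = x^k\), \(c = x^{k+1}\), \(d = x^*\) to the term \(2\alpha _x\langle x_g^k - x^k, x^{k+1} - x^*\rangle\).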

Using the optimality condition \(\nabla g(y^*) = \textbf{A}x^*\), which follows from \(\nabla _y F(x^*, y^*) = 0\), and the parallelogram rule we get

$$\begin{aligned} \frac{1}{\eta _x}\left\| x^{k+1} - x^*\right\| ^2&= \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2\\&\quad +\alpha _x\left( \left\| x_g^k - x^*\right\| ^2 - \left\| x_g^k - x^{k+1}\right\| ^2 - \left\| x^{k} - x^*\right\| ^2\right. \\&\quad \left. +\left\| x^{k+1} - x^k\right\| ^2\right) \\&\quad +\beta _x\left( \left\| \textbf{A}(x^{k+1} - x^k)\right\| ^2 - \left\| \textbf{A}(x^k - x^*)\right\| ^2\right) \\&\quad +\beta _x\left( \left\| \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1}) - \nabla g(y^*)\right\| ^2 \right. \\&\left. \quad - \left\| \nabla g_{\delta }(y_g^k, \xi _y^k, \xi ^{k-1}) - \textbf{A}(x^{k+1})\right\| ^2\right) \\&\quad -2\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} - x^*\rangle \\&\quad - \frac{1}{\eta _x}\left\| x^{k+1} - x^k\right\| ^2. \end{aligned}$$

Using Assumption (1.7) and Eq. (12), we get

$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \alpha _x\left\| x_g^k - x^*\right\| ^2 - \alpha _x\left\| x^{k} - x^*\right\| ^2\\&\qquad +\alpha _x\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + \beta _xL_{xy}^2\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2\\&\qquad -\beta _x\mu _{yx}^2\left\| x^k - x^*\right\| ^2 \\&\qquad + 2\beta _xL_yB_g(y_g^k,y^*) + 2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2\\&\qquad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} - x^*\rangle \\&\qquad -\frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\&\quad =\left( \frac{1}{\eta _x} - \alpha _x- \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\qquad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\&\qquad +2\beta _xL_yB_g(y_g^k,y^*)\\&\qquad +\alpha _x\left\| x_g^k - x^*\right\| ^2 - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) + \textbf{A}^\top y_m^k,x^{k+1} -x^*\rangle \\&\qquad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2. \end{aligned}$$

Using the optimality condition \(\nabla f(x^*) + \textbf{A}^\top y^* = 0\), which follows from \(\nabla _x F(x^*, y^*) = 0\) and Eq. (13), we get

$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + 2\beta _xL_yB_g(y_g^k,y^*)\\&\quad+\alpha _x\left\| x_g^k - x^*\right\| ^2 - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi _x^k, \xi ^{k-1}) - \nabla f(x^*),x^{k+1} - x^*\rangle \\&\quad-2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle + 2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2\\&=\left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2\\&\quad +2\beta _xL_yB_g(y_g^k,y^*) + \alpha _x\left\| x_g^k - x^*\right\| ^2\\&\quad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x^{k+1}- x^k + x^k - x_g^k + x_g^k - x^*\rangle \\&\quad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle + 2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2. \end{aligned}$$

Using \(\mu _x\)-strong convexity of f, Lines (6) and (10) of Algorithm 1, and Eq. (15) we get

$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad + \left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\&\quad +2\beta _xL_yB_g(y_g^k,y^*)\\&\quad + \alpha _x\left\| x_g^k - x^*\right\| ^2 - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k+1}- x_g^k\rangle \\&\quad + \frac{2(1-\tau _x)}{\tau _x}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k}- x_g^k\rangle - 2B_f(x_g^k,x^*) \\&\quad - \mu _x\left\| x_g^k -x^*\right\| ^2 + 2\delta _x(\xi ^{k-1}) - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\&\quad + 2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2 \\&= \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 \\&\quad + \left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2 \\&\quad + 2\beta _xL_yB_g(y_g^k,y^*) - 2B_f(x_g^k,x^*) \\&\quad - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k+1}- x_g^k\rangle \\&\quad + \frac{2(1-\tau _x)}{\tau _x}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k}- x_g^k\rangle \\&\quad - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+ 2\delta _x(\xi ^{k-1}). \end{aligned}$$

Using Eq. (16), we get

$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2 \\ {}&\quad + 2\beta _xL_yB_g(y_g^k,y^*) - 2B_f(x_g^k,x^*) \\ {}&\quad - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \nabla f_{\delta }(x_g^k, \xi ^{k-1}) - \nabla f(x^*),x_f^{k+1}- x_g^k\rangle \\ {}&\quad + \frac{2(1-\tau _x)}{\tau _x}\left( B_f(x_f^k,x^*) - B_f(x_g^k,x^*)\right) \\ {}&\quad - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\ {}&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+ 2\delta _x(\xi ^{k-1})+\frac{2(1-\tau _x)}{\tau _x}\delta _x(\xi ^{k-1}). \end{aligned}$$

Using Eq. (14), we get

$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&\quad +\left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2\\&\quad +2\beta _xL_yB_g(y_g^k,y^*) - 2B_f(x_g^k,x^*)\\&\quad -\frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left( B_f(x_f^{k+1},x^*) - B_f(x_g^k,x^*) - \frac{L_x}{2}\left\| x_f^{k+1} - x_g^k\right\| ^2\right) \\&\quad +\frac{2(1-\tau _x)}{\tau _x}\left( B_f(x_f^k,x^*) - B_f(x_g^k,x^*)\right) \\&\quad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+\left( \frac{2}{\tau _x}+\frac{2}{\sigma _x}\right) \delta _x(\xi ^{k-1}). \end{aligned}$$

Using Line (10) of Algorithm 1 we get

$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2\\&+\quad \left( \beta _xL_{xy}^2 + \alpha _x-\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2\\&\quad +2\beta _xL_yB_g(y_g^k,y^*) - 2B_f(x_g^k,x^*)\\&\quad -\frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left( B_f(x_f^{k+1},x^*) - B_f(x_g^k,x^*) - \frac{L_x\sigma _x^2}{2}\left\| x^{k+1} - x^k\right\| ^2\right) \\&\quad +\frac{2(1-\tau _x)}{\tau _x}\left( B_f(x_f^k,x^*) - B_f(x_g^k,x^*)\right) \\&\quad -2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+\left( \frac{2}{\tau _x}+\frac{2}{\sigma _x}\right) \delta _x(\xi ^{k-1}). \end{aligned}$$

Transforming this inequality we get

$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \alpha _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( \beta _xL_{xy}^2 + \alpha _x + L_x\sigma _x -\frac{1}{\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + (\alpha _x-\mu _x)\left\| x_g^k - x^*\right\| ^2 + 2\beta _xL_yB_g(y_g^k,y^*) + \left( \frac{2}{\sigma _x} - \frac{2}{\tau _x}\right) B_f(x_g^k,x^*) \\ {}&\quad - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}B_f(x_f^{k+1},x^*) + \left( \frac{2}{\tau _x} - 2\right) B_f(x_f^k,x^*) \\ {}&\quad - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\ {}&\quad +2\beta _xL_y\delta _y(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2+\left( \frac{2}{\tau _x}+\frac{2}{\sigma _x}\right) \delta _x(\xi ^{k-1}). \end{aligned}$$

Using the definitions of \(\tau _x\), \(\alpha _x\), and \(\beta _x\) we get

$$\begin{aligned} \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( \mu _x + L_x\sigma _x -\frac{1}{2\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + B_g(y_g^k,y^*) - B_f(x_g^k,x^*) \\ {}&\quad - \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}B_f(x_f^{k+1},x^*) + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) \\ {}&\quad - 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle \textbf{A}^\top (y_m^k - y^*),x^{k+1} - x^*\rangle \\ {}&+\quad \delta _y(\xi ^{k-1})+\left( \frac{4}{\sigma _x}+1 \right) \delta _x(\xi ^{k-1})+\beta _x\sigma _g^2+2\eta _x\sigma _f^2. \end{aligned}$$

\(\square\)

Lemma A.2

Let us introduce several definitions.

$$\begin{aligned} \tau _y = (\sigma _y^{-1}+1/2)^{-1}, \end{aligned}$$
(17)
$$\begin{aligned} \alpha _y=\mu _y, \end{aligned}$$
(18)
$$\begin{aligned} \beta _y=\min \left\{ \frac{1}{2L_x}, \frac{1}{2\eta _y L_{xy}^2} \right\} . \end{aligned}$$
(19)

Then the following inequality holds

$$\begin{aligned} \begin{aligned} \frac{1}{\eta _y}\mathbb {E}_{\xi _x^k, \xi _y^k}\Vert y^{k+1} - y^*\Vert ^2&\le \left( \frac{1}{\eta _y}-\mu _y-\beta _y \mu _{xy}^2 \right) \Vert y^k - y^*\Vert ^2 \\ {}&\quad + \left( \mu _y + L_y\sigma _y - \frac{1}{2\eta _y} \right) \mathbb {E}_{\xi _x^k, \xi _y^k}\Vert y^{k+1} - y^k\Vert ^2 \\ {}&\quad +B_f(x_g^k,x^*)-B_g(y_g^k,y^*)-\frac{2}{\sigma _y}\mathbb {E}_{\xi _x^k, \xi _y^k}B_g(y_f^{k+1},y^*) \\ {}&\quad + \left( \frac{2}{\sigma _y} - 1 \right) B_g(y_f^k,y^*) +2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle A(x^{k+1}-x^*), y^{k+1} - y^* \rangle \\ {}&\quad + \delta _x(\xi ^{k-1}) + \left( \frac{4}{\sigma _y}+1 \right) \delta _y(\xi ^{k-1}) + \beta _y\sigma _f^2 + 2\eta _y\sigma _g^2. \end{aligned} \end{aligned}$$
(20)

Proof

The proof is similar to the proof of the previous lemma. \(\square\)

Lemma A.3

Let \(\eta _x\) be defined as

$$\begin{aligned} \eta _x = \min \left\{ \frac{1}{4(\mu _x + L_x\sigma _x)}, \frac{\omega }{4L_{xy}} \right\} , \end{aligned}$$

and let \(\eta _y\) be defined as

$$\begin{aligned} \eta _y = \min \left\{ \frac{1}{4(\mu _y + L_y\sigma _y)}, \frac{1}{4L_{xy}\omega } \right\} , \end{aligned}$$

where \(\omega > 0\) is a parameter. Let \(\theta\) be defined as

$$\begin{aligned} \theta = \theta (\omega , \sigma _x, \sigma _y) = 1 - \max \{\rho _a(\omega , \sigma _x, \sigma _y),\rho _b(\omega , \sigma _x, \sigma _y),\rho _c(\omega , \sigma _x, \sigma _y),\rho _d(\omega , \sigma _x, \sigma _y)\}, \end{aligned}$$

where

\(\rho _a(\omega , \sigma _x, \sigma _y) = \left[ \max \left\{ \frac{4(\mu _x + L_x\sigma _x)}{\mu _x}, \frac{2}{\sigma _x}, \frac{4(\mu _y + L_y\sigma _y)}{\mu _{y}}, \frac{2}{\sigma _y}, \frac{4L_{xy}}{\mu _x\omega }, \frac{4L_{xy}\omega }{\mu _y} \right\} \right] ^{-1},\)

\(\rho _b(\omega , \sigma _x, \sigma _y) = \left[ \max \left\{ \frac{4(\mu _x + L_x\sigma _x)}{\mu _x}, \frac{2}{\sigma _x}, \frac{8L_x(\mu _y + L_y\sigma _y)}{\mu _{xy}^2}, \frac{2}{\sigma _y}, \frac{2L_{xy}^2}{\mu _{xy}^2}, \frac{8L_x L_{xy}\omega }{\mu _{xy}^2},\frac{4L_{xy}}{\mu _x\omega } \right\} \right] ^{-1},\)

\(\rho _c(\omega , \sigma _x, \sigma _y) = \left[ \max \left\{ \frac{4(\mu _y + L_y\sigma _y)}{\mu _y}, \frac{2}{\sigma _y}, \frac{8L_y(\mu _x + L_x\sigma _x)}{\mu _{yx}^2}, \frac{2}{\sigma _x}, \frac{2L_{xy}^2}{\mu _{yx}^2}, \frac{8L_y L_{xy}}{\mu _{yx}^2\omega },\frac{4L_{xy}\omega }{\mu _y} \right\} \right] ^{-1},\)

\(\rho _d(\omega , \sigma _x, \sigma _y) = \left[ \max \left\{ \frac{8L_y(\mu _x + L_x\sigma _x)}{\mu _{yx}^2}, \frac{2}{\sigma _x}, \frac{8L_x(\mu _y + L_y\sigma _y)}{\mu _{xy}^2}, \frac{2}{\sigma _y}, \frac{8L_y L_{xy}}{\mu _{yx}^2\omega },\frac{8L_x L_{xy}\omega }{\mu _{xy}^2},\frac{2L_{xy}^2}{\mu _{yx}^2},\frac{2L_{xy}^2}{\mu _{xy}^2} \right\} \right] ^{-1}.\)

Let \(\Psi ^k\) be the following Lyapunov function:

$$\begin{aligned} \begin{aligned} \Psi ^k&= \frac{1}{\eta _x}\Vert x^k - x^*\Vert ^2+\frac{1}{\eta _y}\Vert y^k - y^*\Vert ^2+\frac{2}{\sigma _x}B_f(x_f^k, x^*)+\frac{2}{\sigma _y}B_g(y_f^k, y^*) \\ {}&\quad +\frac{1}{4\eta _y}\Vert y^k - y^{k-1}\Vert ^2-2\langle y^k-y^{k-1}, A(x^k-x^*) \rangle . \end{aligned} \end{aligned}$$
(21)

Then, the following inequalities hold

$$\begin{aligned} \Psi ^k \ge \frac{3}{4\eta _x}\Vert x^k - x^*\Vert ^2 + \frac{1}{\eta _y}\Vert y^k - y^*\Vert ^2, \end{aligned}$$
(22)
$$\begin{aligned} \mathbb {E}\Psi ^{k+1}\le \theta \mathbb {E}\Psi ^k + \frac{4}{1-\theta }\left( \delta _x+\delta _y\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$
(23)
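Treating the inexactness terms \(\delta _x, \delta _y\) as uniform bounds over iterations, inequality (23) can be iterated: writing it as \(\mathbb {E}\Psi ^{k+1}\le \theta \mathbb {E}\Psi ^k + C\) with \(C\) the sum of the last three terms, and using \(\sum _{i=0}^{k-1}\theta ^i\le (1-\theta )^{-1}\), we obtain

$$\begin{aligned} \mathbb {E}\Psi ^{k}\le \theta ^k\Psi ^0 + \frac{C}{1-\theta },\qquad C = \frac{4}{1-\theta }\left( \delta _x+\delta _y\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2, \end{aligned}$$

i.e., linear convergence up to a region dominated by the noise and oracle inexactness; by (22), the same holds for \(\Vert x^k - x^*\Vert ^2\) and \(\Vert y^k - y^*\Vert ^2\).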

Proof

The proof of this lemma is similar to the proof of Lemma B.4 in Kovalev et al. (2021).

Adding up inequalities (11) and (20) of Lemmas A.1 and A.2, we get

$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \left( \mu _x + L_x\sigma _x -\frac{1}{2\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + \left( \mu _y + L_y\sigma _y -\frac{1}{2\eta _y}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) \\ {}&\quad + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} - y_m^k,\textbf{A}(x^{k+1} - x^*)\rangle \\ {}&\quad + \left( 2 + \frac{4}{\sigma _x}\right) \delta _x(\xi ^{k-1})+\left( 2 + \frac{4}{\sigma _y}\right) \delta _y(\xi ^{k-1}) + (\beta _y + 2\eta _x)\sigma _f^2 + (\beta _x + 2\eta _y)\sigma _g^2, \end{aligned}$$

where \(\mathrm {(LHS)}\) is given as

$$\begin{aligned} \mathrm {(LHS)}&= \frac{1}{\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^*\right\| ^2 + \frac{1}{\eta _y}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^*\right\| ^2 \\ {}&\quad + \frac{2}{\sigma _x}\mathbb {E}_{\xi _x^k, \xi _y^k}B_f(x_f^{k+1},x^*) + \frac{2}{\sigma _y}\mathbb {E}_{\xi _x^k, \xi _y^k}B_g(y_f^{k+1},y^*). \end{aligned}$$

Using Line (6) of Algorithm 1, we get

$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \left( \mu _x + L_x\sigma _x -\frac{1}{2\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + \left( \mu _y + L_y\sigma _y -\frac{1}{2\eta _y}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k+1} - x^*)\rangle \\ {}&\quad + \left( 2 + \frac{4}{\sigma _x}\right) \delta _x(\xi ^{k-1})+\left( 2 + \frac{4}{\sigma _y}\right) \delta _y(\xi ^{k-1}) + (\beta _y + 2\eta _x)\sigma _f^2 + (\beta _x + 2\eta _y)\sigma _g^2. \end{aligned}$$

Using Assumption (1.7) we get

$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \left( \mu _x + L_x\sigma _x -\frac{1}{2\eta _x}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 \\ {}&\quad + \left( \mu _y + L_y\sigma _y -\frac{1}{2\eta _y}\right) \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + 2\theta L_{xy}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^k - y^{k-1} \right\| \left\| x^{k+1} - x^k \right\| \\ {}&\quad + \left( 2 + \frac{4}{\sigma _x}\right) \delta _x(\xi ^{k-1})+\left( 2 + \frac{4}{\sigma _y}\right) \delta _y(\xi ^{k-1}) + (\beta _y + 2\eta _x)\sigma _f^2 + (\beta _x + 2\eta _y)\sigma _g^2. \end{aligned}$$

Using the definitions of \(\eta _x\) and \(\eta _y\) and the fact that \(\theta < 1\), we get

$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad - \frac{1}{4\eta _x} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{\theta }{2\sqrt{\eta _x\eta _y}}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^k - y^{k-1} \right\| \left\| x^{k+1} - x^k \right\| \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \left( \beta _y + \frac{\omega }{2L_{xy}}\right) \sigma _f^2 + \left( \beta _x + \frac{1}{2L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$

Transforming this inequality we get

$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad - \frac{1}{4\eta _x} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{\theta }{4\eta _x}\mathbb {E}_{\xi _x^k, \xi _y^k}\left\| x^{k+1} - x^k\right\| ^2 + \frac{\theta }{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \left( \beta _y + \frac{\omega }{2L_{xy}}\right) \sigma _f^2 + \left( \beta _x + \frac{1}{2L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$

Transforming further

$$\begin{aligned} \mathrm {(LHS)}&\le \left( \frac{1}{\eta _x} - \mu _x - \beta _x\mu _{yx}^2\right) \left\| x^{k} - x^*\right\| ^2 + \left( \frac{1}{\eta _y} - \mu _y - \beta _y\mu _{xy}^2\right) \left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \frac{\theta }{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \left( \beta _y + \frac{\omega }{2L_{xy}}\right) \sigma _f^2 + \left( \beta _x + \frac{1}{2L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$

Using the definitions of \(\beta _x\) and \(\beta _y\), we get

$$\begin{aligned} \mathrm {(LHS)}&\le \left( 1 - \eta _x\mu _x - \min \left\{ \frac{\eta _x\mu _{yx}^2}{2L_y},\frac{\mu _{yx}^2}{2L_{xy}^2}\right\} \right) \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( 1 - \eta _y\mu _y - \min \left\{ \frac{\eta _y\mu _{xy}^2}{2L_x},\frac{\mu _{xy}^2}{2L_{xy}^2}\right\} \right) \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \frac{\theta }{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2 \end{aligned}$$

Transforming this inequality

$$\begin{aligned} \mathrm {(LHS)}&\le \left( 1 - \max \left\{ \eta _x\mu _x, \min \left\{ \frac{\eta _x\mu _{yx}^2}{2L_y},\frac{\mu _{yx}^2}{2L_{xy}^2}\right\} \right\} \right) \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 \\ {}&\quad + \left( 1 - \max \left\{ \eta _y\mu _y, \min \left\{ \frac{\eta _y\mu _{xy}^2}{2L_x},\frac{\mu _{xy}^2}{2L_{xy}^2}\right\} \right\} \right) \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 \\ {}&\quad + \frac{\theta }{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 \\ {}&\quad + \left( \frac{2}{\sigma _x} - 1\right) B_f(x_f^k,x^*) + \left( \frac{2}{\sigma _y} - 1\right) B_g(y_f^k,y^*) \\ {}&\quad + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle - 2\theta \langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$

Using the definition of \(\theta\) we get

$$\begin{aligned} \mathrm {(LHS)}&\le \theta \left( \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 + \frac{1}{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 \right) \\ {}&\quad + \theta \left( -2\langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle + \frac{2}{\sigma _x}B_f(x_f^k,x^*) + \frac{2}{\sigma _y}B_g(y_f^k,y^*) \right) \\ {}&\quad - \frac{1}{4\eta _y} \mathbb {E}_{\xi _x^k, \xi _y^k}\left\| y^{k+1} - y^k\right\| ^2 + 2\mathbb {E}_{\xi _x^k, \xi _y^k}\langle y^{k+1} -y^k,\textbf{A}(x^{k+1} - x^*)\rangle \\ {}&\quad + \frac{4}{1-\theta }\left( \delta _x(\xi ^{k-1})+\delta _y(\xi ^{k-1})\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$

Taking the expectation over all random variables, rearranging, and using the definition of \(\Psi ^k\) together with the facts \(\mathbb {E}\delta _x(\xi ^{k-1}) \le \delta _x\) and \(\mathbb {E}\delta _y(\xi ^{k-1}) \le \delta _y\), we get

$$\begin{aligned} \mathbb {E}\Psi ^{k+1}\le \theta \mathbb {E}\Psi ^k + \frac{4}{1-\theta }\left( \delta _x+\delta _y\right) + \frac{1}{2}\left( \frac{1}{L_x} + \frac{\omega }{L_{xy}}\right) \sigma _f^2 + \frac{1}{2}\left( \frac{1}{L_y} + \frac{1}{L_{xy}\omega }\right) \sigma _g^2. \end{aligned}$$

Finally, using the definition of \(\Psi ^k\), \(\eta _x\) and \(\eta _y\) we get

$$\begin{aligned} \Psi ^k&\ge \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 + \frac{1}{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - 2\langle y^k -y^{k-1},\textbf{A}(x^{k} - x^*)\rangle \\ {}&\ge \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 + \frac{1}{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - 2L_{xy}\left\| y^k -y^{k-1} \right\| \left\| x^{k} - x^* \right\| \\ {}&\ge \frac{1}{\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2 + \frac{1}{4\eta _y}\left\| y^k - y^{k-1}\right\| ^2 - \frac{1}{2\sqrt{\eta _x\eta _y}}\left\| y^k -y^{k-1} \right\| \left\| x^{k} - x^* \right\| \\ {}&\ge \frac{3}{4\eta _x}\left\| x^{k} - x^*\right\| ^2 + \frac{1}{\eta _y}\left\| y^{k} - y^*\right\| ^2. \end{aligned}$$

\(\square\)

We now return to the proof of Theorem (2.7).

Let \(\Sigma ^2 \triangleq \left( \frac{1}{L_x}+\frac{\omega }{L_{xy}} \right) \sigma _f^2+\left( \frac{1}{L_y}+\frac{1}{L_{xy}\omega } \right) \sigma _g^2\). Then

$$\begin{aligned} \mathbb {E}\Psi ^k&\le \theta ^k \Psi ^0 + \left( \frac{4}{1-\theta }(\delta _x+\delta _y)+\frac{\Sigma ^2}{2}\right) (1 + \theta + \theta ^2 + \dots ) \\ {}&\le \theta ^k \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}, \end{aligned}$$
$$\begin{aligned} \theta ^k \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )} \ge \mathbb {E}\Psi ^k \ge \frac{3}{4\eta _x}\mathbb {E}\Vert x^k - x^*\Vert ^2 + \frac{1}{\eta _y}\mathbb {E}\Vert y^k - y^*\Vert ^2. \end{aligned}$$
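Unrolling the recursion in (23) into the geometric bound above can be sanity-checked numerically; the sketch below uses purely hypothetical values of \(\theta\), the additive term, and \(\Psi ^0\):

```python
# Hypothetical constants for illustration only: contraction factor theta,
# additive noise term c (standing in for the delta/sigma terms), and Psi^0.
theta, c, psi0 = 0.9, 0.05, 10.0

psi = psi0
for k in range(1, 201):
    psi = theta * psi + c                      # one step of the recursion (23)
    bound = theta**k * psi0 + c / (1 - theta)  # unrolled geometric bound
    assert psi <= bound + 1e-12

# The potential settles at the noise floor c / (1 - theta).
print(abs(psi - c / (1 - theta)) < 1e-6)  # True
```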

Using the definitions of \(\eta _x\) and \(\eta _y\), we get

$$\begin{aligned} \mathbb {E}\Vert x^k - x^*\Vert ^2 \le \frac{\omega }{3L_{xy}}\left( \theta ^k \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}\right) , \end{aligned}$$
(24)
$$\begin{aligned} \mathbb {E}\Vert y^k - y^*\Vert ^2 \le \frac{1}{4L_{xy}\omega }\left( \theta ^k \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}\right) . \end{aligned}$$
(25)

For these definitions, we also know from Kovalev et al. (2021) that

$$\begin{aligned} \frac{1}{\rho _a}&\le 4 + 4\max \left\{ \sqrt{\frac{L_x}{\mu _x}}, \sqrt{\frac{L_y}{\mu _y}},\frac{L_{xy}}{\sqrt{\mu _x\mu _y}}\right\} \\&\quad \text { for } \omega = \sqrt{\frac{\mu _y}{\mu _x}}, \sigma _x = \sqrt{\frac{\mu _x}{2L_x}},\sigma _y = \sqrt{\frac{\mu _y}{2L_y}},\\ \frac{1}{\rho _b}&\le 4+8\max \left\{ \frac{\sqrt{L_xL_y}}{\mu _{xy}}, \frac{L_{xy}}{\mu _{xy}}\sqrt{\frac{L_x}{\mu _x}}, \frac{L_{xy}^2}{\mu _{xy}^2} \right\} \\&\quad \text { for } \omega = \sqrt{\frac{\mu _{xy}^2}{2\mu _xL_x}}, \sigma _x = \sqrt{\frac{\mu _x}{2L_x}},\sigma _y =\min \left\{ 1,\sqrt{\frac{\mu _{xy}^2}{4L_xL_y}}\right\} ,\\ \frac{1}{\rho _c}&\le 4+8\max \left\{ \frac{\sqrt{L_xL_y}}{\mu _{yx}}, \frac{L_{xy}}{\mu _{yx}}\sqrt{\frac{L_y}{\mu _y}}, \frac{L_{xy}^2}{\mu _{yx}^2} \right\} \\&\quad \text { for } \omega = \sqrt{\frac{2\mu _yL_y}{\mu _{yx}^2}},\sigma _x =\min \left\{ 1,\sqrt{\frac{\mu _{yx}^2}{4L_xL_y}}\right\} ,\sigma _y = \sqrt{\frac{\mu _y}{2L_y}},\\ \frac{1}{\rho _d}&\le 2+8\max \left\{ \frac{\sqrt{L_xL_y}L_{xy}}{\mu _{xy}\mu _{yx}}, \frac{L_{xy}^2}{\mu _{yx}^2}, \frac{L_{xy}^2}{\mu _{xy}^2} \right\} \\&\quad \text { for } \omega = \frac{\mu _{xy}}{\mu _{yx}}\sqrt{\frac{L_y}{L_x}}, \sigma _x = \min \left\{ 1,\sqrt{\frac{\mu _{yx}^2}{4L_xL_y}}\right\} ,\sigma _y =\min \left\{ 1,\sqrt{\frac{\mu _{xy}^2}{4L_xL_y}}\right\} , \\ \frac{1}{1-\theta }&= \min \{ \rho _a^{-1}, \rho _b^{-1}, \rho _c^{-1}, \rho _d^{-1}\}. \end{aligned}$$

Note that adding up batches and choosing \(\omega = \sqrt{\frac{\mu _y}{\mu _x}}, \sigma _x = \sqrt{\frac{\mu _x}{2L_x}},\sigma _y = \sqrt{\frac{\mu _y}{2L_y}}\) proves Theorem (2.7).

Rewriting the inequalities in the batch setting and assuming \(\delta _x=\delta _y=0\), we get

$$\begin{aligned} \mathbb {E}\Vert x^k - x^*\Vert ^2 \le \frac{\omega }{3L_{xy}}\left( \theta ^k \Psi ^0+\frac{1}{2(1-\theta )}\left( \left( \frac{1}{L_x}+\frac{\omega }{L_{xy}}\right) \frac{\sigma _f^2}{r_f}+\left( \frac{1}{L_y}+\frac{1}{L_{xy}\omega } \right) \frac{\sigma _g^2}{r_g}\right) \right) , \\ \mathbb {E}\Vert y^k - y^*\Vert ^2 \le \frac{1}{4L_{xy}\omega }\left( \theta ^k \Psi ^0+\frac{1}{2(1-\theta )}\left( \left( \frac{1}{L_x}+\frac{\omega }{L_{xy}}\right) \frac{\sigma _f^2}{r_f}+\left( \frac{1}{L_y}+\frac{1}{L_{xy}\omega }\right) \frac{\sigma _g^2}{r_g}\right) \right) . \end{aligned}$$

Therefore, we can estimate the number of algorithm iterations as \(N = \mathcal {O} \left( \frac{1}{1-\theta }\log {\frac{C}{\varepsilon }}\right)\), where C is polynomial in the problem constants and does not depend on \(\varepsilon\). Rewriting this, we obtain \(N = \mathcal {O} \left( \min \{ \rho _a^{-1}, \rho _b^{-1}, \rho _c^{-1}, \rho _d^{-1}\}\log {\frac{C}{\varepsilon }}\right) .\)

It is sufficient to take batch sizes \(r_f = \Bigg \lceil \frac{\max \{\omega , \omega ^{-1}\}}{2L_{xy}(1-\theta )\varepsilon }\left( \frac{1}{L_x}+\frac{\omega }{L_{xy}}\right) \sigma _f^2\Bigg \rceil\), \(r_g = \Bigg \lceil \frac{\max \{\omega , \omega ^{-1}\}}{2L_{xy}(1-\theta )\varepsilon }\left( \frac{1}{L_y}+\frac{1}{L_{xy}\omega }\right) \sigma _g^2\Bigg \rceil .\)

Rewriting them with the selected constants, we get

$$\begin{aligned} r_f = \Bigg \lceil \max \left\{ \sqrt{\frac{L_x}{\mu _x}}, \sqrt{\frac{L_y}{\mu _y}},\frac{L_{xy}}{\sqrt{\mu _x\mu _y}}\right\} \frac{\mu }{2L_{xy}\sqrt{\mu _x\mu _y}\varepsilon }\left( \frac{1}{L_x} + \frac{1}{L_{xy}}\sqrt{\frac{\mu _y}{\mu _x}}\right) \sigma _f^2 \Bigg \rceil , \\ r_g = \Bigg \lceil \max \left\{ \sqrt{\frac{L_x}{\mu _x}}, \sqrt{\frac{L_y}{\mu _y}},\frac{L_{xy}}{\sqrt{\mu _x\mu _y}}\right\} \frac{\mu }{2L_{xy}\sqrt{\mu _x\mu _y}\varepsilon }\left( \frac{1}{L_y} + \frac{1}{L_{xy}}\sqrt{\frac{\mu _x}{\mu _y}}\right) \sigma _g^2\Bigg \rceil , \end{aligned}$$

where \(\mu = \max \{\mu _x, \mu _y\}\).
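The batch sizes \(r_f, r_g\) above are ceilings of noise-to-accuracy ratios and are straightforward to evaluate; below is a minimal sketch of the generic formulas before constant substitution, with all numeric inputs hypothetical:

```python
import math

def batch_sizes(L_x, L_y, L_xy, omega, theta, eps, sigma_f2, sigma_g2):
    """Batch sizes r_f, r_g from the displayed ceiling formulas (illustrative)."""
    scale = max(omega, 1.0 / omega) / (2 * L_xy * (1 - theta) * eps)
    r_f = math.ceil(scale * (1 / L_x + omega / L_xy) * sigma_f2)
    r_g = math.ceil(scale * (1 / L_y + 1 / (L_xy * omega)) * sigma_g2)
    return r_f, r_g

# Hypothetical problem constants.
print(batch_sizes(L_x=10, L_y=5, L_xy=3, omega=0.5, theta=0.9,
                  eps=1e-2, sigma_f2=1.0, sigma_g2=1.0))  # (89, 289)
```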

Appendix B: Decentralized setting

Let us obtain Algorithm 4 from Algorithm 2 by multiplying every line by \(\frac{{\textbf {1}}^{\top }}{n}\), where \({\textbf {1}}\) is the column vector of all ones.

Algorithm 4 shows what happens to the average values at the nodes.

Algorithm 4: Average values

Keeping the values X and Y in the neighborhoods of \(\mathcal {C}(d_x)\) and \(\mathcal {C}(d_y)\) and using Lemma (2.1), the conditions of Theorem (2.7) hold.

Using Assumptions (1.5) and (1.6), we get

$$\begin{aligned} \begin{aligned} \mathbb {E}_{\xi _{x, k}}\Vert \nabla f_{\delta }(\overline{x}_g^k, \xi _{x, k}) - \nabla f_{\delta }(\overline{x}_g^k)\Vert ^2 \le \frac{\sum _{i=1}^n \sigma _{f, i}^2/r_{f, i}}{n^2} \triangleq \frac{\sigma _{F, r}^2}{n}, \\ \mathbb {E}_{\xi _{y, k}}\Vert \nabla g_{\delta }(\overline{y}_g^k, \xi _{y, k}) - \nabla g_{\delta }(\overline{y}_g^k)\Vert ^2 \le \frac{\sum _{i=1}^n \sigma _{g, i}^2/r_{g, i}}{n^2} \triangleq \frac{\sigma _{G,r}^2}{n}. \end{aligned} \end{aligned}$$
(26)

Let the number of Consensus iterations be sufficiently large to guarantee \(\mathbb {E}\Vert X^k - \overline{X^k}\Vert \le \sqrt{\delta '}\) and \(\mathbb {E}\Vert Y^k - \overline{Y^k}\Vert \le \sqrt{\delta '}.\)

We introduce definitions corresponding to Lemma (2.1):

$$\begin{aligned} \delta _x = \frac{1}{2n}\left( \frac{L_{lx}^2}{L_{x}} + \frac{2L_{lx}^2}{\mu _{x}} + L_{lx} - \mu _{lx} \right) \delta ', \\ \delta _y = \frac{1}{2n}\left( \frac{L_{ly}^2}{L_{y}} + \frac{2L_{ly}^2}{\mu _{y}} + L_{ly} - \mu _{ly} \right) \delta ', \end{aligned}$$

\(\hat{L_x} = 2L_{x}\), \(\hat{L_y} = 2L_{y}\), \(\hat{\mu _x} = \mu _{x}/2\), \(\hat{\mu _y} = \mu _{y}/2.\)

Consider iteration \(k \ge 1\). Assuming that \(\mathbb {E}\Vert X^t - \overline{X^t}\Vert \le \sqrt{\delta '}\) and \(\mathbb {E}\Vert Y^t - \overline{Y^t}\Vert \le \sqrt{\delta '}\) for \(t=0,1,\dots ,k\), we prove the same for \(t=k+1\) using a constant number of consensus iterations.

Using Lines (10) and (6) of Algorithm 2, we get

$$\begin{aligned} X_g^k = \tau _x X^k + (1-\tau _x) X_f^k = (\tau _x+(1-\tau _x)\sigma _x) X^k - (1-\tau _x)\sigma _x X^{k-1} + (1-\tau _x)X_g^{k-1}. \end{aligned}$$

Define \(V^k = X_g^k-\sigma _x X^k\). Using \(X_g^0 = X^0\), we get \(V^0 = (1-\sigma _x) X^0\), hence \(\Vert V^0-\overline{V^0}\Vert \le (1-\sigma _x)\sqrt{\delta '}.\)

$$\begin{aligned} V^k&= (1-\sigma _x)\tau _x X^k + (1-\tau _x)V^{k-1}, \\ V^k-\overline{V^k}&= (1-\sigma _x)\tau _x \left( X^k-\overline{X^k} \right) + (1-\tau _x)\left( V^{k-1}-\overline{V^{k-1}} \right) , \\ \mathbb {E}\Vert V^k-\overline{V^k}\Vert&\le (1-\sigma _x)\tau _x\sqrt{\delta '} + (1-\tau _x)(1-\sigma _x)\sqrt{\delta '} = (1-\sigma _x)\sqrt{\delta '}. \end{aligned}$$
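The chain above shows that \(\mathbb {E}\Vert V^k-\overline{V^k}\Vert \le (1-\sigma _x)\sqrt{\delta '}\) is an invariant of the convex-combination recursion; a small numerical sketch, with hypothetical \(\tau _x\), \(\sigma _x\), and \(\sqrt{\delta '}\):

```python
# Hypothetical parameters; s plays the role of sqrt(delta').
tau_x, sigma_x, s = 0.3, 0.1, 0.01

# e_k mirrors the bound on E||V^k - mean(V^k)||:
# e_k = (1 - sigma_x) * tau_x * s + (1 - tau_x) * e_{k-1}, with e_0 = (1 - sigma_x) * s.
e = (1 - sigma_x) * s
for _ in range(100):
    e = (1 - sigma_x) * tau_x * s + (1 - tau_x) * e
    assert abs(e - (1 - sigma_x) * s) < 1e-12  # invariant: stays at (1 - sigma_x) * s

print("invariant holds")
```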

Let us now estimate \(X_f^k\), \(k\ge 1\). Using Line (10), we get

$$\begin{aligned} X_f^k = V^{k-1} + \sigma _x X^k, \\ \mathbb {E}\Vert X_f^k - \overline{X_f^k}\Vert \le \sqrt{\delta '}. \end{aligned}$$

Let us now estimate \(X_g^k\) and \(Y_m^k\). Using Lines (6) and (5), we get

$$\begin{aligned} \mathbb {E}\Vert X_g^k - \overline{X_g^k}\Vert \le \sqrt{\delta '}, \\ \mathbb {E}\Vert Y_m^k - \overline{Y_m^k}\Vert \le (1+2\theta )\sqrt{\delta '}. \end{aligned}$$

The estimates for \(Y_g^k\) and \(Y_f^k\) are obtained similarly.

Let us now estimate \(\mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert\). Using Line (8), we get

$$\begin{aligned} U^{k+1}-\overline{U^{k+1}}&=(1-\eta _x\alpha _x)\left( X^k-\overline{X^k}\right) +\eta _x\alpha _x\left( X_g^k-\overline{X_g^k}\right) \\ {}&\quad - \eta _x\beta _x A^T\left( A \left( X^k-\overline{X^k}\right) -\left( \nabla ^r G(Y_g^k, \xi _{y, k})-\overline{\nabla ^r G(Y_g^k, \xi _{y, k})}\right) \right) \\ {}&\quad - \eta _x\left( \left( \nabla ^r F(X_g^k, \xi _{x, k})-\overline{\nabla ^r F(X_g^k, \xi _{x, k}})\right) + A^T \left( Y_m^k-\overline{Y_m^k}\right) \right) . \end{aligned}$$

Using \(\eta _x\alpha _x \le 1\) and the previous estimates, we get

$$\begin{aligned} \mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert&\le (1-\eta _x\alpha _x)\sqrt{\delta '} + \eta _x\alpha _x\sqrt{\delta '} + \eta _x\beta _x L_{xy}^2\sqrt{\delta '} \\ {}&\quad + \eta _x\beta _x L_{xy}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert +\eta _x\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert +\eta _x L_{xy}(1+2\theta )\sqrt{\delta '} \\ {}&= (1+\eta _x\beta _x L_{xy}^2+\eta _x L_{xy}(1+2\theta ))\sqrt{\delta '}+\eta _x\beta _x L_{xy}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert \\ {}&\quad + \eta _x\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert . \end{aligned}$$

Let us now estimate \(\mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) \right\|\) and \(\mathbb {E}\left\| \nabla ^r G(Y_g^k, \xi _{y, k}) \right\|\).

$$\begin{aligned} \mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) \right\|&\le \mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) - \nabla F(X_g^k) \right\| + \mathbb {E}\left\| \nabla F(X_g^k) - \nabla F(\overline{X_g^k}) \right\| \\ {}&\quad + \mathbb {E}\Vert \nabla F(\overline{X_g^k}) - \nabla F(X^*)\Vert + \Vert \nabla F(X^*)\Vert \\ {}&\le \left( \mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) - \nabla F(X_g^k)\right\| ^2\right) ^{\frac{1}{2}} + L_{lx}\mathbb {E}\left\| X_g^k-\overline{X_g^k} \right\| \\ {}&\quad + L_{x}\mathbb {E}\left\| \overline{X_g^k}-X^* \right\| + \left\| \nabla F(X^*) \right\| \\ {}&\le \left( \sum _{i=1}^n\sigma _{f, i}^2/r_{f, i}\right) ^{\frac{1}{2}} + L_{lx}\sqrt{\delta '} + L_{x}\sqrt{n}\mathbb {E}\left\| \overline{x_g^k}-x^* \right\| + \left\| \nabla F(X^*) \right\| . \end{aligned}$$

Let us define \(M_x\)

$$\begin{aligned} M_x^2=\frac{\omega }{3L_{xy}}\left( \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )} \right) , \\ \Sigma ^2 = \left( \frac{1}{2L_{x}}+\frac{\omega }{L_{xy}} \right) \frac{\sigma _{F, r}^2}{n}+\left( \frac{1}{2L_{y}}+\frac{1}{L_{xy}\omega } \right) \frac{\sigma _{G, r}^2}{n}. \end{aligned}$$

We choose constants the same as in Eqs. (24) and (25) for Algorithm 4.

Now we estimate \(\mathbb {E}\left\| \overline{x_g^k}-x^* \right\|\). From Eq. (24) and Eq. (26), we know that

$$\begin{aligned} \mathbb {E}\left\| \overline{x^k} - x^*\right\| ^2&\le M_x^2,\\ \mathbb {E}\left\| \overline{x^k} - x^* \right\|&\le \sqrt{\mathbb {E}\left\| \overline{x^k} - x^*\right\| ^2} \le M_x. \end{aligned}$$

Let \(k \ge 1\). Using Lines (10) and (6) of Algorithm 4, we get

$$\begin{aligned} \overline{x_g^k} = \tau _x\overline{x^k} + (1-\tau _x)\overline{x_f^k} = (\tau _x+(1-\tau _x)\sigma _x) \overline{x^k} - (1-\tau _x)\sigma _x \overline{x^{k-1}} + (1-\tau _x)\overline{x_g^{k-1}}. \end{aligned}$$

Let’s define \(\overline{v^k} = \overline{x_g^k} - \sigma _x\overline{x^k}\) and \(v^* = (1-\sigma _x)x^*\). Since \(\overline{v^0} = (1-\sigma _x)\overline{x^0}\), we have \(\mathbb {E}\Vert \overline{v^0}-v^*\Vert \le (1-\sigma _x)M_x\).

$$\begin{aligned} \overline{v^k} = \tau _x(1-\sigma _x)\overline{x^k}+(1-\tau _x)\overline{v^{k-1}}. \end{aligned}$$

First, we estimate \(\mathbb {E}\Vert \overline{v^k}-v^*\Vert\).

$$\begin{aligned} \mathbb {E}\Vert \overline{v^k}-v^*\Vert&\le \tau _x(1-\sigma _x)\mathbb {E}\Vert \overline{x^k}-x^*\Vert +(1-\tau _x)\mathbb {E}\Vert \overline{v^{k-1}}-v^*\Vert \\ {}&\le (\tau _x(1-\sigma _x)+(1-\tau _x)(1-\sigma _x))M_x=(1-\sigma _x)M_x. \end{aligned}$$

Using Line (10), we get

$$\begin{aligned} \overline{x_f^k}&= \overline{v^{k-1}} + \sigma _x \overline{x^k}, \\ \mathbb {E}\Vert \overline{x_f^k} - x^*\Vert&\le \mathbb {E}\Vert \overline{v^{k-1}} - v^*\Vert + \sigma _x \mathbb {E}\Vert \overline{x^k} - x^*\Vert \le (1-\sigma _x)M_x + \sigma _x M_x = M_x. \end{aligned}$$

Let’s estimate \(\mathbb {E}\Vert \overline{x_g^k}-x^*\Vert\). Using Line (5), we get

$$\begin{aligned} \mathbb {E}\Vert \overline{x_g^k}-x^*\Vert \le \tau _x\mathbb {E}\Vert \overline{x^k}-x^*\Vert +(1-\tau _x)\mathbb {E}\Vert \overline{x_f^k}-x^*\Vert \le M_x. \end{aligned}$$

Returning to \(\mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) \right\|\), we obtain

$$\begin{aligned} \mathbb {E}\left\| \nabla ^r F(X_g^k, \xi _{x, k}) \right\| \le \sqrt{n\sigma _{F, r}^2} + L_{lx}\sqrt{\delta '} + L_{x}\sqrt{n}M_x + \left\| \nabla F(X^*) \right\| . \end{aligned}$$

Let’s define \(M_y\)

$$\begin{aligned} M_y^2=\frac{1}{4L_{xy}\omega }\left( \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )} \right) . \end{aligned}$$

Then we can estimate \(\mathbb {E}\left\| \nabla ^r G(Y_g^k, \xi _{y, k}) \right\|\) in a similar way

$$\begin{aligned} \mathbb {E}\left\| \nabla ^r G(Y_g^k, \xi _{y, k}) \right\| \le \sqrt{n\sigma _{G, r}^2} + L_{ly}\sqrt{\delta '} + L_{y}\sqrt{n}M_y + \left\| \nabla G(Y^*) \right\| . \end{aligned}$$

Lemma B.1

$$\begin{aligned} \max \left\{ \mathbb {E}\left\| U^{k+1}-\overline{U^{k+1}} \right\| , \mathbb {E}\left\| W^{k+1}-\overline{W^{k+1}} \right\| \right\} \le D, \end{aligned}$$

where

$$\begin{aligned} D = \max \left\{ D_{x,1}\sqrt{\delta '}+D_{x,2}, D_{y,1}\sqrt{\delta '}+D_{y,2} \right\} , \end{aligned}$$
(27)
$$\begin{aligned} D_{y, 2}&=\frac{L_{xy}}{2\mu _{y}}D_{x,2}+ \frac{1}{2L_{xy}}\left( \sqrt{n\sigma _{F,r}^2} + L_{x}\sqrt{n}M_x+\Vert \nabla F(X^*)\Vert \right) \\&\quad +\frac{1}{2\mu _{y}}\left( \sqrt{n\sigma _{G,r}^2} + L_{y}\sqrt{n}M_y+\Vert \nabla G(Y^*)\Vert \right) , \end{aligned}$$
(28)
$$\begin{aligned} D_{y,1}&=\frac{3}{2}+\frac{L_{xy}}{2\mu _{y}}D_{x,1}+\frac{L_{lx}}{2L_{xy}}+\frac{L_{ly}}{2\mu _{y}}, \end{aligned}$$
(29)
$$\begin{aligned} D_{x,2}&= \frac{1}{2L_{xy}}\left( \sqrt{n\sigma _{G,r}^2} + L_{y}\sqrt{n}M_y+\Vert \nabla G(Y^*)\Vert \right) \\&\quad +\frac{1}{2\mu _{x}}\left( \sqrt{n\sigma _{F,r}^2} + L_{x}\sqrt{n}M_x+\Vert \nabla F(X^*)\Vert \right) , \end{aligned}$$
(30)
$$\begin{aligned} D_{x,1}&= \frac{3}{2}+\frac{L_{xy}}{2\mu _{x}}(1+2\theta )+\frac{L_{ly}}{2L_{xy}}+\frac{L_{lx}}{2\mu _{x}}, \end{aligned}$$
(31)
$$\begin{aligned} M_x^2&=\frac{\omega }{3L_{xy}}\left( \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}\right) , \\ M_y^2&=\frac{1}{4L_{xy}\omega }\left( \Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )}\right) , \end{aligned}$$
(32)
$$\begin{aligned} \Sigma ^2&= \left( \frac{1}{2L_{x}}+\frac{\omega }{L_{xy}} \right) \frac{\sigma _{F, r}^2}{n} + \left( \frac{1}{2L_{y}}+\frac{1}{L_{xy}\omega }\right) \frac{\sigma _{G, r}^2}{n}. \end{aligned}$$
(33)

Proof

$$\begin{aligned} \mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert&\le (1+\eta _x\beta _x L_{xy}^2+\eta _x L_{xy}(1+2\theta ))\sqrt{\delta '} \\ {}&\quad + \eta _x\beta _x L_{xy}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert +\eta _x\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert . \end{aligned}$$

Using the definitions of \(\eta _x\) and \(\beta _x\) and the gradient estimates, we get

$$\begin{aligned} \begin{aligned} \mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert&\le \left( 1+\frac{1}{2}+\frac{L_{xy}}{4\hat{\mu _x}}(1+2\theta )\right) \sqrt{\delta '}+\frac{1}{2L_{xy}}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert \\ {}&\quad + \frac{1}{4\hat{\mu _x}}\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert \\ {}&\le \left( \frac{3}{2}+\frac{L_{xy}}{4\hat{\mu _x}}(1+2\theta )+\frac{L_{ly}}{2L_{xy}}+\frac{L_{lx}}{4\hat{\mu _x}}\right) \sqrt{\delta '} \\ {}&\quad + \frac{1}{2L_{xy}}\left( \sqrt{n\sigma _{G, r}^2} + L_{y}\sqrt{n}M_y+\Vert \nabla G(Y^*)\Vert \right) \\ {}&\quad + \frac{1}{4\hat{\mu _x}}\left( \sqrt{n\sigma _{F, r}^2} + L_{x}\sqrt{n}M_x+\Vert \nabla F(X^*)\Vert \right) = D_{x,1}\sqrt{\delta '}+D_{x,2}. \end{aligned} \end{aligned}$$

Let’s estimate \(\mathbb {E}\Vert W^{k+1}-\overline{W^{k+1}}\Vert\)

$$\begin{aligned} W^{k+1}-\overline{W^{k+1}}&=(1-\eta _y\alpha _y)\left( Y^k-\overline{Y^k}\right) +\eta _y\alpha _y\left( Y_g^k-\overline{Y_g^k}\right) \\ {}&\quad - \eta _y\beta _y A\left( A^T \left( Y^k-\overline{Y^k}\right) +\left( \nabla ^r F(X_g^k, \xi _{x, k})-\overline{\nabla ^r F(X_g^k, \xi _{x, k})}\right) \right) \\ {}&\quad - \eta _y\left( \left( \nabla ^r G(Y_g^k, \xi _{y, k})-\overline{\nabla ^r G(Y_g^k, \xi _{y, k})}\right) - A \left( U^{k+1}-\overline{U^{k+1}}\right) \right) . \end{aligned}$$

Using \(\eta _y\alpha _y \le 1\) and the previous estimates, we get

$$\begin{aligned} \mathbb {E}\Vert W^{k+1}-\overline{W^{k+1}}\Vert&\le (1-\eta _y\alpha _y)\sqrt{\delta '} + \eta _y\alpha _y\sqrt{\delta '} + \eta _y\beta _y L_{xy}^2\sqrt{\delta '} \\ {}&\quad + \eta _y\beta _y L_{xy}\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert \\ {}&\quad + \eta _y\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert +\eta _y L_{xy}\mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert \\ {}&= (1+\eta _y\beta _y L_{xy}^2)\sqrt{\delta '}+\eta _y L_{xy}\mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert \\ {}&\quad + \eta _y\beta _y L_{xy}\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert + \eta _y\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert . \end{aligned}$$

Using the definitions of \(\beta _y\) and \(\eta _y\) and the gradient estimates, we get

$$\begin{aligned} \mathbb {E}\Vert W^{k+1}-\overline{W^{k+1}}\Vert&\le \left( 1+\ \frac{1}{2}\right) \sqrt{\delta '}+\frac{L_{xy}}{4\hat{\mu _y}}\mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert +\frac{1}{2L_{xy}}\mathbb {E}\Vert \nabla ^r F(X_g^k, \xi _{x, k})\Vert \\ {}&\quad + \frac{1}{4\hat{\mu _y}}\mathbb {E}\Vert \nabla ^r G(Y_g^k, \xi _{y, k})\Vert \\ {}&\le \left( \frac{3}{2}+\frac{L_{lx}}{2L_{xy}}+\frac{L_{ly}}{4\hat{\mu _y}}+\frac{L_{xy}}{4\hat{\mu _y}}D_{x,1} \right) \sqrt{\delta '}+\frac{L_{xy}}{4\hat{\mu _y}}D_{x,2} \\ {}&\quad + \frac{1}{2L_{xy}}\left( \sqrt{n\sigma _{F, r}^2} + L_{x}\sqrt{n}M_x+\Vert \nabla F(X^*)\Vert \right) \\ {}&\quad + \frac{1}{4\hat{\mu _y}}\left( \sqrt{n\sigma _{G, r}^2} + L_{y}\sqrt{n}M_y+\Vert \nabla G(Y^*)\Vert \right) = D_{y,1}\sqrt{\delta '}+D_{y,2}. \end{aligned}$$

\(\square\)

Now let us estimate the number of communication steps T.

$$\begin{aligned} (1-\lambda )^{T/\tau }\max \{\mathbb {E}\Vert W^{k+1}-\overline{W^{k+1}}\Vert , \mathbb {E}\Vert U^{k+1}-\overline{U^{k+1}}\Vert \} \le \delta '. \end{aligned}$$

It would be sufficient to guarantee

$$\begin{aligned} (1-\lambda )^{T/\tau }D \le \delta '. \end{aligned}$$

The latter is guaranteed by taking

$$\begin{aligned} T \ge \frac{\tau }{\lambda }\log \left( \frac{D}{\delta '}\right) . \end{aligned}$$
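Since \((1-\lambda )^{T/\tau } \le e^{-\lambda T/\tau }\), this choice of \(T\) indeed drives the error below \(\delta '\); a quick numeric check with hypothetical \(\lambda\), \(\tau\), \(D\), \(\delta '\):

```python
import math

# Hypothetical network and accuracy constants.
lam, tau, D, delta_p = 0.2, 3.0, 100.0, 1e-3

T = math.ceil((tau / lam) * math.log(D / delta_p))  # sufficient communication steps
assert (1 - lam) ** (T / tau) * D <= delta_p        # (1 - lambda)^{T/tau} D <= delta'
print(T)  # 173
```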

Putting the proof together

Using Lemma (2.1), we get

$$\begin{aligned} \mathbb {E}\left\| \overline{x^k} - x^*\right\| ^2 \le \frac{\omega }{3L_{xy}}\left( \theta ^k\Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+ \frac{\Sigma ^2}{2(1-\theta )} \right) , \\ \mathbb {E}\left\| \overline{y^k} - y^*\right\| ^2 \le \frac{1}{4L_{xy}\omega }\left( \theta ^k\Psi ^0 + \frac{4}{(1-\theta )^2}(\delta _x+\delta _y)+\frac{\Sigma ^2}{2(1-\theta )} \right) . \end{aligned}$$

We introduce the notation

$$\begin{aligned} \nu = \max \left\{ \frac{1}{3L_{xy}}\omega , \frac{1}{4L_{xy}}\omega ^{-1} \right\} . \end{aligned}$$

Using the definition of \(\Psi ^k\), we get

$$\begin{aligned} \Psi ^0 = \frac{1}{\eta _x}\Vert x^0-x^*\Vert ^2+\frac{1}{\eta _y}\Vert y^0-y^*\Vert ^2+ \frac{2}{\sigma _x}B_f(x^0,x^*)+\frac{2}{\sigma _y}B_g(y^0,y^*), \end{aligned}$$

where \(\eta _x = \min \left\{ \frac{1}{4(\hat{\mu _x} + \hat{L_x}\sigma _x)}, \frac{\omega }{4L_{xy}} \right\}\), \(\eta _y = \min \left\{ \frac{1}{4(\hat{\mu _y} + \hat{L_y}\sigma _y)}, \frac{1}{4L_{xy}\omega } \right\} .\)

Rewriting it in terms of \(L_{lx}\), \(L_{x}\), \(L_{ly}\), \(L_{y}\), \(\mu _{lx}\), \(\mu _{x}\), \(\mu _{ly}\), \(\mu _{y}\), we get

\(\eta _x = \min \left\{ \frac{1}{2\mu _{x} + 8L_{x}\sigma _x}, \frac{\omega }{4L_{xy}} \right\}\), \(\eta _y = \min \left\{ \frac{1}{2\mu _{y} + 8L_{y}\sigma _y}, \frac{1}{4L_{xy}\omega } \right\} .\)

To make the first term at most \(\varepsilon /3\), we require

$$\begin{aligned} \nu \theta ^k \Psi ^0 \le \frac{\varepsilon }{3}. \end{aligned}$$

It would be sufficient to take \(N = k = \frac{1}{1-\theta }\log \left( \frac{3\Psi ^0\nu }{\varepsilon }\right)\).
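Since \(\theta ^k \le e^{-(1-\theta )k}\), this choice of \(N\) makes \(\nu \theta ^N \Psi ^0 \le \varepsilon /3\); a sanity check with hypothetical constants:

```python
import math

# Hypothetical constants for illustration only.
theta, psi0, nu, eps = 0.95, 10.0, 2.0, 1e-4

N = math.ceil(math.log(3 * psi0 * nu / eps) / (1 - theta))
assert nu * theta**N * psi0 <= eps / 3  # first term of the error bound is <= eps/3
print(N)  # 267
```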

Finally, let us bound the remaining terms. We require

$$\begin{aligned} \delta _x, \delta _y \le \frac{(1-\theta )^2\varepsilon }{24\nu }. \end{aligned}$$

Define E as \(E=\frac{1}{2n}\max \left\{ \frac{L_{lx}^2}{L_{x}} + \frac{2L_{lx}^2}{\mu _{x}} + L_{lx} - \mu _{lx}, \frac{L_{ly}^2}{L_{y}} + \frac{2L_{ly}^2}{\mu _{y}} + L_{ly} - \mu _{ly} \right\} .\)

Using the definitions of \(\delta _x\) and \(\delta _y\), it is sufficient to take

$$\begin{aligned} \delta ' = \frac{(1-\theta )^2\varepsilon }{24E\nu }. \end{aligned}$$

Define \(F_x\) and \(F_y\) as \(F_x = \frac{\nu }{2n(1-\theta )}\left( \frac{1}{\hat{L_x}}+\frac{\omega }{L_{xy}} \right)\), \(F_y= \frac{\nu }{2n(1-\theta )}\left( \frac{1}{\hat{L_y}}+\frac{1}{L_{xy}\omega } \right) .\)

Using the definitions of \(\Sigma ^2\), \(\sigma _{F, r}^2\), \(\sigma _{G, r}^2\), we find that it suffices to take \(r_{f, i} = \Bigg \lceil \frac{6F_x\sigma _{f, i}^2}{\varepsilon }\Bigg \rceil\) and \(r_{g, i} = \Bigg \lceil \frac{6F_y\sigma _{g, i}^2}{\varepsilon }\Bigg \rceil\).
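As a numeric sanity check, the batch-size ceilings above can be evaluated directly. The following is a minimal sketch with illustrative constants (the values of \(F_x\), \(F_y\), the variances, and \(\varepsilon\) are arbitrary, not taken from the paper):

```python
import math

def batch_sizes(F_x, F_y, sigma_f_sq, sigma_g_sq, eps):
    """Per-node batch sizes from the ceilings above:
    r_{f,i} = ceil(6 F_x sigma_{f,i}^2 / eps),
    r_{g,i} = ceil(6 F_y sigma_{g,i}^2 / eps)."""
    r_f = math.ceil(6 * F_x * sigma_f_sq / eps)
    r_g = math.ceil(6 * F_y * sigma_g_sq / eps)
    return r_f, r_g

# Illustrative constants only; eps = 2**-6 keeps the arithmetic exact.
r_f, r_g = batch_sizes(F_x=0.5, F_y=0.25,
                       sigma_f_sq=1.0, sigma_g_sq=1.0, eps=2**-6)
# Batch sizes grow as O(1/eps), matching the O(1/eps) term
# in the computation complexity N_comp below.
```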

Finally, we arrive at the communication and computation complexities:

$$\begin{aligned} N_{comm}&= NT =\mathcal {O}\left( \frac{1}{1-\theta }\tau \chi \log \left( \frac{\Psi ^0\nu }{\varepsilon }\right) \log \left( \frac{D'}{\varepsilon }\right) \right) ,\\ N_{comp}^i&= N(r_{f, i}+r_{g, i})\\&= 2N + \mathcal {O}\left( \frac{\max \{\omega , \omega ^{-1}\}}{nL_{xy}(1-\theta )^2\varepsilon }\left( \left( \frac{1}{L_{x}}+\frac{\omega }{L_{xy}} \right) \sigma _{f, i}^2 + \left( \frac{1}{L_{y}}+\frac{1}{L_{xy}\omega } \right) \sigma _{g, i}^2 \right) \log \left( \frac{\Psi ^0\nu }{\varepsilon }\right) \right) .\\&\frac{1}{1-\theta } = \mathcal {O}\left( \max \left\{ \sqrt{\frac{L_{x}}{\mu _{x}}}, \sqrt{\frac{L_{y}}{\mu _{y}}},\frac{L_{xy}}{\sqrt{\mu _{x}\mu _{y}}}\right\} \right) ,\\ \omega&= \sqrt{\frac{\mu _{y}}{\mu _{x}}}, \sigma _x = \sqrt{\frac{\mu _{x}}{8L_{x}}},\sigma _y = \sqrt{\frac{\mu _{y}}{8L_{y}}},\\ \frac{1}{1-\theta }&= \mathcal {O}\left( \max \left\{ \frac{\sqrt{L_{x}L_{y}}}{\mu _{xy}}, \frac{L_{xy}}{\mu _{xy}}\sqrt{\frac{L_{x}}{\mu _{x}}}, \frac{L_{xy}^2}{\mu _{xy}^2} \right\} \right) , \\ \omega&= \sqrt{\frac{\mu _{xy}^2}{2\mu _{x}L_{x}}}, \sigma _x = \sqrt{\frac{\mu _{x}}{8L_{x}}},\sigma _y =\min \left\{ 1,\sqrt{\frac{\mu _{xy}^2}{16L_{x}L_{y}}}\right\} , \\ \frac{1}{1-\theta }&= \mathcal {O}\left( \max \left\{ \frac{\sqrt{L_{x}L_{y}}}{\mu _{yx}}, \frac{L_{xy}}{\mu _{yx}}\sqrt{\frac{L_{y}}{\mu _{y}}}, \frac{L_{xy}^2}{\mu _{yx}^2} \right\} \right) , \\ \omega&= \sqrt{\frac{2\mu _{y}L_{y}}{\mu _{yx}^2}}, \sigma _x =\min \left\{ 1,\sqrt{\frac{\mu _{yx}^2}{16L_{x}L_{y}}}\right\} ,\sigma _y = \sqrt{\frac{\mu _{y}}{8L_{y}}},\\ \frac{1}{1-\theta }&= \mathcal {O}\left( \max \left\{ \frac{\sqrt{L_{x}L_{y}}L_{xy}}{\mu _{xy}\mu _{yx}}, \frac{L_{xy}^2}{\mu _{yx}^2}, \frac{L_{xy}^2}{\mu _{xy}^2} \right\} \right) , \\ \omega&= \frac{\mu _{xy}}{\mu _{yx}}\sqrt{\frac{L_{y}}{L_{x}}}, \sigma _x = \min \left\{ 1,\sqrt{\frac{\mu _{yx}^2}{16L_{x}L_{y}}}\right\} ,\sigma _y =\min \left\{ 1,\sqrt{\frac{\mu _{xy}^2}{16L_{x}L_{y}}}\right\} . 
\end{aligned}$$
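To illustrate how the first parameter regime above behaves, the following minimal sketch evaluates the condition-number factor \(1/(1-\theta )\) (up to absolute constants) and the scaling \(\omega = \sqrt{\mu _y/\mu _x}\); the constants are illustrative and not taken from the paper:

```python
import math

def rate_factor(L_x, L_y, L_xy, mu_x, mu_y):
    """1/(1-theta) up to absolute constants in the first regime:
    max{sqrt(L_x/mu_x), sqrt(L_y/mu_y), L_xy/sqrt(mu_x*mu_y)}."""
    return max(math.sqrt(L_x / mu_x),
               math.sqrt(L_y / mu_y),
               L_xy / math.sqrt(mu_x * mu_y))

# Illustrative constants: strong smoothness, mild bilinear coupling.
L_x, L_y, L_xy, mu_x, mu_y = 100.0, 100.0, 10.0, 1.0, 4.0
factor = rate_factor(L_x, L_y, L_xy, mu_x, mu_y)  # max(10, 5, 5) = 10
omega = math.sqrt(mu_y / mu_x)                    # = 2
```

Here the x-block condition number \(\sqrt{L_x/\mu _x}\) dominates, so the bilinear coupling does not slow the rate in this example.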


Cite this article

Metelev, D., Rogozin, A., Gasnikov, A. et al. Decentralized saddle-point problems with different constants of strong convexity and strong concavity. Comput Manag Sci 21, 5 (2024). https://doi.org/10.1007/s10287-023-00485-9

