
Non-smooth setting of stochastic decentralized convex optimization problem over time-varying graphs

Original Paper, Computational Management Science

Abstract

Distributed optimization has a rich history and has proved effective in many applications, in particular in machine learning. In this paper we study a subclass of distributed optimization problems, namely decentralized optimization in a non-smooth setting. Decentralized means that m agents (machines) work in parallel on one problem and communicate only with neighboring agents (machines), i.e. there is no central server through which the agents communicate. By the non-smooth setting we mean that each agent holds a convex stochastic non-smooth function and can store and communicate only values of the objective function, which corresponds to a gradient-free oracle. To minimize the global objective function, which is the sum of the agents' functions, we construct a gradient-free algorithm by applying a smoothing scheme via \(l_2\) randomization. We also verify in experiments the theoretical convergence results obtained for the proposed gradient-free algorithm.
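
To make the smoothing scheme concrete, the snippet below is a minimal sketch of a two-point gradient estimator based on \(l_2\) randomization, in which only (noisy) function values are queried: the objective is evaluated at \(x + \gamma {\textbf{e}}\) and \(x - \gamma {\textbf{e}}\) for a random direction \({\textbf{e}}\) on the unit sphere. It illustrates the general technique rather than the exact estimator and algorithm analysed in the paper; the names l2_grad_estimate, oracle and gamma, as well as the toy objective and step sizes, are ours.

```python
import numpy as np

def l2_grad_estimate(oracle, x, gamma, rng):
    """Two-point gradient estimate via l2 (unit-sphere) randomization.

    Only noisy function values oracle(x, xi) are used, which models the
    gradient-free (zeroth-order) oracle described in the abstract.
    """
    d = x.shape[0]
    e = rng.normal(size=d)
    e /= np.linalg.norm(e)                 # uniform direction on the unit sphere
    xi = rng.integers(10**9)               # shared randomness for both evaluations
    f_plus = oracle(x + gamma * e, xi)
    f_minus = oracle(x - gamma * e, xi)
    return d / (2.0 * gamma) * (f_plus - f_minus) * e

# Toy single-agent usage: minimize the non-smooth stochastic objective |<a, x> - b| + noise.
rng = np.random.default_rng(0)
a, b = rng.normal(size=5), 1.0
oracle = lambda x, xi: abs(a @ x - b) + 1e-3 * np.sin(xi)   # noisy value-only access
x = np.zeros(5)
for k in range(2000):
    g = l2_grad_estimate(oracle, x, gamma=1e-3, rng=rng)
    x -= 0.05 / (k + 1) ** 0.5 * g                          # decaying step size
print("final residual:", abs(a @ x - b))
```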



Acknowledgements

The work of Alexander Gasnikov and Aleksandr Lobanov was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Ivannikov Institute for System Programming of the Russian Academy of Sciences dated November 2, 2021 No. 70-2021-00142.

Author information

Contributions

AL wrote the main text of the paper and prepared the theoretical materials; AV generalized the theoretical results to the case of one-point and two-point feedback; GK prepared Figures 1-3; AB helped with AV's theoretical results; DK provided the source code of the ADOM+ algorithm; AG supervised this work and checked the results.

Corresponding author

Correspondence to Aleksandr Lobanov.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Auxiliary facts and results

In this section we list auxiliary facts and results that we use several times in our proofs.

1.1 Squared norm of the sum

For all \(a_1, \ldots ,a_n \in {\mathbb {R}}^d\), where \(n \in \{2,3\}\),

$$\begin{aligned} \Vert a_1 + \cdots + a_n \Vert ^2 \le n \Vert a_1 \Vert ^2 + \cdots + n \Vert a_n \Vert ^2. \end{aligned}$$
(21)

1.2 Fenchel–Young inequality

For all \(a,b\in {\mathbb {R}}^d\) and \(\lambda > 0\)

$$\begin{aligned} \left\langle a,b \right\rangle \le \frac{\Vert a\Vert ^2}{2\lambda } + \frac{\lambda \Vert b\Vert ^2}{2}. \end{aligned}$$
(22)

1.3 Inner product representation

For all \(a,b\in {\mathbb {R}}^d\)

$$\begin{aligned} \left\langle a,b \right\rangle = \frac{1}{2}\left( \Vert a+b\Vert ^2 - \Vert a\Vert ^2 - \Vert b\Vert ^2\right) . \end{aligned}$$
(23)

1.4 Fact from concentration of the measure

Let \({\textbf{e}}\) be uniformly distributed on the Euclidean unit sphere. Then, for \(d \ge 8\) and all \(s \in {\mathbb {R}}^d\),

$$\begin{aligned} {\mathbb {E}}_{\textbf{e}}\left( \left\langle s,{\textbf{e}} \right\rangle ^2\right) \le \frac{\Vert s \Vert ^2}{d}. \end{aligned}$$
(24)
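
As a quick illustration (not part of the paper), the script below numerically checks facts (21)-(24) on random data; the dimension, sample size and random seed are arbitrary. For (24) the expectation is estimated by Monte Carlo; for a direction uniform on the unit sphere it in fact equals \(\Vert s \Vert ^2/d\), so the estimate should be close to the bound.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16

# (21): squared norm of the sum, here with n = 3
a1, a2, a3 = (rng.normal(size=d) for _ in range(3))
lhs = np.linalg.norm(a1 + a2 + a3) ** 2
rhs = 3 * (np.linalg.norm(a1) ** 2 + np.linalg.norm(a2) ** 2 + np.linalg.norm(a3) ** 2)
assert lhs <= rhs + 1e-12

# (22): Fenchel-Young inequality
a, b, lam = rng.normal(size=d), rng.normal(size=d), 0.7
assert a @ b <= np.linalg.norm(a) ** 2 / (2 * lam) + lam * np.linalg.norm(b) ** 2 / 2 + 1e-12

# (23): inner product representation
assert np.isclose(a @ b, 0.5 * (np.linalg.norm(a + b) ** 2
                                - np.linalg.norm(a) ** 2 - np.linalg.norm(b) ** 2))

# (24): Monte Carlo estimate of E_e <s, e>^2 for e uniform on the unit sphere
s = rng.normal(size=d)
e = rng.normal(size=(200_000, d))
e /= np.linalg.norm(e, axis=1, keepdims=True)
print("MC estimate:", ((e @ s) ** 2).mean(), " bound ||s||^2/d:", np.linalg.norm(s) ** 2 / d)
```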

Appendix 2: Proof of Theorem

By \(\textrm{D}_F(x,y)\) we denote the Bregman distance \(\textrm{D}_F(x,y){:}{=}F(x) - F(y) - \langle \nabla F(y),x-y\rangle \).

Lemma 5

Let \(\tau _2\) be defined as follows:

$$\begin{aligned} \tau _2 = \sqrt{\mu /L}. \end{aligned}$$
(25)

Let \(\tau _1\) be defined as follows:

$$\begin{aligned} \tau _1 = (1/\tau _2 + 1/2)^{-1}. \end{aligned}$$
(26)

Let \(\eta \) be defined as follows:

$$\begin{aligned} \eta = \left( [1/\beta + L] \cdot \tau _2\right) ^{-1} \end{aligned}$$
(27)

Let \(\alpha \) be defined as follows:

$$\begin{aligned} \alpha = \mu / 4 \end{aligned}$$
(28)

Let \(\nu \) be defined as follows:

$$\begin{aligned} \nu = \mu /2. \end{aligned}$$
(29)

Let \(\Psi _x^k\) be defined as follows:

$$\begin{aligned} \Psi _x^k = \left( \frac{1}{\eta } + \alpha \right) \left\Vert x^{k} - x^*\right\Vert ^2 + \frac{2}{\tau _2}\left( \textrm{D}_f(x_f^{k},x^*)-\frac{\nu }{2}\left\Vert x_f^{k} - x^*\right\Vert ^2 \right) \end{aligned}$$
(30)

Then the following inequality holds:

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \Psi _x^{k+1} \right]&\le {\max \left\{ 1 - \tau _2/2, 1/(1+\eta \alpha )\right\} }\Psi _x^k \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad -\left( \textrm{D}_F(x_g^k,x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{{\beta }\sigma ^2}{\tau _2} + {\frac{4}{\mu } \Delta ^2}. \end{aligned} \end{aligned}$$
(31)

Proof

$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2+\frac{2}{\eta }\langle x^{k+1} - x^k,x^{k+1}- x^*\rangle - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2. \end{aligned}$$

Let \({\textbf{G}}_{k} = {\textbf{g}}(x_g^k, \varvec{\xi }^k)\); then, using Line 5 of Algorithm 1, we get

$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 + 2\alpha \langle x_g^k - x^{k+1},x^{k+1}- x^*\rangle \\&\quad - 2\langle {\textbf{G}}_k - \nu x_g^k - y^{k+1},x^{k+1} - x^*\rangle - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2 \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 + 2\alpha \langle x_g^k - x^*- x^{k+1} + x^*,x^{k+1}- x^*\rangle \\ {}&\quad - 2\langle {\textbf{G}}_k - \nu x_g^k - y^{k+1},x^{k+1} - x^*\rangle - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2 \\ {}&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - 2\langle {\textbf{G}}_k - \nu x_g^k - y^{k+1},x^{k+1} - x^*\rangle - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2. \end{aligned}$$

Using optimality condition (4) we get

$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^{k+1} - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^{k+1} - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle . \end{aligned}$$

Using Line 6 of Algorithm 1 we get

$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta \tau _2^2}\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \\&\quad - \frac{2}{\tau _2}\langle {\textbf{G}}_k - \nabla F(x^*),x_f^{k+1} - x_g^k\rangle + \frac{2\nu }{\tau _2}\langle x_g^k - x^*,x_f^{k+1} - x_g^k\rangle \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta \tau _2^2}\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \\&\quad - \frac{2}{\tau _2}\langle {\textbf{G}}_k - \nabla F(x^*),x_f^{k+1} - x_g^k\rangle \\&\quad + \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2-\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2\right) \\ {}&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta \tau _2^2}\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \\&\quad + \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2-\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2\right) \\&\quad -\frac{2}{\tau _2} \underbrace{\left\langle {\textbf{G}}_k - \nabla F(x_g^{k}),x_f^{k+1} - x_g^k \right\rangle }_{\textcircled {1}} \\&\quad -\frac{2}{\tau _2} \underbrace{\left\langle \nabla F(x_g^{k}) - \nabla F(x^{*}),x_f^{k+1} - x_g^k \right\rangle }_{\textcircled {2}} . \end{aligned}$$

We now upper bound the term \(\textcircled {1}\):

$$\begin{aligned}&-\frac{2}{\tau _2} \left\langle {\textbf{G}}_k - \nabla F(x_g^{k}),x_f^{k+1} - x_g^k \right\rangle \\&\quad = \frac{2}{\tau _2} \left\langle {\textbf{G}}_k - \nabla F(x_g^{k}),x_g^k - x_f^{k+1} \right\rangle \\&\quad \overset{(\text {A2})}{\le } \frac{2}{\tau _2} \left( \frac{{\beta }}{2 } \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2 + \frac{ 1}{2 {\beta }} \left\| x_f^{k+1} - x_g^k \right\| ^2\right) . \end{aligned}$$

We now upper bound the term \(\textcircled {2}\):

$$\begin{aligned}{} & {} -\frac{2}{\tau _2} \left\langle \nabla F(x_g^{k}) - \nabla F(x^{*}),x_f^{k+1} - x_g^k \right\rangle \\{} & {} \quad = -\frac{2}{\tau _2} \left\langle \nabla F(x_g^{k}),x_f^{k+1} - x_g^k \right\rangle \\{} & {} \quad \le \frac{2}{\tau _2} \left( F(x_g^{k}) - F(x_f^{k+1}) + \frac{L}{2} \left\| x_f^{k+1} - x_g^k \right\| ^2\right) . \end{aligned}$$

Substituting the obtained estimates we get:

$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2\le & {} \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\ {}{} & {} -\, \frac{1}{\eta \tau _2^2}\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\ {}{} & {} +\, 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \\{} & {} +\, \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2-\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2\right) \\{} & {} +\, \frac{2}{\tau _2} \left( F(x_g^{k}) - F(x_f^{k+1}) + {\frac{1/\beta + L}{2} \cdot } \left\| x_f^{k+1} - x_g^k \right\| ^2 \right) \\ {}{} & {} + \,\frac{{\beta }}{\tau _2} \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2 \\ {}= & {} \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 \\ {}{} & {} +\, \alpha \left\Vert x_g^k - x^*\right\Vert ^2 + \left( \frac{ {1/\beta + L}}{\tau _2}- \frac{1}{\eta \tau _2^2} \right) \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \\{} & {} -\,2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle \\ {}{} & {} +\, 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle + \frac{{\beta }}{\tau _2 } \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2 \\{} & {} +\, \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2-\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2\right) \\{} & {} +\, \frac{2}{\tau _2} \left( - F(x_f^{k+1}) + F(x_g^{k}) \pm F(x^*) + \left\langle \nabla F(x^*),x^* - x_f^{k+1} \right\rangle \right. \\ {}{} & {} \left. -\, \left\langle \nabla F(x^*),x^* - x_g^k \right\rangle \right) \\ {}= & {} \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 \\ {}{} & {} +\, \alpha \left\Vert x_g^k - x^*\right\Vert ^2 + \left( \frac{ {1/\beta + L} - \nu }{\tau _2}- \frac{1}{\eta \tau _2^2} \right) \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \\{} & {} -\,2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle \\ {}{} & {} +\, 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle - \frac{2}{\tau _2} \left( D_f(x_f^{k+1}) - D_f(x_g^k) \right) \\ {}{} & {} +\, \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{{\beta }}{\tau _2 } \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2. \end{aligned}$$

Taking the expectation with respect to \(\varvec{\xi }^k\), we have:

$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\| x^{k+1} - x^* \right\| ^2\right]\le & {} \frac{1}{\eta }{\mathbb {E}}\left[ \left\| x^k - x^* \right\| ^2 \right] - \alpha {\mathbb {E}}\left[ \left\| x^{k+1} - x^* \right\| ^2 \right] \\ {}{} & {} +\, \alpha {\mathbb {E}}\left[ \left\| x_g^k - x^* \right\| ^2 \right] -2 {\mathbb {E}}\left[ \left\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^* \right\rangle \right] \\ {}{} & {} +\, 2\nu {\mathbb {E}}\left[ \left\langle x_g^k - x^*,x^k - x^* \right\rangle \right] + 2{\mathbb {E}}\left[ \left\langle y^{k+1} - y^*,x^{k+1} - x^* \right\rangle \right] \\ {}{} & {} -\, \frac{2}{\tau _2} \left( {\mathbb {E}}\left[ D_f(x_f^{k+1}) - D_f(x_g^k) \right] \right) \\{} & {} +\, \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\| x_f^{k+1} - x^* \right\| ^2 - \left\| x_g^k - x^* \right\| ^2 \right] \right) \\{} & {} +\, \left( \frac{ {1/\beta + L} - \nu }{\tau _2}- \frac{1}{\eta \tau _2^2} \right) {\mathbb {E}}\left[ \left\| x_f^{k+1} - x_g^k \right\| ^2 \right] \\ {}{} & {} +\, \frac{{\beta }}{\tau _2 } \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2 \\ {}&\overset{(\text {11}),(\text {13})}{\le }&\frac{1}{\eta }{\mathbb {E}}\left[ \left\| x^k - x^* \right\| ^2 \right] \\ {}{} & {} -\, \alpha {\mathbb {E}}\left[ \left\| x^{k+1} - x^* \right\| ^2 \right] + \alpha {\mathbb {E}}\left[ \left\| x_g^k - x^* \right\| ^2 \right] \\{} & {} -\,2 \left\langle \nabla F(x_g^k) + {\varvec{\omega }(x_g^k)} - \nabla F(x^*),x^k - x^* \right\rangle \\ {}{} & {} +\, 2\nu \left\langle x_g^k - x^*,x^k - x^* \right\rangle + 2{\mathbb {E}}\left[ \left\langle y^{k+1} - y^*,x^{k+1} - x^* \right\rangle \right] \\ {}{} & {} -\, \frac{2}{\tau _2} \left( {\mathbb {E}}\left[ D_f(x_f^{k+1}) \right] - {\mathbb {E}}\left[ D_f(x_g^k) \right] \right) \\{} & {} +\, \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\| x_f^{k+1} - x^* \right\| ^2 \right] - {\mathbb {E}}\left[ \left\| x_g^k - x^* \right\| ^2 \right] \right) \\{} & {} +\, \left( \frac{ {1/\beta + L}- \nu }{\tau _2}- \frac{1}{\eta \tau _2^2} \right) {\mathbb {E}}\left[ \left\| x_f^{k+1} - x_g^k \right\| ^2 \right] + \frac{{\beta } \sigma ^2}{\tau _2 }. \end{aligned}$$

Using

$$\begin{aligned} 2 \left\langle \varvec{\omega }(x_g^k),x^k - x^* \right\rangle \le \frac{4}{\mu } \left\| \varvec{\omega }(x_g^k) \right\| ^2 + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2. \end{aligned}$$

and Line 4 of Algorithm 1, we get

$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right]&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{ {1/\beta + L} - \nu }{\tau _2}-\frac{1}{\eta \tau _2^2}\right) {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \right] \\&\quad -2\langle \nabla F(x_g^k) - \nabla F(x^*),x_g^k - x^*\rangle + 2\nu \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \frac{2(1-\tau _1)}{\tau _1}\langle \nabla F(x_g^k) - \nabla F(x^*),x_f^k - x_g^k\rangle \\&\quad + \frac{2\nu (1-\tau _1)}{\tau _1}\langle x_g^k - x_f^k,x_g^k - x^*\rangle \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] + \frac{{\beta }\sigma ^2}{\tau _2 } \\&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_f(x_f^{k+1},x^*) \right] - \textrm{D}_f(x_g^k,x^*)\right) \\&\quad + \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] - \left\Vert x_g^k - x^*\right\Vert ^2\right) \\&\quad + {\frac{4}{\mu } \Delta ^2 + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2} \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] \\&\quad + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 + \left( \frac{ {1/\beta + L}- \nu }{\tau _2}-\frac{1}{\eta \tau _2^2}\right) {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \right] \\&\quad -2\langle \nabla F(x_g^k) - \nabla F(x^*),x_g^k - x^*\rangle + 2\nu \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \frac{2(1-\tau _1)}{\tau _1}\langle \nabla F(x_g^k) - \nabla F(x^*),x_f^k - x_g^k\rangle \\&\quad + \frac{\nu (1-\tau _1)}{\tau _1}\left( \left\Vert x_g^k- x_f^k\right\Vert ^2 + \left\Vert x_g^k - x^*\right\Vert ^2 - \left\Vert x_f^k - x^*\right\Vert ^2\right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_f(x_f^{k+1},x^*) \right] - \textrm{D}_f(x_g^k,x^*)\right) \\&\quad + \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] - \left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{{\beta }\sigma ^2}{\tau _2 } \\&\quad {+} {\frac{4 \Delta ^2}{\mu } + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2}. \end{aligned}$$

Using \(\mu \)-strong convexity of \(\textrm{D}_F(x,x^*)\) in x, which follows from \(\mu \)-strong convexity of F(x), we get

$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right]&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{ {1/\beta + L} - \nu }{\tau _2}-\frac{1}{\eta \tau _2^2}\right) {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \right] -2\textrm{D}_F(x_g^k,x^*) \\&\quad - \mu \left\Vert x_g^k - x^*\right\Vert ^2 + 2\nu \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \frac{2(1-\tau _1)}{\tau _1}\left( \textrm{D}_F(x_f^k,x^*) - \textrm{D}_F(x_g^k,x^*) - \frac{\mu }{2}\left\Vert x_f^k - x_g^k\right\Vert ^2\right) \\&\quad + \frac{\nu (1-\tau _1)}{\tau _1}\left( \left\Vert x_g^k- x_f^k\right\Vert ^2 + \left\Vert x_g^k - x^*\right\Vert ^2 - \left\Vert x_f^k - x^*\right\Vert ^2\right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_f(x_f^{k+1},x^*) \right] - \textrm{D}_f(x_g^k,x^*)\right) \\&\quad + \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] - \left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{{\beta }\sigma ^2}{\tau _2 } \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] \\&\quad + \frac{2(1-\tau _1)}{\tau _1}\left( \textrm{D}_F(x_f^k,x^*) - \frac{\nu }{2}\left\Vert x_f^k - x^*\right\Vert ^2\right) \\&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_f(x_f^{k+1},x^*) \right] -\frac{\nu }{2}{\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] \right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] + 2\left( \frac{1}{\tau _2}-\frac{1}{\tau _1}\right) \textrm{D}_F(x_g^k,x^*) \\&\quad + \left( \alpha - \mu + \nu +\frac{\nu }{\tau _1}-\frac{\nu }{\tau _2}\right) \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{ {1/\beta + L} - \nu }{\tau _2}-\frac{1}{\eta \tau _2^2}\right) {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \right] \\&\quad + \frac{(1-\tau _1)(\nu -\mu )}{\tau _1}\left\Vert x_f^k - x_g^k\right\Vert ^2 \\&\quad + \frac{{\beta }\sigma ^2}{\tau _2 } {+ \frac{4\Delta ^2}{\mu } + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2}. \end{aligned}$$

Using \(\eta \) defined by (27), \(\tau _1\) defined by (26) and the fact that \(\nu < \mu \) we get

$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right]&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] \\&\quad + \frac{2(1-\tau _2/2)}{\tau _2}\left( \textrm{D}_F(x_f^k,x^*) - \frac{\nu }{2}\left\Vert x_f^k - x^*\right\Vert ^2\right) \\&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_f(x_f^{k+1},x^*) \right] -\frac{\nu }{2}{\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] \right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad -\textrm{D}_F(x_g^k,x^*) + \left( \alpha - \mu + \frac{3\nu }{2} {+ \frac{\mu }{4}} \right) \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \frac{{\beta }\sigma ^2}{\tau _2 } {+ \frac{4}{\mu } \Delta ^2}. \end{aligned}$$

Using \(\alpha \) defined by (28) and \(\nu \) defined by (29) we get

$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right]&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] \\&\quad + \frac{2(1-\tau _2/2)}{\tau _2}\left( \textrm{D}_F(x_f^k,x^*) - \frac{\nu }{2}\left\Vert x_f^k - x^*\right\Vert ^2\right) \\ {}&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_f(x_f^{k+1},x^*) \right] -\frac{\nu }{2}{\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] \right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad - \left( \textrm{D}_F(x_g^k,x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{{\beta }\sigma ^2}{\tau _2 } {+ \frac{4}{\mu } \Delta ^2}. \end{aligned}$$

After rearranging and using \(\Psi _x^k \) definition (30) we get

$$\begin{aligned} {\mathbb {E}}\left[ \Psi _x^{k+1} \right]&\le \max \left\{ 1 - \tau _2/2, 1/(1+\eta \alpha )\right\} \Psi _x^k + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad - \left( \textrm{D}_F(x_g^k,x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{{\beta }\sigma ^2}{\tau _2 } {+ \frac{4}{\mu } \Delta ^2} \end{aligned}$$

\(\square \)
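
As an illustration of the parameter choices (25)-(29) and of the contraction factor \(\max \left\{ 1 - \tau _2/2, 1/(1+\eta \alpha )\right\} \) appearing in (31), the short sketch below evaluates them for sample values of \(\mu \), \(L\) and \(\beta \); these numerical values are ours and are not taken from the paper.

```python
# Illustrative constants; mu, L and the choice beta = 1/(2L) are ours, consistent with (33).
mu, L = 0.1, 10.0
beta = 1.0 / (2.0 * L)

tau2 = (mu / L) ** 0.5                  # (25)
tau1 = 1.0 / (1.0 / tau2 + 0.5)         # (26)
eta = 1.0 / ((1.0 / beta + L) * tau2)   # (27)
alpha = mu / 4.0                        # (28)
nu = mu / 2.0                           # (29)

rate = max(1.0 - tau2 / 2.0, 1.0 / (1.0 + eta * alpha))
print(f"tau2={tau2:.4f}, tau1={tau1:.4f}, eta={eta:.4f}, contraction factor={rate:.6f}")
```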

Lemma 6

The following inequality holds:

$$\begin{aligned} \begin{aligned} -\left\Vert y^{k+1}- y^*\right\Vert ^2&\le \frac{(1-\vartheta _1)}{\vartheta _1}\left\Vert y_f^k - y^*\right\Vert ^2 - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2 \\&\quad - \left( \frac{1}{\vartheta _1} - \frac{1}{\vartheta _2}\right) \left\Vert y_g^k - y^*\right\Vert ^2 + \left( \vartheta _2- \vartheta _1\right) \left\Vert y^{k+1} - y^k\right\Vert ^2 \end{aligned} \end{aligned}$$
(32)

Proof

Lines 7 and 9 of Algorithm 1 imply

$$\begin{aligned} y_f^{k+1}&= y_g^k + \vartheta _2(y^{k+1} - y^k)\\ {}&= y_g^k + \vartheta _2 y^{k+1} - \frac{\vartheta _2}{\vartheta _1}\left( y_g^k - (1-\vartheta _1)y_f^k\right) \\ {}&= \left( 1 - \frac{\vartheta _2}{\vartheta _1}\right) y_g^k + \vartheta _2 y^{k+1} + \left( \frac{\vartheta _2}{\vartheta _1}- \vartheta _2\right) y_f^k. \end{aligned}$$

After subtracting \(y^*\) and rearranging we get

$$\begin{aligned} (y_f^{k+1}- y^*)+ \left( \frac{\vartheta _2}{\vartheta _1} - 1\right) (y_g^k - y^*) = \vartheta _2( y^{k+1} - y^*)+ \left( \frac{\vartheta _2}{\vartheta _1} - \vartheta _2\right) (y_f^k - y^*). \end{aligned}$$

Multiplying both sides by \(\frac{\vartheta _1}{\vartheta _2}\) gives

$$\begin{aligned} \frac{\vartheta _1}{\vartheta _2}(y_f^{k+1}- y^*)+ \left( 1-\frac{\vartheta _1}{\vartheta _2}\right) (y_g^k - y^*) = \vartheta _1( y^{k+1} - y^*)+ \left( 1 - \vartheta _1\right) (y_f^k - y^*). \end{aligned}$$

Squaring both sides and using convexity of the squared norm gives

$$\begin{aligned}&\frac{\vartheta _1}{\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2 + \left( 1- \frac{\vartheta _1}{\vartheta _2}\right) \left\Vert y_g^k - y^*\right\Vert ^2 - \frac{\vartheta _1}{\vartheta _2}\left( 1-\frac{\vartheta _1}{\vartheta _2}\right) \left\Vert y_f^{k+1} - y_g^k\right\Vert ^2\\&\quad \le \vartheta _1\left\Vert y^{k+1} - y^*\right\Vert ^2 + (1-\vartheta _1)\left\Vert y_f^k - y^*\right\Vert ^2. \end{aligned}$$

Rearranging gives

$$\begin{aligned} -\left\Vert y^{k+1}- y^*\right\Vert ^2&\le -\left( \frac{1}{\vartheta _1} - \frac{1}{\vartheta _2}\right) \left\Vert y_g^k - y^*\right\Vert ^2 + \frac{(1-\vartheta _1)}{\vartheta _1}\left\Vert y_f^k - y^*\right\Vert ^2 \\&\quad - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2+ \frac{1}{\vartheta _2}\left( 1 - \frac{\vartheta _1}{\vartheta _2}\right) \left\Vert y_f^{k+1} - y_g^k\right\Vert ^2. \end{aligned}$$

Using Line 9 of Algorithm 1 we get

$$\begin{aligned} -\left\Vert y^{k+1}- y^*\right\Vert ^2&\le -\left( \frac{1}{\vartheta _1} - \frac{1}{\vartheta _2}\right) \left\Vert y_g^k - y^*\right\Vert ^2 + \frac{(1-\vartheta _1)}{\vartheta _1}\left\Vert y_f^k - y^*\right\Vert ^2\\&\quad - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2+ \left( \vartheta _2 - \vartheta _1\right) \left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$

\(\square \)
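
The inequality (32) only uses the interpolation structure of the \(y\)-iterates. The script below is a numerical sanity check of (32) on random vectors, assuming (as in the proof above) that Line 7 of Algorithm 1 has the form \(y_g^k = \vartheta _1 y^k + (1-\vartheta _1)y_f^k\) and Line 9 the form \(y_f^{k+1} = y_g^k + \vartheta _2(y^{k+1} - y^k)\), with \(\vartheta _1\) given by (34); the concrete value of \(\vartheta _2\) is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
theta2 = 0.5                                  # arbitrary value of vartheta_2
theta1 = 1.0 / (1.0 / theta2 + 0.5)           # relation (34)

for _ in range(1000):
    y, y_next, y_f, y_star = (rng.normal(size=d) for _ in range(4))
    y_g = theta1 * y + (1.0 - theta1) * y_f   # assumed form of Line 7
    y_f_next = y_g + theta2 * (y_next - y)    # assumed form of Line 9
    lhs = -np.linalg.norm(y_next - y_star) ** 2
    rhs = ((1.0 - theta1) / theta1 * np.linalg.norm(y_f - y_star) ** 2
           - np.linalg.norm(y_f_next - y_star) ** 2 / theta2
           - (1.0 / theta1 - 1.0 / theta2) * np.linalg.norm(y_g - y_star) ** 2
           + (theta2 - theta1) * np.linalg.norm(y_next - y) ** 2)
    assert lhs <= rhs + 1e-9
print("inequality (32) holds on all sampled points")
```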

Lemma 7

Let \(\beta \) be defined as follows:

$$\begin{aligned} \beta \le 1/(2L). \end{aligned}$$
(33)

Let \(\vartheta _1\) be defined as follows:

$$\begin{aligned} \vartheta _1 = (1/\vartheta _2 + 1/2)^{-1}. \end{aligned}$$
(34)

Then the following inequality holds:

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \nonumber \\&\quad \le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) \nonumber \\&\qquad - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] \nonumber \\&\qquad - 2\nu ^{-1}{\mathbb {E}}\left[ \langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \right] - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 \nonumber \\&\qquad + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] + {\sigma ^2 \beta }. \end{aligned}$$
(35)

Proof

$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&= \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \frac{2}{\theta }\langle y^{k+1} - y^k , y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$

Using Line 8 of Algorithm 1 we get

$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&= \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + 2\beta \langle {\textbf{G}}_k - \nu x_g^k - y^{k+1}, y^{k+1} - y^*\rangle \\&\quad - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$

Using optimality condition (4) we get

$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&= \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\&\quad + 2\beta \langle {\textbf{G}}_k - \nu x_g^k - (\nabla F(x^*) - \nu x^*) + y^*- y^{k+1}, y^{k+1} - y^*\rangle \\&\quad - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&= \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + 2\beta \langle {\textbf{G}}_k - \nu x_g^k - (\nabla F(x^*) - \nu x^*), y^{k+1} - y^*\rangle \\&\quad - 2\beta \left\Vert y^{k+1} - y^*\right\Vert ^2 - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle \\&\quad - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\&\quad + \beta \left\Vert {\textbf{G}}_k - \nu x_g^k - (\nabla F(x^*) - \nu x^*)\right\Vert ^2 - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 \\&\quad - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2 \\&\quad + \beta \left\Vert \nabla F(x_g^k) - \nu x_g^k - (\nabla F(x^*) - \nu x^*)\right\Vert ^2 - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 \\&\quad - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$

The function \(F(x) - \frac{\nu }{2}\left\Vert x\right\Vert ^2\) is convex and L-smooth, which implies

$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + 2\beta L\left( \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) \\ {}&\quad - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle \\ {}&\quad - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2. \end{aligned}$$

Using \(\beta \) definition (33) we get

$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle \\ {}&\quad - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2. \end{aligned}$$

Using optimality condition (5) we get

$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \\ {}&\quad - 2\langle x^{k+1} - x^*, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$

Using (32) together with \(\vartheta _1\) definition (34) we get

$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - \frac{\beta }{2}\left\Vert y^{k+1} - y^*\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2 \\&\quad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 \\ {}&\quad - \frac{\beta }{2\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2 - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 \\&\quad + \frac{\beta \left( \vartheta _2- \vartheta _1\right) }{2}\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \\&\quad - 2\langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \\ {}&\quad - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 - \frac{\beta }{2}\left\Vert y^{k+1} - y^*\right\Vert ^2 \\ {}&\quad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 - \frac{\beta }{2\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) \\ {}&\quad - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2\\&\quad + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) \left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \\ {}&\quad - 2\langle x^{k+1} - x^*, y^{k+1} - y^*\rangle + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2. \end{aligned}$$

Rearranging and taking the expectation with respect to \(\varvec{\xi }^k\) gives

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\&\quad \le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2\\&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] \\&\qquad - 2\nu ^{-1}{\mathbb {E}}\left[ \langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \right] \\ {}&\qquad - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] + {\sigma ^2 \beta }. \end{aligned}$$

\(\square \)
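
For completeness, the step in the proof above that invokes convexity and L-smoothness of \(F(x) - \frac{\nu }{2}\left\Vert x\right\Vert ^2\) is the standard bound \(\left\Vert \nabla h(x) - \nabla h(y)\right\Vert ^2 \le 2L\,\textrm{D}_h(x,y)\) for a convex L-smooth function h, applied here to \(h(x) {:}{=}F(x) - \frac{\nu }{2}\left\Vert x\right\Vert ^2\), for which \(\nabla h(x) = \nabla F(x) - \nu x\) and

$$\begin{aligned} \textrm{D}_h(x_g^k,x^*) = \textrm{D}_F(x_g^k,x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2, \end{aligned}$$

so that multiplying the resulting bound by \(\beta \) yields the term \(2\beta L\left( \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) \) used there.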

Lemma 8

The following inequality holds:

$$\begin{aligned} \left\Vert m^k\right\Vert ^2_{\textbf{P}}\le 8\chi ^2\varkappa ^2\nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ 4\chi (1 - (4\chi )^{-1})\left\Vert m^k\right\Vert ^2_{\textbf{P}}- 4\chi \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}. \end{aligned}$$
(36)

Proof

Using Line 12 of Algorithm 1 we get

$$\begin{aligned} \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}&= \left\Vert \varkappa \nu ^{-1}(y_g^k+z_g^k) + m^k - ({\textbf{W}}(k)\otimes {\textbf{I}}_d)\left[ \varkappa \nu ^{-1}(y_g^k+z_g^k) + m^k\right] \right\Vert ^2_{{\textbf{P}}}\\&= \left\Vert {\textbf{P}}\left[ \varkappa \nu ^{-1}(y_g^k+z_g^k) + m^k\right] - ({\textbf{W}}(k)\otimes {\textbf{I}}_d){\textbf{P}}\left[ \varkappa \nu ^{-1}(y_g^k+z_g^k) + m^k\right] \right\Vert ^2. \end{aligned}$$

Using property (4) we obtain

$$\begin{aligned} \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}&\le (1 - \chi ^{-1}) \left\Vert m^k + \varkappa \nu ^{-1}(y_g^k + z_g^k)\right\Vert ^2_{\textbf{P}}. \end{aligned}$$

Using inequality \(\left\Vert a+b\right\Vert ^2 \le (1+c)\left\Vert a\right\Vert ^2 + (1+c^{-1})\left\Vert b\right\Vert ^2\) with \(c = \frac{1}{2(\chi - 1)}\) we get

$$\begin{aligned} \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}&\le (1 - \chi ^{-1}) \left[ \left( 1 + \frac{1}{2(\chi - 1)}\right) \left\Vert m^k\right\Vert ^2_{\textbf{P}}\right. \\&\left. \quad + \left( 1 + 2(\chi - 1)\right) \varkappa ^2\nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}\right] \\ {}&\le (1 - (2\chi )^{-1})\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 2\chi \varkappa ^2\nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}. \end{aligned}$$

Rearranging gives

$$\begin{aligned} \left\Vert m^k\right\Vert ^2_{\textbf{P}}&\le 8\chi ^2\varkappa ^2\nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ 4\chi (1 - (4\chi )^{-1})\left\Vert m^k\right\Vert ^2_{\textbf{P}}- 4\chi \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}. \end{aligned}$$

\(\square \)
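
The coefficient simplifications used in the last two displays are elementary; for completeness, assuming \(\chi > 1\) (as required for the choice of c above),

$$\begin{aligned} (1 - \chi ^{-1})\left( 1 + \frac{1}{2(\chi - 1)}\right)&= \frac{\chi - 1}{\chi }\cdot \frac{2\chi - 1}{2(\chi - 1)} = 1 - (2\chi )^{-1},\\ (1 - \chi ^{-1})\left( 1 + 2(\chi - 1)\right)&= \frac{(\chi - 1)(2\chi - 1)}{\chi } \le 2\chi , \end{aligned}$$

and the rearrangement to (36) amounts to multiplying the resulting recursion by \(4\chi \) and adding \(\left\Vert m^k\right\Vert ^2_{\textbf{P}}\) to both sides, since \(4\chi \left( 1 - (2\chi )^{-1}\right) + 1 = 4\chi \left( 1 - (4\chi )^{-1}\right) \).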

Lemma 9

Let \({\hat{z}}^k\) be defined as follows:

$$\begin{aligned} {\hat{z}}^k = z^k - {\textbf{P}}m^k. \end{aligned}$$
(37)

Then the following inequality holds:

$$\begin{aligned} \begin{aligned}&\frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1} +\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle + \varkappa \nu ^{-2}\left( 1 + 6\chi \right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2. \end{aligned} \end{aligned}$$
(38)

Proof

$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&= \frac{1}{\varkappa }\left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + \frac{2}{\varkappa }\langle {\hat{z}}^{k+1} - {\hat{z}}^k,{\hat{z}}^k - z^*\rangle + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - {\hat{z}}^k\right\Vert ^2. \end{aligned}$$

Lines 11 and 12 of Algorithm 1 together with \({\hat{z}}^k\) definition (37) imply

$$\begin{aligned} {\hat{z}}^{k+1} - {\hat{z}}^k = \varkappa \pi (z_g^k - z^k) - \varkappa \nu ^{-1}{\textbf{P}}(y_g^k + z_g^k). \end{aligned}$$

Hence,

$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&= \frac{1}{\varkappa }\left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \langle z_g^k - z^k,{\hat{z}}^k - z^*\rangle \\ {}&\quad - 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),{\hat{z}}^k - z^*\rangle + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - {\hat{z}}^k\right\Vert ^2 \\ {}&= \frac{1}{\varkappa }\left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + \pi \left\Vert z_g^k - {\textbf{P}}m^k - z^*\right\Vert ^2 \\ {}&\quad - \pi \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 - \pi \left\Vert z_g^k - z^k\right\Vert ^2 - 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),{\hat{z}}^k - z^*\rangle \\ {}&\quad + \varkappa \left\Vert \pi (z_g^k - z^k) - \nu ^{-1}{\textbf{P}}(y_g^k+z_g^k)\right\Vert ^2 \\ {}&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\quad + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}- \pi \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),{\hat{z}}^k - z^*\rangle + 2\varkappa \pi ^2\left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad + \varkappa \left\Vert \nu ^{-1}{\textbf{P}}(y_g^k+z_g^k)\right\Vert ^2 \\ {}&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \\ {}&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 - 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),z^k - z^*\rangle \\ {}&\quad + \varkappa \left\Vert \nu ^{-1}{\textbf{P}}(y_g^k+z_g^k)\right\Vert ^2 + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),m^k\rangle . \end{aligned}$$

Using the fact that \(z^k \in {\mathcal {L}}^\perp \) for all \(k=0,1,2\ldots \) and optimality condition (6) we get

$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2\\&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle + \varkappa \nu ^{-2}\left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),m^k\rangle . \end{aligned}$$

Using Young’s inequality we get

$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2\\&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle + \varkappa \nu ^{-2}\left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 3\varkappa \chi \nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{3\varkappa \chi }\left\Vert m^k\right\Vert ^2_{\textbf{P}}. \end{aligned}$$

Using (36) we get

$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \\&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle + \varkappa \nu ^{-2}\left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 6\varkappa \nu ^{-2}\chi \left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + \frac{4(1 - (4\chi )^{-1})}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}- \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&= \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2\\&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle \\&\quad + \varkappa \nu ^{-2}\left( 1 + 6\chi \right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}- \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}. \end{aligned}$$

\(\square \)

Lemma 10

The following inequality holds:

$$\begin{aligned} \begin{aligned}&2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\quad \ge 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{(1-\vartheta _2/2)}{\vartheta _2}\left( \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2\right) . \end{aligned} \end{aligned}$$
(39)

Proof

$$\begin{aligned}&2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\quad = 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y_g^k + z_g^k)\rangle . \end{aligned}$$

Using Lines 7 and 10 of Algorithm 1 we get

$$\begin{aligned}&2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\quad = 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{2(1-\vartheta _1)}{\vartheta _1}\langle y_g^k + z_g^k - (y^*+z^*), y_g^k + z_g^k - (y_f^k + z_f^k)\rangle \\ {}&\quad = 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{(1-\vartheta _1)}{\vartheta _1}\left( \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + \left\Vert y_g^k + z_g^k - (y_f^k + z_f^k)\right\Vert ^2 \right. \\ {}&\qquad \left. - \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2\right) \\ {}&\quad \ge 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{(1-\vartheta _1)}{\vartheta _1}\left( \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2\right) . \end{aligned}$$

Using \(\vartheta _1\) definition (34) we get

$$\begin{aligned}&2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\quad \ge 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{(1-\vartheta _2/2)}{\vartheta _2}\left( \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2\right) . \end{aligned}$$

\(\square \)

Lemma 11

Let \(\zeta \) be defined by

$$\begin{aligned} \zeta = 1/2. \end{aligned}$$
(40)

Then the following inequality holds:

$$\begin{aligned} \begin{aligned}&-2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\quad \le \frac{1}{\vartheta _2}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + 2\vartheta _2\left\Vert y^{k+1} - y^k\right\Vert ^2 - \frac{1}{2\vartheta _2\chi }\left\Vert y_g^k + z_g^k\right\Vert ^2_{{\textbf{P}}}. \end{aligned} \end{aligned}$$
(41)

Proof

$$\begin{aligned} \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2&= \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\quad + 2\langle y_f^{k+1} + z_f^{k+1} - (y_g^k + z_g^k),y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\quad + \left\Vert y_f^{k+1} + z_f^{k+1} - (y_g^k + z_g^k)\right\Vert ^2 \\ {}&\le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\quad + 2\langle y_f^{k+1} + z_f^{k+1} - (y_g^k + z_g^k),y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\quad + 2\left\Vert y_f^{k+1} - y_g^k\right\Vert ^2 + 2\left\Vert z_f^{k+1} - z_g^k\right\Vert ^2. \end{aligned}$$

Using Line 9 of Algorithm 1 we get

$$\begin{aligned}&\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\qquad + 2\langle z_f^{k+1} - z_g^k,y_g^k + z_g^k - (y^*+z^*)\rangle + 2\left\Vert z_f^{k+1} - z_g^k\right\Vert ^2. \end{aligned}$$

Using Line 13 of Algorithm 1 and optimality condition (6) we get

$$\begin{aligned}&\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 - 2\zeta \langle ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k),y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\zeta ^2\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2 \\ {}&\quad = \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\qquad - 2\zeta \langle ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k),y_g^k + z_g^k\rangle + 2\zeta ^2\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2. \end{aligned}$$

Using \(\zeta \) definition (40) we get

$$\begin{aligned}&\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 - \langle ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k),y_g^k + z_g^k\rangle \\ {}&\qquad + \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2 \\ {}&\quad = \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 - \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2 \\ {}&\qquad - \frac{1}{2}\left\Vert y_g^k + z_g^k\right\Vert ^2 + \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k) - (y_g^k + z_g^k)\right\Vert ^2 \\ {}&\qquad + \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\qquad - \frac{1}{2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k) - (y_g^k + z_g^k)\right\Vert ^2_{\textbf{P}}. \\ {}&\quad = \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\qquad - \frac{1}{2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d){\textbf{P}}(y_g^k + z_g^k) - {\textbf{P}}(y_g^k + z_g^k)\right\Vert ^2. \end{aligned}$$
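
The first equality in the chain above is an instance of the identity \(-\langle u, v\rangle = -\frac{1}{2}\left\Vert u\right\Vert ^2 - \frac{1}{2}\left\Vert v\right\Vert ^2 + \frac{1}{2}\left\Vert u - v\right\Vert ^2\) (again a reading aid, not a new step), applied with

$$\begin{aligned} u = ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k), \qquad v = y_g^k + z_g^k. \end{aligned}$$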

Using Assumption 4 we get

$$\begin{aligned}&\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 - (2\chi )^{-1}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}. \end{aligned}$$

Rearranging gives

$$\begin{aligned}&-2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\quad \le \frac{1}{\vartheta _2}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + 2\vartheta _2\left\Vert y^{k+1} - y^k\right\Vert ^2 - \frac{1}{2\vartheta _2\chi }\left\Vert y_g^k + z_g^k\right\Vert ^2_{{\textbf{P}}}. \end{aligned}$$

\(\square \)

Lemma 12

Let \(\pi \) be defined as follows:

$$\begin{aligned} \pi = \frac{\beta }{16}. \end{aligned}$$
(42)

Let \(\varkappa \) be defined as follows:

$$\begin{aligned} \varkappa = \frac{\nu }{14\vartheta _2\chi ^2}. \end{aligned}$$
(43)

Let \(\theta \) be defined as follows:

$$\begin{aligned} \theta = \frac{\nu }{4\vartheta _2}. \end{aligned}$$
(44)

Let \(\vartheta _2\) be defined as follows:

$$\begin{aligned} \vartheta _2 = \frac{\sqrt{\beta \mu }}{16\chi }. \end{aligned}$$
(45)

Let \(\Psi _{yz}^k\) be the following Lyapunov function

$$\begin{aligned} \begin{aligned} \Psi _{yz}^k&= \left( \frac{1}{\theta } + \frac{\beta }{2}\right) \left\Vert y^{k} - y^*\right\Vert ^2 + \frac{\beta }{2\vartheta _2}\left\Vert y_f^{k} - y^*\right\Vert ^2 + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k} - z^*\right\Vert ^2 \\ {}&\quad + \frac{4}{3\varkappa }\left\Vert m^{k}\right\Vert ^2_{\textbf{P}}+ \frac{\nu ^{-1}}{\vartheta _2}\left\Vert y_f^{k} + z_f^{k} - (y^*+z^*)\right\Vert ^2. \end{aligned} \end{aligned}$$
(46)

Then the following inequality holds:

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \Psi _{yz}^{k+1} \right]&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) }\Psi _{yz}^k + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned} \end{aligned}$$
(47)

Proof

Combining (35) and (38) gives

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\qquad - 2\nu ^{-1}{\mathbb {E}}\left[ \langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^k\rangle \right] + \varkappa \nu ^{-2}\left( 1 + 6\chi \right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \\ {}&\qquad - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] +\left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 + {\sigma ^2 \beta }. \end{aligned}$$

Using (39) and (41) we get

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1- (4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 - 2\nu ^{-1}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left( \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 - \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2\right) \\ {}&\qquad + \frac{\nu ^{-1}}{\vartheta _2}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] \\ {}&\qquad + 2\nu ^{-1}\vartheta _2{\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] - \frac{\nu ^{-1}}{2\vartheta _2\chi }\left\Vert y_g^k + z_g^k\right\Vert ^2_{{\textbf{P}}} \\ {}&\qquad + \varkappa \nu ^{-2}\left( 1 + 6\chi \right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 + {\sigma ^2 \beta } \\ {}&\qquad - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) \\ {}&\qquad - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] +\left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad = \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 \\ {}&\qquad + \nu ^{-1}\left( \frac{1}{\vartheta _2} - \frac{(1-\vartheta _2/2)}{\vartheta _2} - 2\right) \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \left( \varkappa \nu ^{-2}\left( 1 + 6\chi \right) - \frac{\nu ^{-1}}{2\vartheta _2\chi }\right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta } \\ {}&\quad 
= \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 \\ {}&\qquad - \frac{3\nu ^{-1}}{2}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\qquad + \left( \varkappa \nu ^{-2}\left( 1 + 6\chi \right) - \frac{\nu ^{-1}}{2\vartheta _2\chi }\right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$

Using \(\beta \) definition (33) and \(\nu \) definition (29) we get

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \\ {}&\qquad - {\frac{\beta }{4}}\left\Vert y_g^k - y^*\right\Vert ^2 - \frac{3}{\mu }\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\qquad + \left( \varkappa \nu ^{-2}\left( 1 + 6\chi \right) - \frac{\nu ^{-1}}{2\vartheta _2\chi }\right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$

Using \(\pi \) definition (42) (\(\pi \le \beta / 16\) and \(\pi \le 3/(4 \mu )\)) we get

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] \\ {}&\qquad + \left( \varkappa \nu ^{-2}\left( 1 + 6\chi \right) - \frac{\nu ^{-1}}{2\vartheta _2\chi }\right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) \\ {}&\qquad - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
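
For completeness, here is a short check of why the terms \(2\pi \left\Vert z_g^k - z^*\right\Vert ^2\), \(-\frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2\) and \(-\frac{3}{\mu }\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2\) can be dropped together; it uses only \(\left\Vert a + b\right\Vert ^2 \le 2\left\Vert a\right\Vert ^2 + 2\left\Vert b\right\Vert ^2\) and the bounds \(\pi \le \beta /16\), \(\pi \le 3/(4\mu )\) mentioned above:

$$\begin{aligned} 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \le 4\pi \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 4\pi \left\Vert y_g^k - y^*\right\Vert ^2 \le \frac{3}{\mu }\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2. \end{aligned}$$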

Using \(\varkappa \) definition (43) we get

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] \\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$

Using \(\theta \) definition (44) together with (29), (33) (\(\beta \le 16 / (\mu \vartheta _2)\)) and (45) gives

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$

Using \(\varkappa \) definition (43) and \(\pi \) definition (42) (\(2 \varkappa \pi \le 1\)) we get

$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(8\chi )^{-1}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] \\ {}&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
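
The parenthetical claim \(2\varkappa \pi \le 1\) and the new coefficient of \(\left\Vert m^k\right\Vert ^2_{\textbf{P}}\) can be verified directly: recalling \(\nu = \mu /2\) from (29) and substituting (42), (43), (45), one gets (a short check, using \(\beta \mu \le 1/2\) and \(\chi \ge 1\))

$$\begin{aligned} \varkappa \pi = \frac{\nu }{14\vartheta _2\chi ^2}\cdot \frac{\beta }{16} = \frac{\sqrt{\beta \mu }}{28\chi } \le \frac{1}{2}, \qquad 2\varkappa \pi ^2 - \pi = \pi \left( 2\varkappa \pi - 1\right) \le 0, \qquad \frac{3\varkappa \pi }{2} = \frac{3\sqrt{\beta \mu }}{56\chi } \le \frac{1}{8\chi },\end{aligned}$$

so that \(1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2} \le 1-(8\chi )^{-1}\).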

After rearranging and using \(\Psi _{yz}^k\) definition (46) we get

$$\begin{aligned} {\mathbb {E}}\left[ \Psi _{yz}^{k+1} \right]&\le \max \left\{ (1 + \theta \beta /2)^{-1}, (1-\varkappa \pi ), (1-\vartheta _2/2), (1-(8\chi )^{-1})\right\} \Psi _{yz}^k \\&\quad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta } \\ {}&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) }\Psi _{yz}^k + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$

Indeed, \(1 - \vartheta _2 / 2 = 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\) by (45), \(1 - \varkappa \pi = 1 - \frac{\sqrt{\beta \mu }}{28 \chi } \le 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\) by (42) and (43), and \(1 - (8\chi )^{-1} \le 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\) since \(\sqrt{\beta \mu } \le 4\). Moreover, using (44), (45) and \(\nu = \mu /2\) from (29),

$$\begin{aligned} \frac{1}{1+\theta \beta /2} = \frac{1}{1 + \mu \beta /(16 \vartheta _2)} = \frac{1}{1 + (\mu \chi \beta )/\sqrt{\beta \mu }} = \frac{1}{1 + \chi \sqrt{\mu \beta }} \le 1 - \frac{\sqrt{\beta \mu }}{32 \chi }. \end{aligned}$$

The last inequality holds since

$$\begin{aligned} 1 + \chi \sqrt{\mu \beta } - \frac{\sqrt{\mu \beta }}{32 \chi } - \frac{\mu \beta }{32} \ge 1 + \sqrt{\mu \beta } - \frac{\sqrt{\mu \beta }}{32} - \frac{\mu \beta }{32} \ge 1, \end{aligned}$$

because \(\chi \ge 1\) and \(\mu \beta \le \mu /(2L) \le 1/2\). \(\square \)

Proof of Theorem 1

Using \(\tau _2\) definition (25) and combining (31) and (47) gives

$$\begin{aligned} {\mathbb {E}}\left[ \Psi _x^{k+1} \right] + {\mathbb {E}}\left[ \Psi _{yz}^{k+1} \right]&\le {\max \{1 - \tau _2/2, (1 + \eta \alpha )^{-1}\}}\Psi _x^k + \frac{{\beta }\sigma ^2}{\tau _2 } \\ {}&\quad + {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) }\Psi _{yz}^k + {\sigma ^2 \beta + C \Delta ^2} \\ {}&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) }(\Psi _x^k + \Psi _{yz}^k) + {\sigma ^2 \beta \left( 1 + \sqrt{\frac{L}{\mu }} \right) + \frac{4}{\mu } \Delta ^2}. \end{aligned}$$

The last inequality holds since

$$\begin{aligned} (1 + \alpha \eta )\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) \ge 1 \end{aligned}$$

This holds because

$$\begin{aligned} \left( 1 + \frac{\sqrt{\mu L}}{4(1/\beta + L)}\right) \left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) \ge 1 \end{aligned}$$

This inequality follows from the \(\beta \) definition (33) (\(\beta \le 1/(2L)\)) and the fact that \(\chi \ge 1\):

$$\begin{aligned} \left( 1 + \frac{\sqrt{\mu }}{12 \sqrt{L}}\right) \left( 1 - \frac{\sqrt{\mu }}{32 \sqrt{2} \sqrt{L}}\right) \ge 1 \end{aligned}$$

This inequality is true since \(\mu / L \le 1\).

This implies

$$\begin{aligned} {\mathbb {E}}\left[ \Psi _x^{N} \right] + {\mathbb {E}}\left[ \Psi _{yz}^N \right]&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N}(\Psi _x^0 + \Psi _{yz}^0)\\&\quad + { \frac{32 \chi }{\sqrt{\mu }}\sigma ^2 \sqrt{\beta } \left( 1 + \sqrt{\frac{L}{\mu }} \right) } { + \frac{128 \chi }{\sqrt{\beta \mu ^3}} \Delta ^2} \\&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\mu }\sigma ^2 \sqrt{\beta L}} { + \frac{128 \chi }{\sqrt{\beta \mu ^3}} \Delta ^2}. \end{aligned}$$
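
The step above is the standard unrolling of a contraction with an additive error term (stated here for completeness): if \(\Psi ^{k+1} \le (1-\rho )\Psi ^k + c\) with \(\rho = \frac{\sqrt{\beta \mu }}{32\chi }\) and \(c = \sigma ^2\beta \left( 1 + \sqrt{L/\mu }\right) + \frac{4}{\mu }\Delta ^2\), then

$$\begin{aligned} \Psi ^N \le (1-\rho )^N\Psi ^0 + c\sum _{j=0}^{N-1}(1-\rho )^j \le (1-\rho )^N\Psi ^0 + \frac{c}{\rho }, \end{aligned}$$

which gives the \(\frac{32 \chi }{\sqrt{\beta \mu }}\) factor multiplying the noise terms in the display above.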

Using \(\Psi _x^k\) definition (30), we have

$$\begin{aligned}&{\mathbb {E}}\Bigg [\left( \frac{1}{\eta } + \alpha \right) \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\tau _2}\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\nu }{2}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad \le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\mu }\sigma ^2 \sqrt{\beta L}} { + \frac{128 \chi }{\sqrt{\beta \mu ^3}} \Delta ^2}. \end{aligned}$$

Using the choices of \(\eta \), \(\alpha \), \(\nu \), \(\tau _2\), together with \(\eta = ([1/\beta + L]\tau _2)^{-1} \le (L \tau _2)^{-1} = (\sqrt{\mu L})^{-1}\), we get

$$\begin{aligned}&{\mathbb {E}}\Bigg [\sqrt{\mu L} \left\Vert x^{N} - x^*\right\Vert ^2 + 2 \sqrt{\frac{L}{\mu }}\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad \le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\mu }\sigma ^2 \sqrt{\beta L}} { + \frac{128 \chi }{\sqrt{\beta \mu ^3}} \Delta ^2}. \end{aligned}$$

And finally,

$$\begin{aligned}&{\mathbb {E}}\Bigg [ \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\mu }\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad \le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N} (\sqrt{\mu L})^{-1}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\sqrt{\mu ^{3}}}\sigma ^2 \sqrt{\beta }} { + \frac{128 \chi }{\sqrt{\beta L} \mu ^2} \Delta ^2}. \end{aligned}$$

\(\square \)

Proof of Corollary 2

Write out the convergence rate of the SADOM algorithm from Theorem 1:

$$\begin{aligned}&{\mathbb {E}}\Bigg [ \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\mu }\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \nonumber \\&\quad \le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N} (\sqrt{\mu L})^{-1}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\sqrt{\mu ^{3}}}\sigma ^2 \sqrt{\beta }} { + \frac{128 \chi }{\sqrt{\beta L} \mu ^2} \Delta ^2}. \end{aligned}$$
(48)

Let us introduce the following notation for brevity:

$$\begin{aligned} r_N:= & {} {\mathbb {E}}\Bigg [ \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\mu }\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ], \\ r_0:= & {} (\sqrt{\mu L})^{-1}(\Psi _x^0 + \Psi _{yz}^0), ~~ a:= \frac{\sqrt{\mu }}{32 \chi }, ~~ b:= \frac{64 \chi }{\sqrt{\mu ^3}},~~ c:= \frac{128 \chi }{\sqrt{L} \mu ^2}. \end{aligned}$$

Inequality (48) then takes the form

$$\begin{aligned} \begin{aligned} r_{N}&\le r_0 (1 - a\sqrt{\beta })^N + b \sigma ^2 \sqrt{\beta } + \frac{c \Delta ^2}{\sqrt{\beta }} \\ {}&\le r_0 \exp \left[ -a \sqrt{\beta } N \right] + b \sigma ^2 \sqrt{\beta } + \frac{c \Delta ^2}{\sqrt{\beta }} \end{aligned} \end{aligned}$$
(49)

Consider two cases:

  • If \(\frac{1}{\sqrt{2L}} \ge \frac{\ln (\max \{2, a r_0 N / (b \sigma ^2) \})}{a N}\), then choose

    $$\begin{aligned} \sqrt{\beta } = \frac{\ln (\max \{2, a r_0 N /(b \sigma ^2) \})}{a N} \end{aligned}$$

    With this choice, Eq. (49) becomes (a short justification is sketched after this list)

    $$\begin{aligned} r_{N} = \widetilde{{\mathcal {O}}} \left( \frac{b \sigma ^2}{a N} + a c \Delta ^2 N \right) \end{aligned}$$
  • If \(\frac{1}{\sqrt{2L}} \le \frac{\ln (\max \{2, a r_0 N / (b \sigma ^2) \})}{a N}\), then choose

    $$\begin{aligned} \sqrt{\beta } = \frac{1}{\sqrt{2L}} \end{aligned}$$

    With this choice, Eq. (49) becomes

    $$\begin{aligned} r_{N} = \widetilde{{\mathcal {O}}} \left( r_0 \exp \left[ -\frac{aN}{\sqrt{2L}}\right] + \frac{b \sigma ^2}{a N} + a c \Delta ^2 N \right) \end{aligned}$$
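
As referenced in the first case above, a short justification of the resulting bound (a sketch using only the stated choice of \(\sqrt{\beta }\)): if the maximum equals \(a r_0 N/(b\sigma ^2)\), then \(r_0\exp [-a\sqrt{\beta }N] = b\sigma ^2/(aN)\); if it equals \(2\), then \(a r_0 N \le 2 b\sigma ^2\) and \(r_0\exp [-a\sqrt{\beta }N] = r_0/2 \le b\sigma ^2/(aN)\). Moreover,

$$\begin{aligned} \frac{c\Delta ^2}{\sqrt{\beta }} = \frac{a c N \Delta ^2}{\ln (\max \{2, a r_0 N/(b\sigma ^2)\})} \le \frac{a c \Delta ^2 N}{\ln 2}, \end{aligned}$$

so both terms in the first case are of the stated order.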

After substituting the notation \(a\), \(b\), \(c\) we obtain

$$\begin{aligned}&{\mathbb {E}}\Bigg [ \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\mu }\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad = \widetilde{{\mathcal {O}}} \left( C_0 \exp \left[ -\frac{\sqrt{\mu } N}{32 \sqrt{2} \chi \sqrt{L}}\right] + \frac{\chi ^2 \sigma ^2}{B \mu ^{2} N} + \frac{\Delta ^2 N}{\sqrt{L} \mu ^{3/2}} \right) . \end{aligned}$$

This finishes the proof. \(\square \)

Appendix 3: Proof of Theorem 2

1.1 Proof of Lemma 4

Proof

First, let us consider the TPF smoothing scheme (16):

$$\begin{aligned} \begin{aligned} {\textbf{g}}(x, \xi , e)&= \frac{d}{2 \gamma }(F_{\delta }(x + \gamma e) - F_{\delta }(x - \gamma e))e \\ {}&= \frac{d}{2 \gamma }(F(x + \gamma e) - F(x - \gamma e))e + \frac{d e}{2 \gamma }(\delta (x + \gamma e) - \delta (x - \gamma e)) \end{aligned} \end{aligned}$$

According to Gasnikov et al. (2022a), the first summand is an unbiased gradient estimator; let us consider the second one:

$$\begin{aligned} \varvec{\omega }(x):= {\mathbb {E}}\left[ \frac{d e}{2 \gamma }(\delta (x + \gamma e, \xi ) - \delta (x - \gamma e, \xi ))\right] , \qquad \left\Vert \varvec{\omega }(x)\right\Vert \le \frac{d}{2 \gamma } \cdot 2 {\widetilde{\Delta }} = \frac{d {\widetilde{\Delta }}}{\gamma }. \end{aligned}$$

Similar results are obtained for the two remaining schemes (17) and (18). For OPF via a single realization of \(\xi \) (17):

$$\begin{aligned} \varvec{\omega }(x):= {\mathbb {E}}\left[ \frac{d e}{\gamma }\delta (x + \gamma e, \xi )\right] , \qquad \left\Vert \varvec{\omega }(x)\right\Vert \le \frac{d {\widetilde{\Delta }}}{\gamma }. \end{aligned}$$

For OPF via a double realization of \(\xi \) (18):

$$\begin{aligned} \varvec{\omega }(x):= {\mathbb {E}}\left[ \frac{d e}{2 \gamma }(\delta (x + \gamma e, \xi _1) - \delta (x - \gamma e, \xi _2))\right] , \qquad \left\Vert \varvec{\omega }(x)\right\Vert \le \frac{d {\widetilde{\Delta }}}{\gamma }. \end{aligned}$$

\(\square \)
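
For illustration only (not part of the proof), a minimal sketch of how the TPF estimator (16) with \(l_2\) randomization can be implemented; the zeroth-order oracle F_delta, the smoothing parameter gamma and the toy objective in the last lines are assumptions of this sketch, not objects fixed by the paper.

import numpy as np

def tpf_gradient_estimate(F_delta, x, gamma, rng):
    # Two-point feedback (TPF) estimate of the form (16):
    # g = d/(2*gamma) * (F_delta(x + gamma*e) - F_delta(x - gamma*e)) * e,
    # where e is uniform on the Euclidean unit sphere (l_2 randomization).
    d = x.shape[0]
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)  # normalize: e is now uniform on the unit sphere
    return d / (2.0 * gamma) * (F_delta(x + gamma * e) - F_delta(x - gamma * e)) * e

# usage on a toy non-smooth objective (illustrative choice)
rng = np.random.default_rng(0)
g = tpf_gradient_estimate(lambda v: np.abs(v).sum(), np.zeros(5), gamma=1e-3, rng=rng)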

1.2 Proof of Theorem 2

Proof

Write out the convergence result from Corollary 2:

$$\begin{aligned}&{\mathbb {E}}\Bigg [ \frac{\mu }{2}\left\Vert x^{N} - x^*\right\Vert ^2 + F(x_f^{N}) - F(x^*) -\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \Bigg ] \\&\quad = \widetilde{{\mathcal {O}}} \left( {{\hat{C}}}_0 \exp \left[ -\frac{\sqrt{\mu } N}{32 \sqrt{2} \chi \sqrt{L}}\right] + \frac{\chi ^2 \sigma ^2}{B \mu N} + \frac{\Delta ^2 N}{\sqrt{L} \mu ^{1/2}} \right) . \end{aligned}$$

Consider the first summand

$$\begin{aligned} {{\hat{C}}}_0 \exp \left[ -\frac{\sqrt{\mu } N}{32 \sqrt{2} \chi \sqrt{L}}\right] \le \varepsilon / 6 \end{aligned}$$

So

$$\begin{aligned} N \ge 32 \sqrt{2} \chi \sqrt{\frac{L}{\mu }} \log \left( \frac{6 {{\hat{C}}}_0}{\varepsilon }\right) \end{aligned}$$

From Lemma 2: \(L_{F_{\gamma }} = \frac{\sqrt{d} M_2}{\gamma }\) and \(\gamma = \frac{\varepsilon }{2 M_2}\). So \(L_{F_{\gamma }} = \frac{2\sqrt{d} M_2^2}{\varepsilon }\), and we obtain

$$\begin{aligned} N \gtrsim 32 \sqrt{2} \chi \frac{\sqrt{2} d^{1/4} M_2}{\sqrt{\varepsilon \mu }} \end{aligned}$$
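
The factor in the last display is just the substitution of \(L_{F_{\gamma }}\) into \(\sqrt{L/\mu }\) from the previous bound on \(N\) (a one-line check):

$$\begin{aligned} \sqrt{\frac{L_{F_{\gamma }}}{\mu }} = \sqrt{\frac{2\sqrt{d} M_2^2}{\varepsilon \mu }} = \frac{\sqrt{2}\, d^{1/4} M_2}{\sqrt{\varepsilon \mu }}. \end{aligned}$$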

Consider the second summand

$$\begin{aligned} \frac{\chi ^2 \sigma ^2}{B \mu N} \le \varepsilon / 6 \end{aligned}$$

So with \(\sigma ^2 = {{\tilde{\sigma }}}^2\)

$$\begin{aligned} N \ge \frac{6 \chi ^2 {{\tilde{\sigma }}}^2}{\varepsilon B \mu } \end{aligned}$$

Combining the two bounds on \(N\), we obtain

$$\begin{aligned} N = {\tilde{{{\mathcal{O}}}}}\left( \max \left\{ \frac{d^{1/4} M_2 \chi }{\sqrt{\varepsilon \mu }}; \frac{\chi ^2 {{\tilde{\sigma }}}^2}{\varepsilon B \mu }\right\} \right) \end{aligned}$$

Consider the third summand

$$\begin{aligned} \frac{\Delta ^2 N}{\sqrt{L} \mu ^{1/2}} \le \varepsilon / 6 \end{aligned}$$

So

$$\begin{aligned} \Delta ^2 \le \frac{\sqrt{L} \mu ^{1/2} \varepsilon }{6N} = \frac{\sqrt{2} M_2 d^{1/4} \mu ^{1/2} \sqrt{\varepsilon }}{6N} \end{aligned}$$

From Lemma 4: \(\Delta = \frac{d {\widetilde{\Delta }}}{\gamma }\), so

$$\begin{aligned} {\widetilde{\Delta }}^2 \le \frac{\sqrt{2} \varepsilon ^{5/2} \mu ^{1/2}}{3 d^{7/4} M_2 N} \end{aligned}$$

Finally, we obtain

$$\begin{aligned} {\widetilde{\Delta }}^2 = {\mathcal {O}}\left( \frac{\varepsilon ^{5/2} \mu ^{1/2}}{d^{7/4} M_2 N}\right) \end{aligned}$$

\(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lobanov, A., Veprikov, A., Konin, G. et al. Non-smooth setting of stochastic decentralized convex optimization problem over time-varying Graphs. Comput Manag Sci 20, 48 (2023). https://doi.org/10.1007/s10287-023-00479-7
