Appendix
Appendix 1: Auxiliary facts and results
In this section we list auxiliary facts and results that we use several times in our proofs.
1.1 Squared norm of the sum
For all \(a_1, \ldots ,a_n \in {\mathbb {R}}^d\), where \(n\in \{2,3\}\),
$$\begin{aligned} \Vert a_1 + \cdots + a_n \Vert ^2 \le n \Vert a_1 \Vert ^2 + \cdots + n \Vert a_n \Vert ^2. \end{aligned}$$
(21)
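The following is a minimal numerical sanity check of (21), written as a standard-library Python sketch. The helper names are illustrative and do not come from the paper. It draws random Gaussian vectors and compares both sides. The bound follows from the Cauchy–Schwarz inequality and in fact holds for every \(n \ge 1\). It is tight when all \(a_i\) coincide.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a vector stored as a list of floats."""
    return sum(x * x for x in v)


def check_sum_bound(n, d, trials=1000, seed=0):
    """Return True if ||a_1 + ... + a_n||^2 <= n * (||a_1||^2 + ... + ||a_n||^2)
    on `trials` random draws of n Gaussian vectors in R^d."""
    rng = random.Random(seed)
    for _ in range(trials):
        vecs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        total = [sum(coords) for coords in zip(*vecs)]  # a_1 + ... + a_n
        if sq_norm(total) > n * sum(sq_norm(v) for v in vecs) + 1e-12:
            return False
    return True


# Equality case: all a_i equal, since ||3a||^2 = 3 * (3 ||a||^2).
a = [1.0, -2.0, 0.5]
tight = sq_norm([3 * x for x in a]) == 3 * sum(sq_norm(a) for _ in range(3))
print(check_sum_bound(2, 5), check_sum_bound(3, 5), tight)  # True True True
```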
1.2 Fenchel–Young inequality
For all \(a,b\in {\mathbb {R}}^d\) and \(\lambda > 0\)
$$\begin{aligned} \left\langle a,b \right\rangle \le \frac{\Vert a\Vert ^2}{2\lambda } + \frac{\lambda \Vert b\Vert ^2}{2}. \end{aligned}$$
(22)
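The gap between the two sides of (22) equals \(\Vert a - \lambda b\Vert ^2/(2\lambda )\). It is therefore nonnegative and vanishes exactly when \(a = \lambda b\). In the proof of Lemma 5 the inequality is applied with \(\lambda = 1/\beta \). The sketch below checks the gap numerically. The helper names are hypothetical.

```python
import random


def dot(u, v):
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(u, v))


def fenchel_young_gap(a, b, lam):
    """RHS minus LHS of (22); algebraically equal to ||a - lam*b||^2 / (2*lam) >= 0."""
    return dot(a, a) / (2 * lam) + lam * dot(b, b) / 2 - dot(a, b)


rng = random.Random(1)
a = [rng.gauss(0.0, 1.0) for _ in range(4)]
b = [rng.gauss(0.0, 1.0) for _ in range(4)]
gaps = [fenchel_young_gap(a, b, lam) for lam in (0.1, 1.0, 10.0)]
print(all(g >= 0.0 for g in gaps))  # True

# The bound is tight when a = lam * b.
lam = 2.0
print(abs(fenchel_young_gap(a, [x / lam for x in a], lam)) < 1e-12)  # True
```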
1.3 Inner product representation
For all \(a,b\in {\mathbb {R}}^d\)
$$\begin{aligned} \left\langle a,b \right\rangle = \frac{1}{2}\left( \Vert a+b\Vert ^2 - \Vert a\Vert ^2 - \Vert b\Vert ^2\right) . \end{aligned}$$
(23)
1.4 Fact from concentration of measure
Let \({\textbf{e}}\) be uniformly distributed on the Euclidean unit sphere in \({\mathbb {R}}^d\). Then, for \(d \ge 8\) and all \(s \in {\mathbb {R}}^d\),
$$\begin{aligned} {\mathbb {E}}_{\textbf{e}}\left( \left\langle s,{\textbf{e}} \right\rangle ^2\right) \le \frac{\Vert s \Vert ^2}{d}. \end{aligned}$$
(24)
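For the uniform distribution on the sphere, (24) in fact holds with equality for every \(d\), because by symmetry \({\mathbb {E}}[{\textbf{e}}{\textbf{e}}^\top ] = I/d\). The Monte Carlo sketch below compares the empirical mean of \(\langle s,{\textbf{e}}\rangle ^2\) with \(\Vert s\Vert ^2/d\). It samples \({\textbf{e}}\) by normalizing a standard Gaussian vector, which is a choice made for this illustration only.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform sample from the unit sphere in R^d: normalize a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    r = math.sqrt(sum(x * x for x in g))
    return [x / r for x in g]


def mc_second_moment(s, samples=20000, seed=0):
    """Monte Carlo estimate of E[<s, e>^2] for e uniform on the unit sphere."""
    rng = random.Random(seed)
    d = len(s)
    acc = 0.0
    for _ in range(samples):
        e = random_unit_vector(d, rng)
        acc += sum(si * ei for si, ei in zip(s, e)) ** 2
    return acc / samples


s = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0, 0.25, -0.75]  # d = 10
estimate = mc_second_moment(s)
bound = sum(x * x for x in s) / len(s)
print(estimate, bound)  # agree up to Monte Carlo error
```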
Appendix 2: Proof of Theorem
We denote by \(\textrm{D}_F(x,y)\) the Bregman distance \(\textrm{D}_F(x,y){:}{=}F(x) - F(y) - \langle \nabla F(y),x-y\rangle \).
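The sketch below computes the Bregman distance for a hypothetical diagonal quadratic \(F(x) = \frac{1}{2}\sum _i c_i x_i^2\). This function is \(\mu \)-strongly convex and \(L\)-smooth with \(\mu = \min _i c_i\) and \(L = \max _i c_i\). For such \(F\) the distance satisfies \(\frac{\mu }{2}\Vert x-y\Vert ^2 \le \textrm{D}_F(x,y) \le \frac{L}{2}\Vert x-y\Vert ^2\). The lower bound is the form of strong convexity used later in the proof. The function names are illustrative only.

```python
def bregman(F, grad_F, x, y):
    """Bregman distance D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    g = grad_F(y)
    return F(x) - F(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))


# Hypothetical test function: F(x) = 0.5 * sum_i c_i x_i^2, which is
# mu-strongly convex and L-smooth with mu = min(c), L = max(c).
c = [1.0, 4.0, 9.0]
mu, L = min(c), max(c)


def F(x):
    return 0.5 * sum(ci * xi * xi for ci, xi in zip(c, x))


def grad_F(x):
    return [ci * xi for ci, xi in zip(c, x)]


x, y = [1.0, -2.0, 0.5], [0.0, 1.0, 2.0]
dist2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
D = bregman(F, grad_F, x, y)
print(D, mu / 2 * dist2 <= D <= L / 2 * dist2)  # 28.625 True
```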
Lemma 5
Let \(\tau _2\) be defined as follows:
$$\begin{aligned} \tau _2 = \sqrt{\mu /L}. \end{aligned}$$
(25)
Let \(\tau _1\) be defined as follows:
$$\begin{aligned} \tau _1 = (1/\tau _2 + 1/2)^{-1}. \end{aligned}$$
(26)
Let \(\eta \) be defined as follows:
$$\begin{aligned} \eta = \left( [1/\beta + L] \cdot \tau _2\right) ^{-1} \end{aligned}$$
(27)
Let \(\alpha \) be defined as follows:
$$\begin{aligned} \alpha = \mu / 4 \end{aligned}$$
(28)
Let \(\nu \) be defined as follows:
$$\begin{aligned} \nu = \mu /2. \end{aligned}$$
(29)
Let \(\Psi _x^k\) be defined as follows:
$$\begin{aligned} \Psi _x^k = \left( \frac{1}{\eta } + \alpha \right) \left\Vert x^{k} - x^*\right\Vert ^2 + \frac{2}{\tau _2}\left( \textrm{D}_F(x_f^{k},x^*)-\frac{\nu }{2}\left\Vert x_f^{k} - x^*\right\Vert ^2 \right) \end{aligned}$$
(30)
Then the following inequality holds:
$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \Psi _x^{k+1} \right]&\le \max \left\{ 1 - \tau _2/2, 1/(1+\eta \alpha )\right\} \Psi _x^k \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad -\left( \textrm{D}_F(x_g^k,x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{\beta \sigma ^2}{\tau _2} + \frac{4}{\mu } \Delta ^2. \end{aligned} \end{aligned}$$
(31)
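The parameter choices (25) to (29) are explicit, so the contraction factor \(\max \{1 - \tau _2/2, 1/(1+\eta \alpha )\}\) in (31) can be evaluated directly. The sketch below does this for \(\mu = 0.01\), \(L = 1\) and \(\beta = 1/(2L)\), the largest value permitted by (33). These values are assumptions made for the example and are not taken from the paper. For this choice the second entry of the maximum is the larger one.

```python
import math


def lemma5_params(mu, L, beta):
    """Parameters (25)-(29) of Lemma 5; assumes 0 < mu <= L and beta > 0."""
    tau2 = math.sqrt(mu / L)               # (25)
    tau1 = 1.0 / (1.0 / tau2 + 0.5)        # (26)
    eta = 1.0 / ((1.0 / beta + L) * tau2)  # (27)
    alpha = mu / 4.0                       # (28)
    nu = mu / 2.0                          # (29)
    return tau2, tau1, eta, alpha, nu


mu, L = 0.01, 1.0
beta = 1.0 / (2.0 * L)  # illustrative: the largest value allowed by (33)
tau2, tau1, eta, alpha, nu = lemma5_params(mu, L, beta)
rate = max(1.0 - tau2 / 2.0, 1.0 / (1.0 + eta * alpha))  # contraction factor in (31)
print(tau2, tau1, eta, rate)
```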
Proof
$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2+\frac{2}{\eta }\langle x^{k+1} - x^k,x^{k+1}- x^*\rangle - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2. \end{aligned}$$
Let \({\textbf{G}}_{k} = {\textbf{g}}(x_g^k, \varvec{\xi }^k)\). Then, using Line 5 of Algorithm 1, we get
$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 + 2\alpha \langle x_g^k - x^{k+1},x^{k+1}- x^*\rangle \\&\quad - 2\langle {\textbf{G}}_k - \nu x_g^k - y^{k+1},x^{k+1} - x^*\rangle - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2 \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 + 2\alpha \langle x_g^k - x^*- x^{k+1} + x^*,x^{k+1}- x^*\rangle \\ {}&\quad - 2\langle {\textbf{G}}_k - \nu x_g^k - y^{k+1},x^{k+1} - x^*\rangle - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2 \\ {}&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - 2\langle {\textbf{G}}_k - \nu x_g^k - y^{k+1},x^{k+1} - x^*\rangle - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2. \end{aligned}$$
Using optimality condition (4) we get
$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta }\left\Vert x^{k+1} - x^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^{k+1} - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^{k+1} - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle . \end{aligned}$$
Using Line 6 of Algorithm 1 we get
$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta \tau _2^2}\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \\&\quad - \frac{2}{\tau _2}\langle {\textbf{G}}_k - \nabla F(x^*),x_f^{k+1} - x_g^k\rangle + \frac{2\nu }{\tau _2}\langle x_g^k - x^*,x_f^{k+1} - x_g^k\rangle \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta \tau _2^2}\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \\&\quad - \frac{2}{\tau _2}\langle {\textbf{G}}_k - \nabla F(x^*),x_f^{k+1} - x_g^k\rangle \\&\quad + \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2-\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2\right) \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta \tau _2^2}\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \\&\quad + \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2-\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2\right) \\&\quad -\frac{2}{\tau _2} \underbrace{\left\langle {\textbf{G}}_k - \nabla F(x_g^{k}),x_f^{k+1} - x_g^k \right\rangle }_{\textcircled {1}} \\&\quad -\frac{2}{\tau _2} \underbrace{\left\langle \nabla F(x_g^{k}) - \nabla F(x^{*}),x_f^{k+1} - x_g^k \right\rangle }_{\textcircled {2}} . \end{aligned}$$
We bound term \(\textcircled {1}\) using the Fenchel–Young inequality (22) with \(\lambda = 1/\beta \):
$$\begin{aligned}&-\frac{2}{\tau _2} \left\langle {\textbf{G}}_k - \nabla F(x_g^{k}),x_f^{k+1} - x_g^k \right\rangle \\&\quad = \frac{2}{\tau _2} \left\langle {\textbf{G}}_k - \nabla F(x_g^{k}),x_g^k - x_f^{k+1} \right\rangle \\&\quad \overset{(22)}{\le } \frac{2}{\tau _2} \left( \frac{\beta }{2 } \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2 + \frac{ 1}{2 \beta } \left\| x_f^{k+1} - x_g^k \right\| ^2\right) . \end{aligned}$$
Next, we bound term \(\textcircled {2}\). Since F is L-smooth, \(F(x_f^{k+1}) \le F(x_g^k) + \left\langle \nabla F(x_g^k),x_f^{k+1} - x_g^k \right\rangle + \frac{L}{2}\left\| x_f^{k+1} - x_g^k \right\| ^2\), and therefore
$$\begin{aligned}&-\frac{2}{\tau _2} \left\langle \nabla F(x_g^{k}) - \nabla F(x^{*}),x_f^{k+1} - x_g^k \right\rangle \\&\quad = -\frac{2}{\tau _2} \left\langle \nabla F(x_g^{k}),x_f^{k+1} - x_g^k \right\rangle + \frac{2}{\tau _2} \left\langle \nabla F(x^{*}),x_f^{k+1} - x_g^k \right\rangle \\&\quad \le \frac{2}{\tau _2} \left( F(x_g^{k}) - F(x_f^{k+1}) + \left\langle \nabla F(x^{*}),x_f^{k+1} - x_g^k \right\rangle + \frac{L}{2} \left\| x_f^{k+1} - x_g^k \right\| ^2\right) . \end{aligned}$$
Substituting the obtained estimates, we get
$$\begin{aligned} \frac{1}{\eta }\left\Vert x^{k+1} - x^*\right\Vert ^2&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad - \frac{1}{\eta \tau _2^2}\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \\&\quad + \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2-\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2\right) \\&\quad + \frac{2}{\tau _2} \left( F(x_g^{k}) - F(x_f^{k+1}) + \left\langle \nabla F(x^{*}),x_f^{k+1} - x_g^k \right\rangle + \frac{1/\beta + L}{2} \left\| x_f^{k+1} - x_g^k \right\| ^2 \right) \\&\quad + \frac{\beta }{\tau _2} \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2 \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{1/\beta + L}{\tau _2}- \frac{1}{\eta \tau _2^2} \right) \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle + \frac{\beta }{\tau _2 } \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2 \\&\quad + \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2-\left\Vert x_f^{k+1} - x_g^k\right\Vert ^2\right) \\&\quad + \frac{2}{\tau _2} \left( F(x_g^{k}) - F(x^*) - \left\langle \nabla F(x^*),x_g^k - x^* \right\rangle - F(x_f^{k+1}) + F(x^*) + \left\langle \nabla F(x^*),x_f^{k+1} - x^* \right\rangle \right) \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha \left\Vert x^{k+1} - x^*\right\Vert ^2 + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{1/\beta + L - \nu }{\tau _2}- \frac{1}{\eta \tau _2^2} \right) \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 -2\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^*\rangle \\&\quad + 2\nu \langle x_g^k - x^*,x^k - x^*\rangle + 2\langle y^{k+1} - y^*,x^{k+1} - x^*\rangle - \frac{2}{\tau _2} \left( \textrm{D}_F(x_f^{k+1},x^*) - \textrm{D}_F(x_g^k,x^*) \right) \\&\quad + \frac{\nu }{\tau _2}\left( \left\Vert x_f^{k+1} - x^*\right\Vert ^2 - \left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{\beta }{\tau _2 } \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2. \end{aligned}$$
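The passage from function values to Bregman distances in the last step rests on two facts that can be checked numerically. The first is the \(L\)-smoothness bound \(-\langle \nabla F(x_g^k), x_f^{k+1}-x_g^k\rangle \le F(x_g^k) - F(x_f^{k+1}) + \frac{L}{2}\Vert x_f^{k+1}-x_g^k\Vert ^2\). The second is the exact identity \(F(x_g^k) - F(x_f^{k+1}) + \langle \nabla F(x^*), x_f^{k+1}-x_g^k\rangle = \textrm{D}_F(x_g^k,x^*) - \textrm{D}_F(x_f^{k+1},x^*)\). The sketch uses a hypothetical quadratic with a linear term, because here \(\nabla F(x^*)\) need not vanish.

```python
import random

c = [1.0, 4.0, 9.0]     # hypothetical curvatures: mu = 1, L = 9
lin = [0.5, -1.0, 2.0]  # linear part, so grad F(x^*) != 0 in general
L = max(c)


def F(x):
    return sum(0.5 * ci * xi * xi + li * xi for ci, li, xi in zip(c, lin, x))


def grad_F(x):
    return [ci * xi + li for ci, li, xi in zip(c, lin, x)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def bregman(x, y):
    """D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    return F(x) - F(y) - dot(grad_F(y), sub(x, y))


rng = random.Random(0)
ok = True
for _ in range(100):
    xg, xf, xs = ([rng.uniform(-3, 3) for _ in range(3)] for _ in range(3))
    step = sub(xf, xg)
    # L-smoothness (descent lemma) bound used for term (2).
    ok = ok and -dot(grad_F(xg), step) <= F(xg) - F(xf) + L / 2 * dot(step, step) + 1e-9
    # Exact identity turning function values into Bregman distances.
    lhs = F(xg) - F(xf) + dot(grad_F(xs), step)
    ok = ok and abs(lhs - (bregman(xg, xs) - bregman(xf, xs))) < 1e-9
print(ok)  # True
```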
Taking the expectation with respect to \(\varvec{\xi }^k\), we get
$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\| x^{k+1} - x^* \right\| ^2\right]&\le \frac{1}{\eta }{\mathbb {E}}\left[ \left\| x^k - x^* \right\| ^2 \right] - \alpha {\mathbb {E}}\left[ \left\| x^{k+1} - x^* \right\| ^2 \right] \\&\quad + \alpha {\mathbb {E}}\left[ \left\| x_g^k - x^* \right\| ^2 \right] -2 {\mathbb {E}}\left[ \left\langle {\textbf{G}}_k - \nabla F(x^*),x^k - x^* \right\rangle \right] \\&\quad + 2\nu {\mathbb {E}}\left[ \left\langle x_g^k - x^*,x^k - x^* \right\rangle \right] + 2{\mathbb {E}}\left[ \left\langle y^{k+1} - y^*,x^{k+1} - x^* \right\rangle \right] \\&\quad - \frac{2}{\tau _2} {\mathbb {E}}\left[ \textrm{D}_F(x_f^{k+1},x^*) - \textrm{D}_F(x_g^k,x^*) \right] \\&\quad + \frac{\nu }{\tau _2} {\mathbb {E}}\left[ \left\| x_f^{k+1} - x^* \right\| ^2 - \left\| x_g^k - x^* \right\| ^2 \right] \\&\quad + \left( \frac{1/\beta + L - \nu }{\tau _2}- \frac{1}{\eta \tau _2^2} \right) {\mathbb {E}}\left[ \left\| x_f^{k+1} - x_g^k \right\| ^2 \right] \\&\quad + \frac{\beta }{\tau _2 } {\mathbb {E}}\left[ \left\| {\textbf{G}}_k - \nabla F(x_g^{k}) \right\| ^2\right] \\&\overset{(11),(13)}{\le } \frac{1}{\eta }{\mathbb {E}}\left[ \left\| x^k - x^* \right\| ^2 \right] - \alpha {\mathbb {E}}\left[ \left\| x^{k+1} - x^* \right\| ^2 \right] + \alpha {\mathbb {E}}\left[ \left\| x_g^k - x^* \right\| ^2 \right] \\&\quad -2 \left\langle \nabla F(x_g^k) + \varvec{\omega }(x_g^k) - \nabla F(x^*),x^k - x^* \right\rangle + 2\nu \left\langle x_g^k - x^*,x^k - x^* \right\rangle \\&\quad + 2{\mathbb {E}}\left[ \left\langle y^{k+1} - y^*,x^{k+1} - x^* \right\rangle \right] - \frac{2}{\tau _2} \left( {\mathbb {E}}\left[ \textrm{D}_F(x_f^{k+1},x^*) \right] - {\mathbb {E}}\left[ \textrm{D}_F(x_g^k,x^*) \right] \right) \\&\quad + \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\| x_f^{k+1} - x^* \right\| ^2 \right] - {\mathbb {E}}\left[ \left\| x_g^k - x^* \right\| ^2 \right] \right) \\&\quad + \left( \frac{1/\beta + L - \nu }{\tau _2}- \frac{1}{\eta \tau _2^2} \right) {\mathbb {E}}\left[ \left\| x_f^{k+1} - x_g^k \right\| ^2 \right] + \frac{\beta \sigma ^2}{\tau _2 }. \end{aligned}$$
Using
$$\begin{aligned} 2 \left\langle \varvec{\omega }(x_g^k),x^k - x^* \right\rangle \le \frac{4}{\mu } \left\| \varvec{\omega }(x_g^k) \right\| ^2 + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2. \end{aligned}$$
together with Line 4 of Algorithm 1, we get
$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right]&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{1/\beta + L - \nu }{\tau _2}-\frac{1}{\eta \tau _2^2}\right) {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \right] \\&\quad -2\langle \nabla F(x_g^k) - \nabla F(x^*),x_g^k - x^*\rangle + 2\nu \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \frac{2(1-\tau _1)}{\tau _1}\langle \nabla F(x_g^k) - \nabla F(x^*),x_f^k - x_g^k\rangle \\&\quad + \frac{2\nu (1-\tau _1)}{\tau _1}\langle x_g^k - x_f^k,x_g^k - x^*\rangle \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] + \frac{\beta \sigma ^2}{\tau _2 } \\&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_F(x_f^{k+1},x^*) \right] - \textrm{D}_F(x_g^k,x^*)\right) \\&\quad + \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] - \left\Vert x_g^k - x^*\right\Vert ^2\right) \\&\quad + \frac{4}{\mu } \Delta ^2 + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2 \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{1/\beta + L - \nu }{\tau _2}-\frac{1}{\eta \tau _2^2}\right) {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \right] \\&\quad -2\langle \nabla F(x_g^k) - \nabla F(x^*),x_g^k - x^*\rangle + 2\nu \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \frac{2(1-\tau _1)}{\tau _1}\langle \nabla F(x_g^k) - \nabla F(x^*),x_f^k - x_g^k\rangle \\&\quad + \frac{\nu (1-\tau _1)}{\tau _1}\left( \left\Vert x_g^k- x_f^k\right\Vert ^2 + \left\Vert x_g^k - x^*\right\Vert ^2 - \left\Vert x_f^k - x^*\right\Vert ^2\right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_F(x_f^{k+1},x^*) \right] - \textrm{D}_F(x_g^k,x^*)\right) \\&\quad + \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] - \left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{\beta \sigma ^2}{\tau _2 } \\&\quad + \frac{4 \Delta ^2}{\mu } + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2. \end{aligned}$$
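The two algebraic steps above can be checked numerically. This sketch assumes that Line 4 of Algorithm 1 has the form \(x_g^k = \tau _1 x^k + (1-\tau _1)x_f^k\). That form is inferred from how the line is used here and is not quoted from the algorithm. The code verifies \(x^k - x^* = (x_g^k - x^*) + \frac{1-\tau _1}{\tau _1}(x_g^k - x_f^k)\). It also checks the variant \(2\langle a,b\rangle = \Vert a\Vert ^2 + \Vert b\Vert ^2 - \Vert a-b\Vert ^2\) of (23), applied to \(a = x_g^k - x_f^k\) and \(b = x_g^k - x^*\).

```python
import random


def lin_comb(*terms):
    """Return sum_j coef_j * vec_j for (coef, vec) pairs of equal-length vectors."""
    d = len(terms[0][1])
    return [sum(coef * vec[i] for coef, vec in terms) for i in range(d)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


rng = random.Random(2)
d, tau1 = 4, 0.3
x, xf, xs = ([rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(3))
# Assumed form of Line 4 of Algorithm 1 (illustration only).
xg = lin_comb((tau1, x), (1 - tau1, xf))

# x^k - x^* = (x_g^k - x^*) + (1 - tau1)/tau1 * (x_g^k - x_f^k)
r = (1 - tau1) / tau1
lhs = lin_comb((1.0, x), (-1.0, xs))
rhs = lin_comb((1.0, xg), (-1.0, xs), (r, xg), (-r, xf))
err1 = max(abs(p - q) for p, q in zip(lhs, rhs))

# 2<x_g - x_f, x_g - x^*> = ||x_g - x_f||^2 + ||x_g - x^*||^2 - ||x_f - x^*||^2
u = lin_comb((1.0, xg), (-1.0, xf))
v = lin_comb((1.0, xg), (-1.0, xs))
w = lin_comb((1.0, xf), (-1.0, xs))
err2 = abs(2 * dot(u, v) - (dot(u, u) + dot(v, v) - dot(w, w)))
print(err1 < 1e-12, err2 < 1e-12)  # True True
```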
Using \(\mu \)-strong convexity of \(\textrm{D}_F(x,x^*)\) in x, which follows from \(\mu \)-strong convexity of F(x), we get
$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right]&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] + \alpha \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{1/\beta + L - \nu }{\tau _2}-\frac{1}{\eta \tau _2^2}\right) {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \right] -2\textrm{D}_F(x_g^k,x^*) \\&\quad - \mu \left\Vert x_g^k - x^*\right\Vert ^2 + 2\nu \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \frac{2(1-\tau _1)}{\tau _1}\left( \textrm{D}_F(x_f^k,x^*) - \textrm{D}_F(x_g^k,x^*) - \frac{\mu }{2}\left\Vert x_f^k - x_g^k\right\Vert ^2\right) \\&\quad + \frac{\nu (1-\tau _1)}{\tau _1}\left( \left\Vert x_g^k- x_f^k\right\Vert ^2 + \left\Vert x_g^k - x^*\right\Vert ^2 - \left\Vert x_f^k - x^*\right\Vert ^2\right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_F(x_f^{k+1},x^*) \right] - \textrm{D}_F(x_g^k,x^*)\right) \\&\quad + \frac{\nu }{\tau _2}\left( {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] - \left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{\beta \sigma ^2}{\tau _2 } + \frac{4\Delta ^2}{\mu } + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2 \\ {}&= \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] \\&\quad + \frac{2(1-\tau _1)}{\tau _1}\left( \textrm{D}_F(x_f^k,x^*) - \frac{\nu }{2}\left\Vert x_f^k - x^*\right\Vert ^2\right) \\&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_F(x_f^{k+1},x^*) \right] -\frac{\nu }{2}{\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] \right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] + 2\left( \frac{1}{\tau _2}-\frac{1}{\tau _1}\right) \textrm{D}_F(x_g^k,x^*) \\&\quad + \left( \alpha - \mu + \nu +\frac{\nu }{\tau _1}-\frac{\nu }{\tau _2}\right) \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \left( \frac{1/\beta + L - \nu }{\tau _2}-\frac{1}{\eta \tau _2^2}\right) {\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x_g^k\right\Vert ^2 \right] \\&\quad + \frac{(1-\tau _1)(\nu -\mu )}{\tau _1}\left\Vert x_f^k - x_g^k\right\Vert ^2 \\&\quad + \frac{\beta \sigma ^2}{\tau _2 } + \frac{4\Delta ^2}{\mu } + \frac{\mu }{4} \left\| x_g^k - x^* \right\| ^2. \end{aligned}$$
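Both applications of strong convexity in the display above can be tested on a hypothetical quadratic with \(\mu = \min _i c_i\). The first is \(\langle \nabla F(x_g^k)-\nabla F(x^*), x_g^k - x^*\rangle \ge \textrm{D}_F(x_g^k,x^*) + \frac{\mu }{2}\Vert x_g^k - x^*\Vert ^2\). The second is \(\langle \nabla F(x_g^k)-\nabla F(x^*), x_f^k - x_g^k\rangle \le \textrm{D}_F(x_f^k,x^*) - \textrm{D}_F(x_g^k,x^*) - \frac{\mu }{2}\Vert x_f^k - x_g^k\Vert ^2\). The sketch samples random points. It illustrates these inequalities but does not prove them.

```python
import random

c = [1.0, 4.0, 9.0]     # hypothetical quadratic F: mu = 1, L = 9
lin = [0.5, -1.0, 2.0]  # linear part, so grad F(x^*) need not vanish
mu = min(c)


def F(x):
    return sum(0.5 * ci * xi * xi + li * xi for ci, li, xi in zip(c, lin, x))


def grad_F(x):
    return [ci * xi + li for ci, li, xi in zip(c, lin, x)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def D(x, y):
    """Bregman distance D_F(x, y)."""
    return F(x) - F(y) - dot(grad_F(y), sub(x, y))


rng = random.Random(3)
ok = True
for _ in range(100):
    xg, xf, xs = ([rng.uniform(-3, 3) for _ in range(3)] for _ in range(3))
    gdiff = sub(grad_F(xg), grad_F(xs))
    e, s = sub(xg, xs), sub(xf, xg)
    ok = ok and dot(gdiff, e) >= D(xg, xs) + mu / 2 * dot(e, e) - 1e-9
    ok = ok and dot(gdiff, s) <= D(xf, xs) - D(xg, xs) - mu / 2 * dot(s, s) + 1e-9
print(ok)  # True
```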
Using \(\eta \) defined by (27), \(\tau _1\) defined by (26), and the fact that \(\nu < \mu \), we see that the coefficients of \({\mathbb {E}}\left[ \Vert x_f^{k+1} - x_g^k\Vert ^2\right] \) and \(\Vert x_f^k - x_g^k\Vert ^2\) are nonpositive (the former equals \(-\nu /\tau _2\)). Dropping these terms, we get
$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right]&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] \\&\quad + \frac{2(1-\tau _2/2)}{\tau _2}\left( \textrm{D}_F(x_f^k,x^*) - \frac{\nu }{2}\left\Vert x_f^k - x^*\right\Vert ^2\right) \\&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_F(x_f^{k+1},x^*) \right] -\frac{\nu }{2}{\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] \right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad -\textrm{D}_F(x_g^k,x^*) + \left( \alpha - \mu + \frac{3\nu }{2} + \frac{\mu }{4} \right) \left\Vert x_g^k - x^*\right\Vert ^2 \\&\quad + \frac{\beta \sigma ^2}{\tau _2 } + \frac{4}{\mu } \Delta ^2. \end{aligned}$$
Using \(\alpha \) defined by (28) and \(\nu \) defined by (29), for which \(\alpha - \mu + \frac{3\nu }{2} + \frac{\mu }{4} = \frac{\mu }{4} = \frac{\nu }{2}\), we get
$$\begin{aligned} \frac{1}{\eta }{\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right]&\le \frac{1}{\eta }\left\Vert x^k - x^*\right\Vert ^2 - \alpha {\mathbb {E}}\left[ \left\Vert x^{k+1} - x^*\right\Vert ^2 \right] \\&\quad + \frac{2(1-\tau _2/2)}{\tau _2}\left( \textrm{D}_F(x_f^k,x^*) - \frac{\nu }{2}\left\Vert x_f^k - x^*\right\Vert ^2\right) \\ {}&\quad - \frac{2}{\tau _2}\left( {\mathbb {E}}\left[ \textrm{D}_F(x_f^{k+1},x^*) \right] -\frac{\nu }{2}{\mathbb {E}}\left[ \left\Vert x_f^{k+1} - x^*\right\Vert ^2 \right] \right) \\&\quad + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad - \left( \textrm{D}_F(x_g^k,x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{\beta \sigma ^2}{\tau _2 } + \frac{4}{\mu } \Delta ^2. \end{aligned}$$
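The coefficient bookkeeping in the last two steps can be verified in exact rational arithmetic. The sketch takes illustrative values \(\mu = 1\), \(L = 16\) (so \(\tau _2 = 1/4\)) and \(\beta = 1/(2L)\). It checks four identities. First, \(2(1/\tau _2 - 1/\tau _1) = -1\). Second, \((1-\tau _1)/\tau _1 = (1-\tau _2/2)/\tau _2\). Third, the coefficient of \({\mathbb {E}}\Vert x_f^{k+1}-x_g^k\Vert ^2\) equals \(-\nu /\tau _2\). Fourth, the coefficient of \(\Vert x_g^k - x^*\Vert ^2\) reduces to \(\nu /2\).

```python
from fractions import Fraction as Fr

# Illustrative rational data: mu = 1, L = 16, so tau2 = sqrt(mu / L) = 1/4 exactly.
mu, L = Fr(1), Fr(16)
tau2 = Fr(1, 4)
beta = 1 / (2 * L)                       # largest value allowed by (33)
tau1 = 1 / (1 / tau2 + Fr(1, 2))         # (26)
eta = 1 / ((1 / beta + L) * tau2)        # (27)
alpha, nu = mu / 4, mu / 2               # (28), (29)

# Coefficient of D_F(x_g^k, x^*).
c_D = 2 * (1 / tau2 - 1 / tau1)
# Factor multiplying D_F(x_f^k, x^*) - nu/2 ||x_f^k - x^*||^2.
c_f = (1 - tau1) / tau1
# Coefficient of E||x_f^{k+1} - x_g^k||^2 once eta is fixed by (27).
c_step = ((1 / beta + L) - nu) / tau2 - 1 / (eta * tau2 ** 2)
# Coefficient of ||x_g^k - x^*||^2, including the mu/4 coming from the bias term.
c_g = alpha - mu + nu + nu / tau1 - nu / tau2 + mu / 4
print(c_D, c_f == (1 - tau2 / 2) / tau2, c_step, c_g)  # -1 True -2 1/4
```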
After rearranging and using the definition (30) of \(\Psi _x^k \), we get
$$\begin{aligned} {\mathbb {E}}\left[ \Psi _x^{k+1} \right]&\le \max \left\{ 1 - \tau _2/2, 1/(1+\eta \alpha )\right\} \Psi _x^k + 2{\mathbb {E}}\left[ \langle y^{k+1} - y^*,x^{k+1} - x^*\rangle \right] \\&\quad - \left( \textrm{D}_F(x_g^k,x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) + \frac{{\beta }\sigma ^2}{\tau _2 } {+ \frac{4}{\mu } \Delta ^2} \end{aligned}$$
\(\square \)
Lemma 6
The following inequality holds:
$$\begin{aligned} \begin{aligned} -\left\Vert y^{k+1}- y^*\right\Vert ^2&\le \frac{(1-\vartheta _1)}{\vartheta _1}\left\Vert y_f^k - y^*\right\Vert ^2 - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2 \\&\quad - \left( \frac{1}{\vartheta _1} - \frac{1}{\vartheta _2}\right) \left\Vert y_g^k - y^*\right\Vert ^2 + \left( \vartheta _2- \vartheta _1\right) \left\Vert y^{k+1} - y^k\right\Vert ^2 \end{aligned} \end{aligned}$$
(32)
Proof
Lines 7 and 9 of Algorithm 1 imply
$$\begin{aligned} y_f^{k+1}&= y_g^k + \vartheta _2(y^{k+1} - y^k)\\ {}&= y_g^k + \vartheta _2 y^{k+1} - \frac{\vartheta _2}{\vartheta _1}\left( y_g^k - (1-\vartheta _1)y_f^k\right) \\ {}&= \left( 1 - \frac{\vartheta _2}{\vartheta _1}\right) y_g^k + \vartheta _2 y^{k+1} + \left( \frac{\vartheta _2}{\vartheta _1}- \vartheta _2\right) y_f^k. \end{aligned}$$
After subtracting \(y^*\) and rearranging we get
$$\begin{aligned} (y_f^{k+1}- y^*)+ \left( \frac{\vartheta _2}{\vartheta _1} - 1\right) (y_g^k - y^*) = \vartheta _2( y^{k+1} - y^*)+ \left( \frac{\vartheta _2}{\vartheta _1} - \vartheta _2\right) (y_f^k - y^*). \end{aligned}$$
Multiplying both sides by \(\frac{\vartheta _1}{\vartheta _2}\) gives
$$\begin{aligned} \frac{\vartheta _1}{\vartheta _2}(y_f^{k+1}- y^*)+ \left( 1-\frac{\vartheta _1}{\vartheta _2}\right) (y_g^k - y^*) = \vartheta _1( y^{k+1} - y^*)+ \left( 1 - \vartheta _1\right) (y_f^k - y^*). \end{aligned}$$
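Taking squared norms of both sides gives the inequality below. On the left-hand side we use the identity \(\left\Vert \lambda a + (1-\lambda )b\right\Vert ^2 = \lambda \left\Vert a\right\Vert ^2 + (1-\lambda )\left\Vert b\right\Vert ^2 - \lambda (1-\lambda )\left\Vert a - b\right\Vert ^2\) with \(\lambda = \vartheta _1/\vartheta _2\). On the right-hand side we use convexity of \(\left\Vert \cdot \right\Vert ^2\), which is valid for \(0 \le \vartheta _1 \le 1\).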
$$\begin{aligned}&\frac{\vartheta _1}{\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2 + \left( 1- \frac{\vartheta _1}{\vartheta _2}\right) \left\Vert y_g^k - y^*\right\Vert ^2 - \frac{\vartheta _1}{\vartheta _2}\left( 1-\frac{\vartheta _1}{\vartheta _2}\right) \left\Vert y_f^{k+1} - y_g^k\right\Vert ^2\\&\quad \le \vartheta _1\left\Vert y^{k+1} - y^*\right\Vert ^2 + (1-\vartheta _1)\left\Vert y_f^k - y^*\right\Vert ^2. \end{aligned}$$
Dividing by \(\vartheta _1\) and rearranging gives
$$\begin{aligned} -\left\Vert y^{k+1}- y^*\right\Vert ^2&\le -\left( \frac{1}{\vartheta _1} - \frac{1}{\vartheta _2}\right) \left\Vert y_g^k - y^*\right\Vert ^2 + \frac{(1-\vartheta _1)}{\vartheta _1}\left\Vert y_f^k - y^*\right\Vert ^2 \\&\quad - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2+ \frac{1}{\vartheta _2}\left( 1 - \frac{\vartheta _1}{\vartheta _2}\right) \left\Vert y_f^{k+1} - y_g^k\right\Vert ^2. \end{aligned}$$
Using Line 9 of Algorithm 1, i.e., \(y_f^{k+1} - y_g^k = \vartheta _2(y^{k+1} - y^k)\), we get
$$\begin{aligned} -\left\Vert y^{k+1}- y^*\right\Vert ^2&\le -\left( \frac{1}{\vartheta _1} - \frac{1}{\vartheta _2}\right) \left\Vert y_g^k - y^*\right\Vert ^2 + \frac{(1-\vartheta _1)}{\vartheta _1}\left\Vert y_f^k - y^*\right\Vert ^2\\&\quad - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2+ \left( \vartheta _2 - \vartheta _1\right) \left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$
\(\square \)
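Inequality (32) can be checked on random points. The sketch assumes that Lines 7 and 9 of Algorithm 1 have the forms \(y_g^k = \vartheta _1 y^k + (1-\vartheta _1)y_f^k\) and \(y_f^{k+1} = y_g^k + \vartheta _2(y^{k+1}-y^k)\), as used in the proof above. It takes \(\vartheta _1\) from (34). The function name is hypothetical.

```python
import random


def dist2(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def check_lemma6(theta2, d=5, trials=200, seed=4):
    """Return True if (32) holds on random points, with theta1 given by (34)."""
    theta1 = 1 / (1 / theta2 + 0.5)  # (34)
    rng = random.Random(seed)
    for _ in range(trials):
        y, yf, y_next, ys = ([rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(4))
        yg = [theta1 * a + (1 - theta1) * b for a, b in zip(y, yf)]         # Line 7 (assumed form)
        yf_next = [g + theta2 * (a - b) for g, a, b in zip(yg, y_next, y)]  # Line 9 (assumed form)
        lhs = -dist2(y_next, ys)
        rhs = ((1 - theta1) / theta1 * dist2(yf, ys)
               - dist2(yf_next, ys) / theta2
               - (1 / theta1 - 1 / theta2) * dist2(yg, ys)
               + (theta2 - theta1) * dist2(y_next, y))
        if lhs > rhs + 1e-9:
            return False
    return True


print(all(check_lemma6(t) for t in (0.05, 0.5, 1.0)))  # True
```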
Lemma 7
Let \(\beta \) satisfy
$$\begin{aligned} \beta \le 1/(2L). \end{aligned}$$
(33)
Let \(\vartheta _1\) be defined as follows:
$$\begin{aligned} \vartheta _1 = (1/\vartheta _2 + 1/2)^{-1}. \end{aligned}$$
(34)
Then the following inequality holds:
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\&\quad \le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) \\&\qquad - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] \\&\qquad - 2\nu ^{-1}{\mathbb {E}}\left[ \langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \right] - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 \\&\qquad + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] + \beta \sigma ^2. \end{aligned}$$
(35)
Proof
$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&= \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \frac{2}{\theta }\langle y^{k+1} - y^k , y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$
Using Line 8 of Algorithm 1 we get
$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&= \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + 2\beta \langle {\textbf{G}}_k - \nu x_g^k - y^{k+1}, y^{k+1} - y^*\rangle \\&\quad - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$
Using optimality condition (4) we get
$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&= \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\&\quad + 2\beta \langle {\textbf{G}}_k - \nu x_g^k - (\nabla F(x^*) - \nu x^*) + y^*- y^{k+1}, y^{k+1} - y^*\rangle \\&\quad - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&= \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + 2\beta \langle {\textbf{G}}_k - \nu x_g^k - (\nabla F(x^*) - \nu x^*), y^{k+1} - y^*\rangle \\&\quad - 2\beta \left\Vert y^{k+1} - y^*\right\Vert ^2 - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle \\&\quad - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\&\quad + \beta \left\Vert {\textbf{G}}_k - \nu x_g^k - (\nabla F(x^*) - \nu x^*)\right\Vert ^2 - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 \\&\quad - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2 \\&\quad + \beta \left\Vert \nabla F(x_g^k) - \nu x_g^k - (\nabla F(x^*) - \nu x^*)\right\Vert ^2 - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 \\&\quad - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$
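The function \(h(x) = F(x) - \frac{\nu }{2}\left\Vert x\right\Vert ^2\) is convex and L-smooth. Hence \(\left\Vert \nabla h(x_g^k) - \nabla h(x^*)\right\Vert ^2 \le 2L\,\textrm{D}_h(x_g^k,x^*)\), where \(\textrm{D}_h(x_g^k,x^*) = \textrm{D}_F(x_g^k,x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\). This implies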
$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + 2\beta L\left( \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2\right) \\ {}&\quad - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle \\ {}&\quad - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2. \end{aligned}$$
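The previous step uses the bound \(\Vert \nabla h(x) - \nabla h(y)\Vert ^2 \le 2L\,\textrm{D}_h(x,y)\), which holds for any convex \(L\)-smooth \(h\). The sketch illustrates it with \(h(x) = F(x) - \frac{\nu }{2}\Vert x\Vert ^2\) for a hypothetical quadratic \(F\) with \(\mu = \min _i c_i\) and \(\nu = \mu /2\). Here \(\textrm{D}_h(x,x^*) = \textrm{D}_F(x,x^*) - \frac{\nu }{2}\Vert x - x^*\Vert ^2\). The check also confirms that this quantity is nonnegative.

```python
import random

c = [1.0, 4.0, 9.0]     # hypothetical quadratic F: mu = 1, L = 9
lin = [0.5, -1.0, 2.0]  # linear part of F
mu, L = min(c), max(c)
nu = mu / 2             # as in (29)


def F(x):
    return sum(0.5 * ci * xi * xi + li * xi for ci, li, xi in zip(c, lin, x))


def grad_F(x):
    return [ci * xi + li for ci, li, xi in zip(c, lin, x)]


def grad_h(x):
    """Gradient of h(x) = F(x) - nu/2 ||x||^2."""
    return [g - nu * xi for g, xi in zip(grad_F(x), x)]


rng = random.Random(5)
ok = True
for _ in range(100):
    xg = [rng.uniform(-3, 3) for _ in range(3)]
    xs = [rng.uniform(-3, 3) for _ in range(3)]
    e = [a - b for a, b in zip(xg, xs)]
    # D_h(x_g, x^*) = D_F(x_g, x^*) - nu/2 ||x_g - x^*||^2
    D_F = F(xg) - F(xs) - sum(g * ei for g, ei in zip(grad_F(xs), e))
    D_h = D_F - nu / 2 * sum(ei * ei for ei in e)
    gdiff = [a - b for a, b in zip(grad_h(xg), grad_h(xs))]
    ok = ok and D_h >= -1e-9 and sum(g * g for g in gdiff) <= 2 * L * D_h + 1e-9
print(ok)  # True
```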
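Using condition (33) on \(\beta \), i.e., \(2\beta L \le 1\), together with \(\textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \ge 0\), we get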
$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 - 2\langle \nu ^{-1}(y_g^k + z_g^k) + x^{k+1}, y^{k+1} - y^*\rangle \\ {}&\quad - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2. \end{aligned}$$
Using optimality condition (5) we get
$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - \beta \left\Vert y^{k+1} - y^*\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \\ {}&\quad - 2\langle x^{k+1} - x^*, y^{k+1} - y^*\rangle - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2. \end{aligned}$$
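We split \(-\beta \left\Vert y^{k+1} - y^*\right\Vert ^2\) into two equal halves and bound one half by (32) multiplied by \(\beta /2\). We then use the definition (34) of \(\vartheta _1\), which gives \(\frac{1-\vartheta _1}{\vartheta _1} = \frac{1-\vartheta _2/2}{\vartheta _2}\), \(\frac{1}{\vartheta _1} - \frac{1}{\vartheta _2} = \frac{1}{2}\) and \(\vartheta _2 - \vartheta _1 \le \vartheta _2^2/2\). This yields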
$$\begin{aligned} \frac{1}{\theta }\left\Vert y^{k+1} - y^*\right\Vert ^2&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - \frac{\beta }{2}\left\Vert y^{k+1} - y^*\right\Vert ^2 + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2 \\&\quad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 \\ {}&\quad - \frac{\beta }{2\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2 - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 \\&\quad + \frac{\beta \left( \vartheta _2- \vartheta _1\right) }{2}\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \\&\quad - 2\langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \\ {}&\quad - \frac{1}{\theta }\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 - \frac{\beta }{2}\left\Vert y^{k+1} - y^*\right\Vert ^2 \\ {}&\quad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 - \frac{\beta }{2\vartheta _2}\left\Vert y_f^{k+1} - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) \\ {}&\quad - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2\\&\quad + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) \left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \\ {}&\quad - 2\langle x^{k+1} - x^*, y^{k+1} - y^*\rangle + \beta \left\Vert {\textbf{G}}_k - \nabla F(x_g^k)\right\Vert ^2. \end{aligned}$$
Rearranging and taking the expectation with respect to \(\varvec{\xi }^k\) gives
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\&\quad \le \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2\\&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] \\&\qquad - 2\nu ^{-1}{\mathbb {E}}\left[ \langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^*\rangle \right] \\ {}&\qquad - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] + {\sigma ^2 \beta }. \end{aligned}$$
\(\square \)
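Combining (32) with (34) produces three constants, and these can be verified exactly with rational arithmetic: \(\frac{1-\vartheta _1}{\vartheta _1} = \frac{1-\vartheta _2/2}{\vartheta _2}\), \(\frac{1}{\vartheta _1} - \frac{1}{\vartheta _2} = \frac{1}{2}\) and \(\vartheta _2 - \vartheta _1 \le \vartheta _2^2/2\). The sketch below checks them for a few illustrative values of \(\vartheta _2\). The function name is hypothetical.

```python
from fractions import Fraction as Fr


def lemma7_constants(theta2):
    """Return (theta1, y_f weight, y_g weight, theta2 - theta1) for theta1 from (34)."""
    theta1 = 1 / (1 / theta2 + Fr(1, 2))  # (34)
    return theta1, (1 - theta1) / theta1, 1 / theta1 - 1 / theta2, theta2 - theta1


for theta2 in (Fr(1, 10), Fr(1, 2), Fr(1)):
    theta1, w_f, w_g, gap = lemma7_constants(theta2)
    assert w_f == (1 - theta2 / 2) / theta2  # weight of ||y_f^k - y^*||^2 (times beta/2)
    assert w_g == Fr(1, 2)                   # yields the -beta/4 ||y_g^k - y^*||^2 term
    assert gap <= theta2 ** 2 / 2            # bounds the (theta2 - theta1) * beta/2 factor
print("all checks passed")
```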
Lemma 8
The following inequality holds:
$$\begin{aligned} \left\Vert m^k\right\Vert ^2_{\textbf{P}}\le 8\chi ^2\varkappa ^2\nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ 4\chi (1 - (4\chi )^{-1})\left\Vert m^k\right\Vert ^2_{\textbf{P}}- 4\chi \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}. \end{aligned}$$
(36)
Proof
Using Line 12 of Algorithm 1 we get
$$\begin{aligned} \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}&= \left\Vert \varkappa \nu ^{-1}(y_g^k+z_g^k) + m^k - ({\textbf{W}}(k)\otimes {\textbf{I}}_d)\left[ \varkappa \nu ^{-1}(y_g^k+z_g^k) + m^k\right] \right\Vert ^2_{{\textbf{P}}}\\&= \left\Vert {\textbf{P}}\left[ \varkappa \nu ^{-1}(y_g^k+z_g^k) + m^k\right] - ({\textbf{W}}(k)\otimes {\textbf{I}}_d){\textbf{P}}\left[ \varkappa \nu ^{-1}(y_g^k+z_g^k) + m^k\right] \right\Vert ^2. \end{aligned}$$
Using property (4) we obtain
$$\begin{aligned} \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}&\le (1 - \chi ^{-1}) \left\Vert m^k + \varkappa \nu ^{-1}(y_g^k + z_g^k)\right\Vert ^2_{\textbf{P}}. \end{aligned}$$
Using the inequality \(\left\Vert a+b\right\Vert ^2 \le (1+c)\left\Vert a\right\Vert ^2 + (1+c^{-1})\left\Vert b\right\Vert ^2\) with \(c = \frac{1}{2(\chi - 1)}\), we get
$$\begin{aligned} \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}&\le (1 - \chi ^{-1}) \left[ \left( 1 + \frac{1}{2(\chi - 1)}\right) \left\Vert m^k\right\Vert ^2_{\textbf{P}}\right. \\&\left. \quad + \left( 1 + 2(\chi - 1)\right) \varkappa ^2\nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}\right] \\ {}&\le (1 - (2\chi )^{-1})\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 2\chi \varkappa ^2\nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}. \end{aligned}$$
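Multiplying both sides by \(4\chi \), using \(4\chi (1 - (2\chi )^{-1}) = 4\chi (1 - (4\chi )^{-1}) - 1\) and rearranging gives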
$$\begin{aligned} \left\Vert m^k\right\Vert ^2_{\textbf{P}}&\le 8\chi ^2\varkappa ^2\nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ 4\chi (1 - (4\chi )^{-1})\left\Vert m^k\right\Vert ^2_{\textbf{P}}- 4\chi \left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}. \end{aligned}$$
\(\square \)
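The scalar bookkeeping in the proof of Lemma 8 can be sanity-checked numerically. The Python sketch below is a spot check on sample values, not a substitute for the proof. It assumes only \(\chi > 1\), so that \(c = \frac{1}{2(\chi - 1)}\) is positive. The function names are illustrative and are not part of Algorithm 1.

```python
import math


def lemma8_coefficients(chi: float) -> tuple[float, float]:
    """Coefficients of ||m^k||_P^2 and of kappa^2 nu^{-2} ||y_g^k + z_g^k||_P^2
    after applying ||a + b||^2 <= (1 + c)||a||^2 + (1 + 1/c)||b||^2 with
    c = 1 / (2 (chi - 1)) and multiplying by the contraction factor 1 - 1/chi."""
    c = 1.0 / (2.0 * (chi - 1.0))
    contraction = 1.0 - 1.0 / chi
    return contraction * (1.0 + c), contraction * (1.0 + 1.0 / c)


def check_lemma8(chi: float, tol: float = 1e-9) -> bool:
    coef_m, coef_yz = lemma8_coefficients(chi)
    # (1 - 1/chi)(1 + c) = 1 - (2 chi)^{-1}, and (1 - 1/chi)(2 chi - 1) <= 2 chi.
    ok_m = math.isclose(coef_m, 1.0 - 1.0 / (2.0 * chi), rel_tol=tol)
    ok_yz = coef_yz <= 2.0 * chi
    # Multiplying by 4 chi: 4 chi (1 - (2 chi)^{-1}) = 4 chi (1 - (4 chi)^{-1}) - 1,
    # which moves one copy of ||m^k||_P^2 to the left-hand side of (36).
    ok_scaled = math.isclose(4.0 * chi * (1.0 - 1.0 / (2.0 * chi)),
                             4.0 * chi * (1.0 - 1.0 / (4.0 * chi)) - 1.0,
                             rel_tol=tol)
    return ok_m and ok_yz and ok_scaled


if __name__ == "__main__":
    print(all(check_lemma8(1.0 + 10.0 ** e) for e in range(-3, 4)))
```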
Lemma 9
Let \({\hat{z}}^k\) be defined as follows:
$$\begin{aligned} {\hat{z}}^k = z^k - {\textbf{P}}m^k. \end{aligned}$$
(37)
Then the following inequality holds:
$$\begin{aligned} \begin{aligned}&\frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1} +\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle + \varkappa \nu ^{-2}\left( 1 + 6\chi \right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2. \end{aligned} \end{aligned}$$
(38)
Proof
$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&= \frac{1}{\varkappa }\left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + \frac{2}{\varkappa }\langle {\hat{z}}^{k+1} - {\hat{z}}^k,{\hat{z}}^k - z^*\rangle + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - {\hat{z}}^k\right\Vert ^2. \end{aligned}$$
Lines 11 and 12 of Algorithm 1 together with the definition (37) of \({\hat{z}}^k\) imply
$$\begin{aligned} {\hat{z}}^{k+1} - {\hat{z}}^k = \varkappa \pi (z_g^k - z^k) - \varkappa \nu ^{-1}{\textbf{P}}(y_g^k + z_g^k). \end{aligned}$$
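Hence, using the inner product representation (23), inequality (21) with \(n=2\) and definition (37), we get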
$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&= \frac{1}{\varkappa }\left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \langle z_g^k - z^k,{\hat{z}}^k - z^*\rangle \\ {}&\quad - 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),{\hat{z}}^k - z^*\rangle + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - {\hat{z}}^k\right\Vert ^2 \\ {}&= \frac{1}{\varkappa }\left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + \pi \left\Vert z_g^k - {\textbf{P}}m^k - z^*\right\Vert ^2 \\ {}&\quad - \pi \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 - \pi \left\Vert z_g^k - z^k\right\Vert ^2 - 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),{\hat{z}}^k - z^*\rangle \\ {}&\quad + \varkappa \left\Vert \pi (z_g^k - z^k) - \nu ^{-1}{\textbf{P}}(y_g^k+z_g^k)\right\Vert ^2 \\ {}&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\quad + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}- \pi \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),{\hat{z}}^k - z^*\rangle + 2\varkappa \pi ^2\left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad + \varkappa \left\Vert \nu ^{-1}{\textbf{P}}(y_g^k+z_g^k)\right\Vert ^2 \\ {}&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \\ {}&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 - 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),z^k - z^*\rangle \\ {}&\quad + \varkappa \left\Vert \nu ^{-1}{\textbf{P}}(y_g^k+z_g^k)\right\Vert ^2 + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),m^k\rangle . \end{aligned}$$
Using the fact that \(z^k \in {\mathcal {L}}^\perp \) for all \(k=0,1,2,\ldots \) and optimality condition (6), we get
$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2\\&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle + \varkappa \nu ^{-2}\left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),m^k\rangle . \end{aligned}$$
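Using the Fenchel–Young inequality (22) with \(\lambda = 3\varkappa \chi \) to bound \(2\nu ^{-1}\langle {\textbf{P}}(y_g^k + z_g^k),m^k\rangle \), we get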
$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2\\&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle + \varkappa \nu ^{-2}\left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 3\varkappa \chi \nu ^{-2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{3\varkappa \chi }\left\Vert m^k\right\Vert ^2_{\textbf{P}}. \end{aligned}$$
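Using (36) to bound \(\frac{1}{3\varkappa \chi }\left\Vert m^k\right\Vert ^2_{\textbf{P}}\) and the estimate \(3\chi + \frac{8\chi }{3} \le 6\chi \), we get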
$$\begin{aligned} \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2&\le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \\&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle + \varkappa \nu ^{-2}\left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + 2\pi \left\Vert m^k\right\Vert ^2_{\textbf{P}}+ 6\varkappa \nu ^{-2}\chi \left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + \frac{4(1 - (4\chi )^{-1})}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}- \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&= \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2\\&\quad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),z^k - z^*\rangle \\&\quad + \varkappa \nu ^{-2}\left( 1 + 6\chi \right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\quad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}- \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}. \end{aligned}$$
\(\square \)
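The two estimates that close the proof of Lemma 9 can be spot-checked on random data. In the sketch below, plain lists stand in for \({\textbf{P}}m^k\) and \({\textbf{P}}(y_g^k+z_g^k)\), and the parameter ranges are arbitrary assumptions. A passing run is evidence, not a proof.

```python
import random


def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def young_step_holds(m, s, kappa, chi, nu) -> bool:
    """2 nu^{-1} <s, m> <= 3 kappa chi nu^{-2} ||s||^2 + ||m||^2 / (3 kappa chi).

    m plays the role of P m^k and s of P(y_g^k + z_g^k); this is (22) with
    lambda = 3 kappa chi, a = m and b = s / nu, multiplied by 2.
    """
    lam = 3.0 * kappa * chi
    lhs = 2.0 / nu * dot(s, m)
    rhs = lam / nu ** 2 * dot(s, s) + dot(m, m) / lam
    return lhs <= rhs


def yz_coefficient_ok(chi: float) -> bool:
    # 3 chi from the Young step plus 8 chi / 3 from (36) scaled by 1 / (3 kappa chi).
    return 3.0 * chi + 8.0 * chi / 3.0 <= 6.0 * chi


def run_checks(trials: int = 1000, dim: int = 5, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        m = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        s = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        kappa = rng.uniform(1e-3, 10.0)
        chi = rng.uniform(1.0, 100.0)
        nu = rng.uniform(1e-2, 10.0)
        if not (young_step_holds(m, s, kappa, chi, nu) and yz_coefficient_ok(chi)):
            return False
    return True


if __name__ == "__main__":
    print(run_checks())
```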
Lemma 10
The following inequality holds:
$$\begin{aligned} \begin{aligned}&2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\quad \ge 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{(1-\vartheta _2/2)}{\vartheta _2}\left( \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2\right) . \end{aligned} \end{aligned}$$
(39)
Proof
$$\begin{aligned}&2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\quad = 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y_g^k + z_g^k)\rangle . \end{aligned}$$
Using Lines 7 and 10 of Algorithm 1, then identity (23) and dropping a nonnegative term, we get
$$\begin{aligned}&2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\quad = 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{2(1-\vartheta _1)}{\vartheta _1}\langle y_g^k + z_g^k - (y^*+z^*), y_g^k + z_g^k - (y_f^k + z_f^k)\rangle \\ {}&\quad = 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{(1-\vartheta _1)}{\vartheta _1}\left( \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + \left\Vert y_g^k + z_g^k - (y_f^k + z_f^k)\right\Vert ^2 \right. \\ {}&\qquad \left. - \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2\right) \\ {}&\quad \ge 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{(1-\vartheta _1)}{\vartheta _1}\left( \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2\right) . \end{aligned}$$
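Using the definition (34) of \(\vartheta _1\), which gives \(\frac{1-\vartheta _1}{\vartheta _1} = \frac{1-\vartheta _2/2}{\vartheta _2}\), we get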
$$\begin{aligned}&2\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\quad \ge 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{(1-\vartheta _2/2)}{\vartheta _2}\left( \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2\right) . \end{aligned}$$
\(\square \)
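The proof of Lemma 10 relies on the identity \(2\langle a,b\rangle = \Vert a\Vert ^2 + \Vert b\Vert ^2 - \Vert a-b\Vert ^2\) (a form of (23)) and on the coefficient identity used in its last step. Definition (34) is not reproduced in this section. The sketch below therefore assumes, by analogy with (26), that \(\vartheta _1 = (1/\vartheta _2 + 1/2)^{-1}\), which is the form under which the last step of the proof holds.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def polarization_gap(a, b):
    """2<a, b> - (||a||^2 + ||b||^2 - ||a - b||^2); zero up to rounding."""
    diff = [x - y for x, y in zip(a, b)]
    return 2.0 * dot(a, b) - (dot(a, a) + dot(b, b) - dot(diff, diff))


def theta1_from_theta2(theta2):
    # ASSUMPTION: (34) has the same form as (26), i.e. theta_1 = (1/theta_2 + 1/2)^{-1}.
    return 1.0 / (1.0 / theta2 + 0.5)


def coefficient_identity_holds(theta2):
    """(1 - theta_1) / theta_1 == (1 - theta_2 / 2) / theta_2."""
    t1 = theta1_from_theta2(theta2)
    return math.isclose((1.0 - t1) / t1, (1.0 - theta2 / 2.0) / theta2, rel_tol=1e-9)


if __name__ == "__main__":
    rng = random.Random(1)
    a = [rng.gauss(0.0, 1.0) for _ in range(4)]
    b = [rng.gauss(0.0, 1.0) for _ in range(4)]
    print(abs(polarization_gap(a, b)) < 1e-9)
    print(all(coefficient_identity_holds(t) for t in (1e-4, 0.01, 0.3, 1.0)))
```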
Lemma 11
Let \(\zeta \) be defined by
$$\begin{aligned} \zeta = 1/2. \end{aligned}$$
(40)
Then the following inequality holds:
$$\begin{aligned} \begin{aligned}&-2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\quad \le \frac{1}{\vartheta _2}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + 2\vartheta _2\left\Vert y^{k+1} - y^k\right\Vert ^2 - \frac{1}{2\vartheta _2\chi }\left\Vert y_g^k + z_g^k\right\Vert ^2_{{\textbf{P}}}. \end{aligned} \end{aligned}$$
(41)
Proof
$$\begin{aligned} \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2&= \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\quad + 2\langle y_f^{k+1} + z_f^{k+1} - (y_g^k + z_g^k),y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\quad + \left\Vert y_f^{k+1} + z_f^{k+1} - (y_g^k + z_g^k)\right\Vert ^2 \\ {}&\le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\quad + 2\langle y_f^{k+1} + z_f^{k+1} - (y_g^k + z_g^k),y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\quad + 2\left\Vert y_f^{k+1} - y_g^k\right\Vert ^2 + 2\left\Vert z_f^{k+1} - z_g^k\right\Vert ^2. \end{aligned}$$
Using Line 9 of Algorithm 1 we get
$$\begin{aligned}&\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\qquad + 2\langle z_f^{k+1} - z_g^k,y_g^k + z_g^k - (y^*+z^*)\rangle + 2\left\Vert z_f^{k+1} - z_g^k\right\Vert ^2. \end{aligned}$$
Using Line 13 of Algorithm 1 and optimality condition (6) we get
$$\begin{aligned}&\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 - 2\zeta \langle ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k),y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\zeta ^2\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2 \\ {}&\quad = \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\qquad - 2\zeta \langle ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k),y_g^k + z_g^k\rangle + 2\zeta ^2\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2. \end{aligned}$$
Using the definition (40) of \(\zeta \) and identity (23), we get
$$\begin{aligned}&\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 - \langle ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k),y_g^k + z_g^k\rangle \\ {}&\qquad + \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2 \\ {}&\quad = \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 - \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2 \\ {}&\qquad - \frac{1}{2}\left\Vert y_g^k + z_g^k\right\Vert ^2 + \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k) - (y_g^k + z_g^k)\right\Vert ^2 \\ {}&\qquad + \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\qquad - \frac{1}{2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d)(y_g^k + z_g^k) - (y_g^k + z_g^k)\right\Vert ^2_{\textbf{P}} \\ {}&\quad = \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 \\ {}&\qquad - \frac{1}{2}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{2}\left\Vert ({\textbf{W}}(k)\otimes {\textbf{I}}_d){\textbf{P}}(y_g^k + z_g^k) - {\textbf{P}}(y_g^k + z_g^k)\right\Vert ^2. \end{aligned}$$
Using Assumption 4 we get
$$\begin{aligned}&\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\quad \le \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\vartheta _2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\qquad + 2\vartheta _2^2\left\Vert y^{k+1} - y^k\right\Vert ^2 - (2\chi )^{-1}\left\Vert y_g^k + z_g^k\right\Vert ^2_{\textbf{P}}. \end{aligned}$$
Rearranging and dividing by \(\vartheta _2\) gives
$$\begin{aligned}&-2\langle y^{k+1} - y^k,y_g^k + z_g^k - (y^*+z^*)\rangle \\ {}&\quad \le \frac{1}{\vartheta _2}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \frac{1}{\vartheta _2}\left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + 2\vartheta _2\left\Vert y^{k+1} - y^k\right\Vert ^2 - \frac{1}{2\vartheta _2\chi }\left\Vert y_g^k + z_g^k\right\Vert ^2_{{\textbf{P}}}. \end{aligned}$$
\(\square \)
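With \(\zeta = 1/2\), the \(\zeta \)-dependent terms in the proof of Lemma 11 collapse as \(-\langle {\textbf{W}}a, a\rangle + \frac{1}{2}\Vert {\textbf{W}}a\Vert ^2 = -\frac{1}{2}\Vert a\Vert ^2 + \frac{1}{2}\Vert {\textbf{W}}a - a\Vert ^2\), where \(a = y_g^k + z_g^k\) and \({\textbf{W}}\) stands for \({\textbf{W}}(k)\otimes {\textbf{I}}_d\). This identity holds for any square matrix, so the sketch below checks it on random dense matrices. It makes no use of Assumption 4, and the helper names are illustrative.

```python
import math
import random


def matvec(W, a):
    return [sum(w * x for w, x in zip(row, a)) for row in W]


def sq_norm(v):
    return sum(x * x for x in v)


def zeta_terms(W, a, zeta=0.5):
    """-2 zeta <W a, a> + 2 zeta^2 ||W a||^2, the zeta-dependent part of the bound."""
    Wa = matvec(W, a)
    return -2.0 * zeta * sum(x * y for x, y in zip(Wa, a)) + 2.0 * zeta ** 2 * sq_norm(Wa)


def collapsed_form(W, a):
    """-1/2 ||a||^2 + 1/2 ||W a - a||^2, the form reached with zeta = 1/2."""
    Wa = matvec(W, a)
    return -0.5 * sq_norm(a) + 0.5 * sq_norm([x - y for x, y in zip(Wa, a)])


def identity_holds(trials=200, n=6, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if not math.isclose(zeta_terms(W, a), collapsed_form(W, a),
                            rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True


if __name__ == "__main__":
    print(identity_holds())
```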
Lemma 12
Let \(\pi \) be defined as follows:
$$\begin{aligned} \pi = \frac{\beta }{16}. \end{aligned}$$
(42)
Let \(\varkappa \) be defined as follows:
$$\begin{aligned} \varkappa = \frac{\nu }{14\vartheta _2\chi ^2}. \end{aligned}$$
(43)
Let \(\theta \) be defined as follows:
$$\begin{aligned} \theta = \frac{\nu }{4\vartheta _2}. \end{aligned}$$
(44)
Let \(\vartheta _2\) be defined as follows:
$$\begin{aligned} \vartheta _2 = \frac{\sqrt{\beta \mu }}{16\chi }. \end{aligned}$$
(45)
Let \(\Psi _{yz}^k\) be the following Lyapunov function:
$$\begin{aligned} \begin{aligned} \Psi _{yz}^k&= \left( \frac{1}{\theta } + \frac{\beta }{2}\right) \left\Vert y^{k} - y^*\right\Vert ^2 + \frac{\beta }{2\vartheta _2}\left\Vert y_f^{k} - y^*\right\Vert ^2 + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k} - z^*\right\Vert ^2 \\ {}&\quad + \frac{4}{3\varkappa }\left\Vert m^{k}\right\Vert ^2_{\textbf{P}}+ \frac{\nu ^{-1}}{\vartheta _2}\left\Vert y_f^{k} + z_f^{k} - (y^*+z^*)\right\Vert ^2. \end{aligned} \end{aligned}$$
(46)
Then the following inequality holds:
$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \Psi _{yz}^{k+1} \right]&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) }\Psi _{yz}^k + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\quad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned} \end{aligned}$$
(47)
Proof
Combining (35) and (38) gives
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 - 2\nu ^{-1}\langle y_g^k + z_g^k - (y^*+z^*),y^k + z^k - (y^*+ z^*)\rangle \\ {}&\qquad - 2\nu ^{-1}{\mathbb {E}}\left[ \langle y_g^k + z_g^k - (y^* + z^*), y^{k+1} - y^k\rangle \right] + \varkappa \nu ^{-2}\left( 1 + 6\chi \right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \\ {}&\qquad - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] +\left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 + {\sigma ^2 \beta }. \end{aligned}$$
Using (39) and (41), both multiplied by \(\nu ^{-1}\), we get
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1- (4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 - 2\nu ^{-1}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left( \left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 - \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2\right) \\ {}&\qquad + \frac{\nu ^{-1}}{\vartheta _2}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] \\ {}&\qquad + 2\nu ^{-1}\vartheta _2{\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] - \frac{\nu ^{-1}}{2\vartheta _2\chi }\left\Vert y_g^k + z_g^k\right\Vert ^2_{{\textbf{P}}} \\ {}&\qquad + \varkappa \nu ^{-2}\left( 1 + 6\chi \right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 + {\sigma ^2 \beta } \\ {}&\qquad - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) \\ {}&\qquad - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] +\left( 
2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\quad = \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 \\ {}&\qquad + \nu ^{-1}\left( \frac{1}{\vartheta _2} - \frac{(1-\vartheta _2/2)}{\vartheta _2} - 2\right) \left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \left( \varkappa \nu ^{-2}\left( 1 + 6\chi \right) - \frac{\nu ^{-1}}{2\vartheta _2\chi }\right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta } \\ {}&\quad = \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert 
y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 - \frac{\beta }{4}\left\Vert y_g^k - y^*\right\Vert ^2 \\ {}&\qquad - \frac{3\nu ^{-1}}{2}\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\qquad + \left( \varkappa \nu ^{-2}\left( 1 + 6\chi \right) - \frac{\nu ^{-1}}{2\vartheta _2\chi }\right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
Using the definitions (33) of \(\beta \) and (29) of \(\nu \) (in particular, \(\frac{3\nu ^{-1}}{2} = \frac{3}{\mu }\)), we get
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] + 2\pi \left\Vert z_g^k - z^*\right\Vert ^2 \\ {}&\qquad - {\frac{\beta }{4}}\left\Vert y_g^k - y^*\right\Vert ^2 - \frac{3}{\mu }\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\qquad + \left( \varkappa \nu ^{-2}\left( 1 + 6\chi \right) - \frac{\nu ^{-1}}{2\vartheta _2\chi }\right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
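By (21) with \(n=2\), \(\left\Vert z_g^k - z^*\right\Vert ^2 \le 2\left\Vert y_g^k + z_g^k - (y^*+z^*)\right\Vert ^2 + 2\left\Vert y_g^k - y^*\right\Vert ^2\). Combining this with the definition (42) of \(\pi \), which gives \(\pi \le \beta / 16\) and \(\pi \le 3/(4 \mu )\), i.e., \(4\pi \le \beta /4\) and \(4\pi \le 3/\mu \), we get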
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi
)^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] \\ {}&\qquad + \left( \varkappa \nu ^{-2}\left( 1 + 6\chi \right) - \frac{\nu ^{-1}}{2\vartheta _2\chi }\right) \left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) \\ {}&\qquad - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
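Since \(1 + 6\chi \le 7\chi \) for \(\chi \ge 1\), the definition (43) of \(\varkappa \) gives \(\varkappa \nu ^{-2}\left( 1 + 6\chi \right) \le \frac{\nu ^{-1}}{2\vartheta _2\chi }\). Hence the coefficient of \(\left\Vert y_g^k+z_g^k\right\Vert ^2_{\textbf{P}}\) is nonpositive, and we get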
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] \\ {}&\qquad + \left( \frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta }\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^k\right\Vert ^2 \right] \\ {}&\qquad + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
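Using the definition (44) of \(\theta \) (so that \(1/\theta = 4\nu ^{-1}\vartheta _2\)) together with (29), (33) and (45), which give \(\beta \le 16 / (\mu \vartheta _2)\) and hence \(\frac{\beta \vartheta _2^2}{4} + 2\nu ^{-1}\vartheta _2 - \frac{1}{\theta } \le 0\), we get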
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(4\chi )^{-1}+\frac{3\varkappa \pi }{2}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] + \left( 2\varkappa \pi ^2-\pi \right) \left\Vert z_g^k - z^k\right\Vert ^2 \\ {}&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
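Using the definitions (43) of \(\varkappa \) and (42) of \(\pi \) together with (29) and (45), we have \(\varkappa \pi = \frac{\sqrt{\beta \mu }}{28\chi }\). Hence \(2 \varkappa \pi \le 1\) and \(\frac{3\varkappa \pi }{2} \le (8\chi )^{-1}\), and we get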
$$\begin{aligned}&\left( \frac{1}{\theta } + \frac{\beta }{2}\right) {\mathbb {E}}\left[ \left\Vert y^{k+1} - y^*\right\Vert ^2 \right] + \frac{\beta }{2\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} - y^*\right\Vert ^2 \right] \\ {}&\qquad + \frac{1}{\varkappa }\left\Vert {\hat{z}}^{k+1} - z^*\right\Vert ^2 + \frac{4}{3\varkappa }\left\Vert m^{k+1}\right\Vert ^2_{\textbf{P}}\\ {}&\quad \le \left( \frac{1}{\varkappa } - \pi \right) \left\Vert {\hat{z}}^k - z^*\right\Vert ^2 \\ {}&\qquad + \left( 1-(8\chi )^{-1}\right) \frac{4}{3\varkappa }\left\Vert m^k\right\Vert ^2_{\textbf{P}}+ \frac{1}{\theta }\left\Vert y^k - y^*\right\Vert ^2 + \frac{\beta (1-\vartheta _2/2)}{2\vartheta _2}\left\Vert y_f^k - y^*\right\Vert ^2 \\ {}&\qquad + \frac{\nu ^{-1}(1-\vartheta _2/2)}{\vartheta _2}\left\Vert y_f^k + z_f^k - (y^*+z^*)\right\Vert ^2 \\ {}&\qquad - \frac{\nu ^{-1}}{\vartheta _2}{\mathbb {E}}\left[ \left\Vert y_f^{k+1} + z_f^{k+1} - (y^*+z^*)\right\Vert ^2 \right] \\ {}&\qquad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
Rearranging and using the definition (46) of \(\Psi _{yz}^k\), we get
$$\begin{aligned} {\mathbb {E}}\left[ \Psi _{yz}^{k+1} \right]&\le \max \left\{ (1 + \theta \beta /2)^{-1}, (1-\varkappa \pi ), (1-\vartheta _2/2), (1-(8\chi )^{-1})\right\} \Psi _{yz}^k \\&\quad + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta } \\ {}&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) }\Psi _{yz}^k + \textrm{D}_F(x_g^k, x^*) - \frac{\nu }{2}\left\Vert x_g^k - x^*\right\Vert ^2 \\ {}&\qquad - 2{\mathbb {E}}\left[ \langle x^{k+1} - x^*, y^{k+1} - y^*\rangle \right] + {\sigma ^2 \beta }. \end{aligned}$$
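The last inequality holds because each factor in the maximum is at most \(1 - \frac{\sqrt{\beta \mu }}{32 \chi }\). Indeed, \(1 - \vartheta _2 / 2 = 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\) by (45); \(1 - \varkappa \pi = 1 - \frac{\sqrt{\beta \mu }}{28 \chi } \le 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\) by (42), (43) and (45); \(1 - (8\chi )^{-1} \le 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\) since \(\sqrt{\beta \mu } \le 4\); and, by (29), (44) and (45),
$$\begin{aligned} \frac{1}{1+\theta \beta /2} = \frac{1}{1 + \mu \beta /(16 \vartheta _2)} = \frac{1}{1 + \chi \sqrt{\mu \beta }} \le 1 - \frac{\sqrt{\beta \mu }}{32 \chi }. \end{aligned}$$
This inequality holds since
$$\begin{aligned} \left( 1 + \chi \sqrt{\mu \beta }\right) \left( 1 - \frac{\sqrt{\mu \beta }}{32 \chi }\right) = 1 + \chi \sqrt{\mu \beta } - \frac{\sqrt{\mu \beta }}{32 \chi } - \frac{\mu \beta }{32} \ge 1 + \sqrt{\mu \beta } - \frac{\sqrt{\mu \beta }}{32} - \frac{\mu \beta }{32} \ge 1, \end{aligned}$$
where the first inequality uses \(\chi \ge 1\) and the second uses \(\mu \beta \le \mu / (2L) \le 1/2\). \(\square \)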
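The constant choices (42), (43), (44) and (45) must satisfy several scalar conditions used in the proof of Lemma 12. The sketch below checks them on random parameters. Definition (33) of \(\beta \) is not reproduced in this section, so the sketch only assumes \(\mu \le L\), \(\chi \ge 1\) and \(0 < \beta \le 1/(2L)\), the bound quoted in the proof of Theorem 1. The sampling ranges are arbitrary.

```python
import math
import random


def leq(a, b, rtol=1e-12):
    """a <= b up to a relative floating-point tolerance (some conditions are tight)."""
    return a <= b + rtol * max(1.0, abs(a), abs(b))


def lemma12_checks(mu, L, beta, chi):
    nu = mu / 2.0                                       # (29)
    pi_ = beta / 16.0                                   # (42)
    th2 = math.sqrt(beta * mu) / (16.0 * chi)           # (45)
    kappa = nu / (14.0 * th2 * chi ** 2)                # (43)
    theta = nu / (4.0 * th2)                            # (44)
    rate = 1.0 - math.sqrt(beta * mu) / (32.0 * chi)    # contraction factor in (47)
    conditions = [
        leq(pi_, 3.0 / (4.0 * mu)),
        leq(kappa / nu ** 2 * (1.0 + 6.0 * chi), 1.0 / (2.0 * nu * th2 * chi)),
        leq(beta * th2 ** 2 / 4.0 + 2.0 * th2 / nu, 1.0 / theta),
        leq(2.0 * kappa * pi_, 1.0),
        leq(1.5 * kappa * pi_, 1.0 / (8.0 * chi)),
        math.isclose(kappa * pi_, math.sqrt(beta * mu) / (28.0 * chi), rel_tol=1e-9),
    ]
    # Every factor in the maximum must be at most the claimed rate.
    factors = [1.0 / (1.0 + theta * beta / 2.0), 1.0 - kappa * pi_,
               1.0 - th2 / 2.0, 1.0 - 1.0 / (8.0 * chi)]
    return all(conditions) and all(leq(f, rate) for f in factors)


def random_sweep(trials=10_000, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        L = 10.0 ** rng.uniform(-3.0, 3.0)
        mu = L * rng.uniform(1e-6, 1.0)
        beta = rng.uniform(1e-6, 1.0) / (2.0 * L)
        chi = rng.uniform(1.0, 1e3)
        if not lemma12_checks(mu, L, beta, chi):
            return False
    return True


if __name__ == "__main__":
    print(random_sweep())
```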
Proof of Theorem 1
Using the definition (25) of \(\tau _2\) and combining (31) with (47), we get
$$\begin{aligned} {\mathbb {E}}\left[ \Psi _x^{k+1} \right] + {\mathbb {E}}\left[ \Psi _{yz}^{k+1} \right]&\le {\max \{1 - \tau _2/2, (1 + \eta \alpha )^{-1}\}}\Psi _x^k + \frac{{\beta }\sigma ^2}{\tau _2 } \\ {}&\quad + {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) }\Psi _{yz}^k + {\sigma ^2 \beta + C \Delta ^2} \\ {}&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) }(\Psi _x^k + \Psi _{yz}^k) + {\sigma ^2 \beta \left( 1 + \sqrt{\frac{L}{\mu }} \right) + \frac{4}{\mu } \Delta ^2}. \end{aligned}$$
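The last inequality holds because \(1 - \tau _2/2 \le 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\), which follows from (25), \(\beta \le 1/(2L)\) and \(\chi \ge 1\), and because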
$$\begin{aligned} (1 + \alpha \eta )\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) \ge 1 \end{aligned}$$
Substituting (25), (27) and (28), the latter inequality reads
$$\begin{aligned} \left( 1 + \frac{\sqrt{\mu L}}{4(1/\beta + L)}\right) \left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) \ge 1 \end{aligned}$$
This inequality follows from the definition (33) of \(\beta \) (\(\beta \le 1/(2L)\)) and the fact that \(\chi \ge 1\):
$$\begin{aligned} \left( 1 + \frac{\sqrt{\mu }}{12 \sqrt{L}}\right) \left( 1 - \frac{\sqrt{\mu }}{32 \sqrt{2} \sqrt{L}}\right) \ge 1 \end{aligned}$$
This inequality is true since \(\mu / L \le 1\).
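The chain of scalar inequalities above can also be checked numerically. The final display in that chain corresponds to \(\beta = 1/(2L)\), so the sketch below evaluates \(\alpha \eta \) from (25), (27) and (28) at that value. Treating \(\beta = 1/(2L)\) as the relevant case is an assumption of this sketch, since (33) is not reproduced in this section.

```python
import math
import random


def alpha_eta(mu, L, beta):
    """alpha * eta with tau_2 = sqrt(mu/L) (25), eta = ((1/beta + L) tau_2)^{-1} (27), alpha = mu/4 (28)."""
    tau2 = math.sqrt(mu / L)
    eta = 1.0 / ((1.0 / beta + L) * tau2)
    return (mu / 4.0) * eta


def theorem_factor_checks(mu, L, chi):
    beta = 1.0 / (2.0 * L)  # ASSUMPTION: the case beta = 1/(2L) shown in the final display
    tau2 = math.sqrt(mu / L)
    x = tau2                # x = sqrt(mu / L) <= 1
    rate = 1.0 - math.sqrt(beta * mu) / (32.0 * chi)
    ae = alpha_eta(mu, L, beta)
    return (
        math.isclose(ae, math.sqrt(mu * L) / (4.0 * (1.0 / beta + L)), rel_tol=1e-9)
        and math.isclose(ae, x / 12.0, rel_tol=1e-9)
        and (1.0 + x / 12.0) * (1.0 - x / (32.0 * math.sqrt(2.0))) >= 1.0
        and 1.0 - tau2 / 2.0 <= rate
        and (1.0 + ae) * rate >= 1.0
    )


def sweep(trials=2000, seed=4):
    rng = random.Random(seed)
    for _ in range(trials):
        L = 10.0 ** rng.uniform(-3.0, 3.0)
        if not theorem_factor_checks(L * rng.uniform(1e-6, 1.0), L, rng.uniform(1.0, 1e3)):
            return False
    return True


if __name__ == "__main__":
    print(sweep())
```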
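Unrolling this one-step bound over \(k = 0, \ldots , N-1\), summing the resulting geometric series of additive terms, and using \(1 + \sqrt{L/\mu } \le 2\sqrt{L/\mu }\) (since \(\mu \le L\)) in the last step, we obtain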
$$\begin{aligned} {\mathbb {E}}\left[ \Psi _x^{N} \right] + {\mathbb {E}}\left[ \Psi _{yz}^N \right]&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N}(\Psi _x^0 + \Psi _{yz}^0)\\&\quad + { \frac{32 \chi }{\sqrt{\mu }}\sigma ^2 \sqrt{\beta } \left( 1 + \sqrt{\frac{L}{\mu }} \right) } { + \frac{128 \chi }{\sqrt{\beta \mu ^3}} \Delta ^2} \\&\le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\mu }\sigma ^2 \sqrt{\beta L}} { + \frac{128 \chi }{\sqrt{\beta \mu ^3}} \Delta ^2}. \end{aligned}$$
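The step above is the standard unrolling of a linear recursion \(r_{k+1} \le q\, r_k + e\) with \(q = 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\) and \(e = \sigma ^2 \beta \left( 1 + \sqrt{L/\mu }\right) + \frac{4}{\mu }\Delta ^2\): iterating gives \(r_N \le q^N r_0 + e\sum _{k<N} q^k \le q^N r_0 + \frac{e}{1-q}\), and \(\frac{1}{1-q} = \frac{32\chi }{\sqrt{\beta \mu }}\). The sketch below checks this on the worst case (equality at every step); the values of \(q\), \(e\) and \(r_0\) are arbitrary assumptions.

```python
def unroll(r0: float, q: float, e: float, n: int) -> float:
    """Iterate r_{k+1} = q * r_k + e (worst case of the one-step bound) n times."""
    r = r0
    for _ in range(n):
        r = q * r + e
    return r

def closed_form_bound(r0: float, q: float, e: float, n: int) -> float:
    """q^n * r0 + e / (1 - q): the geometric-series bound used in the proof."""
    return q ** n * r0 + e / (1.0 - q)

# Arbitrary illustrative values; q plays the role of 1 - sqrt(beta*mu)/(32*chi).
r0, q, e = 10.0, 0.99, 0.05
for n in (1, 10, 100, 1000):
    assert unroll(r0, q, e, n) <= closed_form_bound(r0, q, e, n) + 1e-12
```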
By the definition (30) of \(\Psi _x^k\), we have
$$\begin{aligned}&{\mathbb {E}}\Bigg [\left( \frac{1}{\eta } + \alpha \right) \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\tau _2}\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\nu }{2}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad \le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\mu }\sigma ^2 \sqrt{\beta L}} { + \frac{128 \chi }{\sqrt{\beta \mu ^3}} \Delta ^2}. \end{aligned}$$
Using the choices of \(\eta \), \(\alpha \), \(\nu \) and \(\tau _2\), together with \(1/\eta + \alpha \ge 1/\eta \ge \sqrt{\mu L}\), which follows from \(\eta = ([1/\beta + L]\tau _2)^{-1} \le (L \tau _2)^{-1} = (\sqrt{\mu L})^{-1}\), we get
$$\begin{aligned}&{\mathbb {E}}\Bigg [\sqrt{\mu L} \left\Vert x^{N} - x^*\right\Vert ^2 + 2 \sqrt{\frac{L}{\mu }}\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad \le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\mu }\sigma ^2 \sqrt{\beta L}} { + \frac{128 \chi }{\sqrt{\beta \mu ^3}} \Delta ^2}. \end{aligned}$$
Finally, dividing both sides by \(\sqrt{\mu L}\), we obtain
$$\begin{aligned}&{\mathbb {E}}\Bigg [ \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\mu }\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad \le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N} (\sqrt{\mu L})^{-1}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\sqrt{\mu ^{3}}}\sigma ^2 \sqrt{\beta }} { + \frac{128 \chi }{\sqrt{\beta L} \mu ^2} \Delta ^2}. \end{aligned}$$
\(\square \)
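Dividing by \(\sqrt{\mu L}\) turns the coefficient \(2\sqrt{L/\mu }\) into \(2/\mu \) and rescales the two error terms. The sketch below, with a hypothetical helper name and arbitrary sampling ranges that respect \(\mu \le L\) and \(\beta \le 1/(2L)\), checks these three identities at random parameter values.

```python
import math
import random

def check_division(mu, L, beta, chi, sigma2, delta2):
    """Check the three identities obtained by dividing by sqrt(mu*L)."""
    s = math.sqrt(mu * L)
    ok_bregman = math.isclose(2 * math.sqrt(L / mu) / s, 2 / mu)
    ok_sigma = math.isclose(64 * chi * sigma2 * math.sqrt(beta * L) / mu / s,
                            64 * chi * sigma2 * math.sqrt(beta) / mu ** 1.5)
    ok_delta = math.isclose(128 * chi * delta2 / math.sqrt(beta * mu ** 3) / s,
                            128 * chi * delta2 / (math.sqrt(beta * L) * mu ** 2))
    return ok_bregman and ok_sigma and ok_delta

rng = random.Random(1)
for _ in range(1000):
    mu = rng.uniform(1e-3, 1.0)
    L = mu * rng.uniform(1.0, 1e3)            # ensures mu <= L
    beta = rng.uniform(1e-4, 1.0) / (2 * L)   # ensures beta <= 1/(2L)
    assert check_division(mu, L, beta, rng.uniform(1.0, 50.0),
                          rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0))
```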
Proof of Corollary 2
We restate the convergence bound of the SADOM algorithm from Theorem 1:
$$\begin{aligned}&{\mathbb {E}}\Bigg [ \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\mu }\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad \le {\left( 1 - \frac{\sqrt{\beta \mu }}{32 \chi }\right) ^N} (\sqrt{\mu L})^{-1}(\Psi _x^0 + \Psi _{yz}^0) + { \frac{64 \chi }{\sqrt{\mu ^{3}}}\sigma ^2 \sqrt{\beta }} { + \frac{128 \chi }{\sqrt{\beta L} \mu ^2} \Delta ^2}. \end{aligned}$$
(48)
For brevity, we introduce the notation
$$\begin{aligned} r_N:= & {} {\mathbb {E}}\Bigg [ \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\mu }\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ], \\ r_0:= & {} (\sqrt{\mu L})^{-1}(\Psi _x^0 + \Psi _{yz}^0), ~~ a:= \frac{\sqrt{\mu }}{32 \chi }, ~~ b:= \frac{64 \chi }{\sqrt{\mu ^3}},~~ c:= \frac{128 \chi }{\sqrt{L} \mu ^2}. \end{aligned}$$
Inequality (48) then takes the form
$$\begin{aligned} \begin{aligned} r_{N}&\le r_0 (1 - a\sqrt{\beta })^N + b \sigma ^2 \sqrt{\beta } + \frac{c \Delta ^2}{\sqrt{\beta }} \\ {}&\le r_0 \exp \left[ -a \sqrt{\beta } N \right] + b \sigma ^2 \sqrt{\beta } + \frac{c \Delta ^2}{\sqrt{\beta }} \end{aligned} \end{aligned}$$
(49)
Here the second inequality uses \(1 - x \le \exp (-x)\). We consider two cases:
-
If \(\frac{1}{\sqrt{2L}} \ge \frac{\ln (\max \{2, a r_0 N / (b \sigma ^2) \})}{a N}\), we choose
$$\begin{aligned} \sqrt{\beta } = \frac{\ln (\max \{2, a r_0 N /(b \sigma ^2) \})}{a N} \end{aligned}$$
and (49) becomes
$$\begin{aligned} r_{N} = \widetilde{{\mathcal {O}}} \left( \frac{b \sigma ^2}{a N} + a c \Delta ^2 N \right) \end{aligned}$$
-
If \(\frac{1}{\sqrt{2L}} \le \frac{\ln (\max \{2, a r_0 N / (b \sigma ^2) \})}{a N}\), we choose
$$\begin{aligned} \sqrt{\beta } = \frac{1}{\sqrt{2L}} \end{aligned}$$
and (49) becomes
$$\begin{aligned} r_{N} = \widetilde{{\mathcal {O}}} \left( r_0 \exp \left[ -\frac{aN}{\sqrt{2L}}\right] + \frac{b \sigma ^2}{a N} + a c \Delta ^2 N \right) \end{aligned}$$
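The two-case choice of the step size can be written as a small routine. The sketch below uses hypothetical helper names and arbitrary constants \(a, b, c, r_0, \sigma ^2, \Delta ^2, L\) (none taken from the paper); it picks \(\sqrt{\beta }\) as above and evaluates the right-hand side of (49).

```python
import math

def choose_sqrt_beta(a, b, r0, sigma2, L, N):
    """sqrt(beta) following the two cases after (49); assumes sigma2 > 0."""
    candidate = math.log(max(2.0, a * r0 * N / (b * sigma2))) / (a * N)
    cap = 1.0 / math.sqrt(2.0 * L)  # beta <= 1/(2L)
    # Case 1 (candidate <= cap) keeps the candidate; case 2 falls back to the cap.
    return min(candidate, cap)

def bound_49(a, b, c, r0, sigma2, delta2, N, sqrt_beta):
    """Right-hand side of (49) after the bound 1 - x <= exp(-x)."""
    return (r0 * math.exp(-a * sqrt_beta * N)
            + b * sigma2 * sqrt_beta
            + c * delta2 / sqrt_beta)

# Arbitrary illustrative constants (not taken from the paper).
a, b, c = 0.01, 50.0, 100.0
r0, sigma2, delta2, L = 10.0, 1.0, 1e-8, 100.0
for N in (10, 10**3, 10**5, 10**7):
    sb = choose_sqrt_beta(a, b, r0, sigma2, L, N)
    print(N, sb, bound_49(a, b, c, r0, sigma2, delta2, N, sb))
```

Taking the minimum of the candidate and the cap reproduces both cases at once. In case 1 the exponential term equals \(r_0/\max \{2, a r_0 N/(b\sigma ^2)\} \le \frac{b\sigma ^2}{aN}\), which is why only the \(\frac{b\sigma ^2}{aN}\) and \(ac\Delta ^2 N\) terms survive there.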
Substituting \(a\), \(b\) and \(c\) back, we obtain
$$\begin{aligned}&{\mathbb {E}}\Bigg [ \left\Vert x^{N} - x^*\right\Vert ^2 + \frac{2}{\mu }\left( \textrm{D}_F(x_f^{N},x^*)-\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \right) \Bigg ] \\&\quad = \widetilde{{\mathcal {O}}} \left( C_0 \exp \left[ -\frac{\sqrt{\mu } N}{32 \sqrt{2} \chi \sqrt{L}}\right] + \frac{\chi ^2 \sigma ^2}{B \mu ^{2} N} + \frac{\Delta ^2 N}{\sqrt{L} \mu ^{3/2}} \right) . \end{aligned}$$
This finishes the proof. \(\square \)
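With \(a, b, c\) as defined before (48), one has \(b/a = 2048\chi ^2/\mu ^2\), \(ac = 4/(\sqrt{L}\mu ^{3/2})\) and \(a/\sqrt{2L} = \sqrt{\mu }/(32\sqrt{2}\chi \sqrt{L})\), which give the exponent and, up to absolute constants, the \(\Delta ^2\) term and the \(\chi ^2\sigma ^2/(\mu ^2 N)\) scaling above. The sketch below checks these identities at arbitrary parameter values; the helper name is illustrative.

```python
import math

def substitution_constants(mu: float, L: float, chi: float):
    """Return (b/a, a*c, a/sqrt(2L)) for a, b, c as defined before (48)."""
    a = math.sqrt(mu) / (32 * chi)
    b = 64 * chi / math.sqrt(mu ** 3)
    c = 128 * chi / (math.sqrt(L) * mu ** 2)
    return b / a, a * c, a / math.sqrt(2 * L)

mu, L, chi = 0.3, 40.0, 2.5  # arbitrary illustrative values
b_over_a, ac, rate = substitution_constants(mu, L, chi)
assert math.isclose(b_over_a, 2048 * chi ** 2 / mu ** 2)
assert math.isclose(ac, 4 / (math.sqrt(L) * mu ** 1.5))
assert math.isclose(rate, math.sqrt(mu) / (32 * math.sqrt(2) * chi * math.sqrt(L)))
```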
Appendix 3: Proof of Theorem 2
1.1 Proof of Lemma 4
Proof
First, consider the TPF smoothing scheme (16):
$$\begin{aligned} \begin{aligned} {\textbf{g}}(x, \xi , e)&= \frac{d}{2 \gamma }(F_{\delta }(x + \gamma e) - F_{\delta }(x - \gamma e))e \\ {}&= \frac{d}{2 \gamma }(F(x + \gamma e) - F(x - \gamma e))e + \frac{d e}{2 \gamma }(\delta (x + \gamma e) - \delta (x - \gamma e)) \end{aligned} \end{aligned}$$
According to Gasnikov et al. (2022a), the first summand is an unbiased gradient estimator. For the second summand, since \(\Vert e\Vert = 1\) and \(|\delta (\cdot , \xi )| \le {\widetilde{\Delta }}\), we have
$$\begin{aligned} \varvec{\omega }(x) = {\mathbb {E}}\left[ \frac{d e}{2 \gamma }(\delta (x + \gamma e, \xi ) - \delta (x - \gamma e, \xi ))\right] , \quad \left\| \varvec{\omega }(x) \right\| \le \frac{d}{2 \gamma } \cdot 2 {\widetilde{\Delta }} = \frac{d {\widetilde{\Delta }}}{\gamma } \end{aligned}$$
Similar bounds hold for the two remaining schemes (17) and (18). For OPF with a single realization of \(\xi \) (17):
$$\begin{aligned} \varvec{\omega }(x) = {\mathbb {E}}\left[ \frac{d e}{\gamma }\delta (x + \gamma e, \xi ) \right] , \quad \left\| \varvec{\omega }(x) \right\| \le \frac{d {\widetilde{\Delta }}}{\gamma } \end{aligned}$$
For OPF with two realizations \(\xi _1, \xi _2\) of \(\xi \) (18):
$$\begin{aligned} \varvec{\omega }(x) = {\mathbb {E}}\left[ \frac{d e}{2 \gamma }(\delta (x + \gamma e, \xi _1) - \delta (x - \gamma e, \xi _2))\right] , \quad \left\| \varvec{\omega }(x) \right\| \le \frac{d {\widetilde{\Delta }}}{\gamma } \end{aligned}$$
\(\square \)
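To illustrate the TPF bound, the sketch below samples \(e\) uniformly on the unit sphere (by normalizing a Gaussian vector) and uses a hypothetical bounded noise \(\delta (x, \xi ) = {\widetilde{\Delta }} \sin (\xi + x_1)\); this noise model and all parameter values are assumptions made only for the example, and any noise with \(|\delta | \le {\widetilde{\Delta }}\) would do. Because \(\Vert e\Vert = 1\), every sample of the noise term has norm at most \(d{\widetilde{\Delta }}/\gamma \), so its Monte Carlo average does too.

```python
import math
import random

def sample_sphere(d, rng):
    """Uniform sample from the Euclidean unit sphere in R^d (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]

def noise(x, xi, delta_tilde):
    """Hypothetical bounded noise: |noise(x, xi)| <= delta_tilde everywhere."""
    return delta_tilde * math.sin(xi + x[0])

def tpf_noise_term(x, e, xi, gamma, delta_tilde):
    """d e / (2 gamma) * (delta(x + gamma e, xi) - delta(x - gamma e, xi))."""
    d = len(x)
    x_plus = [xk + gamma * ek for xk, ek in zip(x, e)]
    x_minus = [xk - gamma * ek for xk, ek in zip(x, e)]
    diff = noise(x_plus, xi, delta_tilde) - noise(x_minus, xi, delta_tilde)
    return [d * ek * diff / (2.0 * gamma) for ek in e]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

rng = random.Random(0)
d, gamma, delta_tilde = 10, 0.1, 1e-3
x = [1.0] * d
bound = d * delta_tilde / gamma  # the bound d * Delta~ / gamma from the lemma
samples = [tpf_noise_term(x, sample_sphere(d, rng), rng.uniform(0.0, 2.0 * math.pi),
                          gamma, delta_tilde) for _ in range(2000)]
# Monte Carlo estimate of omega(x): average of the noise terms.
omega_hat = [sum(s[k] for s in samples) / len(samples) for k in range(d)]
print(norm(omega_hat), bound)
```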
1.2 Proof of Theorem 2
Proof
Multiplying the convergence bound of Corollary 2 by \(\mu /2\), we obtain
$$\begin{aligned}&{\mathbb {E}}\Bigg [ \frac{\mu }{2}\left\Vert x^{N} - x^*\right\Vert ^2 + F(x_f^{N}) - F(x^*) -\frac{\mu }{4}\left\Vert x_f^{N} - x^*\right\Vert ^2 \Bigg ] \\&\quad = \widetilde{{\mathcal {O}}} \left( {{\hat{C}}}_0 \exp \left[ -\frac{\sqrt{\mu } N}{32 \sqrt{2} \chi \sqrt{L}}\right] + \frac{\chi ^2 \sigma ^2}{B \mu N} + \frac{\Delta ^2 N}{\sqrt{L} \mu ^{1/2}} \right) . \end{aligned}$$
We require the first summand to be at most \(\varepsilon /6\):
$$\begin{aligned} {{\hat{C}}}_0 \exp \left[ -\frac{\sqrt{\mu } N}{32 \sqrt{2} \chi \sqrt{L}}\right] \le \varepsilon / 6 \end{aligned}$$
which holds whenever
$$\begin{aligned} N \ge 32 \sqrt{2} \chi \sqrt{\frac{L}{\mu }} \log \left( \frac{6 {{\hat{C}}}_0}{\varepsilon }\right) \end{aligned}$$
By Lemma 2, \(L_{F_{\gamma }} = \frac{\sqrt{d} M_2}{\gamma }\) with \(\gamma = \frac{\varepsilon }{2 M_2}\), so \(L_{F_{\gamma }} = \frac{2\sqrt{d} M_2^2}{\varepsilon }\). Taking \(L = L_{F_{\gamma }}\) and ignoring the logarithmic factor, we obtain
$$\begin{aligned} N \gtrsim 32 \sqrt{2} \chi \frac{\sqrt{2} d^{1/4} M_2}{\sqrt{\varepsilon \mu }} \end{aligned}$$
Similarly, we require the second summand to be at most \(\varepsilon /6\):
$$\begin{aligned} \frac{\chi ^2 \sigma ^2}{B \mu N} \le \varepsilon / 6 \end{aligned}$$
With \(\sigma ^2 = {{\tilde{\sigma }}}^2\), this holds whenever
$$\begin{aligned} N \ge \frac{6 \chi ^2 {{\tilde{\sigma }}}^2}{\varepsilon B \mu } \end{aligned}$$
Combining the two requirements, we obtain
$$\begin{aligned} N = {\tilde{{{\mathcal{O}}}}}\left( \max \left\{ \frac{d^{1/4} M_2 \chi }{\sqrt{\varepsilon \mu }}; \frac{\chi ^2 {{\tilde{\sigma }}}^2}{\varepsilon B \mu }\right\} \right) \end{aligned}$$
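The two iteration-count requirements can be evaluated directly. The sketch below uses a hypothetical helper and arbitrary inputs; `C0_hat` stands for \({{\hat{C}}}_0\). It plugs \(\gamma = \varepsilon /(2M_2)\) into \(L_{F_{\gamma }} = \sqrt{d}M_2/\gamma \) and keeps the logarithmic factor that the \(\tilde{{\mathcal {O}}}\) bound hides.

```python
import math

def iteration_requirements(eps, M2, d, mu, chi, sigma_tilde2, B, C0_hat):
    """Return (N1, N2) from the first two summands; N must be at least max(N1, N2).

    All inputs are assumed positive, with 6 * C0_hat > eps so the log is positive.
    """
    gamma = eps / (2 * M2)                 # smoothing parameter
    L = math.sqrt(d) * M2 / gamma          # L_{F_gamma} = 2 sqrt(d) M2^2 / eps
    N1 = 32 * math.sqrt(2) * chi * math.sqrt(L / mu) * math.log(6 * C0_hat / eps)
    N2 = 6 * chi ** 2 * sigma_tilde2 / (eps * B * mu)
    return N1, N2

N1, N2 = iteration_requirements(1e-2, 3.0, 100, 0.5, 2.0, 4.0, 8, 10.0)
print(math.ceil(max(N1, N2)))
```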
Finally, we require the third summand to be at most \(\varepsilon /6\):
$$\begin{aligned} \frac{\Delta ^2 N}{\sqrt{L} \mu ^{1/2}} \le \varepsilon / 6 \end{aligned}$$
which gives
$$\begin{aligned} \Delta ^2 \le \frac{\sqrt{L} \mu ^{1/2} \varepsilon }{6N} = \frac{\sqrt{2} M_2 d^{1/4} \mu ^{1/2} \sqrt{\varepsilon }}{6N} \end{aligned}$$
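By Lemma 4, \(\Delta = \frac{d {\widetilde{\Delta }}}{\gamma }\), i.e. \({\widetilde{\Delta }} = \frac{\gamma \Delta }{d}\) with \(\gamma = \frac{\varepsilon }{2 M_2}\); hence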
$$\begin{aligned} {\widetilde{\Delta }}^2 \le \frac{\sqrt{2} \varepsilon ^{5/2} \mu ^{1/2}}{24 d^{7/4} M_2 N} \end{aligned}$$
Consequently,
$$\begin{aligned} {\widetilde{\Delta }}^2 = {\mathcal {O}}\left( \frac{\varepsilon ^{5/2} \mu ^{1/2}}{d^{7/4} M_2 N}\right) \end{aligned}$$
\(\square \)
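The constant in the bound on \({\widetilde{\Delta }}^2\) comes from chaining three steps: the requirement \(\Delta ^2 \le \sqrt{L}\mu ^{1/2}\varepsilon /(6N)\), \(L_{F_{\gamma }} = 2\sqrt{d}M_2^2/\varepsilon \), and \({\widetilde{\Delta }} = \gamma \Delta /d\) with \(\gamma = \varepsilon /(2M_2)\). Multiplying out gives \(\frac{\varepsilon ^2}{4M_2^2 d^2}\cdot \frac{\sqrt{2}\, d^{1/4} M_2 \mu ^{1/2}\varepsilon ^{1/2}}{6N} = \frac{\sqrt{2}\,\varepsilon ^{5/2}\mu ^{1/2}}{24\, d^{7/4} M_2 N}\). The sketch below, with a hypothetical helper and arbitrary parameter values, checks this arithmetic numerically.

```python
import math

def max_noise_level_sq(eps, M2, d, mu, N):
    """Largest Delta~^2 allowed by the third summand, computed step by step."""
    gamma = eps / (2 * M2)                                  # smoothing parameter
    L = 2 * math.sqrt(d) * M2 ** 2 / eps                    # L_{F_gamma}
    delta_sq = math.sqrt(L) * math.sqrt(mu) * eps / (6 * N)  # bound on Delta^2
    return (gamma / d) ** 2 * delta_sq                      # Delta~ = gamma * Delta / d

eps, M2, d, mu, N = 1e-2, 3.0, 100, 0.5, 10**4  # arbitrary illustrative values
closed_form = math.sqrt(2) * eps ** 2.5 * math.sqrt(mu) / (24 * d ** 1.75 * M2 * N)
assert math.isclose(max_noise_level_sq(eps, M2, d, mu, N), closed_form)
```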