Appendix
Lemma 5
Suppose that the positive sequence \(\{\alpha _k\}\) is nonincreasing and \(\lim _{k\rightarrow \infty }\frac{\alpha _k}{\alpha _{k+1}}=1\). Then for any \(0<\rho <1\), there exists a constant \(c\) such that
$$\begin{aligned} \sum _{t=1}^k\rho ^{k-t}\alpha _t\le c\alpha _k. \end{aligned}$$
Proof
Let \(\beta _k=\sum _{t=1}^{k}\rho ^{k-t}\alpha _t\); then \(\beta _k=\rho \sum _{t=1}^{k-1}\rho ^{k-1-t}\alpha _t+\alpha _{k}=\rho \beta _{k-1}+\alpha _{k}\). Denoting \(b_k=\beta _k/\alpha _{k}\), we obtain \(b_k=\rho \frac{\alpha _{k-1}}{\alpha _{k}}b_{k-1}+1\). Since \(\lim _{k\rightarrow \infty }\frac{\alpha _{k-1}}{\alpha _{k}}=1\) and \(\rho <1\), there exists an integer \(k_0>0\) such that \(\frac{\alpha _{k-1}}{\alpha _{k}}\le \frac{2}{\rho +1}\) for \(k>k_0\). Taking \(c=\max \left\{ \sup _{1\le k\le k_0}b_k,~\frac{\rho +1}{1-\rho }\right\} \), we have \(b_k\le c\) for \(k\le k_0\). We proceed by induction: suppose that \(b_{k-1}\le c\) for some \(k-1\ge k_0\). Since \(c\ge \frac{\rho +1}{1-\rho }\) implies \(1\le \frac{1-\rho }{\rho +1}c\), we obtain
$$\begin{aligned} b_k=\rho \frac{\alpha _{k-1}}{\alpha _{k}}b_{k-1}+1\le \frac{2\rho }{\rho +1} c+1\le \frac{2\rho }{\rho +1} c+\frac{1-\rho }{\rho +1}c=c. \end{aligned}$$
The proof is complete. \(\square \)
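As an informal numerical sanity check of Lemma 5 (not part of the proof), the following sketch evaluates \(b_k=\beta _k/\alpha _k\) through the recursion \(\beta _k=\rho \beta _{k-1}+\alpha _k\) and compares it with the constant \(c\) chosen above. The step size \(\alpha _k=1/k\), the value \(\rho =1/2\), the index \(k_0=3\), and the function names are illustrative assumptions and do not come from the paper.

```python
def weighted_ratios(alpha, rho, num_terms):
    """Return [b_1, ..., b_K] with b_k = (sum_{t<=k} rho^(k-t) alpha(t)) / alpha(k)."""
    ratios, beta = [], 0.0
    for k in range(1, num_terms + 1):
        beta = rho * beta + alpha(k)  # beta_k = rho * beta_{k-1} + alpha_k
        ratios.append(beta / alpha(k))
    return ratios


def lemma5_constant(ratios, rho, k0):
    """c = max{ max_{k <= k0} b_k, (rho + 1) / (1 - rho) }, as chosen in the proof."""
    return max(max(ratios[:k0]), (rho + 1.0) / (1.0 - rho))


# Assumption: alpha_k = 1/k, so alpha_{k-1}/alpha_k = k/(k-1) <= 2/(rho+1) once k > 3.
rho, k0 = 0.5, 3
ratios = weighted_ratios(lambda k: 1.0 / k, rho, 2000)
c = lemma5_constant(ratios, rho, k0)
print(c, max(ratios))
```

For this particular choice, every computed ratio stays below \(c\), which is what the induction step predicts.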
1.1 Proof of Lemma 1
Proof
Under Assumption 2, the conditions of [37, Lemma 3] hold and then there exists an invertible matrix \({\textbf{A}}_*\in {\mathbb {R}}^{n\times n}\) such that
$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\textbf{A}_*}={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \textbf{A}_*\left( {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\right) \textbf{A}_*^{-1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }<1, \end{aligned}$$
where \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\textbf{A}_*}\) is the matrix norm induced by the vector norm \(\Vert x\Vert _{\textbf{A}_*}:=\Vert {\textbf{A}}_*x\Vert \). Let \(\hat{{\textbf{A}}}={\textbf{A}}_*\otimes {\textbf{I}}_d\). Since \(\left( {\textbf{W}}_1\otimes {\textbf{W}}_2\right) ^{-1}={\textbf{W}}_1^{-1}\otimes {\textbf{W}}_2^{-1}\) for any invertible square matrices \({\textbf{W}}_1\) and \({\textbf{W}}_2\), we have \(\hat{{\textbf{A}}}^{-1}={\textbf{A}}_*^{-1}\otimes {\textbf{I}}_d\). Therefore, the vector norm \(\Vert {\textbf{x}}\Vert _{\hat{{\textbf{A}}}}:=\Vert \hat{{\textbf{A}}}{\textbf{x}}\Vert \) is well defined and the corresponding induced matrix norm \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{A}}}}\) satisfies
$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_d \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{A}}}}&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \hat{{\textbf{A}}}\left( \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \hat{{\textbf{A}}}^{-1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \left[ \textbf{A}_*\left( {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\right) \textbf{A}_*^{-1}\right] \otimes {\textbf{I}}_d \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \textbf{A}_*\left( {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\right) \textbf{A}_*^{-1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }<1. \end{aligned}$$
By a similar analysis, there exists \(\hat{{\textbf{B}}}\) such that
$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \hat{{\textbf{B}}}\left( \tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \hat{{\textbf{B}}}^{-1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }<1. \end{aligned}$$
Inequality (12) follows from the equivalence of all norms on \({\mathbb {R}}^d\). The proof is complete. \(\square \)
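The Kronecker-product identity \(\left( {\textbf{W}}_1\otimes {\textbf{W}}_2\right) ^{-1}={\textbf{W}}_1^{-1}\otimes {\textbf{W}}_2^{-1}\) used in this proof can be illustrated on a small example. The sketch below uses exact rational arithmetic; the \(2\times 2\) matrix standing in for \({\textbf{A}}_*\), the choice \(d=3\), and all helper names are hypothetical.

```python
from fractions import Fraction


def kron(a, b):
    """Kronecker product of matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Plain matrix product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(m):
    """m x m identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]


def inverse_2x2(m):
    """Exact inverse of an invertible 2x2 matrix."""
    (p, q), (r, s) = m
    det = Fraction(p * s - q * r)
    return [[s / det, -q / det], [-r / det, p / det]]


# Hypothetical stand-in for A_* (n = 2) combined with I_d for d = 3.
a_star = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
a_hat = kron(a_star, identity(3))
a_hat_inv = kron(inverse_2x2(a_star), identity(3))
print(matmul(a_hat, a_hat_inv) == identity(6))
```

The product of \({\textbf{A}}_*\otimes {\textbf{I}}_d\) and \({\textbf{A}}_*^{-1}\otimes {\textbf{I}}_d\) is the identity here, consistent with the expression for \(\hat{{\textbf{A}}}^{-1}\).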
1.2 Proof of Lemma 2
Proof
We first provide an upper bound on the consensus error \({\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\) in the mean square sense. Note that for any random vectors \(\theta \), \(\theta ^{'}\) and any positive scalar \(\tau \),
$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \left\| \theta +\theta ^{'}\right\| _*^2\right] \le (1+\tau ){\mathbb {E}}\left[ \Vert \theta \Vert _*^2\right] +\left( 1+\frac{1}{\tau }\right) {\mathbb {E}}\left[ \left\| \theta ^{'}\right\| _*^2\right] , \end{aligned} \end{aligned}$$
(29)
where the norm \(\Vert \cdot \Vert _*\) may be \(\Vert \cdot \Vert _{\hat{{\textbf{A}}}}\) or \(\Vert \cdot \Vert _{\hat{{\textbf{B}}}}\). Choosing
$$\begin{aligned} \theta =\left( \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right) ,\quad \theta ^{'}=-\alpha _k\left( \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k, \end{aligned}$$
we have \({\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}=\theta +\theta ^{'}\) and
$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\Vert _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\le (1+\tau ){\mathbb {E}}\left[ \left\| \left( \tilde{{\textbf{A}}} -\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right) \right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\quad +\left( 1+\frac{1}{\tau }\right) {\mathbb {E}}\left[ \left\| \alpha _{k}\left( \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\le \frac{1+\tau _{\textbf{A}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right\| _{\hat{{\textbf{A}}}}^2\right] +\alpha _{k}^2\frac{1+\tau _{\textbf{A}}^2}{1-\tau _{\textbf{A}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{A}}}}^2\overline{c}^2{\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right] , \end{aligned}$$
(30)
where \(\tau _{{\textbf{A}}}\) is defined in (17), \(\tau =(1-\tau _{{\textbf{A}}}^2)/(2\tau _{{\textbf{A}}}^2)\), and the last inequality follows from (12). By the definition of \({\textbf{y}}_k\) in (10),
$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right]&={\mathbb {E}}\left[ \left\| \sum _{t=1}^{k-1}\tilde{{\textbf{B}}}^{k-1-t}(\tilde{{\textbf{B}}}-{\textbf{I}}_{nd}){\textbf{H}}_t+{\textbf{H}}_{k}\right\| ^2\right] \\&\le \sum _{t_1=1}^{k}\sum _{t_2=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_2) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \Vert {\textbf{H}}_{t_1}\Vert \Vert {\textbf{H}}_{t_2}\Vert \right] , \end{aligned} \end{aligned}$$
where
$$\begin{aligned} \begin{aligned}&\tilde{{\textbf{B}}}(k,t):=\tilde{{\textbf{B}}}^{k-1-t}(\tilde{{\textbf{B}}}-{\textbf{I}}_{nd})~(t\le k-1),\quad \tilde{{\textbf{B}}}(k,k):={\textbf{I}}_{nd}. \end{aligned} \end{aligned}$$
(31)
Obviously, \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,k) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }=1\), \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,k-1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\le \overline{c}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}\) and for \(t<k-1\),
$$\begin{aligned} \begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\le \overline{c}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}^{k-1-t}\left( \tilde{{\textbf{B}}}-{\textbf{I}}_{nd}\right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}&=\overline{c}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \left( \tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \tilde{{\textbf{B}}}^{k-2-t}\left( \tilde{{\textbf{B}}}-{\textbf{I}}_{nd}\right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}\\&\le \overline{c}\tau _{{\textbf{B}}} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}^{k-2-t}\left( \tilde{{\textbf{B}}}-{\textbf{I}}_{nd}\right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}\\&\le \cdots \le \overline{c}\tau _{{\textbf{B}}}^{k-1-t} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}. \end{aligned} \end{aligned}$$
Denoting \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \), we have
$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\le c_b\tau _{{\textbf{B}}}^{k-t} \end{aligned}$$
(32)
and
$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right]&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}{\mathbb {E}}\left[ \Vert {\textbf{H}}_{t_1}\Vert \Vert {\textbf{H}}_{t_2}\Vert \right] \nonumber \\&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}\frac{{\mathbb {E}}\left[ \Vert {\textbf{H}}_{t_1}\Vert ^2\right] +{\mathbb {E}}\left[ \Vert {\textbf{H}}_{t_2}\Vert ^2\right] }{2}\nonumber \\&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2} nC_gC_f\le \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}, \end{aligned}$$
(33)
where the third inequality follows from Assumption 1 (c). Substituting (33) into (30) yields
$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\Vert _{\hat{{\textbf{A}}}}^2\right] \le \frac{1+\tau _{\textbf{A}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right\| _{\hat{{\textbf{A}}}}^2\right] +c_1\alpha _{k}^2, \end{aligned}$$
(34)
where \(c_1=\frac{1+\tau _{\textbf{A}}^2}{1-\tau _{\textbf{A}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{A}}}}^2\overline{c}^2\frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}\).
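The two coefficients in (30) arise from substituting \(\tau =(1-\tau _{{\textbf{A}}}^2)/(2\tau _{{\textbf{A}}}^2)\) into (29). The sketch below checks this arithmetic exactly, under the assumption that \(\tau _{{\textbf{A}}}\) from (17) bounds the induced norm of \(\tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_d\); the function name and the sample value \(\tau _{{\textbf{A}}}^2=1/4\) are illustrative and not taken from the paper.

```python
from fractions import Fraction


def young_coefficients(tau_sq):
    """For tau_A^2 in (0, 1), return ((1 + tau) * tau_A^2, 1 + 1/tau)
    with tau = (1 - tau_A^2) / (2 * tau_A^2)."""
    tau = (1 - tau_sq) / (2 * tau_sq)
    return (1 + tau) * tau_sq, 1 + 1 / tau


tau_sq = Fraction(1, 4)  # illustrative value of tau_A^2
contraction, amplification = young_coefficients(tau_sq)
# Compare with the closed forms (1 + tau_A^2)/2 and (1 + tau_A^2)/(1 - tau_A^2).
print(contraction == (1 + tau_sq) / 2, amplification == (1 + tau_sq) / (1 - tau_sq))
```

The same computation with \(\tau _{{\textbf{B}}}\) in place of \(\tau _{{\textbf{A}}}\) gives the analogous factors for the \(\hat{{\textbf{B}}}\)-norm recursion.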
Next, we estimate an upper bound on the consensus error \(\Vert {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\Vert ^2\) in the mean sense. Set
$$\begin{aligned} \theta =\left( \tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right) ,\quad \theta ^{'}=\left( {\textbf{I}}_{nd}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{J}}_{k+1}-{\textbf{J}}_k\right) \end{aligned}$$
in (29). By the definitions of \({\textbf{y}}_{k+1}^{'}\) and \({\bar{y}}_{k+1}^{'}\), we have \({\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}=\theta +\theta ^{'}\) and
$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right] \nonumber \\&\le (1+\tau ){\mathbb {E}}\left[ \!\left\| \left( \!\tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d}\!\right) \left( \!{\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\!\right) \right\| _{\hat{{\textbf{B}}}}^2\!\right] +\left( 1+\frac{1}{\tau }\right) {\mathbb {E}}\left[ \!\left\| \left( {\textbf{I}}_{nd}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{J}}_{k+1}-{\textbf{J}}_k\right) \right\| _{\hat{{\textbf{B}}}}^2\!\right] \nonumber \\&\le \frac{1+\tau _{\textbf{B}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] +2\frac{1+\tau _{\textbf{B}}^2}{1-\tau _{\textbf{B}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{I}}_{nd}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}^2\overline{c}^2{\mathbb {E}}\left[ \left\| {\textbf{J}}_{k+1}-{\textbf{J}}_k\right\| ^2\right] , \end{aligned}$$
(35)
where \(\tau _{{\textbf{B}}}\) is defined in (17), and the second inequality follows from the choice \(\tau =(1-\tau _{{\textbf{B}}}^2)/(2\tau _{{\textbf{B}}}^2)\) and (12). For the term \({\mathbb {E}}\left[ \left\| {\textbf{J}}_{k+1}-{\textbf{J}}_k\right\| ^2\right] \),
$$\begin{aligned}&{\mathbb {E}}\left[ \left\| {\textbf{J}}_{k+1}-{\textbf{J}}_k\right\| ^2\right] =\sum _{j=1}^n {\mathbb {E}}\left[ \left\| \nabla g_j(x_{j,k+1})\nabla f_j(z_{j,k+1})-\nabla g_j(x_{j,k})\nabla f_j(z_{j,k})\right\| ^2\right] \nonumber \\&\le 2\sum _{j=1}^n\ {\mathbb {E}}\left[ \left\| \left( \nabla g_j(x_{j,k+1})-\nabla g_j(x_{j,k})\right) \nabla f_j(z_{j,k+1})\right\| ^2\right. \nonumber \\&\qquad \left. +\left\| \nabla g_j(x_{j,k})\left( \nabla f_j(z_{j,k})-\nabla f_j(z_{j,k+1})\right) \right\| ^2\right] \nonumber \\&\le 2C_fL_g^2{\mathbb {E}}\left[ \left\| {\textbf{x}}_{k+1}-{\textbf{x}}_k\right\| ^2\right] +2C_gL_f^2{\mathbb {E}}\left[ \left\| {\textbf{z}}_{k+1}-{\textbf{z}}_k\right\| ^2\right] \nonumber \\&\le \left( 2C_fL_g^2+8C_g^2L_f^2\right) {\mathbb {E}}\left[ \left\| {\textbf{x}}_{k+1}-{\textbf{x}}_k\right\| ^2\right] +8\beta _k^2C_gL_f^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +8\beta _k^2C_gL_f^2V_g\nonumber \\&=\left( 2C_fL_g^2+8C_g^2L_f^2\right) {\mathbb {E}}\left[ \left\| \left( \tilde{{\textbf{A}}}-{\textbf{I}}_{nd}\right) \left( {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right) -\alpha _k\tilde{{\textbf{A}}}{\textbf{y}}_k\right\| ^2\right] \nonumber \\&\qquad +8\beta _k^2C_gL_f^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +8\beta _k^2C_gL_f^2V_g\nonumber \\&\le \left( 4C_fL_g^2+16C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2 \overline{c}^2{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\qquad +\left( 4C_fL_g^2+16C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2\alpha _k^2{\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right] \nonumber 
\\&\qquad +8\beta _k^2C_gL_f^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +8\beta _k^2C_gL_f^2V_g, \end{aligned}$$
(36)
where \({\textbf{g}}_k=\left[ g_1(x_{1,k})^\intercal ,\cdots , g_n(x_{n,k})^\intercal \right] ^\intercal \), \(V_g\) is defined in Assumption 1 (d), the second inequality follows from Assumption 1 (a) and (c), the third inequality follows from the definition of \({\textbf{z}}_k\) and Assumption 1 (c) and (d), and the second equality follows from the fact that \(\left( \tilde{{\textbf{A}}}-{\textbf{I}}_{nd}\right) ({\textbf{1}}\otimes {\bar{x}}_k)={\textbf{0}}\).
Substituting (33) and (36) into (35) gives
$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right]&\le \frac{1+\tau _{\textbf{B}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] +c_2{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\quad +c_3\alpha _k^2+c_4\beta _k^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +c_4V_g\beta _k^2, \end{aligned}$$
(37)
where the constants
$$\begin{aligned} \begin{aligned}&c_2=8\frac{1+\tau _{{\textbf {B}}}^2}{1-\tau _{{\textbf {B}}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {{\textbf {I}}}_{nd}- \frac{{{\textbf {v}}}{{\textbf {1}}}^\intercal }{n}\otimes {{\textbf {I}}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{{\textbf {B}}}}}^2\overline{c}^4\left( C_fL_g^2+4C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{{\textbf {A}}}}-{{\textbf {I}}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2,\\ {}&c_3=8\frac{1+\tau _{{\textbf {B}}}^2}{1- \tau _{{\textbf {B}}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {{\textbf {I}}}_{nd}- \frac{{{\textbf {v}}}{{\textbf {1}}}^\intercal }{n}\otimes {{\textbf {I}}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{{\textbf {B}}}}}^2\overline{c}^2\left( C_fL_g^2+4C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {{\textbf {A}}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2 \frac{c_b^2nC_gC_f}{(1- \tau _{{{\textbf {B}}}})^2},\\ {}&c_4=16\frac{1+\tau _{{\textbf {B}}}^2}{1-\tau _{{\textbf {B}}}^2}{\left| \hspace{- 1.0625pt}\left| \hspace{-1.0625pt}\left| {{\textbf {I}}}_{nd}-\frac{{{\textbf {v}}}{{\textbf {1}}}^\intercal }{n}\otimes {{\textbf {I}}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{{\textbf {B}}}}}^2\overline{c}^2C_gL_f^2. \end{aligned} \end{aligned}$$
Lastly, we show (14) by combining (34) with (37). Multiplying both sides of inequality (37) by \(c_5=\frac{1-\tau _{\textbf{A}}^2}{4c_2}\) gives
$$\begin{aligned} \begin{aligned} c_5{\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right]&\le \frac{1+\tau _{\textbf{B}}^2}{2}c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] +\frac{1-\tau _{\textbf{A}}^2}{4}{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] +c_3c_5\alpha _k^2\\&\quad +c_5c_4\beta _k^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +c_5c_4\beta _k^2V_g. \end{aligned} \end{aligned}$$
Adding the above inequality to (34), we have
$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\Vert _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right] \\&\le \frac{3+\tau _{\textbf{A}}^2}{4}{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] +\frac{1+\tau _{\textbf{B}}^2}{2}c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] +(c_1+c_3c_5)\alpha _k^2\\&\quad +c_5c_4\beta _k^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +c_5c_4V_g\beta _k^2\\&\le \rho ^{k} \left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) +(c_1+c_3c_5)\sum _{t=1}^k\rho ^{k-t}\alpha _t^2\\&\quad +c_5c_4\sum _{t=1}^k\rho ^{k-t}\beta _t^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{t}-{\textbf{z}}_t\right\| ^2\right] +c_5c_4V_g\sum _{t=1}^k\rho ^{k-t}\beta _t^2, \end{aligned} \end{aligned}$$
where \(\rho =\max \left\{ \frac{1+\tau _{\textbf{B}}^2}{2},~\frac{3+\tau _{\textbf{A}}^2}{4}\right\} \). Moreover, by (34) and Lemma 5,
$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right]&\le \frac{1+\tau _{\textbf{A}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{x}}_{k-1}-{\textbf{1}}\otimes {\bar{x}}_{k-1}\right\| _{\hat{{\textbf{A}}}}^2\right] +c_1\alpha _{k-1}^2\\&\cdots \\&\le \left( \frac{1+\tau _{\textbf{A}}^2}{2}\right) ^{k-1}{\mathbb {E}}\left[ \left\| {\textbf{x}}_{1}-{\textbf{1}}\otimes {\bar{x}}_{1}\right\| _{\hat{{\textbf{A}}}}^2\right] +c_1\sum _{t=1}^{k-1}\left( \frac{1+\tau _{\textbf{A}}^2}{2}\right) ^{k-1-t}\alpha _{t}^2\\&\le \left( \frac{1+\tau _{\textbf{A}}^2}{2}\right) ^{k-1}{\mathbb {E}}\left[ \left\| {\textbf{x}}_{1}-{\textbf{1}}\otimes {\bar{x}}_{1}\right\| _{\hat{{\textbf{A}}}}^2\right] +c_1c_{\tau }\alpha _{k-1}^2, \end{aligned}$$
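where \(c_{\tau }>0\) is some constant. Since \(\lim _{k\rightarrow \infty }\frac{\alpha _{k}}{\alpha _{k+1}}=1\), the ratio \(\alpha _{k-1}^2/\alpha _{k}^2\) is bounded and the geometric factor \(\left( \frac{1+\tau _{\textbf{A}}^2}{2}\right) ^{k-1}\) is \(O(\alpha _k^2)\). Hence there exists a positive constant \(U_1\) such that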
$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \le U_1\alpha _{k}^2. \end{aligned}$$
(38)
The proof is complete. \(\square \)
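The passage from the recursion (34) to the bound (38) can be illustrated numerically. The sketch below iterates the worst case of (34), with equality in place of the inequality, and reports the scaled error \(e_k/\alpha _k^2\). The values \(q=(1+\tau _{\textbf{A}}^2)/2=0.75\), \(c_1=1\), \(\alpha _k=1/k\), the initial error \(e_1=1\), and the function name are assumptions made only for this example.

```python
def scaled_errors(q, c1, alpha, e1, num_steps):
    """Iterate e_{k+1} = q * e_k + c1 * alpha(k)^2 and
    return [e_k / alpha(k)^2 for k = 1..num_steps]."""
    scaled, e = [], e1
    for k in range(1, num_steps + 1):
        scaled.append(e / alpha(k) ** 2)
        e = q * e + c1 * alpha(k) ** 2  # worst case of (34): equality instead of <=
    return scaled


# Assumed values: q = 0.75 (tau_A^2 = 0.5), c1 = 1, alpha_k = 1/k, e_1 = 1.
scaled = scaled_errors(q=0.75, c1=1.0, alpha=lambda k: 1.0 / k, e1=1.0, num_steps=5000)
print(max(scaled), scaled[-1])
```

In this example the scaled error first rises, then settles near \(c_1/(1-q)\), so it remains bounded; the supremum plays the role of \(U_1\) in (38).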
1.3 Proof of Lemma 3
Proof
By the definitions of \({\textbf{z}}_{k+1}\) and \({\textbf{g}}_{k+1}\),
$$\begin{aligned} {\textbf{z}}_{k+1}- {\textbf{g}}_{k+1} =(1-\beta _k)\left( {\textbf{z}}_k-{\textbf{g}}_k\right) +({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)}). \nonumber \\ \end{aligned}$$
(39)
Then
$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{z}}_{k+1}- {\textbf{g}}_{k+1}\Vert ^2\right] \nonumber \\&= (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +{\mathbb {E}}\left[ \Vert ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\Vert ^2\right] \nonumber \\&\quad + 2{\mathbb {E}}\left[ \left\langle (1-\beta _k)({\textbf{z}}_k-{\textbf{g}}_k),({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\right\rangle \right] \nonumber \\&= (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +{\mathbb {E}}\left[ \Vert ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\Vert ^2\right] , \end{aligned}$$
(40)
where the second equality follows from the fact
$$\begin{aligned}{} & {} {\mathbb {E}}\left[ ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\right] \\{} & {} \qquad ={\mathbb {E}}\left[ {\mathbb {E}}\left[ ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\bigg |{\mathcal {F}}_k^{'}\right] \right] ={\textbf{0}} \end{aligned}$$
with
$$\begin{aligned} \begin{aligned}&{\mathcal {F}}_1^{'}=\sigma \left( x_{i,1}, z_{i,1}, \phi _{i,1},\zeta _{i,1}:i\in {\mathcal {V}}\right) ,\\&{\mathcal {F}}_k^{'}=\sigma \left( \{x_{i,1},z_{i,1}, \phi _{i,t},\zeta _{i,t}:i\in {\mathcal {V}}, 1\le t\le k\}\cup \{\phi _{i,t}^{'}:i\in {\mathcal {V}}, 2\le t\le k\}\right) (k\ge 2). \end{aligned}\nonumber \\ \end{aligned}$$
(41)
For the second term on the right-hand side of (40),
$$\begin{aligned}&{\mathbb {E}}\left[ \Vert ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\Vert ^2\right] \\&={\mathbb {E}}\left[ \Vert (1-\beta _k)({\textbf{G}}_{k+1}^{(1)}-{\textbf{G}}_{k+1}^{(2)})+\beta _k( {\textbf{G}}_{k+1}^{(1)}- {\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{g}}_{k+1}) \Vert ^2\right] \\&\le 3(1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{G}}_{k+1}^{(1)}-{\textbf{G}}_{k+1}^{(2)}\Vert ^2\right] +3\beta _k^2{\mathbb {E}}\left[ \Vert {\textbf{G}}_{k+1}^{(1)}- {\textbf{g}}_{k+1}\Vert ^2\right] \\&\quad +3(1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{g}}_{k+1} \Vert ^2\right] \\&\le 6(1-\beta _k)^2C_g{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{x}}_k\Vert ^2\right] +3V_g\beta _k^2, \end{aligned}$$
where the second inequality follows from conditions (c) and (d) in Assumption 1. Substituting the above inequality into (40) gives
$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{z}}_{k+1}- {\textbf{g}}_{k+1}\Vert ^2\right] \nonumber \\&\le (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +6(1-\beta _k)^2C_g{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{x}}_k\Vert ^2\right] +3V_g\beta _k^2\nonumber \\&=(1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +6(1-\beta _k)^2C_g{\mathbb {E}}\nonumber \\&\quad \left[ \left\| \left( \tilde{{\textbf{A}}}-{\textbf{I}}_{nd}\right) \left( {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right) -\alpha _k\tilde{{\textbf{A}}}{\textbf{y}}_k\right\| ^2\right] +3V_g\beta _k^2\nonumber \\&\le (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +12(1-\beta _k)^2C_g\overline{c}^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\quad +12(1-\beta _k)^2C_g\alpha _k^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2{\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right] +3V_g\beta _k^2\nonumber \\&\le (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +12C_g\overline{c}^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\quad +\frac{12c_b^2nC_g^2C_f{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2}{(1-\tau _{{\textbf{B}}})^2}\alpha _k^2 +3V_g\beta _k^2, \end{aligned}$$
(42)
where \(\overline{c}\) is defined in (12), the equality follows from the fact that \(\left( \tilde{{\textbf{A}}}-{\textbf{I}}_{nd}\right) ({\textbf{1}}\otimes {\bar{x}}_k)={\textbf{0}}\) by the row stochasticity of \({\textbf{A}}\), and the last inequality follows from (33). Substituting (38) into (42), we have
$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\textbf{z}}_{k+1}- {\textbf{g}}_{k+1}\Vert ^2\right]&\le (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] \\&\quad +\left( 12C_g\overline{c}^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2U_1+\frac{12c_b^2nC_g^2C_f{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2}{(1-\tau _{{\textbf{B}}})^2}\right) \alpha _k^2+3V_g\beta _k^2. \end{aligned}$$
The proof is complete. \(\square \)
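The regrouping of the noise term into three summands, used above before applying \(\Vert a+b+c\Vert ^2\le 3(\Vert a\Vert ^2+\Vert b\Vert ^2+\Vert c\Vert ^2)\), is a purely algebraic identity. The sketch below checks it for scalars with exact arithmetic; the function names and sample values are arbitrary illustrations.

```python
from fractions import Fraction


def noise_term(G1, G2, g_next, g_curr, beta):
    """(G^(1)_{k+1} - g_{k+1}) + (1 - beta_k) * (g_k - G^(2)_{k+1}), scalar case."""
    return (G1 - g_next) + (1 - beta) * (g_curr - G2)


def three_part_split(G1, G2, g_next, g_curr, beta):
    """The three summands (1-beta)(G1-G2), beta(G1-g_{k+1}), (1-beta)(g_k-g_{k+1})."""
    return ((1 - beta) * (G1 - G2), beta * (G1 - g_next), (1 - beta) * (g_curr - g_next))


# Arbitrary illustrative scalars standing in for G1, G2, g_{k+1}, g_k and beta_k.
args = (Fraction(7, 3), Fraction(-2, 5), Fraction(1, 2), Fraction(4), Fraction(1, 10))
a, b, c = three_part_split(*args)
print(noise_term(*args) == a + b + c, (a + b + c) ** 2 <= 3 * (a * a + b * b + c * c))
```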
1.4 Proof of Lemma 4
Proof
We first show part (i). By the definition of \(\xi _k\),
$$\begin{aligned} \xi _k=\sum _{t=1}^{k-1}\tilde{{\textbf{B}}}^{k-1-t}(\tilde{{\textbf{B}}}-{\textbf{I}}_{nd})\epsilon _t+\epsilon _{k}=\sum _{t=1}^{k}\tilde{{\textbf{B}}}(k,t)\epsilon _t, \end{aligned}$$
(43)
where \(\epsilon _t:={\textbf{H}}_t-{\textbf{J}}_t\), \({\textbf{H}}_t\) and \({\textbf{J}}_t\) are given in (10) and Lemma 2, respectively, and \(\tilde{{\textbf{B}}}(k,t)\) is defined in (31). Then we have
$$\begin{aligned} {\mathbb {E}}[\Vert \xi _k\Vert ^2]&\le \sum _{t_1=1}^{k}\sum _{t_2=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_2) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \Vert \epsilon _{t_1}\Vert \Vert \epsilon _{t_2}\Vert \right] \nonumber \\&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}{\mathbb {E}}\left[ \Vert \epsilon _{t_1}\Vert \Vert \epsilon _{t_2}\Vert \right] \nonumber \\&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}\frac{{\mathbb {E}}\left[ \Vert \epsilon _{t_1}\Vert ^2+\Vert \epsilon _{t_2}\Vert ^2\right] }{2}, \end{aligned}$$
(44)
where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \) and the second inequality follows from (32). By the definition of \(\epsilon _k\),
$$\begin{aligned} {\mathbb {E}}\left[ \Vert \epsilon _{k}\Vert ^2\right]&=\sum _{j=1}^n{\mathbb {E}}\left[ \Vert \nabla G_j(x_{j,k};\phi _{j,k})\nabla F_j(z_{j,k};\zeta _{j,k})-\nabla g_j(x_{j,k})\nabla f_j(z_{j,k})\Vert ^2\right] \nonumber \\&\le 2\sum _{j=1}^n\left( {\mathbb {E}}\left[ \Vert \nabla G_j(x_{j,k};\phi _{j,k})\Vert ^2\Vert \nabla F_j(z_{j,k};\zeta _{j,k})\Vert ^2\right] +C_fC_g\right) \nonumber \\&= 2\sum _{j=1}^n\left( {\mathbb {E}}\left[ {\mathbb {E}}\left[ \Vert \nabla G_j(x_{j,k};\phi _{j,k})\Vert ^2\Vert \nabla F_j(z_{j,k};\zeta _{j,k})\Vert ^2\big |{\mathcal {F}}_k,\zeta _{j,k}\right] \right] +C_fC_g\right) \nonumber \\&\le 2\sum _{j=1}^n\left( C_g{\mathbb {E}}\left[ \Vert \nabla F_j(z_{j,k};\zeta _{j,k})\Vert ^2\right] +C_fC_g\right) \le 4n C_fC_g, \end{aligned}$$
(45)
where
$$\begin{aligned} \begin{aligned}&{\mathcal {F}}_1=\sigma \{x_{i,1}, z_{i,1}:i\in {\mathcal {V}}\},\\&{\mathcal {F}}_k=\sigma \left( \{x_{i,1},z_{i,1}, \phi _{i,t},\zeta _{i,t}:i\in {\mathcal {V}}, 1\le t\le k-1\}\cup \{\phi _{i,t}^{'}:i\in {\mathcal {V}}, 2\le t\le k\}\right) (k\ge 2). \end{aligned}\nonumber \\ \end{aligned}$$
(46)
Substituting (45) into (44) yields \({\mathbb {E}}[\Vert \xi _k\Vert ^2] \le 4c_b^2n C_fC_g\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}\le \frac{4c_b^2n C_fC_g}{(1-\tau _{{\textbf{B}}})^2}\), which proves part (i).
Next, we show part (ii). By (43),
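The bound on the double geometric sum, used here and in (33), follows because the sum factors into the square of a single geometric sum. A minimal sketch with exact arithmetic is given below; the values \(\tau _{{\textbf{B}}}=1/2\) and \(k=10\) and the function name are illustrative assumptions.

```python
from fractions import Fraction


def double_geometric_sum(tau, k):
    """Sum over 1 <= t1, t2 <= k of tau^(2k - t1 - t2)."""
    return sum(tau ** (2 * k - t1 - t2)
               for t1 in range(1, k + 1) for t2 in range(1, k + 1))


tau, k = Fraction(1, 2), 10  # illustrative values with 0 < tau < 1
single = sum(tau ** j for j in range(k))  # sum_{j=0}^{k-1} tau^j
total = double_geometric_sum(tau, k)
# The double sum equals the squared single sum and is at most 1/(1 - tau)^2.
print(total == single ** 2, total <= 1 / (1 - tau) ** 2)
```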
$$\begin{aligned}&\left| {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \right| \\&=\left| \sum _{t=1}^{k}{\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \right] \right| \\&=\left| \sum _{t=1}^{k-1}{\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\langle \sum _{l=t+1}^{k}\left( \nabla h({\bar{x}}_l)-\nabla h({\bar{x}}_{l-1})\right) +\nabla h({\bar{x}}_t),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \bigg |{\mathcal {F}}_t\right] \right] \right. \\&\quad \left. +{\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,k)\epsilon _k\right\rangle \bigg |{\mathcal {F}}_k\right] \right] \right| \\&=\left| \sum _{t=1}^{k-1}{\mathbb {E}}\left[ \left\langle \sum _{l=t+1}^{k}\left( \nabla h({\bar{x}}_l)-\nabla h({\bar{x}}_{l-1})\right) ,\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \right] \right| \\&\le \frac{\Vert {\textbf{u}}\Vert Lc_b}{n}\sum _{t=1}^{k-1}\tau _{{\textbf{B}}}^{k-t}\sum _{l=t+1}^{k}{\mathbb {E}}\left[ \left\| {\bar{x}}_l-{\bar{x}}_{l-1}\right\| \left\| \epsilon _t\right\| \right] \\&\le \frac{\Vert {\textbf{u}}\Vert ^2Lc_b}{n^2}\sum _{t=1}^{k-1}\tau _{{\textbf{B}}}^{k-t}\sum _{l=t+1}^{k}\alpha _l{\mathbb {E}}\left[ \left\| {\textbf{y}}_l\right\| \left\| \epsilon _t\right\| \right] , \end{aligned}$$
where the third equality holds as \(\{\epsilon _t\}\) is a martingale difference sequence, the first inequality follows from (32) and the last inequality follows from the fact \({\bar{x}}_{k+1}={\bar{x}}_{k}-\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\). By (33) and (45),
$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{y}}_l\right\| \left\| \epsilon _t\right\| \right] \le \frac{{\mathbb {E}}\left[ \left\| {\textbf{y}}_l\right\| ^2\right] +{\mathbb {E}}\left[ \left\| \epsilon _t\right\| ^2\right] }{2}\le \frac{c_b^2n\ C_fC_g}{2(1-\tau _{{\textbf{B}}})^2}+2nC_fC_g. \end{aligned}$$
(47)
Let \(U=\frac{\Vert {\textbf{u}}\Vert ^2Lc_b}{n}\left( \frac{c_b^2\ C_fC_g}{2(1-\tau _{{\textbf{B}}})^2}+2C_fC_g\right) \),
$$\begin{aligned} {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right]{} & {} \le (1-\tau _{{\textbf{B}}})U \sum _{t=1}^{k-1}\tau _{{\textbf{B}}}^{k-t}\sum _{l=t+1}^{k}\alpha _l\\{} & {} =(1-\tau _{{\textbf{B}}})U\sum _{t=2}^{k}\alpha _t\tau _{{\textbf{B}}}^{k-t}\left( \sum _{l=1}^{t-1}\tau _{{\textbf{B}}}^l\right) \le U c \alpha _k, \end{aligned}$$
where the last inequality follows from the fact \((1-\tau _{{\textbf{B}}})\left( \sum _{l=1}^{t-1}\tau _{{\textbf{B}}}^l\right) \le 1\) and Lemma 5. Part (ii) holds. The proof is complete. \(\square \)
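As a remark on the last display of the proof above, the interchange of the two sums can be written out as follows. This is only a sketch: the summation index \(j\) is a label introduced here for illustration, and \(c\) denotes the constant from Lemma 5 applied with \(\rho =\tau _{{\textbf{B}}}\).
$$\begin{aligned} \sum _{t=1}^{k-1}\tau _{{\textbf{B}}}^{k-t}\sum _{l=t+1}^{k}\alpha _l&=\sum _{l=2}^{k}\alpha _l\sum _{t=1}^{l-1}\tau _{{\textbf{B}}}^{k-t}=\sum _{l=2}^{k}\alpha _l\tau _{{\textbf{B}}}^{k-l}\sum _{j=1}^{l-1}\tau _{{\textbf{B}}}^{j},\\ (1-\tau _{{\textbf{B}}})\sum _{j=1}^{l-1}\tau _{{\textbf{B}}}^{j}&=\tau _{{\textbf{B}}}-\tau _{{\textbf{B}}}^{l}\le 1. \end{aligned}$$
Combining the two relations leaves \(\sum _{l=2}^{k}\tau _{{\textbf{B}}}^{k-l}\alpha _l\le c\alpha _k\), which is the form covered by Lemma 5.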
1.5 Proof of Theorem 1
Proof
We first estimate the upper bound of \(\nabla h({\bar{x}}_k)\) in expectation. Noting that h(x) is \(L\left( = C_gL_f + C_f^{1/2}L_g\right) \)-smooth [46],
$$\begin{aligned} h({\bar{x}}_{k+1})&\le h({\bar{x}}_k)+\langle \nabla h({\bar{x}}_k),{\bar{x}}_{k+1}-{\bar{x}}_k\rangle +\frac{L}{2}\Vert {\bar{x}}_{k+1}-{\bar{x}}_k\Vert ^2\\&=h({\bar{x}}_k)-\left\langle \nabla h({\bar{x}}_k),\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \right\rangle +\frac{L}{2}\left\| \alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| ^2\\&=h({\bar{x}}_k)-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\Vert \nabla h({\bar{x}}_k)\Vert ^2+\frac{L}{2}\left\| \alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| ^2\\&\quad +\left\langle \nabla h({\bar{x}}_k),\alpha _{k}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\nabla h({\bar{x}}_k)-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \right) \right\rangle , \end{aligned}$$
where the first equality follows from the fact that
$$\begin{aligned} {\bar{x}}_{k+1}={\bar{x}}_{k}-\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k={\bar{x}}_{k}-\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) . \end{aligned}$$
Taking expectations on both sides of the above inequality,
$$\begin{aligned} {\mathbb {E}}\left[ h({\bar{x}}_{k+1})\right]&\le {\mathbb {E}}\left[ h({\bar{x}}_k)\right] -\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{L}{2}{\mathbb {E}}\left[ \left\| \alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| ^2\right] \nonumber \\&\quad +{\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\alpha _{k}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\nabla h({\bar{x}}_k)-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \right) \right\rangle \right] . \end{aligned}$$
(48)
For the third term on the right-hand side of (48),
$$\begin{aligned} \frac{L}{2}{\mathbb {E}}\left[ \left\| \alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| ^2\right] \le \frac{L\alpha _{k}^2\Vert {\textbf{u}}\Vert ^2}{2n^2}{\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right] \le \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}\alpha _{k}^2, \end{aligned}$$
(49)
where the second inequality follows from (33).
For the fourth term on the right-hand side of (48),
$$\begin{aligned}&{\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\alpha _{k}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\nabla h({\bar{x}}_k)-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \right) \right\rangle \right] \nonumber \\&\le \frac{\alpha _k^2}{2\tau }{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{3\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\right) ^2{\mathbb {E}}\left[ \Vert P_1\Vert ^2\right] +\frac{3\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\right) ^2{\mathbb {E}}\left[ \Vert P_2\Vert ^2\right] \nonumber \\&\quad +\frac{3\tau \Vert {\textbf{u}}\Vert ^2}{2n^2}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] +\frac{\alpha _{k}\Vert {\textbf{u}}\Vert }{n}\left| {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \right| \nonumber \\&\le \frac{\alpha _k^2}{2\tau }{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{3\tau L^2}{2n}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\right) ^2{\mathbb {E}}\left[ \Vert {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\Vert ^2\right] \nonumber \\&\quad +\frac{3\tau C_gL_f^2}{2n}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\right) ^2{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] \nonumber \\&\quad +\frac{3\tau \Vert {\textbf{u}}\Vert ^2}{2n^2}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] +\frac{\alpha _{k}\Vert {\textbf{u}}\Vert }{n}\left| {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \right| \nonumber \\&\le \frac{\alpha _k^2}{2\tau }{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{3\tau 
L^2n}{2}{\mathbb {E}}\left[ \Vert {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\Vert ^2\right] +\frac{3\tau C_gL_f^2n}{2}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] \nonumber \\&\quad +\frac{3\tau }{2}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] +U_2\alpha _k^2, \end{aligned}$$
(50)
where \(P_1=\nabla h({\bar{x}}_k)-\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(g_j(x_{j,k}))\), \(P_2=\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(g_j(x_{j,k}))-{\bar{y}}_k^{'}\) and \(\tau \) can be any positive scalar. The first inequality follows from the Cauchy-Schwarz inequality and the fact \(ab\le \frac{1}{2\tau }a^2+\frac{\tau }{2}b^2\), the second inequality follows from the Lipschitz continuity of \(\nabla g_j(\cdot )\nabla f_j(g_j(\cdot ))\), Assumption 1 and the fact \({\bar{y}}_k^{'}=\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(z_{j,k})\), and the third inequality follows from the facts \({\textbf{u}}^\intercal {\textbf{v}}\le n^2, \Vert {\textbf{u}}\Vert \le n\) and Lemma 4 (ii).
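The splitting behind the first inequality in (50) can be sketched as follows. The sketch assumes the standard Kronecker-product identity \(\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{v}}\otimes {\bar{y}}_k^{'}\right) =\frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}{\bar{y}}_k^{'}\) and uses \(P_1+P_2=\nabla h({\bar{x}}_k)-{\bar{y}}_k^{'}\).
$$\begin{aligned} \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\nabla h({\bar{x}}_k)-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) =\frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\left( P_1+P_2\right) +\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\right) -\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k. \end{aligned}$$
Under this reading, the first three terms on the right are bounded with \(\Vert a+b+c\Vert ^2\le 3\left( \Vert a\Vert ^2+\Vert b\Vert ^2+\Vert c\Vert ^2\right) \), which would account for the factor \(\frac{3\tau }{2}\), while the \(\xi _k\) term is kept as an inner product and handled by Lemma 4 (ii).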
Plug (49)-(50) into (48) and set \(\tau =\frac{2n\alpha _k}{3{\textbf{u}}^\intercal {\textbf{v}}}\),
$$\begin{aligned}&{\mathbb {E}}\left[ h({\bar{x}}_{k+1})\right] \nonumber \\&\quad \le {\mathbb {E}}\left[ h({\bar{x}}_k)\right] -\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\left( 1-\frac{n\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}2\tau }\right) {\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] \nonumber \\&\qquad +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \alpha _k^2\nonumber \\&\qquad +\frac{3\tau C_gL_f^2n}{2}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\frac{3\tau L^2n}{2}{\mathbb {E}}\left[ \Vert {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\Vert ^2\right] +\frac{3\tau }{2}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] \nonumber \\&\quad \le {\mathbb {E}}\left[ h({\bar{x}}_k)\right] -\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n}{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \alpha _{k}^2\nonumber \\&\qquad + \frac{n\beta _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\frac{n^2L^2\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\Vert ^2\right] +\frac{n\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] \nonumber \\&\quad \le {\mathbb {E}}\left[ h({\bar{x}}_k)\right] -\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n}{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \alpha _{k}^2\nonumber \\&\qquad + \frac{n\beta _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\alpha _kc_6\rho 
^{k-1} \left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) \nonumber \\&\qquad +\alpha _k^3c_6(c_1+c_3c_5)c_\alpha +\alpha _k^3c_6c_5c_4C_g^2L_f^4n^2\left( \sum _{t=1}^{k-1}\rho ^{k-t-1}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{t}-{\textbf{z}}_t\right\| ^2\right] +c_\beta \right) , \end{aligned}$$
(51)
where \(c_6=\frac{\max \{\overline{c}^2\,L^2n^2,n\overline{c}^2/c_5\}}{{\textbf{u}}^\intercal {\textbf{v}}}\), \(c_5\) is defined in Lemma 2, \(c_\alpha \) and \(c_\beta \) are some constants, and the last inequality follows from Lemma 2, Lemma 5 and the definitions \(\alpha _{k}=\frac{a}{\sqrt{K}}\), \(\beta _k=\alpha _{k} C_gL_f^2n\).
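As a quick check of the coefficients appearing in (51), substituting \(\tau =\frac{2n\alpha _k}{3{\textbf{u}}^\intercal {\textbf{v}}}\) and \(\beta _k=\alpha _{k} C_gL_f^2n\) gives the following identities (a worked verification using only quantities already defined above).
$$\begin{aligned} \frac{n\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}\,2\tau }&=\frac{3}{4},&\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\left( 1-\frac{n\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}\,2\tau }\right)&=\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n},\\ \frac{3\tau C_gL_f^2n}{2}&=\frac{n^2C_gL_f^2\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}}=\frac{n\beta _k}{{\textbf{u}}^\intercal {\textbf{v}}},&\frac{3\tau L^2n}{2}&=\frac{n^2L^2\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}},\qquad \frac{3\tau }{2}=\frac{n\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}}. \end{aligned}$$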
Reordering the terms of (51) and summing over k from 1 to K,
$$\begin{aligned}&\sum _{k=1}^K\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n}{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] \\&\le {\mathbb {E}}\left[ h({\bar{x}}_1)\right] -{\mathbb {E}}\left[ h({\bar{x}}_{K+1})\right] +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \sum _{k=1}^K\alpha _{k}^2\\&\quad + \sum _{k=1}^K\frac{n\beta _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\sum _{k=1}^K\alpha _kc_6\rho ^{k-1} \left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) \\&\quad +\sum _{k=1}^K\alpha _k^3c_6(c_1+c_3c_5)c_\alpha +\sum _{k=2}^K\alpha _k^3c_6c_5c_4C_g^2L_f^4n^2\left( \sum _{t=1}^{k-1}\rho ^{k-t-1}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{t}-{\textbf{z}}_t\right\| ^2\right] +c_\beta \right) \\&\le {\mathbb {E}}\left[ h({\bar{x}}_1)\right] -{\mathbb {E}}\left[ h({\bar{x}}_{K+1})\right] +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) K\alpha _{1}^2\\&\quad + \frac{n\beta _1}{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\frac{\alpha _1c_6\left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) }{1-\rho }\\&\quad +K\alpha _1^3\left( c_6(c_1+c_3c_5)c_\alpha +c_6c_5c_4C_g^2L_f^4n^2c_\beta \right) +\alpha _1^3\frac{c_6c_5c_4C_g^2L_f^4n^2}{1-\rho }\sum _{k=1}^{K-1}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] , \end{aligned}$$
where the second inequality follows from definitions \(\alpha _{k}=\frac{a}{\sqrt{K}}\) and \(\beta _k=\alpha _{k} C_gL_f^2n\). Multiplying both sides of the above inequality by \(\frac{4n}{a{\textbf{u}}^\intercal {\textbf{v}}\sqrt{K}}\),
$$\begin{aligned} \frac{1}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right]&\le \frac{4\left( {\mathbb {E}}\left[ h({\bar{x}}_1)\right] -{\mathbb {E}}\left[ h({\bar{x}}_{K+1})\right] \right) \frac{n}{a{\textbf{u}}^\intercal {\textbf{v}}}+4\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \frac{na}{{\textbf{u}}^\intercal {\textbf{v}}}}{\sqrt{K}}\\&\quad +{\mathcal {O}}\left( \left( \frac{1}{K}+\frac{1}{K^2}\right) \sum _{k=1}^{K}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] \right) +{\mathcal {O}}\left( \frac{1}{K}\right) . \end{aligned}$$
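The orders in \(K\) in the preceding display can be traced term by term. With \(\alpha _k=\alpha _1=\frac{a}{\sqrt{K}}\) and \(\beta _1=\alpha _1C_gL_f^2n\), multiplying by \(\frac{4n}{a{\textbf{u}}^\intercal {\textbf{v}}\sqrt{K}}\) scales the \(K\)-dependent factors as sketched below (constants independent of \(K\) are omitted on purpose).
$$\begin{aligned} \frac{4n}{a{\textbf{u}}^\intercal {\textbf{v}}\sqrt{K}}\cdot \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n}&=\frac{1}{K},&\frac{K\alpha _1^2}{\sqrt{K}}&=\frac{a^2}{\sqrt{K}},&\frac{\beta _1}{\sqrt{K}}&=\frac{aC_gL_f^2n}{K},\\ \frac{\alpha _1}{\sqrt{K}}&=\frac{a}{K},&\frac{K\alpha _1^3}{\sqrt{K}}&=\frac{a^3}{K},&\frac{\alpha _1^3}{\sqrt{K}}&=\frac{a^3}{K^2}. \end{aligned}$$
These factors correspond, in order, to the left-hand side, the \(\frac{1}{\sqrt{K}}\) term, the \(\frac{1}{K}\) weight on the tracking-error sum, the two \({\mathcal {O}}\left( \frac{1}{K}\right) \) remainders, and the \(\frac{1}{K^2}\) weight on the tracking-error sum.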
By Lemma 3,
$$\begin{aligned} \frac{1}{K}\sum _{k=2}^{K+1}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right]&\le \frac{1}{K}\sum _{k=1}^{K}(1-\beta _k){\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +{\mathcal {O}}\left( \frac{1}{K}\right) . \end{aligned}$$
Rearranging the above inequality, we have \(\frac{1}{K}\sum _{k=1}^{K}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] \le {\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) \). Then,
$$\begin{aligned} \frac{1}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right]&\le \frac{4\left( {\mathbb {E}}\left[ h({\bar{x}}_1)\right] -{\mathbb {E}}\left[ h({\bar{x}}_{K+1})\right] \right) \frac{n}{a{\textbf{u}}^\intercal {\textbf{v}}}+4\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \frac{na}{{\textbf{u}}^\intercal {\textbf{v}}}}{\sqrt{K}}+{\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) \\&\le {\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) . \end{aligned}$$
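The rearrangement of the Lemma 3 bound used to reach the preceding display can be sketched as follows. The shorthand \(e_k:={\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] \) is introduced here only for illustration, and the sketch assumes that \(e_1\) is finite and that \(\beta _k=\beta _1=\frac{aC_gL_f^2n}{\sqrt{K}}\) for all \(k\).
$$\begin{aligned} \frac{1}{K}\sum _{k=2}^{K+1}e_k\le \frac{1-\beta _1}{K}\sum _{k=1}^{K}e_k+{\mathcal {O}}\left( \frac{1}{K}\right)&\Longrightarrow \frac{\beta _1}{K}\sum _{k=1}^{K}e_k\le \frac{e_1-e_{K+1}}{K}+{\mathcal {O}}\left( \frac{1}{K}\right) \\&\Longrightarrow \frac{1}{K}\sum _{k=1}^{K}e_k\le {\mathcal {O}}\left( \frac{1}{K\beta _1}\right) ={\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) . \end{aligned}$$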
By the Lipschitz continuity of \(\nabla h(\cdot )\), we have
$$\begin{aligned} \frac{1}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert \nabla h(x_{i,k})\Vert ^2\right]&\le \frac{2}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{2L^2}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert x_{i,k}-{\bar{x}}_k\Vert ^2\right] \\&\le {\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) , \end{aligned}$$
where the last inequality follows from (38). The proof is complete. \(\square \)
Lemma 6
Let \(\alpha _k=a/(k+b)^\alpha \), \(a>0,b\ge 0\), \(\alpha \in (1/2, 1]\). Under Assumptions 1–2 and the condition that the objective function \(h(x)\) is \(\mu \)-strongly convex,
$$\begin{aligned} {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,-\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \le \frac{\Vert {\textbf{u}}\Vert c_bc_0}{2n(1-\tau _{{\textbf{B}}})}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \alpha _k^2, \end{aligned}$$
where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \), \(c_0\) is some constant scalar.
Proof
Recall the definition \({\bar{x}}_{k}= \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{x}}_{k}\) in Lemma 2,
$$\begin{aligned} \begin{aligned} {\bar{x}}_{k}-x^* ={\bar{x}}_{k-1}-x^*-\alpha _{k-1}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_{k-1}={\bar{x}}_{1}-x^*-\sum _{t=1}^{k-1}\alpha _t\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_t, \end{aligned} \end{aligned}$$
and then
$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*, -\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \\&={\mathbb {E}}\left[ \left\langle {\bar{x}}_{1}-x^*-\sum _{t=1}^{k-1}\alpha _t\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_t, -\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \\&=-\alpha _{k}{\mathbb {E}}\left[ \left\langle {\bar{x}}_{1}-x^*-\sum _{t=1}^{k-1}\alpha _t\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_t, \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \sum _{t=1}^{k}\tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \right] , \end{aligned} \end{aligned}$$
where \(\epsilon _t={\textbf{H}}_t-{\textbf{J}}_t\), \({\textbf{H}}_t\) and \({\textbf{J}}_t\) are defined in (10) and Lemma 2 respectively, and the second equality follows from (43). Note that \({\mathbb {E}}\left[ \left\langle {\bar{x}}_{1}-x^*, \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \bigg |{\mathcal {F}}_t\right] =0\) and
$$\begin{aligned} {\mathbb {E}}\left[ \left\langle \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_{t_1}, \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t_2)\epsilon _{t_2}\right\rangle \bigg |{\mathcal {F}}_{t_2}\right] =0~ (t_1<t_2), \end{aligned}$$
where \({\mathcal {F}}_k\) is defined in (46). Then
$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*, -\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \\&\le \alpha _{k}\sum _{t_1=1}^{k-1}\sum _{t_2=1}^{t_1}\alpha _{t_1}\frac{\Vert {\textbf{u}}\Vert ^2}{2n^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_2) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\left( {\mathbb {E}}\left[ \Vert {\textbf{y}}_{t_1}\Vert ^2\right] +{\mathbb {E}}\left[ \left\| \epsilon _{t_2}\right\| ^2\right] \right) \\&\le \alpha _{k}\sum _{t_1=1}^{k-1}\sum _{t_2=1}^{t_1}\alpha _{t_1}\frac{\Vert {\textbf{u}}\Vert ^2c_b}{2n^2}\tau _{{\textbf{B}}}^{k-t_2}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \\&\le \frac{\Vert {\textbf{u}}\Vert ^2c_bc}{2n^2(1-\tau _{{\textbf{B}}})}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \alpha _k\alpha _{k-1}, \end{aligned} \end{aligned}$$
where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \) and \(c\) is some constant scalar; the second inequality follows from (32), (33) and (45), and the third inequality follows from Lemma 5. Noting that \(\lim _{k\rightarrow \infty }\frac{\alpha _{k-1}}{\alpha _k}=1\), there exists a constant \(c_0>c\) such that
$$\begin{aligned} {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,-\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \le \frac{\Vert {\textbf{u}}\Vert ^2c_bc_0}{2n^2(1-\tau _{{\textbf{B}}})}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \alpha _k^2. \end{aligned}$$
The proof is complete. \(\square \)
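As a remark on the third inequality in the proof above, the double sum can be reduced to the form covered by Lemma 5 as sketched below. The index \(j\) is a label introduced here, and \(c\) is the Lemma 5 constant for \(\rho =\tau _{{\textbf{B}}}\).
$$\begin{aligned} \sum _{t_1=1}^{k-1}\alpha _{t_1}\sum _{t_2=1}^{t_1}\tau _{{\textbf{B}}}^{k-t_2}&=\sum _{t_1=1}^{k-1}\alpha _{t_1}\tau _{{\textbf{B}}}^{k-t_1}\sum _{j=0}^{t_1-1}\tau _{{\textbf{B}}}^{j}\le \frac{\tau _{{\textbf{B}}}}{1-\tau _{{\textbf{B}}}}\sum _{t_1=1}^{k-1}\tau _{{\textbf{B}}}^{k-1-t_1}\alpha _{t_1}\le \frac{c}{1-\tau _{{\textbf{B}}}}\alpha _{k-1}. \end{aligned}$$
The last step applies Lemma 5 at index \(k-1\) together with \(\tau _{{\textbf{B}}}<1\), which is where the factor \(\alpha _k\alpha _{k-1}\) in the bound comes from.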
1.6 Proof of Theorem 2
Proof
Recall the definition \({\bar{x}}_{k+1}= \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{x}}_{k+1}\) in Lemma 2,
$$\begin{aligned} {\bar{x}}_{k+1}&=\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{A}}}\left( {\textbf{x}}_k-\alpha _k{\textbf{y}}_k\right) \nonumber \\&={\bar{x}}_{k}-\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \nonumber \\&={\bar{x}}_{k}-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k)+\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\Bigg (\underbrace{\nabla h({\bar{x}}_k)-\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(g_j(x_{j,k}))}_{P^{(1)}_k}\nonumber \\&\quad +\underbrace{\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(g_j(x_{j,k}))-{\bar{y}}^{'}_k}_{P^{(2)}_k}+\underbrace{\frac{n}{{\textbf{u}}^\intercal {\textbf{v}}}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{v}}\otimes {\bar{y}}^{'}_k-{\textbf{y}}_k^{'}\right) }_{P^{(3)}_k}\nonumber \\&\quad +\underbrace{\left( -\frac{n}{{\textbf{u}}^\intercal {\textbf{v}}}\right) \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k}_{P^{(4)}_k}\Bigg ), \end{aligned}$$
(52)
where \({\textbf{y}}_k^{'}\) and \(\xi _{k+1}\) are defined in (13) and (19), and the second equality follows from the fact \({\textbf{u}}^\intercal {\textbf{A}}={\textbf{u}}^\intercal \). Subsequently,
$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\bar{x}}_{k+1}-x^*\Vert ^2\right] \nonumber \\&={\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k)\right\| ^2\right] +\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k), P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] \nonumber \\&\le \left( 1-\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k), P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] \nonumber \\&\le \left( 1-\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{n}\right) ^2 {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +\frac{\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| \nabla h({\bar{x}}_k)\right\| ^2\right] \nonumber \\&\quad +\left( 1+\frac{1}{2\tau }\right) \left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle 
{\bar{x}}_{k}-x^*,P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] \nonumber \\&\le \left( \left( 1-\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{n}\right) ^2+\frac{\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2L^2 \right) {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] \nonumber \\&\quad +\left( 1+\frac{1}{2\tau }\right) \left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] , \end{aligned}$$
(53)
where \(\tau \) is any positive scalar, the first inequality follows from [30, Lemma 10], the second inequality follows from the fact \(ab\le \frac{\tau a^2}{2}+\frac{b^2}{2\tau }\) and the third inequality follows from \(\nabla h(x^*)=0\) and the fact that \(h(x)\) is \(L(:=C_gL_f + C_f^{1/2}L_g)\)-smooth.
For the second term on the right hand side of (53),
$$\begin{aligned}&\left( 1+\frac{1}{2\tau }\right) \left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\le \left( 1+\frac{1}{2\tau }\right) \left( 4\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2\frac{L^2\overline{c}^2}{n}{\mathbb {E}}\left[ \Vert x_{k}-{\textbf{1}}\otimes {\bar{x}}_{k}\Vert _{\hat{{\textbf{A}}}}^2\right] +4\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2\frac{C_g^2L_f^2}{n}{\mathbb {E}}\left[ \left\| {\textbf{g}}_k-{\textbf{z}}_k\right\| ^2\right] \right. \nonumber \\&\quad \left. +4\alpha _k^2\frac{\Vert {\textbf{u}}\Vert ^2}{n^2}\overline{c}^2{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}^{'}_k\right\| _{\hat{{\textbf{B}}}}^2\right] +4\frac{\Vert {\textbf{u}}\Vert ^2}{n^2}\alpha _k^2{\mathbb {E}}\left[ \left\| \xi _k\right\| ^2\right] \right) , \end{aligned}$$
(54)
where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \), and the inequality follows from Assumption 1 (c) and the Lipschitz continuity of the gradients \(\nabla g_j(\cdot )\nabla f_j(g_j(\cdot ))\) and \(\nabla f_j(\cdot )\). By Lemma 3 and [25, Lemmas 4-5 in Chapter 2], there exists a constant \(U_3\) such that
$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{g}}_k-{\textbf{z}}_k\right\| ^2\right] \le U_3\beta _k, \end{aligned}$$
(55)
and then by Lemmas 2 and 5,
$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\Vert _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right] \nonumber \\&\le \rho ^{k} \left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) +(c_1+c_3c_5)\sum _{t=1}^k\rho ^{k-t}\alpha _t^2\nonumber \\&\quad +c_5c_4U_3\sum _{t=1}^k\rho ^{k-t}\beta _t^3+c_5c_4V_g\sum _{t=1}^k\rho ^{k-t}\beta _t^2\nonumber \\&\le {\mathcal {O}}\left( \alpha _{k+1}^2\right) . \end{aligned}$$
(56)
Combining inequalities (54)-(56), we have
$$\begin{aligned}&\left( 1+\frac{1}{2\tau }\right) \left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}} \left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\le \left( 1+\frac{1}{2\tau }\right) \left( {\mathcal {O}}(\alpha _k^3) +4\frac{\Vert {\textbf{u}}\Vert ^2}{n^2}\alpha _k^2{\mathbb {E}}\left[ \left\| \xi _k\right\| ^2\right] \right) \nonumber \\&\le {\mathcal {O}}(\alpha _k^3)+16\left( 1+\frac{L^2}{\mu ^2}\right) \frac{\Vert {\textbf{u}}\Vert ^2c_b^2 C_fC_g}{n(1-\tau _{{\textbf{B}}})^2}\alpha _k^2, \end{aligned}$$
(57)
where
$$\begin{aligned} \tau =\frac{\mu ^2}{2L^2} \end{aligned}$$
(58)
and the second inequality follows from Lemma 4.
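The constant in the last line of (57) can be checked directly. The sketch below assumes the bound \({\mathbb {E}}\left[ \Vert \xi _k\Vert ^2\right] \le \frac{4nc_b^2C_fC_g}{(1-\tau _{{\textbf{B}}})^2}\) obtained in part (i) of Lemma 4 and the choice of \(\tau \) in (58).
$$\begin{aligned} 1+\frac{1}{2\tau }&=1+\frac{L^2}{\mu ^2},\\ 4\frac{\Vert {\textbf{u}}\Vert ^2}{n^2}\alpha _k^2{\mathbb {E}}\left[ \left\| \xi _k\right\| ^2\right]&\le 4\frac{\Vert {\textbf{u}}\Vert ^2}{n^2}\cdot \frac{4nc_b^2C_fC_g}{(1-\tau _{{\textbf{B}}})^2}\alpha _k^2=\frac{16\Vert {\textbf{u}}\Vert ^2c_b^2C_fC_g}{n(1-\tau _{{\textbf{B}}})^2}\alpha _k^2. \end{aligned}$$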
For the third term on the right hand side of (53),
$$\begin{aligned}&2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] \nonumber \\&\le \tau _1 {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +\frac{1}{\tau _1}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,P^{(4)}_k\right\rangle \right] \nonumber \\&\le \tau _1 {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +\frac{1}{\tau _1}{\mathcal {O}}(\alpha _k^3)+2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,P^{(4)}_k\right\rangle \right] \nonumber \\&\le \frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _k}{4n} {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +{\mathcal {O}}(\alpha _k^2)+\frac{\Vert {\textbf{u}}\Vert c_bc_0}{n(1-\tau _{{\textbf{B}}})}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \alpha _k^2, \end{aligned}$$
(59)
where \(c_0\) is some constant scalar,
$$\begin{aligned} \tau _1=\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu }{4n}\alpha _k, \end{aligned}$$
the first inequality follows from the fact \(ab\le \frac{\tau _1 a^2}{2}+\frac{b^2}{2\tau _1}\) for any positive scalar \(\tau _1\), the second inequality follows from (57) and the third inequality follows from Lemma 6. Substitute (57)-(59) into (53),
$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{x}}_{k+1}-x^*\Vert ^2\right]&\le \left( 1-\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{4n}\right) {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +{\mathcal {O}}(\alpha _k^2). \end{aligned}$$
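One way to see how the contraction factor in the preceding display arises is to collect the coefficients of \({\mathbb {E}}\left[ \Vert {\bar{x}}_{k}-x^*\Vert ^2\right] \) from (53), with \(\tau \) as in (58), and from (59). The shorthand \(\eta _k:=\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\) is introduced here only for illustration, and the step-size condition below is an interpretation that the original argument does not state explicitly.
$$\begin{aligned} \left( 1-\eta _k\mu \right) ^2+\frac{\tau }{2}\eta _k^2L^2+\frac{\eta _k\mu }{4}=1-\frac{7}{4}\eta _k\mu +\frac{5}{4}\eta _k^2\mu ^2\le 1-\frac{\eta _k\mu }{4}\quad \text {whenever}~\eta _k\mu \le \frac{6}{5}. \end{aligned}$$
Since \(\alpha _k\rightarrow 0\), the condition \(\eta _k\mu \le \frac{6}{5}\) holds for all sufficiently large \(k\).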
Then by [25, Lemmas 4-5 in Chapter 2],
$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{x}}_{k+1}-x^*\Vert ^2\right] ={\mathcal {O}}\left( \alpha _k\right) ~ \text {if}~\alpha _k=a/(k+b)^\alpha ,\alpha \in (1/2,1) \end{aligned}$$
and
$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{x}}_k-x^*\Vert ^2\right] ={\mathcal {O}}\left( \frac{1}{k}\right) ~ \text {if}~\alpha _k=a/(k+b). \end{aligned}$$
The proof is complete. \(\square \)
Lemma 7
Let \(\alpha _{k}=a/(k+b)^\alpha \), \(a>0\), \(b\ge 0\), \(\alpha \in (1/2,1)\). Suppose that
(a) Assumptions 1-2 hold;
(b) for any \(i\in {\mathcal {V}}\), there exist scalar \(C_i\) and matrix \({\textbf{T}}_i\) such that
$$\begin{aligned} \left\| \nabla f_i(y)-\nabla f_i(y^{'})-{\textbf{T}}_i\left( y-y^{'}\right) \right\| \le C_i\Vert y-y^{'}\Vert ^{1+\gamma },\quad \forall y,y^{'}\in {\mathbb {R}}^p, \end{aligned}$$
where \(\gamma \in (0,1]\) satisfies that \(\sum _{k=1}^\infty \frac{\alpha _k^{(1+\gamma )/2}}{\sqrt{k}}<\infty \).
Denote
$$\begin{aligned} \begin{aligned}&{{\textbf {H}}}_{\theta }= {} \left( \begin{array}{cc} \frac{1}{n}{{\textbf {H}}}&{}{} {{\textbf {I}}}_d\\ {{\textbf {0}}}&{}{} \frac{n\beta }{{{\textbf {u}}}^\intercal {{\textbf {v}}}}{{\textbf {I}}}_d \end{array} \right) ,\\ {}&{{\textbf {M}}}(k,t)= {} {\tilde{\alpha }}_t\sum _{l_1=t}^k\Pi _{l_2=t+1}^{l_1}\left( {{\textbf {I}}}_{2d}-{\tilde{\alpha }}_k{{\textbf {H}}}_{\theta }\right) , {{\textbf {N}}}(k,t)={{\textbf {M}}}(k,t)-{{\textbf {H}}}_{\theta }^{-1},\\ {}&\eta _t^{(1)}= {} \left( \begin{array}{c} \left( -\frac{n}{{{\textbf {u}}}^\intercal {{\textbf {v}}}}\right) \left( \frac{{{\textbf {u}}}^\intercal }{n}\otimes {{\textbf {I}}}_{d}\right) \xi _t\\ \frac{\beta }{{{\textbf {u}}}^\intercal {{\textbf {v}}}}\sum _{j=1}^{n}\nabla g_j(x^*){{\textbf {T}}}_j\left( G_j(x^*;\phi _{i,t+1}^{'})-g_j(x^*)\right) \end{array} \right) . \end{aligned} \end{aligned}$$
We have
$$\begin{aligned} \lim _{k\rightarrow \infty }{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}\right\| ^2\right] =0. \end{aligned}$$
Proof
Note that
$$\begin{aligned} \eta _t^{(1)}=\left( \begin{array}{c} \left( -\frac{n}{{\textbf{u}}^\intercal {\textbf{v}}}\right) \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _t\\ {\textbf{0}} \end{array} \right) +\left( \begin{array}{c} {\textbf{0}}\\ \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{i,t+1}^{'})-g_j(x^*)\right) \end{array} \right) \end{aligned}$$
and
$$\begin{aligned}&{\mathbb {E}}\left[ \left\langle \xi _{t_1},\xi _{t_2}\right\rangle \right] ={\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\langle \xi _{t_1},\xi _{t_2}\right\rangle \big | {\mathcal {F}}_{\min \{t_1,t_2\}}\right] \right] \\&={\mathbb {E}}\left[ \left\langle \xi _{\min \{t_1,t_2\}}, \sum _{l=1}^{\min \{t_1,t_2\}}\tilde{{\textbf{B}}}(\max \{t_1,t_2\},l)\epsilon _l\right\rangle \right] ~(t_1\le t_2),\\&{\mathbb {E}}\left[ \left\langle G_j(x^*;\phi _{i,t_1}^{'})-g_j(x^*),G_j(x^*;\phi _{i,t_2}^{'})-g_j(x^*)\right\rangle \big |{\mathcal {F}}_{\min \{t_1,t_2\}}\right] =0 ~(t_1\ne t_2), \end{aligned}$$
where \({\mathcal {F}}_t\) is defined in (46). Then
$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}\right\| ^2\right]&={\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}\right\| ^2\bigg |{\mathcal {F}}_{\min \{t_1,t_2\}}\right] \right] \\&\le \left( \frac{\Vert {\textbf{u}}\Vert }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2\frac{4}{k}\sum _{t_1=1}^{k}\sum _{t_2=t_1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t_2) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\sum _{l=1}^{t_1}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(t_2,l) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \left\| \xi _{t_1}\right\| \left\| \epsilon _l\right\| \right] \\&\quad +\frac{2}{k}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2{\mathbb {E}}\left[ \left\| \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{j,t}^{'})-g_j(x^*)\right) \right\| ^2\right] \\&\le c_bc_N(c_b^2+1)nC_fC_g\left( \frac{\Vert {\textbf{u}}\Vert }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2\frac{1}{(1-\tau _{{\textbf{B}}})^4}\frac{8}{k}\sum _{t_1=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&\quad +n\left( \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2\left( \sum _{j=1}^{n}\Vert \nabla g_j(x^*)\Vert ^2\Vert {\textbf{T}}_j\Vert ^2\right) V_gc_N\frac{2}{k}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }, \end{aligned} \end{aligned}$$
where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \) and \(c_N=\sup _{k,t}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\), and the second inequality follows from the bound \(\sup _{k,t}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }<\infty \) [26, Lemma 1 (ii)], (32), (45), Lemma 4 (i) and Assumption 1 (c). By [26, Lemma 1 (ii)],
$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{1}{k}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }=0, \end{aligned}$$
which implies \( \lim _{k\rightarrow \infty }{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}\right\| ^2\right] =0\). The proof is complete. \(\square \)
1.7 Proof of Theorem 3
Proof
By (56),
$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\left( {\bar{x}}_{t}-x^*\right) -\frac{1}{\sqrt{k}}\sum _{t=1}^{k}\left( x_{i,t}-x^*\right) \right\| \right] \\&\le \frac{1}{\sqrt{k}}\sum _{t=0}^{k-1}\sqrt{{\mathbb {E}}\left[ \Vert x_{t}-{\textbf{1}}\otimes {\bar{x}}_{t}\Vert ^2\right] }\le \frac{\sqrt{U_1}}{\sqrt{k}}\sum _{t=0}^{k-1}\alpha _t\rightarrow 0. \end{aligned} \end{aligned}$$
Then by Slutsky’s theorem, it is sufficient to show
$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k} \left( \begin{array}{c} {\bar{x}}_t-x^*\\ \frac{\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,t}-g_j\left( x_{j,t}\right) \right) }{n} \end{array} \right) {\mathop {\longrightarrow }\limits ^{d}} N\left( {\textbf{0}},\left( \begin{array}{cc} {\textbf{H}}^{-1}\left( {\textbf{S}}_1+{\textbf{S}}_2\right) ({\textbf{H}}^{-1})^\intercal &{} -\frac{1}{n}{\textbf{H}}^{-1}{\textbf{S}}_2\\ -\frac{1}{n}{\textbf{S}}_2({\textbf{H}}^{-1})^\intercal &{} \frac{1}{n^2}{\textbf{S}}_2 \end{array} \right) \right) . \end{aligned}$$
Subtracting \(x^*\) from both sides of (52) gives
$$\begin{aligned} {\bar{x}}_{k+1}-x^*&={\bar{x}}_{k}-x^*-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k)+\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) \left( P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right) \nonumber \\&=\left( {\textbf{I}}_d-{\tilde{\alpha }}_k\frac{1}{n}{\textbf{H}}\right) ({\bar{x}}_{k}-x^*)-{\tilde{\alpha }}_k\frac{1}{n}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,k}-g_j\left( x_{j,k}\right) \right) \nonumber \\&\quad +{\tilde{\alpha }}_k\left( P^{(0)}_k+P^{(1)}_k+P^{(3)}_k+P^{(4)}_k\right) , \end{aligned}$$
(60)
where \({\tilde{\alpha }}_k=\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\),
$$\begin{aligned} P^{(0)}_k=-\left( \nabla h({\bar{x}}_k)-\frac{1}{n}{\textbf{H}}({\bar{x}}_{k}-x^*)\right) +\frac{1}{n}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,k}-g_j\left( x_{j,k}\right) \right) +P^{(2)}_k.\nonumber \\ \end{aligned}$$
(61)
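As a check on the second equality in (60), one can expand \({\tilde{\alpha }}_kP^{(0)}_k\) using (61). Writing \(D_k:=\frac{1}{n}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,k}-g_j\left( x_{j,k}\right) \right) \) (a shorthand introduced only for this check),
$$\begin{aligned}&\left( {\textbf{I}}_d-{\tilde{\alpha }}_k\frac{1}{n}{\textbf{H}}\right) ({\bar{x}}_{k}-x^*)-{\tilde{\alpha }}_kD_k+{\tilde{\alpha }}_kP^{(0)}_k\\&\quad =({\bar{x}}_{k}-x^*)-{\tilde{\alpha }}_k\frac{1}{n}{\textbf{H}}({\bar{x}}_{k}-x^*)-{\tilde{\alpha }}_kD_k-{\tilde{\alpha }}_k\nabla h({\bar{x}}_k)+{\tilde{\alpha }}_k\frac{1}{n}{\textbf{H}}({\bar{x}}_{k}-x^*)+{\tilde{\alpha }}_kD_k+{\tilde{\alpha }}_kP^{(2)}_k\\&\quad ={\bar{x}}_{k}-x^*-{\tilde{\alpha }}_k\nabla h({\bar{x}}_k)+{\tilde{\alpha }}_kP^{(2)}_k, \end{aligned}$$
and adding \({\tilde{\alpha }}_k\left( P^{(1)}_k+P^{(3)}_k+P^{(4)}_k\right) \) to both sides recovers the first line of (60).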
According to the definition of \(z_{i,k+1}\) and \(\beta _k\),
$$\begin{aligned} z_{i,k+1}-g_i\left( x_{i,k+1}\right)&=\left( 1-\frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\tilde{\alpha }}_k\right) \left( z_{i,k}-g_i\left( x_{i,k}\right) \right) \\&\quad +G_{i,k+1}^{(1)}-g_i(x_{i,k+1})+\left( 1-\beta _k\right) \left( g_i(x_{i,k})-G_{i,k+1}^{(2)}\right) , \end{aligned}$$
where \(G_{i,k+1}^{(1)}=G_i(x_{i,k+1};\phi _{i,k+1}^{'})\) and \(G_{i,k+1}^{(2)}=G_i(x_{i,k};\phi _{i,k+1}^{'})\). Combining the above equation with (60) yields
$$\begin{aligned} \Delta _{k+1}=\left( {\textbf{I}}_{2d}-{\tilde{\alpha }}_k{\textbf{H}}_{\theta }\right) \Delta _k+{\tilde{\alpha }}_k\eta _k^{(1)}+{\tilde{\alpha }}_k\left( \eta _k^{(2)}+\eta _k^{(3)}\right) , \end{aligned}$$
(62)
where
$$\begin{aligned} \Delta _k= & {} \left( \begin{array}{c} {\bar{x}}_{k}-x^*\\ \frac{\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,k}-g_j\left( x_{j,k}\right) \right) }{n} \end{array} \right) ,\nonumber \\ {\textbf{H}}_{\theta }= & {} \left( \begin{array}{cc} \frac{1}{n}{\textbf{H}}&{} {\textbf{I}}_d\\ {\textbf{0}}&{} \frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\textbf{I}}_d \end{array} \right) , \nonumber \\ \eta _k^{(1)}= & {} \left( \begin{array}{c} P^{(4)}_k\\ \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{j,k+1}^{'})-g_j(x^*)\right) \end{array} \right) ,\nonumber \\ \eta _k^{(2)}= & {} \left( \begin{array}{c} P^{(0)}_k+P^{(1)}_k+P^{(3)}_k\\ {\textbf{0}} \end{array} \right) , \end{aligned}$$
(63)
and
$$\begin{aligned} \eta _k^{(3)}=\left( \begin{array}{c} {\textbf{0}}\\ \sum \limits _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( \frac{G_{j,k+1}^{(1)}-g_j(x_{j,k+1})+\left( 1-\beta _k\right) \left( g_j(x_{j,k})-G_{j,k+1}^{(2)}\right) }{n{\tilde{\alpha }}_k}-\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\left( G_j(x^*;\phi _{j,k+1}^{'})-g_j(x^*)\right) \right) \end{array} \right) . \end{aligned}$$
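The form of \(\eta _k^{(3)}\) can be traced back to the recursion for \(z_{i,k}-g_i(x_{i,k})\). The following is a sketch that assumes \(\beta _k=\beta \alpha _k\) (an assumption made here, suggested by the coefficient \(1-\frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\tilde{\alpha }}_k\) in that recursion), so that \(\frac{\beta _k}{n{\tilde{\alpha }}_k}=\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\):
$$\begin{aligned}&\frac{G_{j,k+1}^{(1)}-g_j(x_{j,k+1})+\left( 1-\beta _k\right) \left( g_j(x_{j,k})-G_{j,k+1}^{(2)}\right) }{n{\tilde{\alpha }}_k}\\&\quad =\frac{G_{j,k+1}^{(1)}-g_j(x_{j,k+1})-\left( G_{j,k+1}^{(2)}-g_j(x_{j,k})\right) }{n{\tilde{\alpha }}_k}+\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\left( G_{j,k+1}^{(2)}-g_j(x_{j,k})\right) . \end{aligned}$$
Subtracting \(\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\left( G_j(x^*;\phi _{j,k+1}^{'})-g_j(x^*)\right) \) then leaves two differences of centered samples drawn with the same \(\phi _{j,k+1}^{'}\): one between the consecutive iterates \(x_{j,k+1}\) and \(x_{j,k}\), and one between \(x_{j,k}\) and \(x^*\).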
Define \({\textbf{M}}(k,t)={\tilde{\alpha }}_t\sum _{l_1=t}^{k-1}\Pi _{l_2=t+1}^{l_1}\left( {\textbf{I}}_{2d}-{\tilde{\alpha }}_{l_2}{\textbf{H}}_{\theta }\right) \) and \({\textbf{N}}(k,t)={\textbf{M}}(k,t)-{\textbf{H}}_{\theta }^{-1}\), with the conventions \({\textbf{M}}(k,k)={\textbf{0}}\) and \(\Pi _{l_2=t+1}^{t}\left( {\textbf{I}}_{2d}-{\tilde{\alpha }}_{l_2}{\textbf{H}}_{\theta }\right) ={\textbf{I}}_{2d}\). Then by the recursion (62),
$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\Delta _t&=\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{H}}_{\theta }^{-1}\eta _t^{(1)}+\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}+\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(2)}\nonumber \\&\quad +\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(3)}+{\mathcal {O}}\left( \frac{1}{\sqrt{k}}\right) . \end{aligned}$$
(64)
By Lemma 7, the second term on the right-hand side of (64) converges to 0 in mean square. For the third term on the right-hand side of (64),
$$\begin{aligned}&{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(2)}\right\| \right] \le \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&\quad \left( {\mathbb {E}}\left[ \left\| \frac{1}{n}\sum _{j=1}^{n}\nabla g_j(x^*)\left( \nabla f_j(g_j(x_{j,t}))-\nabla f_j(z_{j,t})-{\textbf{T}}_j\left( z_{j,t}-g_j\left( x_{j,t}\right) \right) \right) \right\| \right] \right. \\&\qquad \left. +{\mathbb {E}}\left[ \left\| \nabla h({\bar{x}}_t)-\frac{1}{n}{\textbf{H}}({\bar{x}}_{t}-x^*)\right\| \right] \right. \\&\qquad \left. +{\mathbb {E}}\left[ \left\| \frac{1}{n}\sum _{j=1}^{n}\left( \nabla g_j(x_{j,t})-\nabla g_j(x^*)\right) \left( \nabla f_j(g_j(x_{j,t}))-\nabla f_j(z_{j,t})\right) \right\| \right] \right) \\&\qquad +\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \left\| P^{(1)}_t+P^{(3)}_t\right\| \right] \\&\le \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&\quad \left( \frac{1}{n}\sum _{j=1}^{n}\Vert \nabla g_j(x^*)\Vert {\mathbb {E}}\left[ \left\| z_{j,t}-g_j\left( x_{j,t}\right) \right\| ^{1+\gamma }\right] +{\mathbb {E}}\left[ \left\| {\bar{x}}_{t}-x^*\right\| ^{1+\gamma }\right] \right. \\&\qquad \left. +\frac{1}{n}\sum _{j=1}^{n}L_gL_f\sqrt{{\mathbb {E}}\left[ \left\| x_{j,t}-x^*\right\| ^2\right] {\mathbb {E}}\left[ \left\| g_j(x_{j,t})-z_{j,t}\right\| ^2\right] }\right) \\&\qquad +\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \left\| P^{(1)}_t+P^{(3)}_t\right\| \right] \\&= \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathcal {O}}\left( \alpha _t^{(1+\gamma )/2}+\alpha _t\right) , \end{aligned}$$
where the first inequality follows from the definitions of \(\eta _t^{(2)}\), \(P_t^{(0)}\) and \(P_t^{(2)}\) in (63), (61) and (52), the second inequality follows from condition (d), Assumption 1 (a) and the Hölder inequality, and the equality follows from (54)-(56) and Theorem 2. Then, by the boundedness of \({\textbf{M}}(k,t)\) [26, Lemma 1 (ii)], the fact that \(\sum _{k=1}^\infty \frac{\alpha _k^{(1+\gamma )/2}}{\sqrt{k}}<\infty \) and the Kronecker lemma, we have
$$\begin{aligned} {\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(2)}\right\| \right] \le \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathcal {O}}\left( \alpha _t^{(1+\gamma )/2}+\alpha _t\right) \longrightarrow 0. \end{aligned}$$
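For reference, the Kronecker lemma is used here in the following form: if \(\{b_t\}\) is positive and nondecreasing with \(b_t\rightarrow \infty \), and \(\sum _{t=1}^{\infty }a_t/b_t\) converges, then \(\frac{1}{b_k}\sum _{t=1}^{k}a_t\rightarrow 0\). A sketch of the application with \(b_t=\sqrt{t}\) and \(a_t=\alpha _t^{(1+\gamma )/2}\), where \(c_M:=\sup _{k,t}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\) and the generic constant \(C\) are notation introduced only for this illustration, and where it is assumed that \(\alpha _t\le \alpha _t^{(1+\gamma )/2}\) (for instance when \(\gamma \le 1\) and \(\alpha _t\le 1\)):
$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathcal {O}}\left( \alpha _t^{(1+\gamma )/2}+\alpha _t\right) \le \frac{2c_MC}{\sqrt{k}}\sum _{t=1}^{k}\alpha _t^{(1+\gamma )/2}\longrightarrow 0. \end{aligned}$$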
Noting that \(\{\eta _k^{(3)}\}\) is a martingale difference sequence adapted to the filtration \({\mathcal {F}}_k\) defined in (46), the fourth term on the right-hand side of (64) satisfies
$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(3)}\right\| ^2\right] \\&=\frac{1}{k}\sum _{t=1}^{k}{\mathbb {E}}\left[ \left\| {\textbf{M}}(k,t)\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( \frac{G_{j,t+1}^{(1)}-g_j(x_{j,t+1})-\left( G_{j,t+1}^{(2)}-g_j(x_{j,t})\right) }{n{\tilde{\alpha }}_t}\right. \right. \right. \\&\quad \left. +\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\left( G_{j,t+1}^{(2)}-g_j(x_{j,t})-\left( G_j(x^*;\phi _{j,t+1}^{'})-g_j(x^*)\right) \right) \bigg )\bigg \Vert ^2\right] \\&\le \frac{1}{k}\sum _{t=1}^{k}\frac{1}{n}\sum _{j=1}^{n}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2\Vert \nabla g_j(x^*)\Vert ^2\Vert {\textbf{T}}_j\Vert ^{2}4\\&\quad \left( \left( \frac{L_g^{'}}{{\tilde{\alpha }}_t}\right) ^2{\mathbb {E}} \left[ \left\| x_{j,t+1}-x_{j,t}\right\| ^2\right] +\left( \frac{n\beta L_g^{'}}{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2{\mathbb {E}}\left[ \left\| x_{j,t}-x^*\right\| ^2\right] \right) \\&=\frac{1}{k}\sum _{t=1}^{k}{\mathcal {O}}\left( \alpha _t\right) , \end{aligned} \end{aligned}$$
where the inequality follows from the Lipschitz continuity of \(G_j(\cdot ;\phi )\), and the second equality follows from (56), Theorem 2 and the fact that
$$\begin{aligned} {\mathbb {E}}\left[ \left\| x_{j,t+1}-x_{j,t}\right\| ^2\right]\le & {} 3\left( {\mathbb {E}}\left[ \left\| x_{j,t+1}-{\bar{x}}_{t+1}\right\| ^2\right] \right. \\{} & {} \left. +{\mathbb {E}}\left[ \left\| x_{j,t}-{\bar{x}}_{t}\right\| ^2\right] +{\mathbb {E}}\left[ \left\| {\bar{x}}_{t+1}-{\bar{x}}_{t}\right\| ^2\right] \right) ={\mathcal {O}}\left( \alpha _t^2\right) . \end{aligned}$$
Then, by the Kronecker lemma, \({\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(3)}\right\| ^2\right] =\frac{1}{k}\sum _{t=1}^{k}{\mathcal {O}}\left( \alpha _t\right) \longrightarrow 0.\)
It remains to show the asymptotic normality of the first term on the right-hand side of (64). Arguing as in [48, Lemma 6 in Appendix B], we obtain
$$\begin{aligned}{} & {} {\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k} P^{(4)}_t-\frac{1}{\sqrt{k}}\sum _{t=1}^{k}\left( \frac{{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \epsilon _t^*\right\| ^2\right] \longrightarrow 0,\\{} & {} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\left( \frac{{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \epsilon _t^*{\mathop {\rightarrow }\limits ^{d}} N\left( {\textbf{0}},\frac{1}{n^2}{\textbf{S}}_1\right) \end{aligned}$$
and
$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{j,t+1}^{'})-g_j(x^*)\right) {\mathop {\rightarrow }\limits ^{d}} N\left( {\textbf{0}},\left( \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2{\textbf{S}}_2\right) , \end{aligned}$$
where
$$\begin{aligned}&\epsilon _t^*=\left[ \left( \nabla G_1(x^*;\phi _{1,t})\nabla F_1(g(x^*);\zeta _{1,t})-\nabla g_1(x^*)\nabla f_1(g(x^*))\right) ^\intercal ,\cdots ,\right. \\&\quad \quad \left. \left( \nabla G_n(x^*;\phi _{n,t})\nabla F_n(g(x^*);\zeta _{n,t})-\nabla g_n(x^*)\nabla f_n(g(x^*))\right) ^\intercal \right] ^\intercal . \end{aligned}$$
Note that \({\textbf{H}}_{\theta }^{-1}=\left( \begin{array}{cc} n{\textbf{H}}^{-1}&{} -\frac{{\textbf{u}}^\intercal {\textbf{v}}}{\beta }{\textbf{H}}^{-1}\\ {\textbf{0}}&{} \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n\beta }{\textbf{I}}_d \end{array} \right) \) and \(\phi _{i,k}\) is independent of \(\phi _{i,k}^{'}\). Then
$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{H}}_{\theta }^{-1}\eta _t^{(1)}{\mathop {\longrightarrow }\limits ^{d}} N\left( {\textbf{0}},\left( \begin{array}{cc} {\textbf{H}}^{-1}\left( {\textbf{S}}_1+{\textbf{S}}_2\right) ({\textbf{H}}^{-1})^\intercal &{} -\frac{1}{n}{\textbf{H}}^{-1}{\textbf{S}}_2\\ -\frac{1}{n}{\textbf{S}}_2({\textbf{H}}^{-1})^\intercal &{} \frac{1}{n^2}{\textbf{S}}_2 \end{array} \right) \right) . \end{aligned}$$
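The limiting covariance can be checked block by block. The following is a sketch in which \(Z_1\sim N\left( {\textbf{0}},\frac{1}{n^2}{\textbf{S}}_1\right) \) and \(Z_2\sim N\left( {\textbf{0}},\left( \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2{\textbf{S}}_2\right) \) denote the two limits obtained above (notation introduced only for this check), assuming that the two components converge jointly and that \(Z_1\) and \(Z_2\) are independent, as the independence of \(\phi _{i,k}\) and \(\phi _{i,k}^{'}\) suggests. Then
$$\begin{aligned} {\textbf{H}}_{\theta }^{-1}\left( \begin{array}{c} Z_1\\ Z_2 \end{array} \right) =\left( \begin{array}{c} n{\textbf{H}}^{-1}Z_1-\frac{{\textbf{u}}^\intercal {\textbf{v}}}{\beta }{\textbf{H}}^{-1}Z_2\\ \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n\beta }Z_2 \end{array} \right) , \end{aligned}$$
and the top-left, top-right and bottom-right covariance blocks are, respectively,
$$\begin{aligned}&n^2{\textbf{H}}^{-1}\frac{1}{n^2}{\textbf{S}}_1({\textbf{H}}^{-1})^\intercal +\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{\beta }\right) ^2\left( \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2{\textbf{H}}^{-1}{\textbf{S}}_2({\textbf{H}}^{-1})^\intercal ={\textbf{H}}^{-1}\left( {\textbf{S}}_1+{\textbf{S}}_2\right) ({\textbf{H}}^{-1})^\intercal ,\\&-\frac{{\textbf{u}}^\intercal {\textbf{v}}}{\beta }\cdot \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n\beta }\left( \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2{\textbf{H}}^{-1}{\textbf{S}}_2=-\frac{1}{n}{\textbf{H}}^{-1}{\textbf{S}}_2,\\&\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n\beta }\right) ^2\left( \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2{\textbf{S}}_2=\frac{1}{n^2}{\textbf{S}}_2, \end{aligned}$$
which agree with the blocks of the covariance matrix displayed above.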
The proof is complete. \(\square \)