Distributed stochastic compositional optimization problems over directed networks

Computational Optimization and Applications

Abstract

We study distributed stochastic compositional optimization problems over directed communication networks, in which each agent privately owns a stochastic compositional objective function and the agents collaborate to minimize the sum of all objective functions. We propose a distributed stochastic compositional gradient descent method, in which gradient tracking and stochastic correction techniques are employed to adapt to the networks’ directed structure and to increase the accuracy of the inner function estimation. When the objective function is smooth, the proposed method achieves the convergence rate \({\mathcal {O}}\left( k^{-1/2}\right) \) and sample complexity \({\mathcal {O}}\left( \frac{1}{\epsilon ^2}\right) \) for finding an \(\epsilon \)-stationary point. When the objective function is strongly convex, the convergence rate improves to \({\mathcal {O}}\left( k^{-1}\right) \). Moreover, the asymptotic normality of the Polyak-Ruppert averaged iterates of the proposed method is established. We demonstrate the empirical performance of the proposed method on a model-agnostic meta-learning problem and a logistic regression problem.

Data availability

The authors confirm that all data generated or analysed during this study are included in this published article.

Notes

  1. The underlying graph of a directed graph \({\mathcal {G}}^{'}\) is an undirected graph obtained by replacing all directed edges of \({\mathcal {G}}^{'}\) with undirected edges.
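As a small illustration of this definition, the following sketch builds the underlying undirected graph from a directed edge list. The function name `underlying_graph` and the edge-list representation are assumptions made for this example and do not come from the article.

```python
def underlying_graph(directed_edges):
    """Return the undirected edge set obtained by dropping edge directions.

    Each directed edge (i, j) becomes the unordered pair {i, j}; a pair of
    opposite edges (i, j) and (j, i) collapses to a single undirected edge.
    """
    return {frozenset(edge) for edge in directed_edges}


# Hypothetical directed graph on nodes 0..2 with one bidirectional link.
edges = [(0, 1), (1, 0), (1, 2)]
undirected = underlying_graph(edges)
```

Representing each undirected edge as a `frozenset` makes the two orientations of a link compare equal, which is exactly the identification the footnote describes.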

References

  1. Balasubramanian, K., Ghadimi, S., Nguyen, A.: Stochastic multi-level composition optimization algorithms with level-independent convergence rates. SIAM J. Optim. 32, 519–544 (2022)

  2. Bianchi, P., Fort, G., Hachem, W.: Performance of a distributed stochastic approximation algorithm. IEEE Trans. Inf. Theory 59, 7405–7418 (2013)

  3. Chen, T., Sun, Y., Yin, W.: Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization. IEEE Trans. Signal Process. 69, 4937–4948 (2021)

  4. Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25, 463–483 (1954)

  5. Dai, B., He, N., Pan, Y., Boots, B., Song, L.: Learning from conditional distributions via dual embeddings. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol. 54, PMLR, pp. 1458–1467 (2017)

  6. Dentcheva, D., Penev, S., Ruszczyński, A.: Statistical estimation of composite risk functionals and risk optimization problems. Ann. Inst. Stat. Math. 69, 737–760 (2017)

  7. Ermoliev, Y.M.: Methods of Stochastic Programming. Nauka, Moscow (1976)

  8. Ermoliev, Y.M., Norkin, V.I.: Sample average approximation method for compound stochastic optimization problems. SIAM J. Optim. 23, 2231–2263 (2013)

  9. Fabian, V.: On asymptotic normality in stochastic approximation. Ann. Math. Stat. 39, 1327–1332 (1968)

  10. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1126–1135 (2017)

  11. Gao, H., Huang, H.: Fast training method for stochastic compositional optimization problems. In: Advances in Neural Information Processing Systems, vol. 34, pp. 25334–25345 (2021)

  12. Gao, H., Li, J., Huang, H.: On the convergence of local stochastic compositional gradient descent with momentum. In: Proceedings of the 39th International Conference on Machine Learning, vol. 162, pp. 7017–7035 (2022)

  13. Ghadimi, S., Ruszczynski, A., Wang, M.: A single timescale stochastic approximation method for nested stochastic optimization. SIAM J. Optim. 30, 960–979 (2020)

  14. Ghadimi, S., Wang, M.: Approximation methods for bilevel programming, arXiv preprint arXiv:1802.02246 (2018)

  15. Guo, Z., Hu, Q., Zhang, L., Yang, T.: Randomized stochastic variance-reduced methods for multi-task stochastic bilevel optimization, arXiv preprint arXiv:2105.02266 (2021)

  16. Hong, M., Wai, H.-T., Wang, Z., Yang, Z.: A two-timescale stochastic algorithm framework for bilevel optimization: complexity analysis and application to actor-critic. SIAM J. Optim. 33, 147–180 (2023)

  17. Hu, Y., Chen, X., He, N.: Sample complexity of sample average approximation for conditional stochastic optimization. SIAM J. Optim. 30, 2103–2133 (2020)

  18. Huo, Z., Gu, B., Liu, J., Huang, H.: Accelerated method for stochastic composition optimization with nonsmooth regularization. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 3287–3294 (2018)

  19. Ji, K., Yang, J., Liang, Y.: Bilevel optimization: Convergence analysis and enhanced design. In: Proceedings of the 38th International Conference on Machine Learning, vol. 139, pp. 4882–4892 (2021)

  20. Jiang, W., Wang, B., Wang, Y., Zhang, L., Yang, T.: Optimal algorithms for stochastic multi-level compositional optimization. In: Proceedings of the 39th International Conference on Machine Learning, vol. 162, pp. 10195–10216 (2022)

  21. Lei, J., Chen, H.F., Fang, H.T.: Asymptotic properties of primal-dual algorithm for distributed stochastic optimization over random networks with imperfect communications. SIAM J. Control. Optim. 56, 2159–2188 (2018)

  22. Liu, L., Liu, J., Tao, D.: Variance reduced methods for non-convex composition optimization. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5813–5825 (2021)

  23. Morral, G., Bianchi, P., Fort, G., Jakubowicz, J.: Distributed stochastic approximation: the price of non-double stochasticity. In: 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers, pp. 1473–1477 (2012)

  24. Nedic, A.: Distributed gradient methods for convex machine learning problems in networks: distributed optimization. IEEE Signal Process. Mag. 37, 92–101 (2020)

  25. Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)

  26. Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30, 838–855 (1992)

  27. Pu, S., Shi, W., Xu, J., Nedic, A.: A push-pull gradient method for distributed optimization in networks. In: IEEE Conference on Decision and Control, pp. 3385–3390 (2018)

  28. Pu, S., Shi, W., Xu, J., Nedic, A.: Push-pull gradient methods for distributed optimization in networks. IEEE Trans. Autom. Control 66, 1–16 (2021)

  29. Qi, Q., Luo, Y., Xu, Z., Ji, S., Yang, T.: Stochastic optimization of areas under precision-recall curves with provable convergence. In: Advances in Neural Information Processing Systems, vol. 34, pp. 1752–1765 (2021)

  30. Qu, G., Li, N.: Harnessing smoothness to accelerate distributed optimization. IEEE Trans. Control Netw. Syst. 5, 1245–1260 (2018)

  31. Rakhlin, A., Shamir, O., Sridharan, K.: Making gradient descent optimal for strongly convex stochastic optimization. In: Proceedings of the 29th International Conference on Machine Learning, pp. 1571–1578 (2012)

  32. Ren, J., Haupt, J., Guo, Z.: Communication-efficient hierarchical distributed optimization for multi-agent policy evaluation. J. Comput. Sci. 49, 101280 (2021)

  33. Ruszczynski, A.: A stochastic subgradient method for nonsmooth nonconvex multilevel composition optimization. SIAM J. Control. Optim. 59, 2301–2320 (2021)

  34. Sahu, A.K., Kar, S., Moura, J.M.F., Poor, H.V.: Distributed constrained recursive nonlinear least-squares estimation: algorithms and asymptotics. IEEE Trans. Signal Inf. Process. Netw. 2, 426–441 (2016)

  35. Seneta, E.: Non-Negative Matrices and Markov Chains. Springer, New York (1981)

  36. Sha, X., Zhang, J., You, K., Zhang, K., Basar, T.: Fully asynchronous policy evaluation in distributed reinforcement learning over networks. Automatica 136, 1–11 (2022)

  37. Song, Z., Shi, L., Pu, S., Yan, M.: Compressed gradient tracking for decentralized optimization over general directed networks. IEEE Trans. Signal Process. 70, 1775–1787 (2022)

  38. Wang, B., Yuan, Z., Ying, Y., Yang, T.: Memory-based optimization methods for model-agnostic meta-learning, arXiv preprint arXiv:2106.04911 (2021)

  39. Wang, M., Fang, E.X., Liu, H.: Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. Math. Program. 161, 419–449 (2017)

  40. Wang, M., Liu, J., Fang, E.: Accelerating stochastic composition optimization. J. Mach. Learn. Res. 18, 1–23 (2017)

  41. Xin, R., Khan, U.A.: A linear algorithm for optimization over directed graphs with geometric convergence. IEEE Control Syst. Lett. 2, 315–320 (2018)

  42. Xin, R., Pu, S., Nedic, A., Khan, U.A.: A general framework for decentralized optimization with first-order methods. Proc. IEEE 108, 1869–1889 (2020)

  43. Yang, S., Wang, M., Fang, E.X.: Multilevel stochastic gradient methods for nested composition optimization. SIAM J. Optim. 29, 616–659 (2019)

  44. Yang, S., Zhang, X., Wang, M.: Decentralized gossip-based stochastic bilevel optimization over communication networks. In: Advances in Neural Information Processing Systems, vol. 35, pp. 238–252 (2022)

  45. Zhang, J., Xiao, L.: Multilevel composite stochastic optimization via nested variance reduction. SIAM J. Optim. 31, 1131–1157 (2021)

  46. Zhang, J., Xiao, L.: A stochastic composite gradient method with incremental variance reduction. In: Advances in Neural Information Processing Systems, pp. 9078–9088 (2019)

  47. Zhao, S., Chen, X.-M., Liu, Y.: Asymptotic properties of dual averaging algorithm for constrained distributed stochastic optimization. Syst. Control Lett. 165, 1–14 (2022)

  48. Zhao, S., Liu, Y.: Asymptotic properties of \({\mathcal {S}}\)-\({\mathcal {AB}}\) method with diminishing stepsize, arXiv preprint arXiv:2109.07981 (2021)

Acknowledgements

The authors thank Dr. Yuejiao Sun for sharing the code of SCSC [3]. The research is supported by the NSFC #11971090 and Fundamental Research Funds for the Central Universities DUT22LAB301.

Author information

Corresponding author

Correspondence to Yongchao Liu.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 5

Suppose that the positive sequence \(\{\alpha _k\}\) is nonincreasing and \(\lim _{k\rightarrow \infty }\frac{\alpha _k}{\alpha _{k+1}}=1\). Then for any \(0<\rho <1\), there exists a constant \(c>0\), independent of \(k\), such that

$$\begin{aligned} \sum _{t=1}^k\rho ^{k-t}\alpha _t\le c\alpha _k. \end{aligned}$$

Proof

Let \(\beta _k=\sum _{t=1}^{k}\rho ^{k-t}\alpha _t\). Then \(\beta _k=\rho \sum _{t=1}^{k-1}\rho ^{k-1-t}\alpha _t+\alpha _{k}=\rho \beta _{k-1}+\alpha _{k}\). Setting \(b_k=\beta _k/\alpha _{k}\), we obtain \(b_k=\rho \frac{\alpha _{k-1}}{\alpha _{k}}b_{k-1}+1\). Since \(\lim _{k\rightarrow \infty }\frac{\alpha _{k-1}}{\alpha _{k}}=1\) and \(\frac{2}{\rho +1}>1\), there exists an integer \(k_0>0\) such that \(\frac{\alpha _{k-1}}{\alpha _{k}}\le \frac{2}{\rho +1}\) for \(k>k_0\). Taking \(c=\max \left\{ \sup _{1\le k\le k_0}b_k,~\frac{\rho +1}{1-\rho }\right\} \), we have \(b_k\le c\) for \(k\le k_0\). We proceed by induction: suppose that \(b_{k-1}\le c\) for some \(k-1\ge k_0\). Then

$$\begin{aligned} b_k=\rho \frac{\alpha _{k-1}}{\alpha _{k}}b_{k-1}+1\le \frac{2\rho }{\rho +1} c+1\le \frac{2\rho }{\rho +1} c+\frac{1-\rho }{\rho +1}c=c. \end{aligned}$$

The proof is complete. \(\square \)
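The bound in Lemma 5 can be checked numerically. The sketch below is an illustration only: it assumes the step size \(\alpha _k=k^{-a}\) with \(a>0\) (one sequence satisfying the hypotheses), and the function name `lemma5_check` is hypothetical. It builds \(b_k\) through the recursion from the proof, locates \(k_0\) on a finite horizon, and forms the constant \(c\) exactly as in the proof.

```python
def lemma5_check(rho, a, K):
    """Check sum_{t<=k} rho^(k-t) * alpha_t <= c * alpha_k for alpha_k = k^(-a)."""
    alpha = [0.0] + [k ** (-a) for k in range(1, K + 1)]  # 1-indexed step sizes

    # b_k = beta_k / alpha_k, where beta_k = rho * beta_{k-1} + alpha_k.
    beta = 0.0
    b = [0.0] * (K + 1)
    for k in range(1, K + 1):
        beta = rho * beta + alpha[k]
        b[k] = beta / alpha[k]

    # k0: last index (within the horizon) where alpha_{k-1}/alpha_k > 2/(rho+1).
    threshold = 2.0 / (rho + 1.0)
    k0 = 1
    for k in range(2, K + 1):
        if alpha[k - 1] / alpha[k] > threshold:
            k0 = k

    # Constant from the proof: max of the early b_k and (rho+1)/(1-rho).
    c = max(max(b[1:k0 + 1]), (rho + 1.0) / (1.0 - rho))
    return c, max(b[1:]), k0


c, b_max, k0 = lemma5_check(rho=0.9, a=1.0, K=5000)
```

For this family of step sizes the ratio \(\alpha _{k-1}/\alpha _k\) decreases in \(k\), so the \(k_0\) found on a finite horizon is also valid beyond it; for other sequences that would need to be argued separately.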

1.1 Proof of Lemma 1

Proof

Under Assumption 2, the conditions of [37, Lemma 3] hold, so there exists an invertible matrix \({\textbf{A}}_*\in {\mathbb {R}}^{n\times n}\) such that

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\textbf{A}_*}={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \textbf{A}_*\left( {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\right) \textbf{A}_*^{-1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }<1, \end{aligned}$$

where \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\textbf{A}_*}\) is the matrix norm induced by the vector norm \(\Vert x\Vert _{\textbf{A}_*}:=\Vert {\textbf{A}}_*x\Vert \). Let \(\hat{{\textbf{A}}}={\textbf{A}}_*\otimes {\textbf{I}}_d\). Since \(\left( {\textbf{W}}_1\otimes {\textbf{W}}_2\right) ^{-1}={\textbf{W}}_1^{-1}\otimes {\textbf{W}}_2^{-1}\) for any invertible matrices \({\textbf{W}}_1\) and \({\textbf{W}}_2\), we have \(\hat{{\textbf{A}}}^{-1}={\textbf{A}}_*^{-1}\otimes {\textbf{I}}_d\). Therefore, the vector norm \(\Vert {\textbf{x}}\Vert _{\hat{{\textbf{A}}}}:=\Vert \hat{{\textbf{A}}}{\textbf{x}}\Vert \) is well defined and the corresponding induced matrix norm \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \cdot \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{A}}}}\) satisfies

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_d \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{A}}}}&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \hat{{\textbf{A}}}\left( \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \hat{{\textbf{A}}}^{-1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \left[ \textbf{A}_*\left( {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\right) \textbf{A}_*^{-1}\right] \otimes {\textbf{I}}_d \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \textbf{A}_*\left( {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\right) \textbf{A}_*^{-1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }<1. \end{aligned}$$

By a similar analysis, there exists \(\hat{{\textbf{B}}}\) such that

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}&={\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \hat{{\textbf{B}}}\left( \tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \hat{{\textbf{B}}}^{-1} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }<1. \end{aligned}$$

Inequality (12) follows from the equivalence of all norms on a finite-dimensional space. The proof is complete. \(\square \)
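The two middle equalities in the chain for \(\hat{{\textbf{A}}}\) can be spelled out as follows. This is a sketch under two assumptions that the appendix does not restate: that \(\tilde{{\textbf{A}}}={\textbf{A}}\otimes {\textbf{I}}_d\), and that the unsubscripted matrix norm is the spectral norm. Writing \({\textbf{M}}={\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\), so that \(\tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_d={\textbf{M}}\otimes {\textbf{I}}_d\), the mixed-product property \(({\textbf{W}}_1\otimes {\textbf{W}}_2)({\textbf{W}}_3\otimes {\textbf{W}}_4)=({\textbf{W}}_1{\textbf{W}}_3)\otimes ({\textbf{W}}_2{\textbf{W}}_4)\) gives

$$\begin{aligned} \hat{{\textbf{A}}}\left( {\textbf{M}}\otimes {\textbf{I}}_d\right) \hat{{\textbf{A}}}^{-1}=\left( {\textbf{A}}_*\otimes {\textbf{I}}_d\right) \left( {\textbf{M}}\otimes {\textbf{I}}_d\right) \left( {\textbf{A}}_*^{-1}\otimes {\textbf{I}}_d\right) =\left( {\textbf{A}}_*{\textbf{M}}{\textbf{A}}_*^{-1}\right) \otimes {\textbf{I}}_d. \end{aligned}$$

The singular values of \({\textbf{N}}\otimes {\textbf{I}}_d\) are those of \({\textbf{N}}\), each repeated \(d\) times, so taking the Kronecker product with \({\textbf{I}}_d\) leaves the spectral norm unchanged.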

1.2 Proof of Lemma 2

Proof

We first bound the consensus error \({\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\) in the mean-square sense. Note that for any random vectors \(\theta \), \(\theta ^{'}\) and any positive scalar \(\tau \),

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \left\| \theta +\theta ^{'}\right\| _*^2\right] \le (1+\tau ){\mathbb {E}}\left[ \Vert \theta \Vert _*^2\right] +\left( 1+\frac{1}{\tau }\right) {\mathbb {E}}\left[ \left\| \theta ^{'}\right\| _*^2\right] , \end{aligned} \end{aligned}$$
(29)

where the norm \(\Vert \cdot \Vert _*\) may be \(\Vert \cdot \Vert _{\hat{{\textbf{A}}}}\) or \(\Vert \cdot \Vert _{\hat{{\textbf{B}}}}\). Choosing

$$\begin{aligned} \theta =\left( \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right) ,\quad \theta ^{'}=-\alpha _k\left( \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k, \end{aligned}$$

we have \({\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}=\theta +\theta ^{'}\) and

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\Vert _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\le (1+\tau ){\mathbb {E}}\left[ \left\| \left( \tilde{{\textbf{A}}} -\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right) \right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\quad +\left( 1+\frac{1}{\tau }\right) {\mathbb {E}}\left[ \left\| \alpha _{k}\left( \tilde{{\textbf{A}}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\le \frac{1+\tau _{\textbf{A}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right\| _{\hat{{\textbf{A}}}}^2\right] +\alpha _{k}^2\frac{1+\tau _{\textbf{A}}^2}{1-\tau _{\textbf{A}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{A}}}}^2\overline{c}^2{\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right] , \end{aligned}$$
(30)

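where \(\tau _{{\textbf{A}}}\) is defined in (17) and \(\tau =(1-\tau _{{\textbf{A}}}^2)/(2\tau _{{\textbf{A}}}^2)\), so that \((1+\tau )\tau _{{\textbf{A}}}^2=\frac{1+\tau _{{\textbf{A}}}^2}{2}\) and \(1+\frac{1}{\tau }=\frac{1+\tau _{{\textbf{A}}}^2}{1-\tau _{{\textbf{A}}}^2}\); the last inequality also uses (12). By the definition of \({\textbf{y}}_k\) in (10),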

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right]&={\mathbb {E}}\left[ \left\| \sum _{t=1}^{k-1}\tilde{{\textbf{B}}}^{k-1-t}(\tilde{{\textbf{B}}}-{\textbf{I}}_{nd}){\textbf{H}}_t+{\textbf{H}}_{k}\right\| ^2\right] \\&\le \sum _{t_1=1}^{k}\sum _{t_2=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_2) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \Vert {\textbf{H}}_{t_1}\Vert \Vert {\textbf{H}}_{t_2}\Vert \right] , \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned}&\tilde{{\textbf{B}}}(k,t):=\tilde{{\textbf{B}}}^{k-1-t}(\tilde{{\textbf{B}}}-{\textbf{I}}_{nd})~(t\le k-1),\quad \tilde{{\textbf{B}}}(k,k):={\textbf{I}}_{nd}. \end{aligned} \end{aligned}$$
(31)

Obviously, \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,k) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }=1\), \({\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,k-1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\le \overline{c}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}\) and for \(t<k-1\),

$$\begin{aligned} \begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\le \overline{c}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}^{k-1-t}\left( \tilde{{\textbf{B}}}-{\textbf{I}}_{nd}\right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}&=\overline{c}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \left( \tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \tilde{{\textbf{B}}}^{k-2-t}\left( \tilde{{\textbf{B}}}-{\textbf{I}}_{nd}\right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}\\&\le \overline{c}\tau _{{\textbf{B}}} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}^{k-2-t}\left( \tilde{{\textbf{B}}}-{\textbf{I}}_{nd}\right) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}\\&\le \cdots \le \overline{c}\tau _{{\textbf{B}}}^{k-1-t} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}. \end{aligned} \end{aligned}$$

Denoting \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \), we have

$$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\le c_b\tau _{{\textbf{B}}}^{k-t} \end{aligned}$$
(32)

and

$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right]&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}{\mathbb {E}}\left[ \Vert {\textbf{H}}_{t_1}\Vert \Vert {\textbf{H}}_{t_2}\Vert \right] \nonumber \\&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}\frac{{\mathbb {E}}\left[ \Vert {\textbf{H}}_{t_1}\Vert ^2\right] +{\mathbb {E}}\left[ \Vert {\textbf{H}}_{t_2}\Vert ^2\right] }{2}\nonumber \\&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2} nC_gC_f\le \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}, \end{aligned}$$
(33)

where the third inequality follows from Assumption 1 (c). Substituting (33) into (30) yields

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\Vert _{\hat{{\textbf{A}}}}^2\right] \le \frac{1+\tau _{\textbf{A}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right\| _{\hat{{\textbf{A}}}}^2\right] +c_1\alpha _{k}^2, \end{aligned}$$
(34)

where \(c_1=\frac{1+\tau _{\textbf{A}}^2}{1-\tau _{\textbf{A}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}}-\frac{{\textbf{1}}{\textbf{u}}^\intercal }{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{A}}}}^2\overline{c}^2\frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}\).
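The expansion of \({\textbf{y}}_k\) used in the bound (33) can be sanity-checked numerically. Equation (10) is not reproduced in this appendix, so the sketch below assumes the standard gradient-tracking recursion \({\textbf{y}}_{1}={\textbf{H}}_1\), \({\textbf{y}}_{k+1}=\tilde{{\textbf{B}}}{\textbf{y}}_k+{\textbf{H}}_{k+1}-{\textbf{H}}_k\), whose unrolled form coincides with the sum above. The weight matrix, the random stand-ins for \({\textbf{H}}_t\), the scalar setting \(d=1\), and all function names are hypothetical.

```python
import random


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def track_recursive(B, H):
    """Run y_1 = H_1, y_{k+1} = B y_k + H_{k+1} - H_k; return [y_1, ..., y_K]."""
    ys = [list(H[0])]
    for k in range(1, len(H)):
        By = matvec(B, ys[-1])
        ys.append([By[i] + H[k][i] - H[k - 1][i] for i in range(len(By))])
    return ys


def track_closed_form(B, H, k):
    """Evaluate y_k = sum_{t=1}^{k-1} B^{k-1-t} (B - I) H_t + H_k (k is 1-indexed)."""
    y = list(H[k - 1])
    for t in range(1, k):
        BH = matvec(B, H[t - 1])
        term = [BH[i] - H[t - 1][i] for i in range(len(BH))]  # (B - I) H_t
        for _ in range(k - 1 - t):                            # apply B^{k-1-t}
            term = matvec(B, term)
        y = [y[i] + term[i] for i in range(len(y))]
    return y


random.seed(0)
# Hypothetical column-stochastic weights for n = 3 agents (each column sums to 1).
B = [[0.5, 0.0, 0.3],
     [0.5, 0.6, 0.0],
     [0.0, 0.4, 0.7]]
H = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]

ys = track_recursive(B, H)
max_gap = max(abs(a - b)
              for k in range(1, len(H) + 1)
              for a, b in zip(ys[k - 1], track_closed_form(B, H, k)))
# Column stochasticity preserves the total: sum_i y_{i,k} = sum_i H_{i,k}.
sum_gap = max(abs(sum(y) - sum(h)) for y, h in zip(ys, H))
```

Both gaps should vanish up to rounding. The second check reflects the identity \({\textbf{1}}^\intercal (\tilde{{\textbf{B}}}-{\textbf{I}}_{nd})={\textbf{0}}\), which is also what allows the rank-one term to be inserted in the first equality of the bound on \(\tilde{{\textbf{B}}}(k,t)\).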

Next, we bound the consensus error \(\Vert {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\Vert ^2\) in expectation. Set

$$\begin{aligned} \theta =\left( \tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right) ,\quad \theta ^{'}=\left( {\textbf{I}}_{nd}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{J}}_{k+1}-{\textbf{J}}_k\right) \end{aligned}$$

in (29). By the definitions of \({\textbf{y}}_{k+1}^{'}\) and \({\bar{y}}_{k+1}^{'}\), we have \({\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}=\theta +\theta ^{'}\) and

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right] \nonumber \\&\le (1+\tau ){\mathbb {E}}\left[ \!\left\| \left( \!\tilde{{\textbf{B}}}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d}\!\right) \left( \!{\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\!\right) \right\| _{\hat{{\textbf{B}}}}^2\!\right] +\left( 1+\frac{1}{\tau }\right) {\mathbb {E}}\left[ \!\left\| \left( {\textbf{I}}_{nd}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{J}}_{k+1}-{\textbf{J}}_k\right) \right\| _{\hat{{\textbf{B}}}}^2\!\right] \nonumber \\&\le \frac{1+\tau _{\textbf{B}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] +2\frac{1+\tau _{\textbf{B}}^2}{1-\tau _{\textbf{B}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{I}}_{nd}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}^2\overline{c}^2{\mathbb {E}}\left[ \left\| {\textbf{J}}_{k+1}-{\textbf{J}}_k\right\| ^2\right] , \end{aligned}$$
(35)

where \(\tau _{{\textbf{B}}}\) is defined in (17), the second inequality follows from the setting \(\tau =(1-\tau _{{\textbf{B}}}^2)/(2\tau _{{\textbf{B}}}^2)\) and (12). For the term \({\mathbb {E}}\left[ \left\| {\textbf{J}}_{k+1}-{\textbf{J}}_k\right\| ^2\right] \),

$$\begin{aligned}&{\mathbb {E}}\left[ \left\| {\textbf{J}}_{k+1}-{\textbf{J}}_k\right\| ^2\right] =\sum _{j=1}^n {\mathbb {E}}\left[ \left\| \nabla g_j(x_{j,k+1})\nabla f_j(z_{j,k+1})-\nabla g_j(x_{j,k})\nabla f_j(z_{j,k})\right\| ^2\right] \nonumber \\&\le 2\sum _{j=1}^n\ {\mathbb {E}}\left[ \left\| \left( \nabla g_j(x_{j,k+1})-\nabla g_j(x_{j,k})\right) \nabla f_j(z_{j,k+1})\right\| ^2\right. \nonumber \\&\qquad \left. +\left\| \nabla g_j(x_{j,k})\left( \nabla f_j(z_{j,k})-\nabla f_j(z_{j,k+1})\right) \right\| ^2\right] \nonumber \\&\le 2C_fL_g^2{\mathbb {E}}\left[ \left\| {\textbf{x}}_{k+1}-{\textbf{x}}_k\right\| ^2\right] +2C_gL_f^2{\mathbb {E}}\left[ \left\| {\textbf{z}}_{k+1}-{\textbf{z}}_k\right\| ^2\right] \nonumber \\&\le \left( 2C_fL_g^2+8C_g^2L_f^2\right) {\mathbb {E}}\left[ \left\| {\textbf{x}}_{k+1}-{\textbf{x}}_k\right\| ^2\right] +8\beta _k^2C_gL_f^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +8\beta _k^2C_gL_f^2V_g\nonumber \\&=\left( 2C_fL_g^2+8C_g^2L_f^2\right) {\mathbb {E}}\left[ \left\| \left( \tilde{{\textbf{A}}}-{\textbf{I}}_{nd}\right) \left( {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right) -\alpha _k\tilde{{\textbf{A}}}{\textbf{y}}_k\right\| ^2\right] \nonumber \\&\qquad +8\beta _k^2C_gL_f^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +8\beta _k^2C_gL_f^2V_g\nonumber \\&\le \left( 4C_fL_g^2+16C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2 \overline{c}^2{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\qquad +\left( 4C_fL_g^2+16C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2\alpha _k^2{\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right] \nonumber 
\\&\qquad +8\beta _k^2C_gL_f^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +8\beta _k^2C_gL_f^2V_g, \end{aligned}$$
(36)

where \({\textbf{g}}_k=\left[ g_1(x_{1,k})^\intercal ,\cdots , g_n(x_{n,k})^\intercal \right] ^\intercal \) and \(V_g\) is defined in Assumption 1 (d). The second inequality follows from Assumption 1 (a) and (c), the third inequality follows from the definition of \({\textbf{z}}_k\) and Assumption 1 (c) and (d), and the second equality follows from the fact \(\left( \tilde{{\textbf{A}}}-{\textbf{I}}_{nd}\right) ({\textbf{1}}\otimes {\bar{x}}_k)={\textbf{0}}\).

Substituting (33) and (36) into (35) yields

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right]&\le \frac{1+\tau _{\textbf{B}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] +c_2{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_{k}\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\quad +c_3\alpha _k^2+c_4\beta _k^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +c_4V_g\beta _k^2, \end{aligned}$$
(37)

where the constants

$$\begin{aligned} \begin{aligned}&c_2=8\frac{1+\tau _{{\textbf {B}}}^2}{1-\tau _{{\textbf {B}}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {{\textbf {I}}}_{nd}- \frac{{{\textbf {v}}}{{\textbf {1}}}^\intercal }{n}\otimes {{\textbf {I}}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{{\textbf {B}}}}}^2\overline{c}^4\left( C_fL_g^2+4C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{{\textbf {A}}}}-{{\textbf {I}}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2,\\ {}&c_3=8\frac{1+\tau _{{\textbf {B}}}^2}{1- \tau _{{\textbf {B}}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {{\textbf {I}}}_{nd}- \frac{{{\textbf {v}}}{{\textbf {1}}}^\intercal }{n}\otimes {{\textbf {I}}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{{\textbf {B}}}}}^2\overline{c}^2\left( C_fL_g^2+4C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {{\textbf {A}}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2 \frac{c_b^2nC_gC_f}{(1- \tau _{{{\textbf {B}}}})^2},\\ {}&c_4=16\frac{1+\tau _{{\textbf {B}}}^2}{1-\tau _{{\textbf {B}}}^2}{\left| \hspace{- 1.0625pt}\left| \hspace{-1.0625pt}\left| {{\textbf {I}}}_{nd}-\frac{{{\textbf {v}}}{{\textbf {1}}}^\intercal }{n}\otimes {{\textbf {I}}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{{\textbf {B}}}}}^2\overline{c}^2C_gL_f^2. \end{aligned} \end{aligned}$$
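For readers tracing these constants, each one is the prefactor of \({\mathbb {E}}\left[ \left\| {\textbf{J}}_{k+1}-{\textbf{J}}_k\right\| ^2\right] \) in (35) multiplied by the matching coefficient in (36), with (33) used for \(c_3\). The shorthand \(\kappa _{{\textbf{B}}}\) below is introduced only for this remark and is not notation from the article:

$$\begin{aligned} \kappa _{{\textbf{B}}}&:=2\frac{1+\tau _{\textbf{B}}^2}{1-\tau _{\textbf{B}}^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{I}}_{nd}-\frac{{\textbf{v}}{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_{d} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}^2\overline{c}^2,\\ c_2&=\kappa _{{\textbf{B}}}\left( 4C_fL_g^2+16C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2\overline{c}^2,\\ c_3&=\kappa _{{\textbf{B}}}\left( 4C_fL_g^2+16C_g^2L_f^2\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2\frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2},\\ c_4&=\kappa _{{\textbf{B}}}\cdot 8C_gL_f^2. \end{aligned}$$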

Lastly, we show (14) by combining (34) with (37). Multiplying both sides of inequality (37) by \(c_5=\frac{1-\tau _{\textbf{A}}^2}{4c_2}\) gives

$$\begin{aligned} \begin{aligned} c_5{\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right]&\le \frac{1+\tau _{\textbf{B}}^2}{2}c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] +\frac{1-\tau _{\textbf{A}}^2}{4}{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] +c_3c_5\alpha _k^2\\&\quad +c_5c_4\beta _k^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +c_5c_4\beta _k^2V_g. \end{aligned} \end{aligned}$$

Adding the above inequality to (34), we have

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\Vert _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right] \\&\le \frac{3+\tau _{\textbf{A}}^2}{4}{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] +\frac{1+\tau _{\textbf{B}}^2}{2}c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}_{k}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] +(c_1+c_3c_5)\alpha _k^2\\&\quad +c_5c_4\beta _k^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] +c_5c_4V_g\beta _k^2\\&\le \rho ^{k} \left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) +(c_1+c_3c_5)\sum _{t=1}^k\rho ^{k-t}\alpha _t^2\\&\quad +c_5c_4\sum _{t=1}^k\rho ^{k-t}\beta _t^2{\mathbb {E}}\left[ \left\| {\textbf{g}}_{t}-{\textbf{z}}_t\right\| ^2\right] +c_5c_4V_g\sum _{t=1}^k\rho ^{k-t}\beta _t^2, \end{aligned} \end{aligned}$$

where \(\rho =\max \left\{ \frac{1+\tau _{\textbf{B}}^2}{2},~\frac{3+\tau _{\textbf{A}}^2}{4}\right\} \). Moreover, by (34) and Lemma 5,

$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right]&\le \frac{1+\tau _{\textbf{A}}^2}{2}{\mathbb {E}}\left[ \left\| {\textbf{x}}_{k-1}-{\textbf{1}}\otimes {\bar{x}}_{k-1}\right\| _{\hat{{\textbf{A}}}}^2\right] +c_1\alpha _{k-1}^2\\&\cdots \\&\le \left( \frac{1+\tau _{\textbf{A}}^2}{2}\right) ^{k-1}{\mathbb {E}}\left[ \left\| {\textbf{x}}_{1}-{\textbf{1}}\otimes {\bar{x}}_{1}\right\| _{\hat{{\textbf{A}}}}^2\right] +c_1\sum _{t=1}^{k-1}\left( \frac{1+\tau _{\textbf{A}}^2}{2}\right) ^{k-1-t}\alpha _{t}^2\\&\le \left( \frac{1+\tau _{\textbf{A}}^2}{2}\right) ^{k-1}{\mathbb {E}}\left[ \left\| {\textbf{x}}_{1}-{\textbf{1}}\otimes {\bar{x}}_{1}\right\| _{\hat{{\textbf{A}}}}^2\right] +c_1c_{\tau }\alpha _{k-1}^2, \end{aligned}$$

where \(c_{\tau }>0\) is some constant. Since \(\lim _{k\rightarrow \infty }\frac{\alpha _{k}}{\alpha _{k+1}}=1\), there exists a positive constant \(U_1\) such that

$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \le U_1\alpha _{k}^2. \end{aligned}$$
(38)

The proof is complete. \(\square \)
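The passage from the geometric recursion to (38) can be illustrated numerically. The following is a minimal sketch and not part of the original analysis: it iterates the scalar recursion \(e_{k+1}=qe_k+c_1\alpha _k^2\) with equality (the worst case of the inequality), under the assumption that the step size has the form \(\alpha _k=a/(k+b)^p\). The function name and all parameter values are hypothetical choices made for illustration.

```python
def ratio_to_squared_step(q, c1, a, b, p, e1, num_steps):
    """Iterate e_{k+1} = q*e_k + c1*alpha_k**2 with alpha_k = a/(k+b)**p.

    Returns (largest, last) values of e_k / alpha_k**2 over k = 1..num_steps.
    A finite `largest` plays the role of the constant U_1 in (38).
    """
    def alpha(k):
        return a / (k + b) ** p

    e = e1
    largest = last = e / alpha(1) ** 2
    for k in range(1, num_steps):
        e = q * e + c1 * alpha(k) ** 2      # worst case of the recursion
        last = e / alpha(k + 1) ** 2        # ratio at step k+1
        largest = max(largest, last)
    return largest, last


# Illustrative values only: q stands in for (1 + tau_A^2)/2.
largest, last = ratio_to_squared_step(q=0.9, c1=1.0, a=1.0, b=0.0, p=1.0,
                                      e1=1.0, num_steps=20000)
print(largest, last)
```

In this toy setting the ratio peaks during the initial transient and then settles near \(c_1/(1-q)\), so a single constant bounds it for every k, which is what (38) asserts.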

1.3 Proof of Lemma 3

Proof

By the definitions of \({\textbf{z}}_{k+1}\) and \({\textbf{g}}_{k+1}\),

$$\begin{aligned} {\textbf{z}}_{k+1}- {\textbf{g}}_{k+1} =(1-\beta _k)\left( {\textbf{z}}_k-{\textbf{g}}_k\right) +({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)}). \nonumber \\ \end{aligned}$$
(39)

Then

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{z}}_{k+1}- {\textbf{g}}_{k+1}\Vert ^2\right] \nonumber \\&= (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +{\mathbb {E}}\left[ \Vert ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\Vert ^2\right] \nonumber \\&\quad + 2{\mathbb {E}}\left[ \left\langle (1-\beta _k)({\textbf{z}}_k-{\textbf{g}}_k),({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\right\rangle \right] \nonumber \\&= (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +{\mathbb {E}}\left[ \Vert ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\Vert ^2\right] , \end{aligned}$$
(40)

where the second equality follows from the fact

$$\begin{aligned}{} & {} {\mathbb {E}}\left[ ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\right] \\{} & {} \qquad ={\mathbb {E}}\left[ {\mathbb {E}}\left[ ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\bigg |{\mathcal {F}}_k^{'}\right] \right] ={\textbf{0}} \end{aligned}$$

with

$$\begin{aligned} \begin{aligned}&{\mathcal {F}}_1^{'}=\sigma \left( x_{i,1}, z_{i,1}, \phi _{i,1},\zeta _{i,1}:i\in {\mathcal {V}}\right) ,\\&{\mathcal {F}}_k^{'}=\sigma \left( \{x_{i,1},z_{i,1}, \phi _{i,t},\zeta _{i,t}:i\in {\mathcal {V}}, 1\le t\le k\}\cup \{\phi _{i,t}^{'}:i\in {\mathcal {V}}, 2\le t\le k\}\right) (k\ge 2). \end{aligned}\nonumber \\ \end{aligned}$$
(41)

For the second term on the right hand side of (40),

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert ({\textbf{G}}_{k+1}^{(1)}-{\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{G}}_{k+1}^{(2)})\Vert ^2\right] \\&={\mathbb {E}}\left[ \Vert (1-\beta _k)({\textbf{G}}_{k+1}^{(1)}-{\textbf{G}}_{k+1}^{(2)})+\beta _k( {\textbf{G}}_{k+1}^{(1)}- {\textbf{g}}_{k+1})+(1-\beta _k)({\textbf{g}}_k-{\textbf{g}}_{k+1}) \Vert ^2\right] \\&\le 3(1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{G}}_{k+1}^{(1)}-{\textbf{G}}_{k+1}^{(2)}\Vert ^2\right] +3\beta _k^2{\mathbb {E}}\left[ \Vert {\textbf{G}}_{k+1}^{(1)}- {\textbf{g}}_{k+1}\Vert ^2\right] \\&\quad +3(1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{g}}_{k+1} \Vert ^2\right] \\&\le 6(1-\beta _k)^2C_g{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{x}}_k\Vert ^2\right] +3V_g\beta _k^2, \end{aligned}$$

where the second inequality follows from conditions (c) and (d) in Assumption 1. Substituting the above inequality into (40),

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{z}}_{k+1}- {\textbf{g}}_{k+1}\Vert ^2\right] \nonumber \\&\le (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +6(1-\beta _k)^2C_g{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{x}}_k\Vert ^2\right] +3V_g\beta _k^2\nonumber \\&=(1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +6(1-\beta _k)^2C_g{\mathbb {E}}\nonumber \\&\quad \left[ \left\| \left( \tilde{{\textbf{A}}}-{\textbf{I}}_{nd}\right) \left( {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right) -\alpha _k\tilde{{\textbf{A}}}{\textbf{y}}_k\right\| ^2\right] +3V_g\beta _k^2\nonumber \\&\le (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +12(1-\beta _k)^2C_g\overline{c}^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\quad +12(1-\beta _k)^2C_g\alpha _k^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2{\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right] +3V_g\beta _k^2\nonumber \\&\le (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +12C_g\overline{c}^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2{\mathbb {E}}\left[ \left\| {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\right\| _{\hat{{\textbf{A}}}}^2\right] \nonumber \\&\quad +\frac{12c_b^2nC_g^2C_f{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2}{(1-\tau _{{\textbf{B}}})^2}\alpha _k^2 +3V_g\beta _k^2, \end{aligned}$$
(42)

where \(\overline{c}\) is defined in (12), the equality follows from the fact \(\left( \tilde{{\textbf{A}}}-{\textbf{I}}_{nd}\right) ({\textbf{1}}\otimes {\bar{x}}_k)={\textbf{0}}\) by the row stochasticity of \({\textbf{A}}\), and the last inequality follows from (33). Substituting (38) into (42), we have

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\textbf{z}}_{k+1}- {\textbf{g}}_{k+1}\Vert ^2\right]&\le (1-\beta _k)^2{\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] \\&\quad +\left( 12C_g\overline{c}^2{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{A}}}-{\textbf{I}}_{nd} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2U_1+\frac{12c_b^2nC_g^2C_f{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{A}} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2}{(1-\tau _{{\textbf{B}}})^2}\right) \alpha _k^2+3V_g\beta _k^2. \end{aligned}$$

The proof is complete. \(\square \)
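The bound on the second term of (40) rests on an algebraic regrouping followed by the elementary inequality \(\Vert a+b+c\Vert ^2\le 3(\Vert a\Vert ^2+\Vert b\Vert ^2+\Vert c\Vert ^2)\). Below is a minimal sketch that checks both steps on random vectors. The lists `G1`, `G2`, `g_k` and `g_next` are stand-ins for \({\textbf{G}}_{k+1}^{(1)}\), \({\textbf{G}}_{k+1}^{(2)}\), \({\textbf{g}}_k\) and \({\textbf{g}}_{k+1}\); the helper names and the random data are assumptions made for illustration.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(x * x for x in v)


def split_check(G1, G2, g_k, g_next, beta):
    """Return (||original||^2, ||regrouped||^2, three-term bound)."""
    # Original form: (G1 - g_next) + (1 - beta) * (g_k - G2)
    original = [(p - r) + (1 - beta) * (s - q)
                for p, q, s, r in zip(G1, G2, g_k, g_next)]
    # Regrouped form used in the proof of Lemma 3
    t1 = [(1 - beta) * (p - q) for p, q in zip(G1, G2)]
    t2 = [beta * (p - r) for p, r in zip(G1, g_next)]
    t3 = [(1 - beta) * (s - r) for s, r in zip(g_k, g_next)]
    regrouped = [x + y + z for x, y, z in zip(t1, t2, t3)]
    bound = 3 * (sq_norm(t1) + sq_norm(t2) + sq_norm(t3))
    return sq_norm(original), sq_norm(regrouped), bound


random.seed(0)
dim, beta = 5, 0.3
G1, G2, g_k, g_next = ([random.gauss(0, 1) for _ in range(dim)] for _ in range(4))
lhs, regrouped, bound = split_check(G1, G2, g_k, g_next, beta)
print(abs(lhs - regrouped) < 1e-9, lhs <= bound)
```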

1.4 Proof of Lemma 4

Proof

We first show part (i). By the definition of \(\xi _k\),

$$\begin{aligned} \xi _k=\sum _{t=1}^{k-1}\tilde{{\textbf{B}}}^{k-1-t}(\tilde{{\textbf{B}}}-{\textbf{I}}_{nd})\epsilon _t+\epsilon _{k}=\sum _{t=1}^{k}\tilde{{\textbf{B}}}(k,t)\epsilon _t, \end{aligned}$$
(43)

where \(\epsilon _t:={\textbf{H}}_t-{\textbf{J}}_t\), \({\textbf{H}}_t\) and \({\textbf{J}}_t\) are defined in (10) and Lemma 2 respectively, and \(\tilde{{\textbf{B}}}(k,t)\) is defined in (31). Then we have

$$\begin{aligned} {\mathbb {E}}[\Vert \xi _k\Vert ^2]&\le \sum _{t_1=1}^{k}\sum _{t_2=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_2) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \Vert \epsilon _{t_1}\Vert \Vert \epsilon _{t_2}\Vert \right] \nonumber \\&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}{\mathbb {E}}\left[ \Vert \epsilon _{t_1}\Vert \Vert \epsilon _{t_2}\Vert \right] \nonumber \\&\le c_b^2\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}\frac{{\mathbb {E}}\left[ \Vert \epsilon _{t_1}\Vert ^2+\Vert \epsilon _{t_2}\Vert ^2\right] }{2}, \end{aligned}$$
(44)

where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \), the second inequality follows from (32). By the definition of \(\epsilon _k\),

$$\begin{aligned} {\mathbb {E}}\left[ \Vert \epsilon _{k}\Vert ^2\right]&=\sum _{j=1}^n{\mathbb {E}}\left[ \Vert \nabla G_j(x_{j,k};\phi _{j,k})\nabla F_j(z_{j,k};\zeta _{j,k})-\nabla g_j(x_{j,k})\nabla f_j(z_{j,k})\Vert ^2\right] \nonumber \\&\le 2\sum _{j=1}^n\left( {\mathbb {E}}\left[ \Vert \nabla G_j(x_{j,k};\phi _{j,k})\Vert ^2\Vert \nabla F_j(z_{j,k};\zeta _{j,k})\Vert ^2\right] +C_fC_g\right) \nonumber \\&= 2\sum _{j=1}^n\left( {\mathbb {E}}\left[ {\mathbb {E}}\left[ \Vert \nabla G_j(x_{j,k};\phi _{j,k})\Vert ^2\Vert \nabla F_j(z_{j,k};\zeta _{j,k})\Vert ^2\big |{\mathcal {F}}_k,\zeta _{j,k}\right] \right] +C_fC_g\right) \nonumber \\&\le 2\sum _{j=1}^n\left( C_g{\mathbb {E}}\left[ \Vert \nabla F_j(z_{j,k};\zeta _{j,k})\Vert ^2\right] +C_fC_g\right) \le 4n C_fC_g, \end{aligned}$$
(45)

where

$$\begin{aligned} \begin{aligned}&{\mathcal {F}}_1=\sigma \{x_{i,1}, z_{i,1}:i\in {\mathcal {V}}\},\\&{\mathcal {F}}_k=\sigma \left( \{x_{i,1},z_{i,1}, \phi _{i,t},\zeta _{i,t}:i\in {\mathcal {V}}, 1\le t\le k-1\}\cup \{\phi _{i,t}^{'}:i\in {\mathcal {V}}, 2\le t\le k\}\right) (k\ge 2). \end{aligned}\nonumber \\ \end{aligned}$$
(46)

Substituting (45) into (44) gives \({\mathbb {E}}[\Vert \xi _k\Vert ^2] \le 4c_b^2n C_fC_g\sum _{t_1=1}^{k}\sum _{t_2=1}^{k}\tau _{{\textbf{B}}}^{2k-t_1-t_2}\le \frac{4c_b^2n C_fC_g}{(1-\tau _{{\textbf{B}}})^2}\), which proves part (i).
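The last step of part (i) uses the fact that the double geometric sum factorizes as \(\left( \sum _{t=1}^{k}\tau _{{\textbf{B}}}^{k-t}\right) ^2\le (1-\tau _{{\textbf{B}}})^{-2}\). A small sketch of this check follows; the function name and the values of `tau` and `k` are illustrative assumptions.

```python
def double_geometric_sum(tau, k):
    """sum_{t1=1}^{k} sum_{t2=1}^{k} tau**(2k - t1 - t2), computed term by term."""
    return sum(tau ** (2 * k - t1 - t2)
               for t1 in range(1, k + 1)
               for t2 in range(1, k + 1))


tau, k = 0.8, 30
direct = double_geometric_sum(tau, k)
closed_form = ((1 - tau ** k) / (1 - tau)) ** 2   # square of a single geometric sum
uniform_bound = 1 / (1 - tau) ** 2                # bound that does not depend on k
print(direct, closed_form, uniform_bound)
```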

We next show part (ii). By (43),

$$\begin{aligned}&\left| {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \right| \\&=\left| \sum _{t=1}^{k}{\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \right] \right| \\&=\left| \sum _{t=1}^{k-1}{\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\langle \sum _{l=t+1}^{k}\left( \nabla h({\bar{x}}_l)-\nabla h({\bar{x}}_{l-1})\right) +\nabla h({\bar{x}}_t),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \bigg |{\mathcal {F}}_t\right] \right] \right. \\&\quad \left. +{\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,k)\epsilon _k\right\rangle \bigg |{\mathcal {F}}_k\right] \right] \right| \\&=\left| \sum _{t=1}^{k-1}{\mathbb {E}}\left[ \left\langle \sum _{l=t+1}^{k}\left( \nabla h({\bar{x}}_l)-\nabla h({\bar{x}}_{l-1})\right) ,\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \right] \right| \\&\le \frac{\Vert {\textbf{u}}\Vert Lc_b}{n}\sum _{t=1}^{k-1}\tau _{{\textbf{B}}}^{k-t}\sum _{l=t+1}^{k}{\mathbb {E}}\left[ \left\| {\bar{x}}_l-{\bar{x}}_{l-1}\right\| \left\| \epsilon _t\right\| \right] \\&\le \frac{\Vert {\textbf{u}}\Vert ^2Lc_b}{n^2}\sum _{t=1}^{k-1}\tau _{{\textbf{B}}}^{k-t}\sum _{l=t+1}^{k}\alpha _l{\mathbb {E}}\left[ \left\| {\textbf{y}}_l\right\| \left\| \epsilon _t\right\| \right] , \end{aligned}$$

where the third equality holds as \(\{\epsilon _t\}\) is a martingale difference sequence, the first inequality follows from (32) and the last inequality follows from the fact \({\bar{x}}_{k+1}={\bar{x}}_{k}-\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\). By (33) and (45),

$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{y}}_l\right\| \left\| \epsilon _t\right\| \right] \le \frac{{\mathbb {E}}\left[ \left\| {\textbf{y}}_l\right\| ^2\right] +{\mathbb {E}}\left[ \left\| \epsilon _t\right\| ^2\right] }{2}\le \frac{c_b^2n\ C_fC_g}{2(1-\tau _{{\textbf{B}}})^2}+2nC_fC_g. \end{aligned}$$
(47)

Let \(U=\frac{\Vert {\textbf{u}}\Vert ^2Lc_b}{n}\left( \frac{c_b^2\ C_fC_g}{2(1-\tau _{{\textbf{B}}})^2}+2C_fC_g\right) \). Then

$$\begin{aligned} {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right]{} & {} \le (1-\tau _{{\textbf{B}}})U \sum _{t=1}^{k-1}\tau _{{\textbf{B}}}^{k-t}\sum _{l=t+1}^{k}\alpha _l\\{} & {} =(1-\tau _{{\textbf{B}}})U\sum _{t=2}^{k}\alpha _t\tau _{{\textbf{B}}}^{k-t}\left( \sum _{l=1}^{t-1}\tau _{{\textbf{B}}}^l\right) \le U c \alpha _k, \end{aligned}$$

where the last inequality follows from the fact \((1-\tau _{{\textbf{B}}})\left( \sum _{l=1}^{t-1}\tau _{{\textbf{B}}}^l\right) \le 1\) and Lemma 5. Part (ii) holds. The proof is complete. \(\square \)
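The equality in the last display is an exchange of the order of summation. The sketch below checks it numerically and also reports the ratio of \((1-\tau _{{\textbf{B}}})\) times the nested sum to \(\alpha _k\), which stays bounded in line with Lemma 5. The step size \(\alpha _t=1/t\), the value of `tau`, and the function names are assumptions chosen for illustration only.

```python
def nested_sum(alpha, tau, k):
    """sum_{t=1}^{k-1} tau**(k-t) * sum_{l=t+1}^{k} alpha(l)."""
    return sum(tau ** (k - t) * sum(alpha(l) for l in range(t + 1, k + 1))
               for t in range(1, k))


def exchanged_sum(alpha, tau, k):
    """sum_{t=2}^{k} alpha(t) * tau**(k-t) * sum_{l=1}^{t-1} tau**l."""
    return sum(alpha(t) * tau ** (k - t) * sum(tau ** l for l in range(1, t))
               for t in range(2, k + 1))


def alpha(t):
    # Hypothetical diminishing step size, used only for this check.
    return 1.0 / t


tau = 0.7
for k in (5, 50, 500):
    lhs = nested_sum(alpha, tau, k)
    rhs = exchanged_sum(alpha, tau, k)
    ratio = (1 - tau) * lhs / alpha(k)   # remains bounded as k grows
    print(k, abs(lhs - rhs) < 1e-9, ratio)
```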

1.5 Proof of Theorem 1

Proof

We first estimate the upper bound of \(\nabla h({\bar{x}}_k)\) in expectation. Noting that h(x) is \(L\left( = C_gL_f + C_f^{1/2}L_g\right) \)-smooth [46],

$$\begin{aligned} h({\bar{x}}_{k+1})&\le h({\bar{x}}_k)+\langle \nabla h({\bar{x}}_k),{\bar{x}}_{k+1}-{\bar{x}}_k\rangle +\frac{L}{2}\Vert {\bar{x}}_{k+1}-{\bar{x}}_k\Vert ^2\\&=h({\bar{x}}_k)-\left\langle \nabla h({\bar{x}}_k),\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \right\rangle +\frac{L}{2}\left\| \alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| ^2\\&=h({\bar{x}}_k)-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\Vert \nabla h({\bar{x}}_k)\Vert ^2+\frac{L}{2}\left\| \alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| ^2\\&\quad +\left\langle \nabla h({\bar{x}}_k),\alpha _{k}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\nabla h({\bar{x}}_k)-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \right) \right\rangle , \end{aligned}$$

where the first equality follows from the fact that

$$\begin{aligned} {\bar{x}}_{k+1}={\bar{x}}_{k}-\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k={\bar{x}}_{k}-\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) . \end{aligned}$$

Taking expectations on both sides of the above inequality,

$$\begin{aligned} {\mathbb {E}}\left[ h({\bar{x}}_{k+1})\right]&\le {\mathbb {E}}\left[ h({\bar{x}}_k)\right] -\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{L}{2}{\mathbb {E}}\left[ \left\| \alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| ^2\right] \nonumber \\&\quad +{\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\alpha _{k}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\nabla h({\bar{x}}_k)-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \right) \right\rangle \right] . \end{aligned}$$
(48)

For the third term on the right-hand side of (48),

$$\begin{aligned} \frac{L}{2}{\mathbb {E}}\left[ \left\| \alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_k\right\| ^2\right] \le \frac{L\alpha _{k}^2\Vert {\textbf{u}}\Vert ^2}{2n^2}{\mathbb {E}}\left[ \left\| {\textbf{y}}_k\right\| ^2\right] \le \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}\alpha _{k}^2, \end{aligned}$$
(49)

where the second inequality follows from (33).

For the fourth term on the right-hand side of (48),

$$\begin{aligned}&{\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),\alpha _{k}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\nabla h({\bar{x}}_k)-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \right) \right\rangle \right] \nonumber \\&\le \frac{\alpha _k^2}{2\tau }{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{3\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\right) ^2{\mathbb {E}}\left[ \Vert P_1\Vert ^2\right] +\frac{3\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\right) ^2{\mathbb {E}}\left[ \Vert P_2\Vert ^2\right] \nonumber \\&\quad +\frac{3\tau \Vert {\textbf{u}}\Vert ^2}{2n^2}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] +\frac{\alpha _{k}\Vert {\textbf{u}}\Vert }{n}\left| {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \right| \nonumber \\&\le \frac{\alpha _k^2}{2\tau }{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{3\tau L^2}{2n}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\right) ^2{\mathbb {E}}\left[ \Vert {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\Vert ^2\right] \nonumber \\&\quad +\frac{3\tau C_gL_f^2}{2n}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n}\right) ^2{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] \nonumber \\&\quad +\frac{3\tau \Vert {\textbf{u}}\Vert ^2}{2n^2}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] +\frac{\alpha _{k}\Vert {\textbf{u}}\Vert }{n}\left| {\mathbb {E}}\left[ \left\langle \nabla h({\bar{x}}_k),-\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \right| \nonumber \\&\le \frac{\alpha _k^2}{2\tau }{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{3\tau L^2n}{2}{\mathbb {E}}\left[ \Vert {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\Vert ^2\right] +\frac{3\tau C_gL_f^2n}{2}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] \nonumber \\&\quad +\frac{3\tau }{2}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] +U_2\alpha _k^2, \end{aligned}$$
(50)

where \(P_1=\nabla h({\bar{x}}_k)-\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(g_j(x_{j,k}))\), \(P_2=\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(g_j(x_{j,k}))-{\bar{y}}_k^{'}\) and \(\tau \) can be any positive scalar, the first inequality follows from the Cauchy-Schwarz inequality and the fact \(ab\le \frac{1}{2\tau }a^2+\frac{\tau }{2}b^2\), the second inequality follows from the Lipschitz continuity of \(\nabla g_j(\cdot )\nabla f_j(g_j(\cdot ))\), Assumption 1 and the fact \({\bar{y}}_k^{'}=\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(z_{j,k})\), and the third inequality follows from the facts \({\textbf{u}}^\intercal {\textbf{v}}\le n^2, \Vert {\textbf{u}}\Vert \le n\) and Lemma 4 (ii).

Plugging (49)-(50) into (48) and setting \(\tau =\frac{2n\alpha _k}{3{\textbf{u}}^\intercal {\textbf{v}}}\), we obtain

$$\begin{aligned}&{\mathbb {E}}\left[ h({\bar{x}}_{k+1})\right] \nonumber \\&\quad \le {\mathbb {E}}\left[ h({\bar{x}}_k)\right] -\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\left( 1-\frac{n\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}2\tau }\right) {\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] \nonumber \\&\qquad +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \alpha _k^2\nonumber \\&\qquad +\frac{3\tau C_gL_f^2n}{2}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\frac{3\tau L^2n}{2}{\mathbb {E}}\left[ \Vert {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\Vert ^2\right] +\frac{3\tau }{2}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] \nonumber \\&\quad \le {\mathbb {E}}\left[ h({\bar{x}}_k)\right] -\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n}{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \alpha _{k}^2\nonumber \\&\qquad + \frac{n\beta _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\frac{n^2L^2\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{x}}_k-{\textbf{1}}\otimes {\bar{x}}_k\Vert ^2\right] +\frac{n\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{v}}\otimes {\bar{y}}_k^{'}-{\textbf{y}}_k^{'}\Vert ^2\right] \nonumber \\&\quad \le {\mathbb {E}}\left[ h({\bar{x}}_k)\right] -\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n}{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \alpha _{k}^2\nonumber \\&\qquad + \frac{n\beta _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\alpha _kc_6\rho ^{k-1} \left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) \nonumber \\&\qquad +\alpha _k^3c_6(c_1+c_3c_5)c_\alpha +\alpha _k^3c_6c_5c_4C_g^2L_f^4n^2\left( \sum _{t=1}^{k-1}\rho ^{k-t-1}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{t}-{\textbf{z}}_t\right\| ^2\right] +c_\beta \right) , \end{aligned}$$
(51)

where \(c_6=\frac{\max \{\overline{c}^2\,L^2n^2,n\overline{c}^2/c_5\}}{{\textbf{u}}^\intercal {\textbf{v}}}\), \(c_5\) is defined in Lemma 2, \(c_\alpha \) and \(c_\beta \) are constants, and the last inequality follows from Lemma 2, Lemma 5 and the definitions \(\alpha _{k}=\frac{a}{\sqrt{K}}\), \(\beta _k=\alpha _{k} C_gL_f^2n\).
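The effect of the choice \(\tau =\frac{2n\alpha _k}{3{\textbf{u}}^\intercal {\textbf{v}}}\) on the coefficients in (51) can be verified with exact rational arithmetic. The following sketch is only a bookkeeping check under illustrative parameter values; the function name, the dictionary keys, and the variable `uv` (standing for \({\textbf{u}}^\intercal {\textbf{v}}\)) are hypothetical.

```python
from fractions import Fraction


def coefficients_in_51(n, uv, alpha, C_g, L_f, L):
    """Coefficients obtained from (48)-(50) once tau = 2*n*alpha/(3*uv)."""
    tau = Fraction(2 * n) * alpha / (3 * uv)
    return {
        # -(uv*alpha/n) from (48) plus alpha^2/(2*tau) from (50)
        "grad": -uv * alpha / n + alpha ** 2 / (2 * tau),
        # coefficient of E||g_k - z_k||^2
        "tracking": 3 * tau * C_g * L_f ** 2 * n / 2,
        # coefficient of E||x_k - 1 (x) xbar_k||^2
        "consensus": 3 * tau * L ** 2 * n / 2,
        # coefficient of E||v (x) ybar'_k - y'_k||^2
        "gradient_tracker": 3 * tau / 2,
    }


# Illustrative values only.
n, uv, alpha = 4, Fraction(7, 2), Fraction(1, 10)
C_g, L_f, L = Fraction(3), Fraction(2), Fraction(5)
coeffs = coefficients_in_51(n, uv, alpha, C_g, L_f, L)
beta = alpha * C_g * L_f ** 2 * n        # beta_k = alpha_k * C_g * L_f^2 * n
print(coeffs["grad"] == -uv * alpha / (4 * n))
print(coeffs["tracking"] == n * beta / uv)
```

Using `Fraction` avoids rounding, so the equalities with the coefficients \(-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n}\), \(\frac{n\beta _k}{{\textbf{u}}^\intercal {\textbf{v}}}\), \(\frac{n^2L^2\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}}\) and \(\frac{n\alpha _k}{{\textbf{u}}^\intercal {\textbf{v}}}\) are checked exactly.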

Reordering the terms of (51) and summing over k from 1 to K,

$$\begin{aligned}&\sum _{k=1}^K\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _k}{4n}{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] \\&\le {\mathbb {E}}\left[ h({\bar{x}}_1)\right] -{\mathbb {E}}\left[ h({\bar{x}}_{K+1})\right] +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \sum _{k=1}^K\alpha _{k}^2\\&\quad + \sum _{k=1}^K\frac{n\beta _k}{{\textbf{u}}^\intercal {\textbf{v}}}{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\sum _{k=1}^K\alpha _kc_6\rho ^{k-1} \left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) \\&\quad +\sum _{k=1}^K\alpha _k^3c_6(c_1+c_3c_5)c_\alpha +\sum _{k=2}^K\alpha _k^3c_6c_5c_4C_g^2L_f^4n^2\left( \sum _{t=1}^{k-1}\rho ^{k-t-1}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{t}-{\textbf{z}}_t\right\| ^2\right] +c_\beta \right) \\&\le {\mathbb {E}}\left[ h({\bar{x}}_1)\right] -{\mathbb {E}}\left[ h({\bar{x}}_{K+1})\right] +\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) K\alpha _{1}^2\\&\quad + \frac{n\beta _1}{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert {\textbf{g}}_k-{\textbf{z}}_k\Vert ^2\right] +\frac{\alpha _1c_6\left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) }{1-\rho }\\&\quad +K\alpha _1^3\left( c_6(c_1+c_3c_5)c_\alpha +c_6c_5c_4C_g^2L_f^4n^2c_\beta \right) +\alpha _1^3\frac{c_6c_5c_4C_g^2L_f^4n^2}{1-\rho }\sum _{k=1}^{K-1}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] , \end{aligned}$$

where the second inequality follows from definitions \(\alpha _{k}=\frac{a}{\sqrt{K}}\) and \(\beta _k=\alpha _{k} C_gL_f^2n\). Multiplying both sides of the above inequality by \(\frac{4n}{a{\textbf{u}}^\intercal {\textbf{v}}\sqrt{K}}\),

$$\begin{aligned} \frac{1}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right]&\le \frac{4\left( {\mathbb {E}}\left[ h({\bar{x}}_1)\right] -{\mathbb {E}}\left[ h({\bar{x}}_{K+1})\right] \right) \frac{n}{a{\textbf{u}}^\intercal {\textbf{v}}}+4\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) \frac{na}{{\textbf{u}}^\intercal {\textbf{v}}}}{\sqrt{K}}\\&\quad +{\mathcal {O}}\left( \left( \frac{1}{K}+\frac{1}{K^2}\right) \sum _{k=1}^{K}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] \right) +{\mathcal {O}}\left( \frac{1}{K}\right) . \end{aligned}$$

By Lemma 3,

$$\begin{aligned} \frac{1}{K}\sum _{k=2}^{K+1}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right]&\le \frac{1}{K}\sum _{k=1}^{K}(1-\beta _k){\mathbb {E}}\left[ \Vert {\textbf{z}}_k-{\textbf{g}}_k\Vert ^2\right] +{\mathcal {O}}\left( \frac{1}{K}\right) . \end{aligned}$$

Rearranging the above inequality, we have \(\frac{1}{K}\sum _{k=1}^{K}{\mathbb {E}}\left[ \left\| {\textbf{g}}_{k}-{\textbf{z}}_k\right\| ^2\right] \le {\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) \). Then,

$$\begin{aligned} \frac{1}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right]&\le \frac{4\left( {\mathbb {E}}\left[ h({\bar{x}}_1)\right] -{\mathbb {E}}\left[ h({\bar{x}}_{K+1})\right] \right) /a+4\left( \frac{L\Vert {\textbf{u}}\Vert ^2c_b^2\ C_fC_g}{2n^2(1-\tau _{{\textbf{B}}})^2}+U_2\right) a}{\sqrt{K}}+{\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) \\&\le {\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) . \end{aligned}$$
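The \({\mathcal {O}}(1/\sqrt{K})\) rate for the averaged tracking error, obtained above from Lemma 3, can be illustrated on a scalar model. The sketch below is an assumption-laden toy version: it iterates \(v_{k+1}=(1-\beta )^2v_k+D\beta ^2\) with the constant step \(\beta =b/\sqrt{K}\), where the single constant `D` lumps together the \(\alpha _k^2\) and \(\beta _k^2\) terms of Lemma 3 (both are of order 1/K here). The names `b`, `D` and `v1` are hypothetical.

```python
import math


def average_tracking_error(K, b=1.0, D=1.0, v1=1.0):
    """Average of v_1..v_K for v_{k+1} = (1-beta)**2 * v_k + D*beta**2, beta = b/sqrt(K)."""
    beta = b / math.sqrt(K)
    v, total = v1, 0.0
    for _ in range(K):
        total += v
        v = (1 - beta) ** 2 * v + D * beta ** 2
    return total / K


for K in (100, 10000):
    avg = average_tracking_error(K)
    # avg * sqrt(K) staying of order one reflects the O(1/sqrt(K)) rate.
    print(K, avg, avg * math.sqrt(K))
```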

By the Lipschitz continuity of \(\nabla h(\cdot )\), we have

$$\begin{aligned} \frac{1}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert \nabla h(x_{i,k})\Vert ^2\right]&\le \frac{2}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert \nabla h({\bar{x}}_k)\Vert ^2\right] +\frac{2L^2}{K}\sum _{k=1}^K{\mathbb {E}}\left[ \Vert x_{i,k}-{\bar{x}}_k\Vert ^2\right] \\&\le {\mathcal {O}}\left( \frac{1}{\sqrt{K}}\right) , \end{aligned}$$

where the last inequality follows from (38). The proof is complete. \(\square \)

Lemma 6

Let \(\alpha _k=a/(k+b)^\alpha \), \(a>0,b\ge 0\), \(\alpha \in (1/2, 1]\). Under Assumptions 1 and 2 and the condition that the objective function h(x) is \(\mu \)-strongly convex,

$$\begin{aligned} {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,-\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \le \frac{\Vert {\textbf{u}}\Vert c_bc_0}{2n(1-\tau _{{\textbf{B}}})}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \alpha _k^2, \end{aligned}$$

where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \), \(c_0\) is some constant scalar.

Proof

Recalling the definition \({\bar{x}}_{k}= \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{x}}_{k}\) in Lemma 2, we have

$$\begin{aligned} \begin{aligned} {\bar{x}}_{k}-x^* ={\bar{x}}_{k-1}-x^*-\alpha _{k-1}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_{k-1}={\bar{x}}_{1}-x^*-\sum _{t=1}^{k-1}\alpha _t\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_t, \end{aligned} \end{aligned}$$

and then

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*, -\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \\&={\mathbb {E}}\left[ \left\langle {\bar{x}}_{1}-x^*-\sum _{t=1}^{k-1}\alpha _t\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_t, -\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \\&=-\alpha _{k}{\mathbb {E}}\left[ \left\langle {\bar{x}}_{1}-x^*-\sum _{t=1}^{k-1}\alpha _t\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_t, \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \sum _{t=1}^{k}\tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \right] , \end{aligned} \end{aligned}$$

where \(\epsilon _t={\textbf{H}}_t-{\textbf{J}}_t\), \({\textbf{H}}_t\) and \({\textbf{J}}_t\) are defined in (10) and Lemma 2 respectively, and the second equality follows from (43). Note that \({\mathbb {E}}\left[ \left\langle {\bar{x}}_{1}-x^*, \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t)\epsilon _t\right\rangle \bigg |{\mathcal {F}}_t\right] =0\) and

$$\begin{aligned} {\mathbb {E}}\left[ \left\langle \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{y}}_{t_1}, \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{B}}}(k,t_2)\epsilon _{t_2}\right\rangle \bigg |{\mathcal {F}}_{t_2}\right] =0~ (t_1<t_2), \end{aligned}$$

where \({\mathcal {F}}_k\) is defined in (46). Then

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*, -\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \\&\le \alpha _{k}\sum _{t_1=1}^{k-1}\sum _{t_2=1}^{t_1}\alpha _{t_1}\frac{\Vert {\textbf{u}}\Vert ^2}{2n^2}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(k,t_2) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\left( {\mathbb {E}}\left[ \Vert {\textbf{y}}_{t_1}\Vert ^2\right] +{\mathbb {E}}\left[ \left\| \epsilon _{t_2}\right\| ^2\right] \right) \\&\le \alpha _{k}\sum _{t_1=1}^{k-1}\sum _{t_2=1}^{t_1}\alpha _{t_1}\frac{\Vert {\textbf{u}}\Vert ^2c_b}{2n^2}\tau _{{\textbf{B}}}^{k-t_2}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \\&\le \frac{\Vert {\textbf{u}}\Vert ^2c_bc}{2n^2(1-\tau _{{\textbf{B}}})}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \alpha _k\alpha _{k-1}, \end{aligned} \end{aligned}$$

where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \) and c is some constant scalar, the second inequality follows from (32), (33) and (45), and the third inequality follows from Lemma 5. Noting that \(\lim _{k\rightarrow \infty }\frac{\alpha _{k-1}}{\alpha _k}=1\), there exists a constant \(c_0>c\) such that

$$\begin{aligned} {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,-\alpha _{k}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k\right\rangle \right] \le \frac{\Vert {\textbf{u}}\Vert ^2c_bc_0}{2n^2(1-\tau _{{\textbf{B}}})}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \alpha _k^2. \end{aligned}$$

The proof is complete. \(\square \)

1.6 Proof of Theorem 2

Proof

Recall the definition \({\bar{x}}_{k+1}= \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) {\textbf{x}}_{k+1}\) in Lemma 2,

$$\begin{aligned} {\bar{x}}_{k+1}&=\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \tilde{{\textbf{A}}}\left( {\textbf{x}}_k-\alpha _k{\textbf{y}}_k\right) \nonumber \\&={\bar{x}}_{k}-\alpha _k\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{y}}_k^{'}+\xi _k\right) \nonumber \\&={\bar{x}}_{k}-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k)+\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\Bigg (\underbrace{\nabla h({\bar{x}}_k)-\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(g_j(x_{j,k}))}_{P^{(1)}_k}\nonumber \\&\quad +\underbrace{\frac{1}{n}\sum _{j=1}^n\nabla g_j(x_{j,k})\nabla f_j(g_j(x_{j,k}))-{\bar{y}}^{'}_k}_{P^{(2)}_k}+\underbrace{\frac{n}{{\textbf{u}}^\intercal {\textbf{v}}}\left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \left( {\textbf{v}}\otimes {\bar{y}}^{'}_k-{\textbf{y}}_k^{'}\right) }_{P^{(3)}_k}\nonumber \\&\quad +\underbrace{\left( -\frac{n}{{\textbf{u}}^\intercal {\textbf{v}}}\right) \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _k}_{P^{(4)}_k}\Bigg ), \end{aligned}$$
(52)

where \({\textbf{y}}_k^{'}\) and \(\xi _{k+1}\) are defined in (13) and (19), and the second equality follows from the fact \({\textbf{u}}^\intercal {\textbf{A}}={\textbf{u}}^\intercal \). Subsequently,

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\bar{x}}_{k+1}-x^*\Vert ^2\right] \nonumber \\&={\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k)\right\| ^2\right] +\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k), P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] \nonumber \\&\le \left( 1-\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k), P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] \nonumber \\&\le \left( 1-\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{n}\right) ^2 {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +\frac{\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| \nabla h({\bar{x}}_k)\right\| ^2\right] \nonumber \\&\quad +\left( 1+\frac{1}{2\tau }\right) \left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle 
{\bar{x}}_{k}-x^*,P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] \nonumber \\&\le \left( \left( 1-\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{n}\right) ^2+\frac{\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2L^2 \right) {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] \nonumber \\&\quad +\left( 1+\frac{1}{2\tau }\right) \left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] , \end{aligned}$$
(53)

where \(\tau \) is any positive scalar, the first inequality follows from [30, Lemma 10], the second inequality follows from the fact \(ab\le \frac{\tau a^2}{2}+\frac{b^2}{2\tau }\), and the third inequality follows from the fact that \(h\) is \(L\)-smooth with \(L:=C_gL_f + C_f^{1/2}L_g\), that is, \(\nabla h\) is \(L\)-Lipschitz continuous.
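For completeness, the third inequality can be spelled out as follows. This is a sketch that assumes \(x^*\) is an unconstrained minimizer of \(h\), so that \(\nabla h(x^*)={\textbf{0}}\); that assumption is not restated in this appendix.

$$\begin{aligned} \left\| \nabla h({\bar{x}}_k)\right\| ^2=\left\| \nabla h({\bar{x}}_k)-\nabla h(x^*)\right\| ^2\le L^2\left\| {\bar{x}}_{k}-x^*\right\| ^2. \end{aligned}$$

Under this assumption the term \(\frac{\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| \nabla h({\bar{x}}_k)\right\| ^2\right] \) is absorbed into the coefficient of \({\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] \).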

For the second term on the right hand side of (53),

$$\begin{aligned}&\left( 1+\frac{1}{2\tau }\right) \left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\le \left( 1+\frac{1}{2\tau }\right) \left( 4\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2\frac{L^2\overline{c}^2}{n}{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k}-{\textbf{1}}\otimes {\bar{x}}_{k}\Vert _{\hat{{\textbf{A}}}}^2\right] +4\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2\frac{C_g^2L_f^2}{n}{\mathbb {E}}\left[ \left\| {\textbf{g}}_k-{\textbf{z}}_k\right\| ^2\right] \right. \nonumber \\&\quad \left. +4\alpha _k^2\frac{\Vert {\textbf{u}}\Vert ^2}{n^2}\overline{c}^2{\mathbb {E}}\left[ \left\| {\textbf{y}}_k^{'}-{\textbf{v}}\otimes {\bar{y}}^{'}_k\right\| _{\hat{{\textbf{B}}}}^2\right] +4\frac{\Vert {\textbf{u}}\Vert ^2}{n^2}\alpha _k^2{\mathbb {E}}\left[ \left\| \xi _k\right\| ^2\right] \right) , \end{aligned}$$
(54)

where the inequality follows from the elementary bound \(\Vert a_1+a_2+a_3+a_4\Vert ^2\le 4\sum _{i=1}^{4}\Vert a_i\Vert ^2\), Assumption 1 (c), and the Lipschitz continuity of the gradients \(\nabla g_j(\cdot )\nabla f_j(g_j(\cdot ))\) and \(\nabla f_j(\cdot )\). The constant \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} \) appears in (57) and (59). By Lemma 3 and [25, Lemmas 4-5 in Chapter 2], there exists a constant \(U_3\) such that

$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\textbf{g}}_k-{\textbf{z}}_k\right\| ^2\right] \le U_3\beta _k, \end{aligned}$$
(55)

and then by Lemmas 2 and 5,

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\textbf{x}}_{k+1}-{\textbf{1}}\otimes {\bar{x}}_{k+1}\Vert _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \Vert {\textbf{y}}_{k+1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{k+1}^{'}\Vert _{\hat{{\textbf{B}}}}^2\right] \nonumber \\&\le \rho ^{k} \left( {\mathbb {E}}\left[ \left\| {\textbf{x}}_1-{\textbf{1}}\otimes {\bar{x}}_1\right\| _{\hat{{\textbf{A}}}}^2\right] +c_5{\mathbb {E}}\left[ \left\| {\textbf{y}}_{1}^{'}-{\textbf{v}}\otimes {\bar{y}}_{1}^{'}\right\| _{\hat{{\textbf{B}}}}^2\right] \right) +(c_1+c_3c_5)\sum _{t=1}^k\rho ^{k-t}\alpha _t^2\nonumber \\&\quad +c_5c_4U_3\sum _{t=1}^k\rho ^{k-t}\beta _t^3+c_5c_4V_g\sum _{t=1}^k\rho ^{k-t}\beta _t^2\nonumber \\&\le {\mathcal {O}}\left( \alpha _{k+1}^2\right) . \end{aligned}$$
(56)

Combining inequalities (54)-(56), we have

$$\begin{aligned}&\left( 1+\frac{1}{2\tau }\right) \left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}} \left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\| ^2\right] \nonumber \\&\le \left( 1+\frac{1}{2\tau }\right) \left( {\mathcal {O}}(\alpha _k^3) +4\frac{\Vert {\textbf{u}}\Vert ^2}{n^2}\alpha _k^2{\mathbb {E}}\left[ \left\| \xi _k\right\| ^2\right] \right) \nonumber \\&\le {\mathcal {O}}(\alpha _k^3)+16\left( 1+\frac{L^2}{\mu ^2}\right) \frac{\Vert {\textbf{u}}\Vert ^2c_b^2 C_fC_g}{n(1-\tau _{{\textbf{B}}})^2}\alpha _k^2, \end{aligned}$$
(57)

where

$$\begin{aligned} \tau =\frac{\mu ^2}{2L^2} \end{aligned}$$
(58)

and the second inequality follows from Lemma 4.

For the third term on the right hand side of (53),

$$\begin{aligned}&2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right\rangle \right] \nonumber \\&\le \tau _1 {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +\frac{1}{\tau _1}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2{\mathbb {E}}\left[ \left\| P^{(1)}_k+P^{(2)}_k+P^{(3)}_k\right\| ^2\right] \nonumber \\&\quad +2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,P^{(4)}_k\right\rangle \right] \nonumber \\&\le \tau _1 {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +\frac{1}{\tau _1}{\mathcal {O}}(\alpha _k^3)+2\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) {\mathbb {E}}\left[ \left\langle {\bar{x}}_{k}-x^*,P^{(4)}_k\right\rangle \right] \nonumber \\&\le \frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _k}{4n} {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +{\mathcal {O}}(\alpha _k^2)+\frac{\Vert {\textbf{u}}\Vert c_bc_0}{n(1-\tau _{{\textbf{B}}})}\left( \frac{c_b^2nC_gC_f}{(1-\tau _{{\textbf{B}}})^2}+4nC_g C_f\right) \alpha _k^2, \end{aligned}$$
(59)

where \(c_0\) is some constant scalar,

$$\begin{aligned} \tau _1=\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu }{4n}\alpha _k, \end{aligned}$$

the first inequality follows from the fact \(ab\le \frac{\tau _1 a^2}{2}+\frac{b^2}{2\tau _1}\) for any positive scalar \(\tau _1\), the second inequality follows from (57), and the third inequality follows from Lemma 6. Substituting (57)-(59) into (53) yields

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{x}}_{k+1}-x^*\Vert ^2\right]&\le \left( 1-\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{4n}\right) {\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] +{\mathcal {O}}(\alpha _k^2). \end{aligned}$$
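The contraction factor in this recursion can be checked by collecting the coefficients of \({\mathbb {E}}\left[ \left\| {\bar{x}}_{k}-x^*\right\| ^2\right] \) from (53) and (59). The shorthand \(m_k:=\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu \alpha _{k}}{n}\) is introduced here only for illustration, and the last step assumes \(m_k\le \frac{6}{5}\), which holds once \(\alpha _k\) is small enough.

$$\begin{aligned} \frac{\tau }{2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2L^2&=\frac{\mu ^2}{4L^2}\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) ^2L^2=\frac{m_k^2}{4},\\ \left( 1-m_k\right) ^2+\frac{m_k^2}{4}+\frac{m_k}{4}&=1-\frac{7}{4}m_k+\frac{5}{4}m_k^2\le 1-\frac{m_k}{4}\quad \text {whenever}~m_k\le \frac{6}{5}. \end{aligned}$$

The first line uses the value of \(\tau \) in (58), and the summand \(\frac{m_k}{4}\) is the value of \(\tau _1\) in (59). All remaining terms in (57) and (59) are of order \({\mathcal {O}}(\alpha _k^2)\).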

Then by [25, Lemmas 4-5 in Chapter 2],

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{x}}_{k+1}-x^*\Vert ^2\right] ={\mathcal {O}}\left( \alpha _k\right) ~ \text {if}~\alpha _k=a/(k+b)^\alpha ,\alpha \in (1/2,1) \end{aligned}$$

and

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{x}}_k-x^*\Vert ^2\right] ={\mathcal {O}}\left( \frac{1}{k}\right) ~ \text {if}~\alpha _k=a/(k+b). \end{aligned}$$

The proof is complete. \(\square \)
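The passage from the recursion to the \({\mathcal {O}}(1/k)\) rate can be illustrated by a short induction. This is only a sketch and not the argument of [25]: the symbols \(e_k:={\mathbb {E}}\left[ \Vert {\bar{x}}_k-x^*\Vert ^2\right] \), \(c:=\frac{{\textbf{u}}^\intercal {\textbf{v}}\mu }{4n}\), \(C\) (the constant hidden in \({\mathcal {O}}(\alpha _k^2)\)) and \(D\) are introduced here for illustration, and the conditions \(ca>1\), \(k+b\ge ca\) and \(D\ge \frac{Ca^2}{ca-1}\) are assumptions of the sketch. With \(\alpha _k=a/(k+b)\) and the induction hypothesis \(e_k\le \frac{D}{k+b}\),

$$\begin{aligned} e_{k+1}&\le \left( 1-\frac{ca}{k+b}\right) \frac{D}{k+b}+\frac{Ca^2}{(k+b)^2}=\frac{D}{k+b}-\frac{caD-Ca^2}{(k+b)^2}\\&\le \frac{D}{k+b}-\frac{D}{(k+b)^2}\le \frac{D}{k+b}-\frac{D}{(k+b)(k+b+1)}=\frac{D}{k+b+1}. \end{aligned}$$

Hence the bound propagates from \(k\) to \(k+1\), which gives \(e_k={\mathcal {O}}(1/k)\) under the stated assumptions.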

Lemma 7

Let \(\alpha _{k}=a/(k+b)^\alpha \), \(a>0\), \(b\ge 0\), \(\alpha \in (1/2,1)\). Suppose that

  (a) Assumptions 1-2 hold;

  (b) for any \(i\in {\mathcal {V}}\), there exist a scalar \(C_i\) and a matrix \({\textbf{T}}_i\) such that

    $$\begin{aligned} \left\| \nabla f_i(y)-\nabla f_i(y^{'})-{\textbf{T}}_i\left( y-y^{'}\right) \right\| \le C_i\Vert y-y^{'}\Vert ^{1+\gamma },\quad \forall y,y^{'}\in {\mathbb {R}}^p, \end{aligned}$$

    where \(\gamma \in (0,1]\) satisfies \(\sum _{k=1}^\infty \frac{\alpha _k^{(1+\gamma )/2}}{\sqrt{k}}<\infty \).

Denote

$$\begin{aligned} {\textbf{H}}_{\theta }&=\left( \begin{array}{cc} \frac{1}{n}{\textbf{H}}&{} {\textbf{I}}_d\\ {\textbf{0}}&{} \frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\textbf{I}}_d \end{array} \right) ,\\ {\textbf{M}}(k,t)&={\tilde{\alpha }}_t\sum _{l_1=t}^{k-1}\Pi _{l_2=t+1}^{l_1}\left( {\textbf{I}}_{2d}-{\tilde{\alpha }}_{l_2}{\textbf{H}}_{\theta }\right) ,\quad {\textbf{N}}(k,t)={\textbf{M}}(k,t)-{\textbf{H}}_{\theta }^{-1},\\ \eta _t^{(1)}&=\left( \begin{array}{c} \left( -\frac{n}{{\textbf{u}}^\intercal {\textbf{v}}}\right) \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _t\\ \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{j,t+1}^{'})-g_j(x^*)\right) \end{array} \right) . \end{aligned}$$

We have

$$\begin{aligned} \lim _{k\rightarrow \infty }{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}\right\| ^2\right] =0. \end{aligned}$$

Proof

Note that

$$\begin{aligned} \eta _t^{(1)}=\left( \begin{array}{c} \left( -\frac{n}{{\textbf{u}}^\intercal {\textbf{v}}}\right) \left( \frac{{\textbf{u}}^\intercal }{n}\otimes {\textbf{I}}_{d}\right) \xi _t\\ {\textbf{0}} \end{array} \right) +\left( \begin{array}{c} {\textbf{0}}\\ \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{j,t+1}^{'})-g_j(x^*)\right) \end{array} \right) \end{aligned}$$

and

$$\begin{aligned}&{\mathbb {E}}\left[ \left\langle \xi _{t_1},\xi _{t_2}\right\rangle \right] ={\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\langle \xi _{t_1},\xi _{t_2}\right\rangle \big | {\mathcal {F}}_{\min \{t_1,t_2\}}\right] \right] \\&={\mathbb {E}}\left[ \left\langle \xi _{\min \{t_1,t_2\}}, \sum _{l=1}^{\min \{t_1,t_2\}}\tilde{{\textbf{B}}}(\max \{t_1,t_2\},l)\epsilon _l\right\rangle \right] ~(t_1\le t_2),\\&{\mathbb {E}}\left[ \left\langle G_j(x^*;\phi _{j,t_1}^{'})-g_j(x^*),G_j(x^*;\phi _{j,t_2}^{'})-g_j(x^*)\right\rangle \big |{\mathcal {F}}_{\min \{t_1,t_2\}}\right] =0 ~(t_1\ne t_2), \end{aligned}$$

where \({\mathcal {F}}_t\) is defined in (46). Then

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}\right\| ^2\right]&={\mathbb {E}}\left[ {\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}\right\| ^2\bigg |{\mathcal {F}}_{\min \{t_1,t_2\}}\right] \right] \\&\le \left( \frac{\Vert {\textbf{u}}\Vert }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2\frac{4}{k}\sum _{t_1=1}^{k}\sum _{t_2=t_1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t_2) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\sum _{l=1}^{t_1}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| \tilde{{\textbf{B}}}(t_2,l) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \left\| \xi _{t_1}\right\| \left\| \epsilon _l\right\| \right] \\&\quad +\frac{2}{k}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2{\mathbb {E}}\left[ \left\| \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{i,t}^{'})-g_j(x^*)\right) \right\| ^2\right] \\&\le c_bc_N(c_b^2+1)nC_fC_g\left( \frac{\Vert {\textbf{u}}\Vert }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2\frac{1}{(1-\tau _{{\textbf{B}}})^4}\frac{8}{k}\sum _{t_1=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t_1) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&\quad +n\left( \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2\left( \sum _{j=1}^{n}\Vert \nabla g_j(x^*)\Vert ^2\Vert {\textbf{T}}_j\Vert ^2\right) V_gc_N\frac{2}{k}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| 
\hspace{-1.0625pt}\right| }, \end{aligned} \end{aligned}$$

where \(c_b=\max \left\{ \overline{c},\frac{{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{B}}-{\textbf{I}}_{n} \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }_{\hat{{\textbf{B}}}}}{\tau _{{\textbf{B}}}}\overline{c}\right\} ,~c_N=\sup _{k,t}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\), the second inequality follows from the fact \(\sup _{k,t}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }<\infty \) [26, Lemma 1 (ii)], (32), (45), Lemma 4 (i) and Assumption 1 (c). By [26, Lemma 1 (ii)],

$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{1}{k}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{N}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }=0, \end{aligned}$$

which implies \( \lim _{k\rightarrow \infty }{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}\right\| ^2\right] =0\). The proof is complete. \(\square \)

1.7 Proof of Theorem 3

Proof

By (56),

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\left( {\bar{x}}_{t}-x^*\right) -\frac{1}{\sqrt{k}}\sum _{t=1}^{k}\left( x_{i,t}-x^*\right) \right\| \right] \\&\le \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\sqrt{{\mathbb {E}}\left[ \Vert {\textbf{x}}_{t}-{\textbf{1}}\otimes {\bar{x}}_{t}\Vert ^2\right] }\le \frac{\sqrt{U_1}}{\sqrt{k}}\sum _{t=1}^{k}\alpha _t\rightarrow 0. \end{aligned} \end{aligned}$$
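The convergence to zero of the normalized partial sums of the step sizes can be made explicit by an integral comparison. The following is a sketch for \(\alpha _t=a/(t+b)^\alpha \) with \(b\ge 0\) and \(\alpha \in (1/2,1)\), with the sum taken over \(t=1,\ldots ,k\):

$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\alpha _t\le \frac{a}{\sqrt{k}}\sum _{t=1}^{k}t^{-\alpha }\le \frac{a}{\sqrt{k}}\int _0^k s^{-\alpha }\,ds=\frac{a}{1-\alpha }k^{\frac{1}{2}-\alpha }\longrightarrow 0. \end{aligned}$$

The second inequality uses that \(s\mapsto s^{-\alpha }\) is decreasing, so \(t^{-\alpha }\le \int _{t-1}^{t}s^{-\alpha }\,ds\), and the limit uses \(\alpha >1/2\).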

Then by Slutsky’s theorem, it is sufficient to show

$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k} \left( \begin{array}{c} {\bar{x}}_t-x^*\\ \frac{\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,t}-g_j\left( x_{j,t}\right) \right) }{n} \end{array} \right) {\mathop {\longrightarrow }\limits ^{d}} N\left( {\textbf{0}},\left( \begin{array}{cc} {\textbf{H}}^{-1}\left( {\textbf{S}}_1+{\textbf{S}}_2\right) ({\textbf{H}}^{-1})^\intercal &{} -\frac{1}{n}{\textbf{H}}^{-1}{\textbf{S}}_2\\ -\frac{1}{n}{\textbf{S}}_2({\textbf{H}}^{-1})^\intercal &{} \frac{1}{n^2}{\textbf{S}}_2 \end{array} \right) \right) . \end{aligned}$$

Subtracting \(x^*\) from both sides of (52) gives

$$\begin{aligned} {\bar{x}}_{k+1}-x^*&={\bar{x}}_{k}-x^*-\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\nabla h({\bar{x}}_k)+\left( \frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\right) \left( P^{(1)}_k+P^{(2)}_k+P^{(3)}_k+P^{(4)}_k\right) \nonumber \\&=\left( {\textbf{I}}_d-{\tilde{\alpha }}_k\frac{1}{n}{\textbf{H}}\right) ({\bar{x}}_{k}-x^*)-{\tilde{\alpha }}_k\frac{1}{n}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,k}-g_j\left( x_{j,k}\right) \right) \nonumber \\&\quad +{\tilde{\alpha }}_k\left( P^{(0)}_k+P^{(1)}_k+P^{(3)}_k+P^{(4)}_k\right) , \end{aligned}$$
(60)

where \({\tilde{\alpha }}_k=\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\),

$$\begin{aligned} P^{(0)}_k=-\left( \nabla h({\bar{x}}_k)-\frac{1}{n}{\textbf{H}}({\bar{x}}_{k}-x^*)\right) +\frac{1}{n}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,k}-g_j\left( x_{j,k}\right) \right) +P^{(2)}_k.\nonumber \\ \end{aligned}$$
(61)

According to the definition of \(z_{i,k+1}\) and \(\beta _k\),

$$\begin{aligned} z_{i,k+1}-g_i\left( x_{i,k+1}\right)&=\left( 1-\frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\tilde{\alpha }}_k\right) \left( z_{i,k}-g_i\left( x_{i,k}\right) \right) \\&\quad +G_{i,k+1}^{(1)}-g_i(x_{i,k+1})+\left( 1-\beta _k\right) \left( g_i(x_{i,k})-G_{i,k+1}^{(2)}\right) , \end{aligned}$$

where \(G_{i,k+1}^{(1)}=G_i(x_{i,k+1};\phi _{i,k+1}^{'})\) and \(G_{i,k+1}^{(2)}=G_i(x_{i,k};\phi _{i,k+1}^{'})\). Combining the above equation with (60),

$$\begin{aligned} \Delta _{k+1}=\left( {\textbf{I}}_{2d}-{\tilde{\alpha }}_k{\textbf{H}}_{\theta }\right) \Delta _k+{\tilde{\alpha }}_k\eta _k^{(1)}+{\tilde{\alpha }}_k\left( \eta _k^{(2)}+\eta _k^{(3)}\right) , \end{aligned}$$
(62)

where

$$\begin{aligned} \Delta _k= & {} \left( \begin{array}{c} {\bar{x}}_{k}-x^*\\ \frac{\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( z_{j,k}-g_j\left( x_{j,k}\right) \right) }{n} \end{array} \right) ,\nonumber \\ {\textbf{H}}_{\theta }= & {} \left( \begin{array}{cc} \frac{1}{n}{\textbf{H}}&{} {\textbf{I}}_d\\ {\textbf{0}}&{} \frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\textbf{I}}_d \end{array} \right) , \nonumber \\ \eta _k^{(1)}= & {} \left( \begin{aligned}&\quad \quad \quad ~~\quad \quad \quad ~~\quad \quad P^{(4)}_k\\&\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{j,k+1}^{'})-g_j(x^*)\right) \end{aligned} \right) ,\nonumber \\ \eta _k^{(2)}= & {} \left( \begin{array}{c} P^{(0)}_k+P^{(1)}_k+P^{(3)}_k\\ {\textbf{0}} \end{array} \right) , \end{aligned}$$
(63)

and

$$\begin{aligned} \eta _k^{(3)}=\left( \begin{array}{c} {\textbf{0}}\\ \sum \limits _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( \frac{G_{j,k+1}^{(1)}-g_j(x_{j,k+1})+\left( 1-\beta _k\right) \left( g_j(x_{j,k})-G_{j,k+1}^{(2)}\right) }{n{\tilde{\alpha }}_k}-\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\left( G_j(x^*;\phi _{j,k+1}^{'})-g_j(x^*)\right) \right) \end{array} \right) . \end{aligned}$$
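The recursion for \(z_{i,k+1}-g_i\left( x_{i,k+1}\right) \) stated before (62) can be verified by direct substitution. The sketch below assumes that the inner-function estimate is updated by the stochastically corrected rule written in its first line and that \(\beta _k=\beta \alpha _k\); both are assumptions made here, since Algorithm 1 is not reproduced in this appendix.

$$\begin{aligned} z_{i,k+1}&=(1-\beta _k)\left( z_{i,k}+G_{i,k+1}^{(1)}-G_{i,k+1}^{(2)}\right) +\beta _kG_{i,k+1}^{(1)}\qquad \text {(assumed update)},\\ z_{i,k+1}-g_i(x_{i,k+1})&=(1-\beta _k)\left( z_{i,k}-g_i(x_{i,k})\right) +(1-\beta _k)g_i(x_{i,k})+G_{i,k+1}^{(1)}-(1-\beta _k)G_{i,k+1}^{(2)}-g_i(x_{i,k+1})\\&=(1-\beta _k)\left( z_{i,k}-g_i(x_{i,k})\right) +G_{i,k+1}^{(1)}-g_i(x_{i,k+1})+(1-\beta _k)\left( g_i(x_{i,k})-G_{i,k+1}^{(2)}\right) ,\\ 1-\beta _k&=1-\beta \alpha _k=1-\frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\tilde{\alpha }}_k. \end{aligned}$$

The last line uses \({\tilde{\alpha }}_k=\frac{{\textbf{u}}^\intercal {\textbf{v}}\alpha _{k}}{n}\), which is why the entry \(\frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\textbf{I}}_d\) appears in the lower-right block of \({\textbf{H}}_{\theta }\).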

Denote \({\textbf{M}}(k,t)={\tilde{\alpha }}_t\sum _{l_1=t}^{k-1}\Pi _{l_2=t+1}^{l_1}\left( {\textbf{I}}_{2d}-{\tilde{\alpha }}_{l_2}{\textbf{H}}_{\theta }\right) ,\quad {\textbf{N}}(k,t)={\textbf{M}}(k,t)-{\textbf{H}}_{\theta }^{-1}\), where \({\textbf{M}}(k,k)={\textbf{0}}\) and \(\Pi _{l_2=t+1}^{t}\left( {\textbf{I}}_{2d}-{\tilde{\alpha }}_{l_2}{\textbf{H}}_{\theta }\right) ={\textbf{I}}_{2d}\) by convention. Then by the recursion (62),

$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\Delta _t&=\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{H}}_{\theta }^{-1}\eta _t^{(1)}+\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{N}}(k,t)\eta _t^{(1)}+\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(2)}\nonumber \\&\quad +\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(3)}+{\mathcal {O}}\left( \frac{1}{\sqrt{k}}\right) . \end{aligned}$$
(64)

By Lemma 7, the second term on the right hand side of (64) converges to 0 in mean square. For the third term on the right hand side of (64),

$$\begin{aligned}&{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(2)}\right\| \right] \le \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&\quad \left( {\mathbb {E}}\left[ \left\| \frac{1}{n}\sum _{j=1}^{n}\nabla g_j(x^*)\left( \nabla f_j(g_j(x_{j,t}))-\nabla f_j(z_{j,t})-{\textbf{T}}_j\left( g_j\left( x_{j,t}\right) -z_{j,t}\right) \right) \right\| \right] \right. \\&\qquad \left. +{\mathbb {E}}\left[ \left\| \nabla h({\bar{x}}_t)-\frac{1}{n}{\textbf{H}}({\bar{x}}_{t}-x^*)\right\| \right] \right. \\&\qquad \left. +{\mathbb {E}}\left[ \left\| \frac{1}{n}\sum _{j=1}^{n}\left( \nabla g_j(x_{j,t})-\nabla g_j(x^*)\right) \left( \nabla f_j(g_j(x_{j,t}))-\nabla f_j(z_{j,t})\right) \right\| \right] \right) \\&\qquad +\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \left\| P^{(1)}_t+P^{(3)}_t\right\| \right] \\&\le \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }\\&\quad \left( \frac{1}{n}\sum _{j=1}^{n}\Vert \nabla g_j(x^*)\Vert {\mathbb {E}}\left[ \left\| z_{j,t}-g_j\left( x_{j,t}\right) \right\| ^{1+\gamma }\right] +{\mathbb {E}}\left[ \left\| {\bar{x}}_{t}-x^*\right\| ^{1+\gamma }\right] \right. \\&\qquad \left. +\frac{1}{n}\sum _{j=1}^{n}L_gL_f\sqrt{{\mathbb {E}}\left[ \left\| x_{j,t}-x^*\right\| ^2\right] {\mathbb {E}}\left[ \left\| g_j(x_{j,t})-z_{j,t}\right\| ^2\right] }\right) \\&\qquad +\frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathbb {E}}\left[ \left\| P^{(1)}_t+P^{(3)}_t\right\| \right] \\&= \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathcal {O}}\left( \alpha _t^{(1+\gamma )/2}+\alpha _t\right) , \end{aligned}$$

where the first inequality follows from the definitions of \(\eta _t^{(2)}\), \(P_t^{(0)}\) and \(P_t^{(2)}\) in (63), (61) and (52), the second inequality follows from condition (d), Assumption 1 (a) and the Hölder inequality, and the equality follows from (54)-(56) and Theorem 2. Then by the boundedness of \({\textbf{M}}(k,t)\) [26, Lemma 1 (ii)], the fact \(\sum _{k=1}^\infty \frac{\alpha _k^{(1+\gamma )/2}}{\sqrt{k}}<\infty \) and the Kronecker lemma, we have

$$\begin{aligned} {\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(2)}\right\| \right] \le \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }{\mathcal {O}}\left( \alpha _t^{(1+\gamma )/2}+\alpha _t\right) \longrightarrow 0. \end{aligned}$$
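The way the Kronecker lemma enters can be written out as follows. This is a sketch in which the sequences \(x_t\) and \(b_t\) are notation introduced only for this illustration. The lemma states that if \(\sum _{t=1}^{\infty }x_t\) converges and \(0<b_1\le b_2\le \cdots \rightarrow \infty \), then \(\frac{1}{b_k}\sum _{t=1}^{k}b_tx_t\rightarrow 0\). Taking \(x_t=\alpha _t^{(1+\gamma )/2}/\sqrt{t}\) and \(b_t=\sqrt{t}\),

$$\begin{aligned} \sum _{t=1}^{\infty }\frac{\alpha _t^{(1+\gamma )/2}}{\sqrt{t}}<\infty \quad \Longrightarrow \quad \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\alpha _t^{(1+\gamma )/2}\longrightarrow 0. \end{aligned}$$

For \(\alpha _t=a/(t+b)^\alpha \) the summand behaves like \(t^{-(\alpha (1+\gamma )+1)/2}\), so the summability condition is equivalent to \(\alpha (1+\gamma )>1\), that is, \(\gamma >\frac{1}{\alpha }-1\). The term \(\alpha _t\) is covered as well, because \(\alpha _t\le \alpha _t^{(1+\gamma )/2}\) as soon as \(\alpha _t\le 1\).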

Noting that \(\{\eta _k^{(3)}\}\) is a martingale difference sequence adapted to the filtration \({\mathcal {F}}_k\) defined in (46), the fourth term on the right hand side of (64) satisfies

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(3)}\right\| ^2\right] \\&=\frac{1}{k}\sum _{t=1}^{k}{\mathbb {E}}\left[ \left\| {\textbf{M}}(k,t)\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( \frac{G_{j,t+1}^{(1)}-g_j(x_{j,t+1})-\left( G_{j,t+1}^{(2)}-g_j(x_{j,t})\right) }{n{\tilde{\alpha }}_t}\right. \right. \right. \\&\quad \left. +\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\left( G_{j,t+1}^{(2)}-g_j(x_{j,t})-\left( G_j(x^*;\phi _{j,t+1}^{'})-g_j(x^*)\right) \right) \bigg )\bigg \Vert ^2\right] \\&\le \frac{1}{k}\sum _{t=1}^{k}\frac{1}{n}\sum _{j=1}^{n}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| {\textbf{M}}(k,t) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }^2\Vert \nabla g_j(x^*)\Vert ^2\Vert {\textbf{T}}_j\Vert ^{2}4\\&\quad \left( \left( \frac{L_g^{'}}{{\tilde{\alpha }}_t}\right) ^2{\mathbb {E}} \left[ \left\| x_{j,t+1}-x_{j,t}\right\| ^2\right] +\left( \frac{n\beta L_g^{'}}{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2{\mathbb {E}}\left[ \left\| x_{j,t}-x^*\right\| ^2\right] \right) \\&=\frac{1}{k}\sum _{t=1}^{k}{\mathcal {O}}\left( \alpha _t\right) , \end{aligned} \end{aligned}$$

where the inequality follows from the Lipschitz continuity of \(G_j(\cdot ;\phi )\), the second equality follows from (56), Theorem 2 and the fact

$$\begin{aligned} {\mathbb {E}}\left[ \left\| x_{j,t+1}-x_{j,t}\right\| ^2\right]\le & {} 3\left( {\mathbb {E}}\left[ \left\| x_{j,t+1}-{\bar{x}}_{t+1}\right\| ^2\right] \right. \\{} & {} \left. +{\mathbb {E}}\left[ \left\| x_{j,t}-{\bar{x}}_{t}\right\| ^2\right] +{\mathbb {E}}\left[ \left\| {\bar{x}}_{t+1}-{\bar{x}}_{t}\right\| ^2\right] \right) ={\mathcal {O}}\left( \alpha _t^2\right) . \end{aligned}$$

Then, by the Kronecker lemma, \({\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{M}}(k,t)\eta _t^{(3)}\right\| ^2\right] =\frac{1}{k}\sum _{t=1}^{k}{\mathcal {O}}\left( \alpha _t\right) \longrightarrow 0.\)

It remains to show the asymptotic normality of the first term on the right hand side of (64). Indeed, arguing as in [48, Lemma 6 in Appendix B], we obtain

$$\begin{aligned}&{\mathbb {E}}\left[ \left\| \frac{1}{\sqrt{k}}\sum _{t=1}^{k} P^{(4)}_t-\frac{1}{\sqrt{k}}\sum _{t=1}^{k}\left( \frac{{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \epsilon _t^*\right\| ^2\right] \longrightarrow 0,\\&\frac{1}{\sqrt{k}}\sum _{t=1}^{k}\left( \frac{{\textbf{1}}^\intercal }{n}\otimes {\textbf{I}}_d\right) \epsilon _t^*{\mathop {\rightarrow }\limits ^{d}} N\left( {\textbf{0}},\frac{1}{n^2}{\textbf{S}}_1\right) \end{aligned}$$

and

$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}\frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{j,t+1}^{'})-g_j(x^*)\right) {\mathop {\rightarrow }\limits ^{d}} N\left( {\textbf{0}},\left( \frac{\beta }{{\textbf{u}}^\intercal {\textbf{v}}}\right) ^2{\textbf{S}}_2\right) , \end{aligned}$$

where

$$\begin{aligned}&\epsilon _t^*=\left[ \left( \nabla G_1(x^*;\phi _{1,t})\nabla F_1(g_1(x^*);\zeta _{1,t})-\nabla g_1(x^*)\nabla f_1(g_1(x^*))\right) ^\intercal ,\cdots ,\right. \\&\quad \quad \left. \left( \nabla G_n(x^*;\phi _{n,t})\nabla F_n(g_n(x^*);\zeta _{n,t})-\nabla g_n(x^*)\nabla f_n(g_n(x^*))\right) ^\intercal \right] ^\intercal . \end{aligned}$$

Note that \({\textbf{H}}_{\theta }^{-1}=\left( \begin{array}{cc} n{\textbf{H}}^{-1}&{} -\frac{{\textbf{u}}^\intercal {\textbf{v}}}{\beta }{\textbf{H}}^{-1}\\ {\textbf{0}}&{} \frac{{\textbf{u}}^\intercal {\textbf{v}}}{n\beta }{\textbf{I}}_d \end{array} \right) \) and \(\phi _{i,k}\) is independent of \(\phi _{i,k}^{'}\). Then

$$\begin{aligned} \frac{1}{\sqrt{k}}\sum _{t=1}^{k}{\textbf{H}}_{\theta }^{-1}\eta _t^{(1)}{\mathop {\longrightarrow }\limits ^{d}} N\left( {\textbf{0}},\left( \begin{array}{cc} {\textbf{H}}^{-1}\left( {\textbf{S}}_1+{\textbf{S}}_2\right) ({\textbf{H}}^{-1})^\intercal &{} -\frac{1}{n}{\textbf{H}}^{-1}{\textbf{S}}_2\\ -\frac{1}{n}{\textbf{S}}_2({\textbf{H}}^{-1})^\intercal &{} \frac{1}{n^2}{\textbf{S}}_2 \end{array} \right) \right) . \end{aligned}$$
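The stated inverse and limiting covariance can be checked blockwise. The sketch below introduces the shorthand \(w_t:=\sum _{j=1}^{n}\nabla g_j(x^*){\textbf{T}}_j\left( G_j(x^*;\phi _{j,t+1}^{'})-g_j(x^*)\right) \), which is not part of the original notation, and it takes the joint asymptotic normality of the two independent noise sources as given.

$$\begin{aligned} \left( \begin{array}{cc} {\textbf{A}}& {\textbf{B}}\\ {\textbf{0}}& {\textbf{D}} \end{array} \right) ^{-1}&=\left( \begin{array}{cc} {\textbf{A}}^{-1}& -{\textbf{A}}^{-1}{\textbf{B}}{\textbf{D}}^{-1}\\ {\textbf{0}}& {\textbf{D}}^{-1} \end{array} \right) ,\qquad {\textbf{A}}=\frac{1}{n}{\textbf{H}},~{\textbf{B}}={\textbf{I}}_d,~{\textbf{D}}=\frac{n\beta }{{\textbf{u}}^\intercal {\textbf{v}}}{\textbf{I}}_d,\\ {\textbf{H}}_{\theta }^{-1}\eta _t^{(1)}&=\left( \begin{array}{c} n{\textbf{H}}^{-1}P^{(4)}_t-{\textbf{H}}^{-1}w_t\\ \frac{1}{n}w_t \end{array} \right) . \end{aligned}$$

By the two limits above, \(\frac{1}{\sqrt{k}}\sum _{t=1}^{k}nP^{(4)}_t\) and \(\frac{1}{\sqrt{k}}\sum _{t=1}^{k}w_t\) have limiting covariances \({\textbf{S}}_1\) and \({\textbf{S}}_2\), and they are asymptotically independent. The top block therefore has covariance \({\textbf{H}}^{-1}\left( {\textbf{S}}_1+{\textbf{S}}_2\right) ({\textbf{H}}^{-1})^\intercal \), the bottom block has covariance \(\frac{1}{n^2}{\textbf{S}}_2\), and the cross-covariance between them is \(-\frac{1}{n}{\textbf{H}}^{-1}{\textbf{S}}_2\), in agreement with the display.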

The proof is complete. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Zhao, S., Liu, Y. Distributed stochastic compositional optimization problems over directed networks. Comput Optim Appl 87, 249–288 (2024). https://doi.org/10.1007/s10589-023-00512-0
