
Towards Accelerated Rates for Distributed Optimization over Time-Varying Networks

  • Conference paper
  • First Online:
Optimization and Applications (OPTIMA 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13078)

Abstract

We study decentralized optimization with strongly convex, smooth cost functions and investigate accelerated algorithms under time-varying network constraints. In our approach, the nodes run a multi-step gossip procedure after each gradient update, thus ensuring approximate consensus at every iteration. The outer loop is based on the accelerated Nesterov scheme. Both the computation and communication complexities of our method have an optimal dependence on the global condition number \(\kappa _g\). In particular, the algorithm reaches the optimal computation complexity \(O(\sqrt{\kappa _g}\log (1/\varepsilon ))\).
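To make the scheme concrete, here is a minimal Python sketch of the structure described above: an accelerated (Nesterov-type) outer loop in which every gradient update is followed by a multi-step gossip procedure that keeps the nodes' copies approximately at consensus. This is only an illustration under simplifying assumptions, not the authors' implementation; the coefficient rule `alpha_rule`, the gossip depth `T`, and the helper names are placeholders, and the exact method (Algorithm 2) and its parameters are specified in the paper.

```python
import numpy as np

def multi_step_gossip(U, mixing_matrices, T):
    """Drive the rows of U (one local vector per node) toward their average
    by T rounds of gossip with (possibly time-varying) doubly stochastic matrices."""
    for _ in range(T):
        W = next(mixing_matrices)
        U = W @ U
    return U

def accelerated_gossip_method(local_grads, x0, mu, alpha_rule, mixing_matrices, T, n_iters):
    """Illustrative sketch of the outer accelerated loop with consensus after each update.
    local_grads[i](x) returns the gradient of node i's objective at x (a 1-D array);
    alpha_rule(A) returns the next coefficient alpha^{k+1} (placeholder for the paper's rule)."""
    n = len(local_grads)
    X = np.tile(x0, (n, 1))   # per-node copies of x^k
    U = X.copy()              # per-node copies of u^k
    A = 0.0
    for _ in range(n_iters):
        alpha = alpha_rule(A)
        A_next = A + alpha
        Y = (alpha * U + A * X) / A_next                      # extrapolation point y^{k+1}
        G = np.stack([g(y) for g, y in zip(local_grads, Y)])  # local gradients at y^{k+1}
        # Explicit solution of the proximal step of the outer accelerated scheme.
        U = ((1.0 + A * mu) * U + alpha * mu * Y - alpha * G) / (1.0 + A_next * mu)
        U = multi_step_gossip(U, mixing_matrices, T)          # approximate consensus
        X = (alpha * U + A * X) / A_next
        A = A_next
    return X.mean(axis=0)
```

Here `mixing_matrices` is assumed to be an iterator over doubly stochastic matrices associated with the time-varying communication graphs.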

The research of A. Rogozin was partially supported by RFBR project 19-31-51001 and was partially carried out at Sirius (Sochi). The research of A. Gasnikov was partially supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) 075-00337-20-03, project no. 0714-2020-0005.


References

  1. Nedić, A., Olshevsky, A., Uribe, C.A.: Fast convergence rates for distributed non-Bayesian learning. IEEE Trans. Autom. Control 62(11), 5538–5553 (2017)

  2. Ram, S.S., Veeravalli, V.V., Nedic, A.: Distributed non-autonomous power control through distributed convex optimization. In: IEEE INFOCOM 2009, pp. 3001–3005. IEEE (2009)

  3. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 37–75 (2013). https://doi.org/10.1007/s10107-013-0677-5

  4. Devolder, O., Glineur, F., Nesterov, Yu.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016:47 (2013)

  5. Jakovetić, D., Xavier, J., Moura, J.M.F.: Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)

  6. Nedić, A., Olshevsky, A., Shi, W.: Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 27(4), 2597–2633 (2017)

  7. Scaman, K., Bach, F., Bubeck, S., Lee, Y.T., Massoulié, L.: Optimal algorithms for non-smooth distributed optimization in networks. In: Advances in Neural Information Processing Systems, pp. 2740–2749 (2018)

  8. Pu, S., Shi, W., Xu, J., Nedich, A.: A push-pull gradient method for distributed optimization in networks. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 3385–3390 (2018)

  9. Qu, G., Li, N.: Accelerated distributed Nesterov gradient descent. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (2016)

  10. Shi, W., Ling, Q., Wu, G., Yin, W.: EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)

  11. Ye, H., Luo, L., Zhou, Z., Zhang, T.: Multi-consensus decentralized accelerated gradient descent. arXiv preprint arXiv:2005.00797 (2020)

  12. Li, H., Fang, C., Yin, W., Lin, Z.: A sharp convergence rate analysis for distributed accelerated gradient methods. arXiv preprint arXiv:1810.01053 (2018)

  13. Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)

  14. Scaman, K., Bach, F., Bubeck, S., Lee, Y.T., Massoulié, L.: Optimal algorithms for smooth and strongly convex distributed optimization in networks. In: International Conference on Machine Learning, pp. 3027–3036 (2017)

  15. Jakovetic, D.: A unification and generalization of exact distributed first-order methods. IEEE Trans. Signal Inf. Process. Netw. 31–46 (2019)

  16. Rogozin, A., Gasnikov, A.: Projected gradient method for decentralized optimization over time-varying networks (2019). https://doi.org/10.1007/978-3-030-62867-3_18

  17. Dvinskikh, D., Gasnikov, A.: Decentralized and parallelized primal and dual accelerated methods for stochastic convex programming problems (2019). https://doi.org/10.1515/jiip-2020-0068

  18. Li, H., Lin, Z.: Revisiting EXTRA for smooth distributed optimization (2020). https://doi.org/10.1137/18M122902X

  19. Hendrikx, H., Bach, F., Massoulie, L.: An optimal algorithm for decentralized finite sum optimization. arXiv preprint arXiv:2005.10675 (2020)

  20. Li, H., Lin, Z., Fang, Y.: Optimal accelerated variance reduced EXTRA and DIGing for strongly convex and smooth decentralized optimization. arXiv preprint arXiv:2009.04373 (2020)

  21. Wu, X., Lu, J.: Fenchel dual gradient methods for distributed convex optimization over time-varying networks. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2894–2899 (2017)

  22. Zhang, G., Heusdens, R.: Distributed optimization using the primal-dual method of multipliers. IEEE Trans. Signal Inf. Process. Netw. 4(1), 173–187 (2018)

  23. Uribe, C.A., Lee, S., Gasnikov, A., Nedić, A.: A dual approach for optimal algorithms in distributed optimization over networks. Optim. Methods Softw. 1–40 (2020)

  24. Arjevani, Y., Bruna, J., Can, B., Gürbüzbalaban, M., Jegelka, S., Lin, H.: IDEAL: inexact decentralized accelerated augmented Lagrangian method. arXiv preprint arXiv:2006.06733 (2020)

  25. Wei, E., Ozdaglar, A.: Distributed alternating direction method of multipliers. In: 2012 IEEE 51st Conference on Decision and Control (CDC), pp. 5445–5450. IEEE (2012)

  26. Maros, M., Jaldén, J.: PANDA: a dual linearly converging method for distributed optimization over time-varying undirected graphs. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6520–6525 (2018)

  27. Tang, J., Egiazarian, K., Golbabaee, M., Davies, M.: The practicality of stochastic optimization in imaging inverse problems (2019). https://doi.org/10.1109/TCI.2020.3032101

  28. Stonyakin, F., et al.: Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model. arXiv preprint arXiv:2001.09013 (2020)

  29. Koloskova, A., Loizou, N., Boreiri, S., Jaggi, M., Stich, S.U.: A unified theory of decentralized SGD with changing topology and local updates (2020). http://proceedings.mlr.press/v119/koloskova20a.html

  30. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)

  31. Li, H., Fang, C., Yin, W., Lin, Z.: Decentralized accelerated gradient methods with increasing penalty parameters. IEEE Trans. Signal Process. 68, 4855–4870 (2020)

  32. Arjevani, Y., Shamir, O.: Communication complexity of distributed convex learning and optimization. Adv. Neural Inf. Process. Syst. 28, 1756–1764 (2015)

  33. Dvinskikh, D.M., Turin, A.I., Gasnikov, A.V., Omelchenko, S.S.: Accelerated and non-accelerated stochastic gradient descent in model generality. Matematicheskie Zametki 108(4), 515–528 (2020)

  34. Rogozin, A., Lukoshkin, V., Gasnikov, A., Kovalev, D., Shulgin, E.: Towards accelerated rates for distributed optimization over time-varying networks. arXiv preprint arXiv:2009.11069 (2020)


Author information

Correspondence to Alexander Rogozin.


Appendices

A Proof of Theorem 3

First, note that Algorithm 2 reduces to the following iterative procedure in \(\mathbb {R}^d\):

$$\begin{aligned} \overline{y}^{k+1}&= \frac{\alpha ^{k+1}\overline{u}^k + A^k\overline{x}^k}{A^{k+1}} \\ \overline{u}^{k + 1}&= \mathop {\text {arg min}}\limits _{\overline{z}\in \mathbb {R}^d}\left\{ \alpha ^{k+1} \left( \left\langle \frac{1}{n}\sum _{i=1}^n \nabla f(y_i^{k+1}), \overline{z} - \overline{y}^{k+1} \right\rangle + \frac{\mu }{2}\left\| \overline{z} - \overline{y}^{k+1} \right\| ^2 \right) + \frac{1+ A^k\mu }{2}\left\| \overline{z} - \overline{u}^k \right\| ^2 \right\} \\ \overline{x}^{k+1}&= \frac{\alpha ^{k+1}\overline{u}^{k+1} + A^k\overline{x}^k}{A^{k+1}} \end{aligned}$$
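Since the objective of the \(\mathop {\text {arg min}}\) step is a strongly convex quadratic in \(\overline{z}\), it admits an explicit solution; setting its gradient to zero and using \(A^{k+1} = A^k + \alpha ^{k+1}\) gives

$$\begin{aligned} \overline{u}^{k+1} = \frac{(1 + A^k\mu )\,\overline{u}^k + \alpha ^{k+1}\mu \,\overline{y}^{k+1} - \alpha ^{k+1}\,\frac{1}{n}\sum _{i=1}^n \nabla f(y_i^{k+1})}{1 + A^{k+1}\mu }, \end{aligned}$$

so no inner optimization problem has to be solved at this step.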

A.1 Outer Loop

We first recall basic properties of the coefficients \(A^k\), which follow immediately from Lemma 3.7 in [28] (for details, see the full technical report of this paper [34]).

Lemma 2

For the coefficients \(A^k\), it holds

Lemma 3

Provided that the consensus accuracy is \(\delta '\), i.e. \(\left\| \mathbf{U}^{j} - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta ' \text { for } j = 1, \ldots , k\), we have

$$\begin{aligned} f(\overline{x}^k) - f(x^*)&\leqslant \frac{\left\| \overline{u}^0 - x^* \right\| ^2}{2A^k} + \frac{2\sum _{j=1}^{k} A^j\delta }{A^k} \\ \left\| \overline{u}^k - x^* \right\| ^2&\leqslant \frac{\left\| \overline{u}^0 - x^* \right\| ^2}{1 + A^k\mu } + \frac{4\sum _{j=1}^{k} A^j\delta }{1 + A^k\mu } \end{aligned}$$

where \(\delta \) is given in (5).

Proof

First, assuming that \(\left\| \mathbf{U}^{j} - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta '\), we show by induction that \(\mathbf{Y}^j, \mathbf{U}^j, \mathbf{X}^j\) lie in a \(\sqrt{\delta '}\)-neighborhood of \(\mathcal {C}\). At \(j=0\), we have \(\left\| \mathbf{X}^0 - \overline{\mathbf{X}}^0 \right\| = \left\| \mathbf{U}^0 - \overline{\mathbf{U}}^0 \right\| = 0\). Using \(A^{j+1} = A^j + \alpha ^{j+1}\), so that the weights \(\frac{\alpha ^{j+1}}{A^{j+1}}\) and \(\frac{A^j}{A^{j+1}}\) sum to one, we obtain the induction step \(j\rightarrow j+1\):

$$\begin{aligned} \left\| \mathbf{Y}^{j+1} - \overline{\mathbf{Y}}^{j+1} \right\|&\leqslant \frac{\alpha ^{j+1}}{A^{j+1}}\left\| \mathbf{U}^j - \overline{\mathbf{U}}^j \right\| + \frac{A^j}{A^{j+1}} \left\| \mathbf{X}^j - \overline{\mathbf{X}}^j \right\| \leqslant \sqrt{\delta '} \\ \left\| \mathbf{X}^{j+1} - \overline{\mathbf{X}}^{j+1} \right\|&\leqslant \frac{\alpha ^{j+1}}{A^{j+1}}\left\| \mathbf{U}^{j+1} - \overline{\mathbf{U}}^{j+1} \right\| + \frac{A^j}{A^{j+1}} \left\| \mathbf{X}^j - \overline{\mathbf{X}}^j \right\| \leqslant \sqrt{\delta '} \end{aligned}$$

Therefore, \(g(\overline{y}) = \frac{1}{n}\sum _{i=1}^n \nabla f(y_i)\) is a gradient from a \((\delta , L, \mu )\)-model of \(f\), and the desired result follows directly from Theorem 3.4 in [28].
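For the reader's convenience, we recall roughly what this means in the inexact-oracle form of [3, 4] (the model notion of [28] is more general): \(g(\overline{y})\) is a gradient from a \((\delta , L, \mu )\)-model of \(f\) at \(\overline{y}\) if there is a value \(f_{\delta }(\overline{y})\) such that for all \(x\in \mathbb {R}^d\)

$$\begin{aligned} \frac{\mu }{2}\left\| x - \overline{y} \right\| ^2 \leqslant f(x) - f_{\delta }(\overline{y}) - \left\langle g(\overline{y}), x - \overline{y} \right\rangle \leqslant \frac{L}{2}\left\| x - \overline{y} \right\| ^2 + \delta . \end{aligned}$$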

A.2 Consensus Subroutine Iterations

We specify the number of iterations required to reach accuracy \(\delta '\) in the following lemma, which is proved in the extended version of this paper [34].

Lemma 4

Let the consensus accuracy be maintained at level \(\delta '\), i.e. \(\left\| \mathbf{U}^j - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta ' \text { for } j = 1, \ldots , k\), and let Assumption 2 hold. Define

$$\begin{aligned} \sqrt{D} := \left( \frac{2{L_{l}}}{\sqrt{L\mu }} + 1 \right) \sqrt{\delta '} + \frac{{L_{l}}}{\mu } \sqrt{n} \left( \left\| \overline{u}^0 - x^* \right\| ^2 + \frac{8{\delta '}}{\sqrt{L\mu }}\right) ^{1/2} + \frac{2\left\| \nabla F(\mathbf{X}^*) \right\| }{\sqrt{L\mu }} \end{aligned}$$

Then it is sufficient to perform \(T_k = T = \frac{\tau }{2\lambda }\log \frac{D}{\delta '}\) consensus iterations in order to ensure \(\delta '\)-accuracy at step \(k+1\), i.e. \(\left\| \mathbf{U}^{k+1} - \overline{\mathbf{U}}^{k+1} \right\| ^2\leqslant \delta '\).
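As a small illustration (a sketch under the stated assumptions, with `tau` and `lam` standing for the parameters \(\tau \) and \(\lambda \) of Assumption 2, and `D`, `delta_prime` for \(D\) and \(\delta '\)), the bound of Lemma 4 translates into an integer number of gossip rounds as follows.

```python
import math

def consensus_rounds(tau: float, lam: float, D: float, delta_prime: float) -> int:
    """Number of gossip iterations T = (tau / (2 * lam)) * log(D / delta_prime),
    rounded up to an integer, as suggested by Lemma 4."""
    return math.ceil(tau / (2.0 * lam) * math.log(D / delta_prime))
```

For example, with the hypothetical values `consensus_rounds(5, 0.1, 1e4, 1e-6)` the helper returns 576 gossip rounds per outer iteration.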

A.3 Putting the Proof Together

Let us show that the choice of the number of subroutine iterations \(T_k = T\) yields

$$\begin{aligned}&f(\overline{x}^k) - f(x^*)\leqslant \frac{\left\| \overline{u}^0 - x^* \right\| ^2}{2A^k} + \frac{2\sum _{j=1}^k A^j\delta }{A^k} \end{aligned}$$

by induction. At \(k=0\), we have \(\left\| \mathbf{U}^0 - \overline{\mathbf{U}}^0 \right\| = 0\) and by Lemma 3 it holds

$$\begin{aligned} f(\overline{x}^1) - f(x^*) \leqslant \frac{\left\| \overline{u}^0 - x^* \right\| ^2}{2A^1} + \frac{2A^1\delta }{A^1}. \end{aligned}$$

For the induction step, assume that \(\left\| \mathbf{U}^j - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta '\) for \(j = 0,\ldots , k\). By Lemma 4, if we set \(T_k = T\), then \(\left\| \mathbf{U}^{k+1} - \overline{\mathbf{U}}^{k+1} \right\| ^2\leqslant \delta '\). Applying Lemma 3 again, we get

$$\begin{aligned}&f(\overline{x}^{k+1}) - f(x^*)\leqslant \frac{\left\| \overline{u}^0 - x^* \right\| ^2}{2A^{k+1}} + \frac{2\sum _{j=1}^{k+1} A^j\delta }{A^{k+1}} \end{aligned}$$

Recalling the bound on \(A^k\) from Lemma 2 gives

Here we used the definition of \(L, \mu \) in (10): \(L = 2{L_{g}},~ \mu = \frac{{\mu _{g}}}{2}\). For \(\varepsilon \)-accuracy:

It is sufficient to choose

Let us estimate the term \(\frac{D}{\delta '}\) under the logarithm.

where \(D_1, D_2\) are defined in (11). Finally, the total number of iterations is


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rogozin, A., Lukoshkin, V., Gasnikov, A., Kovalev, D., Shulgin, E. (2021). Towards Accelerated Rates for Distributed Optimization over Time-Varying Networks. In: Olenev, N.N., Evtushenko, Y.G., Jaćimović, M., Khachay, M., Malkova, V. (eds) Optimization and Applications. OPTIMA 2021. Lecture Notes in Computer Science(), vol 13078. Springer, Cham. https://doi.org/10.1007/978-3-030-91059-4_19

  • DOI: https://doi.org/10.1007/978-3-030-91059-4_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91058-7

  • Online ISBN: 978-3-030-91059-4

  • eBook Packages: Computer Science, Computer Science (R0)
