Abstract
We study the problem of decentralized optimization with strongly convex, smooth cost functions. This paper investigates accelerated algorithms under time-varying network constraints. In our approach, the nodes run a multi-step gossip procedure after each gradient update, which ensures approximate consensus at every iteration. The outer loop is based on Nesterov's accelerated scheme. Both the computation and communication complexities of our method depend optimally on the global condition number \(\kappa _g\). In particular, the algorithm attains the optimal computation complexity \(O(\sqrt{\kappa _g}\log (1/\varepsilon ))\).
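To make the scheme concrete, here is a minimal sketch of the approach (an outer Nesterov-type loop in which each local gradient step is followed by multi-step gossip averaging). It assumes synthetic quadratic local objectives, a static ring topology, and a constant-momentum accelerated variant; the names (`decentralized_agd`, `gossip`, `local_grad`) and parameter choices are illustrative and do not reproduce the paper's Algorithm 2 exactly.

```python
import numpy as np

def local_grad(A_i, b_i, y):
    # gradient of the local objective f_i(y) = 0.5 * ||A_i y - b_i||^2
    return A_i.T @ (A_i @ y - b_i)

def gossip(X, W, T):
    # T rounds of averaging with a doubly stochastic mixing matrix W
    for _ in range(T):
        X = W @ X
    return X

def decentralized_agd(A, b, W, L_g, mu_g, n_outer=300, T=15):
    # Nesterov-type outer loop; every gradient step is followed by T gossip rounds
    n, _, d = A.shape                      # n nodes, each holding an (m x d) data block
    X = np.zeros((n, d))                   # one local copy of the iterate per node (rows)
    Y = X.copy()                           # extrapolated points
    kappa = L_g / mu_g
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # momentum, strongly convex case
    for _ in range(n_outer):
        G = np.stack([local_grad(A[i], b[i], Y[i]) for i in range(n)])
        X_new = gossip(Y - G / L_g, W, T)  # local gradient step + multi-step gossip
        Y = X_new + beta * (X_new - X)     # Nesterov extrapolation
        X = X_new
    return X.mean(axis=0)

# Toy usage: 6 nodes on a ring with Metropolis-like weights.
rng = np.random.default_rng(0)
n, m, d = 6, 20, 5
A = rng.standard_normal((n, m, d))
b = rng.standard_normal((n, m))
H = sum(A[i].T @ A[i] for i in range(n)) / n      # Hessian of the average objective
eigs = np.linalg.eigvalsh(H)
L_g, mu_g = eigs[-1], eigs[0]
W = 0.5 * np.eye(n)
for i in range(n):
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25
x_hat = decentralized_agd(A, b, W, L_g, mu_g)
```

After enough gossip rounds each row of `X_new` is close to the network average, so the update approximately reproduces a centralized accelerated gradient step on \(f = \frac{1}{n}\sum _i f_i\).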
The research of A. Rogozin was partially supported by RFBR 19-31-51001 and was partially done in Sirius (Sochi). The research of A. Gasnikov was partially supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) 075-00337-20-03, project no. 0714-2020-0005.
References
Nedić, A., Olshevsky, A., Uribe, C.A.: Fast convergence rates for distributed non-Bayesian learning. IEEE Trans. Autom. Control 62(11), 5538–5553 (2017)
Ram, S.S., Veeravalli, V.V., Nedic, A.: Distributed non-autonomous power control through distributed convex optimization. In: IEEE INFOCOM 2009, pp. 3001–3005. IEEE (2009)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 37–75 (2013). https://doi.org/10.1007/s10107-013-0677-5
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016:47 (2013)
Jakovetić, D., Xavier, J., Moura, J.M.F.: Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)
Nedić, A., Olshevsky, A., Shi, W.: Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 27(4), 2597–2633 (2017)
Scaman, K., Bach, F., Bubeck, S., Lee, Y.T., Massoulié, L.: Optimal algorithms for non-smooth distributed optimization in networks. In: Advances in Neural Information Processing Systems, pp. 2740–2749 (2018)
Pu, S., Shi, W., Xu, J., Nedich, A.: A push-pull gradient method for distributed optimization in networks. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 3385–3390 (2018)
Qu, G., Li, N.: Accelerated distributed Nesterov gradient descent. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (2016)
Shi, W., Ling, Q., Wu, G., Yin, W.: EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)
Ye, H., Luo, L., Zhou, Z., Zhang, T.: Multi-consensus decentralized accelerated gradient descent. arXiv preprint arXiv:2005.00797 (2020)
Li, H., Fang, C., Yin, W., Lin, Z.: A sharp convergence rate analysis for distributed accelerated gradient methods. arXiv:1810.01053 (2018)
Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
Scaman, K., Bach, F., Bubeck, S., Lee, Y.T., Massoulié, L.: Optimal algorithms for smooth and strongly convex distributed optimization in networks. In: International Conference on Machine Learning, pp. 3027–3036 (2017)
Jakovetic, D.: A unification and generalization of exact distributed first order methods. IEEE Trans. Signal Inf. Process. Netw. 31–46 (2019)
Rogozin, A., Gasnikov, A.: Projected gradient method for decentralized optimization over time-varying networks (2019). https://doi.org/10.1007/978-3-030-62867-3_18
Dvinskikh, D., Gasnikov, A.: Decentralized and parallelized primal and dual accelerated methods for stochastic convex programming problems (2019). https://doi.org/10.1515/jiip-2020-0068
Li, H., Lin, Z.: Revisiting EXTRA for smooth distributed optimization (2020). https://doi.org/10.1137/18M122902X
Hendrikx, H., Bach, F., Massoulie, L.: An optimal algorithm for decentralized finite sum optimization. arXiv preprint arXiv:2005.10675 (2020)
Li, H., Lin, Z., Fang, Y.: Optimal accelerated variance reduced EXTRA and DIGing for strongly convex and smooth decentralized optimization. arXiv preprint arXiv:2009.04373 (2020)
Wu, X., Lu, J.: Fenchel dual gradient methods for distributed convex optimization over time-varying networks. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2894–2899, December 2017
Zhang, G., Heusdens, R.: Distributed optimization using the primal-dual method of multipliers. IEEE Trans. Signal Inf. Process. Netw. 4(1), 173–187 (2018)
Uribe, C.A., Lee, S., Gasnikov, A., Nedić, A.: A dual approach for optimal algorithms in distributed optimization over networks. Optim. Methods Softw. 1–40 (2020)
Arjevani, Y., Bruna, J., Can, B., Gürbüzbalaban, M., Jegelka, S., Lin, H.: IDEAL: inexact decentralized accelerated augmented Lagrangian method. arXiv preprint arXiv:2006.06733 (2020)
Wei, E., Ozdaglar, A.: Distributed alternating direction method of multipliers. In: 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pp. 5445–5450. IEEE (2012)
Maros, M., Jaldén, J.: PANDA: a dual linearly converging method for distributed optimization over time-varying undirected graphs. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6520–6525 (2018)
Tang, J., Egiazarian, K., Golbabaee, M., Davies, M.: The practicality of stochastic optimization in imaging inverse problems (2019). https://doi.org/10.1109/TCI.2020.3032101
Stonyakin, F., et al.: Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model. arXiv:2001.09013 (2020)
Koloskova, A., Loizou, N., Boreiri, S., Jaggi, M., Stich, S.U.: A unified theory of decentralized SGD with changing topology and local updates (2020). http://proceedings.mlr.press/v119/koloskova20a.html
Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)
Li, H., Fang, C., Yin, W., Lin, Z.: Decentralized accelerated gradient methods with increasing penalty parameters. IEEE Trans. Signal Process. 68, 4855–4870 (2020)
Arjevani, Y., Shamir, O.: Communication complexity of distributed convex learning and optimization. Adv. Neural Inf. Process. Syst. 28, 1756–1764 (2015)
Dvinskikh, D.M., Turin, A.I., Gasnikov, A.V., Omelchenko, S.S.: Accelerated and non accelerated stochastic gradient descent in model generality. Matematicheskie Zametki 108(4), 515–528 (2020)
Rogozin, A., Lukoshkin, V., Gasnikov, A., Kovalev, D., Shulgin, E.: Towards accelerated rates for distributed optimization over time-varying networks. arXiv preprint arXiv:2009.11069 (2020)
Appendices
A Proof of Theorem 3
First, note that Algorithm 2 comes down to the following iterative procedure in \(\mathbb {R}^d\).
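For orientation, a standard accelerated scheme driven by a \((\delta , L, \mu )\)-model (in the spirit of the similar-triangles method analyzed in [28]) has the following shape; this is an illustrative sketch under common conventions and not necessarily the exact recursion of Algorithm 2:

\begin{align*}
\overline{y}^{\,k} &= \frac{\alpha ^{k} \overline{u}^{\,k} + A^{k} \overline{x}^{\,k}}{A^{k+1}}, \qquad A^{k+1} = A^{k} + \alpha ^{k},\\
\overline{u}^{\,k+1} &= \overline{u}^{\,k} - \frac{\alpha ^{k}}{1 + A^{k+1}\mu }\Big( g(\overline{y}^{\,k}) + \mu \big(\overline{u}^{\,k} - \overline{y}^{\,k}\big) \Big),\\
\overline{x}^{\,k+1} &= \frac{\alpha ^{k} \overline{u}^{\,k+1} + A^{k} \overline{x}^{\,k}}{A^{k+1}},
\end{align*}

where \(g(\overline{y}^{\,k})\) denotes the (approximately) averaged gradient produced by the gossip subroutine.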
A.1 Outer Loop
First, we recall basic properties of the coefficients \(A^k\), which follow immediately from Lemma 3.7 in [28] (for details, see the full technical report of this paper [34]).
Lemma 2
For coefficients \(A^k\), it holds
Lemma 3
Provided that the consensus accuracy is \(\delta '\), i.e. \(\left\| \mathbf{U}^{j} - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta ' \text { for } j = 1, \ldots , k\), we have
where \(\delta \) is given in (5).
Proof
First, assuming that \(\left\| \mathbf{U}^{j} - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta '\), we show by induction that \(\mathbf{Y}^j, \mathbf{U}^j, \mathbf{X}^j\) lie in a \(\sqrt{\delta '}\)-neighborhood of \(\mathcal {C}\). At \(j=0\), we have \(\left\| \mathbf{X}^0 - \overline{\mathbf{X}}^0 \right\| = \left\| \mathbf{U}^0 - \overline{\mathbf{U}}^0 \right\| = 0\). Using \(A^{j+1} = A^j + \alpha ^j\), we obtain the induction step \(j\rightarrow j+1\).
Therefore, \(g(\overline{y}) = \frac{1}{n}\sum _{i=1}^n \nabla f_i(y_i)\) is a gradient from a \((\delta , L, \mu )\)-model of f, and the desired result follows directly from Theorem 3.4 in [28].
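For reference, in the simplest (linear-model) case a pair \((f_{\delta }(y), g(y))\) is a \((\delta , L, \mu )\)-model (inexact oracle) of \(f\) at \(y\) if, for all \(x\) (cf. [4, 28]),

\[
\frac{\mu }{2}\left\| x - y \right\| ^2 \;\leqslant \; f(x) - f_{\delta }(y) - \langle g(y), x - y\rangle \;\leqslant \; \frac{L}{2}\left\| x - y \right\| ^2 + \delta .
\]

Theorem 3.4 in [28] provides the convergence estimate for the accelerated method run with such a model, which is the result invoked above.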
A.2 Consensus Subroutine Iterations
We specify the number of iterations required to reach accuracy \(\delta '\) in the following lemma, which is proved in the extended version of this paper [34].
Lemma 4
Let the consensus accuracy be maintained at level \(\delta '\), i.e. \(\left\| \mathbf{U}^j - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta ' \text { for } j = 1, \ldots , k\), and let Assumption 2 hold. Define
Then it is sufficient to perform \(T_k = T = \frac{\tau }{2\lambda }\log \frac{D}{\delta '}\) consensus iterations in order to ensure \(\delta '\)-accuracy at step \(k+1\), i.e. \(\left\| \mathbf{U}^{k+1} - \overline{\mathbf{U}}^{k+1} \right\| ^2\leqslant \delta '\).
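A minimal sketch of this subroutine, assuming doubly stochastic time-varying mixing matrices satisfying Assumption 2; the names `sufficient_gossip_rounds`, `multi_step_gossip`, and the parameters `tau`, `lam`, `D` (standing for \(\tau\), \(\lambda\), \(D\) above) are illustrative:

```python
import numpy as np

def sufficient_gossip_rounds(tau, lam, D, delta_prime):
    # Lemma 4: T = tau / (2 * lambda) * log(D / delta') rounds suffice
    return int(np.ceil(tau / (2.0 * lam) * np.log(D / delta_prime)))

def multi_step_gossip(U, mixing_matrices, T, start=0):
    # T rounds of averaging with (possibly time-varying) doubly stochastic matrices
    for t in range(T):
        U = mixing_matrices[(start + t) % len(mixing_matrices)] @ U
    return U

def consensus_violation(U):
    # squared distance to consensus, ||U - U_bar||^2 -- the quantity bounded by delta'
    return np.linalg.norm(U - U.mean(axis=0, keepdims=True)) ** 2
```

In Algorithm 2 this subroutine is invoked once per outer iteration, so the per-iteration communication cost is \(T\) gossip rounds.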
A.3 Putting the Proof Together
Let us show that the choice of the number of subroutine iterations \(T_k = T\) yields
by induction. At \(k=0\), we have \(\left\| \mathbf{U}^0 - \overline{\mathbf{U}}^0 \right\| = 0\) and by Lemma 3 it holds
For the induction step, assume that \(\left\| \mathbf{U}^j - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta '\) for \(j = 0,\ldots , k\). By Lemma 4, if we set \(T_k = T\), then \(\left\| \mathbf{U}^{k+1} - \overline{\mathbf{U}}^{k+1} \right\| ^2\leqslant \delta '\). Applying Lemma 3 again, we get
Recalling the bound on \(A^k\) from Lemma 2 gives
Here we used the definition of \(L, \mu \) in (10): \(L = 2{L_{g}},~ \mu = \frac{{\mu _{g}}}{2}\). For \(\varepsilon \)-accuracy:
It is sufficient to choose
Let us estimate the term \(\frac{D}{\delta '}\) under the logarithm.
where \(D_1, D_2\) are defined in (11). Finally, the total number of iterations is
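As a consistency check with the abstract (a sketch that assumes the standard geometric growth of \(A^k\) from Lemma 2): since \(L = 2L_{g}\) and \(\mu = \mu _{g}/2\),

\[
\frac{L}{\mu } = \frac{2L_{g}}{\mu _{g}/2} = 4\kappa _g, \qquad \sqrt{\frac{L}{\mu }} = 2\sqrt{\kappa _g},
\]

so the outer loop takes \(N = O\left(\sqrt{\kappa _g}\log (1/\varepsilon )\right)\) gradient steps, and with \(T = \frac{\tau }{2\lambda }\log \frac{D}{\delta '}\) gossip rounds per step the total number of communications is \(N\cdot T = O\left(\sqrt{\kappa _g}\,\frac{\tau }{\lambda }\log \frac{1}{\varepsilon }\log \frac{D}{\delta '}\right)\).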