Abstract
We study the problem of decentralized optimization with strongly convex, smooth cost functions. This paper investigates accelerated algorithms under time-varying network constraints. In our approach, the nodes run a multi-step gossip procedure after each gradient update, which ensures approximate consensus at every iteration. The outer loop is based on Nesterov's accelerated scheme. Both the computation and communication complexities of our method depend optimally on the global condition number \(\kappa _g\). In particular, the algorithm attains the optimal computation complexity \(O(\sqrt{\kappa _g}\log (1/\varepsilon ))\).
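To make the scheme concrete, here is a minimal sketch of the approach (an outer Nesterov-type loop in which each local gradient step is followed by multi-step gossip averaging). It assumes synthetic quadratic local objectives, a static ring topology, and a constant-momentum accelerated variant; the names (`decentralized_agd`, `gossip`, `local_grad`) and parameter choices are illustrative and do not reproduce the paper's Algorithm 2 exactly.

```python
import numpy as np

def local_grad(A_i, b_i, y):
    # gradient of the local objective f_i(y) = 0.5 * ||A_i y - b_i||^2
    return A_i.T @ (A_i @ y - b_i)

def gossip(X, W, T):
    # T rounds of averaging with a doubly stochastic mixing matrix W
    for _ in range(T):
        X = W @ X
    return X

def decentralized_agd(A, b, W, L_g, mu_g, n_outer=300, T=15):
    # Nesterov-type outer loop; every gradient step is followed by T gossip rounds
    n, _, d = A.shape                      # n nodes, each holding an (m x d) data block
    X = np.zeros((n, d))                   # one local copy of the iterate per node (rows)
    Y = X.copy()                           # extrapolated points
    kappa = L_g / mu_g
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # momentum, strongly convex case
    for _ in range(n_outer):
        G = np.stack([local_grad(A[i], b[i], Y[i]) for i in range(n)])
        X_new = gossip(Y - G / L_g, W, T)  # local gradient step + multi-step gossip
        Y = X_new + beta * (X_new - X)     # Nesterov extrapolation
        X = X_new
    return X.mean(axis=0)

# Toy usage: 6 nodes on a ring with Metropolis-like weights.
rng = np.random.default_rng(0)
n, m, d = 6, 20, 5
A = rng.standard_normal((n, m, d))
b = rng.standard_normal((n, m))
H = sum(A[i].T @ A[i] for i in range(n)) / n      # Hessian of the average objective
eigs = np.linalg.eigvalsh(H)
L_g, mu_g = eigs[-1], eigs[0]
W = 0.5 * np.eye(n)
for i in range(n):
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25
x_hat = decentralized_agd(A, b, W, L_g, mu_g)
```

After enough gossip rounds each row of `X_new` is close to the network average, so the update approximately reproduces a centralized accelerated gradient step on \(f = \frac{1}{n}\sum _i f_i\).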
The research of A. Rogozin was partially supported by RFBR 19-31-51001 and was partially done in Sirius (Sochi). The research of A. Gasnikov was partially supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) 075-00337-20-03, project no. 0714-2020-0005.
References
Nedić, A., Olshevsky, A., Uribe, C.A.: Fast convergence rates for distributed non-Bayesian learning. IEEE Trans. Autom. Control 62(11), 5538–5553 (2017)
Ram, S.S., Veeravalli, V.V., Nedic, A.: Distributed non-autonomous power control through distributed convex optimization. In: IEEE INFOCOM 2009, pp. 3001–3005. IEEE (2009)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 37–75 (2013). https://doi.org/10.1007/s10107-013-0677-5
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016:47 (2013)
Jakovetić, D., Xavier, J., Moura, J.M.F.: Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)
Nedić, A., Olshevsky, A., Shi, W.: Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 27(4), 2597–2633 (2017)
Scaman, K., Bach, F., Bubeck, S., Lee, Y.T., Massoulié, L.: Optimal algorithms for non-smooth distributed optimization in networks. In: Advances in Neural Information Processing Systems, pp. 2740–2749 (2018)
Pu, S., Shi, W., Xu, J., Nedich, A.: A push-pull gradient method for distributed optimization in networks. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 3385–3390 (2018)
Qu, G., Li, N.: Accelerated distributed Nesterov gradient descent. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (2016)
Shi, W., Ling, Q., Wu, G., Yin, W.: EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)
Ye, H., Luo, L., Zhou, Z., Zhang, T.: Multi-consensus decentralized accelerated gradient descent. arXiv preprint arXiv:2005.00797 (2020)
Li, H., Fang, C., Yin, W., Lin, Z.: A sharp convergence rate analysis for distributed accelerated gradient methods. arXiv:1810.01053 (2018)
Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
Scaman, K., Bach, F., Bubeck, S., Lee, Y.T., Massoulié, L.: Optimal algorithms for smooth and strongly convex distributed optimization in networks. In: International Conference on Machine Learning, pp. 3027–3036 (2017)
Jakovetic, D.: A unification and generalization of exact distributed first order methods. IEEE Trans. Signal Inf. Process. Netw. 31–46 (2019)
Rogozin, A., Gasnikov, A.: Projected gradient method for decentralized optimization over time-varying networks (2019). https://doi.org/10.1007/978-3-030-62867-3_18
Dvinskikh, D., Gasnikov, A.: Decentralized and parallelized primal and dual accelerated methods for stochastic convex programming problems (2019). https://doi.org/10.1515/jiip-2020-0068
Li, H., Lin, Z.: Revisiting EXTRA for smooth distributed optimization (2020). https://doi.org/10.1137/18M122902X
Hendrikx, H., Bach, F., Massoulie, L.: An optimal algorithm for decentralized finite sum optimization. arXiv preprint arXiv:2005.10675 (2020)
Li, H., Lin, Z., Fang, Y.: Optimal accelerated variance reduced EXTRA and DIGing for strongly convex and smooth decentralized optimization. arXiv preprint arXiv:2009.04373 (2020)
Wu, X., Lu, J.: Fenchel dual gradient methods for distributed convex optimization over time-varying networks. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2894–2899, December 2017
Zhang, G., Heusdens, R.: Distributed optimization using the primal-dual method of multipliers. IEEE Trans. Signal Inf. Process. Netw. 4(1), 173–187 (2018)
Uribe, C.A., Lee, S., Gasnikov, A., Nedić, A.: A dual approach for optimal algorithms in distributed optimization over networks. Optim. Methods Softw. 1–40 (2020)
Arjevani, Y., Bruna, J., Can, B., Gürbüzbalaban, M., Jegelka, S., Lin, H.: IDEAL: inexact decentralized accelerated augmented Lagrangian method. arXiv preprint arXiv:2006.06733 (2020)
Wei, E., Ozdaglar, A.: Distributed alternating direction method of multipliers. In: 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pp. 5445–5450. IEEE (2012)
Maros, M., Jaldén, J.: PANDA: a dual linearly converging method for distributed optimization over time-varying undirected graphs. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6520–6525 (2018)
Tang, J., Egiazarian, K., Golbabaee, M., Davies, M.: The practicality of stochastic optimization in imaging inverse problems (2019). https://doi.org/10.1109/TCI.2020.3032101
Stonyakin, F., et al.: Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model. arXiv:2001.09013 (2020)
Koloskova, A., Loizou, N., Boreiri, S., Jaggi, M., Stich, S.U.: A unified theory of decentralized SGD with changing topology and local updates (2020). http://proceedings.mlr.press/v119/koloskova20a.html
Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)
Li, H., Fang, C., Yin, W., Lin, Z.: Decentralized accelerated gradient methods with increasing penalty parameters. IEEE Trans. Signal Process. 68, 4855–4870 (2020)
Arjevani, Y., Shamir, O.: Communication complexity of distributed convex learning and optimization. Adv. Neural Inf. Process. Syst. 28, 1756–1764 (2015)
Dvinskikh, D.M., Turin, A.I., Gasnikov, A.V., Omelchenko, S.S.: Accelerated and non accelerated stochastic gradient descent in model generality. Matematicheskie Zametki 108(4), 515–528 (2020)
Rogozin, A., Lukoshkin, V., Gasnikov, A., Kovalev, D., Shulgin, E.: Towards accelerated rates for distributed optimization over time-varying networks. arXiv preprint arXiv:2009.11069 (2020)
Appendices
A Proof of Theorem 3
First, note that Algorithm 2 comes down to the following iterative procedure in \(\mathbb {R}^d\).
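For orientation, a standard accelerated scheme driven by a \((\delta , L, \mu )\)-model (in the spirit of the similar-triangles method analyzed in [28]) has the following shape; this is an illustrative sketch under common conventions and not necessarily the exact recursion of Algorithm 2:

\begin{align*}
\overline{y}^{\,k} &= \frac{\alpha ^{k} \overline{u}^{\,k} + A^{k} \overline{x}^{\,k}}{A^{k+1}}, \qquad A^{k+1} = A^{k} + \alpha ^{k},\\
\overline{u}^{\,k+1} &= \overline{u}^{\,k} - \frac{\alpha ^{k}}{1 + A^{k+1}\mu }\Big( g(\overline{y}^{\,k}) + \mu \big(\overline{u}^{\,k} - \overline{y}^{\,k}\big) \Big),\\
\overline{x}^{\,k+1} &= \frac{\alpha ^{k} \overline{u}^{\,k+1} + A^{k} \overline{x}^{\,k}}{A^{k+1}},
\end{align*}

where \(g(\overline{y}^{\,k})\) denotes the (approximately) averaged gradient produced by the gossip subroutine.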
A.1 Outer Loop
First, we recall basic properties of the coefficients \(A^k\), which follow immediately from Lemma 3.7 in [28] (for details, see the full technical report of this paper [34]).
Lemma 2
For coefficients \(A^k\), it holds
Lemma 3
Provided that the consensus accuracy is \(\delta '\), i.e. \(\left\| \mathbf{U}^{j} - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta ' \text { for } j = 1, \ldots , k\), we have
where \(\delta \) is given in (5).
Proof
First, assuming that \(\left\| \mathbf{U}^{j} - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta '\), we show by induction that \(\mathbf{Y}^j, \mathbf{U}^j, \mathbf{X}^j\) lie in a \(\sqrt{\delta '}\)-neighborhood of \(\mathcal {C}\). At \(j=0\), we have \(\left\| \mathbf{X}^0 - \overline{\mathbf{X}}^0 \right\| = \left\| \mathbf{U}^0 - \overline{\mathbf{U}}^0 \right\| = 0\). Using \(A^{j+1} = A^j + \alpha ^j\), we obtain the induction step \(j\rightarrow j+1\).
Therefore, \(g(\overline{y}) = \frac{1}{n}\sum _{i=1}^n \nabla f_i(y_i)\) is a gradient from a \((\delta , L, \mu )\)-model of f, and the desired result follows directly from Theorem 3.4 in [28].
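For reference, in the simplest (linear-model) case a pair \((f_{\delta }(y), g(y))\) is a \((\delta , L, \mu )\)-model (inexact oracle) of \(f\) at \(y\) if, for all \(x\) (cf. [4, 28]),

\[
\frac{\mu }{2}\left\| x - y \right\| ^2 \;\leqslant \; f(x) - f_{\delta }(y) - \langle g(y), x - y\rangle \;\leqslant \; \frac{L}{2}\left\| x - y \right\| ^2 + \delta .
\]

Theorem 3.4 in [28] provides the convergence estimate for the accelerated method run with such a model, which is the result invoked above.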
A.2 Consensus Subroutine Iterations
We specify the number of iterations required to reach accuracy \(\delta '\) in the following lemma, which is proved in the extended version of this paper [34].
Lemma 4
Let the consensus accuracy be maintained at level \(\delta '\), i.e. \(\left\| \mathbf{U}^j - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta ' \text { for } j = 1, \ldots , k\), and let Assumption 2 hold. Define
Then it is sufficient to perform \(T_k = T = \frac{\tau }{2\lambda }\log \frac{D}{\delta '}\) consensus iterations in order to ensure \(\delta '\)-accuracy at step \(k+1\), i.e. \(\left\| \mathbf{U}^{k+1} - \overline{\mathbf{U}}^{k+1} \right\| ^2\leqslant \delta '\).
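A minimal sketch of this subroutine, assuming doubly stochastic time-varying mixing matrices satisfying Assumption 2; the names `sufficient_gossip_rounds`, `multi_step_gossip`, and the parameters `tau`, `lam`, `D` (standing for \(\tau\), \(\lambda\), \(D\) above) are illustrative:

```python
import numpy as np

def sufficient_gossip_rounds(tau, lam, D, delta_prime):
    # Lemma 4: T = tau / (2 * lambda) * log(D / delta') rounds suffice
    return int(np.ceil(tau / (2.0 * lam) * np.log(D / delta_prime)))

def multi_step_gossip(U, mixing_matrices, T, start=0):
    # T rounds of averaging with (possibly time-varying) doubly stochastic matrices
    for t in range(T):
        U = mixing_matrices[(start + t) % len(mixing_matrices)] @ U
    return U

def consensus_violation(U):
    # squared distance to consensus, ||U - U_bar||^2 -- the quantity bounded by delta'
    return np.linalg.norm(U - U.mean(axis=0, keepdims=True)) ** 2
```

In Algorithm 2 this subroutine is invoked once per outer iteration, so the per-iteration communication cost is \(T\) gossip rounds.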
A.3 Putting the Proof Together
Let us show that the choice of the number of subroutine iterations \(T_k = T\) yields
by induction. At \(k=0\), we have \(\left\| \mathbf{U}^0 - \overline{\mathbf{U}}^0 \right\| = 0\) and by Lemma 3 it holds
For the induction step, assume that \(\left\| \mathbf{U}^j - \overline{\mathbf{U}}^j \right\| ^2\leqslant \delta '\) for \(j = 0,\ldots , k\). By Lemma 4, if we set \(T_k = T\), then \(\left\| \mathbf{U}^{k+1} - \overline{\mathbf{U}}^{k+1} \right\| ^2\leqslant \delta '\). Applying Lemma 3 again, we get
Recalling the bound on \(A^k\) from Lemma 2 gives
Here we used the definition of \(L, \mu \) in (10): \(L = 2{L_{g}},~ \mu = \frac{{\mu _{g}}}{2}\). For \(\varepsilon \)-accuracy:
It is sufficient to choose
Let us estimate the term \(\frac{D}{\delta '}\) under the logarithm.
where \(D_1, D_2\) are defined in (11). Finally, the total number of iterations is
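As a consistency check with the abstract (a sketch that assumes the standard geometric growth of \(A^k\) from Lemma 2): since \(L = 2L_{g}\) and \(\mu = \mu _{g}/2\),

\[
\frac{L}{\mu } = \frac{2L_{g}}{\mu _{g}/2} = 4\kappa _g, \qquad \sqrt{\frac{L}{\mu }} = 2\sqrt{\kappa _g},
\]

so the outer loop takes \(N = O\left(\sqrt{\kappa _g}\log (1/\varepsilon )\right)\) gradient steps, and with \(T = \frac{\tau }{2\lambda }\log \frac{D}{\delta '}\) gossip rounds per step the total number of communications is \(N\cdot T = O\left(\sqrt{\kappa _g}\,\frac{\tau }{\lambda }\log \frac{1}{\varepsilon }\log \frac{D}{\delta '}\right)\).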