Abstract
Since their initial proposal in the late 1980s, spectral gradient methods have continued to receive significant attention, especially due to their excellent numerical performance on various large-scale applications. However, to date, they have not been sufficiently explored in the context of distributed optimization. In this paper, we consider unconstrained distributed optimization problems in which n nodes constitute an arbitrary connected network and collaboratively minimize the sum of their local convex cost functions. In this setting, building on existing exact distributed gradient methods, we propose a novel exact distributed gradient method wherein the nodes’ step-sizes are designed according to rules akin to those in spectral gradient methods. We refer to the proposed method as the Distributed Spectral Gradient method. The method exhibits R-linear convergence under standard assumptions on the nodes’ local costs and safeguarding on the algorithm step-sizes. We illustrate the method’s performance through simulation examples.
Notes
Lemma 4.3 can be proved similarly, if we work with representation (9)–(10) instead of (7)–(8). The corresponding error recursion matrix then becomes as in (20), with \(\Sigma _k^{-1}=\alpha \,I\) and \(H=I\). The matrix has the same blocks as E up to a permutation, and the results through the alternative analysis will be equivalent.
Research supported by the Serbian Ministry of Education, Science, and Technological Development, Grant No. 174030. This work is also supported by the I-BiDaaS project, funded by the European Commission under Grant Agreement No. 780787. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
Appendix
Proof of Lemma 4.2
Consider algorithm (7)–(8). For the special case considered here, the update rule (7)–(8) becomes:
Denote by \(\xi ^k\) the \((2n) \times 1\) vector defined by \(\xi ^k = \left( e^k\,,z^k\right) \), where \(e^k = x^k - x^\star \). Then, it is easy to show that \(\xi ^k\) obeys the following recursion:
where \(E\) is the \((2n) \times (2n)\) matrix with the following \(n \times n\) blocks (see Footnote 1):
Consider the eigenvalue decomposition of the matrix \(W = Q\Lambda Q^T\), where \(Q\) is the matrix of orthonormal eigenvectors, and \(\Lambda \) is the diagonal matrix of eigenvalues, ordered in descending order.
We have that \(\lambda _1=1\), and \(\lambda _i \in (0,1)\), \(i \ne 1\). Note that the matrix E can now be decomposed as follows:
Here, \({\widehat{Q}}\) is the \((2n) \times (2n)\) orthonormal matrix with the \(n \times n\) blocks at positions (1,1) and (2,2) equal to \(Q\), and zero off-diagonal \(n \times n\) blocks; and \({\widehat{P}}\) is an appropriate permutation matrix. Furthermore, \({\widehat{\Lambda }}\) is the \((2n)\times (2n)\) block-diagonal matrix with the \(2 \times 2 \) diagonal blocks \(D_1,\ldots ,D_n\), as follows:
It is then clear that the matrix \(E\) has the same eigenvalues as the matrix \({\widehat{\Lambda }}\), and hence the two matrices have the same spectral radius. Next, by evaluating the eigenvalues of the \(2 \times 2\) matrices \(D_i\), \(i=1,\ldots ,n\), it is straightforward to verify sufficient conditions on \(\alpha \) under which the spectral radius \(\rho ({\widehat{\Lambda }})\) is strictly less than one, as well as conditions under which it is strictly greater than one. Namely, for \(i \ne 1\), it is easy to show that \(\rho (D_i)<1\) if and only if \(\alpha \in (0,{\overline{\alpha }}_i)\), where \({\overline{\alpha }}_i = \frac{(1+\lambda _i)^2}{2}\). On the other hand, for \(i=1\), we have that \(\rho (D_1) = 1-\alpha < 1\). In view of the fact that \(\lambda _i>0\), \(i=2,\ldots ,n\), and hence \({\overline{\alpha }}_i > 1/2\) for all \(i \ne 1\), the latter implies that, when \(\alpha \le 1/2\), we have that \(\rho ({\widehat{\Lambda }})<1\); also, since \({\overline{\alpha }}_i < 2\) for \(\lambda _i < 1\), we have \(\rho ({\widehat{\Lambda }})>1\) whenever \(\alpha >2\). This in particular implies that \(x^k\) converges R-linearly to \(x^\star \) if \(\alpha \le 1/2\), and that \(x^k\) diverges when \(\alpha >2\). The proof is complete. \(\square \)
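The step-size thresholds appearing in the proof can be checked numerically. The following sketch (the 3-node weight matrix is a hypothetical example, not taken from the paper) computes the per-mode thresholds \({\overline{\alpha }}_i = (1+\lambda _i)^2/2\) from the eigenvalues of \(W\) and confirms that \(\alpha \le 1/2\) lies below every threshold, since \((1+\lambda _i)^2/2 > 1/2\) whenever \(\lambda _i > 0\):

```python
import numpy as np

# Hypothetical symmetric, doubly stochastic weight matrix for a 3-node path
# graph, chosen so that all eigenvalues other than 1 lie in (0, 1), as the
# lemma assumes.
W = np.array([[0.8, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.8]])

# Eigenvalues of a symmetric matrix, sorted in descending order.
lam = np.sort(np.linalg.eigvalsh(W))[::-1]
assert np.isclose(lam[0], 1.0)          # lambda_1 = 1
assert np.all((lam[1:] > 0) & (lam[1:] < 1))

# Per-mode thresholds: rho(D_i) < 1 iff alpha < (1 + lambda_i)^2 / 2, i != 1.
alpha_bar = (1.0 + lam[1:]) ** 2 / 2.0
alpha_max = alpha_bar.min()

# Every threshold exceeds 1/2, so alpha <= 1/2 is always a safe choice.
assert alpha_max > 0.5
```

Here the safe region depends on the network only through the smallest eigenvalue of \(W\); the uniform bound \(\alpha \le 1/2\) is what makes the guarantee network-independent.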
Jakovetić, D., Krejić, N. & Krklec Jerinkić, N. Exact spectral-like gradient method for distributed optimization. Comput Optim Appl 74, 703–728 (2019). https://doi.org/10.1007/s10589-019-00131-8