Exact spectral-like gradient method for distributed optimization

Computational Optimization and Applications

Abstract

Since their initial proposal in the late 1980s, spectral gradient methods continue to receive significant attention, especially due to their excellent numerical performance on various large-scale applications. However, to date, they have not been sufficiently explored in the context of distributed optimization. In this paper, we consider unconstrained distributed optimization problems where n nodes constitute an arbitrary connected network and collaboratively minimize the sum of their local convex cost functions. In this setting, building on existing exact distributed gradient methods, we propose a novel exact distributed gradient method wherein the nodes’ step-sizes are designed according to rules akin to those in spectral gradient methods. We refer to the proposed method as the Distributed Spectral Gradient method. The method exhibits R-linear convergence under standard assumptions on the nodes’ local costs and safeguarding of the algorithm’s step-sizes. We illustrate the method’s performance through simulation examples.
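To make the notion of spectral step-sizes concrete, the following is a minimal, illustrative sketch of the classical safeguarded Barzilai–Borwein (BB1) rule in the centralized setting, i.e., the kind of step-size rule that is adapted here to the distributed setting. The quadratic test problem and the safeguard bounds are assumptions chosen for illustration only; this is not the paper’s Distributed Spectral Gradient method.

```python
# Minimal sketch (assumed example): safeguarded Barzilai-Borwein (BB1) steps
# on a strongly convex quadratic, illustrating "spectral" step-sizes.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
Q = A.T @ A + np.eye(20)                 # f(x) = 0.5 x'Qx - b'x, Q symmetric positive definite
b = rng.standard_normal(20)
grad = lambda x: Q @ x - b

alpha_min, alpha_max = 1e-4, 1e4         # safeguard interval (assumed values)
x = np.zeros(20)
g = grad(x)
alpha = 1.0                              # initial step before any BB information exists

for k in range(500):
    x_new = x - alpha * g                # gradient step with the current spectral step-size
    g_new = grad(x_new)
    s, y = x_new - x, g_new - g
    alpha = (s @ s) / (s @ y) if s @ y > 0 else alpha_max   # BB1 rule: <s,s>/<s,y>
    alpha = min(max(alpha, alpha_min), alpha_max)           # safeguarding
    x, g = x_new, g_new

print("gradient norm after 500 iterations:", np.linalg.norm(g))
```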


Notes

  1. Lemma 4.3 can be proved similarly if we work with representation (9)–(10) instead of (7)–(8). The corresponding error-recursion matrix then takes the form in (20), with \(\Sigma _k^{-1}=\alpha \,I\) and \(H=I\). This matrix has the same blocks as E up to a permutation, and the results obtained through the alternative analysis are equivalent.


Author information

Correspondence to Dušan Jakovetić.

Additional information


Research supported by the Serbian Ministry of Education, Science, and Technological Development, Grant No. 174030. This work is also supported by the I-BiDaaS project, funded by the European Commission under Grant Agreement No. 780787. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Appendix

Proof of Lemma 4.2

Consider algorithm (7)–(8). For the special case considered here, the update rule becomes:

$$\begin{aligned} x^{k+1}&= W x^{k} - \alpha \, z^k, \qquad \qquad &(57)\\ z^{k+1}&= W z^k + x^{k+1}- x^k. \qquad &(58) \end{aligned}$$

Denote by \(\xi ^k\) the \((2n) \times 1\) vector defined by \(\xi ^k = \left( e^k\,,z^k\right) \), where \(e^k = x^k - x^\star \). Then, it is easy to show that \(\xi ^k\) obeys the following recursion:

$$\begin{aligned} \xi ^{k+1} = E\,\xi ^k, \end{aligned}$$

where E is the \((2n) \times (2n)\) matrix with the following \(n \times n \) blocks (see Note 1):

$$\begin{aligned} E_{11} = W-J,\,\,\,E_{12} =-\alpha \,I,\,\,\,E_{21} =W-I, \,\,\,E_{22} =W-\alpha \,I. \end{aligned}$$
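For concreteness (this is not part of the original proof), the sketch below assembles E from the four blocks above for an assumed 5-node ring graph with lazy-Metropolis weights, a choice that satisfies the eigenvalue conditions used next, and evaluates its spectral radius for a sample step-size.

```python
# Sketch: assemble E for an assumed 5-node ring with lazy-Metropolis weights
# and inspect its spectral radius for a sample step-size alpha.
import numpy as np

n, alpha = 5, 0.4                        # network size and step-size (assumed values)
M = np.zeros((n, n))
for i in range(n):                       # Metropolis weights on a ring graph
    for j in ((i - 1) % n, (i + 1) % n):
        M[i, j] = 1.0 / 3.0
    M[i, i] = 1.0 - M[i].sum()
W = 0.5 * np.eye(n) + 0.5 * M            # lazy mixing: eigenvalues lie in (0, 1]
J = np.ones((n, n)) / n
I = np.eye(n)

E = np.block([[W - J, -alpha * I],
              [W - I, W - alpha * I]])
print("spectral radius of E:", max(abs(np.linalg.eigvals(E))))
```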

Consider the eigenvalue decomposition \(W = Q\Lambda Q^T\), where Q is the matrix of orthonormal eigenvectors and \(\Lambda \) is the diagonal matrix of eigenvalues ordered in descending order. We have that \(\lambda _1=1\) and \(\lambda _i \in (0,1)\), \(i \ne 1\). Note that the matrix E can now be decomposed as follows:

$$\begin{aligned} E = {\widehat{Q}}\,{\widehat{P}}\,{\widehat{\Lambda }}\,{\widehat{P}}^T\,{\widehat{Q}}^T. \end{aligned}$$

Here, \({\widehat{Q}}\) is the \((2n) \times (2n)\) orthogonal matrix whose \(n \times n\) blocks at positions (1, 1) and (2, 2) equal Q and whose off-diagonal \(n \times n\) blocks are zero, and \({\widehat{P}}\) is an appropriate permutation matrix. Furthermore, \({\widehat{\Lambda }}\) is the \((2n)\times (2n)\) block-diagonal matrix with the \(2 \times 2 \) diagonal blocks \(D_1,\ldots ,D_n\), as follows:

$$\begin{aligned} D_1 = \begin{bmatrix} 0&\quad -\alpha \\ 0&\quad 1-\alpha \end{bmatrix},\,\,\, D_i = \begin{bmatrix} \lambda _i&\quad -\alpha \\ \lambda _i-1&\quad \lambda _i-\alpha \end{bmatrix},\,\,\,i \ne 1. \end{aligned}$$

It is then clear that the matrix E has the same eigenvalues as the matrix \({\widehat{\Lambda }}\), and hence the two matrices have the same spectral radius. Next, by evaluating the eigenvalues of the \(2 \times 2\) matrices \(D_i\), \(i=1,\ldots ,n\), it is straightforward to derive sufficient conditions on \(\alpha \) under which the spectral radius \(\rho ({\widehat{\Lambda }})\) is strictly less than one, as well as conditions under which it is strictly greater than one. Namely, for \(i \ne 1\), it is easy to show that \(\rho (D_i)<1\) if and only if \(\alpha \in (0,{\overline{\alpha }}_i)\), where \({\overline{\alpha }}_i = \frac{(1+\lambda _i)^2}{2}\). On the other hand, for \(i=1\), we have that \(\rho (D_1) = 1-\alpha < 1\). In view of the fact that \(\lambda _i \in (0,1)\), \(i=2,\ldots ,n\), and hence \({\overline{\alpha }}_i \in (1/2,\,2)\), the latter implies that \(\rho ({\widehat{\Lambda }})<1\) when \(\alpha \le 1/2\), and that \(\rho ({\widehat{\Lambda }})>1\) whenever \(\alpha >2\). This in particular implies that \(x^k\) converges R-linearly to \(x^\star \) if \(\alpha \le 1/2\), and that \(x^k\) diverges when \(\alpha >2\). The proof is complete. \(\square \)
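The spectral-radius claims above can also be checked numerically. The sketch below, again using an assumed lazy-Metropolis ring for W, builds the blocks \(D_1,\ldots ,D_n\) from the eigenvalues of W, confirms that the resulting block-diagonal matrix has the same spectral radius as E, and evaluates that radius for \(\alpha = 0.5\) (below every threshold \({\overline{\alpha }}_i\)) and \(\alpha = 2.1\) (above all of them).

```python
# Sketch: numerical check that E and the block-diagonal matrix built from
# D_1, ..., D_n share the same spectral radius, and that rho < 1 for
# alpha <= 1/2 while rho > 1 for alpha > 2 (W is an assumed example graph).
import numpy as np

def lazy_metropolis_ring(n):
    """Doubly stochastic W for a ring graph with eigenvalues in (0, 1]."""
    M = np.zeros((n, n))
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            M[i, j] = 1.0 / 3.0
        M[i, i] = 1.0 - M[i].sum()
    return 0.5 * np.eye(n) + 0.5 * M

def rho_of_blocks(W, alpha):
    """Spectral radius of the block-diagonal matrix with blocks D_1, ..., D_n."""
    lam = np.sort(np.linalg.eigvalsh(W))[::-1]                # lambda_1 = 1 first
    blocks = [np.array([[0.0, -alpha], [0.0, 1.0 - alpha]])]   # D_1
    blocks += [np.array([[l, -alpha], [l - 1.0, l - alpha]]) for l in lam[1:]]
    return max(max(abs(np.linalg.eigvals(D))) for D in blocks)

def rho_of_E(W, alpha):
    """Spectral radius of E assembled from its four n x n blocks."""
    n = W.shape[0]
    I, J = np.eye(n), np.ones((n, n)) / n
    E = np.block([[W - J, -alpha * I], [W - I, W - alpha * I]])
    return max(abs(np.linalg.eigvals(E)))

W = lazy_metropolis_ring(5)
for alpha in (0.5, 2.1):                  # below and above the thresholds in the lemma
    print(alpha, rho_of_E(W, alpha), rho_of_blocks(W, alpha))  # the two radii agree
```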


Cite this article

Jakovetić, D., Krejić, N. & Krklec Jerinkić, N. Exact spectral-like gradient method for distributed optimization. Comput Optim Appl 74, 703–728 (2019). https://doi.org/10.1007/s10589-019-00131-8

