Abstract
Since their initial proposal in the late 1980s, spectral gradient methods have continued to receive significant attention, especially due to their excellent numerical performance on various large-scale applications. However, to date, they have not been sufficiently explored in the context of distributed optimization. In this paper, we consider unconstrained distributed optimization problems in which n nodes constitute an arbitrary connected network and collaboratively minimize the sum of their local convex cost functions. In this setting, building on existing exact distributed gradient methods, we propose a novel exact distributed gradient method wherein the nodes’ step-sizes are designed according to rules akin to those in spectral gradient methods. We refer to the proposed method as the Distributed Spectral Gradient method. The method exhibits R-linear convergence under standard assumptions on the nodes’ local costs and safeguarding on the algorithm step-sizes. We illustrate the method’s performance through simulation examples.
Notes
Lemma 4.3 can be proved similarly, if we work with representation (9)–(10) instead of (7)–(8). The corresponding error recursion matrix then becomes as in (20), with \(\Sigma _k^{-1}=\alpha \,I\) and \(H=I\). The matrix has the same blocks as E up to a permutation, and the results through the alternative analysis will be equivalent.
Research supported by the Serbian Ministry of Education, Science, and Technological Development, Grant No. 174030. This work is also supported by the I-BiDaaS project, funded by the European Commission under Grant Agreement No. 780787. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
Appendix
Proof of Lemma 4.2
Consider algorithm (7)–(8). For the special case considered here, the update rule (7)–(8) becomes:
Denote by \(\xi ^k\) the \((2n) \times 1\) vector defined by \(\xi ^k = \left( e^k\,,z^k\right) \), where \(e^k = x^k - x^\star \). Then, it is easy to show that \(\xi ^k\) obeys the following recursion:
where \(E\) is the \((2n) \times (2n)\) matrix with the following \(n \times n\) blocks (see Footnote 1):
Consider the eigenvalue decomposition of the matrix \(W = Q\Lambda Q^T\), where \(Q\) is the matrix of orthonormal eigenvectors, and \(\Lambda \) is the diagonal matrix of eigenvalues, ordered in descending order.
We have that \(\lambda _1=1\), and \(\lambda _i \in (0,1)\), \(i \ne 1\). Note that the matrix E can now be decomposed as follows:
Here, \({\widehat{Q}}\) is the \((2n) \times (2n)\) orthonormal matrix with the \(n \times n\) blocks at positions (1,1) and (2,2) equal to \(Q\), and zero off-diagonal \(n \times n\) blocks; and \({\widehat{P}}\) is an appropriate permutation matrix. Furthermore, \({\widehat{\Lambda }}\) is the \((2n)\times (2n)\) block-diagonal matrix with the \(2 \times 2 \) diagonal blocks \(D_1,\ldots ,D_n\), as follows:
It is then clear that the matrix \(E\) has the same eigenvalues as the matrix \({\widehat{\Lambda }}\), and hence the two matrices have the same spectral radius. Next, by evaluating the eigenvalues of the \(2 \times 2\) matrices \(D_i\), \(i=1,\ldots ,n\), it is straightforward to verify sufficient conditions on \(\alpha \) under which the spectral radius \(\rho ({\widehat{\Lambda }})\) is strictly less than one, as well as conditions under which it is strictly greater than one. Namely, for \(i \ne 1\), it is easy to show that \(\rho (D_i)<1\) if and only if \(\alpha \in (0,{\overline{\alpha }}_i)\), where \({\overline{\alpha }}_i = \frac{(1+\lambda _i)^2}{2}\). On the other hand, for \(i=1\), we have that \(\rho (D_1) = 1-\alpha < 1\). In view of the fact that \(\lambda _i>0\), \(i=2,\ldots ,n\), and hence \({\overline{\alpha }}_i > 1/2\) for all \(i \ne 1\), the latter implies that, when \(\alpha \le 1/2\), we have that \(\rho ({\widehat{\Lambda }})<1\); also, since \({\overline{\alpha }}_i < 2\) for \(\lambda _i < 1\), we have \(\rho ({\widehat{\Lambda }})>1\) whenever \(\alpha >2\). This in particular implies that \(x^k\) converges R-linearly to \(x^\star \) if \(\alpha \le 1/2\), and that \(x^k\) diverges when \(\alpha >2\). The proof is complete. \(\square \)
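The step-size thresholds appearing in the proof can be checked numerically. The following sketch (the 3-node weight matrix is a hypothetical example, not taken from the paper) computes the per-mode thresholds \({\overline{\alpha }}_i = (1+\lambda _i)^2/2\) from the eigenvalues of \(W\) and confirms that \(\alpha \le 1/2\) lies below every threshold, since \((1+\lambda _i)^2/2 > 1/2\) whenever \(\lambda _i > 0\):

```python
import numpy as np

# Hypothetical symmetric, doubly stochastic weight matrix for a 3-node path
# graph, chosen so that all eigenvalues other than 1 lie in (0, 1), as the
# lemma assumes.
W = np.array([[0.8, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.8]])

# Eigenvalues of a symmetric matrix, sorted in descending order.
lam = np.sort(np.linalg.eigvalsh(W))[::-1]
assert np.isclose(lam[0], 1.0)          # lambda_1 = 1
assert np.all((lam[1:] > 0) & (lam[1:] < 1))

# Per-mode thresholds: rho(D_i) < 1 iff alpha < (1 + lambda_i)^2 / 2, i != 1.
alpha_bar = (1.0 + lam[1:]) ** 2 / 2.0
alpha_max = alpha_bar.min()

# Every threshold exceeds 1/2, so alpha <= 1/2 is always a safe choice.
assert alpha_max > 0.5
```

Here the safe region depends on the network only through the smallest eigenvalue of \(W\); the uniform bound \(\alpha \le 1/2\) is what makes the guarantee network-independent.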
Jakovetić, D., Krejić, N. & Krklec Jerinkić, N. Exact spectral-like gradient method for distributed optimization. Comput Optim Appl 74, 703–728 (2019). https://doi.org/10.1007/s10589-019-00131-8