Distributed Optimization

Encyclopedia of Optimization

Introduction

The ever-growing scale of many optimization applications continues to pose significant implementation challenges. In machine learning, for example, the increasing size and complexity of data pose scalability challenges for both storage and processing, and deep learning (DL) models can take weeks to train on a single GPU-equipped machine. Distributed implementations of optimization methods are therefore often used to speed up training [1, 36]. Similarly, in artificial intelligence applications, distributed and asynchronous implementations of actor-critic reinforcement learning algorithms have scaled up to deliver significant breakthroughs [20].

Generally speaking, distributed implementations of optimization methods use one of two types of architectures. In a federated architecture (Fig. 1, left), solution updates are executed at a central server based on first-order (and sometimes second-order) information obtained from local nodes. The federated learning...

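As a concrete illustration of the server-based update, the sketch below has a central server average gradients reported by local nodes and take a gradient step on a synthetic least-squares problem. This is a minimal sketch for intuition only; the objective, node count, and step size are illustrative assumptions, not details taken from this entry or its references.

```python
import numpy as np

# Synthetic setup (illustrative assumption): each local node holds a private
# least-squares dataset (A_i, b_i) and reports only first-order information
# -- the gradient of its local loss -- to the central server.
rng = np.random.default_rng(0)
num_nodes, dim, samples_per_node = 4, 10, 50
data = [(rng.normal(size=(samples_per_node, dim)),
         rng.normal(size=samples_per_node)) for _ in range(num_nodes)]

def local_gradient(x, A, b):
    # Gradient of the local loss f_i(x) = ||A x - b||^2 / (2 * len(b)).
    return A.T @ (A @ x - b) / len(b)

x = np.zeros(dim)   # solution iterate maintained at the central server
step_size = 0.05    # illustrative constant step size

for _ in range(500):
    # Nodes evaluate gradients at the server's current iterate ...
    grads = [local_gradient(x, A, b) for A, b in data]
    # ... and the server executes the solution update with their average.
    x -= step_size * np.mean(grads, axis=0)
```

Averaging the local gradients makes the server step equivalent to a gradient step on the sum of the local losses, which is the usual way a federated architecture decouples data storage at the nodes from the solution update at the server.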
References

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zheng X (2016) TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation

  2. Assran M, Loizou N, Ballas N, Rabbat M (2019) Stochastic gradient push for distributed deep learning. In: International Conference on Machine Learning. PMLR, pp 344–353

  3. Assran MS, Rabbat MG (2020) Asynchronous gradient push. IEEE Trans Autom Control 66(1):168–183

  4. Aybat N, Wang Z, Iyengar G (2015) An asynchronous distributed proximal gradient method for composite convex optimization. In: International Conference on Machine Learning. PMLR, pp 2454–2462

  5. Bastianello N, Carli R, Schenato L, Todescato M (2020) Asynchronous distributed optimization over lossy networks via relaxed ADMM: stability and linear convergence. IEEE Trans Autom Control 66(6):2620–2635

  6. Bianchi P, Hachem W, Iutzeler F (2015) A coordinate descent primal-dual algorithm and application to distributed asynchronous optimization. IEEE Trans Autom Control 61(10):2947–2957

  7. Chen S, Garcia A, Hong M, Shahrampour S (2021) Decentralized Riemannian gradient descent on the Stiefel manifold. In: Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol 139. PMLR, pp 1594–1605

  8. Chen S, Garcia A, Shahrampour S (2022) On distributed nonconvex optimization: projected subgradient method for weakly convex problems in networks. IEEE Trans Autom Control 67(2):662–675. https://doi.org/10.1109/TAC.2021.3056535

  9. Eisen M, Mokhtari A, Ribeiro A (2017) Decentralized quasi-Newton methods. IEEE Trans Signal Process 65(10):2613–2628

  10. Farina F, Garulli A, Giannitrapani A, Notarstefano G (2019) A distributed asynchronous method of multipliers for constrained nonconvex optimization. Automatica 103:243–253

  11. Garcia A, Wang L, Huang J, Hong L (2021) Distributed networked real-time learning. IEEE Trans Control Netw Syst 8(1):28–38

  12. Hsieh YG, Iutzeler F, Malick J, Mertikopoulos P (2020) Multi-agent online optimization with delays: asynchronicity, adaptivity, and optimism. arXiv preprint arXiv:2012.11579

  13. Iutzeler F, Bianchi P, Ciblat P, Hachem W (2013) Asynchronous distributed optimization using a randomized alternating direction method of multipliers. In: 52nd IEEE Conference on Decision and Control. IEEE, pp 3671–3676

  14. Kumar S, Jain R, Rajawat K (2016) Asynchronous optimization over heterogeneous networks via consensus ADMM. IEEE Trans Signal Inf Process Netw 3(1):114–129

  15. Li H, Lü Q, Chen G, Huang T, Dong Z (2020) Distributed constrained optimization over unbalanced directed networks using asynchronous broadcast-based algorithm. IEEE Trans Autom Control 66(3):1102–1115

  16. Lian X, Zhang W, Zhang C, Liu J (2018) Asynchronous decentralized parallel stochastic gradient descent. In: International Conference on Machine Learning. PMLR, pp 3043–3052

  17. Lin Y, Shames I, Nesic D (2021) Asynchronous distributed optimization via dual decomposition and block coordinate subgradient methods. IEEE Trans Control Netw Syst 8:1348–1359

  18. Lu J, Feyzmahdavian HR, Johansson M (2015) Dual coordinate descent algorithms for multi-agent optimization. In: 2015 European Control Conference (ECC). IEEE, pp 715–720

  19. Mansoori F, Wei E (2017) Superlinearly convergent asynchronous distributed network Newton method. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, pp 2874–2879

  20. Mnih V, Kavukcuoglu K, Silver D, Rusu A, Veness J et al (2015) Human-level control through deep reinforcement learning. Nature 518:529–533

  21. Nedić A (2010) Asynchronous broadcast-based convex optimization over a network. IEEE Trans Autom Control 56(6):1337–1351

  22. Nedić A, Olshevsky A (2016) Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans Autom Control 61(12):3936–3947

  23. Notarnicola I, Notarstefano G (2016) Asynchronous distributed optimization via randomized dual proximal gradient. IEEE Trans Autom Control 62(5):2095–2106

  24. Peng Z, Xu Y, Yan M, Yin W (2016) ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J Sci Comput 38(5):A2851–A2879

  25. Pu S, Garcia A (2018) Swarming for faster convergence in stochastic optimization. SIAM J Control Optim 56(4):2997–3020

  26. Pu S, Shi W, Xu J, Nedić A (2020) Push–pull gradient methods for distributed optimization in networks. IEEE Trans Autom Control 66(1):1–16

  27. Spiridonoff A, Olshevsky A, Paschalidis IC (2020) Robust asynchronous stochastic gradient-push: asymptotically optimal and network-independent performance for strongly convex functions. J Mach Learn Res 21(58):1–47

  28. Srivastava K, Nedić A (2011) Distributed asynchronous constrained stochastic optimization. IEEE J Sel Top Sign Process 5(4):772–790

  29. Sun Y, Maros M, Scutari G, Cheng G (2022) High-dimensional inference over networks: linear convergence and statistical guarantees. CoRR abs/2201.08507. https://arxiv.org/abs/2201.08507

  30. Tian Y, Sun Y, Scutari G (2020) Achieving linear convergence in distributed asynchronous multiagent optimization. IEEE Trans Autom Control 65(12):5264–5279

  31. Wang J, Tantia V, Ballas N, Rabbat M (2019) SlowMo: improving communication-efficient distributed SGD with slow momentum. arXiv preprint arXiv:1910.00643

  32. Wei E, Ozdaglar A (2013) On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers. In: 2013 IEEE Global Conference on Signal and Information Processing. IEEE, pp 551–554

  33. Wu T, Yuan K, Ling Q, Yin W, Sayed AH (2017) Decentralized consensus optimization with asynchrony and delays. IEEE Trans Signal Inf Process Netw 4(2):293–307

  34. Xie T, Chen G, Liao X (2020) Event-triggered asynchronous distributed optimization algorithm with heterogeneous time-varying step-sizes. Neural Comput Appl 32(10):6175–6184

  35. Zhang G, Heusdens R (2017) Distributed optimization using the primal-dual method of multipliers. IEEE Trans Signal Inf Process Netw 4(1):173–187

  36. Zhang H, Zheng Z, Xu S, Dai W, Ho Q, Liang X, Hu Z, Wei J, Xie P, Xing EP (2017) Poseidon: an efficient communication architecture for distributed deep learning on GPU clusters. In: 13th USENIX Annual Technical Conference

  37. Zhang J, You K (2019) Fully asynchronous distributed optimization with linear convergence in directed networks. arXiv preprint arXiv:1901.08215

Author information

Correspondence to Alfredo Garcia.

Copyright information

© 2023 Springer Nature Switzerland AG

Cite this entry

Garcia, A., Wang, B., Pu, S. (2023). Distributed Optimization. In: Pardalos, P.M., Prokopyev, O.A. (eds) Encyclopedia of Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-54621-2_809-1
