Distributed Optimization

Encyclopedia of Optimization

Introduction

The ever-growing scale of many optimization applications continues to pose significant implementation challenges. In machine learning, for example, the increasing size and complexity of data pose scalability challenges for both storage and processing, and deep learning (DL) models can take weeks to train on a single GPU-equipped machine. Distributed implementations of optimization methods are therefore often used to speed up training [1, 36]. Similarly, in artificial intelligence applications, distributed and asynchronous implementations of actor-critic reinforcement learning algorithms have scaled up to deliver significant breakthroughs [20].

Generally speaking, distributed implementations of optimization methods use one of two types of architectures. In a federated architecture (Fig. 1, left), solution updates are executed at a central server based on first-order (and sometimes second-order) information obtained from local nodes. The federated learning...

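As a concrete illustration of the server-based update, the sketch below has a central server average gradients reported by local nodes and take a gradient step on a synthetic least-squares problem. This is a minimal sketch for intuition only; the objective, node count, and step size are illustrative assumptions, not details taken from this entry or its references.

```python
import numpy as np

# Synthetic setup (illustrative assumption): each local node holds a private
# least-squares dataset (A_i, b_i) and reports only first-order information
# -- the gradient of its local loss -- to the central server.
rng = np.random.default_rng(0)
num_nodes, dim, samples_per_node = 4, 10, 50
data = [(rng.normal(size=(samples_per_node, dim)),
         rng.normal(size=samples_per_node)) for _ in range(num_nodes)]

def local_gradient(x, A, b):
    # Gradient of the local loss f_i(x) = ||A x - b||^2 / (2 * len(b)).
    return A.T @ (A @ x - b) / len(b)

x = np.zeros(dim)   # solution iterate maintained at the central server
step_size = 0.05    # illustrative constant step size

for _ in range(500):
    # Nodes evaluate gradients at the server's current iterate ...
    grads = [local_gradient(x, A, b) for A, b in data]
    # ... and the server executes the solution update with their average.
    x -= step_size * np.mean(grads, axis=0)
```

Averaging the local gradients makes the server step equivalent to a gradient step on the sum of the local losses, which is the usual way a federated architecture decouples data storage at the nodes from the solution update at the server.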
References

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zheng X (2016) TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation

  2. Assran M, Loizou N, Ballas N, Rabbat M (2019) Stochastic gradient push for distributed deep learning. In: International Conference on Machine Learning. PMLR, pp 344–353

  3. Assran MS, Rabbat MG (2020) Asynchronous gradient push. IEEE Trans Autom Control 66(1):168–183

  4. Aybat N, Wang Z, Iyengar G (2015) An asynchronous distributed proximal gradient method for composite convex optimization. In: International Conference on Machine Learning. PMLR, pp 2454–2462

  5. Bastianello N, Carli R, Schenato L, Todescato M (2020) Asynchronous distributed optimization over lossy networks via relaxed ADMM: stability and linear convergence. IEEE Trans Autom Control 66(6):2620–2635

  6. Bianchi P, Hachem W, Iutzeler F (2015) A coordinate descent primal-dual algorithm and application to distributed asynchronous optimization. IEEE Trans Autom Control 61(10):2947–2957

  7. Chen S, Garcia A, Hong M, Shahrampour S (2021) Decentralized Riemannian gradient descent on the Stiefel manifold. In: Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol 139. PMLR, pp 1594–1605

  8. Chen S, Garcia A, Shahrampour S (2022) On distributed nonconvex optimization: projected subgradient method for weakly convex problems in networks. IEEE Trans Autom Control 67(2):662–675. https://doi.org/10.1109/TAC.2021.3056535

  9. Eisen M, Mokhtari A, Ribeiro A (2017) Decentralized quasi-Newton methods. IEEE Trans Signal Process 65(10):2613–2628

  10. Farina F, Garulli A, Giannitrapani A, Notarstefano G (2019) A distributed asynchronous method of multipliers for constrained nonconvex optimization. Automatica 103:243–253

  11. Garcia A, Wang L, Huang J, Hong L (2021) Distributed networked real-time learning. IEEE Trans Control Netw Syst 8(1):28–38

  12. Hsieh YG, Iutzeler F, Malick J, Mertikopoulos P (2020) Multi-agent online optimization with delays: asynchronicity, adaptivity, and optimism. arXiv preprint arXiv:2012.11579

  13. Iutzeler F, Bianchi P, Ciblat P, Hachem W (2013) Asynchronous distributed optimization using a randomized alternating direction method of multipliers. In: 52nd IEEE Conference on Decision and Control. IEEE, pp 3671–3676

  14. Kumar S, Jain R, Rajawat K (2016) Asynchronous optimization over heterogeneous networks via consensus ADMM. IEEE Trans Signal Inf Process Netw 3(1):114–129

  15. Li H, Lü Q, Chen G, Huang T, Dong Z (2020) Distributed constrained optimization over unbalanced directed networks using asynchronous broadcast-based algorithm. IEEE Trans Autom Control 66(3):1102–1115

  16. Lian X, Zhang W, Zhang C, Liu J (2018) Asynchronous decentralized parallel stochastic gradient descent. In: International Conference on Machine Learning. PMLR, pp 3043–3052

  17. Lin Y, Shames I, Nesic D (2021) Asynchronous distributed optimization via dual decomposition and block coordinate subgradient methods. IEEE Trans Control Netw Syst 8:1348–1359

  18. Lu J, Feyzmahdavian HR, Johansson M (2015) Dual coordinate descent algorithms for multi-agent optimization. In: 2015 European Control Conference (ECC). IEEE, pp 715–720

  19. Mansoori F, Wei E (2017) Superlinearly convergent asynchronous distributed network Newton method. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, pp 2874–2879

  20. Mnih V, Kavukcuoglu K, Silver D, Rusu A, Veness J et al (2015) Human-level control through deep reinforcement learning. Nature 518:529–533

  21. Nedić A (2010) Asynchronous broadcast-based convex optimization over a network. IEEE Trans Autom Control 56(6):1337–1351

  22. Nedić A, Olshevsky A (2016) Stochastic gradient-push for strongly convex functions on time-varying directed graphs. IEEE Trans Autom Control 61(12):3936–3947

  23. Notarnicola I, Notarstefano G (2016) Asynchronous distributed optimization via randomized dual proximal gradient. IEEE Trans Autom Control 62(5):2095–2106

  24. Peng Z, Xu Y, Yan M, Yin W (2016) ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J Sci Comput 38(5):A2851–A2879

  25. Pu S, Garcia A (2018) Swarming for faster convergence in stochastic optimization. SIAM J Control Optim 56(4):2997–3020

  26. Pu S, Shi W, Xu J, Nedić A (2020) Push–pull gradient methods for distributed optimization in networks. IEEE Trans Autom Control 66(1):1–16

  27. Spiridonoff A, Olshevsky A, Paschalidis IC (2020) Robust asynchronous stochastic gradient-push: asymptotically optimal and network-independent performance for strongly convex functions. J Mach Learn Res 21(58):1–47

  28. Srivastava K, Nedić A (2011) Distributed asynchronous constrained stochastic optimization. IEEE J Sel Top Sign Process 5(4):772–790

  29. Sun Y, Maros M, Scutari G, Cheng G (2022) High-dimensional inference over networks: linear convergence and statistical guarantees. CoRR abs/2201.08507. https://arxiv.org/abs/2201.08507

  30. Tian Y, Sun Y, Scutari G (2020) Achieving linear convergence in distributed asynchronous multiagent optimization. IEEE Trans Autom Control 65(12):5264–5279

  31. Wang J, Tantia V, Ballas N, Rabbat M (2019) SlowMo: improving communication-efficient distributed SGD with slow momentum. arXiv preprint arXiv:1910.00643

  32. Wei E, Ozdaglar A (2013) On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers. In: 2013 IEEE Global Conference on Signal and Information Processing. IEEE, pp 551–554

  33. Wu T, Yuan K, Ling Q, Yin W, Sayed AH (2017) Decentralized consensus optimization with asynchrony and delays. IEEE Trans Signal Inf Process Netw 4(2):293–307

  34. Xie T, Chen G, Liao X (2020) Event-triggered asynchronous distributed optimization algorithm with heterogeneous time-varying step-sizes. Neural Comput Appl 32(10):6175–6184

  35. Zhang G, Heusdens R (2017) Distributed optimization using the primal-dual method of multipliers. IEEE Trans Signal Inf Process Netw 4(1):173–187

  36. Zhang H, Zheng Z, Xu S, Dai W, Ho Q, Liang X, Hu Z, Wei J, Xie P, Xing EP (2017) Poseidon: an efficient communication architecture for distributed deep learning on GPU clusters. In: 13th USENIX Annual Technical Conference

  37. Zhang J, You K (2019) Fully asynchronous distributed optimization with linear convergence in directed networks. arXiv preprint arXiv:1901.08215

Author information

Correspondence to Alfredo Garcia.

Copyright information

© 2023 Springer Nature Switzerland AG

Cite this entry

Garcia, A., Wang, B., Pu, S. (2023). Distributed Optimization. In: Pardalos, P.M., Prokopyev, O.A. (eds) Encyclopedia of Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-54621-2_809-1
