Decentralized multi-task reinforcement learning policy gradient method with momentum over networks

Published in: Applied Intelligence

Abstract

To find optimal policies quickly in reinforcement learning, policy gradient (PG) methods are highly effective: they parameterize the policy and update its parameters directly. Momentum methods are also commonly employed to improve convergence when training centralized deep networks, accelerating training by adjusting the descent direction of the gradients. However, decentralized variants of PG with momentum have rarely been investigated. We therefore propose a Decentralized Policy Gradient algorithm with Momentum, called DPGM, for solving multi-task reinforcement learning problems. Moreover, this article provides a rigorous theoretical analysis of the convergence of DPGM, showing that it achieves a rate of O(1/T), where T denotes the number of iterations; this rate matches the state of the art among decentralized PG methods. Furthermore, we provide experimental verification in a decentralized reinforcement learning environment to support the theoretical result.
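The abstract combines three ingredients: a score-function (REINFORCE-style) policy gradient per agent, a momentum buffer that reshapes the ascent direction, and consensus averaging over a network via a doubly stochastic mixing matrix. The following is a minimal sketch of that combination, not the paper's actual DPGM algorithm: the agent count, the ring-network mixing matrix, the bandit surrogate tasks, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_actions = 4, 3  # hypothetical sizes, not from the paper

# Doubly stochastic mixing matrix W for a ring network of 4 agents;
# row i holds the averaging weights agent i assigns to its neighbors.
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])

# Each agent faces its own bandit task, a stand-in for per-agent MDP tasks.
task_rewards = rng.normal(size=(n_agents, n_actions))

theta = rng.normal(size=(n_agents, n_actions))  # softmax policy parameters
momentum = np.zeros_like(theta)
beta, lr = 0.9, 0.1  # momentum coefficient and step size (illustrative)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reinforce_grad(theta_i, rewards_i):
    """Score-function gradient estimate for one agent: r * grad log pi(a)."""
    probs = softmax(theta_i)
    a = rng.choice(n_actions, p=probs)
    r = rewards_i[a] + 0.1 * rng.normal()  # noisy reward sample
    grad_logp = -probs
    grad_logp[a] += 1.0  # grad log softmax = e_a - probs
    return r * grad_logp

disagreement_start = np.var(theta, axis=0).sum()
for _ in range(200):
    grads = np.stack([reinforce_grad(theta[i], task_rewards[i])
                      for i in range(n_agents)])
    momentum = beta * momentum + (1 - beta) * grads  # heavy-ball style momentum
    theta = W @ (theta + lr * momentum)              # local ascent, then consensus
disagreement_end = np.var(theta, axis=0).sum()
```

The consensus step `W @ (...)` is what makes the method decentralized: each agent mixes only neighbors' parameters, yet repeated averaging drives the agents toward a common policy while the momentum term smooths the noisy per-agent gradient estimates.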


Notes

  1. https://github.com/openai/multiagent-particle-envs


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61871430, No. 62172142, and No. 62102134; in part by the Leading Talents of Science and Technology in the Central Plain of China under Grant No. 214200510012; and in part by the Scientific and Technological Innovation Team of Colleges and Universities in Henan Province under Grant No. 20IRTSTHN018.

Author information

Corresponding author: Wu Qingtao.


Cite this article

Junru, S., Qiong, W., Muhua, L. et al. Decentralized multi-task reinforcement learning policy gradient method with momentum over networks. Appl Intell 53, 10365–10379 (2023). https://doi.org/10.1007/s10489-022-04028-8
