Abstract
Policy gradient (PG) methods are effective for quickly finding optimal policies in reinforcement learning: they parameterize the policy and update the policy parameters directly. In addition, momentum methods are widely used to accelerate the training of centralized deep networks by adjusting the descent direction of the gradients. However, decentralized variants of PG with momentum have rarely been investigated. We therefore propose a Decentralized Policy Gradient algorithm with Momentum, called DPGM, for solving multi-task reinforcement learning problems. Moreover, this article provides a rigorous theoretical analysis of the convergence of DPGM, showing that it achieves a rate of O(1/T), where T denotes the number of iterations; this matches the state of the art among decentralized PG methods. Furthermore, we provide experiments in a decentralized reinforcement learning environment that support the theoretical result.
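The abstract combines two ingredients: a consensus (gossip) step over the network and a momentum-smoothed gradient step per agent. The paper's actual DPGM update is not reproduced on this page, so the following is only a minimal toy sketch of that generic combination; the function names, the mixing matrix, and the quadratic surrogate standing in for each agent's local policy gradient are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of one round of decentralized gradient descent with
# momentum over a network (NOT the paper's DPGM algorithm; all names and
# the quadratic stand-in for the local policy gradient are illustrative).

def decentralized_momentum_round(theta, momentum, W, grads, beta=0.9, lr=0.1):
    """One synchronous round: gossip-average parameters over the network,
    then take a momentum step along each agent's local gradient estimate."""
    mixed = W @ theta                        # consensus step (row i mixes neighbors)
    momentum = beta * momentum + (1 - beta) * grads
    return mixed - lr * momentum, momentum

# Toy check: 4 agents on a ring; each "task" is a quadratic whose gradient
# stands in for the local policy gradient, and the task optima average to 0.
rng = np.random.default_rng(0)
n_agents, dim = 4, 3
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])    # doubly stochastic mixing matrix
targets = rng.normal(size=(n_agents, dim))
targets -= targets.mean(axis=0)             # common optimum is the origin
theta = rng.normal(size=(n_agents, dim))
m = np.zeros((n_agents, dim))
for _ in range(300):
    grads = theta - targets                 # local gradient surrogate
    theta, m = decentralized_momentum_round(theta, m, W, grads)
# Agents end up close to the common optimum, up to an O(lr) consensus error.
print(np.abs(theta).max())
```

This toy run only illustrates the interaction of mixing and momentum; in the multi-task RL setting each agent would instead form a stochastic policy-gradient estimate (e.g. REINFORCE-style) from its own task's trajectories.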
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61871430, No. 62172142, and No. 62102134; in part by the Leading Talents of Science and Technology in the Central Plains of China under Grant No. 214200510012; and in part by the Scientific and Technological Innovation Team of Colleges and Universities in Henan Province under Grant No. 20IRTSTHN018.
Cite this article
Junru, S., Qiong, W., Muhua, L. et al. Decentralized multi-task reinforcement learning policy gradient method with momentum over networks. Appl Intell 53, 10365–10379 (2023). https://doi.org/10.1007/s10489-022-04028-8