
Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning


Journal of Shanghai Jiaotong University (Science)

Abstract

Multi-agent reinforcement learning has recently been applied to pursuit problems. However, when a training episode contains a large number of time steps, the algorithm often struggles to converge, leaving agents with low rewards and no effective strategy. This paper proposes a deep reinforcement learning (DRL) training method that uses an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions with different characteristics, which improves the training of agents over long episodes. In addition, the method eliminates the non-monotonic behavior that the trigonometric functions in the traditional 2D polar-coordinate observation representation introduce into the reward function. Experimental results demonstrate that, in the pursuit scenario, the proposed method outperforms the traditional single-reward-function mechanism and raises the agents' policy scores on the task. These ideas offer a solution to the convergence challenges that DRL models face in long-episode pursuit problems and lead to improved training performance.
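The paper's actual reward terms are not reproduced in this abstract. As a rough, hypothetical sketch of the two ideas it describes (a segmented ensemble of a dense shaping reward and a sparse capture reward for long episodes, and a bearing term that varies monotonically with heading error instead of relying on raw trigonometric features of the 2D polar observation), the Python fragment below may help fix the intuition; every function name, threshold, and weight in it is an assumption made for illustration only, not the authors' design.

```python
import math

# Hypothetical sketch only: all names, thresholds, and weights are assumptions
# for illustration; they do not reproduce the paper's actual reward design.

CAPTURE_RADIUS = 0.5   # assumed distance at which the evader counts as captured
NEAR_RADIUS = 5.0      # assumed boundary between the "far" and "near" segments


def wrap_angle(theta: float) -> float:
    """Wrap an angle to [-pi, pi) so its magnitude grows monotonically with
    heading error, instead of oscillating the way raw sin/cos features of a
    polar observation can."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def dense_shaping_reward(rho: float, theta: float) -> float:
    """Dense per-step term: penalize radial distance rho and absolute bearing
    error, both read directly from the 2D polar observation."""
    heading_error = abs(wrap_angle(theta))
    return -0.01 * rho - 0.05 * heading_error


def sparse_capture_reward(rho: float) -> float:
    """Sparse event term: a large bonus only when the pursuer closes to
    within the capture radius."""
    return 10.0 if rho <= CAPTURE_RADIUS else 0.0


def ensemble_segmented_reward(rho: float, theta: float) -> float:
    """Piecewise ("segmented") combination of the two reward functions: far
    from the target only the dense shaping term fires, so long episodes still
    produce a learning signal; near the target the sparse capture bonus is
    added on top."""
    if rho > NEAR_RADIUS:
        return dense_shaping_reward(rho, theta)
    return dense_shaping_reward(rho, theta) + sparse_capture_reward(rho)


if __name__ == "__main__":
    # rho: distance to the evader; theta: bearing of the evader in the pursuer's frame
    for rho, theta in [(20.0, 2.5), (3.0, 0.3), (0.4, 0.0)]:
        print(f"rho={rho:5.1f} theta={theta:4.1f} -> r={ensemble_segmented_reward(rho, theta):.3f}")
```

Under these assumptions, the piecewise switch is what keeps the signal usable over long episodes: far from the evader, every step still carries shaping information, while the sparse capture bonus only dominates near the end of a successful pursuit. Wrapping the bearing keeps the shaping penalty monotonic in heading error, which is the kind of non-monotonicity the abstract attributes to trigonometric terms in the traditional polar-coordinate observation.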




Author information


Corresponding author

Correspondence to Peng Dong  (董鹏).

Ethics declarations

Conflict of interest: The authors declare that they have no conflict of interest.

Additional information

Foundation item: the National Natural Science Foundation of China (Nos. 61803260, 61673262 and 61175028)


About this article


Cite this article

Dong, Y., Cui, T., Zhou, Y. et al. Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning. J. Shanghai Jiaotong Univ. (Sci.) (2024). https://doi.org/10.1007/s12204-024-2713-4

