Abstract
Multi-agent reinforcement learning has recently been applied to pursuit problems. However, in long-episode tasks with many time steps per training episode, the algorithm often struggles to converge, which leaves agents with low rewards and no effective strategy. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions with different characteristics, which enhances agent training in long episodes. The method also eliminates the non-monotonic behavior that the trigonometric functions of the traditional 2D polar-coordinate observation representation introduce into the reward function. Experimental results demonstrate that the proposed method outperforms the traditional single-reward-function mechanism in the pursuit scenario, improving the agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode polar-coordinate pursuit problems and improve model training performance.
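To make the abstract's two ideas concrete, the following Python sketch shows one way an ensemble segmented reward could look: a sparse capture bonus combined with dense shaping terms that switch by distance segment, where the angular term is a folded bearing error (monotonic in the misalignment) rather than a raw trigonometric function of the polar observation. All thresholds, weights, and names here are illustrative assumptions; the paper's actual reward design is not reproduced.

```python
import math

# A minimal, hypothetical sketch of an ensemble segmented reward for a 2D
# pursuit task. The abstract does not give the exact formulation, so the
# thresholds, weights, and names below are illustrative assumptions, not
# the authors' design.

CAPTURE_RADIUS = 1.0   # assumed radius at which the target counts as caught
SEGMENT_RADIUS = 10.0  # assumed boundary between "far" and "near" segments

def pursuit_reward(pursuer_xy, target_xy, pursuer_heading):
    """Ensemble of a sparse capture reward and a dense shaping reward,
    with the dense term segmented by distance to the target."""
    dx = target_xy[0] - pursuer_xy[0]
    dy = target_xy[1] - pursuer_xy[1]
    distance = math.hypot(dx, dy)

    # Bearing error folded into [0, pi]. This quantity decreases strictly
    # as the pursuer turns toward the target, unlike raw sin/cos terms of
    # a polar-coordinate observation, which rise and fall over a full
    # revolution and make the reward non-monotonic in the alignment angle.
    bearing_error = abs(math.atan2(dy, dx) - pursuer_heading)
    bearing_error = min(bearing_error, 2.0 * math.pi - bearing_error)

    if distance <= CAPTURE_RADIUS:
        return 100.0                    # sparse terminal bonus on capture
    if distance > SEGMENT_RADIUS:
        return -0.01 * distance         # far segment: just close the distance
    return -0.01 * distance - 0.1 * bearing_error  # near: also align heading
```

Segmenting the dense term this way keeps a useful gradient over the whole episode (distance dominates when far, alignment matters when near), which is one plausible reading of how a segmented multi-reward design helps long episodes converge.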
Ethics declarations
Conflict of Interest: The authors declare that they have no conflict of interest.
Additional information
Foundation item: the National Natural Science Foundation of China (Nos. 61803260, 61673262 and 61175028)
Cite this article
Dong, Y., Cui, T., Zhou, Y. et al. Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning. J. Shanghai Jiaotong Univ. (Sci.) (2024). https://doi.org/10.1007/s12204-024-2713-4