Cooperative Multi-Agent Reinforcement Learning with Dynamic Target Localization: A Reward Sharing Approach

Wickramaarachchi, Helani; Kirley, Michael; Geard, Nicholas

doi:10.1007/978-981-99-8391-9_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14472)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Cooperation in multi-agent reinforcement learning (MARL) facilitates the acquisition of complex problem-solving skills and promotes more efficient and effective decision-making among agents. Numerous strategies for cooperative learning in MARL exist, including joint action learning, task decomposition, role assignment, and communication protocols. However, deploying these strategies in complex and dynamic environments remains challenging. To address such challenges, we propose a technique that uses reward sharing to enhance cooperation in partially observable multi-agent environments. As an extension of reward shaping, reward sharing allows agents to work together towards a global objective while still pursuing their local objectives. This approach can foster cooperation and reduce competition between agents without explicit communication, ultimately leading to faster learning and better performance. This study compares three reward sharing techniques, the Performance Incentive (PI), the Observer’s Share (OS), and the Synergy Achievement (SA), in simulation studies of dynamic target localization. The techniques are then evaluated under different objective prioritizations, agent counts, and map sizes. The results show that the proposed reward sharing techniques enhance agent performance, that increasing the number of agents leads to higher rewards, and that average rewards are negatively correlated with map size.
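
The abstract does not define how PI, OS, or SA compute shared rewards, so the following is only a generic illustration of the reward sharing idea, not the authors' method. It is a minimal sketch that assumes each agent's final reward is a convex blend of its own local reward and the team mean; the function name `share_rewards` and the mixing weight `alpha` are hypothetical.

```python
from typing import List, Sequence


def share_rewards(local_rewards: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Blend each agent's local reward with the team mean (illustrative only).

    alpha is a hypothetical mixing weight, not a parameter from the paper:
    alpha = 0 keeps purely local rewards, alpha = 1 gives every agent the team mean.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(local_rewards) == 0:
        return []
    team_mean = sum(local_rewards) / len(local_rewards)
    # Each agent keeps a (1 - alpha) share of its own reward and
    # receives an alpha share of the team average.
    return [(1.0 - alpha) * r + alpha * team_mean for r in local_rewards]


if __name__ == "__main__":
    # One agent localizes the target and earns 1.0; two others earn nothing.
    print(share_rewards([1.0, 0.0, 0.0], alpha=0.5))
```

Under this assumed blend the total team reward is unchanged, while agents that earned nothing locally still receive a learning signal when a teammate succeeds, which is one way cooperation can be encouraged without explicit communication.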

Author information

Authors and Affiliations

School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
Helani Wickramaarachchi, Michael Kirley & Nicholas Geard
ARC Training Centre in Optimisation Technologies, Integrated Methodologies, and Applications (OPTIMA), Melbourne, Australia
Helani Wickramaarachchi & Michael Kirley

Authors

Helani Wickramaarachchi
Michael Kirley
Nicholas Geard

Corresponding author

Correspondence to Helani Wickramaarachchi.

Editor information

Editors and Affiliations

The University of Sydney, Darlington, NSW, Australia
Tongliang Liu
Monash University, Clayton, VIC, Australia
Geoff Webb
The University of Newcastle, Callaghan, NSW, Australia
Lin Yue
CSIRO Data61, Sydney, NSW, Australia
Dadong Wang

About this paper

Cite this paper

Wickramaarachchi, H., Kirley, M., Geard, N. (2024). Cooperative Multi-Agent Reinforcement Learning with Dynamic Target Localization: A Reward Sharing Approach. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_25

DOI: https://doi.org/10.1007/978-981-99-8391-9_25
Published: 27 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)
