Collective Intrinsic Motivation of a Multi-agent System Based on Reinforcement Learning Algorithms

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 825)

Abstract

One of the great challenges in reinforcement learning is learning optimal behavior in environments with sparse rewards. Solving tasks in such settings requires effective exploration methods, which are often based on intrinsic rewards. Many real-world problems involve sparse rewards, and many of them are further complicated by a multi-agent setting, in which the majority of intrinsic motivation methods are ineffective. In this paper we address the problem of multi-agent environments with sparse rewards and propose to combine intrinsic rewards with multi-agent reinforcement learning (MARL) techniques to create the Collective Intrinsic Motivation of Agents (CIMA) method. CIMA uses both the external reward and the intrinsic collective reward from the cooperative multi-agent system. The proposed method can be used along with any MARL method as the base reinforcement learning algorithm. We compare CIMA with several state-of-the-art MARL methods in a sparse-reward multi-agent environment designed in StarCraft II.
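Since only the abstract is available here, the sketch below is purely illustrative: it shows one generic way to combine a sparse external team reward with a shared intrinsic bonus, which is the general recipe the abstract describes. All names and design choices (collective_intrinsic_reward, shaped_team_reward, the novelties vector, the averaging rule, and the weight beta) are assumptions made for illustration, not the authors' actual formulation of CIMA.

    import numpy as np

    def collective_intrinsic_reward(novelties, beta=0.5):
        # Hypothetical aggregation: pool per-agent novelty estimates
        # (e.g., prediction errors of a learned forward model) into one
        # shared bonus. Averaging makes the signal collective: every
        # agent benefits from a teammate's discovery, which encourages
        # cooperative rather than purely individual exploration.
        return beta * float(np.mean(novelties))

    def shaped_team_reward(external_reward, novelties, beta=0.5):
        # The reward actually fed to the base MARL algorithm: the sparse
        # external team reward plus the shared intrinsic bonus.
        return external_reward + collective_intrinsic_reward(novelties, beta)

    # Even on a step with zero external reward, novel states still give
    # the whole team a learning signal.
    print(shaped_team_reward(0.0, novelties=[0.8, 0.1, 0.4]))  # ~0.217

Because the shaping acts only on the reward channel, any cooperative MARL learner could, in principle, consume shaped_team_reward in place of the raw environment reward, which is consistent with the abstract's claim that the method can sit on top of any base MARL algorithm.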

Acknowledgements

Research by S. Sakulin and A. Alfimtsev was supported by RNF Grant No. 22-21-00711.

Author information

Corresponding author

Correspondence to Vladislav Bolshakov.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bolshakov, V., Sakulin, S., Alfimtsev, A. (2024). Collective Intrinsic Motivation of a Multi-agent System Based on Reinforcement Learning Algorithms. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 825. Springer, Cham. https://doi.org/10.1007/978-3-031-47718-8_42
