Collective Intrinsic Motivation of a Multi-agent System Based on Reinforcement Learning Algorithms

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 825)

Abstract

One of the great challenges in reinforcement learning is learning optimal behavior in environments with sparse rewards. Solving tasks in such settings requires effective exploration methods, which are often based on intrinsic rewards. Many real-world problems involve sparse rewards, and many of them are further complicated by a multi-agent setting, in which the majority of intrinsic motivation methods are ineffective. In this paper we address the problem of multi-agent environments with sparse rewards and propose to combine intrinsic rewards with multi-agent reinforcement learning (MARL) techniques to create the Collective Intrinsic Motivation of Agents (CIMA) method. CIMA uses both the external reward and the intrinsic collective reward from the cooperative multi-agent system. The proposed method can be used along with any MARL method as the base reinforcement learning algorithm. We compare CIMA with several state-of-the-art MARL methods in a sparse-reward multi-agent environment designed in StarCraft II.
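Since only the abstract is available here, the sketch below is purely illustrative: it shows one generic way to combine a sparse external team reward with a shared intrinsic bonus, which is the general recipe the abstract describes. All names and design choices (collective_intrinsic_reward, shaped_team_reward, the novelties vector, the averaging rule, and the weight beta) are assumptions made for illustration, not the authors' actual formulation of CIMA.

    import numpy as np

    def collective_intrinsic_reward(novelties, beta=0.5):
        # Hypothetical aggregation: pool per-agent novelty estimates
        # (e.g., prediction errors of a learned forward model) into one
        # shared bonus. Averaging makes the signal collective: every
        # agent benefits from a teammate's discovery, which encourages
        # cooperative rather than purely individual exploration.
        return beta * float(np.mean(novelties))

    def shaped_team_reward(external_reward, novelties, beta=0.5):
        # The reward actually fed to the base MARL algorithm: the sparse
        # external team reward plus the shared intrinsic bonus.
        return external_reward + collective_intrinsic_reward(novelties, beta)

    # Even on a step with zero external reward, novel states still give
    # the whole team a learning signal.
    print(shaped_team_reward(0.0, novelties=[0.8, 0.1, 0.4]))  # ~0.217

Because the shaping acts only on the reward channel, any cooperative MARL learner could, in principle, consume shaped_team_reward in place of the raw environment reward, which is consistent with the abstract's claim that the method can sit on top of any base MARL algorithm.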

Acknowledgements

Research by S. Sakulin and A. Alfimtsev was supported by RNF Grant No. 22-21-00711.

Author information

Corresponding author

Correspondence to Vladislav Bolshakov.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bolshakov, V., Sakulin, S., Alfimtsev, A. (2024). Collective Intrinsic Motivation of a Multi-agent System Based on Reinforcement Learning Algorithms. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 825. Springer, Cham. https://doi.org/10.1007/978-3-031-47718-8_42
