State-based episodic memory for multi-agent reinforcement learning

Ma, Xiao; Li, Wu-Jun

doi:10.1007/s10994-023-06365-2

State-based episodic memory for multi-agent reinforcement learning

Published: 20 September 2023

Volume 112, pages 5163–5190, (2023)
Cite this article

Machine Learning Aims and scope Submit manuscript

Xiao Ma¹ &
Wu-Jun Li¹

189 Accesses
1 Citation
1 Altmetric
Explore all metrics

Abstract

Multi-agent reinforcement learning (MARL) algorithms have made promising progress in recent years by leveraging the centralized training and decentralized execution (CTDE) paradigm. However, existing MARL algorithms still suffer from the sample inefficiency problem. In this paper, we propose a simple yet effective approach, called state-based episodic memory (SEM), to improve sample efficiency in MARL. SEM adopts episodic memory (EM) to supervise the centralized training procedure of CTDE in MARL. To the best of our knowledge, SEM is the first work to introduce EM into MARL. SEM has lower space complexity and time complexity than state and action based EM (SAEM) initially proposed for single-agent reinforcement learning when using for MARL. Experimental results on two synthetic environments and one real environment show that introducing episodic memory into MARL can improve sample efficiency, and SEM can reduce storage cost and time cost compared with SAEM.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1

Model-Based Offline Adaptive Policy Optimization with Episodic Memory

Efficient Reinforcement Learning Using State-Action Uncertainty with Multiple Heads

Improving sample efficiency in Multi-Agent Actor-Critic methods

Article 09 July 2021

Data availability

Not applicable.

Code availability

Not applicable.

Notes

EMC (Zheng et al., 2021) also adopts EM in MARL. However, the arXiv version (Ma and Li, 2021) of our SEM came out earlier than EMC. Hence, the idea to adopt EM in MARL was independently proposed by SEM and EMC. Furthermore, EMC did not discuss the possibility of different ways to apply EM for MARL. On the contrary, our SEM gives a deep analysis of different ways to apply EM for MARL, which provides insight for designing a suitable EM mechanism for MARL.
V-learning (Jin et al., 2021) also adopts the method of restricting the state and joint action space to the state space to avoid the dependence on the number of joint actions. V-learning is based on decentralized training, but our work is based on centralized training. Furthermore, there is no EM mechanism in V-learning. Our work introduces the EM mechanism and restricts the state and joint action space to the state space when introducing the EM mechanism. In addition, the arXiv version of our SEM (Ma and Li, 2021) came out earlier than V-learning.
Our work SEM is based on the global states, independent of the observation function type. Hence, our method remains useful regardless of whether the observation function is probabilistic or deterministic.
We use \(\textrm{SC} 2.4 .6 .2 .69232\) (the same version as that in (Samvelyan et al., 2019)), instead of the newer version \(\textrm{SC} 2.4.10\).
The weighting function is the centrally-weighting function with a hyper-parameter \(\alpha\). \(\alpha\) is set to 0.1 on grid-world game and two-step matrix game. On SMAC, \(\alpha\) is set to 0.75 by following (Rashid et al., 2020).

References

Amarjyoti, S. (2017). Deep reinforcement learning for robotic manipulation—the state of the art. arXiv:1701.08878.
Andersen, P., Morris, R., Amaral, D., et al. (2006). The hippocampus book. Oxford: Oxford University Press.
Book Google Scholar
Badia AP, Piot B, Kapturowski S, et al (2020). Agent57: Outperforming the atari human benchmark. In ICML.
Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), 509–517.
Article MATH Google Scholar
Berner, C., Brockman, G., Chan, B., et al. (2019). Dota 2 with large scale deep reinforcement learning. arXiv:1912.06680.
Blundell, C., Uria, B., Pritzel, A., et al. (2016). Model-free episodic control. arXiv:1606.04460.
Cao, Y., Yu, W., Ren, W., et al. (2012). An overview of recent progress in the study of distributed multi-agent coordination. IEEE Transactions on Industrial Informatics, 9(1), 427–438.
Article Google Scholar
Duan, Y., Chen, X., Houthooft, R., et al. (2016). Benchmarking deep reinforcement learning for continuous control. In ICML.
Foerster, J. N., Farquhar, G., Afouras, T., et al. (2018). Counterfactual multi-agent policy gradients. In AAAI.
Hardt, O., Nader, K., & Nadel, L. (2013). Decay happens: The role of active forgetting in memory. Trends in Cognitive Sciences, 17(3), 111–120.
Article Google Scholar
Hernandez-Leal, P., Kartal, B., & Taylor, M. E. (2019). A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 33(6), 750–797.
Article Google Scholar
Jaakkola, T. S., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computing, 6(6), 1185–1201.
Article MATH Google Scholar
Jin, C., Liu, Q., Wang, Y., et al. (2021). V-learning: A simple, efficient, decentralized algorithm for multiagent RL. arXiv:2110.14555.
Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26(189–206), 1.
MathSciNet MATH Google Scholar
Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11), 1238–1274.
Article Google Scholar
Kononenko, I., & Kukar, M. (2007). Machine learning and data mining. Horwood Publishing.
Lample, G., Chaplot, D. S. (2017). Playing FPS games with deep reinforcement learning. In AAAI.
Lengyel, M., & Dayan, P. (2007). Hippocampal contributions to control: The third way. In NeurIPS.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., et al. (2016). Continuous control with deep reinforcement learning. In ICLR.
Lin, Z., Zhao, T., Yang, G., et al. (2018). Episodic memory deep q-networks. In IJCAI.
Lowe, R., Wu, Y., Tamar, A., et al. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In NeurIPS.
Ma, X., & Li, W. (2021). State-based episodic memory for multi-agent reinforcement learning. arXiv:2110.09817.
Melo, F. S. (2001). Convergence of q-learning: A simple proof. Institute of Systems and Robotics, Technical Report, pp. 1–4.
Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
Article Google Scholar
Oliehoek, F. A., & Amato, C. (2016). A concise introduction to decentralized POMDPs. Berlin: Springer.
Book MATH Google Scholar
Oliehoek, F. A., Spaan, M. T. J., & Vlassis, N. A. (2008). Optimal and approximate q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32, 289–353.
Article MathSciNet MATH Google Scholar
Powers, R., Shoham, Y., & Vu, T. (2007). A general criterion and an algorithmic framework for learning in multi-agent systems. Machine Learning, 67(1–2), 45–76.
Article MATH Google Scholar
Pritzel, A., Uria, B., Srinivasan, S., et al. (2017). Neural episodic control. In ICML.
Rashid, T., Samvelyan, M., de Witt, C. S., et al. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In ICML.
Rashid, T., Farquhar, G., Peng, B., et al. (2020). Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. In NeurIPS.
Samvelyan, M., Rashid, T., de Witt, C. S., et al. (2019). The starcraft multi-agent challenge. In AAMAS.
Shalev-Shwartz, S., Shammah, S., Shashua, A. (2016). Safe, multi-agent, reinforcement learning for autonomous driving. arXiv:1610.03295.
Silver, D., Huang, A., Maddison, C. J., et al. (2016). Mastering the game of go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Article Google Scholar
Son, K., Kim, D., Kang, W. J., et al. (2019). QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In ICML.
Squire, L. R. (2004). Memory systems of the brain: A brief history and current perspective. Neurobiology of Learning and Memory, 82(3), 171–177.
Article Google Scholar
Sunehag, P., Lever, G., Gruslys, A., et al. (2018). Value-decomposition networks for cooperative multi-agent learning based on team reward. In AAMAS.
Sutton, R. S., Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.
Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In ICML.
Vinyals, O., Ewalds, T., Bartunov, S., et al. (2017). Starcraft II: A new challenge for reinforcement learning. arXiv:1708.04782.
Wang, J., Ren, Z., Liu, T., et al. (2020). QPLEX: Duplex dueling multi-agent q-learning. arXiv:2008.01062.
Watkins, C. J. C. H., & Dayan, P. (1992). Technical note q-learning. Machine Learning, 8, 279–292.
Article MATH Google Scholar
Wiering, M. A. (2000). Multi-agent reinforcement learning for traffic light control. In ICML.
Yang, Y., Hao, J., Liao, B., et al. (2020). Qatten: A general framework for cooperative multiagent reinforcement learning. arXiv:2002.03939.
Zheng, L., Chen, J., Wang, J., et al. (2021). Episodic multi-agent reinforcement learning with curiosity-driven exploration. In NeurIPS.
Zhu, G., Lin, Z., Yang, G., et al. (2020). Episodic reinforcement learning with associative memory. In ICLR.

Download references

Funding

This work is supported by National Key R&D Program of China (No. 2020YFA0713900), NSFC Project (No. 61921006, No. 62192783), and Fundamental Research Funds for the Central Universities (No. 020214380108).

Author information

Authors and Affiliations

National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, 163 Xianlin Avenue, Nanjing, 210023, Jiangsu, China
Xiao Ma & Wu-Jun Li

Authors

Xiao Ma
View author publications
You can also search for this author in PubMed Google Scholar
Wu-Jun Li
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Xiao Ma and Wu-Jun Li conceived of the presented idea. Xiao Ma developed the theory and Wu-Jun Li verified the theory. Xiao Ma and Wu-Jun Li conceived and planned the experiments. Xiao Ma carried out the experiments. Xiao Ma wrote the manuscript in consultation with Wu-Jun Li, and Wu-Jun Li modified the manuscript.

Corresponding author

Correspondence to Wu-Jun Li.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Editor: Tong Zhang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Ma, X., Li, WJ. State-based episodic memory for multi-agent reinforcement learning. Mach Learn 112, 5163–5190 (2023). https://doi.org/10.1007/s10994-023-06365-2

Download citation

Received: 20 June 2022
Revised: 15 March 2023
Accepted: 17 June 2023
Published: 20 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10994-023-06365-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

State-based episodic memory for multi-agent reinforcement learning

Abstract

Access this article

Similar content being viewed by others

Model-Based Offline Adaptive Policy Optimization with Episodic Memory

Efficient Reinforcement Learning Using State-Action Uncertainty with Multiple Heads

Improving sample efficiency in Multi-Agent Actor-Critic methods

Data availability

Code availability

Notes

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

State-based episodic memory for multi-agent reinforcement learning

Abstract

Access this article

Similar content being viewed by others

Model-Based Offline Adaptive Policy Optimization with Episodic Memory

Efficient Reinforcement Learning Using State-Action Uncertainty with Multiple Heads

Improving sample efficiency in Multi-Agent Actor-Critic methods

Data availability

Code availability

Notes

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation