Cooperative learning with joint state value approximation for multi-agent systems

Chen, Xin; Chen, Gang; Cao, Weihua; Wu, Min

doi:10.1007/s11768-013-1141-z

Published: 05 May 2013

Journal of Control Theory and Applications, Volume 11, pages 149–155 (2013)

Abstract

This paper addresses the ‘curse of dimensionality’, which makes reinforcement learning intractable when it is scaled to multi-agent systems. The problem grows exponentially with the number of agents, resulting in large memory requirements and slow learning. For cooperative systems, which are common among multi-agent systems, this paper proposes a new multi-agent Q-learning algorithm that decomposes joint-state, joint-action learning into two learning processes: learning the individual action, and approximately learning the maximum value of the joint state. The latter process considers the other agents’ actions to ensure that the joint action is optimal, and it supports the updating of the former. Simulation results illustrate that the proposed algorithm can learn the optimal joint behavior with less memory and faster learning than friend-Q learning and independent learning.
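
The abstract does not give the update rules, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes tabular storage, deterministic rewards, and initial values no larger than the true optimum. The names `table_sizes` and `DecomposedLearner`, and the optimistic (increase-only) update, are hypothetical choices made for this example. The first function compares how many entries a joint-action Q-table needs against one individual table per agent plus a single joint-state value table; the class shows one way a shared joint-state value could supply the target for each agent's individual update.

```python
from collections import defaultdict


def table_sizes(n_states, n_actions, n_agents):
    """Return (joint, decomposed) entry counts for tabular value storage."""
    # Joint-action learning stores Q(s, a_1, ..., a_n): exponential in n_agents.
    joint = n_states * n_actions ** n_agents
    # Assumed decomposition: one Q_i(s, a_i) table per agent plus one V(s) table.
    decomposed = n_agents * n_states * n_actions + n_states
    return joint, decomposed


class DecomposedLearner:
    """Hypothetical sketch: individual Q_i(s, a_i) tables plus a joint-state value V(s)."""

    def __init__(self, n_agents, n_actions, alpha=0.5, gamma=0.9):
        self.n_agents = n_agents
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = [defaultdict(float) for _ in range(n_agents)]  # Q_i[(s, a_i)]
        self.v = defaultdict(float)  # V[s], approximate maximum joint-state value

    def greedy_joint_action(self, state):
        """Each agent picks the action that maximises its own table."""
        return tuple(
            max(range(self.n_actions), key=lambda a: self.q[i][(state, a)])
            for i in range(self.n_agents)
        )

    def update(self, state, joint_action, reward, next_state):
        """Assumed optimistic update: values only move up toward the target."""
        target = reward + self.gamma * self.v[next_state]
        # V(s) tracks the best return seen so far from this joint state.
        self.v[state] += self.alpha * max(0.0, target - self.v[state])
        # Each agent updates only the entry for its own action, using the shared target.
        for i, action in enumerate(joint_action):
            key = (state, action)
            self.q[i][key] += self.alpha * max(0.0, target - self.q[i][key])
```

With 100 states, 5 actions per agent, and 3 agents, `table_sizes(100, 5, 3)` gives 12500 joint entries against 1600 decomposed entries, and the gap widens exponentially as agents are added. The increase-only update is a design choice for this sketch: it lets each agent ignore poor returns caused by teammates' exploratory actions, at the cost of being unsound when rewards are stochastic.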

References

G. Weiss. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. Cambridge: MIT Press, 1999.
N. Vlassis. A concise introduction to multiagent systems and distributed artificial intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2007, 1(1): 1–71.
M. Wu, W. Cao, J. Peng, et al. Balanced reactive-deliberative architecture for multi-agent system for simulation league of RoboCup. International Journal of Control, Automation and Systems, 2009, 7(6): 945–955.
K. Tumer, A. Agogino. Improving air traffic management with a learning multiagent system. IEEE Intelligent Systems, 2009, 24(1): 18–21.
S. Proper, P. Tadepalli. Solving multiagent assignment Markov decision processes. Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems. Richland: IFAAMAS, 2009: 681–688.
J. R. Kok, M. T. J. Spaan, N. Vlassis. Non-communicative multi-robot coordination in dynamic environments. Robotics and Autonomous Systems, 2005, 50(2/3): 99–114.
M. L. Littman. Friend-or-Foe Q-learning in general-sum games. Proceedings of the 18th International Conference on Machine Learning. Williamstown: Morgan Kaufmann Press, 2001: 322–328.
X. Wang, T. Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. Proceedings of the Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2002: 1571–1578.
R. I. Brafman, M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2002, 3(2): 213–231.
L. Busoniu, R. Babuska, B. De Schutter. A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, 2008, 38(2): 156–172.
N. Mehta, S. Natarajan, P. Tadepalli, et al. Transfer in variable-reward hierarchical reinforcement learning. Machine Learning, 2008, 73(3): 289–312.
J. R. Kok, N. Vlassis. Collaborative multiagent reinforcement learning by payoff propagation. Journal of Machine Learning Research, 2006, 7: 1789–1828.
S. Kapetanakis, D. Kudenko. Reinforcement learning of coordination in cooperative multi-agent systems. Proceedings of the 18th National Conference on Artificial Intelligence. Washington: IEEE Computer Society, 2002: 326–331.
C. Claus, C. Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. Proceedings of the 15th National Conference on Artificial Intelligence. Madison: AAAI Press, 1998: 746–752.
C. J. C. H. Watkins, P. Dayan. Q-learning. Machine Learning, 1992, 8(3/4): 279–292.
C. S. Szepesvari, M. L. Littman. A unified analysis of value-function-based reinforcement-learning algorithms. Neural Computation, 1999, 11(8): 2017–2059.
R. S. Sutton. Learning to predict by the method of temporal differences. Machine Learning, 1988, 3(1): 9–44.
A. Bab, R. I. Brafman. Multi-agent reinforcement learning in common interest and fixed sum stochastic games: an experimental study. Journal of Machine Learning Research, 2008, 9: 2635–2675.

Author information

Authors and Affiliations

School of Information Science and Engineering, Central South University, Changsha Hunan, 410083, China
Xin Chen, Gang Chen, Weihua Cao & Min Wu

Corresponding author

Correspondence to Min Wu.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 61074058, 60874042), the Chinese Postdoctoral Science Foundation (No. 200902483), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20090162120068), and the Central South University Innovation Project (No. 2011ssxt221).

Xin CHEN received his B.S. degree in Industrial Automation and his M.S. degree in Control Theory and Control Engineering from Central South University, Changsha, China, in 1999 and 2002, respectively. He received his Ph.D. in Electromechanical Engineering from the University of Macau, China, in 2007. He is currently an associate professor at Central South University. His research interests include multi-agent systems, robotics, and intelligent control.

Gang CHEN received his B.S. degree in Automation from Central South University, Changsha, China, in 2009. He is currently working toward the M.S. degree in Control Theory and Control Engineering at the School of Information Science and Engineering, Central South University. His research interests include multi-agent systems and reinforcement learning.

Weihua CAO received his B.S., M.S., and Ph.D. degrees in Engineering from Central South University, Changsha, China, in 1994, 1997, and 2007, respectively. Since May 1997, he has been a faculty member with Central South University, where he is currently a professor of Automatic Control Engineering at the School of Information Science and Engineering. He was a visiting student with the Department of Engineering, Kanazawa University, Japan, from 1996 to 1997, and a visiting scholar with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, during the 2007–2008 academic year. His research interests include multi-agent systems, intelligent control, and process control.

Min WU received his B.S. and M.S. degrees in Engineering from Central South University, Changsha, China, in 1983 and 1986, respectively, and Ph.D. degree in Engineering from Tokyo Institute of Technology, Tokyo, Japan, in 1999. Since July 1986, he has been a faculty member with Central South University, where he is currently a professor of Automatic Control Engineering at the School of Information Science and Engineering. He was a visiting scholar with the Department of Electrical Engineering, Tohoku University, Sendai, Japan, from 1989 to 1990, and a visiting research scholar with the Department of Control and Systems Engineering, Tokyo Institute of Technology, from 1996 to 1999. His current research interests include robust control and its applications, process control, and intelligent control.

Dr. Wu is a member of the Nonferrous Metals Society of China and the Chinese Association of Automation. He received the IFAC Control Engineering Practice Prize Paper Award in 1999 (together with M. Nakano and J. She). Dr. Wu is a senior member of the IEEE.

About this article

Chen, X., Chen, G., Cao, W. et al. Cooperative learning with joint state value approximation for multi-agent systems. J. Control Theory Appl. 11, 149–155 (2013). https://doi.org/10.1007/s11768-013-1141-z

Received: 06 July 2011
Revised: 29 February 2012
Published: 05 May 2013
Issue Date: May 2013
DOI: https://doi.org/10.1007/s11768-013-1141-z
