Abstract
One of the main advantages of Reinforcement Learning is its ability to deal with a delayed reward signal. Using an appropriate backup diagram, rewards are propagated backwards through the state space. This allows agents to learn to take the action that yields the highest future (discounted) reward, even if that action yields a suboptimal immediate reward in the current state. In a multi-agent environment, agents can apply the same principles as in single-agent RL, but must do so in the full joint-state, joint-action space to guarantee optimality. Learning in such a space, however, can be very slow. In this paper we present our approach for mitigating this problem. Future Coordinating Q-learning (FCQ-learning) detects strategic interactions between agents several timesteps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but does so several timesteps before the need for coordination is reflected in the reward signal. In these states, the algorithm augments the state information with information about other agents, which is then used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning using sparse interactions.
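The delayed-reward mechanism the abstract builds on can be illustrated with a minimal single-agent Q-learning sketch. This is not FCQ-learning itself; the chain MDP, the action names, and all parameter values below are illustrative assumptions chosen only to show a reward being backed up through the state space.

```python
import random

# Chain MDP used only for illustration: states 0..4. In each state the
# agent can "quit" for an immediate reward of 1, or "step" toward state
# 4 for a reward of 0. Reaching state 4 pays 10, so the delayed option
# dominates despite its worse immediate reward.
N_STATES = 5
ACTIONS = ("quit", "step")
GAMMA = 0.9   # discount factor
ALPHA = 0.1   # learning rate

def env_step(state, action):
    """Return (next_state, reward, done)."""
    if action == "quit":
        return state, 1.0, True
    if state + 1 == N_STATES - 1:
        return state + 1, 10.0, True
    return state + 1, 0.0, False

def train(episodes=3000, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)  # uniform behaviour policy (Q-learning is off-policy)
            s2, r, done = env_step(s, a)
            # Standard one-step Q-learning backup: the +10 at the end of
            # the chain is propagated backwards through the state space.
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = s2
    return q

q = train()
greedy_action = max(ACTIONS, key=lambda a: q[(0, a)])
```

After training, the greedy action in the start state is "step": the backed-up value of stepping (roughly γ³ · 10 ≈ 7.29) exceeds the immediate reward of quitting (1), which is exactly the behaviour the abstract describes for a single agent before the joint-state extension.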
References
Boutilier, C.: Planning, learning and coordination in multiagent decision processes. In: Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, Renesse, Holland, pp. 195–210 (1996)
Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the 15th National Conference on Artificial Intelligence, pp. 746–752. AAAI Press (1998)
De Hauwere, Y.-M., Vrancx, P., Nowé, A.: Learning multi-agent state space representations. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, pp. 715–722 (2010)
De Hauwere, Y.-M., Vrancx, P., Nowé, A.: Adaptive state representations for multi-agent reinforcement learning. In: Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, Rome, Italy, pp. 181–189 (2011)
Greenwald, A., Hall, K.: Correlated-Q learning. In: AAAI Spring Symposium, pp. 242–249. AAAI Press (2003)
Hu, J., Wellman, M.: Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4, 1039–1069 (2003)
Kok, J., ’t Hoen, P., Bakker, B., Vlassis, N.: Utile coordination: Learning interdependencies among cooperative agents. In: Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG), pp. 29–36 (2005)
Kok, J., Vlassis, N.: Sparse cooperative Q-learning. In: Proceedings of the 21st International Conference on Machine Learning (ICML). ACM, New York (2004)
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th International Conference on Machine Learning (ICML), pp. 157–163. Morgan Kaufmann (1994)
Melo, F.S., Veloso, M.: Learning of coordination: Exploiting sparse interactions in multiagent systems. In: Proceedings of the 8th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pp. 773–780. International Foundation for Autonomous Agents and Multiagent Systems (2009)
Melo, F., Veloso, M.: Local multiagent coordination in decentralized MDPs with sparse interactions. Tech. Rep. CMU-CS-10-133, School of Computer Science, Carnegie Mellon University (2010)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Tsitsiklis, J.: Asynchronous stochastic approximation and Q-learning. Machine Learning 16(3), 185–202 (1994)
Vrancx, P., Verbeeck, K., Nowé, A.: Decentralized learning in Markov games. IEEE Transactions on Systems, Man and Cybernetics (Part B: Cybernetics) 38(4), 976–981 (2008)
Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis, University of Cambridge (1989)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
De Hauwere, Y.-M., Vrancx, P., Nowé, A. (2012). Solving Sparse Delayed Coordination Problems in Multi-Agent Reinforcement Learning. In: Vrancx, P., Knudson, M., Grześ, M. (eds.) Adaptive and Learning Agents. ALA 2011. Lecture Notes in Computer Science, vol. 7113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28499-1_8
Print ISBN: 978-3-642-28498-4
Online ISBN: 978-3-642-28499-1