Game Theory and Multi-agent Reinforcement Learning

  • Chapter in Reinforcement Learning

Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 12)

Abstract

Reinforcement learning was originally developed for Markov Decision Processes (MDPs). It allows a single agent to learn a policy that maximizes a possibly delayed reward signal in a stochastic stationary environment, and it guarantees convergence to the optimal policy provided that the agent can experiment sufficiently and the environment in which it operates is Markovian. However, when multiple agents apply reinforcement learning in a shared environment, the resulting system generally falls outside the MDP model: the optimal policy of an agent depends not only on the environment, but also on the policies of the other agents. Such situations arise naturally in a variety of domains, including robotics, telecommunications, economics, distributed control, auctions, and traffic light control. In these domains multi-agent learning is used either because of the complexity of the domain or because control is inherently decentralized, and it is important that agents are capable of discovering good solutions to the problem at hand, either by coordinating with other learners or by competing with them. This chapter focuses on the application of reinforcement learning techniques in multi-agent systems. We describe a basic learning framework based on the economic research into game theory, illustrate the additional complexity that arises in such systems, and present a representative selection of algorithms from the different areas of multi-agent reinforcement learning research.
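To make the non-stationarity concrete, consider the simplest multi-agent setting the chapter builds on: two independent learners repeatedly playing a matrix game. The sketch below is our own illustration, not code from the chapter; the common-payoff coordination game and all parameter values are assumptions chosen for the example. Each agent runs a stateless Q-learning update and, from its own perspective, faces a moving target, because the reward for an action depends on what the other (still learning) agent plays.

    import random

    # Illustrative sketch, not code from the chapter: two independent,
    # stateless Q-learners repeatedly play a 2x2 common-payoff
    # coordination game. Payoffs, learning rate and exploration
    # probability are assumed values chosen for the example.
    PAYOFF = [[1.0, 0.0],   # both agents receive 1 if their actions match,
              [0.0, 1.0]]   # and 0 otherwise

    ALPHA = 0.1      # learning rate (assumed)
    EPSILON = 0.1    # exploration probability (assumed)
    EPISODES = 5000

    def choose(q):
        # Epsilon-greedy action selection over a stateless Q-table.
        if random.random() < EPSILON:
            return random.randrange(len(q))
        return max(range(len(q)), key=lambda a: q[a])

    # Tiny random initialisation so that greedy tie-breaking differs
    # from run to run.
    q1 = [0.01 * random.random() for _ in range(2)]
    q2 = [0.01 * random.random() for _ in range(2)]

    for _ in range(EPISODES):
        a1, a2 = choose(q1), choose(q2)
        r = PAYOFF[a1][a2]  # common reward for the joint action
        # Each agent updates as if it faced a stationary bandit, although
        # the other agent's policy keeps changing underneath it.
        q1[a1] += ALPHA * (r - q1[a1])
        q2[a2] += ALPHA * (r - q2[a2])

    print("agent 1 Q-values:", q1)
    print("agent 2 Q-values:", q2)
    print("greedy joint action:", (q1.index(max(q1)), q2.index(max(q2))))

Run it a few times: the learners almost always lock onto one of the two matching joint actions, but which one is decided by early exploration and the random initialisation. Coordinating on a particular equilibrium, rather than merely reaching one, is the kind of problem the algorithms surveyed in this chapter address.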

Author information

Correspondence to Ann Nowé.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Nowé, A., Vrancx, P., De Hauwere, Y.M. (2012). Game Theory and Multi-agent Reinforcement Learning. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_14

  • DOI: https://doi.org/10.1007/978-3-642-27645-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27644-6

  • Online ISBN: 978-3-642-27645-3

  • eBook Packages: Engineering, Engineering (R0)
