Game Theory and Multi-agent Reinforcement Learning

  • Chapter in Reinforcement Learning

Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 12)

Abstract

Reinforcement learning was originally developed for Markov Decision Processes (MDPs). It allows a single agent to learn a policy that maximizes a possibly delayed reward signal in a stochastic stationary environment, and it guarantees convergence to the optimal policy provided that the agent can experiment sufficiently and the environment in which it operates is Markovian. However, when multiple agents apply reinforcement learning in a shared environment, the resulting system generally falls outside the MDP model: the optimal policy of an agent depends not only on the environment, but also on the policies of the other agents. Such situations arise naturally in a variety of domains, including robotics, telecommunications, economics, distributed control, auctions, and traffic light control. In these domains multi-agent learning is used either because of the complexity of the domain or because control is inherently decentralized, and it is important that agents are capable of discovering good solutions to the problem at hand, either by coordinating with other learners or by competing with them. This chapter focuses on the application of reinforcement learning techniques in multi-agent systems. We describe a basic learning framework based on the economic research into game theory, illustrate the additional complexity that arises in such systems, and present a representative selection of algorithms from the different areas of multi-agent reinforcement learning research.
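To make the non-stationarity concrete, consider the simplest multi-agent setting the chapter builds on: two independent learners repeatedly playing a matrix game. The sketch below is our own illustration, not code from the chapter; the common-payoff coordination game and all parameter values are assumptions chosen for the example. Each agent runs a stateless Q-learning update and, from its own perspective, faces a moving target, because the reward for an action depends on what the other (still learning) agent plays.

    import random

    # Illustrative sketch, not code from the chapter: two independent,
    # stateless Q-learners repeatedly play a 2x2 common-payoff
    # coordination game. Payoffs, learning rate and exploration
    # probability are assumed values chosen for the example.
    PAYOFF = [[1.0, 0.0],   # both agents receive 1 if their actions match,
              [0.0, 1.0]]   # and 0 otherwise

    ALPHA = 0.1      # learning rate (assumed)
    EPSILON = 0.1    # exploration probability (assumed)
    EPISODES = 5000

    def choose(q):
        # Epsilon-greedy action selection over a stateless Q-table.
        if random.random() < EPSILON:
            return random.randrange(len(q))
        return max(range(len(q)), key=lambda a: q[a])

    # Tiny random initialisation so that greedy tie-breaking differs
    # from run to run.
    q1 = [0.01 * random.random() for _ in range(2)]
    q2 = [0.01 * random.random() for _ in range(2)]

    for _ in range(EPISODES):
        a1, a2 = choose(q1), choose(q2)
        r = PAYOFF[a1][a2]  # common reward for the joint action
        # Each agent updates as if it faced a stationary bandit, although
        # the other agent's policy keeps changing underneath it.
        q1[a1] += ALPHA * (r - q1[a1])
        q2[a2] += ALPHA * (r - q2[a2])

    print("agent 1 Q-values:", q1)
    print("agent 2 Q-values:", q2)
    print("greedy joint action:", (q1.index(max(q1)), q2.index(max(q2))))

Run it a few times: the learners almost always lock onto one of the two matching joint actions, but which one is decided by early exploration and the random initialisation. Coordinating on a particular equilibrium, rather than merely reaching one, is the kind of problem the algorithms surveyed in this chapter address.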

Author information

Correspondence to Ann Nowé.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Nowé, A., Vrancx, P., De Hauwere, Y.M. (2012). Game Theory and Multi-agent Reinforcement Learning. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_14

  • DOI: https://doi.org/10.1007/978-3-642-27645-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27644-6

  • Online ISBN: 978-3-642-27645-3

  • eBook Packages: Engineering, Engineering (R0)
