Learning to Reach the Pareto Optimal Nash Equilibrium as a Team

Verbeeck, Katja; Nowé, Ann; Lenaerts, Tom; Parent, Johan

doi:10.1007/3-540-36187-1_36

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2557)

Included in the following conference series:

Australian Joint Conference on Artificial Intelligence

Abstract

Coordination is an important issue in multi-agent systems when agents want to maximize their revenue. Coordination is often achieved through communication, but communication has its price. We are interested in an approach in which the communication between the agents is kept low and globally optimal behavior can still be found.

In this paper we report on an efficient approach that allows independent reinforcement learning agents to reach a Pareto optimal Nash equilibrium with limited communication. The communication happens at regular time steps and is basically a signal for the agents to start an exploration phase. During each exploration phase, some agents exclude their current best action so as to give the team the opportunity to look for a possibly better Nash equilibrium. This technique of reducing the action space by exclusions was only recently introduced for finding periodical policies in games of conflicting interests. Here, we explore this technique in repeated common interest games with deterministic or stochastic outcomes.
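
The exclusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given on this page. It assumes two agents, a deterministic common-payoff matrix, epsilon-greedy learners that average their own rewards, and a fixed phase length standing in for the regular synchronisation signal. The function name `learn_with_exclusions`, its parameters, and the rule that an agent restores its full action set once only one action remains are all hypothetical choices made for illustration.

```python
import random

def learn_with_exclusions(payoff, phases=6, steps=500, epsilon=0.1, seed=0):
    """Illustrative sketch: payoff[a][b] is the reward both agents receive."""
    rng = random.Random(seed)
    sizes = (len(payoff), len(payoff[0]))
    excluded = [set(), set()]  # actions each agent currently rules out
    best_joint, best_reward = None, float("-inf")

    for phase in range(phases):
        # Each agent learns only over the actions it has not excluded.
        allowed = [[a for a in range(sizes[i]) if a not in excluded[i]]
                   for i in range(2)]
        q = [{a: 0.0 for a in allowed[i]} for i in range(2)]
        n = [{a: 0 for a in allowed[i]} for i in range(2)]

        for step in range(steps):
            joint = []
            for i in range(2):
                if rng.random() < epsilon:
                    joint.append(rng.choice(allowed[i]))          # explore
                else:
                    joint.append(max(allowed[i], key=q[i].get))   # exploit
            r = payoff[joint[0]][joint[1]]
            # Independent learners: each updates only its own action's average.
            for i, a in enumerate(joint):
                n[i][a] += 1
                q[i][a] += (r - q[i][a]) / n[i][a]

        # End of phase: one greedy play; both agents observe the same reward.
        greedy = tuple(max(allowed[i], key=q[i].get) for i in range(2))
        r = payoff[greedy[0]][greedy[1]]
        if r > best_reward:
            best_joint, best_reward = greedy, r

        # Assumed exclusion rule: drop the current best action, or restore
        # the full action set when only one action is left.
        for i in range(2):
            if len(allowed[i]) > 1:
                excluded[i].add(greedy[i])
            else:
                excluded[i].clear()

    return best_joint, best_reward

# Hypothetical common interest game with three pure Nash equilibria.
payoff = [[10, 0, 0], [0, 5, 0], [0, 0, 2]]
joint, reward = learn_with_exclusions(payoff)
print(joint, reward)
```

In this sketch the only coordination between the agents is the shared phase boundary. Each agent remembers its own action from the best phase, which is possible because all agents receive the same reward in a common interest game. With stochastic outcomes, the single greedy play at the end of a phase would have to be replaced by an averaged estimate.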

Author information

Authors and Affiliations

COMO como.vub.ac.be, Vrije Universiteit Brussel, Pleinlaan 2 1050, Brussel, Belgium
Katja Verbeeck, Ann Nowé, Tom Lenaerts & Johan Parent

Editor information

Editors and Affiliations

Australian Defence Force Academy, University of New South Wales, ACT 2600, Canberra, Australia
Bob McKay
Computer Science Laboratory, Australian National University, RSISE Building, ACT 0200, Canberra, Australia
John Slaney

About this paper

Cite this paper

Verbeeck, K., Nowé, A., Lenaerts, T., Parent, J. (2002). Learning to Reach the Pareto Optimal Nash Equilibrium as a Team. In: McKay, B., Slaney, J. (eds) AI 2002: Advances in Artificial Intelligence. AI 2002. Lecture Notes in Computer Science, vol 2557. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36187-1_36

DOI: https://doi.org/10.1007/3-540-36187-1_36
Published: 08 November 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00197-3
Online ISBN: 978-3-540-36187-9
eBook Packages: Springer Book Archive
