Learning to Reach the Pareto Optimal Nash Equilibrium as a Team

  • Katja Verbeeck
  • Ann Nowé
  • Tom Lenaerts
  • Johan Parent
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2557)


Coordination is an important issue in multi-agent systems when agents want to maximize their revenue. Coordination is often achieved through communication; however, communication has its price. We are interested in an approach that keeps the communication between the agents low while still finding a globally optimal behavior.

In this paper we report on an efficient approach that allows independent reinforcement learning agents to reach a Pareto optimal Nash equilibrium with limited communication. The communication happens at regular time steps and is basically a signal for the agents to start an exploration phase. During each exploration phase, some agents exclude their current best action so as to give the team the opportunity to look for a possibly better Nash equilibrium. This technique of reducing the action space by exclusions was only recently introduced for finding periodical policies in games of conflicting interests. Here, we explore this technique in repeated common interest games with deterministic or stochastic outcomes.
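The exploration-by-exclusion idea can be illustrated with a small Python sketch. It is not the authors' algorithm: the 3x3 payoff matrix, the epsilon-greedy learner, the class name `IndependentLearner`, the function `run_team`, and all parameter values are assumptions made for illustration, and the sketch simplifies the scheme by letting every agent exclude its current best action at each synchronization signal. It covers only the deterministic-outcome case.

```python
import random

# Hypothetical 3x3 common interest game: both agents receive PAYOFF[a][b].
# Every diagonal cell is a Nash equilibrium; (0, 0) is the Pareto optimal one.
PAYOFF = [
    [1.0, 0.0, 0.0],
    [0.0, 0.6, 0.0],
    [0.0, 0.0, 0.3],
]


class IndependentLearner:
    """Epsilon-greedy learner that sees only its own action and the team reward."""

    def __init__(self, n_actions, rng, epsilon=0.2):
        self.allowed = list(range(n_actions))  # actions not yet excluded
        self.rng = rng
        self.epsilon = epsilon
        self.phases = []  # (greedy action, reward) remembered for each phase
        self.reset_estimates()

    def reset_estimates(self):
        self.value = {a: 0.0 for a in self.allowed}
        self.count = {a: 0 for a in self.allowed}

    def greedy(self):
        return max(self.allowed, key=lambda a: self.value[a])

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.allowed)
        return self.greedy()

    def update(self, action, reward):
        # Running average of the reward observed for this action.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]

    def end_phase(self, action, reward):
        # On the synchronization signal: remember the outcome, exclude the action.
        self.phases.append((action, reward))
        self.allowed.remove(action)
        self.reset_estimates()

    def best_remembered(self):
        # Rewards are common, so every agent selects the same phase.
        return max(self.phases, key=lambda p: p[1])[0]


def run_team(payoff, steps_per_phase=500, seed=0):
    rng = random.Random(seed)
    n = len(payoff)
    agents = [IndependentLearner(n, rng), IndependentLearner(n, rng)]
    for _ in range(n):  # one phase per action; the last phase has one action left
        for _ in range(steps_per_phase):
            a, b = agents[0].act(), agents[1].act()
            r = payoff[a][b]
            agents[0].update(a, r)
            agents[1].update(b, r)
        # Synchronization signal: each agent plays its current best action once.
        a, b = agents[0].greedy(), agents[1].greedy()
        r = payoff[a][b]
        agents[0].end_phase(a, r)
        agents[1].end_phase(b, r)
    joint = (agents[0].best_remembered(), agents[1].best_remembered())
    rewards = [r for _, r in agents[0].phases]
    return joint, rewards
```

Calling `run_team(PAYOFF)` returns the joint action the team settles on together with the reward recorded at the end of each phase. The only communication is the shared phase boundary; each agent otherwise learns from its own actions and the common reward. Whether the Pareto optimal cell is actually found depends on each phase converging to a coordinated action, which this sketch does not guarantee.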


Nash Equilibrium, Action Space, Synchronization Phase, Stochastic Game, Independent Learner





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Katja Verbeeck (1)
  • Ann Nowé (1)
  • Tom Lenaerts (1)
  • Johan Parent (1)
  1. COMO, Universiteit Brussel, Brussel, Belgium
