
Autonomous Agents and Multi-Agent Systems, Volume 14, Issue 3, pp 239–269

Exploring selfish reinforcement learning in repeated games with stochastic rewards

  • Katja Verbeeck
  • Ann Nowé
  • Johan Parent
  • Karl Tuyls

Abstract

In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms are presented, for common interest and for conflicting interest games respectively. Both are based on the same idea: an agent explores by temporarily excluding some of the local actions from its private action space, giving the team of agents the opportunity to look for better solutions in a reduced joint action space. In a later stage, these two algorithms are combined into one generic algorithm that does not assume the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games, ESRL needs only limited communication to learn a fair periodical policy, resulting in a good overall policy. Importantly, ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards; they are flexible in learning different solution concepts; and they can handle stochastic, possibly delayed rewards as well as asynchronous action selection. A real-life experiment, adaptive load-balancing of parallel applications, is included.
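
To make the coordinated-exploration idea concrete, here is a minimal Python sketch based only on the description above, for the common interest case: each agent runs a learning automaton over its private action space and, between exploration phases, temporarily excludes the action it converged to, so the team searches a reduced joint action space. The agent class, the linear reward-inaction update, the phase lengths, and the toy Bernoulli-reward game are illustrative assumptions, not the authors' exact algorithm.

    import random

    class ESRLAgent:
        """Learning automaton agent with a temporarily reducible action space."""

        def __init__(self, n_actions, lr=0.05):
            self.active = list(range(n_actions))   # currently available actions
            self.lr = lr
            self._reset_probs()

        def _reset_probs(self):
            p = 1.0 / len(self.active)
            self.probs = {a: p for a in self.active}

        def select(self):
            r, acc = random.random(), 0.0
            for a in self.active:
                acc += self.probs[a]
                if r <= acc:
                    return a
            return self.active[-1]

        def update(self, action, reward):
            # Linear reward-inaction update: move probability mass toward the
            # chosen action in proportion to the (0..1) stochastic reward.
            for a in self.active:
                if a == action:
                    self.probs[a] += self.lr * reward * (1.0 - self.probs[a])
                else:
                    self.probs[a] -= self.lr * reward * self.probs[a]

        def converged_action(self):
            return max(self.active, key=lambda a: self.probs[a])

        def exclude(self, action):
            # Coordinated exploration: remove the action the agent converged
            # to and restart learning in the reduced private action space.
            if len(self.active) > 1:
                self.active.remove(action)
            self._reset_probs()

    # Toy 2-agent common interest game with stochastic (Bernoulli) rewards:
    # PAYOFF[i][j] is the success probability of joint action (i, j).
    PAYOFF = [[0.3, 0.9],
              [0.6, 0.2]]

    agents = [ESRLAgent(2), ESRLAgent(2)]
    best_joint, best_avg = None, -1.0
    for phase in range(2):                        # a few exploration phases
        for t in range(5000):                     # convergence period
            acts = [ag.select() for ag in agents]
            r = 1.0 if random.random() < PAYOFF[acts[0]][acts[1]] else 0.0
            for ag, a in zip(agents, acts):
                ag.update(a, r)
        # Evaluate the converged joint action empirically (agents never
        # see the payoff matrix, only sampled rewards).
        joint = tuple(ag.converged_action() for ag in agents)
        hits = sum(random.random() < PAYOFF[joint[0]][joint[1]] for _ in range(1000))
        avg = hits / 1000.0
        if avg > best_avg:
            best_joint, best_avg = joint, avg
        for ag, a in zip(agents, joint):          # exclude and search the rest
            ag.exclude(a)

    print("best joint action:", best_joint, "estimated reward:", round(best_avg, 2))

Note that this sketch covers only the common interest case, where the abstract states no communication is needed; the fair periodical policies for conflicting interest games, which require limited communication, are not reproduced here.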

Keywords

Multi-agent reinforcement learning · Learning automata · Non-zero sum games



Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  • Katja Verbeeck (1)
  • Ann Nowé (1)
  • Johan Parent (1)
  • Karl Tuyls (2)

  1. Computational Modeling Lab (COMO), Vrije Universiteit Brussel, Brussels, Belgium
  2. Institute for Knowledge and Agent Technology (IKAT), University of Maastricht, Maastricht, The Netherlands
