Evolutionary Dynamics of Regret Minimization

Klos, Tomas; van Ahee, Gerrit Jan; Tuyls, Karl

doi:10.1007/978-3-642-15883-4_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6322)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Learning in multi-agent systems (MAS) is a complex task. Current learning theory for single-agent systems does not extend to multi-agent problems. In a MAS the reinforcement an agent receives may depend on the actions taken by the other agents present in the system. Hence, the Markov property no longer holds and convergence guarantees are lost. Currently there does not exist a general formal theory describing and elucidating the conditions under which algorithms for multi-agent learning (MAL) are successful. Therefore it is important to fully understand the dynamics of multi-agent reinforcement learning, and to be able to analyze learning behavior in terms of stability and resilience of equilibria. Recent work has considered the replicator dynamics of evolutionary game theory for this purpose. In this paper we contribute to this framework. More precisely, we formally derive the evolutionary dynamics of the Regret Minimization polynomial weights learning algorithm, which will be described by a system of differential equations. Using these equations we can easily investigate parameter settings and analyze the dynamics of multiple concurrently learning agents using regret minimization. In this way it is clear why certain attractors are stable and potentially preferred over others, and what the basins of attraction look like. Furthermore, we experimentally show that the dynamics predict the real learning behavior and we test the dynamics also in non-self play, comparing the polynomial weights algorithm against the previously derived dynamics of Q-learning and various Linear Reward algorithms in a set of benchmark normal form games.
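
For readers unfamiliar with the polynomial weights rule mentioned in the abstract, the following is a minimal sketch of its standard multiplicative update, w_i <- w_i * (1 - eta * loss_i), run in self-play on a two-action game. It is not the chapter's derived differential equations. The function names, the Prisoner's Dilemma payoffs scaled to [0, 1], the use of expected losses against the opponent's current mixed strategy, and the values of `eta` and `steps` are all illustrative assumptions.

```python
def polynomial_weights_step(weights, losses, eta=0.1):
    """One polynomial weights update: w_i <- w_i * (1 - eta * loss_i).

    Losses are assumed to lie in [0, 1] and eta in (0, 1/2].
    The result is renormalised, so it can be read as a mixed strategy.
    """
    updated = [w * (1.0 - eta * loss) for w, loss in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]


def expected_losses(payoff, opponent_strategy):
    """Loss of each own action: 1 minus its expected payoff against the opponent's mix.

    payoff[i][j] is the payoff (assumed in [0, 1]) for own action i
    when the opponent plays action j.
    """
    return [
        1.0 - sum(r * q for r, q in zip(row, opponent_strategy))
        for row in payoff
    ]


def self_play(payoff_a, payoff_b, steps=2000, eta=0.1):
    """Two polynomial weights learners updating simultaneously (illustrative setup)."""
    p = [1.0 / len(payoff_a)] * len(payoff_a)  # strategy of learner A
    q = [1.0 / len(payoff_b)] * len(payoff_b)  # strategy of learner B
    for _ in range(steps):
        loss_p = expected_losses(payoff_a, q)
        loss_q = expected_losses(payoff_b, p)
        p = polynomial_weights_step(p, loss_p, eta)
        q = polynomial_weights_step(q, loss_q, eta)
    return p, q


# Hypothetical Prisoner's Dilemma, payoffs scaled to [0, 1].
# Action 0 = cooperate, action 1 = defect; the game is symmetric,
# so both learners use the same own-action-by-opponent-action matrix.
PD = [[0.6, 0.0],
      [1.0, 0.2]]

p, q = self_play(PD, PD)
print("learner A strategy:", p)
print("learner B strategy:", q)
```

In this assumed game defection strictly dominates, so both strategies drift toward action 1. Plotting `p[1]` against `q[1]` over the iterations from several starting points would give a rough, discrete-time picture of the kind of trajectories the abstract refers to.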

Author information

Authors and Affiliations

Delft University of Technology, Delft, The Netherlands
Tomas Klos
Yes2web, Rotterdam, The Netherlands
Gerrit Jan van Ahee
Maastricht University, Maastricht, The Netherlands
Karl Tuyls

Editor information

Editors and Affiliations

Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Avenida de los Castros, s/n, 39071, Santander, Spain
José Luis Balcázar
Yahoo! Research Barcelona, Avinguda Diagonal 177, 08018, Barcelona, Spain
Francesco Bonchi
Yahoo! Research Barcelona, Avinguda Diagonal 177, 08018, Barcelona, Spain
Aristides Gionis
TAO, CNRS-INRIA-LRI, Université Paris-Sud, 91405, Orsay, France
Michèle Sebag

About this paper

Cite this paper

Klos, T., van Ahee, G.J., Tuyls, K. (2010). Evolutionary Dynamics of Regret Minimization. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol 6322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15883-4_6

DOI: https://doi.org/10.1007/978-3-642-15883-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15882-7
Online ISBN: 978-3-642-15883-4
eBook Packages: Computer Science (R0)