Abstract
Learning in multi-agent systems (MAS) is a complex task. Current learning theory for single-agent systems does not extend to multi-agent problems: in a MAS, the reinforcement an agent receives may depend on the actions taken by the other agents in the system. Hence, the Markov property no longer holds and convergence guarantees are lost. No general formal theory currently describes the conditions under which algorithms for multi-agent learning (MAL) are successful. It is therefore important to understand the dynamics of multi-agent reinforcement learning and to be able to analyze learning behavior in terms of the stability and resilience of equilibria. Recent work has used the replicator dynamics of evolutionary game theory for this purpose. In this paper we contribute to this framework. More precisely, we formally derive the evolutionary dynamics of the Regret Minimization polynomial weights learning algorithm, described by a system of differential equations. Using these equations we can investigate parameter settings and analyze the dynamics of multiple agents learning concurrently with regret minimization. This makes clear why certain attractors are stable and potentially preferred over others, and what the basins of attraction look like. Furthermore, we show experimentally that the dynamics predict the real learning behavior, and we also test the dynamics in non-self-play, comparing the polynomial weights algorithm against the previously derived dynamics of Q-learning and various Linear Reward algorithms in a set of benchmark normal-form games.
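To make the learning rule named in the abstract concrete, the following is a minimal sketch of the textbook Polynomial Weights update, w_i ← w_i (1 − η ℓ_i) with losses ℓ_i in [0, 1], run by two agents in self-play on a 2×2 game. It is not the system of differential equations derived in the paper. The function names, the Prisoner's Dilemma payoffs, the learning rate `eta`, and the use of expected rather than sampled losses are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical Prisoner's Dilemma payoffs scaled to [0, 1].
# Rows and columns are indexed as 0 = cooperate, 1 = defect.
# PD_ROW[i, j] is the row player's payoff when row plays i and column plays j.
PD_ROW = np.array([[0.6, 0.0],
                   [1.0, 0.2]])
# The game is symmetric, so the column player's payoff matrix is the transpose.
PD_COL = PD_ROW.T


def polynomial_weights_step(weights, losses, eta):
    """One Polynomial Weights update: w_i <- w_i * (1 - eta * loss_i)."""
    updated = weights * (1.0 - eta * losses)
    # Renormalizing keeps the numbers well scaled and leaves the policy unchanged.
    return updated / updated.sum()


def self_play(payoff_row, payoff_col, eta=0.1, steps=500):
    """Run two Polynomial Weights learners against each other.

    Assumption: each agent observes the expected payoff of every action
    against the opponent's current mixed strategy (full information).
    """
    w_row = np.ones(payoff_row.shape[0]) / payoff_row.shape[0]
    w_col = np.ones(payoff_row.shape[1]) / payoff_row.shape[1]
    for _ in range(steps):
        x, y = w_row, w_col              # current mixed strategies
        u_row = payoff_row @ y           # expected payoff per row action
        u_col = payoff_col.T @ x         # expected payoff per column action
        # Payoffs lie in [0, 1], so loss = 1 - payoff also lies in [0, 1].
        w_row = polynomial_weights_step(w_row, 1.0 - u_row, eta)
        w_col = polynomial_weights_step(w_col, 1.0 - u_col, eta)
    return w_row, w_col
```

Calling `self_play(PD_ROW, PD_COL)` returns the two final mixed strategies. Because defection strictly dominates in this game, both learners should concentrate their probability mass on the second action; other games and values of `eta` can be substituted to explore different attractors.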
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klos, T., van Ahee, G.J., Tuyls, K. (2010). Evolutionary Dynamics of Regret Minimization. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol 6322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15883-4_6
DOI: https://doi.org/10.1007/978-3-642-15883-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15882-7
Online ISBN: 978-3-642-15883-4
eBook Packages: Computer Science (R0)