International Journal of Game Theory, Volume 44, Issue 3, pp 667–699

Nonconvergence to saddle boundary points under perturbed reinforcement learning

  • Georgios C. Chasparis
  • Jeff S. Shamma
  • Anders Rantzer


For several reinforcement learning models in strategic-form games, convergence to action profiles that are not Nash equilibria may occur with positive probability under certain conditions on the payoff function. In this paper, we explore how an alternative reinforcement learning model, where the strategy of each agent is perturbed by a strategy-dependent perturbation (or mutations) function, may exclude convergence to non-Nash pure strategy profiles. This approach extends prior analysis on reinforcement learning in games that addresses the issue of convergence to saddle boundary points. It further provides a framework under which the effect of mutations can be analyzed in the context of reinforcement learning.


Keywords

Learning in games, Reinforcement learning, Replicator dynamics

JEL Classification

C72 C73 D83 



This work was supported by the Swedish Research Council through the Linnaeus Center LCCC and the AFOSR MURI project #FA9550-09-1-0538.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Georgios C. Chasparis (1)
  • Jeff S. Shamma (2)
  • Anders Rantzer (3)

  1. Department of Data Analysis Systems, Software Competence Center Hagenberg GmbH, Hagenberg, Austria
  2. School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA
  3. Department of Automatic Control, Lund University, Lund, Sweden
