# Nonconvergence to saddle boundary points under perturbed reinforcement learning

## Abstract

For several reinforcement learning models in strategic-form games, convergence to action profiles that are not Nash equilibria may occur with positive probability under certain conditions on the payoff function. In this paper, we explore how an alternative reinforcement learning model, in which the strategy of each agent is perturbed by a strategy-dependent perturbation (or mutation) function, may exclude convergence to non-Nash pure strategy profiles. This approach extends prior analysis of reinforcement learning in games that addresses the issue of convergence to saddle boundary points. It further provides a framework under which the effect of mutations can be analyzed in the context of reinforcement learning.

## Keywords

Learning in games, Reinforcement learning, Replicator dynamics

## JEL Classification

C72, C73, D83

## Notes

### Acknowledgments

This work was supported by the Swedish Research Council through the Linnaeus Center LCCC and the AFOSR MURI project #FA9550-09-1-0538.
