
Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation

Published in: Dynamic Games and Applications

Abstract

We analyze reinforcement learning under so-called “dynamic reinforcement.” In reinforcement learning, each agent repeatedly interacts with an unknown environment (i.e., the other agents), receives a reward, and updates the probabilities of its next action based on its own previous actions and received rewards. Unlike standard reinforcement learning, dynamic reinforcement combines long-term rewards with recent rewards to construct myopically forward-looking action-selection probabilities. We analyze the long-term stability of the learning dynamics for general games with pure-strategy Nash equilibria and specialize the results to coordination games and distributed network formation. In this class of problems, more than one stable equilibrium (i.e., coordination configuration) may exist. We demonstrate equilibrium selection under dynamic reinforcement; in particular, we show how a single agent can destabilize one equilibrium in favor of another by appropriately adjusting its dynamic reinforcement parameters. We contrast these conclusions with prior game-theoretic results, according to which the risk-dominant equilibrium is the only robust equilibrium when agents’ decisions are subject to small randomized perturbations. The analysis throughout is based on the ODE method for stochastic approximations, where a special form of perturbation in the learning dynamics allows us to analyze its behavior at the boundary points of the state space.
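To make the abstract's learning rule concrete, the following is a minimal sketch of urn-style reinforcement learning in a two-player coordination game, with a "dynamic reinforcement" flavor in which action-selection scores mix long-term cumulative rewards with the most recent reward. The class name, the `mixing` parameter, and the specific convex combination are illustrative assumptions, not the paper's exact update law.

```python
import random

random.seed(0)  # reproducibility of the simulation below

def play_coordination(a, b):
    # Symmetric 2x2 coordination game: both players earn 1 if they match.
    return (1.0, 1.0) if a == b else (0.0, 0.0)

class DynamicReinforcementAgent:
    """Urn-style reinforcement learner whose action scores combine
    long-term cumulative rewards with the most recent reward.
    (Hypothetical parameterization for illustration only.)"""

    def __init__(self, n_actions, mixing=0.3):
        self.cum = [1.0] * n_actions     # long-term propensities (urn weights)
        self.recent = [0.0] * n_actions  # reward received on the last play
        self.mixing = mixing             # weight on the recent-reward term

    def choose(self):
        # Convex combination of long-term and recent reinforcement.
        scores = [(1 - self.mixing) * c + self.mixing * r
                  for c, r in zip(self.cum, self.recent)]
        total = sum(scores)
        u, acc = random.random(), 0.0
        for i, s in enumerate(scores):
            acc += s / total
            if u < acc:
                return i
        return len(scores) - 1

    def update(self, action, reward):
        self.cum[action] += reward           # reinforce the played action
        self.recent = [0.0] * len(self.recent)
        self.recent[action] = reward         # remember only the last reward

agents = [DynamicReinforcementAgent(2), DynamicReinforcementAgent(2)]
for _ in range(2000):
    a0, a1 = agents[0].choose(), agents[1].choose()
    r0, r1 = play_coordination(a0, a1)
    agents[0].update(a0, r0)
    agents[1].update(a1, r1)
```

Because rewards arrive only when the players match, the positive feedback in the urn weights drives both agents to lock into the same action; which of the two coordination equilibria is selected depends on the early random history and, in the paper's analysis, on the dynamic reinforcement parameters.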





Corresponding author

Correspondence to Georgios C. Chasparis.


Cite this article

Chasparis, G.C., Shamma, J.S. Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation. Dyn Games Appl 2, 18–50 (2012). https://doi.org/10.1007/s13235-011-0038-z
