Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation

Chasparis, Georgios C.; Shamma, Jeff S.

doi:10.1007/s13235-011-0038-z

Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation

Published: 05 November 2011

Volume 2, pages 18–50, (2012)
Cite this article

Dynamic Games and Applications Aims and scope Submit manuscript

Georgios C. Chasparis¹ &
Jeff S. Shamma¹

203 Accesses
22 Citations
Explore all metrics

Abstract

We analyze reinforcement learning under so-called “dynamic reinforcement.” In reinforcement learning, each agent repeatedly interacts with an unknown environment (i.e., other agents), receives a reward, and updates the probabilities of its next action based on its own previous actions and received rewards. Unlike standard reinforcement learning, dynamic reinforcement uses a combination of long-term rewards and recent rewards to construct myopically forward looking action selection probabilities. We analyze the long-term stability of the learning dynamics for general games with pure strategy Nash equilibria and specialize the results for coordination games and distributed network formation. In this class of problems, more than one stable equilibrium (i.e., coordination configuration) may exist. We demonstrate equilibrium selection under dynamic reinforcement. In particular, we show how a single agent is able to destabilize an equilibrium in favor of another by appropriately adjusting its dynamic reinforcement parameters. We contrast the conclusions with prior game theoretic results according to which the risk-dominant equilibrium is the only robust equilibrium when agents’ decisions are subject to small randomized perturbations. The analysis throughout is based on the ODE method for stochastic approximations, where a special form of perturbation in the learning dynamics allows for analyzing its behavior at the boundary points of the state space.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Payoff-dependent dynamics and coordination games

Article 04 July 2016

Adaptive Multiagent Reinforcement Learning with Non-positive Regret

A Markovian Stackelberg game approach for computing an optimal dynamic mechanism

Article 14 July 2021

References

Arthur WB (1993) On designing economic agents that behave like human agents. J Evol Econ 3:1–22
Article Google Scholar
Bala V, Goyal S (2000) A noncooperative model of network formation. Econometrica 68(5):1181–1229
Article MathSciNet MATH Google Scholar
Basar T (1987) Relaxation techniques and asynchronous algorithms for online computation of non-cooperative equilibria. J Econ Dyn Control 11:531–549
Article MathSciNet MATH Google Scholar
Beggs AW (2005) On the convergence of reinforcement learning. J Econ Theory 122:1–36
Article MathSciNet MATH Google Scholar
Benaim M, Hofbauer J, Hopkins E (2009) Learning in games with unstable equilibria. J Econ Theory 144(4):1694–1709
Article MATH Google Scholar
Bergin J, Lipman BL (1996) Evolution with state-dependent mutations. Econometrica 64(4):943–956
Article MATH Google Scholar
Blondel VD, Hendrickx JM, Olshevsky A, Tsitsiklis JN (2005) Convergence in multiagent coordination, consensus, and flocking. In: Proceedings of the 44th IEEE conference on decision and control, Seville, Spain, December 2005, pp 2996–3000
Chapter Google Scholar
Börgers T, Sarin R (1997) Learning through reinforcement and replicator dynamics. J Econ Theory 77(1):1–14
Article Google Scholar
Borkar VS (2008) Stochastic approximation: a dynamical systems viewpoint. Cambridge University Press, New York
Google Scholar
Cho IK, Matsui A (2005) Learning aspiration in repeated games. J Econ Theory 124:171–201
Article MathSciNet MATH Google Scholar
Conlisk J (1993) Adaptation in game: two solutions to the Crawford puzzle. J Econ Behav Organ 22:25–50
Article Google Scholar
Crawford VP (1985) Learning behavior and mixed strategy Nash equilibria. J Econ Behav Organ 6:69–78
Article Google Scholar
Dutta B, Jackson M (2000) The stability and efficiency of directed communication networks. Rev Econ Des 5:251–272
Google Scholar
Erev I, Roth A (1998) Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. Am Econ Rev 88:848–881
Google Scholar
Franklin GF, Powell JD, Emami-Naeini A (2002) Feedback control of dynamic systems. Prentice Hall, Upper Saddle River
Google Scholar
Fudenberg D, Levine DK (1998) The theory of learning in games. MIT Press, Cambridge
MATH Google Scholar
Fudenberg D, Tirole J (1991) Game theory. MIT Press, Cambridge
Google Scholar
Hart S (2005) Adaptive heuristics. Econometrica 73(5):1401–1430
Article MathSciNet MATH Google Scholar
Hess RA, Modjtahedzadeh A (1990) A control theoretic model of driver steering behavior. IEEE Control Syst Mag 10(5):3–8
Article Google Scholar
Hofbauer J, Sigmund K (1988) The theory of evolution and dynamical systems. Cambridge University Press, Cambridge
MATH Google Scholar
Hopkins E, Posch M (2005) Attainability of boundary points under reinforcement learning. Games Econ Behav
Jackson M, Wolinsky A (1996) A strategic model of social and economic networks. J Econ Theory 71:44–74
Article MathSciNet MATH Google Scholar
Jadbabaie A, Lin J, Morse SA (2003) Coordination of groups of mobile agents using nearest neighbor rules. IEEE Trans Autom Control 48(6):988–1001
Article MathSciNet Google Scholar
Kandori M, Mailath G, Rob R (1993) Learning, mutation, and long-run equilibria in games. Econometrica 61:29–56
Article MathSciNet MATH Google Scholar
Karandikar R, Mookherjee D, Ray D (1998) Evolving aspirations and cooperation. J Econ Theory 80:292–331
Article MathSciNet MATH Google Scholar
Khalil HK (1992) Nonlinear systems. Prentice-Hall, New York
MATH Google Scholar
Kushner HJ, Yin GG (2003) Stochastic approximation and recursive algorithms and applications, 2nd edn. Springer, New York
MATH Google Scholar
Lewis D (2002) Convention: a philosophical study. Blackwell, Oxford
Google Scholar
Massaquoi SG (1999) Modeling the function of the cerebellum in scheduled linear servo control of simple horizontal planar arm movements. PhD thesis, Cambridge, MA
Matsui A, Matsuyama K (1995) An approach to equilibrium selection. J Econ Theory 65:415–434
Article MathSciNet MATH Google Scholar
Maynard Smith J (1982) Evolution and the theory of games. Cambridge University Press, Cambridge
Google Scholar
McRuer D (1980) Human dynamics in man–machine systems. Automatica 16:237–253
Article MATH Google Scholar
Moreau L (2005) Stability of multiagent systems with time-dependent communication links. IEEE Trans Autom Control 50(2):169–182
Article MathSciNet Google Scholar
Najim K, Poznyak AS (1994) Learning automata: theory and applications. Pergamon, Elmsford
Google Scholar
Narendra K, Thathachar M (1989) Learning automata: an introduction. Prentice-Hall, New York
Google Scholar
Norman MF (1968) On linear models with two absorbing states. J Math Psychol 5:225–241
Article MathSciNet MATH Google Scholar
Olfati-Saber R (2006) Flocking for multiagent dynamic systems: algorithms and theory. IEEE Trans Autom Control 51(3):401–420
Article MathSciNet Google Scholar
Olfati-Saber R, Fax JA, Murray RM (2007) Consensus and cooperation in networked multiagent systems. Proc IEEE 95(1):215–233
Article Google Scholar
Pemantle R (1990) Nonconvergence to unstable points in urn models and stochastic approximations. Ann Probab 18(2):698–712
Article MathSciNet MATH Google Scholar
Sandholm WH (2010) Population games and evolutionary dynamics. MIT Press, Cambridge
MATH Google Scholar
Schelling T (2006) The strategy of conflict. Harvard University Press, Cambridge
Google Scholar
Selten R (1991) Anticipatory learning in two-person games. In: Selton R (ed) Game equilibrium models. Evolution and game dynamics, vol I. Springer, Berlin, pp 98–153
Google Scholar
Shamma JS, Arslan G (2005) Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria. IEEE Trans Autom Control 50(3):312–327
Article MathSciNet Google Scholar
Shapiro IJ, Narendra KS (1969) Use of stochastic automata for parameter self-organization with multi-modal performance criteria. IEEE Trans Syst Sci Cybern 5:352–360
Article MATH Google Scholar
Skyrms B (2002) Signals, evolution and the explanatory power of transient information. Philos Sci 69:407–428
Article MathSciNet Google Scholar
Skyrms B, Pemantle R (2000) A dynamic model of social network formation. Proc Natl Acad Sci USA 97:9340–9346
Article MATH Google Scholar
Strogatz SH (2003) Sync: the emerging science of spontaneous order. Hyperion, New York
Google Scholar
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Google Scholar
Tang F-F (2001) Anticipatory learning in two-person games: some experimental results. J Econ Behav Organ 44:221–232
Article Google Scholar
Verbeeck K, Nowe A, Lenaerts T, Parent J (2002) Learning to reach the Pareto optimal Nash equilibrium as a team. In: McKay RI, Slaney J (eds) AI2002 advances in artificial intelligence. LNCS, vol 2557. Springer, Berlin
Google Scholar
Wang X, Sandholm T (2003) Learning near-Pareto-optimal conventions in polynomial time. In: Neural information processing systems: natural and synthetic (NIPS)
Google Scholar
Weibull J (1997) Evolutionary game theory. MIT Press, Cambridge
Google Scholar
Young HP (1993) The evolution of conventions. Econometrica 61:57–84
Article MathSciNet MATH Google Scholar
Young HP (1998) Individual strategy and social structure. Princeton University Press, Princeton
Google Scholar

Download references

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
Georgios C. Chasparis & Jeff S. Shamma

Authors

Georgios C. Chasparis
View author publications
You can also search for this author in PubMed Google Scholar
Jeff S. Shamma
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Georgios C. Chasparis.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chasparis, G.C., Shamma, J.S. Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation. Dyn Games Appl 2, 18–50 (2012). https://doi.org/10.1007/s13235-011-0038-z

Download citation

Published: 05 November 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s13235-011-0038-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation

Abstract

Access this article

Similar content being viewed by others

Payoff-dependent dynamics and coordination games

Adaptive Multiagent Reinforcement Learning with Non-positive Regret

A Markovian Stackelberg game approach for computing an optimal dynamic mechanism

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation

Abstract

Access this article

Similar content being viewed by others

Payoff-dependent dynamics and coordination games

Adaptive Multiagent Reinforcement Learning with Non-positive Regret

A Markovian Stackelberg game approach for computing an optimal dynamic mechanism

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation