  • Published: 18 September 2006

AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents

  • Vincent Conitzer &
  • Tuomas Sandholm

Machine Learning volume 67, pages 23–43 (2007)


Abstract

Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games—assuming that the opponent’s mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.
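The control idea stated in the abstract (best-respond to the other players' empirical strategies when they appear stationary, otherwise retreat to a precomputed equilibrium) can be illustrated with a short sketch. This is not the paper's exact procedure: the sliding-window stationarity test, the window size, the threshold, and the game interface below are illustrative assumptions, whereas AWESOME itself uses carefully scheduled epoch lengths and restart rules to obtain its guarantees.

```python
# Minimal sketch of the adapt-or-retreat loop described above (not the
# paper's exact algorithm: the helpers, window size, and threshold are
# illustrative assumptions).
import random
from collections import Counter


def empirical(actions):
    """Empirical distribution over a list of observed actions."""
    counts = Counter(actions)
    return {a: c / len(actions) for a, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two action distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def best_response(payoffs, opponent_dist):
    """Pure best response of the row player to a mixed column strategy."""
    expected = [sum(prob * payoffs[a][b] for b, prob in opponent_dist.items())
                for a in range(len(payoffs))]
    return {max(range(len(payoffs)), key=lambda a: expected[a]): 1.0}


def sample(dist):
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs)[0]


def awesome_like(payoffs, equilibrium, opponent, rounds=5000, window=100, threshold=0.1):
    """Adapt when the opponent looks stationary, otherwise retreat to `equilibrium`."""
    strategy = dict(equilibrium)       # start from the precomputed equilibrium
    history = []                       # opponent's observed actions
    for t in range(rounds):
        _ = sample(strategy)           # our sampled action (unused against a fixed opponent)
        history.append(opponent(t))
        if len(history) >= 2 * window:
            recent = empirical(history[-window:])
            older = empirical(history[-2 * window:-window])
            if total_variation(recent, older) < threshold:
                strategy = best_response(payoffs, recent)   # others look stationary: adapt
            else:
                strategy = dict(equilibrium)                # otherwise: retreat to equilibrium
    return strategy


if __name__ == "__main__":
    payoffs = [[1, -1], [-1, 1]]       # matching pennies, row player's payoffs
    equilibrium = {0: 0.5, 1: 0.5}     # the game's Nash equilibrium strategy
    # A stationary opponent that plays action 0 with probability 0.8:
    stationary = lambda t: 0 if random.random() < 0.8 else 1
    print(awesome_like(payoffs, equilibrium, stationary))   # typically {0: 1.0}, the best response
```

Against an opponent that eventually becomes stationary, the loop settles on a best response; against one that keeps changing, it repeatedly falls back to the equilibrium strategy, mirroring the adapt-or-retreat behavior described above.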


References

  • Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (1995). Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS) (pp. 322–331).

  • Aumann, R. (1974). Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1, 67–96.

  • Banerjee, B., & Peng, J. (2004). Performance bounded reinforcement learning in strategic interactions. In Proceedings of the National Conference on Artificial Intelligence (AAAI) (pp. 2–7). San Jose, CA, USA.

  • Banerjee, B., Sen, S., & Peng, J. (2001). Fast concurrent reinforcement learners. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI) (pp. 825–830). Seattle, WA.

  • Bowling, M. (2005). Convergence and no-regret in multiagent learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS) (pp. 209–216). Vancouver, Canada.

  • Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136, 215–250.

  • Brafman, R., & Tennenholtz, M. (2000). A near-optimal polynomial time algorithm for learning in certain classes of stochastic games. Artificial Intelligence, 121, 31–47.

  • Brafman, R., & Tennenholtz, M. (2003). R-max—a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213–231.

  • Brafman, R., & Tennenholtz, M. (2004). Efficient learning equilibrium. Artificial Intelligence, 159, 27–47.

  • Brafman, R., & Tennenholtz, M. (2005). Optimal efficient learning equilibrium: Imperfect monitoring in symmetric games. In Proceedings of the National Conference on Artificial Intelligence (AAAI) (pp. 726–731). Pittsburgh, PA, USA.

  • Cahn, A. (2000). General procedures leading to correlated equilibria. Discussion paper 216, Center for Rationality, The Hebrew University of Jerusalem, Israel.

  • Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence (AAAI) (pp. 746–752). Madison, WI.

  • Conitzer, V., & Sandholm, T. (2003a). BL-WoLF: A framework for loss-bounded learnability in zero-sum games. In International Conference on Machine Learning (ICML) (pp. 91–98). Washington, DC, USA.

  • Conitzer, V., & Sandholm, T. (2003b). Complexity results about Nash equilibria. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI) (pp. 765–771). Acapulco, Mexico.

  • Conitzer, V., & Sandholm, T. (2004). Communication complexity as a lower bound for learning in games. In International Conference on Machine Learning (ICML) (pp. 185–192). Banff, Alberta, Canada.

  • Foster, D., & Vohra, R. (1997). Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21, 40–55.

  • Foster, D. P., & Young, H. P. (2001). On the impossibility of predicting the behavior of rational agents. Proceedings of the National Academy of Sciences, 98, 12848–12853.

  • Freund, Y., & Schapire, R. (1999). Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29, 79–103.

  • Fudenberg, D., & Levine, D. (1998). The theory of learning in games. MIT Press.

  • Fudenberg, D., & Levine, D. (1999). Conditional universal consistency. Games and Economic Behavior, 29, 104–130.

  • Fudenberg, D., & Levine, D. K. (1995). Consistency and cautious fictitious play. Journal of Economic Dynamics and Control, 19, 1065–1089.

  • Gilboa, I., & Zemel, E. (1989). Nash and correlated equilibria: some complexity considerations. Games and Economic Behavior, 1, 80–93.

  • Greenwald, A., & Hall, K. (2003). Correlated Q-learning. International Conference on Machine Learning (ICML) (pp. 242–249). Washington, DC, USA.

  • Greenwald, A., & Jafari, A. (2003). A general class of no-regret learning algorithms and game-theoretic equilibria. Conference on Learning Theory (COLT). Washington, DC.

  • Hart, S., & Mas-Colell, A. (2000). A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68, 1127–1150.

  • Hart, S., & Mas-Colell, A. (2003). Uncoupled dynamics do not lead to Nash equilibrium. American Economic Review, 93, 1830–1836.

  • Hu, J., & Wellman, M. P. (1998). Multiagent reinforcement learning: theoretical framework and an algorithm. International Conference on Machine Learning (ICML) (pp. 242–250).

  • Jafari, A., Greenwald, A., Gondek, D., & Ercal, G. (2001). On no-regret learning, fictitious play, and Nash equilibrium. International Conference on Machine Learning (ICML) (pp. 226–233). Williams College, MA, USA.

  • Kakade, S., & Foster, D. (2004). Deterministic calibration and Nash equilibrium. In Conference on Learning Theory (COLT). Banff, Alberta, Canada.

  • Kalai, E., & Lehrer, E. (1993). Rational learning leads to Nash equilibrium. Econometrica, 61, 1019–1045.

  • Lemke, C., & Howson, J. (1964). Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics, 12, 413–423.

  • Littlestone, N., & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108, 212–261.

  • Littman, M. (1994). Markov games as a framework for multi-agent reinforcement learning. In International Conference on Machine Learning (ICML) (pp. 157–163).

  • Littman, M. (2001). Friend or foe Q-learning in general-sum Markov games. In International Conference on Machine Learning (ICML) (pp. 322–328).

  • Littman, M., & Stone, P. (2003). A polynomial-time Nash equilibrium algorithm for repeated games. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC) (pp. 48–54). San Diego, CA.

  • Littman, M., & Szepesvári, C. (1996). A generalized reinforcement-learning model: convergence and applications. In International Conference on Machine Learning (ICML) (pp. 310–318).

  • Miyasawa, K. (1961). On the convergence of the learning process in a 2 × 2 nonzero sum two-person game. Research memo 33, Princeton University.

  • Nachbar, J. (1990). Evolutionary selection dynamics in games: Convergence and limit properties. International Journal of Game Theory, 19, 59–89.

  • Nachbar, J. (1997). Prediction, optimization, and learning in games. Econometrica, 65, 275–309.

  • Nachbar, J. (2001). Bayesian learning in repeated games of incomplete information. Social Choice and Welfare, 18, 303–326.

  • Nash, J. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36, 48–49.

  • Papadimitriou, C. (2001). Algorithms, games and the Internet. In Proceedings of the Annual Symposium on Theory of Computing (STOC) (pp. 749–753).

  • Pivazyan, K., & Shoham, Y. (2002). Polynomial-time reinforcement learning of near-optimal policies. In Proceedings of the National Conference on Artificial Intelligence (AAAI). Edmonton, Canada.

  • Porter, R., Nudelman, E., & Shoham, Y. (2004). Simple search methods for finding a Nash equilibrium. In Proceedings of the National Conference on Artificial Intelligence (AAAI) (pp. 664–669). San Jose, CA, USA.

  • Powers, R., & Shoham, Y. (2005a). Learning against opponents with bounded memory. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI). Edinburgh, UK.

  • Powers, R., & Shoham, Y. (2005b). New criteria and a new algorithm for learning in multi-agent systems. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada.

  • Robinson, J. (1951). An iterative method of solving a game. Annals of Mathematics, 54, 296–301.

  • Sandholm, T., & Crites, R. (1996). Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37, 147–166. Special issue on the Prisoner's Dilemma.

  • Sandholm, T., Gilpin, A., & Conitzer, V. (2005). Mixed-integer programming methods for finding Nash equilibria. In Proceedings of the National Conference on Artificial Intelligence (AAAI) (pp. 495–501). Pittsburgh, PA, USA.

  • Sen, S., & Weiss, G. (1998). Learning in multiagent systems. In G. Weiss (Ed.), Multiagent systems: a modern introduction to distributed artificial intelligence (Chapter 6, pp. 259–298). MIT Press.

  • Shapley, L. S. (1964). Some topics in two-person games. In M. Drescher, L. S. Shapley & A. W. Tucker (Eds.), Advances in game theory. Princeton University Press.

  • Simon, H. A. (1982). Models of bounded rationality, vol. 2. MIT Press.

  • Singh, S., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (pp. 541–548). Stanford, CA.

  • Stimpson, J., Goodrich, M., & Walters, L. (2001). Satisficing and learning cooperation in the prisoner's dilemma. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI) (pp. 535–540). Seattle, WA.

  • Tan, M. (1993). Multi-agent reinforcement learning: independent vs. cooperative agents. In International Conference on Machine Learning (ICML) (pp. 330–337).

  • Wang, X., & Sandholm, T. (2002). Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada.

  • Wang, X., & Sandholm, T. (2003). Learning near-Pareto-optimal conventions in polynomial time. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada.

  • Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In International Conference on Machine Learning (ICML) (pp. 928–936). Washington, DC, USA.


Author information

Authors and Affiliations

  1. Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15213

    Vincent Conitzer & Tuomas Sandholm


Corresponding author

Correspondence to Vincent Conitzer.

Additional information

Editors: Amy Greenwald and Michael Littman


About this article

Cite this article

Conitzer, V., Sandholm, T. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Mach Learn 67, 23–43 (2007). https://doi.org/10.1007/s10994-006-0143-1

  • Received: 08 September 2005

  • Revised: 16 March 2006

  • Accepted: 21 June 2006

  • Published: 18 September 2006

  • Issue Date: May 2007

  • DOI: https://doi.org/10.1007/s10994-006-0143-1


Keywords

  • Game theory
  • Learning in games
  • Nash equilibrium