Learning Pareto-optimal Solutions in 2x2 Conflict Games

Airiau, Stéphane; Sen, Sandip

doi:10.1007/11691839_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3898)

Included in the following conference series:

International Workshop on Learning and Adaption in Multi-Agent Systems

Abstract

Multiagent learning literature has investigated iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such equilibrium configurations imply that no player has the motivation to unilaterally change its strategy. Often, in general sum games, a higher payoff can be obtained by both players if one chooses not to respond myopically to the other player. By developing mutual trust, agents can avoid immediate best responses that will lead to a Nash Equilibrium with lesser payoff. In this paper we experiment with agents who select actions based on expected utility calculations that incorporate the observed frequencies of the actions of the opponent(s). We augment these stochastically greedy agents with an interesting action revelation strategy that involves strategic declaration of one’s commitment to an action to avoid worst-case, pessimistic moves. We argue that in certain situations, such apparently risky action revelation can indeed produce better payoffs than a non-revealing approach. In particular, it is possible to obtain Pareto-optimal Nash Equilibrium outcomes. We improve on the outcome efficiency of a previous algorithm and present results over the set of structurally distinct two-person two-action conflict games where the players’ preferences form a total order over the possible outcomes. We also present results on a large number of randomly generated payoff matrices of varying sizes and compare the payoffs of strategically revealing learners to payoffs at Nash equilibrium.
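
The expected-utility action selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the payoff matrix (a Prisoner's Dilemma-style ordinal game), the function names, and the uniform prior used before any observation are all assumptions made for illustration, and the exploration and action-revelation components described in the abstract are omitted.

```python
from collections import Counter

# Hypothetical 2x2 payoff matrix for the row player, using ordinal payoffs
# (4 = best, 1 = worst). Rows: own action, columns: opponent action.
ROW_PAYOFF = [[3, 1],
              [4, 2]]


def empirical_frequencies(history, n_actions=2):
    """Return the observed relative frequency of each opponent action."""
    total = len(history)
    if total == 0:
        # Assumption: with no observations, use a uniform prior.
        return [1.0 / n_actions] * n_actions
    counts = Counter(history)
    return [counts[a] / total for a in range(n_actions)]


def expected_utilities(payoff, freqs):
    """Expected payoff of each own action against the empirical opponent mix."""
    return [sum(p * f for p, f in zip(row, freqs)) for row in payoff]


def greedy_action(payoff, history):
    """Pick the action with the highest expected utility (ties: lowest index)."""
    utils = expected_utilities(payoff, empirical_frequencies(history))
    return max(range(len(utils)), key=lambda a: utils[a])


# The opponent was observed playing action 0 three times and action 1 once.
observed = [0, 0, 1, 0]
print(expected_utilities(ROW_PAYOFF, empirical_frequencies(observed)))
print(greedy_action(ROW_PAYOFF, observed))
```

In this hypothetical game, action 1 has the higher expected utility for any observed frequencies, so a purely myopic learner settles on it. If both players reason this way, each ends up with payoff 2 rather than the payoff 3 available when both play action 0, which is the kind of inefficient equilibrium the abstract's revelation strategy aims to avoid.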

Author information

Authors and Affiliations

Department of Mathematical & Computer Sciences, The University of Tulsa, USA
Stéphane Airiau & Sandip Sen

Editor information

Editors and Affiliations

MICC-IKAT, Universiteit Maastricht, The Netherlands
Karl Tuyls
Center for Mathematics and Computer Science (CWI), Kruislaan 413, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
Pieter Jan ’t Hoen
KaHo Sint-Lieven, Information Technology Group, Gebr. Desmetstraat 1, 9000, Gent, Belgium
Katja Verbeeck
Department of Mathematical and Computer Sciences, University of Tulsa, USA
Sandip Sen

About this paper

Cite this paper

Airiau, S., Sen, S. (2006). Learning Pareto-optimal Solutions in 2x2 Conflict Games. In: Tuyls, K., Hoen, P.J., Verbeeck, K., Sen, S. (eds) Learning and Adaption in Multi-Agent Systems. LAMAS 2005. Lecture Notes in Computer Science, vol 3898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11691839_4

DOI: https://doi.org/10.1007/11691839_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33053-0
Online ISBN: 978-3-540-33059-2
eBook Packages: Computer Science, Computer Science (R0)