Learning Pareto-optimal Solutions in 2x2 Conflict Games

  • Stéphane Airiau
  • Sandip Sen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3898)


The multiagent learning literature has investigated iterated two-player games to develop mechanisms that allow agents to converge on Nash Equilibrium strategy profiles. In such an equilibrium, no player has an incentive to unilaterally change its strategy. In general-sum games, however, both players can often obtain a higher payoff if one of them chooses not to respond myopically to the other. By developing mutual trust, agents can avoid immediate best responses that lead to a Nash Equilibrium with a lower payoff. In this paper we experiment with agents who select actions based on expected utility calculations that incorporate the observed frequencies of the actions of the opponent(s). We augment these stochastically greedy agents with an action revelation strategy, in which an agent strategically declares its commitment to an action in order to avoid worst-case, pessimistic moves. We argue that in certain situations such apparently risky action revelation can produce better payoffs than a non-revealing approach. In particular, it is possible to obtain Pareto-optimal Nash Equilibrium outcomes. We improve on the outcome efficiency of a previous algorithm and present results over the set of structurally distinct two-person two-action conflict games in which the players’ preferences form a total order over the possible outcomes. We also present results on a large number of randomly generated payoff matrices of varying sizes and compare the payoffs of strategically revealing learners to the payoffs at Nash equilibrium.
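The ideas in the abstract can be illustrated with a minimal sketch, which is not the authors' algorithm. It assumes a hypothetical Prisoner's Dilemma-style payoff matrix with ordinal payoffs 1 to 4, and the names `PAYOFFS`, `expected_utilities`, `greedy_action`, `pure_nash_equilibria` and `is_pareto_optimal` are illustrative only. It shows a purely greedy expected-utility choice from observed opponent action frequencies, and a check of which outcomes are pure Nash equilibria and which are Pareto-optimal. The action revelation strategy itself is not specified in the abstract and is not modelled here.

```python
from collections import Counter
from itertools import product

# Hypothetical 2x2 game (assumption, not taken from the paper).
# PAYOFFS[(row_action, col_action)] = (row_payoff, col_payoff)
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (1, 4),
    (1, 0): (4, 1),
    (1, 1): (2, 2),
}


def expected_utilities(payoffs, opponent_counts, n_actions=2):
    """Expected payoff of each row action given observed opponent action counts."""
    total = sum(opponent_counts.values())
    if total == 0:
        # No observations yet: assume a uniform distribution over opponent actions.
        probs = [1 / n_actions] * n_actions
    else:
        probs = [opponent_counts.get(b, 0) / total for b in range(n_actions)]
    return [
        sum(p * payoffs[(a, b)][0] for b, p in enumerate(probs))
        for a in range(n_actions)
    ]


def greedy_action(payoffs, opponent_counts):
    """Row action with the highest expected utility (ties go to the lowest index)."""
    utilities = expected_utilities(payoffs, opponent_counts)
    return max(range(len(utilities)), key=utilities.__getitem__)


def pure_nash_equilibria(payoffs, n_actions=2):
    """Outcomes from which neither player gains by deviating unilaterally."""
    equilibria = []
    for r, c in product(range(n_actions), repeat=2):
        row_ok = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in range(n_actions))
        col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in range(n_actions))
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


def is_pareto_optimal(payoffs, outcome):
    """True if no other outcome is at least as good for both players and better for one."""
    u = payoffs[outcome]
    for other, v in payoffs.items():
        if other == outcome:
            continue
        if v[0] >= u[0] and v[1] >= u[1] and (v[0] > u[0] or v[1] > u[1]):
            return False
    return True


# Usage: the opponent was observed playing action 0 three times and action 1 once.
observed = Counter({0: 3, 1: 1})
print(expected_utilities(PAYOFFS, observed))
print(greedy_action(PAYOFFS, observed))
print(pure_nash_equilibria(PAYOFFS))
print([o for o in PAYOFFS if is_pareto_optimal(PAYOFFS, o)])
```

In this assumed game the only pure Nash equilibrium, (1, 1), is Pareto-dominated by (0, 0), which is the kind of situation where a myopic best response leads both players to a lower payoff.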


Nash Equilibrium; Multiagent System; Repeated Game; Stage Game; Correlate Equilibrium
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Stéphane Airiau (1)
  • Sandip Sen (1)
  1. Department of Mathematical & Computer Sciences, The University of Tulsa, USA
