Autonomous Agents and Multi-Agent Systems, Volume 15, Issue 1, pp 91–108

Reaching pareto-optimality in prisoner’s dilemma using conditional joint action learning

  • Dipyaman Banerjee
  • Sandip Sen


We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume that the players can observe each other’s actions but not the payoffs received by the other player. The concept of Nash Equilibrium in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash Equilibrium strategy for the single-shot game in every iteration. Such a strategy, however, can sometimes lead to a Pareto-Dominated outcome in games like Prisoner’s Dilemma. We therefore prefer learning strategies that converge to a Pareto-Optimal outcome that also produces a Nash Equilibrium payoff for repeated two-player, n-action general-sum games. The Folk Theorem enables us to identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL), which learns the conditional probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically show that, under self-play and if the payoff structure of the Prisoner’s Dilemma game satisfies certain conditions, a CJAL learner using a random exploration strategy followed by a completely greedy exploitation technique will learn to converge to a Pareto-Optimal solution. We also show that such learning will generate Pareto-Optimal payoffs in a large majority of other two-player general-sum games. We compare the performance of CJAL with that of existing algorithms such as WOLF-PHC and JAL on all structurally distinct two-player conflict games with ordinal payoffs.
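The idea of learning the opponent’s conditional action probabilities and then acting greedily on them can be illustrated with a minimal Python sketch. It is not the authors’ implementation: the class name `CJALAgent`, the helper `self_play`, the uniform fallback estimate used before any data exists, the lowest-index tie-breaking rule, and the Prisoner’s Dilemma payoff values (3, 0, 5, 1) are all assumptions made for illustration. The sketch estimates Pr(opponent plays b | own action a) from joint-action counts, scores each own action by its expected payoff under that estimate, explores uniformly at random for a fixed number of rounds, and is greedy afterwards.

```python
import random


class CJALAgent:
    """Illustrative conditional joint action learner for a two-player matrix game."""

    def __init__(self, payoff, explore_rounds, seed=None):
        # payoff[a][b]: this agent's OWN payoff when it plays a and the opponent plays b.
        # The opponent's payoffs are never used, matching the observability assumption.
        self.payoff = payoff
        self.n_actions = len(payoff)
        self.explore_rounds = explore_rounds
        self.rng = random.Random(seed)
        self.rounds_played = 0
        # joint_counts[a][b]: number of rounds in which (own a, opponent b) was observed.
        self.joint_counts = [[0] * len(row) for row in payoff]

    def conditional_prob(self, own, opp):
        """Estimate Pr(opponent plays opp | this agent plays own) from counts."""
        total = sum(self.joint_counts[own])
        if total == 0:
            # Assumption: uniform estimate for an action that was never tried.
            return 1.0 / len(self.joint_counts[own])
        return self.joint_counts[own][opp] / total

    def expected_utility(self, own):
        """Expected own payoff of an action under the conditional estimates."""
        return sum(
            self.conditional_prob(own, opp) * self.payoff[own][opp]
            for opp in range(len(self.payoff[own]))
        )

    def choose_action(self):
        if self.rounds_played < self.explore_rounds:
            # Exploration phase: uniformly random action.
            return self.rng.randrange(self.n_actions)
        # Exploitation phase: fully greedy; ties go to the lowest action index.
        return max(range(self.n_actions), key=self.expected_utility)

    def observe(self, own, opp):
        """Record the joint action played in the round that just ended."""
        self.joint_counts[own][opp] += 1
        self.rounds_played += 1


def self_play(payoff, explore_rounds, total_rounds, seed=0):
    """Let two learners play a symmetric game against each other."""
    first = CJALAgent(payoff, explore_rounds, seed)
    second = CJALAgent(payoff, explore_rounds, seed + 1)
    history = []
    for _ in range(total_rounds):
        x, y = first.choose_action(), second.choose_action()
        first.observe(x, y)
        second.observe(y, x)
        history.append((x, y))
    return history


# Hypothetical Prisoner's Dilemma payoffs; action 0 = cooperate, 1 = defect.
PD = [[3, 0],
      [5, 1]]

history = self_play(PD, explore_rounds=200, total_rounds=2000)
```

Whether the final rounds of `history` settle on mutual cooperation depends on the payoff values and on the length of the exploration phase, in line with the abstract’s caveat that the payoff structure must satisfy certain conditions; the sketch makes no claim about which values do.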


Multiagent learning · Game theory · Prisoner’s dilemma





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Computer Science, University of Tulsa, Tulsa, USA
