Abstract
During multi-agent interactions, agents need robust strategies to coordinate their actions on efficient outcomes. A large body of previous work focuses on designing strategies that converge to a Nash equilibrium under self-play, which can be extremely inefficient in many situations. Apart from performing well under self-play, however, a good strategy should also respond well against opponents adopting different strategies. In this paper, we consider the particular class of opponents whose strategies are based on a best-response policy, and we target the goal of social optimality. We propose a novel learning strategy, TaFSO, which exploits the characteristics of best-response learners to effectively steer the opponent's behavior towards socially optimal outcomes. Extensive simulations show that TaFSO achieves better performance than previous work both under self-play and against the class of best-response learners.
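The paper itself contains no code; as a minimal illustrative sketch only (not the TaFSO algorithm), the following shows the setting the abstract describes: a best-response opponent, here modeled as a Q-learner in the repeated Prisoner's Dilemma, whose behavior a committed reciprocal "teaching" strategy can steer toward the socially optimal outcome of mutual cooperation. The class names, payoff values, and parameters are assumptions chosen for illustration.

```python
import random

# Prisoner's Dilemma payoffs from the learner's point of view:
# LEARNER_PAYOFF[(teacher_action, learner_action)], with 0 = cooperate, 1 = defect.
# Mutual cooperation (3 each) maximizes the sum of payoffs, i.e. it is the
# socially optimal outcome; mutual defection yields only 1 each.
LEARNER_PAYOFF = {(0, 0): 3, (0, 1): 5, (1, 0): 0, (1, 1): 1}


class BestResponseLearner:
    """A best-response opponent: epsilon-greedy Q-learning whose state is its
    own previous action. Because the teacher reciprocates the learner's last
    move, that state also determines the teacher's current action, so the
    learner faces a small, fully deterministic MDP."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice((0, 1))          # occasional exploration
        return max((0, 1), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in (0, 1))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


random.seed(0)
learner = BestResponseLearner()
state = 0  # start from mutual cooperation
for _ in range(20000):
    action = learner.act(state)
    teacher_action = state   # the teacher reciprocates the learner's last action
    reward = LEARNER_PAYOFF[(teacher_action, action)]
    next_state = action      # the learner's move becomes the next state
    learner.update(state, action, reward, next_state)
    state = next_state

# With a sufficiently patient learner (gamma close to 1), the immediate payoff
# of 5 for defecting is outweighed by the teacher's retaliation, so cooperation
# should end up with the higher Q-value in both states.
for s in (0, 1):
    print(s, round(learner.q[(s, 0)], 2), round(learner.q[(s, 1)], 2))
```

Under these assumptions, the teacher's commitment to reciprocate makes cooperation the learner's best response, so the opponent's own best-response dynamics carry the pair to the socially optimal outcome; this illustrates the general mechanism the abstract attributes to TaFSO, not its concrete strategy.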
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hao, J., Leung, H.F. (2012). Learning to Achieve Socially Optimal Solutions in General-Sum Games. In: Anthony, P., Ishizuka, M., Lukose, D. (eds.) PRICAI 2012: Trends in Artificial Intelligence. Lecture Notes in Computer Science, vol. 7458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32695-0_10
DOI: https://doi.org/10.1007/978-3-642-32695-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32694-3
Online ISBN: 978-3-642-32695-0