Introducing decision entrustment mechanism into repeated bilateral agent interactions to achieve social optimality

Abstract

During multiagent interactions, robust strategies are needed to help agents coordinate their actions on efficient outcomes. A large body of previous work focuses on designing strategies aimed at reaching Nash equilibrium under self-play, which can be extremely inefficient in many situations, such as the prisoner’s dilemma game. To this end, we propose an alternative solution concept, socially optimal outcome sustained by Nash equilibrium (SOSNE), which refers to those outcomes that maximize the sum of all agents’ payoffs among all outcomes corresponding to a Nash equilibrium payoff profile of the infinitely repeated game. Adopting the solution concept of SOSNE guarantees that system-level performance is maximized provided that no agent sacrifices its individual profit. On the other hand, apart from performing well under self-play, a good strategy should also respond well against opponents that adopt different strategies. Accordingly, we consider a particular class of rational opponents and aim at influencing those opponents to coordinate on SOSNE outcomes. We propose a novel learning strategy, TaFSO, which combines the characteristics of both teacher and follower strategies to effectively steer the opponent’s behavior towards SOSNE outcomes by exploiting its limitations. Extensive simulations show that TaFSO achieves higher average payoffs than previous work, both under self-play and against the same class of rational opponents.
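To make the solution concept concrete, here is a minimal sketch of how an SOSNE outcome can be identified in a 2×2 stage game. The code is our illustration, not the paper’s algorithm: the function and variable names are ours, the payoff numbers are the standard prisoner’s dilemma values, and it relies on the (folk-theorem style) simplifying assumption that a pure outcome can be sustained by a Nash equilibrium of the infinitely repeated game whenever each player’s payoff is at least her minimax value.

```python
# A minimal sketch (our illustration, not the paper's algorithm): find the SOSNE
# outcome of a 2x2 stage game. Assumption: a pure outcome can be sustained by a
# Nash equilibrium of the infinitely repeated game (limit-of-means criterion)
# whenever each player's payoff is at least her minimax value.

ROW, COL = 0, 1
ACTIONS = (0, 1)  # 0 = cooperate, 1 = defect (prisoner's dilemma, assumed payoffs)

# payoffs[(row_action, col_action)] = (row player's payoff, column player's payoff)
payoffs = {
    (0, 0): (3, 3),  # (C, C)
    (0, 1): (0, 5),  # (C, D)
    (1, 0): (5, 0),  # (D, C)
    (1, 1): (1, 1),  # (D, D)
}

def minimax_value(player):
    """Pure-strategy minimax value: the opponent picks the action that
    minimizes the player's best-response payoff."""
    if player == ROW:
        return min(max(payoffs[(a, b)][ROW] for a in ACTIONS) for b in ACTIONS)
    return min(max(payoffs[(a, b)][COL] for b in ACTIONS) for a in ACTIONS)

def sosne_outcome():
    """Among the outcomes sustainable as Nash equilibria of the repeated game,
    return the one maximizing the sum of both players' payoffs."""
    v_row, v_col = minimax_value(ROW), minimax_value(COL)
    sustainable = [o for o, p in payoffs.items()
                   if p[ROW] >= v_row and p[COL] >= v_col]
    return max(sustainable, key=lambda o: sum(payoffs[o]))

print(sosne_outcome())  # (0, 0): mutual cooperation is the SOSNE outcome here
```

For the prisoner’s dilemma this picks mutual cooperation, the outcome that stage-game Nash equilibrium misses and that SOSNE, by construction, singles out.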


Notes

  1. One agent is randomly chosen to reveal its action if both agents choose to reveal their actions simultaneously.

  2. Note that, in general, an outcome is a profile of mixed strategies of all agents [19], of which a profile of pure strategies is a special case. In this paper, an outcome refers to a pure-strategy profile unless otherwise mentioned.

  3. WoLF-PHC is short for Win or Learn Fast—policy hill climbing.

  4. A preference relation \(\succsim_i\) for player \(i\) is defined under the limit of means criterion if it satisfies the following property: \(O_1 \succsim_i O_2\) if and only if \(\lim_{t \rightarrow \infty}\sum_{k=1}^{t}(p_1^k - p_2^k)/t \ge 0\), where \(O_1 = (a_{i,t}^1, a_{j,t}^1)_{t=1}^{\infty}\) and \(O_2 = (a_{i,t}^2, a_{j,t}^2)_{t=1}^{\infty}\) are outcomes of the infinitely repeated game, and \(p_1^k\) and \(p_2^k\) are the payoffs player \(i\) receives in round \(k\) of \(O_1\) and \(O_2\) respectively. A short worked illustration follows these notes.

  5. Similar results, omitted here, are observed when the opponent adopts other types of best-response strategies (Q-learning and FP).

  6. Note that, theoretically, the opponents should be able to understand the punishment signal given enough exploration. In practice, due to the exploration schedule, the opponents do not explore enough to understand the punishment and thus settle on a sub-optimal strategy.

  7. Note that only the payoffs obtained after 500 rounds are counted here, since at the beginning the agents may achieve very low payoffs due to initial exploration. The results are averaged over 50 runs. For the tricky game, the payoffs of both the TaFSO and SPaM learners are averaged over the cases in which they play as the row and column players.
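
As a small worked illustration of the limit of means criterion in note 4 (the payoff values below are the standard prisoner’s dilemma numbers and are assumed purely for illustration): let \(O_1\) be the outcome in which player \(i\) receives 3 in every round (perpetual mutual cooperation), and let \(O_2\) be the outcome in which she receives 5 in the first round (a successful defection) and 1 in every round thereafter (mutual defection). Then

\[
\lim_{t \rightarrow \infty}\frac{1}{t}\sum_{k=1}^{t}\bigl(p_1^k - p_2^k\bigr)
  = \lim_{t \rightarrow \infty}\frac{(3-5) + (t-1)(3-1)}{t} = 2 \ge 0,
\]

so \(O_1 \succsim_i O_2\): the one-round defection gain is washed out by the averaging, and sustained cooperation is weakly preferred. More generally, payoff differences confined to finitely many rounds do not affect the comparison under the limit of means.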

References

  1. Airiau, S., & Sen, S. (2006). Learning to commit in repeated games. In AAMAS’06 (pp 1263–1265).

  2. Airiau, S., & Sen, S. (2007). Evolutionary tournament-based comparison of learning and non-learning algorithms for iterated games. Journal of Artificial Societies and Social Simulation, 10, 11.

  3. Banerjee, D., & Sen, S. (2007). Reaching Pareto optimality in prisoner’s dilemma using conditional joint action learning. In AAMAS’07 (pp 91–108).

  4. Bowling, M. H., & Veloso, M. M. (2003). Multiagent learning using a variable learning rate. Artificial Intelligence, 136, 215–250.

  5. Brams, S. J. (1994). Theory of moves. Cambridge: Cambridge University Press.

  6. Chakraborty, D., & Stone, P. (2013). Multiagent learning in the presence of memory-bounded agents. Autonomous Agents and Multi-Agent Systems, 28, 182–213.

  7. Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI’98 (pp 746–752).

  8. Crandall, J. W., & Goodrich, M. A. (2005). Learning to teach and follow in repeated games. In AAAI Workshop on Multiagent Learning.

  9. Fudenberg, D., & Levine, D. K. (1998). The theory of learning in games. Cambridge, MA: MIT Press.

  10. Hu, J., & Wellman, M. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning (pp 242–250).

  11. Jafari, A., Greenwald, A., Gondek, D., & Ercal, G. (2001). On no-regret learning, fictitious play, and Nash equilibrium. In ICML’01 (pp 226–233).

  12. Jong, S., Tuyls, K., & Verbeeck, K. (2008). Artificial agents learning human fairness. In AAMAS’08, ACM Press (pp 863–870).

  13. Lauer, M., & Riedmiller, M. (2000). An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In ICML’00 (pp 535–542).

  14. Littman, M. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th International Conference on Machine Learning (pp 322–328).

  15. Littman, M. L., & Stone, P. (2001). Leading best-response strategies in repeated games. In IJCAI Workshop on Economic Agents, Models, and Mechanisms.

  16. Littman, M. L., & Stone, P. (2005). A polynomial time Nash equilibrium algorithm for repeated games. Decision Support Systems, 39, 55–66.

  17. Moriyama, K. (2008). Learning-rate adjusting Q-learning for prisoner’s dilemma games. In WI-IAT ’08 (pp 322–325).

  18. Oh, J., & Smith, S. F. (2008). A few good agents: multi-agent social learning. In AAMAS’08 (pp 339–346).

  19. Osborne, M. J., & Rubinstein, A. (1994). A course in game theory. Cambridge: MIT Press.

  20. Powers, R., & Shoham, Y. (2004). New criteria and a new algorithm for learning in multi-agent systems. In NIPS’04 (Vol. 17, pp 1089–1096).

  21. Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In IJCAI’05 (pp 817–822).

  22. Sen, S., Airiau, S., & Mukherjee, R. (2003). Towards a Pareto-optimal solution in general-sum games. In AAMAS’03 (pp 153–160).

  23. Shoham, Y., Powers, R., & Grenager, T. (2007). If multi-agent learning is the answer, what is the question? Artificial Intelligence, 171, 365–377.

  24. Stimpson, J. L., Goodrich, M. A., & Walters, L. C. (2001). Satisficing and learning cooperation in the prisoner’s dilemma. In IJCAI’01 (pp 535–540).

  25. Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.

Acknowledgments

The work described in this paper was partially supported by a CUHK Direct Grant for Research (Project ID 4055024).

Author information

Corresponding author

Correspondence to Jianye Hao.

Appendix

The 57 structurally distinct games mentioned in Sect. 5 are listed as follows.

(Figures b–e: payoff matrices of the 57 games.)

Cite this article

Hao, J., Leung, Hf. Introducing decision entrustment mechanism into repeated bilateral agent interactions to achieve social optimality. Auton Agent Multi-Agent Syst 29, 658–682 (2015). https://doi.org/10.1007/s10458-014-9265-1
