Abstract
Many real-world multi-agent interactions involve multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. It is therefore essential for an agent to learn about the behaviour of the other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with nonlinear utilities. Specifically, we consider two-player multi-objective normal-form games (MONFGs) with nonlinear utility functions under the scalarised expected returns (SER) optimisation criterion. We contribute novel actor-critic and policy-gradient formulations that enable reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e. learning while anticipating the impact of one's own policy on the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on agents that implement it. When no Nash equilibria exist, opponent learning awareness and modelling allow agents to still converge to meaningful solutions that approximate equilibria.
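The SER criterion above — optimising the utility of the *expected* vector payoff, rather than the expected utility — can be illustrated with a minimal sketch. The 2x2 game, the product-of-objectives utility, the fixed opponent strategy, and the finite-difference gradient below are all illustrative assumptions for exposition; the paper's actual benchmark games, utility functions, and analytic policy-gradient derivations differ.

```python
import numpy as np

# Hypothetical 2x2 multi-objective normal-form game: each cell holds a
# 2-dimensional payoff vector (values are illustrative, not the paper's
# benchmark games).
payoffs = np.array([[[4.0, 0.0], [3.0, 1.0]],
                    [[1.0, 3.0], [0.0, 4.0]]])  # payoffs[a1][a2] -> vector

def utility(v):
    # An example nonlinear utility: the product of the two objectives.
    return v[0] * v[1]

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

theta = np.array([1.0, -1.0])   # logits of agent 1's mixed strategy
opp = np.array([0.5, 0.5])      # (modelled) opponent mixed strategy
lr = 0.1

for _ in range(2000):
    pi = softmax(theta)
    # SER: utility of the expected payoff vector under the joint strategy.
    expected_vec = np.einsum('i,j,ijk->k', pi, opp, payoffs)
    # Finite-difference gradient of u(E[r]) w.r.t. the logits (sketch only;
    # the paper instead derives an analytic policy-gradient form).
    eps = 1e-6
    grad = np.zeros(2)
    for k in range(2):
        t = theta.copy()
        t[k] += eps
        ev = np.einsum('i,j,ijk->k', softmax(t), opp, payoffs)
        grad[k] = (utility(ev) - utility(expected_vec)) / eps
    theta += lr * grad  # gradient ascent on the scalarised expected return
```

Note that in this game a *mixed* strategy maximises the SER objective, which is why learning over mixed strategies (rather than deterministic best responses) is needed under nonlinear utilities.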
Acknowledgements
The authors would like to acknowledge FWO (Fonds Wetenschappelijk Onderzoek) for their support through the SB grants of Timothy Verstraeten (#1S47617N). This research was supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Rădulescu, R., Verstraeten, T., Zhang, Y. et al. Opponent learning awareness and modelling in multi-objective normal form games. Neural Comput & Applic 34, 1759–1781 (2022). https://doi.org/10.1007/s00521-021-06184-3