
Efficiently detecting switches against non-stationary opponents

Autonomous Agents and Multi-Agent Systems

Abstract

Interactions in multiagent systems are generally more complex than in single-agent settings. Game theory prescribes how to act in multiagent scenarios, but it assumes that all agents act rationally, and many works further assume that the opponent uses a stationary strategy. These assumptions rarely hold in real-world scenarios, where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in such cases, learning the appropriate response without any prior policy on how to act. We therefore focus on the problem in which another agent in the environment switches among different stationary strategies over time. This turns the task into learning in a non-stationary environment, which is problematic for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that model to obtain an optimal policy, and (3) determines when it must re-learn because the opponent has changed strategy. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability, and empirical results showing that our approach outperforms state-of-the-art algorithms in normal-form games such as the prisoner's dilemma and in a more realistic scenario, the Power TAC simulator.
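To make the three-step loop concrete, the sketch below shows, in the iterated prisoner's dilemma, an agent that learns an empirical opponent model, plays a best response to it, and discards the model when its recent prediction accuracy drops. This is only an illustrative sketch of how such a loop can be wired together; the function names, window size, accuracy threshold, and accuracy-based switch test are assumptions, not DriftER's actual model, policy computation, or switch-detection statistic, which are defined in the paper.

```python
import random
from collections import Counter, deque

# Prisoner's dilemma payoffs for the learning agent: (my_action, their_action) -> reward.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ["C", "D"]

def best_response(opponent_model):
    """Best response against an empirical distribution over the opponent's actions."""
    total = sum(opponent_model.values())
    expected = {
        a: sum(PAYOFF[(a, o)] * c / total for o, c in opponent_model.items())
        for a in ACTIONS
    }
    return max(expected, key=expected.get)

def run(opponent, rounds=500, window=30, accuracy_threshold=0.6):
    """Learn an opponent model, act with a best response against it, and re-learn
    when the model's recent prediction accuracy drops (a proxy for a switch)."""
    model = Counter()                    # empirical counts of the opponent's actions
    recent_hits = deque(maxlen=window)   # 1 if the model predicted the last action correctly
    total_reward = 0
    for t in range(rounds):
        my_action = best_response(model) if model else random.choice(ACTIONS)
        predicted = model.most_common(1)[0][0] if model else None
        their_action = opponent(t)
        total_reward += PAYOFF[(my_action, their_action)]
        model[their_action] += 1
        if predicted is not None:
            recent_hits.append(1 if predicted == their_action else 0)
        # Crude switch detection: if recent predictions are poor, discard the model and re-learn.
        if len(recent_hits) == window and sum(recent_hits) / window < accuracy_threshold:
            model.clear()
            recent_hits.clear()
    return total_reward

# Example: an opponent that plays one stationary strategy, then switches at round 250.
switching_opponent = lambda t: "C" if t < 250 else "D"
print("cumulative reward:", run(switching_opponent))
```

Against the switching opponent in the example, the accuracy monitor forces a reset shortly after round 250, after which the agent re-learns the new strategy and adapts its best response.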



Notes

  1. These can be, for example, previous actions of the agents.

  2. Where \(s_{-i}\) denotes the set of all agents except i.

  3. One model uses a fixed-size window of past interactions, while the other uses the entire interaction history (see the sketch after these notes).

  4. In an ergodic set, every state can be reached from every other state.

  5. Other authors have observed a related behavior, referred to as observationally equivalent models [20].

  6. Power TAC takes these prices as negative since it is a buying action.
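As a toy illustration of the two opponent models contrasted in note 3, the following sketch keeps either a fixed-size window of the opponent's past actions or the full interaction history. The class names and the window size are assumptions for illustration, not the paper's notation.

```python
from collections import Counter, deque

class WindowedModel:
    """Empirical distribution of opponent actions over the last `size` interactions."""
    def __init__(self, size=20):
        self.window = deque(maxlen=size)

    def update(self, opponent_action):
        self.window.append(opponent_action)

    def probability(self, action):
        return self.window.count(action) / len(self.window) if self.window else 0.0

class FullHistoryModel:
    """Empirical distribution of opponent actions over all interactions seen so far."""
    def __init__(self):
        self.counts = Counter()

    def update(self, opponent_action):
        self.counts[opponent_action] += 1

    def probability(self, action):
        total = sum(self.counts.values())
        return self.counts[action] / total if total else 0.0

# The windowed model adapts quickly after an opponent switch but is noisier; the
# full-history model is statistically more stable but slower to reflect a switch.
w, f = WindowedModel(size=5), FullHistoryModel()
for a in ["C"] * 50 + ["D"] * 5:    # opponent switches from C to D near the end
    w.update(a)
    f.update(a)
print(w.probability("D"), f.probability("D"))   # windowed: 1.0, full history: about 0.09
```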

References

  1. Abdallah, S., & Lesser, V. (2008). A multiagent reinforcement learning algorithm with non-linear dynamics. Journal of Artificial Intelligence Research, 33(1), 521–549.

  2. Adams, R. P., & MacKay, D. (2007). Bayesian online changepoint detection. arXiv:0710.3742v1 [stat.ML]

  3. Albrecht, S. V., & Ramamoorthy, S. (2013). A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems. In Proceedings of 15th international conference on autonomous agents and multiagent systems (pp. 1155–1156).

  4. Almeida, A., Ramalho, G., Santana, H., Tedesco, P., Menezes, T., Corruble, V., Chevaleyre, Y. (2004). Recent advances on multi-agent patrolling. In Advances in artificial intelligence—SBIA 2004 (pp. 474–483). Berlin: Springer.

  5. Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211(27), 1390–1396.

  6. Banerjee, B., & Peng, J. (2005). Efficient learning of multi-step best response. In Proceedings of the 4th international conference on autonomous agents and multiagent systems (pp. 60–66). Utrecht: ACM.

  7. Barrett, S., & Stone, P. (2014). Cooperating with unknown teammates in complex domains: A robot soccer case study of Ad Hoc teamwork. In Twenty-ninth AAAI conference on artificial intelligence (pp. 2010–2016). Austin, Texas.

  8. Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679–684.

  9. Bowling, M., Burch, N., Johanson, M., & Tammelin, O. (2015). Heads-up limit hold’em poker is solved. Science, 347, 145–149.

  10. Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136(2), 215–250.

  11. Brafman, R. I., & Tennenholtz, M. (2003). R-MAX a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3, 213–231.

  12. Brown, G. W. (1951). Iterative solution of games by fictitious play. Activity Analysis of Production and Allocation, 13(1), 374–376.

  13. Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), 38(2), 156–172.

  14. Chakraborty, D., & Stone, P. (2013). Multiagent learning in the presence of memory-bounded agents. Autonomous Agents and Multi-agent Systems, 28(2), 182–213.

  15. Choi, S. P. M., Yeung, D. Y., Zhang, N. L. (1999). An environment model for nonstationary reinforcement learning. In Advances in neural information processing systems (pp. 987–993). Denver, Colorado.

  16. Conitzer, V., & Sandholm, T. (2006). AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learning, 67(1–2), 23–43.

  17. Crandall, J. W., & Goodrich, M. A. (2011). Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning. Machine Learning, 82(3), 281–314.

  18. Da Silva, B. C., Basso, E. W., Bazzan, A. L., Engel, P. M. (2006). Dealing with non-stationary environments using context detection. In Proceedings of the 23rd international conference on machine learning (pp. 217–224). Pittsburgh, Pennsylvania.

  19. de Cote, E. M., Chapman, A. C., Sykulski, A. M., Jennings, N. R. (2010). Automated planning in repeated adversarial games. In Uncertainty in artificial intelligence (pp. 376–383). Catalina Island, California.

  20. Doshi, P., & Gmytrasiewicz, P. J. (2006). On the difficulty of achieving equilibrium in interactive POMDPs. In Twenty-first national conference on artificial intelligence (pp. 1131–1136). Boston, MA.

  21. Elidrisi, M., Johnson, N., Gini, M., Crandall, J. W. (2014). Fast adaptive learning in repeated stochastic games by game abstraction. In Proceedings of the 13th international conference on autonomous agents and multiagent systems (pp. 1141–1148). Paris.

  22. Fulda, N., & Ventura, D. (2007). Predicting and preventing coordination problems in cooperative Q-learning systems. In Proceedings of the twentieth international joint conference on artificial intelligence (pp. 780–785). Hyderabad.

  23. Gama, J., Medas, P., Castillo, G., Rodrigues, P. (2004). Learning with drift detection. In Advances in artificial intelligence—SBIA (pp. 286–295).

  24. Gama, J., Zliobaite, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4), 44.

  25. Hernandez-Leal, P., Munoz de Cote, E., Sucar, L. E. (2014). Exploration strategies to detect strategy switches. In Proceedings of the adaptive learning agents workshop (ALA). Paris.

  26. Hernandez-Leal, P., Rosman, B., Taylor, M. E., Sucar, L. E., Munoz de Cote, E.(2016). A Bayesian approach for learning and tracking switching, non-stationary opponents (extended abstract). In Proceedings of 15th international conference on autonomous agents and multiagent systems (pp. 1315–1316). Singapore.

  27. Hernandez-Leal, P., Taylor, M. E., Rosman, B., Sucar, L. E., Munoz de Cote, E. (2016). Identifying and tracking switching, non-stationary opponents: A Bayesian approach. In Multiagent interaction without prior coordination workshop at AAAI. Phoenix, AZ.

  28. Hernandez-Leal, P., Munoz de Cote, E., & Sucar, L. E. (2014). A framework for learning and planning against switching strategies in repeated games. Connection Science, 26(2), 103–122.

  29. Hido, S., Idé, T., Kashima, H., Kubo, H., Matsuzawa, H. (2008). Unsupervised change analysis using supervised learning. In Advances in knowledge discovery and data mining (pp. 148–159). Berlin: Springer.

  30. Ketter, W., Collins, J., & Reddy, P. P. (2013). Power TAC: A competitive economic simulation of the smart grid. Energy Economics, 39, 262–270.

  31. Ketter, W., Collins, J., Reddy, P. P., & Weerdt, M. D. (2014). The 2014 power trading agent competition. Rotterdam: Department of Decision and Information Sciences, Erasmus University.

  32. Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th international conference on machine learning (pp. 157–163). New Brunswick, NJ.

  33. Littman, M. L., & Stone, P.(2001). Implicit negotiation in repeated games. In ATAL ’01: Revised papers from the 8th international workshop on intelligent agents VIII.

  34. Nash, J. F. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1), 48–49.

  35. Nudelman, E., Wortman, J., Shoham, Y., Leyton-Brown, K. (2004). Run the GAMUT: A comprehensive approach to evaluating game-theoretic algorithms. In Proceedings of the 3rd international conference on autonomous agents and multiagent systems (pp. 880–887). New York, NY.

  36. Papadimitriou, C. H., & Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441–450.

  37. Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In Proceedings of the 19th international joint conference on artificial intelligence (pp. 817–822). Edinburgh: Morgan Kaufmann Publishers Inc.

  38. Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.

  39. Rosman, B., Hawasly, M., Ramamoorthy, S. (2016). Bayesian policy reuse. Machine Learning, 104(1), 99–127.

  40. Taylor, M. E., & Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. The Journal of Machine Learning Research, 10, 1633–1685.

  41. Tesauro, G., & Bredin, J. L. (2002). Strategic sequential bidding in auctions using dynamic programming. In Proceedings of the 1st international conference on autonomous agents and multiagent systems (p. 591). Bologna: ACM.

  42. Urieli, D., & Stone, P. (2014). TacTex'13: A champion adaptive power trading agent. In Proceedings of the twenty-eighth conference on artificial intelligence (pp. 465–471). Quebec.

  43. Watkins, C., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.

  44. Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1), 69–101.

  45. Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209–212.

  46. Wurman, P. R., Walsh, W. E., & Wellman, M. (1998). Flexible double auctions for electronic commerce: Theory and implementation. Decision Support Systems, 24(1), 17–27.

  47. Yamada, M., Kimura, A., Naya, F., Sawada, H. (2013). Change-point detection with feature selection in high-dimensional time-series data. In Proceedings of the 23rd international joint conference on artificial intelligence (pp. 1827–1833). Bellevue, Washington.

Acknowledgements

This research was partially supported by project CB-2012-01-183684 and scholarship 335245/234507 granted by Consejo Nacional de Ciencia y Tecnologia (CONACyT), Mexico. Part of this research took place at the Intelligent Robot Learning (IRL) Lab, Washington State University. IRL research is supported in part by grants NSF IIS-1149917, NSF IIS-1319412, USDA 2014-67021-22174, and a Google Research Award.

Author information

Corresponding author

Correspondence to Pablo Hernandez-Leal.

Additional information

Most of this work was performed while the first author was a graduate student at INAOE.

About this article

Cite this article

Hernandez-Leal, P., Zhan, Y., Taylor, M.E. et al. Efficiently detecting switches against non-stationary opponents. Auton Agent Multi-Agent Syst 31, 767–789 (2017). https://doi.org/10.1007/s10458-016-9352-6
