
Autonomous Agents and Multi-Agent Systems, Volume 31, Issue 4, pp 767–789

Efficiently detecting switches against non-stationary opponents

  • Pablo Hernandez-Leal
  • Yusen Zhan
  • Matthew E. Taylor
  • L. Enrique Sucar
  • Enrique Munoz de Cote

Abstract

Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions for how to act in multiagent scenarios; however, it assumes that all agents act rationally. Moreover, some works also assume that the opponent uses a stationary strategy. These assumptions usually do not hold in real-world scenarios, where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in such cases, learning the appropriate response without any prior policies on how to act. We therefore focus on the problem in which another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, which poses a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that model to obtain an optimal policy, and (3) determines when it must re-learn because the opponent's strategy has changed. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability, and empirical results showing that our approach outperforms state-of-the-art algorithms, first in normal-form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.
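
The three-step loop described above (model the opponent, best-respond to the model, detect a switch and re-learn) can be made concrete with a minimal sketch. The code below is not the authors' implementation: the Markov opponent model, the point-prediction best response, the error-rate window, and the switch threshold are all hypothetical simplifications chosen only to illustrate the idea.

```python
import random
from collections import defaultdict


class DriftER:
    """Minimal illustrative sketch of a DriftER-style agent (not the authors' code).

    The agent (1) learns an empirical model of the opponent, (2) best-responds to
    that model, and (3) monitors the model's prediction error rate: a sustained
    rise in error is taken as evidence of a strategy switch and triggers
    re-learning. Window size and threshold are hypothetical.
    """

    def __init__(self, my_actions, opp_actions, payoff, window=30, threshold=0.35):
        self.my_actions = my_actions
        self.opp_actions = opp_actions
        self.payoff = payoff          # payoff[(my_action, opp_action)] -> my reward
        self.window = window          # rounds used to estimate the error rate
        self.threshold = threshold    # error rate that signals a suspected switch
        self.reset()

    def reset(self):
        """Step (3), on detection: forget the opponent model and start re-learning."""
        self.counts = defaultdict(lambda: defaultdict(int))  # prev opp action -> next -> count
        self.errors = []              # rolling record of prediction mistakes
        self.prev_opp_action = None

    def predict_opponent(self):
        """Step (1): most likely next opponent action under the learned model."""
        history = self.counts[self.prev_opp_action]
        if not history:
            return random.choice(self.opp_actions)
        return max(history, key=history.get)

    def act(self):
        """Step (2): best response against the predicted opponent action."""
        predicted = self.predict_opponent()
        return max(self.my_actions, key=lambda a: self.payoff[(a, predicted)])

    def observe(self, opp_action):
        """Update the model and check whether the error rate indicates a switch."""
        predicted = self.predict_opponent()
        self.errors.append(0 if predicted == opp_action else 1)
        self.errors = self.errors[-self.window:]

        self.counts[self.prev_opp_action][opp_action] += 1
        self.prev_opp_action = opp_action

        if len(self.errors) == self.window:
            error_rate = sum(self.errors) / self.window
            if error_rate > self.threshold:
                self.reset()          # switch suspected: discard the stale model


# Toy usage: repeated prisoner's dilemma against an opponent that switches
# from tit-for-tat to always-defect halfway through the interaction.
if __name__ == "__main__":
    C, D = "C", "D"
    pd_payoff = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}
    agent = DriftER([C, D], [C, D], pd_payoff)
    my_last = C
    for t in range(200):
        opp = my_last if t < 100 else D   # opponent strategy switch at t = 100
        my_last = agent.act()
        agent.observe(opp)
```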

Keywords

Learning · Non-stationary environments · Switching strategies · Repeated games

Notes

Acknowledgements

This research was partially supported by project CB-2012-01-183684 and scholarship 335245/234507, granted by the Consejo Nacional de Ciencia y Tecnología (CONACyT), Mexico. This research took place in part at the Intelligent Robot Learning (IRL) Lab, Washington State University. IRL research is supported in part by grants NSF IIS-1149917, NSF IIS-1319412, USDA 2014-67021-22174, and a Google Research Award.


Copyright information

© The Author(s) 2016

Authors and Affiliations

  • Pablo Hernandez-Leal (1)
  • Yusen Zhan (2)
  • Matthew E. Taylor (2)
  • L. Enrique Sucar (3)
  • Enrique Munoz de Cote (3, 4)

  1. Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands
  2. Washington State University, Pullman, USA
  3. Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Puebla, Mexico
  4. PROWLER.io Ltd., Cambridge, United Kingdom
