Using a Priori Information for Fast Learning Against Non-stationary Opponents
To succeed against many different and unknown types of opponents, an agent must quickly learn a model of the opponent and adapt online to non-stationary (changing) strategies. Recent work has tackled this problem by continuously learning models of the opponent while checking for switches in the opponent's strategy. However, these approaches do not use a priori information, which can speed up detection of the opponent model. Moreover, if an opponent uses only a finite set of strategies, maintaining a list of those strategies also benefits future interactions with opponents who return to previous strategies (such as periodic opponents). Our contribution is twofold. First, we propose an algorithm that uses a priori information, in the form of a set of models, to detect the opponent model faster. Second, we propose an algorithm that, while learning new models, keeps a record of them in case the opponent reuses one. Our approach outperforms state-of-the-art algorithms in the field (in terms of model quality and cumulative rewards) in the iterated prisoner’s dilemma against a non-stationary opponent that switches among different strategies.
Keywords: Optimal Policy, Multiagent System, Markov Decision Process, Stochastic Game, Repeated Game
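The two ideas in the abstract, matching observed play against a set of a priori models and keeping newly learned models for later reuse, can be illustrated with a minimal Python sketch for the iterated prisoner's dilemma. This is not the authors' algorithm: the agreement score, the 0.9 threshold, the table-based learner, and all function names (`agreement`, `learn_table_model`, `identify`) are assumptions made for illustration only.

```python
from collections import defaultdict

C, D = "C", "D"

# A priori models: each predicts the opponent's next move from the previous
# round, given (our previous move, opponent's previous move).
def tit_for_tat(my_prev, opp_prev):
    return my_prev  # opponent copies our previous move

def always_defect(my_prev, opp_prev):
    return D

def pavlov(my_prev, opp_prev):
    # Win-stay, lose-shift: cooperate if both played the same move last round.
    return C if my_prev == opp_prev else D

def agreement(model, history):
    """Fraction of observed transitions that `model` predicts correctly.

    `history` is a list of (our move, opponent move) pairs, one per round.
    """
    hits = 0
    for (my_prev, opp_prev), (_, opp_now) in zip(history, history[1:]):
        hits += model(my_prev, opp_prev) == opp_now
    return hits / max(len(history) - 1, 1)

def learn_table_model(history):
    """Hypothetical learner: most frequent opponent reply per previous round."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, (_, opp_now) in zip(history, history[1:]):
        counts[prev][opp_now] += 1
    table = {prev: max(c, key=c.get) for prev, c in counts.items()}
    # Unseen situations default to cooperation (an arbitrary assumption).
    return lambda my_prev, opp_prev: table.get((my_prev, opp_prev), C)

def identify(library, history, threshold=0.9):
    """Return the name of the best-fitting model in `library`.

    If no stored model reaches `threshold`, learn a new model from `history`,
    store it in `library` for future reuse, and return its name.
    """
    scores = {name: agreement(m, history) for name, m in library.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best
    name = f"learned_{len(library)}"
    library[name] = learn_table_model(history)
    return name

library = {"tit_for_tat": tit_for_tat,
           "always_defect": always_defect,
           "pavlov": pavlov}

# Opponent plays tit-for-tat against our moves C, D, D, C, D.
tft_history = [(C, C), (D, C), (D, D), (C, D), (D, C)]
# Opponent always cooperates, a strategy that is not in the library.
coop_history = [(C, C), (D, C), (D, C), (C, C), (D, C)]

print(identify(library, tft_history))
print(identify(library, coop_history))  # learns and stores a new model
print(identify(library, coop_history))  # reuses the stored model
```

With a stored set of models, a handful of rounds is enough to pick out a known strategy, while an unfamiliar strategy triggers learning once and is then recognised if the opponent returns to it. A real system would also need a switch-detection step and a policy computed against the identified model, both of which are omitted here.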