Using a Priori Information for Fast Learning Against Non-stationary Opponents

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8864)


For an agent to interact successfully with many different and unknown types of opponents, it must quickly learn a model of the opponent and adapt online to non-stationary (changing) strategies. Recent work has tackled this problem by continuously learning models of the opponent while checking for switches in the opponent's strategy. However, these approaches fail to use a priori information, which can speed up detection of the opponent model. Moreover, if an opponent uses only a finite set of strategies, maintaining a list of those strategies benefits future interactions against opponents who return to previous strategies (such as periodic opponents). Our contribution is twofold. First, we propose an algorithm that uses a priori information, in the form of a set of models, to detect the opponent model faster. Second, we propose an algorithm that, while learning new models, keeps a record of them in case the opponent reuses one. Our approach outperforms the state-of-the-art algorithms in the field (in terms of model quality and cumulative rewards) in the domain of the iterated prisoner's dilemma against a non-stationary opponent that switches among different strategies.
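To make the idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of identifying an opponent from a set of a priori models in the iterated prisoner's dilemma by comparing log-likelihoods of the observed play. The model set, the 0.9/0.1 noise parameter, and all function names are illustrative assumptions, not taken from the paper.

```python
import math

C, D = "C", "D"

# Each a priori opponent model maps our previous action to the probability
# that the opponent cooperates next. The 0.9/0.1 smoothing (an assumption
# for this sketch) keeps log-likelihoods finite under occasional mismatches.
def tit_for_tat(my_prev):       # copies our previous move
    return 0.9 if my_prev == C else 0.1

def always_defect(my_prev):
    return 0.1

def always_cooperate(my_prev):
    return 0.9

MODELS = {"TFT": tit_for_tat, "AllD": always_defect, "AllC": always_cooperate}

def log_likelihood(model, history):
    """history: list of (our_action, opponent_action) pairs, oldest first."""
    ll = 0.0
    # The opponent's action at step t+1 is predicted from our action at step t.
    for (my_prev, _), (_, opp_next) in zip(history, history[1:]):
        p_coop = model(my_prev)
        ll += math.log(p_coop if opp_next == C else 1.0 - p_coop)
    return ll

def best_model(history):
    """Return the name of the a priori model that best explains the history."""
    return max(MODELS, key=lambda name: log_likelihood(MODELS[name], history))

# Usage: the opponent's move always mirrors our previous move, so the
# tit-for-tat model attains the highest likelihood.
history = [(C, C), (D, C), (C, D), (D, C), (C, D)]
print(best_model(history))  # → TFT
```

A switch-detection variant of this sketch would recompute the likelihoods over a sliding window and flag a strategy change when the currently selected model's likelihood drops below that of a competitor.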


Keywords: Optimal Policy · Multiagent System · Markov Decision Process · Stochastic Game · Repeated Game




Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
