Imitating Inscrutable Enemies: Learning from Stochastic Policy Observation, Retrieval and Reuse

  • Kellen Gillespie
  • Justin Karneeb
  • Stephen Lee-Urban
  • Héctor Muñoz-Avila
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6176)


In this paper we study CBR systems that learn from observations which can be represented as stochastic policies. We describe a general framework encompassing three steps: (1) the agent observes other agents performing actions, elicits stochastic policies representing those agents' strategies, and retains these policies as cases; (2) the agent analyzes the environment and retrieves a suitable stochastic policy; (3) the agent executes the retrieved stochastic policy, thereby mimicking the previously observed agent. We implement our framework in a system called JuKeCB that observes and mimics players playing games. We present the results of three sets of experiments designed to evaluate our framework. The first experiment demonstrates that JuKeCB performs well when trained against a variety of fixed-strategy opponents. The second experiment demonstrates that JuKeCB can also, after training, win against an opponent with a dynamic strategy. The final experiment demonstrates that JuKeCB can win against "new" opponents (i.e. opponents against which JuKeCB is untrained).
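The three steps above can be sketched in code. In this illustrative sketch (the class and method names are hypothetical, not taken from JuKeCB), a stochastic policy is captured as observed state–action frequencies, retained in a case base keyed by environment features, retrieved by nearest-neighbor comparison, and reused by sampling actions from the stored distribution:

```python
import random

class StochasticPolicy:
    """Maps an observed state to a probability distribution over actions,
    estimated from observation counts (step 1: elicitation)."""
    def __init__(self):
        self.counts = {}  # state -> {action: observation count}

    def observe(self, state, action):
        # Record one observed state-action pair.
        self.counts.setdefault(state, {}).setdefault(action, 0)
        self.counts[state][action] += 1

    def sample(self, state, rng=random):
        # Step 3 (reuse): draw an action with probability proportional
        # to how often it was observed in this state.
        actions = self.counts[state]
        r = rng.random() * sum(actions.values())
        for action, count in actions.items():
            r -= count
            if r <= 0:
                return action
        return action  # numerical fallback

class CaseBase:
    """Retains (environment-features, policy) cases and retrieves the
    policy whose stored features are nearest to the current environment
    (steps 1 retention and 2 retrieval)."""
    def __init__(self):
        self.cases = []  # list of (feature tuple, StochasticPolicy)

    def retain(self, features, policy):
        self.cases.append((tuple(features), policy))

    def retrieve(self, features):
        # Nearest neighbor by squared Euclidean distance over features.
        def dist(stored):
            return sum((a - b) ** 2 for a, b in zip(stored, features))
        return min(self.cases, key=lambda case: dist(case[0]))[1]
```

The similarity metric and feature encoding are design choices; any distance over environment descriptors would fit the same retrieve-then-reuse loop.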


Keywords: learning from observation, case capture and reuse, policy





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Kellen Gillespie (1)
  • Justin Karneeb (1)
  • Stephen Lee-Urban (1)
  • Héctor Muñoz-Avila (1)
  1. Department of Computer Science and Engineering, Lehigh University, Bethlehem, USA
