Imitating Inscrutable Enemies: Learning from Stochastic Policy Observation, Retrieval and Reuse
In this paper we study CBR systems that learn from observation, where the observations can be represented as stochastic policies. We describe a general framework encompassing three steps: (1) the system observes agents performing actions, elicits stochastic policies representing the agents' strategies, and retains these policies as cases; (2) the agent analyzes the environment and retrieves a suitable stochastic policy; (3) the agent executes the retrieved stochastic policy, thereby mimicking the previously observed agent. We implement our framework in a system called JuKeCB that observes and mimics players playing games. We present the results of three sets of experiments designed to evaluate our framework. The first experiment demonstrates that JuKeCB performs well when trained against a variety of fixed-strategy opponents. The second experiment demonstrates that JuKeCB can also, after training, win against an opponent with a dynamic strategy. The final experiment demonstrates that JuKeCB can win against "new" opponents (i.e., opponents against which JuKeCB is untrained).
Keywords: learning from observation, case capture and reuse, policy
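The three-step observe/retrieve/execute cycle described in the abstract can be sketched in a few lines. This is a minimal illustration, not the JuKeCB implementation: all names (`observe_and_retain`, `retrieve`, `execute`), the feature-vector similarity measure, and the dictionary encoding of a stochastic policy are hypothetical choices made for the example.

```python
import random

# A stochastic policy maps a state to a probability distribution
# over actions, e.g. {"s0": {"attack": 0.7, "defend": 0.3}}.

case_base = []  # list of (environment_features, policy) cases

def observe_and_retain(features, policy):
    """Step 1: retain an elicited stochastic policy as a case."""
    case_base.append((features, policy))

def similarity(f1, f2):
    """Toy similarity: negative Euclidean distance between feature vectors."""
    return -sum((a - b) ** 2 for a, b in zip(f1, f2)) ** 0.5

def retrieve(features):
    """Step 2: nearest-neighbour retrieval of the most similar stored policy."""
    return max(case_base, key=lambda case: similarity(features, case[0]))[1]

def execute(policy, state, rng=random):
    """Step 3: act by sampling from the retrieved stochastic policy."""
    actions, probs = zip(*policy[state].items())
    return rng.choices(actions, weights=probs, k=1)[0]
```

Executing the retrieved policy by sampling, rather than always taking the most probable action, is what makes the agent's behaviour mimic the observed agent's action distribution rather than a deterministic summary of it.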