Subgoal Identification for Reinforcement Learning and Planning in Multiagent Problem Solving
We present a new probability flow analysis algorithm that automatically identifies subgoals in a problem space. Our flow analysis, inspired by preflow-push algorithms, measures the topological structure of the problem space and, in linear time, identifies as subgoals the states that connect different subsets of the state space. We then apply a hybrid approach, known as subgoal-based SMDP (semi-Markov Decision Process), that combines reinforcement learning and planning over the identified subgoals to solve the problem in a multiagent environment. The effectiveness of this method in a multiagent system is demonstrated and evaluated using a capture-the-flag scenario. We also show that cooperative coordination emerges between two agents in the scenario through distributed policy learning.
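To make the notion of a state that "connects different subsets of the state space" concrete, the sketch below marks such states in a small undirected state graph. It is not the probability flow analysis described above, whose details are not given here. It swaps in a standard graph technique, Tarjan's articulation-point search, which also runs in time linear in the number of states and transitions. The function name `find_cut_states` and the seven-state example graph (two three-state "rooms" joined by a single doorway state 3) are assumptions made for illustration.

```python
from collections import defaultdict


def find_cut_states(edges):
    """Return the states whose removal disconnects the state graph.

    Illustrative stand-in only: this is Tarjan's articulation-point
    search, not the paper's probability flow analysis.
    """
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    disc = {}   # discovery order of each state in the depth-first search
    low = {}    # earliest discovered state reachable from the subtree
    cut = set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        disc[node] = low[node] = counter
        counter += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: the subtree can reach an earlier state.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # The subtree under nxt cannot bypass node.
                if parent is not None and low[nxt] >= disc[node]:
                    cut.add(node)
        # A root is a cut state only if it has several DFS children.
        if parent is None and children > 1:
            cut.add(node)

    for start in list(graph):
        if start not in disc:
            dfs(start, None)
    return cut


# Hypothetical example: states 0-2 form one room, 4-6 another,
# and state 3 is the only doorway between them.
EDGES = [
    (0, 1), (0, 2), (1, 2),   # room A
    (1, 3), (2, 3),           # room A to doorway
    (3, 4), (3, 5),           # doorway to room B
    (4, 5), (4, 6), (5, 6),   # room B
]

subgoals = find_cut_states(EDGES)
print(subgoals)  # {3}
```

Only the doorway state is reported, since every other state can be bypassed through a neighbor in the same room. The recursive search is adequate for small examples; a large state space would need an iterative version to avoid Python's recursion limit.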