Subgoal Identification for Reinforcement Learning and Planning in Multiagent Problem Solving

  • Chung-Cheng Chiu
  • Von-Wun Soo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4687)


We present a new probability-flow analysis algorithm that automatically identifies subgoals in a problem space. The analysis, inspired by preflow-push algorithms, measures the topological structure of the problem space to identify, in linear time, the states that connect different subsets of the state space; these states are taken as the subgoals. We then apply a hybrid approach, subgoal-based SMDP (semi-Markov decision process), which combines reinforcement learning and planning over the identified subgoals to solve problems in a multiagent environment. The effectiveness of the method is demonstrated and evaluated in a multiagent capture-the-flag scenario. We also show that cooperative coordination emerges between the two agents in this scenario through distributed policy learning.
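The paper's linear-time probability-flow algorithm is not reproduced in this abstract, but the underlying idea, using flow analysis to find states that connect otherwise separate regions of the state space, can be illustrated with a standard max-flow/min-cut computation. The sketch below is an assumption-laden toy, not the authors' method: it builds a two-room gridworld joined by a single doorway cell, runs Edmonds-Karp between a state in each room, and reports the states just across the minimum cut as subgoal candidates.

```python
from collections import deque
from itertools import product

# Toy state space: two 3x3 "rooms" joined by a single doorway cell.
# The doorway is the natural subgoal, since it is the only state
# connecting the two otherwise-disjoint regions of the state space.
left = {(x, y) for x, y in product(range(3), range(3))}
right = {(x, y) for x, y in product(range(4, 7), range(3))}
doorway = (3, 1)
states = left | right | {doorway}

# Unit-capacity directed edges in both directions between adjacent cells.
cap = {}
for (x, y) in states:
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        v = (x + dx, y + dy)
        if v in states:
            cap[((x, y), v)] = 1

def bfs_augment(src, dst, flow):
    """Find one augmenting path in the residual graph (Edmonds-Karp step)."""
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        for (a, b), c in cap.items():
            if a == u and b not in parent and c - flow.get((a, b), 0) > 0:
                parent[b] = u
                q.append(b)
    if dst not in parent:
        return None
    path, u = [], dst
    while parent[u] is not None:
        path.append((parent[u], u))
        u = parent[u]
    return path

def min_cut_subgoals(src, dst):
    """Push flow from src to dst; states just past the min cut are subgoals."""
    flow = {}
    while (path := bfs_augment(src, dst, flow)) is not None:
        for (a, b) in path:
            flow[(a, b)] = flow.get((a, b), 0) + 1
            flow[(b, a)] = flow.get((b, a), 0) - 1
    # States reachable from src in the residual graph form the source side
    # of a minimum cut; saturated edges leaving that side cross the bottleneck.
    reachable = {src}
    q = deque([src])
    while q:
        u = q.popleft()
        for (a, b), c in cap.items():
            if a == u and b not in reachable and c - flow.get((a, b), 0) > 0:
                reachable.add(b)
                q.append(b)
    return {b for (a, b) in cap
            if a in reachable and b not in reachable
            and flow.get((a, b), 0) >= cap[(a, b)]}

print(min_cut_subgoals((0, 0), (6, 2)))  # -> {(3, 1)}, the doorway
```

In the options framework cited by the paper, such a bottleneck state would then anchor a temporally extended action ("reach the doorway"), and the SMDP-level learner plans and learns over these subgoals rather than over primitive actions.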




Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Chung-Cheng Chiu (1)
  • Von-Wun Soo (1, 2)

  1. Department of Computer Science, National Tsing Hua University, 101, Section 2, Kuang Fu Road, Hsinchu, Taiwan, R.O.C.
  2. Department of Computer Science and Information Engineering, National Kaohsiung University, 700, Kaohsiung University Rd., Nan Tzu Dist., 811 Kaohsiung, Taiwan, R.O.C.
