Abstract
This paper proposes a novel approach to discovering options in the form of stochastic conditionally terminating sequences, and shows how such sequences can be integrated into the reinforcement learning framework to improve learning performance. The method utilizes stored histories of possible optimal policies and constructs a specialized tree structure during learning. This tree facilitates the identification of frequently used action sequences, together with the states visited during their execution. The tree is continually updated and used to implicitly run the corresponding options. The effectiveness of the method is demonstrated empirically through extensive experiments on various domains with different properties.
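To make the abstract's idea concrete, the sketch below shows one plausible way to mine frequent action sequences from stored histories using a trie keyed by actions. This is an illustrative simplification under assumed details, not the paper's actual algorithm: the class names (`SequenceTree`, `add_history`, `best_continuation`), the bounded subsequence length, and the count-based selection rule are all hypothetical choices for demonstration.

```python
from collections import defaultdict

class SequenceTreeNode:
    """Node in a trie over action sequences; counts how often the
    action prefix ending at this node has been observed."""
    def __init__(self):
        self.count = 0
        self.children = defaultdict(SequenceTreeNode)

class SequenceTree:
    """Illustrative trie built from episode histories. Frequently
    traversed paths approximate useful conditionally terminating
    sequences (macro-actions) that an agent could reuse."""
    def __init__(self):
        self.root = SequenceTreeNode()

    def add_history(self, actions, max_len=4):
        # Insert every subsequence of bounded length so that frequent
        # action patterns accumulate counts regardless of where they
        # occur within an episode.
        for i in range(len(actions)):
            node = self.root
            for a in actions[i:i + max_len]:
                node = node.children[a]
                node.count += 1

    def best_continuation(self, prefix):
        # Follow the prefix down the trie, then return the most
        # frequently observed next action (None if the prefix is new).
        node = self.root
        for a in prefix:
            if a not in node.children:
                return None
            node = node.children[a]
        if not node.children:
            return None
        return max(node.children, key=lambda a: node.children[a].count)
```

For example, after adding the histories `["up", "up", "right", "right"]` and `["up", "up", "right", "down"]`, `best_continuation(["up", "up"])` returns `"right"`, since that continuation was observed in both histories.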
Editor: R. Khardon.
Girgin, S., Polat, F. & Alhajj, R. Improving reinforcement learning by using sequence trees. Mach Learn 81, 283–331 (2010). https://doi.org/10.1007/s10994-010-5182-y