
Learning Options in Reinforcement Learning

  • Conference paper
Abstraction, Reformulation, and Approximation (SARA 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2371)

Abstract

Temporally extended actions (e.g., macro actions) have proven very useful for speeding up learning, ensuring robustness and building prior knowledge into AI systems. The options framework (Precup, 2000; Sutton, Precup & Singh, 1999) provides a natural way of incorporating such actions into reinforcement learning systems, but leaves open the issue of how good options might be identified. In this paper, we empirically explore a simple approach to creating options. The underlying assumption is that the agent will be asked to perform different goal-achievement tasks in an environment that is otherwise the same over time. Our approach is based on the intuition that states that are frequently visited on system trajectories could prove to be useful subgoals (e.g., McGovern & Barto, 2001; Iba, 1989).
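
As background for the approach, recall that an option in the sense of Sutton, Precup & Singh (1999) is a triple: an initiation set, an internal policy, and a termination condition. The following is a minimal Python sketch of that triple; the class and attribute names are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable


@dataclass
class Option:
    """An option as a triple (I, pi, beta), following Sutton, Precup & Singh (1999).

    The names and representation here are illustrative assumptions.
    """
    initiation_set: Set[State]              # I: states in which the option may be invoked
    policy: Callable[[State], Action]       # pi: action selection while the option executes
    termination: Callable[[State], float]   # beta(s): probability of terminating in state s

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set
```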

We propose a greedy algorithm for identifying subgoals based on state visitation counts. We present empirical studies of this approach in two gridworld navigation tasks. One of the environments we explored contains bottleneck states, and the algorithm indeed finds these states, as expected. The second environment is an empty gridworld with no obstacles. Although the environment does not contain any obvious subgoals, our approach still finds useful options, which essentially allow the agent to explore the environment more quickly.
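
The sketch below illustrates the kind of visitation-count heuristic described above: count how often each state appears on recorded trajectories, then greedily pick the most frequently visited states as candidate subgoals. The trajectory format, the exclusion set, and the number of subgoals are assumptions made for this example; it is not the authors' exact procedure.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Set

State = Hashable


def greedy_subgoals(trajectories: Iterable[Sequence[State]],
                    exclude: Set[State],
                    n_subgoals: int = 3) -> List[State]:
    """Greedily select the most frequently visited states as candidate subgoals.

    `trajectories` holds state sequences gathered while the agent solves its
    tasks; `exclude` would typically contain start and goal states so that
    trivially frequent states are not proposed. Illustrative sketch only.
    """
    counts: Counter = Counter()
    for trajectory in trajectories:
        counts.update(trajectory)          # accumulate per-state visit counts
    for state in exclude:
        counts.pop(state, None)            # never propose excluded states
    # Greedy step: take the n most-visited remaining states.
    return [state for state, _ in counts.most_common(n_subgoals)]
```

An option whose internal policy drives the agent toward each selected subgoal could then be learned with a standard method such as Q-learning on a subgoal-reaching pseudo-reward; this learning step is likewise only sketched here, not necessarily the paper's exact construction.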

References

  1. Bradtke, S. J., & Duff, M. O. (1995). Reinforcement learning methods for continuous-time Markov Decision Problems. Advances in Neural Information Processing Systems 7 (pp. 393–400). MIT Press.

  2. Dietterich, T. G. (1998). The MAXQ method for hierarchical reinforcement learning. Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann.

  3. Fikes, R., Hart, P. E., & Nilsson, N. J. (1972). Learning and executing generalized robot plans. Artificial Intelligence, 3, 251–288.

  4. Iba, G. A. (1989). A heuristic approach to the discovery of macro-operators. Machine Learning, 3, 285–317.

  5. Korf, R. E. (1985). Learning to solve problems by searching for macro-operators. Pitman Publishing Ltd.

  6. Laird, J. E., Rosenbloom, P. S., & Newell, A. (1986). Chunking in SOAR: The anatomy of a general learning mechanism. Machine Learning, 1, 11–46.

  7. Mahadevan, S., Marchalleck, N., Das, T. K., & Gosavi, A. (1997). Self-improving factory simulation using continuous-time average-reward reinforcement learning. Proceedings of the Fourteenth International Conference on Machine Learning (pp. 202–210). Morgan Kaufmann.

  8. McGovern, A., Sutton, R. S., & Fagg, A. H. (1997). Roles of macro-actions in accelerating reinforcement learning. Grace Hopper Celebration of Women in Computing (pp. 13–17).

  9. McGovern, E. A. (2002). Autonomous discovery of temporal abstractions from interaction with an environment. Doctoral dissertation, University of Massachusetts, Amherst.

  10. McGovern, E. A., & Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. Proceedings of the Eighteenth International Conference on Machine Learning (pp. 361–368). Morgan Kaufmann.

  11. Minton, S. (1988). Learning search control knowledge: An explanation-based approach. Kluwer Academic Publishers.

  12. Newell, A., & Simon, H. A. (1972). Human problem solving. Prentice-Hall.

  13. Parr, R. (1998). Hierarchical control and learning for Markov Decision Processes. Doctoral dissertation, Computer Science Division, University of California, Berkeley, USA.

  14. Parr, R., & Russell, S. (1998). Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems 10. MIT Press.

  15. Precup, D. (2000). Temporal abstraction in reinforcement learning. Doctoral dissertation, Department of Computer Science, University of Massachusetts, Amherst, USA.

  16. Puterman, M. L. (1994). Markov Decision Processes: Discrete stochastic dynamic programming. Wiley.

  17. Sacerdoti, E. D. (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5, 115–135.

  18. Singh, S. P. (1992). Reinforcement learning with a hierarchy of abstract models. Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 202–207). MIT/AAAI Press.

  19. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.

  20. Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181–211.

  21. Watkins, C. J. C. H. (1989). Learning from delayed rewards. Doctoral dissertation, Psychology Department, Cambridge University, Cambridge, UK.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stolle, M., Precup, D. (2002). Learning Options in Reinforcement Learning. In: Koenig, S., Holte, R.C. (eds) Abstraction, Reformulation, and Approximation. SARA 2002. Lecture Notes in Computer Science (LNAI), vol 2371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45622-8_16

  • DOI: https://doi.org/10.1007/3-540-45622-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43941-7

  • Online ISBN: 978-3-540-45622-3

