
Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7188)

Abstract

Temporally extended actions are usually effective in speeding up reinforcement learning. In this paper we present a mechanism for automatically constructing such actions, expressed as options [24], in a finite Markov Decision Process (MDP). To do this, we compute a bisimulation metric [7] between the states in a small MDP and the states in a large MDP, which we want to solve. The shape of this metric is then used to completely define a set of options for the large MDP. We demonstrate empirically that our approach is able to improve the speed of reinforcement learning, and is generally not sensitive to parameter tuning.
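The construction described in the abstract rests on the bisimulation metric of Ferns et al. [7], defined as the fixed point of d(s, t) = max_a [ c_R |R(s, a) − R(t, a)| + c_T · T_K(d)(P(s, a), P(t, a)) ], where T_K(d) is the Kantorovich (Wasserstein-1) distance under the current metric d. The sketch below is illustrative rather than the authors' implementation: the weights (1 − γ) and γ, the dictionary encoding of rewards and transitions, and the use of SciPy's LP solver for the Kantorovich subproblem are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, d):
    """Wasserstein-1 distance between discrete distributions p and q
    under ground metric d (an n x n cost matrix), via its LP formulation:
    minimize sum_{i,j} d[i,j] x[i,j] subject to the marginal constraints."""
    n = len(p)
    c = d.flatten()                     # variable x[i, j] at index i * n + j
    A_eq, b_eq = [], []
    for i in range(n):                  # row marginals: sum_j x[i, j] = p[i]
        row = np.zeros(n * n)
        row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row)
        b_eq.append(p[i])
    for j in range(n):                  # column marginals: sum_i x[i, j] = q[j]
        col = np.zeros(n * n)
        col[j::n] = 1.0
        A_eq.append(col)
        b_eq.append(q[j])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

def bisimulation_metric(R, P, gamma=0.9, iters=50):
    """Fixed-point iteration for the bisimulation metric of Ferns et al.:
    d(s, t) = max_a [(1 - gamma) |R[a][s] - R[a][t]|
                     + gamma * W1_d(P[a][s], P[a][t])].
    R: dict action -> reward vector of shape (n,);
    P: dict action -> transition matrix of shape (n, n)."""
    n = next(iter(R.values())).shape[0]
    d = np.zeros((n, n))
    for _ in range(iters):
        d_new = np.zeros((n, n))
        for s in range(n):
            for t in range(n):
                d_new[s, t] = max(
                    (1 - gamma) * abs(R[a][s] - R[a][t])
                    + gamma * kantorovich(P[a][s], P[a][t], d)
                    for a in R
                )
        converged = np.max(np.abs(d_new - d)) < 1e-8
        d = d_new
        if converged:
            break
    return d
```

On a two-state MDP with one action, identity transitions, and rewards (0, 1), the iteration converges to d(s, t) = 1 for the distinct pair and 0 on the diagonal; the paper's option construction then reads off the shape of such a metric computed between a small source MDP and the large target MDP.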

Keywords

  • Optimal Policy
  • Reinforcement Learning
  • Markov Decision Process
  • Target Domain
  • Option Construction




References

  1. Anderson, J.R.: ACT: A simple theory of complex cognition. American Psychologist 51, 355–365 (1996)

  2. Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13(4), 341–379 (2003)

  3. Castro, P.S., Precup, D.: Using bisimulation for policy transfer in MDPs. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI 2010), pp. 1065–1070 (2010)

  4. Comanici, G., Precup, D.: Optimal policy switching algorithms in reinforcement learning. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010) (2010)

  5. Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13, 227–303 (2000)

  6. Ferns, N., Castro, P.S., Precup, D., Panangaden, P.: Methods for computing state similarity in Markov decision processes. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI 2006), pp. 174–181 (2006)

  7. Ferns, N., Panangaden, P., Precup, D.: Metrics for finite Markov decision processes. In: Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), pp. 162–169 (2004)

  8. Givan, R., Dean, T., Greig, M.: Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence 147(1–2), 163–223 (2003)

  9. Jonsson, A., Barto, A.G.: Causal graph based decomposition of factored MDPs. Journal of Machine Learning Research 7, 2259–2301 (2006)

  10. Konidaris, G., Kuindersma, S., Barto, A.G., Grupen, R.A.: Constructing skill trees for reinforcement learning agents from demonstration trajectories. In: Advances in Neural Information Processing Systems 23, pp. 1162–1170 (2010)

  11. Laird, J., Bloch, M.K., the Soar Group: Soar home page (2011)

  12. Mannor, S., Menache, I., Hoze, A., Klein, U.: Dynamic abstraction in reinforcement learning via clustering. In: Proceedings of the 21st International Conference on Machine Learning (ICML 2004) (2004)

  13. McGovern, A., Barto, A.G.: Automatic discovery of subgoals in reinforcement learning using diverse density. In: Proceedings of the 18th International Conference on Machine Learning (ICML 2001) (2001)

  14. Mehta, N., Ray, S., Tadepalli, P., Dietterich, T.: Automatic discovery and transfer of MAXQ hierarchies. In: Proceedings of the 25th International Conference on Machine Learning (ICML 2008) (2008)

  15. Mugan, J., Kuipers, B.: Autonomously learning an action hierarchy using a learned qualitative state representation. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009) (2009)

  16. Parr, R., Russell, S.: Reinforcement learning with hierarchies of machines. In: Advances in Neural Information Processing Systems (NIPS 1998) (1998)

  17. Precup, D.: Temporal Abstraction in Reinforcement Learning. PhD thesis, University of Massachusetts, Amherst (2000)

  18. Ravindran, B., Barto, A.G.: Relativized options: Choosing the right transformation. In: Proceedings of the 20th International Conference on Machine Learning (ICML 2003) (2003)

  19. Soni, V., Singh, S.: Using homomorphisms to transfer options across reinforcement learning domains. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2006) (2006)

  20. Sorg, J., Singh, S.: Transfer via soft homomorphisms. In: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009) (2009)

  21. Stolle, M., Precup, D.: Learning options in reinforcement learning. In: Koenig, S., Holte, R.C. (eds.) SARA 2002. LNCS (LNAI), vol. 2371, p. 212. Springer, Heidelberg (2002)

  22. Stone, P., Sutton, R.S., Kuhlmann, G.: Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior 13(3), 165–188 (2005)

  23. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  24. Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 181–211 (1999)

  25. Taylor, J., Precup, D., Panangaden, P.: Bounding performance loss in approximate MDP homomorphisms. In: Advances in Neural Information Processing Systems (NIPS 2009) (2009)

  26. Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research 10, 1633–1685 (2009)

  27. Šimšek, Ö., Wolfe, A.P., Barto, A.G.: Identifying useful subgoals in reinforcement learning by local graph partitioning. In: Proceedings of the 22nd International Conference on Machine Learning (ICML 2005) (2005)

  28. Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8, 279–292 (1992)

  29. Wolfe, A.P., Barto, A.G.: Defining object types and options using MDP homomorphisms. In: Proceedings of the ICML 2006 Workshop on Structural Knowledge Transfer for Machine Learning (2006)

  30. Zang, P., Zhou, P., Minnen, D., Isbell, C.: Discovering options from example trajectories. In: Proceedings of the 26th International Conference on Machine Learning (ICML 2009) (2009)



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Castro, P.S., Precup, D. (2012). Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science (LNAI), vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_16


  • DOI: https://doi.org/10.1007/978-3-642-29946-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29945-2

  • Online ISBN: 978-3-642-29946-9

  • eBook Packages: Computer Science (R0)