Options with Exceptions

  • Munu Sairamesh
  • Balaraman Ravindran
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7188)

Abstract

An option is a policy fragment that represents a solution to a frequent subproblem encountered in a domain. Options may be treated as temporally extended actions, allowing us to reuse such solutions when solving larger problems. In practice, however, it is hard to find subproblems that are exactly the same, and these differences, however small, need to be accounted for in the reused policy. In this paper, the notion of options with exceptions is introduced to address such scenarios. It is inspired by the Ripple Down Rules approach used in the data mining and knowledge representation communities. The goal is to develop an option representation such that small changes in the subproblem solutions can be accommodated without losing the original solution. We empirically validate the proposed framework on a simulated game domain.
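For context, in the standard options framework an option is specified by an initiation set, an internal policy, and a termination condition. The sketch below is a minimal illustration of the general idea described in the abstract, not the formulation used in the paper; all class and function names are hypothetical. It shows one way an option could be augmented with Ripple-Down-Rules-style exceptions, so that local corrections override the base policy only in the states where the new subproblem differs, leaving the original solution intact.

    # Minimal sketch (assumptions, not the authors' exact method): a standard
    # option <I, pi, beta> augmented with Ripple-Down-Rules-style exceptions,
    # where each exception is a (condition, local_policy) pair that overrides
    # the base policy only in the states where the subproblem differs.
    from dataclasses import dataclass, field
    from typing import Any, Callable, List, Tuple

    State, Action = Any, Any

    @dataclass
    class OptionWithExceptions:
        initiation: Callable[[State], bool]        # I: can the option start here?
        base_policy: Callable[[State], Action]     # pi: original, reusable solution
        termination: Callable[[State], float]      # beta: prob. of terminating in s
        exceptions: List[Tuple[Callable[[State], bool],
                               Callable[[State], Action]]] = field(default_factory=list)

        def act(self, state: State) -> Action:
            # The most recently added exception wins, as in Ripple Down Rules,
            # so later corrections refine earlier ones without erasing them.
            for condition, local_policy in reversed(self.exceptions):
                if condition(state):
                    return local_policy(state)
            return self.base_policy(state)

        def add_exception(self, condition: Callable[[State], bool],
                          local_policy: Callable[[State], Action]) -> None:
            # The original policy is untouched; only states satisfying
            # `condition` are redirected to the new local policy.
            self.exceptions.append((condition, local_policy))

In this reading, transferring an option to a slightly different subproblem amounts to appending exceptions rather than relearning or overwriting the base policy.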

Keywords

Options framework · Transfer Learning · Maintenance of skills

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Munu Sairamesh (1)
  • Balaraman Ravindran (1)

  1. Indian Institute of Technology Madras, India
