Options in Multi-task Reinforcement Learning - Transfer via Reflection

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11489)

Abstract

Temporally extended actions such as options are known to lead to improvements in reinforcement learning (RL). At the same time, transfer learning across different RL tasks is an increasingly active area of research. Following Baxter’s formalism for transfer, the corresponding RL question considers the benefit that an RL agent can achieve on new tasks based on experience from previous tasks in a common “learning environment”. We address this in the specific context of goal-based multi-task RL, where the different tasks correspond to different goal states within a common state space, and we introduce Landmark Options Via Reflection (LOVR), a flexible framework that uses options to transfer domain knowledge. As an explicit analog of principles in transfer learning, we provide theoretical and empirical results demonstrating that, when a set of landmark states covers the state space suitably, a LOVR agent that learns optimal value functions for these landmarks in an initial phase and deploys the associated optimal policies as options in the main phase can achieve a drastic reduction in cumulative regret compared to baseline approaches.
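
To make the two-phase idea concrete, the following is a minimal sketch under simplifying assumptions, not the authors' implementation: a deterministic N x N gridworld with goal-based tasks, plain tabular Q-learning for the initial (landmark) phase, a hand-picked landmark set, and SMDP-style Q-learning over primitive actions plus landmark options in the main phase. All names (step, q_learning, run_option, lovr_q_learning) and parameter values are hypothetical illustrations.

import numpy as np

# Illustrative sketch only (not the paper's code): a deterministic N x N
# gridworld with goal-based tasks, tabular Q-learning for the landmark
# (initial) phase, and SMDP-style Q-learning with landmark options for the
# main phase. The landmark set and all hyperparameters are arbitrary choices.

N = 7                                             # grid side length
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right


def step(state, a, goal):
    """Deterministic transition; reward 1 only on reaching the goal state."""
    r, c = state
    dr, dc = ACTIONS[a]
    ns = (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))
    done = (ns == goal)
    return ns, float(done), done


def q_learning(goal, episodes=300, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Plain tabular Q-learning toward a single goal state; returns Q."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N, N, len(ACTIONS)))
    for _ in range(episodes):
        s = (int(rng.integers(N)), int(rng.integers(N)))
        for _ in range(4 * N * N):
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(Q[s]))
            ns, rew, done = step(s, a, goal)
            target = rew if done else rew + gamma * Q[ns].max()
            Q[s][a] += alpha * (target - Q[s][a])
            s = ns
            if done:
                break
    return Q


# Initial ("reflection") phase: learn an optimal policy toward each landmark.
LANDMARKS = [(0, 0), (0, N - 1), (N - 1, 0), (N - 1, N - 1), (N // 2, N // 2)]
LANDMARK_Q = {g: q_learning(g) for g in LANDMARKS}


def run_option(s, landmark, goal, gamma=0.95, max_steps=4 * N):
    """Follow the landmark policy until the landmark (or the task goal) is hit.

    Returns the resulting state, the discounted reward accumulated along the
    way, the discount gamma**k for bootstrapping, and a done flag."""
    ret, disc, k = 0.0, 1.0, 0
    while k == 0 or (s != landmark and k < max_steps):
        a = int(np.argmax(LANDMARK_Q[landmark][s]))
        s, rew, done = step(s, a, goal)
        ret += disc * rew
        disc *= gamma
        k += 1
        if done:
            return s, ret, disc, True
    return s, ret, disc, False


def lovr_q_learning(goal, episodes=300, alpha=0.5, gamma=0.95, eps=0.1, seed=1):
    """Main phase: SMDP Q-learning over primitive actions plus landmark options."""
    rng = np.random.default_rng(seed)
    n_act = len(ACTIONS) + len(LANDMARKS)
    Q = np.zeros((N, N, n_act))
    for _ in range(episodes):
        s = (int(rng.integers(N)), int(rng.integers(N)))
        for _ in range(4 * N * N):
            a = int(rng.integers(n_act)) if rng.random() < eps else int(np.argmax(Q[s]))
            if a < len(ACTIONS):                              # primitive action
                ns, ret, done = step(s, a, goal)
                disc = gamma
            else:                                             # landmark option
                ns, ret, disc, done = run_option(s, LANDMARKS[a - len(ACTIONS)], goal, gamma)
            target = ret if done else ret + disc * Q[ns].max()
            Q[s][a] += alpha * (target - Q[s][a])
            s = ns
            if done:
                break
    return Q


# Example usage: a new goal task reuses the landmark options learned above.
Q_new_task = lovr_q_learning(goal=(2, 5))

The option update discounts the bootstrap term by gamma**k, where k is the number of primitive steps the option actually took, which is the standard SMDP Q-learning rule; the hand-picked corner-and-center landmark set stands in for the covering condition on landmark states described in the abstract.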

References

  1. Ahissar, M., Hochstein, S.: The reverse hierarchy theory of visual perceptual learning. Trends Cogn. Sci. 8, 457–464 (2004)

  2. Barto, A., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dyn. Syst. 13, 341–379 (2003)

  3. Baxter, J.: A model of inductive bias learning. J. Artif. Intell. Res. 12, 149–198 (2000)

  4. Bourne, J., Rosa, M.: Hierarchical development of the primate visual cortex, as revealed by neurofilament immunoreactivity: early maturation of the middle temporal area (MT). Cereb. Cortex 16, 405–514 (2006)

  5. Brunskill, E., Li, L.: Sample complexity of multi-task reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence (UAI) (2013)

  6. Dayan, P., Hinton, G.: Feudal reinforcement learning. In: NIPS, pp. 271–278 (1998)

  7. Frans, K., Ho, J., Abbeel, P., Schulman, J.: Meta learning shared hierarchies. Technical report (2017). arxiv:1710.09767 [cs.LG]

  8. Guergiuev, J., Lillicrap, T., Richards, B.: Towards deep learning with segregated dendrites. Technical report (2016). arxiv:1610.00161 [cs.LG]

  9. Kearns, M., Singh, S.: Near-optimal reinforcement learning in polynomial time. Mach. Learn. 49, 209–232 (2002)

  10. Koenig, S., Simmons, R.: Complexity analysis of real-time reinforcement learning. In: AAAI, pp. 99–105 (1993)

  11. Konidaris, G., Barto, A.: Building portable options: skill transfer in reinforcement learning. In: IJCAI, pp. 895–900 (2007)

  12. Laroche, R., Fatemi, M., Romoff, J., van Seijen, H.: Multi-advisor reinforcement learning. Technical report (2017). arxiv:1704.00756 [cs.LG]

  13. Liu, Y., Brunskill, E.: When simple exploration is sample efficient: identifying sufficient conditions for random exploration to yield PAC RL algorithms. In: European Workshop on Reinforcement Learning (2018)

  14. Mann, T., Mannor, S., Precup, D.: Approximate value iteration with temporally extended actions. J. Artif. Intell. Res. 53, 375–438 (2015)

  15. Perkins, T., Precup, D.: Using options for knowledge transfer in reinforcement learning. Technical report UM-CS-99-34 (1999)

  16. Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: ICML (2015)

  17. van Seijen, H., Fatemi, M., Romoff, J., Laroche, R., Barnes, T., Tsang, J.: Hybrid reward architecture for reinforcement learning. Technical report (2017). arxiv:1706.04208 [cs.LG]

  18. van Seijen, H., Fatemi, M., Romoff, J., Laroche, R.: Separation of concerns in reinforcement learning. Technical report (2017). arxiv:1612.05159 [cs.LG]

  19. Silver, D., Yang, Q., Li, L.: Lifelong machine learning systems: beyond learning algorithms. In: AAAI Spring Symposium: Lifelong Machine Learning, pp. 49–55 (2013)

  20. Strehl, A., Li, L., Wiewiora, E., Langford, J., Littman, M.: PAC model-free reinforcement learning. In: ICML, pp. 881–888 (2006)

  21. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2016)

  22. Sutton, R., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112, 181–211 (1999)

  23. Thrun, S., Pratt, L.: Learning to Learn. Kluwer Academic Publishers, Norwell (1998)

  24. Vezhnevets, A., et al.: Feudal networks for hierarchical reinforcement learning. Technical report (2017). arxiv:1703.01161 [cs.LG]

Acknowledgments

The authors thank Doina Precup for helpful discussions.

Author information

Corresponding author

Correspondence to Nicholas Denis.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Denis, N., Fraser, M. (2019). Options in Multi-task Reinforcement Learning - Transfer via Reflection. In: Meurs, MJ., Rudzicz, F. (eds) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science (LNAI), vol 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_18

  • DOI: https://doi.org/10.1007/978-3-030-18305-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18304-2

  • Online ISBN: 978-3-030-18305-9

  • eBook Packages: Computer Science, Computer Science (R0)
