Abstract
Temporally extended actions such as options are known to lead to improvements in reinforcement learning (RL). At the same time, transfer learning across different RL tasks is an increasingly active area of research. Following Baxter’s formalism for transfer, the corresponding RL question considers the benefit that an RL agent can achieve on new tasks based on experience from previous tasks in a common “learning environment”. We address this in the specific context of goal-based multi-task RL, where the different tasks correspond to different goal states within a common state space, and we introduce Landmark Options Via Reflection (LOVR), a flexible framework that uses options to transfer domain knowledge. As an explicit analog of principles in transfer learning, we provide theoretical and empirical results demonstrating that when a set of landmark states suitably covers the state space, a LOVR agent that learns optimal value functions for these landmarks in an initial phase, and then deploys the associated optimal policies as options in the main phase, achieves a drastic reduction in cumulative regret compared to baseline approaches.
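The two-phase structure described in the abstract can be made concrete with a short sketch. The following Python code is an illustrative assumption, not the authors' implementation: every name in it (GridWorld, q_learn, LandmarkOption) is hypothetical, and the environment and hyperparameters are placeholders. It learns a tabular Q-function for each landmark state in an initial phase, then wraps each landmark's greedy policy as an option that terminates when the landmark is reached.

```python
import random
from collections import defaultdict

# Hypothetical tabular gridworld; not from the paper. States are (row, col)
# cells and the agent can move in four directions, clipped at the walls.
class GridWorld:
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def __init__(self, size=8):
        self.size = size

    def step(self, state, action):
        r, c = state
        dr, dc = self.ACTIONS[action]
        nr = min(max(r + dr, 0), self.size - 1)
        nc = min(max(c + dc, 0), self.size - 1)
        return (nr, nc)

def q_learn(env, goal, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Initial phase: learn a tabular Q-function for reaching one landmark."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = (random.randrange(env.size), random.randrange(env.size))
        for _ in range(100):
            if s == goal:
                break
            if random.random() < eps:  # epsilon-greedy exploration
                a = random.randrange(len(env.ACTIONS))
            else:
                a = max(range(len(env.ACTIONS)), key=lambda x: Q[(s, x)])
            s2 = env.step(s, a)
            r = 1.0 if s2 == goal else 0.0
            best_next = max(Q[(s2, x)] for x in range(len(env.ACTIONS)))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

class LandmarkOption:
    """Follows the greedy policy of a landmark's Q-function; terminates
    when the landmark is reached (or after a step budget)."""
    def __init__(self, env, landmark, Q):
        self.env, self.landmark, self.Q = env, landmark, Q

    def run(self, state, max_steps=100):
        for _ in range(max_steps):
            if state == self.landmark:
                break
            a = max(range(len(self.env.ACTIONS)),
                    key=lambda x: self.Q[(state, x)])
            state = self.env.step(state, a)
        return state

# Initial phase: learn optimal value functions for each landmark once,
# then reuse the resulting options across every subsequent goal task.
env = GridWorld()
landmarks = [(0, 0), (0, 7), (7, 0), (7, 7)]
options = [LandmarkOption(env, g, q_learn(env, g)) for g in landmarks]

# Main phase (sketch): facing a new goal, the agent can invoke an option
# to travel to a nearby landmark, then explore locally from there.
state = (3, 3)
state = options[0].run(state)
print("after option:", state)  # typically reaches the landmark (0, 0)
```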
References
Ahissar, M., Hochstein, S.: The reverse hierarchy theory of visual perceptual learning. Trends Cogn. Sci. 8, 457–464 (2004)
Barto, A., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dyn. Syst. 13, 341–379 (2003)
Baxter, J.: A model of inductive bias learning. J. Artif. Intell. Res. 12, 149–198 (2000)
Bourne, J., Rosa, M.: Hierarchical development of the primate visual cortex, as revealed by neurofilament immunoreactivity: early maturation of the middle temporal area (MT). Cereb. Cortex 16, 405–514 (2006)
Brunskill, E., Li, L.: Sample complexity of multi-task reinforcement learning. In: Conference on Uncertainty in Artificial Intelligence (UAI) (2013)
Dayan, P., Hinton, G.: Feudal reinforcement learning. In: NIPS, pp. 271–278 (1993)
Frans, K., Ho, J., Abbeel, P., Schulman, J.: Meta learning shared hierarchies. Technical report (2017). arxiv:1710.09767 [cs.LG]
Guerguiev, J., Lillicrap, T., Richards, B.: Towards deep learning with segregated dendrites. Technical report (2016). arxiv:1610.00161 [cs.LG]
Kearns, M., Singh, S.: Near-optimal reinforcement learning in polynomial time. Mach. Learn. 49, 209–232 (2002)
Koenig, S., Simmons, R.: Complexity analysis of real-time reinforcement learning. In: AAAI, pp. 99–105 (1993)
Konidaris, G., Barto, A.: Building portable options: skill transfer in reinforcement learning. In: IJCAI, pp. 895–900 (2007)
Laroche, R., Fatemi, M., Romoff, J., van Seijen, H.: Multi-advisor reinforcement learning. Technical report (2017). arxiv:1704.00756 [cs.LG]
Liu, Y., Brunskill, E.: When simple exploration is sample efficient: identifying sufficient conditions for random exploration to yield PAC RL algorithms. In: European Workshop on Reinforcement Learning (2018)
Mann, T., Mannor, S., Precup, D.: Approximate value iteration with temporally extended actions. J. Artif. Intell. Res. 53, 375–438 (2015)
Perkins, T., Precup, D.: Using options for knowledge transfer in reinforcement learning. Technical report UM-CS-99-34 (1999)
Schaul, T., Horgan, D., Gregor, K., Silver, D.: Universal value function approximators. In: ICML (2015)
van Seijen, H., Fatemi, M., Romoff, J., Laroche, R., Barnes, T., Tsang, J.: Hybrid reward architecture for reinforcement learning. Technical report (2017). arxiv:1706.04208 [cs.LG]
van Seijen, H., Fatemi, M., Romoff, J., Laroche, R.: Separation of concerns in reinforcement learning. Technical report (2017). arxiv:1612.05159 [cs.LG]
Silver, D., Yang, Q., Li, L.: Lifelong machine learning systems: beyond learning algorithms. In: AAAI Spring Symposium: Lifelong Machine Learning, pp. 49–55 (2013)
Strehl, A., Li, L., Wiewiora, E., Langford, J., Littman, M.: PAC model-free reinforcement learning. In: ICML, pp. 881–888 (2006)
Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2016)
Sutton, R., Precup, D., Singh, S.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112, 181–211 (1999)
Thrun, S., Pratt, L.: Learning to Learn. Kluwer Academic Publishers, Norwell (1998)
Vezhnevets, A., et al.: Feudal networks for hierarchical reinforcement learning. Technical report (2017). arxiv:1703.01161 [cs.LG]
Acknowledgments
The authors thank Doina Precup for helpful discussions.