Addressing Task Prioritization in Model-based Reinforcement Learning

  • Conference paper
  • First Online:
Advances in Neural Computation, Machine Learning, and Cognitive Research VI (NEUROINFORMATICS 2022)

Part of the book series: Studies in Computational Intelligence (SCI, volume 1064)


Abstract

World models facilitate sample-efficient reinforcement learning (RL) and, by design, can benefit from multitask information; however, typical model-based RL (MBRL) agents do not exploit it. We propose a data-centric approach to this problem: a controllable optimization process for MBRL agents that selectively prioritizes the data the agent is trained on in order to improve its performance. We show how this favors implicit task generalization in a custom environment based on MetaWorld with parametric task variability. Furthermore, by bootstrapping the agent's data, our method boosts performance on unstable environments from the DeepMind Control Suite. This is achieved without any additional data or architectural changes, while outperforming state-of-the-art visual model-based RL algorithms. Additionally, we frame the approach within the scope of methods that have unintentionally followed the controllable optimization process paradigm, thereby filling the gap of data-centric task-bootstrapping methods.
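
The abstract describes the method only at a high level and the paper page gives no code. As a rough illustration of what "selectively prioritizing the data" used by an MBRL agent could look like, the sketch below shows priority-weighted episode sampling from a replay buffer; the class and method names (PrioritizedEpisodeBuffer, update_priority) are hypothetical and are not taken from the paper, which does not specify an implementation here.

```python
import numpy as np

class PrioritizedEpisodeBuffer:
    """Hypothetical sketch of data-centric prioritization: episodes are
    sampled for world-model training in proportion to a per-episode
    priority weight. Not the authors' actual algorithm."""

    def __init__(self):
        self.episodes = []    # stored trajectories
        self.priorities = []  # one non-negative weight per episode

    def add(self, episode, priority=1.0):
        self.episodes.append(episode)
        self.priorities.append(priority)

    def update_priority(self, idx, new_priority):
        # e.g. driven by model loss or task relevance of the episode
        self.priorities[idx] = max(float(new_priority), 1e-6)

    def sample(self, batch_size):
        # sample episode indices with probability proportional to priority
        p = np.asarray(self.priorities, dtype=np.float64)
        p = p / p.sum()
        idx = np.random.choice(len(self.episodes), size=batch_size, p=p)
        return [self.episodes[i] for i in idx]
```

A uniform buffer corresponds to keeping all priorities equal; the prioritization idea is to shift those weights so that training data matching the target task (or stabilizing learning) is replayed more often.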



Author information

Corresponding author

Correspondence to Artem Zholus.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zholus, A., Ivchenkov, Y., Panov, A.I. (2023). Addressing Task Prioritization in Model-based Reinforcement Learning. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research VI. NEUROINFORMATICS 2022. Studies in Computational Intelligence, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-031-19032-2_3
