Addressing Task Prioritization in Model-based Reinforcement Learning

  • Conference paper
  • First Online:
Advances in Neural Computation, Machine Learning, and Cognitive Research VI (NEUROINFORMATICS 2022)

Part of the book series: Studies in Computational Intelligence (SCI, volume 1064)


Abstract

World models facilitate sample-efficient reinforcement learning (RL) and, by design, can benefit from multitask information; however, typical model-based RL (MBRL) agents do not exploit it. We propose a data-centric approach to this problem: a controllable optimization process for MBRL agents that selectively prioritizes the data the agent is trained on in order to improve its performance. We show how this favors implicit task generalization in a custom environment based on MetaWorld with parametric task variability. Furthermore, by bootstrapping the agent's data, our method boosts performance on unstable environments from the DeepMind Control Suite. This is achieved without any additional data or architectural changes, while outperforming state-of-the-art visual model-based RL algorithms. Additionally, we frame the approach within the scope of methods that have unintentionally followed the controllable optimization process paradigm, thereby filling the gap of data-centric task-bootstrapping methods.
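
The abstract describes the method only at a high level and the paper page gives no code. As a rough illustration of what "selectively prioritizing the data" used by an MBRL agent could look like, the sketch below shows priority-weighted episode sampling from a replay buffer; the class and method names (PrioritizedEpisodeBuffer, update_priority) are hypothetical and are not taken from the paper, which does not specify an implementation here.

```python
import numpy as np

class PrioritizedEpisodeBuffer:
    """Hypothetical sketch of data-centric prioritization: episodes are
    sampled for world-model training in proportion to a per-episode
    priority weight. Not the authors' actual algorithm."""

    def __init__(self):
        self.episodes = []    # stored trajectories
        self.priorities = []  # one non-negative weight per episode

    def add(self, episode, priority=1.0):
        self.episodes.append(episode)
        self.priorities.append(priority)

    def update_priority(self, idx, new_priority):
        # e.g. driven by model loss or task relevance of the episode
        self.priorities[idx] = max(float(new_priority), 1e-6)

    def sample(self, batch_size):
        # sample episode indices with probability proportional to priority
        p = np.asarray(self.priorities, dtype=np.float64)
        p = p / p.sum()
        idx = np.random.choice(len(self.episodes), size=batch_size, p=p)
        return [self.episodes[i] for i in idx]
```

A uniform buffer corresponds to keeping all priorities equal; the prioritization idea is to shift those weights so that training data matching the target task (or stabilizing learning) is replayed more often.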



Author information

Corresponding author

Correspondence to Artem Zholus.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zholus, A., Ivchenkov, Y., Panov, A.I. (2023). Addressing Task Prioritization in Model-based Reinforcement Learning. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research VI. NEUROINFORMATICS 2022. Studies in Computational Intelligence, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-031-19032-2_3
