Abstract
Applying learning-based control methods to real robots presents hard challenges, including the low sample efficiency of model-free reinforcement learning algorithms. A widely adopted approach to this problem is to learn a model of the environment dynamics. We propose using Neural Ordinary Differential Equations (NODEs) to approximate transition dynamics, as this allows finer control over the trajectory generation process. NODEs offer a continuous-time formulation that captures temporal dependencies. We evaluate our approach on several tasks from a simulation environment, including teaching a 6-DoF robotic arm to open a door, a task that poses particular challenges for policy search. The NODE model is trained to predict the motion of the arm and the door and is used to generate trajectories for model-based policy optimization. Our method achieves better sample efficiency on this task than both model-free and model-based baselines, and comparable results on several other tasks. Applying NODEs to model-based reinforcement learning enables more precise modeling of robotic system dynamics and improves the sample efficiency of learning-based control methods. Our empirical evaluation demonstrates the efficacy of the approach and offers promising prospects for improving the performance and efficiency of real-world robotic systems.
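Although the paper's own implementation is not reproduced here, the idea summarized above can be illustrated with a minimal sketch: a neural network parameterizes the time derivative of the state, and a numerical ODE solver integrates it over one control interval to predict the next state, which can then be chained to produce imagined rollouts for policy optimization. The sketch below assumes PyTorch and the torchdiffeq package; the network sizes, the `dt` value, and the `NODEDynamics` name are illustrative, not taken from the paper.

```python
# Minimal sketch of a NODE transition model for model-based RL
# (an assumed illustration, not the authors' code).
import torch
import torch.nn as nn
from torchdiffeq import odeint


class NODEDynamics(nn.Module):
    """Continuous-time transition model: ds/dt = f_theta(s, a)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )
        self._action = None  # action held constant over one control interval

    def forward(self, t, s):
        # odeint calls this with the current time t and state s.
        return self.f(torch.cat([s, self._action], dim=-1))

    def predict_next_state(self, s, a, dt: float = 0.05):
        """Integrate the learned ODE from t=0 to t=dt (zero-order hold on a)."""
        self._action = a
        ts = torch.tensor([0.0, dt])
        return odeint(self, s, ts)[-1]  # last time point = predicted next state


# Usage sketch: fit the model on observed (s, a, s') transitions with an MSE
# loss, then roll it forward to generate trajectories for policy optimization.
model = NODEDynamics(state_dim=12, action_dim=6)
s = torch.randn(32, 12)   # batch of states (placeholder data)
a = torch.randn(32, 6)    # batch of actions (placeholder data)
s_next_pred = model.predict_next_state(s, a)
loss = nn.functional.mse_loss(s_next_pred, torch.randn(32, 12))  # placeholder target
```

Because the solver integrates a learned vector field rather than taking a fixed discrete step, the same trained model can predict states at arbitrary intermediate times, which is what gives the finer control over trajectory generation mentioned in the abstract.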
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gorodetskiy, A., Mironov, K., Panov, A. (2023). Model-Based Policy Optimization with Neural Differential Equations for Robotic Arm Control. In: Ronzhin, A., Sadigov, A., Meshcheryakov, R. (eds) Interactive Collaborative Robotics. ICR 2023. Lecture Notes in Computer Science(), vol 14214. Springer, Cham. https://doi.org/10.1007/978-3-031-43111-1_23
Print ISBN: 978-3-031-43110-4
Online ISBN: 978-3-031-43111-1