Abstract
Applying learning-based control methods to real robots presents hard challenges, including the low sample efficiency of model-free reinforcement learning algorithms. A widely adopted approach to this problem is to learn a model of the environment dynamics. We propose using Neural Ordinary Differential Equations (NODEs) to approximate transition dynamics, as this allows finer control over the trajectory generation process. NODEs offer a continuous-time formulation that captures temporal dependencies. We evaluate our approach on several tasks from a simulation environment, including teaching a 6-DoF robotic arm to open a door, a task that poses particular challenges for policy search. The NODE model is trained to predict the motion of the arm and the door and is used to generate trajectories for model-based policy optimization. Our method achieves better sample efficiency on this task than both model-free and model-based baselines, and comparable results on several other tasks. Applying NODEs to model-based reinforcement learning enables more precise modeling of robotic system dynamics and improves the sample efficiency of learning-based control methods. Our empirical evaluation demonstrates the efficacy of the approach and offers promising prospects for improving the performance and efficiency of real-world robotic systems.
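Although the paper's own implementation is not reproduced here, the idea summarized above can be illustrated with a minimal sketch: a neural network parameterizes the time derivative of the state, and a numerical ODE solver integrates it over one control interval to predict the next state, which can then be chained to produce imagined rollouts for policy optimization. The sketch below assumes PyTorch and the torchdiffeq package; the network sizes, the `dt` value, and the `NODEDynamics` name are illustrative, not taken from the paper.

```python
# Minimal sketch of a NODE transition model for model-based RL
# (an assumed illustration, not the authors' code).
import torch
import torch.nn as nn
from torchdiffeq import odeint


class NODEDynamics(nn.Module):
    """Continuous-time transition model: ds/dt = f_theta(s, a)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )
        self._action = None  # action held constant over one control interval

    def forward(self, t, s):
        # odeint calls this with the current time t and state s.
        return self.f(torch.cat([s, self._action], dim=-1))

    def predict_next_state(self, s, a, dt: float = 0.05):
        """Integrate the learned ODE from t=0 to t=dt (zero-order hold on a)."""
        self._action = a
        ts = torch.tensor([0.0, dt])
        return odeint(self, s, ts)[-1]  # last time point = predicted next state


# Usage sketch: fit the model on observed (s, a, s') transitions with an MSE
# loss, then roll it forward to generate trajectories for policy optimization.
model = NODEDynamics(state_dim=12, action_dim=6)
s = torch.randn(32, 12)   # batch of states (placeholder data)
a = torch.randn(32, 6)    # batch of actions (placeholder data)
s_next_pred = model.predict_next_state(s, a)
loss = nn.functional.mse_loss(s_next_pred, torch.randn(32, 12))  # placeholder target
```

Because the solver integrates a learned vector field rather than taking a fixed discrete step, the same trained model can predict states at arbitrary intermediate times, which is what gives the finer control over trajectory generation mentioned in the abstract.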
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gorodetskiy, A., Mironov, K., Panov, A. (2023). Model-Based Policy Optimization with Neural Differential Equations for Robotic Arm Control. In: Ronzhin, A., Sadigov, A., Meshcheryakov, R. (eds) Interactive Collaborative Robotics. ICR 2023. Lecture Notes in Computer Science(), vol 14214. Springer, Cham. https://doi.org/10.1007/978-3-031-43111-1_23
Print ISBN: 978-3-031-43110-4
Online ISBN: 978-3-031-43111-1