
Model-Based Policy Optimization with Neural Differential Equations for Robotic Arm Control

  • Conference paper
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14214)
  • Included in the following conference series: Interactive Collaborative Robotics (ICR 2023)


Abstract

Applying learning-based control methods to real robots presents hard challenges, including the low sample efficiency of model-free reinforcement learning algorithms. A widely adopted approach to this problem is to learn a model of the environment dynamics. We propose to use Neural Ordinary Differential Equations (NODEs) to approximate the transition dynamics, as this allows finer control over the trajectory generation process: a NODE offers a continuous-time formulation that captures the temporal dependencies of the system. We evaluate our approach on several simulated tasks, including teaching a 6-DoF robotic arm to open a door, which poses particular challenges for policy search. The NODE model is trained to predict the movement of the arm and the door, and is used to generate trajectories for model-based policy optimization. Our method shows better sample efficiency on this task than model-free and model-based baselines, and comparable results on several other tasks. Applying NODEs to model-based reinforcement learning enables more precise modeling of robotic system dynamics and improves the sample efficiency of learning-based control. The empirical evaluation across tasks demonstrates the efficacy of our approach, offering promising prospects for improving the performance and efficiency of real-world robotic systems.
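
The abstract describes the approach only at a high level. As a rough illustration of the idea, the sketch below shows what a NODE transition model and a model-generated ("imagined") rollout for policy optimization might look like. This is a minimal sketch, not the authors' code: the network sizes, the fixed-step RK4 integrator, the toy policy, and all names (`ODEDynamics`, `rk4_step`, the state and action dimensions) are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): a Neural ODE
# dynamics model f_theta(s, a) defining ds/dt, integrated with a fixed-step
# RK4 solver, plus a short imagined rollout of the kind model-based policy
# optimization uses to generate training trajectories.
import torch
import torch.nn as nn

class ODEDynamics(nn.Module):
    """Approximates the time derivative of the state: ds/dt = f(s, a)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def rk4_step(f, s, a, dt):
    """One Runge-Kutta-4 integration step; the action is held fixed over dt."""
    k1 = f(s, a)
    k2 = f(s + 0.5 * dt * k1, a)
    k3 = f(s + 0.5 * dt * k2, a)
    k4 = f(s + dt * k3, a)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Imagined rollout (illustrative sizes: 12-D state, 6-D action for a 6-DoF arm).
state_dim, action_dim, dt, horizon = 12, 6, 0.02, 10
dynamics = ODEDynamics(state_dim, action_dim)
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim), nn.Tanh())

s = torch.randn(32, state_dim)        # batch of start states, e.g. from a replay buffer
trajectory = [s]
for _ in range(horizon):
    a = policy(s)                     # query the current policy
    s = rk4_step(dynamics, s, a, dt)  # integrate the learned ODE forward in time
    trajectory.append(s)
# The model-generated trajectory can then be scored with a reward model and
# used to update the policy, in the spirit of model-based policy optimization.
```

A continuous-time model of this form can, in principle, be integrated at arbitrary intermediate times, which is what gives finer control over trajectory generation than a fixed-step discrete-time transition network.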



Author information

Correspondence to Andrey Gorodetskiy.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Gorodetskiy, A., Mironov, K., Panov, A. (2023). Model-Based Policy Optimization with Neural Differential Equations for Robotic Arm Control. In: Ronzhin, A., Sadigov, A., Meshcheryakov, R. (eds) Interactive Collaborative Robotics. ICR 2023. Lecture Notes in Computer Science (LNAI), vol 14214. Springer, Cham. https://doi.org/10.1007/978-3-031-43111-1_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-43111-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43110-4

  • Online ISBN: 978-3-031-43111-1

  • eBook Packages: Computer Science, Computer Science (R0)
