
Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11866)

Abstract

Standard robotic control performs well under ordinary conditions, but when conditions change (e.g., one of the motors is damaged), the robot can no longer accomplish its task. We need an algorithm that provides the robot with the ability to adapt to unforeseen situations. Reinforcement learning (RL) offers a framework that meets these requirements, but it needs large data sets to learn robotic tasks, which is impractical. We discuss using Gaussian processes (GPs) to improve the data efficiency of RL: a GP learns a state-transition model from data collected during an interaction phase on the real robot, and the learned GP model is then used to simulate trajectories and optimize the robot's controller during a simulation phase. The PILCO algorithm is considered the most data-efficient RL algorithm of this kind. It gives promising results on the cart-pole task, where a working controller was learned after only seconds of interaction on the real robot, although the total training time, including the simulation phase, was longer. In this work, we leverage the abilities of computational graphs to produce a ROS-friendly Python implementation of PILCO, and discuss a case study of a real-world robotic task.
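To make the interaction/simulation split concrete, the following is a minimal, self-contained sketch of this loop: collect transitions from the system, fit a GP state-transition model, and improve the controller on trajectories simulated by the model. It is illustrative only, not the authors' implementation: the toy point-mass system, linear state-feedback controller, and random-search optimizer are assumptions made here for brevity, and it propagates only the GP predictive mean, whereas PILCO propagates full Gaussian state distributions via moment matching and optimizes the policy with analytic gradients.

```python
import numpy as np
import gpflow


def step_real(s, a):
    """Toy 'real' system (a damped point mass) standing in for the robot."""
    pos, vel = s
    vel = 0.9 * vel + 0.1 * a
    return np.array([pos + 0.1 * vel, vel])


def rollout(policy, horizon, s0):
    """Interaction phase: record (state, action) -> state-change pairs."""
    X, Y, s = [], [], s0
    for _ in range(horizon):
        a = float(policy(s))
        s_next = step_real(s, a)
        X.append(np.append(s, a))   # GP input: state-action pair
        Y.append(s_next - s)        # GP target: change in state
        s = s_next
    return np.array(X), np.array(Y)


def fit_gp(X, Y):
    """Learn the state-transition model from the collected data."""
    model = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.SquaredExponential())
    gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)
    return model


def simulated_cost(model, theta, horizon, s0, target):
    """Simulation phase: roll the GP mean forward under a linear controller."""
    s, cost = s0.copy(), 0.0
    for _ in range(horizon):
        a = float(theta @ s)
        mean, _ = model.predict_f(np.append(s, a).reshape(1, -1))
        s = s + mean.numpy().ravel()
        cost += (s[0] - target) ** 2   # quadratic cost on position error
    return cost


# Alternate real interaction with policy improvement in simulation.
rng = np.random.default_rng(0)
s0, target, horizon = np.array([0.0, 0.0]), 1.0, 25
theta = rng.normal(size=2)
X, Y = rollout(lambda s: rng.normal(), horizon, s0)   # initial random exploration
for _ in range(3):
    model = fit_gp(X, Y)
    best = simulated_cost(model, theta, horizon, s0, target)
    for _ in range(50):   # crude random search; PILCO uses analytic gradients
        cand = theta + 0.3 * rng.normal(size=2)
        c = simulated_cost(model, cand, horizon, s0, target)
        if c < best:
            theta, best = cand, c
    X_new, Y_new = rollout(lambda s: theta @ s, horizon, s0)  # test on the "robot"
    X, Y = np.vstack([X, X_new]), np.vstack([Y, Y_new])
print("learned controller parameters:", theta)
```

Note that all data from the real system is accumulated across iterations, so the GP model improves with every interaction phase; this reuse of every real transition is what makes the approach data-efficient.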

Keywords

  • Robot learning
  • Reinforcement learning
  • Gaussian process
  • Data efficient

This work was supported by the Russian Science Foundation, project no. 18-71-00143.


Author information

Correspondence to Aleksandr I. Panov.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Younes, A., Panov, A.I. (2019). Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes. In: Osipov, G., Panov, A., Yakovlev, K. (eds) Artificial Intelligence. Lecture Notes in Computer Science, vol 11866. Springer, Cham. https://doi.org/10.1007/978-3-030-33274-7_11


  • DOI: https://doi.org/10.1007/978-3-030-33274-7_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33273-0

  • Online ISBN: 978-3-030-33274-7

  • eBook Packages: Computer Science (R0)