
Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11866)

Abstract

Standard robotic control performs well under ordinary conditions, but when conditions change (e.g., one of the motors is damaged), the robot can no longer accomplish its task. We need an algorithm that provides the robot with the ability to adapt to unforeseen situations. Reinforcement learning (RL) offers a framework that meets these requirements, but it needs large data sets to learn robotic tasks, which is impractical. We discuss using Gaussian processes (GPs) to improve the data efficiency of RL: a GP learns a state-transition model from data collected during an interaction phase on the real robot, and the learned GP model is then used to simulate trajectories and optimize the robot's controller during a simulation phase. The PILCO algorithm is considered the most data-efficient RL algorithm of this kind. It gives promising results on the cart-pole task, where a working controller was learned after only seconds of interaction on the real robot, although the total training time, including the simulation phase, was longer. In this work, we leverage the abilities of computational graphs to produce a ROS-friendly Python implementation of PILCO, and discuss a case study of a real-world robotic task.
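To make the interaction/simulation split concrete, the following is a minimal, self-contained sketch of this loop: collect transitions from the system, fit a GP state-transition model, and improve the controller on trajectories simulated by the model. It is illustrative only, not the authors' implementation: the toy point-mass system, linear state-feedback controller, and random-search optimizer are assumptions made here for brevity, and it propagates only the GP predictive mean, whereas PILCO propagates full Gaussian state distributions via moment matching and optimizes the policy with analytic gradients.

```python
import numpy as np
import gpflow


def step_real(s, a):
    """Toy 'real' system (a damped point mass) standing in for the robot."""
    pos, vel = s
    vel = 0.9 * vel + 0.1 * a
    return np.array([pos + 0.1 * vel, vel])


def rollout(policy, horizon, s0):
    """Interaction phase: record (state, action) -> state-change pairs."""
    X, Y, s = [], [], s0
    for _ in range(horizon):
        a = float(policy(s))
        s_next = step_real(s, a)
        X.append(np.append(s, a))   # GP input: state-action pair
        Y.append(s_next - s)        # GP target: change in state
        s = s_next
    return np.array(X), np.array(Y)


def fit_gp(X, Y):
    """Learn the state-transition model from the collected data."""
    model = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.SquaredExponential())
    gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)
    return model


def simulated_cost(model, theta, horizon, s0, target):
    """Simulation phase: roll the GP mean forward under a linear controller."""
    s, cost = s0.copy(), 0.0
    for _ in range(horizon):
        a = float(theta @ s)
        mean, _ = model.predict_f(np.append(s, a).reshape(1, -1))
        s = s + mean.numpy().ravel()
        cost += (s[0] - target) ** 2   # quadratic cost on position error
    return cost


# Alternate real interaction with policy improvement in simulation.
rng = np.random.default_rng(0)
s0, target, horizon = np.array([0.0, 0.0]), 1.0, 25
theta = rng.normal(size=2)
X, Y = rollout(lambda s: rng.normal(), horizon, s0)   # initial random exploration
for _ in range(3):
    model = fit_gp(X, Y)
    best = simulated_cost(model, theta, horizon, s0, target)
    for _ in range(50):   # crude random search; PILCO uses analytic gradients
        cand = theta + 0.3 * rng.normal(size=2)
        c = simulated_cost(model, cand, horizon, s0, target)
        if c < best:
            theta, best = cand, c
    X_new, Y_new = rollout(lambda s: theta @ s, horizon, s0)  # test on the "robot"
    X, Y = np.vstack([X, X_new]), np.vstack([Y, Y_new])
print("learned controller parameters:", theta)
```

Note that all data from the real system is accumulated across iterations, so the GP model improves with every interaction phase; this reuse of every real transition is what makes the approach data-efficient.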

Keywords

  • Robot learning
  • Reinforcement learning
  • Gaussian process
  • Data efficient

This work was supported by the Russian Science Foundation, project no. 18-71-00143.


Author information

Correspondence to Aleksandr I. Panov.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Younes, A., Panov, A.I. (2019). Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes. In: Osipov, G., Panov, A., Yakovlev, K. (eds) Artificial Intelligence. Lecture Notes in Computer Science, vol 11866. Springer, Cham. https://doi.org/10.1007/978-3-030-33274-7_11


  • DOI: https://doi.org/10.1007/978-3-030-33274-7_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33273-0

  • Online ISBN: 978-3-030-33274-7

  • eBook Packages: Computer Science (R0)