A Gaussian Process Reinforcement Learning Algorithm with Adaptability and Minimal Tuning Requirements

  • Jonathan Strahl
  • Timo Honkela
  • Paul Wagner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8681)


We present a novel Bayesian reinforcement learning algorithm that addresses model bias and exploration overhead. The algorithm combines aspects of several state-of-the-art model-based reinforcement learning methods that use Gaussian processes to make efficient use of online data samples. It employs a smooth reward function in which the reward value is derived from the environment state, and it handles continuous states and actions in a coherent way while minimizing the need for expert knowledge in parameter tuning. We analyse and discuss the practical benefits of the selected approach in comparison to more traditional methodological choices, and illustrate the use of the algorithm in a motor control problem involving a simulated two-link arm.
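The abstract names two ingredients without giving their equations: a smooth reward derived directly from the environment state, and Gaussian-process modelling of online data samples. A minimal sketch of both is shown below; the function names, the RBF kernel choice, and the saturating-exponential reward form are illustrative assumptions in the spirit of the cited PILCO line of work, not the authors' actual formulation:

```python
import numpy as np

def smooth_reward(state, target, width=0.5):
    """Smooth, saturating reward computed directly from the environment
    state: 1 at the target, decaying smoothly towards 0 (illustrative,
    PILCO-style form; `width` controls the decay scale)."""
    d2 = np.sum((np.asarray(state) - np.asarray(target)) ** 2)
    return float(np.exp(-d2 / (2.0 * width ** 2)))

def gp_posterior(X, y, X_star, length_scale=1.0, noise=1e-4):
    """Minimal GP-regression posterior mean with an RBF kernel, standing
    in for the GP model learned from online samples."""
    def k(A, B):
        # Squared Euclidean distances between all row pairs of A and B.
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2.0 * length_scale ** 2))
    K = k(X, X) + noise * np.eye(len(X))          # noisy train covariance
    return k(X_star, X) @ np.linalg.solve(K, y)   # posterior mean at X_star
```

Because the reward is a smooth function of the state rather than an external signal, it can be propagated through the GP model's predictions, which is what makes the model-based approach data-efficient.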


Keywords: Non-parametric reinforcement learning · Gaussian processes · Batch reinforcement learning · Bayesian reinforcement learning · Minimal domain-expert knowledge




References

  1. Abbeel, P., Coates, A., Quigley, M., Ng, A.: An application of reinforcement learning to aerobatic helicopter flight. In: Advances in Neural Information Processing Systems, vol. 19, pp. 1–8 (2007)
  2. Głowacka, D., Ruotsalo, T., Konuyshkova, K., Kaski, S., Jacucci, G.: Directing exploratory search: Reinforcement learning from user interactions with keywords. In: Proceedings of the 2013 International Conference on Intelligent User Interfaces, pp. 117–128. ACM (2013)
  3. Kober, J., Peters, J.: Reinforcement learning in robotics: A survey. In: Wiering, M., van Otterlo, M. (eds.) Reinforcement Learning. ALO, vol. 12, pp. 569–600. Springer, Heidelberg (2012)
  4. Arleo, A., Smeraldi, F., Gerstner, W.: Cognitive navigation based on nonuniform Gabor space sampling, unsupervised growing networks, and reinforcement learning. IEEE Transactions on Neural Networks 15(3), 639–652 (2004)
  5. Montazeri, H., Moradi, S., Safabakhsh, R.: Continuous state/action reinforcement learning: A growing self-organizing map approach. Neurocomputing 74(7), 1069–1082 (2011)
  6. Graziano, V., Koutník, J., Schmidhuber, J.: Unsupervised modeling of partially observable environments. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part I. LNCS (LNAI), vol. 6911, pp. 503–515. Springer, Heidelberg (2011)
  7. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  8. Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Technical report, Cambridge University Engineering Dept. (1994)
  9. Deisenroth, M.P., Rasmussen, C.E.: PILCO: A model-based and data-efficient approach to policy search. In: Proceedings of the International Conference on Machine Learning (2011)
  10. Deisenroth, M.P., Rasmussen, C.E.: Efficient reinforcement learning for motor control. In: Proceedings of the 10th International PhD Workshop on Systems and Control, Hluboká nad Vltavou, Czech Republic (2009)
  11. Körding, K.P., Wolpert, D.M.: The loss function of sensorimotor learning. Proceedings of the National Academy of Sciences of the United States of America 101(26), 9839–9842 (2004)
  12. Rasmussen, C.E., Kuss, M.: Gaussian processes in reinforcement learning. In: Advances in Neural Information Processing Systems 16, pp. 751–759. MIT Press (2004)
  13. Jakab, H., Csató, L.: Improving Gaussian process value function approximation in policy gradient algorithms. In: Honkela, T. (ed.) ICANN 2011, Part II. LNCS, vol. 6792, pp. 221–228. Springer, Heidelberg (2011)
  14. Englert, P., Paraschos, A., Peters, J., Deisenroth, M.P.: Model-based imitation learning by probabilistic trajectory matching. In: Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA) (2013)
  15. Ko, J., Klein, D.J.: Gaussian processes and reinforcement learning for identification and control of an autonomous blimp. In: IEEE Intl. Conf. on Robotics and Automation (ICRA) (2007)
  16. Ghavamzadeh, M., Engel, Y.: Bayesian actor-critic algorithms. In: Proceedings of the 24th International Conference on Machine Learning, ICML 2007, pp. 297–304. ACM, New York (2007)
  17. Sugiyama, M., Hachiya, H., Towell, C., Vijayakumar, S.: Geodesic Gaussian kernels for value function approximation. Auton. Robots 25(3), 287–304 (2008)
  18. Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: Proceedings of the 22nd International Conference on Machine Learning, ICML 2005, pp. 201–208. ACM, New York (2005)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jonathan Strahl (1)
  • Timo Honkela (1, 2)
  • Paul Wagner (1)

  1. Department of Information and Computer Science, Aalto University, Aalto, Finland
  2. Department of Modern Languages, University of Helsinki, Helsinki, Finland
