Journal of Intelligent & Robotic Systems, Volume 83, Issue 3–4, pp 393–408

Contextual Policy Search for Linear and Nonlinear Generalization of a Humanoid Walking Controller

  • Abbas Abdolmaleki
  • Nuno Lau
  • Luis Paulo Reis
  • Jan Peters
  • Gerhard Neumann


We investigate learning of flexible robot locomotion controllers, i.e., controllers that are applicable to multiple contexts, for example different walking speeds, various terrain slopes, or other physical properties of the robot. In our experiments, the context is the desired linear walking speed of the gait. Current approaches for learning control parameters of biped locomotion controllers are typically only applicable to a single context, for example learning a gait with the highest speed, the lowest energy consumption, or a combination of both. The question of our research is: how can we obtain a flexible walking controller that controls the robot (near) optimally for many different contexts? We achieve the desired flexibility by applying the recently developed contextual relative entropy policy search (REPS) method, which generalizes the robot walking controller to different contexts, where a context is described by a real-valued vector. In this paper we also extend the contextual REPS algorithm to learn a nonlinear policy over the contexts instead of a linear one; we call this extension RBF-REPS, as it uses radial basis functions. To validate our method, we perform three simulation experiments, including a walking experiment using a simulated NAO humanoid robot. The robot learns a policy to choose the controller parameters for a continuous set of forward walking speeds.
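The idea of a policy over contexts with radial basis function features can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Gaussian search distribution whose mean is linear in RBF features of the context, and it replaces the REPS dual optimization with a simple heuristic (a least-squares baseline and a temperature set to the standard deviation of the advantages). The target function `sin(3 s)`, the RBF centers, the bandwidth, and all function names are hypothetical choices made only for this toy problem.

```python
import numpy as np


def rbf_features(contexts, centers, bandwidth):
    """Map contexts (N, d) to features (N, K + 1): a bias plus K Gaussian RBFs."""
    contexts = np.atleast_2d(contexts)
    sq_dist = ((contexts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-0.5 * sq_dist / bandwidth**2)
    return np.hstack([np.ones((contexts.shape[0], 1)), phi])


def weighted_policy_update(features, params, weights, ridge=1e-6):
    """Weighted ridge regression for the mean (params ~ features @ gain)
    and weighted covariance of the residuals."""
    w = weights / weights.sum()
    ftw = features.T * w  # shape (K + 1, N)
    lhs = ftw @ features + ridge * np.eye(features.shape[1])
    gain = np.linalg.solve(lhs, ftw @ params)
    residual = params - features @ gain
    cov = (residual.T * w) @ residual
    return gain, cov


def run_toy_experiment(seed=0, iterations=20, n_samples=200):
    """Learn a context-dependent parameter on a hypothetical 1-D toy task."""
    rng = np.random.default_rng(seed)
    centers = np.linspace(0.0, 1.0, 5).reshape(-1, 1)  # assumed RBF centers
    bandwidth = 0.25                                   # assumed RBF width

    def target(s):
        # Hypothetical "optimal parameter" as a nonlinear function of context.
        return np.sin(3.0 * s)

    gain = np.zeros((centers.shape[0] + 1, 1))  # initial policy mean is zero
    cov = np.eye(1)                             # initial exploration covariance

    test_s = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
    test_phi = rbf_features(test_s, centers, bandwidth)

    def rmse(g):
        return float(np.sqrt(np.mean((test_phi @ g - target(test_s)) ** 2)))

    initial_rmse = rmse(gain)
    for _ in range(iterations):
        # Sample contexts, then parameters from the current contextual policy.
        s = rng.uniform(0.0, 1.0, size=(n_samples, 1))
        phi = rbf_features(s, centers, bandwidth)
        chol = np.linalg.cholesky(cov + 1e-12 * np.eye(cov.shape[0]))
        noise = rng.standard_normal((n_samples, cov.shape[0])) @ chol.T
        theta = phi @ gain + noise
        reward = -((theta - target(s)) ** 2).sum(axis=1)

        # Heuristic stand-in for the REPS dual: least-squares baseline over
        # the context features and a temperature tied to the advantage spread.
        baseline = phi @ np.linalg.lstsq(phi, reward, rcond=None)[0]
        advantage = reward - baseline
        eta = advantage.std() + 1e-12
        weights = np.exp((advantage - advantage.max()) / eta)

        gain, cov = weighted_policy_update(phi, theta, weights)

    return initial_rmse, rmse(gain), gain


if __name__ == "__main__":
    before, after, _ = run_toy_experiment()
    print(f"RMSE of policy mean before: {before:.3f}, after: {after:.3f}")
```

Dropping the RBF columns and keeping only the bias and the raw context as features would give the linear-in-context variant of the same sketch, which cannot represent a nonlinear mapping such as the toy target used here.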


Keywords: Learning humanoids robot locomotions; Generalizing robot skills; Stochastic search; Contextual relative entropy policy search; Nonlinear policies; Nao robot





Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Abbas Abdolmaleki (1, 2, 3)
  • Nuno Lau (1)
  • Luis Paulo Reis (2, 3)
  • Jan Peters (4, 5)
  • Gerhard Neumann (6)

  1. DETI / IEETA, University of Aveiro, Aveiro, Portugal
  2. DSI, University of Minho, Guimarães, Portugal
  3. LIACC, University of Porto, Porto, Portugal
  4. IAS, TU Darmstadt, Darmstadt, Germany
  5. MPI for Intelligent Systems, Stuttgart, Germany
  6. CLAS, TU Darmstadt, Darmstadt, Germany
