Towards Generating Simulated Walking Motion Using Position Based Deep Reinforcement Learning

  • William Jones
  • Siddhant Gangapurwala
  • Ioannis Havoutis
  • Kazuya Yoshida
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11650)


Much of robotics research aims to develop control solutions that exploit the machine’s dynamics to achieve extraordinarily agile behaviour [1]. This aim, however, is limited by the use of traditional model-based control techniques such as model predictive control and quadratic programming. These solutions are often based on simplified mechanical models, which result in mechanically constrained and inefficient behaviour and thereby limit the agility of the robotic system in development [2]. Treating the control of robotic systems as a reinforcement learning (RL) problem enables the use of model-free algorithms that attempt to learn a policy which maximizes the expected future (discounted) reward without inferring the effects of an executed action on the environment.
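As a minimal illustration of the quantity such a policy is trained to maximize, the sketch below computes the discounted return of a single episode from its per-step rewards. The function name, the default discount factor of 0.99, and the example rewards are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum over t of gamma**t * rewards[t] for one episode.

    `gamma` is the discount factor; 0.99 is an assumed, commonly used value.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last step backwards so that each earlier step
    # multiplies the remaining return by one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical three-step episode with a reward of 1.0 at every step.
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A model-free algorithm estimates the expectation of this return by averaging over sampled episodes, rather than by predicting how each action changes the environment.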


ANYmal, Reinforcement learning, Walking robot, Proximal Policy Optimization



This research is supported by the UKRI and EPSRC (EP/R026084/1, EP/R026173/1, EP/S002383/1) and the EU H2020 project MEMMO (780684). This work has been conducted as part of ANYmal Research, a community to advance legged robotics.


  1. Gangapurwala, S., et al.: Generative adversarial imitation learning for quadrupedal locomotion using unstructured expert demonstrations (2018)
  2. Mastalli, C., et al.: Trajectory and foothold optimization using low-dimensional models for rough terrain locomotion (2017)
  3. Hwangbo, J., et al.: Learning agile and dynamic motor skills for legged robots. Sci. Robot. 4(26), eaau5872 (2019)
  4. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017)
  5. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015)
  6. Rohmer, E., Singh, S.P.N., Freese, M.: V-REP: a versatile and scalable robot simulation framework. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2013)
  7. Hutter, M., et al.: ANYmal - a highly mobile and dynamic quadrupedal robot. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, pp. 38–44 (2016)
  8. Liang, J., et al.: GPU-accelerated robotic simulation for distributed reinforcement learning. CoRL (2018)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • William Jones (1), corresponding author
  • Siddhant Gangapurwala (2)
  • Ioannis Havoutis (2)
  • Kazuya Yoshida (1)
  1. Space Robotics Laboratory, Department of Aerospace Engineering, Tohoku University, Sendai, Japan
  2. Oxford Robotics Institute, Department of Engineering Science, Oxford University, Oxford, UK
