Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot

  • Yutaka Nakamura
  • Takeshi Mori
  • Shin Ishii
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3242)


Motivated by the view that animals’ rhythmic movements, such as locomotion, are controlled by neural circuits called central pattern generators (CPGs), researchers have studied CPG-based motor control mechanisms. As an autonomous learning framework for a CPG controller, we previously proposed a reinforcement learning (RL) method called the CPG-actor-critic method. In this article, we propose a natural policy gradient learning algorithm for the CPG-actor-critic method and apply it to an automatic control problem using a biped robot simulator. Computer simulations show that our RL method enables the biped robot to walk stably on various terrains.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Yutaka Nakamura (1, 2)
  • Takeshi Mori (2)
  • Shin Ishii (1, 2)
  1. CREST, JST
  2. Nara Institute of Science and Technology, Nara, Japan
