Reinforcement Learning Algorithm with CTRNN in Continuous Action Space

  • Hiroaki Arie
  • Jun Namikawa
  • Tetsuya Ogata
  • Jun Tani
  • Shigeki Sugano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4232)


There are some difficulties in applying traditional reinforcement learning algorithms to robot motion control tasks, because most algorithms are concerned with discrete actions and rely on the assumption that the state is completely observable. This paper addresses these two problems by combining a reinforcement learning algorithm with a CTRNN learning algorithm. We carried out an experiment on the pendulum swing-up task without rotational speed information. It is shown that the rotational speed, which is treated as a hidden state, is estimated and encoded in the activation of a context neuron. As a result, the task is accomplished within several hundred trials using the proposed algorithm.
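The idea can be pictured as a CTRNN controller whose recurrent context units must encode the unobserved rotational speed while its output produces a continuous torque for the swing-up. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class name, network sizes, time constants, toy pendulum dynamics, and the omitted reward-driven weight update are all illustrative.

```python
import numpy as np

# Minimal illustrative sketch (not the authors' implementation): a discrete-time
# Euler approximation of a CTRNN controller for pendulum swing-up where only the
# angle is observed; the recurrent context units must estimate the hidden
# rotational speed. Sizes, constants, and the toy dynamics are assumptions.

class CTRNNController:
    def __init__(self, n_obs=2, n_ctx=4, tau=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.tau = tau                                          # context time constant
        self.W_ctx = rng.normal(0.0, 0.1, (n_ctx, n_obs + n_ctx))
        self.W_out = rng.normal(0.0, 0.1, (1, n_ctx))
        self.u = np.zeros(n_ctx)                                # internal context state

    def step(self, obs, dt=0.05):
        """One control step; obs contains the angle only, not the speed."""
        x = np.concatenate([obs, np.tanh(self.u)])
        self.u += dt * (-self.u + self.W_ctx @ x) / self.tau    # leaky integration
        return float(np.tanh(self.W_out @ np.tanh(self.u))[0])  # torque in [-1, 1]

# One trial on a toy pendulum (theta = 0 is upright); the network never sees omega.
ctrl = CTRNNController()
theta, omega, total_reward = np.pi, 0.0, 0.0
for t in range(200):
    action = ctrl.step(np.array([np.cos(theta), np.sin(theta)]))
    omega += 0.05 * (9.8 * np.sin(theta) + 2.0 * action)        # toy dynamics
    theta += 0.05 * omega
    total_reward += np.cos(theta)                               # high when upright
# A reward-driven update of W_ctx and W_out (the combined reinforcement learning
# and CTRNN training described in the paper) would follow here; it is omitted.
```

In the paper, it is this combined training, rather than the fixed random weights above, that leads the context neuron's activation to encode the hidden rotational speed; the sketch only illustrates the control loop in which the speed is never observed.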


Keywords: Discrete Action · Reinforcement Learning Algorithm · Total Reward · Reinforcement Learning Method · Complete Observability





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hiroaki Arie (1)
  • Jun Namikawa (3)
  • Tetsuya Ogata (2)
  • Jun Tani (3)
  • Shigeki Sugano (1)
  1. Department of Mechanical Engineering, Waseda University, Tokyo, Japan
  2. Graduate School of Informatics, Kyoto University, Kyoto, Japan
  3. RIKEN Brain Science Institute, Laboratory for Behavior and Dynamic Cognition, Saitama, Japan
