Artificial Life and Robotics, Volume 19, Issue 3, pp 251–257

A feature selection method for a sample-based stochastic policy

  • Jumpei Yamanaka
  • Yutaka Nakamura
  • Hiroshi Ishiguro
Original Article


Stochastic policy gradient methods have been applied to a variety of robot control tasks, such as the acquisition of motor skills, because, when combined with heuristics such as motor primitives, they can learn effectively in high-dimensional, continuous feature spaces. However, when one of these methods is applied to a real-world task, it is difficult to design a policy function and a feature space that represent the task well, because sufficient prior knowledge about the task is lacking. In this research, we propose a method that autonomously extracts a feature space suited to achieving a task, using a stochastic policy gradient method for a sample-based policy. We apply our method to the control of a linear dynamical system; computer simulation results show that a desirable controller is obtained and that the feature selection improves the controller's performance.
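To make the basic ingredient concrete, the following is a minimal sketch of a stochastic policy gradient on a toy linear dynamical system. It is a generic REINFORCE-style likelihood-ratio estimator with a Gaussian policy whose mean is linear in the features; it is not the authors' sample-based policy and does not perform their feature selection. The system constants `A`, `B`, `R`, the horizon, the exploration noise `SIGMA`, and the hypothetical feature map `features` are illustrative assumptions.

```python
import random

# Illustrative scalar linear system (assumed values, not from the article):
#   x_{t+1} = A * x_t + B * u_t,  reward r_t = -(x_t**2 + R * u_t**2)
A, B, R = 0.9, 1.0, 0.1
HORIZON = 10
SIGMA = 0.5  # fixed standard deviation of the Gaussian exploration noise


def features(x):
    """Hypothetical feature map phi(x): the state itself plus a constant bias."""
    return [x, 1.0]


def rollout(theta, rng):
    """Run one episode with u ~ N(theta . phi(x), SIGMA**2).

    Returns the episode return and the score function
    sum_t d/dtheta log pi(u_t | x_t) accumulated over the episode.
    """
    x = rng.uniform(-1.0, 1.0)
    ret = 0.0
    score = [0.0] * len(theta)
    for _ in range(HORIZON):
        phi = features(x)
        mean = sum(w * p for w, p in zip(theta, phi))
        u = rng.gauss(mean, SIGMA)
        # d/dtheta log N(u; theta . phi, SIGMA^2) = (u - mean) / SIGMA^2 * phi
        for i, p in enumerate(phi):
            score[i] += (u - mean) / SIGMA**2 * p
        ret -= x * x + R * u * u
        x = A * x + B * u
    return ret, score


def reinforce(theta, iters=200, batch=100, lr=0.01, seed=0):
    """Plain gradient ascent on the likelihood-ratio (REINFORCE) estimate."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        episodes = [rollout(theta, rng) for _ in range(batch)]
        # Batch-mean return as a baseline to reduce variance; reusing the
        # same batch adds a small O(1/batch) bias that this sketch ignores.
        baseline = sum(ret for ret, _ in episodes) / batch
        grad = [0.0] * len(theta)
        for ret, score in episodes:
            for i, s in enumerate(score):
                grad[i] += (ret - baseline) * s / batch
        theta = [w + lr * g for w, g in zip(theta, grad)]
    return theta


def average_return(theta, n=500, seed=1):
    """Monte Carlo estimate of the expected return of the stochastic policy."""
    rng = random.Random(seed)
    return sum(rollout(theta, rng)[0] for _ in range(n)) / n


if __name__ == "__main__":
    theta = reinforce([0.0, 0.0])
    print("learned [state gain, bias]:", theta)
    print("return before:", average_return([0.0, 0.0]),
          "after:", average_return(theta))
```

Starting from zero weights, the learned state gain should become negative (stabilizing feedback) and the estimated return should rise. For these assumed constants the infinite-horizon LQR gain is about -0.82, a useful reference point, although the finite horizon and the exploration noise mean the learned gain need not match it exactly.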


Keywords: Feature selection · Stochastic policy gradient · Reinforcement learning



This work was partly supported by JSPS KAKENHI Grant Number 26730136.


Copyright information

© ISAROB 2014

Authors and Affiliations

  • Jumpei Yamanaka (1)
  • Yutaka Nakamura (1)
  • Hiroshi Ishiguro (1)
  1. Toyonaka, Japan
