A feature selection method for a sample-based stochastic policy
Stochastic policy gradient methods have been applied to a variety of robot control tasks, such as a robot's acquisition of motor skills, because they learn well in high-dimensional, continuous feature spaces when combined with heuristics such as motor primitives. However, when such a method is applied to a real-world task, it is difficult to represent the task well through the design of the policy function and the feature space, because sufficient prior knowledge about the task is often lacking. In this research, we propose a method that autonomously extracts a preferred feature space for achieving a task, using a stochastic policy gradient method for a sample-based policy. We apply the method to the control of a linear dynamical system; computer simulation results show that a desirable controller is obtained and that feature selection improves the controller's performance.
Keywords: Feature selection; Stochastic policy gradient; Reinforcement learning
This work was partly supported by JSPS KAKENHI Grant Number 26730136.