Abstract
Reinforcement learning (RL) has attracted considerable attention as a technique for realizing computational intelligence, such as adaptive and autonomous decentralized systems. In practice, however, RL is often difficult to apply. One source of this difficulty is designing a suitable action space for an agent, which must satisfy two conflicting requirements: (i) preserving the characteristics (or structure) of the original search space as far as possible, so that strategies close to the optimum can be found, and (ii) reducing the search space as far as possible, so that learning is expedited. To design a suitable action space adaptively, in this article we propose an RL model with switching controllers, based on Q-learning and an actor-critic, that mimics the process of an infant’s motor development, in which gross motor skills develop before fine motor skills. A method for switching between the controllers is then constructed by introducing and referring to the “entropy.” Finally, computational experiments on a path-planning problem with a continuous action space confirm the validity and potential of the proposed method.
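The switching idea described above can be illustrated with a minimal sketch. The details below are assumptions for illustration, not the authors’ exact formulation: we take the entropy of a softmax (Boltzmann) action-selection policy over the coarse controller’s Q-values as the switching signal, and hand control to the fine-grained actor-critic once that entropy drops below a threshold (i.e., once the coarse Q-learning controller has committed to a region of the action space). The function names and threshold value are hypothetical.

```python
import math

def softmax(q_values, tau=1.0):
    """Boltzmann action-selection probabilities for a list of Q-values."""
    m = max(q_values)  # subtract max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def policy_entropy(probs):
    """Shannon entropy (nats) of an action-selection distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_controller(q_values, threshold=0.5, tau=1.0):
    """Hypothetical switching rule: stay with the 'coarse' (Q-learning)
    controller while the policy is still exploring (high entropy), and
    switch to the 'fine' (actor-critic) controller once the policy has
    become decisive (entropy below threshold)."""
    h = policy_entropy(softmax(q_values, tau))
    return "fine" if h < threshold else "coarse"
```

For example, with three indistinguishable actions the entropy is ln 3 ≈ 1.10, so the coarse controller is kept; once one action clearly dominates, the entropy collapses toward zero and control passes to the actor-critic.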
This work was presented in part at the 15th International Symposium on Artificial Life and Robotics, Oita, Japan, February 4–6, 2010
Nagayoshi, M., Murao, H. & Tamaki, H. A reinforcement learning with switching controllers for a continuous action space. Artif Life Robotics 15, 97–100 (2010). https://doi.org/10.1007/s10015-010-0772-0