DCOB: Action space for reinforcement learning of high DoF robots
- 465 Downloads
Reinforcement learning (RL) for robot control is an important technology for future robots since it enables us to design a robot’s behavior using the reward function. However, RL for high degree-of-freedom robot control is still an open issue. This paper proposes a discrete action space DCOB which is generated from the basis functions (BFs) given to approximate a value function. The remarkable feature is that, by reducing the number of BFs to enable the robot to learn quickly the value function, the size of DCOB is also reduced, which improves the learning speed. In addition, a method WF-DCOB is proposed to enhance the performance, where wire-fitting is utilized to search for continuous actions around each discrete action of DCOB. We apply the proposed methods to motion learning tasks of a simulated humanoid robot and a real spider robot. The experimental results demonstrate outstanding performance.
KeywordsReinforcement learning Action space Motion learning Humanoid robot Crawling
- Asada, M., Noda, S., & Hosoda, K. (1996). Action-based sensor space categorization for robot learning. In The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS ’96) (pp. 1502–1509).Google Scholar
- Baird, L.C., & Klopf, A.H. (1993). Reinforcement learning with high-dimensional, continuous actions. Technical Report WL-TR-93-1147, Wright Laboratory, Wright-Patterson Air Force Base.Google Scholar
- Gaskett, C., Fletcher, L., & Zelinsky, A. (2000). Reinforcement learning for a vision based mobile robot. In The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’00).Google Scholar
- Ijspeert, A., & Schaal, S. (2002). Learning attractor landscapes for learning motor primitives. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (pp. 1547–1554). Cambridge: MIT Press.Google Scholar
- Kimura, H., Yamashita, T., & Kobayashi, S. (2001). Reinforcement learning of walking behavior for a four-legged robot. In Proceedings of the 40th IEEE Conference on Decision and Control. Portugal.Google Scholar
- Kober, J., & Peters, J. (2009). Learning motor primitives for robotics. In The IEEE International Conference on Robotics and Automation (ICRA’09) (pp. 2509–2515).Google Scholar
- Loch, J., & Singh, S. (1998). Using eligibility traces to find the best memoryless policy in partially observable markov decision processes. In Proceedings of the Fifteenth International Conference on Machine Learning. (pp. 323–331).Google Scholar
- Matsubara, T., Morimoto, J., Nakanishi, J., Hyon, S., Hale, J.G., & Cheng, G. (2007). Learning to acquire whole-body humanoid CoM movements to achieve dynamic tasks. In The IEEE International Conference on Robotics and Automation (ICRA’07). (pp. 2688–2693). doi:10.1109/ROBOT.2007.363871.
- Mcgovern, A., & Barto, A.G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In The Eighteenth International Conference on Machine Learning. (pp. 361–368). San Mateo, CA: Morgan Kaufmann.Google Scholar
- Menache, I., Mannor, S., & Shimkin, N. (2002). Q-cut - dynamic discovery of sub-goals in reinforcement learning. In ECML ’02: Proceedings of the 13th European Conference on Machine Learning (pp. 295–306). London: Springer.Google Scholar
- Morimoto, J., & Doya, K. (1998). Reinforcement learning of dynamic motor sequence: Learning to stand up. In The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’98). (pp 1721–1726).Google Scholar
- Peng, J., & Williams, R. J. (1994). Incremental multi-step Q-learning. In International Conference on Machine Learning. (pp. 226–232).Google Scholar
- Peters, J., Vijayakumar, S., & Schaal, S. (2003). Reinforcement learning for humanoid robotics. In IEEE-RAS International Conference on Humanoid Robots. Karlsruhe, Germany.Google Scholar
- Sedgewick, R., & Wayne, K. (2011). Algorithms. Boston: Addison-Wesley.Google Scholar
- Stolle, M. (2004). Automated discovery of options in reinforcement learning (Master’s thesis, McGill University).Google Scholar
- Sutton, R., & Barto, A. (1998). Reinforcement Learning: An Introduction. Cambridge: MIT Press. Retrieved from http://citeseer.ist.psu.edu/sutton98reinforcement.html.
- Takahashi, Y., & Asada, M. (2003). Multi-layered learning systems for vision-based behavior acquisition of a real mobile robot. In Proceedings of SICE Annual Conference 2003 (pp. 2937–2942).Google Scholar
- Tham, C. K., & Prager, R. W. (1994). A modular Q-learning architecture for manipulator task decomposition. In The Eleventh International Conference on Machine Learning (pp. 309–317).Google Scholar
- Theodorou, E., Buchli, J., & Schaal, S. (2010). Reinforcement learning of motor skills in high dimensions: A path integral approach. In The IEEE International Conference on Robotics and Automation (ICRA’10) (pp. 2397–2403). doi:10.1109/ROBOT.2010.5509336.
- Tsitsiklis, J. N., & Roy, B. V. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674–690.Google Scholar
- Uchibe, E., Doya, K. (2004). Competitive-cooperative-concurrent reinforcement learning with importance sampling. In The International Conference on Simulation of Adaptive Behavior: From Animals and Animats (pp. 287–296).Google Scholar
- Yamaguchi, A. (2011). Highly modularized learning system for behavior acquisition of functional robots. Ph.D. Thesis, Nara Institute of Science and Technology, Japan.Google Scholar