Coaching Robots: Online Behavior Learning from Human Subjective Feedback
This chapter describes Coaching, a novel methodology for agent behavior learning. Coaching is an interactive, iterative learning method in which a human trainer gives subjective evaluations to a robotic agent in real time, and the agent dynamically updates its reward function based on those evaluations as they arrive. We demonstrate that the agent can learn the desired behavior from simple, subjective instructions such as "positive" and "negative". The approach is also effective when a suitable reward function for the learning situation is difficult to determine in advance. We conducted several experiments with a simulated and a real robot arm system, and the results verify the advantage of the proposed method.
Keywords: Reinforcement Learning; Humanoid Robot; Radial Basis Function Network; Target Behavior; Inverted Pendulum
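The loop summarized in the abstract (act, receive a positive or negative evaluation, update the reward function, keep learning) can be illustrated with a toy sketch. This is not the chapter's implementation: it assumes a small discrete chain of states, tabular Q-learning in place of any function approximator, and a scripted stand-in for the human trainer. All names (`trainer_feedback`, `coach`, `greedy_policy`) and all parameter values are hypothetical.

```python
import random

N_STATES = 7        # hypothetical 1-D chain of states 0..6
TARGET = 5          # state the simulated trainer wants the agent to approach
ACTIONS = (-1, +1)  # index 0: step left, index 1: step right


def trainer_feedback(state, next_state):
    """Scripted stand-in for a human coach (an assumption of this sketch):
    +1 ("positive") if the move got closer to TARGET, otherwise -1 ("negative")."""
    return 1 if abs(next_state - TARGET) < abs(state - TARGET) else -1


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda i: values[i])


def coach(episodes=500, steps=20, alpha=0.5, gamma=0.9, eta=0.3, eps=0.2, seed=0):
    rng = random.Random(seed)
    # Reward model r(s, a): starts uninformative and is refined online
    # from the trainer's subjective feedback instead of being hand-designed.
    reward = [[0.0, 0.0] for _ in range(N_STATES)]
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(steps):
            # Epsilon-greedy action selection on the current Q-table.
            a = rng.randrange(2) if rng.random() < eps else argmax(q[s])
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            # 1) The trainer evaluates the move just made.
            f = trainer_feedback(s, s2)
            # 2) The reward model moves toward that evaluation.
            reward[s][a] += eta * (f - reward[s][a])
            # 3) Q-learning update using the learned reward, not a fixed one.
            q[s][a] += alpha * (reward[s][a] + gamma * max(q[s2]) - q[s][a])
            s = s2
    return reward, q


def greedy_policy(q):
    """Preferred step (-1 or +1) in every state under the learned Q-table."""
    return [ACTIONS[argmax(row)] for row in q]


if __name__ == "__main__":
    learned_reward, learned_q = coach()
    print(greedy_policy(learned_q))
```

In this sketch the reward table and the value table are updated in the same step, which mirrors the idea of revising the reward function while learning continues. A real trainer would give feedback sparsely and with delay, which this toy version does not model.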