Coaching Robots: Online Behavior Learning from Human Subjective Feedback

  • Masakazu Hirokawa
  • Kenji Suzuki
Part of the Studies in Computational Intelligence book series (SCI, volume 442)


This chapter describes Coaching, a novel methodology for agent behavior learning. Coaching is an interactive, iterative learning method in which a human trainer gives the robotic agent subjective evaluations in real time, and the agent dynamically updates its reward function based on those evaluations. We demonstrate that the agent can learn the desired behavior from simple, subjective instructions such as "positive" and "negative". The approach is also effective when a suitable reward function is difficult to determine in advance. Experiments with a simulated and a real robot arm system verify the advantage of the proposed method.
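The idea of a reward function that is reshaped online by positive and negative feedback can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `CoachedReward`, the one-dimensional state, the Gaussian radial basis features (suggested only by the chapter's keywords), and the additive update rule with step size `step` are all assumptions made for illustration.

```python
import math


class CoachedReward:
    """Hypothetical reward model r(s) = sum_i w_i * phi_i(s) with Gaussian RBF features."""

    def __init__(self, centers, width=0.5, step=0.5):
        self.centers = list(centers)   # RBF centers over a 1-D state (assumed)
        self.width = width             # shared RBF width (assumed)
        self.step = step               # how strongly one evaluation shifts the reward
        self.weights = [0.0] * len(self.centers)

    def features(self, state):
        # Gaussian activation of each basis function at the given state.
        return [
            math.exp(-((state - c) ** 2) / (2.0 * self.width ** 2))
            for c in self.centers
        ]

    def reward(self, state):
        # Current reward estimate for the state.
        return sum(w * f for w, f in zip(self.weights, self.features(state)))

    def coach(self, state, feedback):
        """Apply one subjective evaluation: +1 (positive) or -1 (negative)."""
        if feedback not in (1, -1):
            raise ValueError("feedback must be +1 or -1")
        # Raise or lower the reward around the state the trainer evaluated.
        for i, f in enumerate(self.features(state)):
            self.weights[i] += self.step * feedback * f


if __name__ == "__main__":
    model = CoachedReward(centers=[-1.0, 0.0, 1.0])
    model.coach(state=1.0, feedback=+1)    # trainer approves behavior near s = 1
    model.coach(state=-1.0, feedback=-1)   # trainer disapproves behavior near s = -1
    print(model.reward(1.0), model.reward(-1.0))
```

In a full learning loop, a reinforcement learner would query `reward` at each step while the trainer calls `coach` whenever they wish to evaluate the current behavior, so the reward changes while learning is in progress.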


Reinforcement Learning; Humanoid Robot; Radial Basis Function Network; Target Behavior; Inverted Pendulum
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Berlin Heidelberg 2013

Authors and Affiliations

  1. Dept. of Intelligent Interaction Technologies, University of Tsukuba, Tsukuba, Japan
  2. Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan
