Emergence of Safe Behaviours with an Intrinsic Reward
This paper explores the idea that a robot can learn safe behaviors without prior knowledge of its environment or the task at hand, using an intrinsic motivation to reverse its actions. Our general idea is that if the robot learns to reverse its actions, all behaviors that emerge from this principle are intrinsically safe. We validate this idea with experiments that benchmark the performance of obstacle avoidance behavior. We compare our algorithm, which is based on an abstract intrinsic reward, with a Q-learning algorithm for obstacle avoidance that is based on an external reward signal. Finally, we demonstrate that the safety of learning can be increased further by first training the robot in a simulator using the intrinsic reward and then running the test with the real robot in the real environment.
The experimental results show that the performance of the proposed algorithm is on average only 5-10% lower than that of the Q-learning algorithm. A physical robot using the knowledge obtained in simulation performs 10% worse in the real world than in simulation. However, after a short learning period its success rate reaches that of the physically trained robot. We interpret this as evidence confirming the hypothesis that our learning algorithm can be used to teach safe behaviors to a robot.
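The reversibility principle described above can be illustrated with a small sketch. It is not the authors' implementation: the 1-D corridor, the two-action set, the `REVERSE` table, and the function names are all assumptions made for illustration, and the update is a plain running average rather than full Q-learning. The intrinsic reward here is the negative distance between the starting state and the state reached after an action followed by its reverse, so bumping into a wall (which cannot be undone) is penalised without any explicit notion of "obstacle".

```python
# Illustrative sketch only: a hypothetical 1-D corridor, not the paper's setup.
N = 5                                   # cells 0..4, walls beyond both ends
ACTIONS = {"fwd": +1, "back": -1}       # assumed action set
REVERSE = {"fwd": "back", "back": "fwd"}


def step(pos, action):
    """Move one cell; bumping into a wall leaves the position unchanged."""
    return min(max(pos + ACTIONS[action], 0), N - 1)


def intrinsic_reward(pos, action):
    """Penalise actions whose reverse does not restore the start state."""
    after = step(pos, action)
    undone = step(after, REVERSE[action])
    return -abs(undone - pos)           # 0 when fully reversible


def train(sweeps=20, alpha=0.5):
    """Running-average value estimate per (state, action) pair."""
    q = {(p, a): 0.0 for p in range(N) for a in ACTIONS}
    for _ in range(sweeps):
        for p in range(N):
            for a in ACTIONS:
                r = intrinsic_reward(p, a)
                q[(p, a)] += alpha * (r - q[(p, a)])
    return q


def greedy(q, pos):
    """Pick the action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q[(pos, a)])


Q = train()
print(greedy(Q, 0), greedy(Q, N - 1))   # prints: fwd back
```

In this toy setting the learned policy steers away from both walls purely because wall-ward moves are irreversible, which mirrors the abstract's claim that obstacle avoidance can emerge from the intrinsic reward alone.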
Keywords: Intrinsic Motivation, Obstacle Avoidance, Real Robot, Reverse Action, Safe Behaviour