Learning the Selection of Actions for an Autonomous Social Robot by Reinforcement Learning Based on Motivations


Abstract

Autonomy is a key issue in robotics, and it is closely related to decision making. Recent research on decision making for social robots has focused on biologically inspired mechanisms. Following this approach, we propose a motivational system for decision making that uses internal stimuli (drives) and external stimuli to learn to choose the right action. Actions are selected from a finite set of skills in order to keep the robot's needs within an acceptable range. The robot uses reinforcement learning to estimate the suitability of every action in each state. The state of the robot is determined by its dominant motivation and by its relation to the objects present in its environment.

The reinforcement learning method exploits a new algorithm called Object Q-Learning. The proposed reduction of the state space, together with the new algorithm's treatment of collateral effects (the relationships between different objects), makes the approach suitable for robots living in real environments.

In this paper, a first implementation of the decision making system and the learning process is presented on a social robot, showing an improvement in the robot's performance. The quality of this performance is assessed by observing the evolution of the robot's wellbeing.
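
As an illustration of the state representation described above, the following is a minimal, generic Q-learning sketch in which the state combines a dominant motivation with an object-related external stimulus. All names, values, and the epsilon-greedy policy are hypothetical, and the sketch does not reproduce the paper's Object Q-Learning handling of collateral effects.

```python
# Minimal, generic Q-learning sketch (hypothetical names and values); the paper's
# Object Q-Learning additionally accounts for collateral effects between objects,
# which is not reproduced here.
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.3, 0.9, 0.1   # learning rate, discount factor, exploration rate

# Q[(state, action)] -> estimated value; state = (dominant_motivation, object_relation)
Q = defaultdict(float)

def choose_action(state, skills):
    """Epsilon-greedy selection over the robot's finite set of skills."""
    if random.random() < EPSILON:
        return random.choice(skills)
    return max(skills, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, skills):
    """Standard one-step Q-learning update of the state-action value."""
    best_next = max(Q[(next_state, a)] for a in skills)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Hypothetical example: dominant motivation "energy" while near a docking station.
state = ("energy", "near_docking_station")
skills = ["dock_and_charge", "wander", "interact"]
action = choose_action(state, skills)
# A plausible reward signal would be the change in the robot's wellbeing
# observed after executing the selected skill.
update(state, action, reward=1.0, next_state=("play", "near_person"), skills=skills)
```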


Author information

Corresponding author

Correspondence to Álvaro Castro-González.

About this article

Cite this article

Castro-González, Á., Malfaz, M. & Salichs, M.A. Learning the Selection of Actions for an Autonomous Social Robot by Reinforcement Learning Based on Motivations. Int J of Soc Robotics 3, 427–441 (2011). https://doi.org/10.1007/s12369-011-0113-z
