Design of a Control Architecture for Habit Learning in Robots

  • Erwan Renaudo
  • Benoît Girard
  • Raja Chatila
  • Mehdi Khamassi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8608)

Abstract

Research in psychology and neuroscience has identified multiple decision systems in mammals, which allow the control of behavior to shift, with training and familiarity with the environment, from a goal-directed system to a habitual one. The former relies on the explicit estimation of the future consequences of actions through planning towards a particular goal, which lengthens decision time but produces rapid adaptation to changes in the environment. The latter learns to associate values with particular stimulus-response pairs, leading to quick reactive decision-making but slow relearning in response to environmental changes. Computational neuroscience models have formalized this as a coordination of model-based and model-free reinforcement learning. Drawing on this inspiration, we hypothesize that such a coordination could enable robots to learn habits, to detect when these habits are appropriate, and thus to avoid the long and costly computations of the planning system. We illustrate this in a simple repetitive cube-pushing task on a conveyor belt, where a speed-accuracy trade-off is required. We show that the two systems have complementary advantages in this task, and that combining them improves performance.
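
The coordination described above can be sketched concretely. The following toy Python program is not the authors' architecture: the chain task standing in for the cube-pushing setup, the class names, the familiarity-based hand-off rule, and all parameters are illustrative assumptions. It pairs a model-free "habitual" expert (Q-learning over cached stimulus-response values) with a model-based "goal-directed" expert that plans by value iteration on a learned transition model, handing control to the cheaper habitual expert once a state has become familiar.

```python
# Minimal sketch of the dual-system idea, assuming a toy chain task in
# place of the cube-pushing setup. Class names, the familiarity-based
# hand-off rule, and all parameters are illustrative assumptions, not
# details taken from the paper.
import random
from collections import defaultdict

N_STATES, N_ACTIONS = 5, 2  # toy chain; actions: 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1


def argmax_random_tie(values):
    best = max(values)
    return random.choice([a for a, v in enumerate(values) if v == best])


class HabitualExpert:
    """Model-free system: caches stimulus-response values via Q-learning."""

    def __init__(self):
        self.q = defaultdict(float)  # (state, action) -> cached value

    def update(self, s, a, r, s2):
        best_next = max(self.q[(s2, b)] for b in range(N_ACTIONS))
        self.q[(s, a)] += ALPHA * (r + GAMMA * best_next - self.q[(s, a)])

    def propose(self, s):
        # Cheap reactive lookup: no planning, hence fast decisions.
        return argmax_random_tie([self.q[(s, a)] for a in range(N_ACTIONS)])


class GoalDirectedExpert:
    """Model-based system: learns a world model and plans over it."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s2: n}
        self.reward = defaultdict(float)                     # (s, a) -> mean r
        self.visits = defaultdict(int)

    def update(self, s, a, r, s2):
        self.counts[(s, a)][s2] += 1
        self.visits[(s, a)] += 1
        self.reward[(s, a)] += (r - self.reward[(s, a)]) / self.visits[(s, a)]

    def q_value(self, s, a, v):
        n = self.visits[(s, a)]
        if n == 0:
            return 0.0
        expected = sum(c / n * v[s2] for s2, c in self.counts[(s, a)].items())
        return self.reward[(s, a)] + GAMMA * expected

    def propose(self, s, sweeps=20):
        # Costly step: value iteration on the learned model -- the planning
        # time that a well-placed habit lets the robot skip.
        v = [0.0] * N_STATES
        for _ in range(sweeps):
            v = [max(self.q_value(st, a, v) for a in range(N_ACTIONS))
                 for st in range(N_STATES)]
        return argmax_random_tie([self.q_value(s, a, v) for a in range(N_ACTIONS)])


def arbitrate(s, habitual, goal_directed, familiarity, threshold=10):
    # Naive hand-off rule (our assumption, not the paper's criterion):
    # trust the cheap habitual expert once the state is familiar enough.
    expert = habitual if familiarity[s] >= threshold else goal_directed
    return expert.propose(s)


def env_step(s, a):
    # Deterministic chain; reaching the last state yields the reward.
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


habitual, goal_directed = HabitualExpert(), GoalDirectedExpert()
familiarity = defaultdict(int)
s = 0
for _ in range(2000):
    if random.random() < EPSILON:       # keep exploring so habits can form
        a = random.randrange(N_ACTIONS)
    else:
        a = arbitrate(s, habitual, goal_directed, familiarity)
    s2, r = env_step(s, a)
    habitual.update(s, a, r, s2)        # both systems learn from every step
    goal_directed.update(s, a, r, s2)
    familiarity[s] += 1
    s = 0 if r > 0 else s2              # restart the episode at the goal

print("habitual policy:", [habitual.propose(st) for st in range(N_STATES)])
```

The raw visit count used here is only the crudest possible arbitration signal; a real architecture would replace it with a principled criterion, for instance comparing the two systems' uncertainty or explicitly trading decision speed against accuracy, as the abstract suggests.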

Keywords

Adaptive Behaviour · Habit Learning · Reinforcement Learning · Robotic Architecture

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Erwan Renaudo 1,2
  • Benoît Girard 1,2
  • Raja Chatila 1,2
  • Mehdi Khamassi 1,2
  1. Institut des Systèmes Intelligents et de Robotique, Sorbonne Universités, UPMC Univ Paris 06, UMR 7222, Paris, France
  2. Institut des Systèmes Intelligents et de Robotique, CNRS, UMR 7222, Paris, France
