Design of a Control Architecture for Habit Learning in Robots

Renaudo, Erwan; Girard, Benoît; Chatila, Raja; Khamassi, Mehdi

doi:10.1007/978-3-319-09435-9_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8608)

Included in the following conference series:

Conference on Biomimetic and Biohybrid Systems

Abstract

Research in psychology and neuroscience has identified multiple decision systems in mammals, which allow control of behavior to shift, with training and growing familiarity with the environment, from a goal-directed system to a habitual system. The former relies on explicitly estimating the future consequences of actions by planning towards a particular goal, which lengthens decision time but allows rapid adaptation to changes in the environment. The latter learns to assign values to particular stimulus-response associations, leading to quick, reactive decision-making but slow relearning in response to environmental changes. Computational neuroscience models have formalized this as a coordination of model-based and model-free reinforcement learning. Drawing on this, we hypothesize that such coordination could enable robots to learn habits, detect when these habits are appropriate, and thus avoid long and costly computations by the planning system. We illustrate this in a simple repetitive cube-pushing task on a conveyor belt, where a speed-accuracy trade-off is required. We show that the two systems have complementary advantages in this task, which can be combined to improve performance.
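
The coordination described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture or task: the five-state chain, the function names (`step`, `plan`, `run`), the parameter values, and the arbitration rule (trust the habit once a running average of the absolute temporal-difference error falls below a threshold) are all assumptions made for illustration. The planner is also handed the true transition model, which keeps the example deterministic. The model-free side is standard tabular Q-learning, and the model-based side is value iteration.

```python
GOAL = 4                      # toy chain of states 0..4 (hypothetical task)
ACTIONS = (0, 1)              # 0 = stay, 1 = move right
GAMMA, ALPHA = 0.9, 0.5       # discount factor, learning rate
THRESHOLD = 0.05              # arbitration threshold (assumed, not from the paper)


def step(s, a):
    """Deterministic toy environment: reward 1.0 on entering the goal."""
    s2 = min(s + a, GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)


def plan(s, sweeps=20):
    """Goal-directed choice: value iteration over the known model, then greedy."""
    v = [0.0] * (GOAL + 1)    # v[GOAL] stays 0.0 because the goal is terminal
    for _ in range(sweeps):
        for x in range(GOAL):
            v[x] = max(r + GAMMA * v[x2]
                       for x2, r in (step(x, a) for a in ACTIONS))
    return max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * v[step(s, a)[0]])


def run(episodes=40):
    """Return, per episode, (planned decisions, habitual decisions)."""
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    unc = [1.0] * (GOAL + 1)  # running mean of |TD error| per state, starts high
    log = []
    for _ in range(episodes):
        s, planned, habitual = 0, 0, 0
        while s != GOAL:
            if unc[s] < THRESHOLD:        # habit looks reliable: cheap lookup
                a = max(ACTIONS, key=lambda b: q[(s, b)])
                habitual += 1
            else:                         # habit not yet trusted: costly planning
                a = plan(s)
                planned += 1
            s2, r = step(s, a)
            td = r + GAMMA * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)]
            q[(s, a)] += ALPHA * td       # model-free Q-learning update
            unc[s] = 0.8 * unc[s] + 0.2 * abs(td)
            s = s2
        log.append((planned, habitual))
    return log
```

Under these assumptions, `run()` reports that all four decisions of the first episode come from the planner and all four decisions of the last episode come from the habitual lookup, which mirrors the shift from goal-directed to habitual control as the task becomes familiar. The sketch does not model the environmental changes or the speed-accuracy trade-off of the conveyor-belt task.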

Author information

Authors and Affiliations

Institut des Systèmes Intelligents et de Robotique, Sorbonne Universités, UPMC Univ Paris 06, UMR 7222, F-75005, Paris, France
Erwan Renaudo, Benoît Girard, Raja Chatila & Mehdi Khamassi
Institut des Systèmes Intelligents et de Robotique, CNRS, UMR 7222, F-75005, Paris, France
Erwan Renaudo, Benoît Girard, Raja Chatila & Mehdi Khamassi

Editor information

Editors and Affiliations

Universitat Pompeu Fabra, Barcelona, Spain
Armin Duff
University of Bristol, UK
Nathan F. Lepora
Universitat Pompeu Fabra, Barcelona, Spain
Anna Mura
University of Sheffield, Sheffield, UK
Tony J. Prescott
Universitat Pompeu Fabra and Catalan Institution for Research and Advanced Studies, Barcelona, Spain
Paul F. M. J. Verschure

About this paper

Cite this paper

Renaudo, E., Girard, B., Chatila, R., Khamassi, M. (2014). Design of a Control Architecture for Habit Learning in Robots. In: Duff, A., Lepora, N.F., Mura, A., Prescott, T.J., Verschure, P.F.M.J. (eds) Biomimetic and Biohybrid Systems. Living Machines 2014. Lecture Notes in Computer Science, vol 8608. Springer, Cham. https://doi.org/10.1007/978-3-319-09435-9_22

DOI: https://doi.org/10.1007/978-3-319-09435-9_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09434-2
Online ISBN: 978-3-319-09435-9
eBook Packages: Computer Science, Computer Science (R0)