A Model of Reaching that Integrates Reinforcement Learning and Population Encoding of Postures

  • Dimitri Ognibene
  • Angelo Rega
  • Gianluca Baldassarre
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4095)


When monkeys tackle novel, complex behavioral tasks by trial and error, they select actions from repertoires of sensorimotor primitives, which lets them search for solutions in a space coarser than the space of fine movements. Neuroscientific findings suggest that upper-limb sensorimotor primitives might be encoded in premotor cortex in terms of the final goal postures they pursue. Previous work by the authors reproduced these results in a model based on the idea that cortical pathways learn sensorimotor primitives while the basal ganglia learn to assemble and trigger them to pursue complex reward-based goals. This paper extends that model in three directions: (a) it uses a Kohonen network to create a neural map with population encoding of postural primitives; (b) it proposes an actor-critic reinforcement learning algorithm capable of learning to select those primitives in a biologically plausible fashion (i.e., through a dynamic competition between postures); (c) it proposes a procedure to pre-train the actor to select promising primitives when tackling novel reinforcement learning tasks. Tests on a task used to study monkeys learning reaching-action sequences show that the model is computationally sound and capable of learning to select sensorimotor primitives from the continuous space of postures on the basis of their population encoding.
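To make point (a) concrete, the following is a minimal sketch of how a Kohonen map could provide a population code over arm postures. It is not the authors' implementation: the function names (`train_som`, `encode`, `decode`), the two-joint posture representation, the grid size, the Gaussian tuning width, and the weighted-average read-out are all assumptions chosen for illustration.

```python
import numpy as np

def train_som(postures, grid=(6, 6), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a 2-D Kohonen map to posture vectors (assumed: joint angles in radians)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Position of every unit on the map, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each unit's preferred posture from a random training sample.
    weights = postures[rng.integers(0, len(postures), size=rows * cols)].copy()
    n_steps = epochs * len(postures)
    step = 0
    for _ in range(epochs):
        for x in postures[rng.permutation(len(postures))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                      # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighbourhood shrinks to 0.5
            winner = np.argmin(np.sum((weights - x) ** 2, axis=1))
            d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Move the winner and its map neighbours towards the sample.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def encode(posture, weights, width=0.3):
    """Population code: Gaussian activation of each unit around its preferred posture."""
    d2 = np.sum((weights - posture) ** 2, axis=1)
    act = np.exp(-(d2 - d2.min()) / (2.0 * width ** 2))  # shift avoids underflow to 0
    return act / act.sum()

def decode(activation, weights):
    """Read out a posture as the activation-weighted mean of preferred postures."""
    return activation @ weights

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical training set: random shoulder/elbow angles in [0, pi/2].
    postures = rng.uniform(0.0, np.pi / 2, size=(500, 2))
    weights = train_som(postures)
    target = np.array([0.8, 0.8])
    print("decoded posture:", decode(encode(target, weights), weights))
```

In a sketch like this, an actor would not output joint angles directly; it would bias the activation of map units, and the posture sent to the controller would be read out from the resulting population activity.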


Premotor Cortex, Adulthood Phase, Posture Controller, Dynamic Competition, Accumulator Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dimitri Ognibene (1)
  • Angelo Rega (1)
  • Gianluca Baldassarre (1)
  1. Laboratory of Autonomous Robotics and Artificial Life, Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche (LARAL-ISTC-CNR), Roma, Italy
