Combining Self-organizing Maps with Mixtures of Experts: Application to an Actor-Critic Model of Reinforcement Learning in the Basal Ganglia

  • Mehdi Khamassi
  • Louis-Emmanuel Martinet
  • Agnès Guillot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4095)


In previous work, we compared several Actor-Critic (AC) architectures implementing dopamine-like reinforcement learning mechanisms in the rat’s basal ganglia, using a reward-seeking task performed in a continuous environment. The complexity of the task requires coordinating several AC submodules, each module being an expert trained on a particular subset of the task. We showed that the classical method, in which the choice of the expert to train at a given time depends on each expert’s performance, suffers from strong limitations. We instead proposed clustering the continuous state space with an ad hoc method, which lacked autonomy and generalization abilities. In the present work, we combine the mixture of experts with self-organizing maps in order to cluster the experts’ responsibility space autonomously. On the one hand, we find that classical Kohonen maps give highly variable results: some task decompositions yield very good and stable reinforcement learning performance, whereas others are ill-suited to the task. Moreover, they require the number of experts to be set a priori. On the other hand, algorithms such as Growing Neural Gas or Growing When Required can choose the number of experts to train autonomously and incrementally. They lead to good performance, although it remains weaker than that of our hand-tuned task decomposition and of the best Kohonen maps we obtained. Finally, we discuss what information could be added to these algorithms, such as knowledge of current behavior, in order to make the task decomposition appropriate to the reinforcement learning process.
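The idea of letting a self-organizing map partition the state space among experts can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional Kohonen map with one unit per expert, hard winner-take-all gating, and linearly decaying learning rate and neighborhood width, none of which are specified in the abstract. The names `train_som`, `nearest`, and `responsibilities` are hypothetical.

```python
import math
import random


def nearest(prototypes, x):
    """Index of the prototype closest to state x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))


def train_som(samples, n_experts, epochs=20, lr0=0.5, seed=0):
    """Fit a 1-D Kohonen map; unit i is the prototype owned by expert i.

    Assumption: the number of experts is fixed a priori, as for the
    classical Kohonen maps discussed in the abstract.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise each prototype on a randomly drawn sample.
    prototypes = [list(rng.choice(samples)) for _ in range(n_experts)]
    sigma0 = max(n_experts / 2.0, 1.0)
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.01)   # shrinking neighborhood
            winner = nearest(prototypes, x)
            for i, w in enumerate(prototypes):
                # Gaussian neighborhood measured along the 1-D map.
                h = math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return prototypes


def responsibilities(prototypes, x):
    """Hard gating: the expert whose prototype wins gets responsibility 1."""
    winner = nearest(prototypes, x)
    return [1.0 if i == winner else 0.0 for i in range(len(prototypes))]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 2-D continuous states sampled from the unit square.
    states = [(rng.random(), rng.random()) for _ in range(200)]
    protos = train_som(states, n_experts=4)
    print(responsibilities(protos, (0.2, 0.8)))
```

In a mixture of Actor-Critic experts, the returned responsibility vector would decide which expert acts and is trained in the current state. A growing variant in the spirit of Growing Neural Gas or Growing When Required would add prototypes incrementally instead of fixing `n_experts` in advance.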


Keywords (added by machine, not by the authors): Reinforcement Learning; Task Decomposition; Color Table; Continuous State Space; Responsibility Space





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mehdi Khamassi (1, 2)
  • Louis-Emmanuel Martinet (1)
  • Agnès Guillot (1)
  1. AnimatLab – LIP6, F-75005 Paris, France; CNRS, UMR7606; Université Pierre et Marie Curie – Paris 6, UMR7606, Paris, France
  2. Laboratoire de Physiologie de la Perception et de l’Action, UMR7152 CNRS, Collège de France, Paris, France
