Creating Brain-Like Intelligence pp 328-350

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5436)

Basal Ganglia Models for Autonomous Behavior Learning

  • Hiroshi Tsujino
  • Johane Takeuchi
  • Osamu Shouno

Abstract

We propose two basal ganglia (BG) models for autonomous behavior learning: the BG system model and the BG spiking neural network model. Both were developed on the basis of reinforcement learning (RL) theories and neuroscience principles of behavioral learning. The BG system model addresses two RL problems: input selection and reward setting. It assumes that parallel BG modules receive a variety of inputs, and we also propose a method for automatically setting its internal reward. The BG spiking neural network model addresses problems of biological neural network architecture, ambiguous inputs, and the mechanism of timing. It accounts for the neurophysiological characteristics of neurons and the differential functions of the direct and indirect pathways. We demonstrate that the BG system model achieves goals in fewer trials by learning the internal state representation, whereas the BG spiking neural network model has the capacity for probabilistic selection of action. Our results suggest that these two models are a step toward developing an autonomous learning system.
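
For readers unfamiliar with the RL terms used in the abstract, the following is a minimal, generic sketch of probabilistic action selection combined with a reward prediction error update. It is not the authors' BG system model or spiking network model; the softmax rule, the function names (`softmax_probs`, `select_action`, `td_update`), and the parameters `temperature` and `alpha` are all assumptions chosen for illustration.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Turn action values into selection probabilities (softmax rule)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample one action index in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def td_update(q_values, action, reward, alpha=0.1):
    """Move the chosen action's value toward the received reward.

    Returns the prediction error (reward minus the previous estimate).
    This is a single-state simplification with no successor-state term.
    """
    td_error = reward - q_values[action]
    q_values[action] += alpha * td_error
    return td_error


if __name__ == "__main__":
    # Hypothetical two-action task: action 1 is rewarded, action 0 is not.
    q = [0.0, 0.0]
    for _ in range(200):
        a = select_action(q, temperature=0.2)
        td_update(q, a, reward=1.0 if a == 1 else 0.0)
    print(q, softmax_probs(q, temperature=0.2))
```

A lower `temperature` makes selection closer to always picking the highest-valued action, while a higher one keeps selection more exploratory.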

Keywords

system architecture, basal ganglia, reinforcement learning, modular learning system, reward, spiking neuron, input space selection, execution timing

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Hiroshi Tsujino (1)
  • Johane Takeuchi (1)
  • Osamu Shouno (1)

  1. Honda Research Institute Japan Co., Ltd., Saitama, Japan
