Action Discovery and Intrinsic Motivation: A Biologically Constrained Formalisation

  • Kevin Gurney
  • Nathan Lepora
  • Ashvin Shah
  • Ansgar Koene
  • Peter Redgrave
Chapter

Abstract

We introduce a biologically motivated formal framework, or “ontology”, for dealing with many aspects of action discovery, which we argue is an example of intrinsically motivated behaviour (as such, this chapter is a companion to that by Redgrave et al. in this volume). We argue that action discovery requires an interplay between separate internal forward models for prediction and inverse models that map outcomes to actions. The process of learning actions is driven by transient changes in the animal’s policy (repetition bias), which are in turn a result of unpredicted, phasic sensory information (“surprise”). The notion of salience as value is introduced and broken down into contributions from novelty (or surprise), immediate reward acquisition, or general task/goal attainment. Many other aspects of biological action discovery emerge naturally in our framework, which aims to guide future modelling efforts in this domain.

Keywords

Intrinsic Motivation, Superior Colliculus, Motivate Learning, Internal Model, Inverse Model

Notes

Acknowledgements

Written while the authors were in receipt of research funding from The Wellcome Trust, BBSRC and EPSRC.

This research has also received funds from the European Commission 7th Framework Programme (FP7/2007-2013), “Challenge 2 - Cognitive Systems, Interaction, Robotics”, Grant Agreement No. ICT-IP-231722, Project “IM-CLeVeR - Intrinsically Motivated Cumulative Learning Versatile Robots”. NL was partially supported by the EU Framework project EFAA (ICT-270490).

References

  1. Allport, A., Sanders, H., Heuer, A.: Selection for action: Some behavioural and neurophysiological considerations of attention and action. In: Perspectives on Perception and Action. Lawrence Erlbaum Associates Inc., Hillsdale (1987)
  2. Baldi, P., Itti, L.: Of bits and wows: A Bayesian theory of surprise with applications to attention. Neural Netw. 23(5), 649–666 (2010)
  3. Balleine, B.W., Dickinson, A.: Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37(4–5), 407–419 (1998)
  4. Barto, A., Singh, S., Chentanez, N.: Intrinsically motivated reinforcement learning. In: 18th Annual Conference on Neural Information Processing Systems (NIPS). Vancouver (2004)
  5. Cisek, P.: Cortical mechanisms of action selection: The affordance competition hypothesis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 362(1485), 1585–1599 (2007)
  6. Cisek, P., Kalaska, J.: Neural mechanisms for interacting with a world full of action choices. Annu. Rev. Neurosci. 33, 269–298 (2010)
  7. Comoli, E., Coizet, V., Boyes, J., Bolam, J., Canteras, N., Quirk, R., Overton, P., Redgrave, P.: A direct projection from superior colliculus to substantia nigra for detecting salient visual events. Nat. Neurosci. 6(9), 974–980 (2003)
  8. Connor, C.E., Egeth, H.E., Yantis, S.: Visual attention: Bottom-Up versus Top-Down. Curr. Biol. 14(19), R850–R852 (2004)
  9. Cope, A., Chambers, J., Gurney, K.: Object-based biasing for attentional control of gaze: A comparison of biologically plausible mechanisms. BMC Neurosci. 10(Suppl. 1), P19 (2009)
  10. Dommett, E., Coizet, V., Blaha, C., Martindale, J., Lefebvre, V., Walton, N., Mayhew, J., Overton, P., Redgrave, P.: How visual stimuli activate dopaminergic neurons at short latency. Science 307(5714), 1476–1479 (2005)
  11. Fiorillo, C.D., Tobler, P.N., Schultz, W.: Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299(5614), 1898 (2003)
  12. Friston, K.: A theory of cortical responses. Philos. Trans. R. Soc. B Biol. Sci. 360(1456), 815–836 (2005)
  13. Gene Ontology Consortium: Creating the gene ontology resource: Design and implementation. Genome Res. 11(8), 1425–1433 (2001)
  14. Gruber, T.: A translation approach to portable ontology specification. http://www-ksl.stanford.edu/kst/what-is-an-ontology.html (1992)
  15. Gurney, K., Humphries, M., Redgrave, P.: Cortico-striatal plasticity for action-outcome learning using spike timing dependent eligibility. BMC Neurosci. 10(Suppl. 1), P135 (2009a)
  16. Gurney, K., Hussain, A., Chambers, J., Abdullah, R.: Controlled and automatic processing in animals and machines with application to autonomous vehicle control. In: Controlled and Automatic Processing in Animals and Machines with Application to Autonomous Vehicle Control, Lecture Notes in Computer Science, vol. 5768, pp. 198–207. Springer, Berlin (2009b)
  17. Ikeda, T., Hikosaka, O.: Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron 39(4), 693–700 (2003)
  18. Körding, K.P., Wolpert, D.M.: Bayesian decision theory in sensorimotor control. Trends Cogn. Sci. 10(7), 319–326 (2006)
  19. Marr, D., Poggio, T.: From understanding computation to understanding neural circuitry. Technical report, MIT AI Laboratory (1976)
  20. Matsumoto, M., Hikosaka, O.: Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447(7148), 1111–1115 (2007)
  21. Oudeyer, P., Kaplan, F.: What is intrinsic motivation? A typology of computational approaches. Front. Neurorobot. 1, 6 (2007). PMID 18958277
  22. Poggio, T., Koch, C.: Ill-posed problems in early vision: From computational theory to analogue networks. Proc. R. Soc. Lond. B. Biol. Sci. 226(1244), 303 (1985)
  23. Ranganath, C., Rainer, G.: Neural mechanisms for detecting and remembering novel events. Nat. Rev. Neurosci. 4(3), 193–202 (2003)
  24. Redgrave, P., Gurney, K.: The short-latency dopamine signal: A role in discovering novel actions? Nat. Rev. Neurosci. 7(12) (2006)
  25. Redgrave, P., Gurney, K., Reynolds, J.: What is reinforced by phasic dopamine signals? Brain Res. Rev. 58(2), 322–339 (2008)
  26. Redgrave, P., Gurney, K., Stafford, T., Thirkettle, M., Lewis, J.: The role of the basal ganglia in discovering novel actions. In: Baldassarre, G., Mirolli, M. (eds.) Intrinsically Motivated Learning in Natural and Artificial Systems, pp. 129–149. Springer, Berlin (2012)
  27. Redgrave, P., Prescott, T., Gurney, K.: The basal ganglia: A vertebrate solution to the selection problem? Neuroscience 89, 1009–1023 (1999)
  28. Reynolds, J.N.J., Wickens, J.R.: Dopamine-dependent plasticity of corticostriatal synapses. Neural Netw. 15(4–6), 507–521 (2002)
  29. Ryan, R.M., Deci, E.L.: Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemp. Educ. Psychol. 25(1), 54–67 (2000)
  30. Schleidt, M., Kien, J.: Segmentation in behavior and what it can tell us about brain function. Hum. Nat. 8(1), 77–111 (1997)
  31. Schmidhuber, J.: Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In: Anticipatory Behavior in Adaptive Learning Systems, pp. 48–76 (2009)
  32. Schultz, W.: Dopamine signals for reward value and risk: Basic and recent data. Behav. Brain Funct. 6(1), 24 (2010)
  33. Schultz, W., Dayan, P., Montague, P.: A neural substrate of prediction and reward. Science 275, 1593–1599 (1997)
  34. Snyder, L.H., Batista, A.P., Andersen, R.A.: Coding of intention in the posterior parietal cortex. Nature 386(6621), 167–170 (1997)
  35. Sokolov, E.N.: Higher nervous functions: The orienting reflex. Annu. Rev. Physiol. 25(1), 545–580 (1963)
  36. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT, Cambridge (1998)
  37. Thompson, K.G., Bichot, N.P., Sato, T.R.: Frontal eye field activity before visual search errors reveals the integration of Bottom-Up and Top-Down salience. J. Neurophysiol. 93(1), 337–351 (2005)
  38. Timberlake, W., Lucas, G.A.: The basis of superstitious behavior: Chance contingency, stimulus substitution, or appetitive behavior? J. Exp. Anal. Behav. 44(3), 279 (1985)
  39. Tobler, P., Fiorillo, C., Schultz, W.: Adaptive coding of reward value by dopamine neurons. Science 307(5715), 1642 (2005)
  40. Tolman, E.: Cognitive maps in rats and men. Psychol. Rev. 55(4), 189 (1948)
  41. Wurtz, R.H., Albano, J.E.: Visual-motor function of the primate superior colliculus. Annu. Rev. Neurosci. 3(1), 189–226 (1980)
  42. Yin, H.H., Knowlton, B.J.: The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7(6), 464–476 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Kevin Gurney (1)
  • Nathan Lepora (1)
  • Ashvin Shah (1)
  • Ansgar Koene (2)
  • Peter Redgrave (1)

  1. Adaptive Behaviour Research Group, Department of Psychology, University of Sheffield, Sheffield, UK
  2. Laboratory for Integrated Theoretical Neuroscience, RIKEN Brain Science Institute, Saitama, Japan
