Journal of Computational Neuroscience, Volume 28, Issue 1, pp 107–130

Alternative time representation in dopamine models

  • François Rivest
  • John F. Kalaska
  • Yoshua Bengio

Abstract

Dopaminergic neuron activity has been modeled during learning and appetitive behavior, most commonly using the temporal-difference (TD) algorithm. However, these models usually require an appropriate representation of elapsed time and of the exact task structure to work. Most rely on timing elements, such as delay-line representations of time, that are not biologically realistic for intervals in the range of seconds. The interval-timing literature provides several alternatives; one is that timing could emerge from general network dynamics instead of coming from a dedicated circuit. Here, we present a general rate-based learning model based on long short-term memory (LSTM) networks that learns a time representation when needed. Using a naïve network that learns its environment in conjunction with TD, we reproduce dopamine activity in appetitive trace conditioning with a constant CS–US interval, including probe trials with unexpected delays. The proposed model learns a representation of the environment dynamics in an adaptive, biologically plausible framework, without recourse to delay lines or other special-purpose circuits. Instead, the model predicts that the task-dependent representation of time is learned through experience, is encoded in ramp-like changes in single-neuron activity distributed across small neural networks, and reflects a temporal-integration mechanism resulting from the inherent dynamics of recurrent loops within the network. The model also reproduces the well-known finding that trace conditioning is more difficult than delay conditioning, and shows that the learned representation of the task can depend strongly on the types of trials experienced during training. Finally, it suggests that the phasic dopaminergic signal could facilitate learning in the cortex.
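To make the mechanism in the abstract concrete, here is a minimal sketch, not the authors' implementation: a single hand-wired, self-looping "memory cell" stands in for the learned LSTM time representation, and the prediction error of a linear TD(0) critic plays the role of the phasic dopamine signal. All constants (gains, learning rate, trial timing) are illustrative assumptions; in the actual model the LSTM representation is itself learned from experience.

```python
# Minimal sketch (illustrative assumptions throughout; not the paper's
# implementation). One self-looping "memory cell" produces ramp-like
# activity after the CS, standing in for the learned time representation;
# the TD(0) error delta stands in for the phasic dopamine signal.

GAMMA, ALPHA = 0.98, 0.1        # discount factor and learning rate (assumed)
CS_T, US_T, T_MAX = 5, 25, 40   # CS onset, reward time, trial length (steps)

w = 0.0                         # critic weight on the memory-cell feature
for _ in range(300):            # 300 conditioning trials (assumed)
    cell = 0.0                  # memory-cell activation
    x_prev, v_prev = None, 0.0
    for t in range(T_MAX):
        if t == CS_T:
            cell = 0.05         # the CS opens the input gate (assumed)
        elif cell > 0.0:
            cell = min(1.0, cell + 0.05)  # self-loop integrates elapsed time
        r = 0.0
        if t == US_T:
            r, cell = 1.0, 0.0  # reward delivered; cell resets (assumed)
        v = w * cell            # linear value estimate V(s_t)
        if x_prev is not None:
            delta = r + GAMMA * v - v_prev  # TD error ~ phasic dopamine
            w += ALPHA * delta * x_prev     # move V toward the TD target
        x_prev, v_prev = cell, v

print(f"learned weight after training: {w:.3f}")
```

In this toy setting, the large positive TD error migrates over trials from the time of the US to the time of the CS, the signature dopamine result; on a probe trial where the reward is withheld and the cell keeps ramping, delta turns negative around the expected reward time, mirroring the dopamine "dip".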

Keywords

Dopamine · Reward · Interval timing · Trace conditioning · Reinforcement learning · Representation learning

Acknowledgements

We are grateful to Douglas Eck, Aaron Courville, Doina Precup, and many others for discussions during the development of the present work. This manuscript also benefited from the comments of Pascal Fortier-Poisson and Elliot Ludvig, as well as from those of the anonymous reviewers. F.R. was supported by doctoral studentships from the New Emerging Team Grant in Computational Neuroscience (CIHR) and from the Groupe de recherche sur le système nerveux central (FRSQ). Y.B. and J.K. were supported by the CIHR New Emerging Team Grant in Computational Neuroscience and an infrastructure grant from the FRSQ.

Supplementary material

Supplemental Pseudocode: 10827_2009_191_MOESM1_ESM.doc (DOC, 71 kb)
Supplemental Tables and Figures: 10827_2009_191_MOESM2_ESM.doc (DOC, 820 kb)

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • François Rivest (1, 2, 3)
  • John F. Kalaska (1, 3)
  • Yoshua Bengio (1, 2)
  1. Groupe de Recherche sur le Système Nerveux Central (FRSQ), Université de Montréal, Montréal, Canada
  2. Département d’informatique et de recherche opérationnelle, Université de Montréal, Montréal, Canada
  3. Département de physiologie, Université de Montréal, Montréal, Canada
