
Biological Cybernetics, Volume 108, Issue 1, pp 23–48

Conditioning and time representation in long short-term memory networks

  • Francois Rivest
  • John F. Kalaska
  • Yoshua Bengio
Original Paper

Abstract

Dopaminergic models based on the temporal-difference learning algorithm usually do not differentiate trace from delay conditioning. Instead, they use a fixed temporal representation of elapsed time since conditioned stimulus onset. Recently, a new model was proposed in which timing is learned within a long short-term memory (LSTM) artificial neural network representing the cerebral cortex (Rivest et al. in J Comput Neurosci 28(1):107–130, 2010). This paper evaluates that model's ability to reproduce and explain relevant data, as well as to make interesting new predictions. The model reveals a strikingly different temporal representation between trace and delay conditioning, since trace conditioning requires working memory to remember the past conditioned stimulus while delay conditioning does not. On the other hand, the model predicts no important difference in dopamine (DA) responses between those two conditions when trained on one conditioning paradigm and tested on the other. The model predicts that in trace conditioning, animal timing starts at the conditioned stimulus offset rather than at its onset. In classical conditioning, it predicts that if the conditioned stimulus does not disappear after the reward, the animal may expect a second reward. Finally, the last simulation reveals that the buildup of activity of some units in the network can adapt to new delays by adjusting their rate of integration. Most importantly, the paper shows that with the proposed architecture it is possible to acquire discharge patterns similar to those observed in dopaminergic neurons and in the cerebral cortex on those tasks simply by minimizing a predictive cost function.
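For readers unfamiliar with the baseline the abstract contrasts with, the following is a minimal illustrative sketch (not the authors' code), written in Python with NumPy. It implements temporal-difference learning over a fixed "complete serial compound" time representation tied to conditioned stimulus (CS) onset; the trial length, event times, learning rate, and discount factor are illustrative assumptions, not values from the paper.

    import numpy as np

    T = 30                    # time steps per trial (illustrative)
    CS_ON, REWARD_T = 5, 20   # CS onset and reward delivery times (illustrative)
    GAMMA, ALPHA = 0.98, 0.1  # discount factor and learning rate (illustrative)

    def trial():
        """Fixed 'complete serial compound' representation: one unit per elapsed
        step since CS onset. Because it depends only on CS onset, trace and delay
        trials look identical to the learner, which is the limitation the
        LSTM-based model addresses by learning its own temporal representation."""
        x = np.zeros((T, T))
        for t in range(CS_ON, T):
            x[t, t - CS_ON] = 1.0
        r = np.zeros(T)
        r[REWARD_T] = 1.0
        return x, r

    w = np.zeros(T)  # linear reward-prediction weights
    for _ in range(500):
        x, r = trial()
        for t in range(T - 1):
            v_t, v_next = w @ x[t], w @ x[t + 1]
            delta = r[t] + GAMMA * v_next - v_t  # TD error, the DA-like signal
            w += ALPHA * delta * x[t]

In this sketch, the TD error at reward delivery shrinks toward zero with training while a positive error remains at CS onset, and omitting the reward produces a negative error at the expected reward time. The model discussed in the paper replaces the hand-built representation x with one learned by an LSTM network that minimizes a predictive cost function.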

Keywords

Time representation learning · Temporal-difference learning · Long short-term memory networks · Dopamine · Conditioning · Reinforcement learning

Notes

Acknowledgments

This manuscript profited from the comments of James Bergstra, Paul Cisek, Richard Courtemanche, and anonymous reviewers. F.R. was supported by doctoral studentships from the CIHR New Emerging Team Grant in Computational Neurosciences and from the Groupe de recherche sur le système nerveux central (FRSQ), and by a start-up fund from the Royal Military College of Canada. Y.B. and J.K. were supported by the CIHR New Emerging Team Grant in Computational Neurosciences (NET 54000; J.K., Y.B.) and by an FRSQ infrastructure grant. J.K. was also supported by CIHR operating grant (MOP 84454; J.K.) and CIHR Group Grant in Neurological Sciences (MGC 15176; J.K.). Part of this work also appeared as part of F.R. Ph.D. Thesis (Rivest 2009).

Supplementary material

Supplementary material 1: 422_2013_575_MOESM1_ESM.docx (1.6 MB)

References

  1. Balci F, Gallistel CR, Allen BD, Frank KM, Gibson JM, Brunner D (2009) Acquisition of peak responding: what is learned? Behav Process 80(1):67–75
  2. Balsam PD, Drew MR, Yang C (2002) Timing at the start of associative learning. Learn Motiv 33(1):141–155
  3. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166
  4. Beylin AV, Gandhi CC, Wood GE, Talk AC, Matzel LD, Shors TJ (2001) The role of the hippocampus in trace conditioning: temporal discontinuity or task difficulty? Neurobiol Learn Mem 76(3):447–461
  5. Brody CD, Hernandez A, Zainos A, Romo R (2003) Timing and neural encoding of somatosensory parametric working memory in macaque prefrontal cortex. Cereb Cortex 13(11):1196–1207
  6. Brown J, Bullock D, Grossberg S (1999) How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. J Neurosci 19(23):10502–10511
  7. Buhusi CV, Meck WH (2000) Timing for the absence of a stimulus: the gap paradigm reversed. J Exp Psychol Anim Behav Process 26(3):305–322
  8. Buhusi CV, Meck WH (2005) What makes us tick? Functional and neural mechanisms of interval timing. Nat Rev Neurosci 6(10):755–765
  9. Buonomano DV (2005) A learning rule for the emergence of stable dynamics and timing in recurrent networks. J Neurophysiol 94(4):2275–2283
  10. Constantinidis C, Steinmetz MA (1996) Neuronal activity in posterior parietal area 7a during the delay periods of a spatial memory task. J Neurophysiol 76(2):1352–1355
  11. Daw ND, Courville AC, Touretzky DS (2006) Representation and timing in theories of the dopamine system. Neural Comput 18(7):1637–1677
  12. Dominey PF, Boussaoud D (1997) Encoding behavioral context in recurrent networks of the fronto-striatal system: a simulation study. Brain Res Cogn Brain Res 6(1):53–65
  13. Dragoi V, Staddon JE, Palmer RG, Buhusi CV (2003) Interval timing as an emergent learning property. Psychol Rev 110(1):126–144
  14. Fiorillo CD, Newsome WT, Schultz W (2008) The temporal precision of reward prediction in dopamine neurons. Nat Neurosci 11:966–973
  15. Frank M (2010) Interesting hypothesis, new finding. Faculty of 1000 Biology
  16. Funahashi S, Bruce CJ, Goldman-Rakic PS (1989) Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J Neurophysiol 61(2):331–349
  17. Gallistel CR, Gibbon J (2000) Time, rate, and conditioning. Psychol Rev 107(2):289–344
  18. Gallistel CR, King AP (2009) Memory and the computational brain: why cognitive science will transform neuroscience. Wiley-Blackwell, New York
  19. Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with LSTM. Neural Comput 12(10):2451–2471
  20. Gers FA, Schraudolph NN, Schmidhuber J (2002) Learning precise timing with LSTM recurrent networks. J Mach Learn Res 3:115–143
  21. Gibbon J (1977) Scalar expectancy theory and Weber’s Law in animal timing. Psychol Rev 84(3):279–325
  22. Gibbon J, Church RM, Meck WH (1984) Scalar timing in memory. In: Gibbon J, Allen LG (eds) Timing and time perception. New York Academy of Sciences, New York, pp 52–77
  23. Hernandez G, Hamdani S, Rajabi H, Conover K, Stewart J, Arvanitogiannis A, Shizgal P (2006) Prolonged rewarding stimulation of the rat medial forebrain bundle: neurochemical and behavioral consequences. Behav Neurosci 120(4):888–904
  24. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
  25. Hollerman JR, Schultz W (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1(4):304–309
  26. Ivry RB, Schlerf JE (2008) Dedicated and intrinsic models of time perception. Trends Cogn Sci 12(7):273–280
  27. Karmarkar UR, Buonomano DV (2007) Timing in the absence of clocks: encoding time in neural network states. Neuron 53(3):427–438
  28. Kehoe EJ, Ludvig EA, Sutton RS (2009) Magnitude and timing of conditioned responses in delay and trace classical conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus). Behav Neurosci 123(5):1095–1101. doi:10.1037/a0017112
  29. Kirkpatrick-Steger K, Miller SS, Betti CA, Wasserman EA (1996) Cyclic responding by pigeons on the peak timing procedure. J Exp Psychol Anim Behav Process 22(4):447–460
  30. Kolodziejski C, Porr B, Worgotter F (2008) Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison. Biol Cybern 98(3):259–272
  31. Komura Y, Tamura R, Uwano T, Nishijo H, Kaga K, Ono T (2001) Retrospective and prospective coding for predicted reward in the sensory thalamus. Nature 412(6846):546–549
  32. Lebedev MA, O’Doherty JE, Nicolelis MA (2008) Decoding of temporal intervals from cortical ensemble activity. J Neurophysiol 99(1):166–186
  33. Leon MI, Shadlen MN (2003) Representation of time by neurons in the posterior parietal cortex of the macaque. Neuron 38(2):317–327
  34. Ljungberg T, Apicella P, Schultz W (1992) Responses of monkey dopamine neurons during learning of behavioral reactions. J Neurophysiol 67(1):145–163
  35. Lucchetti C, Bon L (2001) Time-modulated neuronal activity in the premotor cortex of macaque monkeys. Exp Brain Res 141(2):254–260
  36. Lucchetti C, Ulrici A, Bon L (2005) Dorsal premotor areas of nonhuman primate: functional flexibility in time domain. Eur J Appl Physiol 95(2–3):121–130
  37. Ludvig EA, Sutton RS, Kehoe EJ (2008) Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput 20(12):3034–3054
  38. Ludvig EA, Sutton RS, Verbeek E, Kehoe EJ (2009) A computational model of hippocampal function in trace conditioning. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems, vol 21. MIT Press, Vancouver, pp 993–1000
  39. Luzardo A, Ludvig EA, Rivest F (2013) An adaptive drift-diffusion model of interval timing dynamics. Behav Process. doi:10.1016/j.beproc.2013.02.003
  40. Machado A (1997) Learning the temporal dynamics of behavior. Psychol Rev 104(2):241–265
  41. Matell MS, Meck WH (2004) Cortico-striatal circuits and interval timing: coincidence detection of oscillatory processes. Brain Res Cogn Brain Res 21(2):139–170
  42. Mauritz KH, Wise SP (1986) Premotor cortex of the rhesus monkey: neuronal activity in anticipation of predictable environmental events. Exp Brain Res 61(2):229–244
  43. Miall C (1989) The storage of time intervals using oscillating neurons. Neural Comput 1(3):359–371. doi:10.1162/neco.1989.1.3.359
  44. Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16(5):1936–1947
  45. Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H (2004) Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43(1):133–143
  46. Nakamura K, Ono T (1986) Lateral hypothalamus neuron involvement in integration of natural and artificial rewards and cue signals. J Neurophysiol 55(1):163–181
  47. Niki H, Watanabe M (1979) Prefrontal and cingulate unit activity during timing behavior in the monkey. Brain Res 171(2):213–224
  48. O’Reilly RC, Frank MJ (2006) Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural Comput 18(2):283–328
  49. Otani S, Daniel H, Roisin MP, Crepel F (2003) Dopaminergic modulation of long-term synaptic plasticity in rat prefrontal neurons. Cereb Cortex 13(11):1251–1256
  50. Pan WX, Schmidt R, Wickens JR, Hyland BI (2005) Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci 25(26):6235–6242
  51. Reutimann J, Yakovlev V, Fusi S, Senn W (2004) Climbing neuronal activity as an event-based cortical representation of time. J Neurosci 24(13):3295–3303
  52. Rhodes BJ, Bullock D (2002) A scalable model of cerebellar adaptive timing and sequencing: the recurrent slide and latch (RSL) model. Appl Intell 17(1):35–48
  53. Rivest F (2009) Modèle informatique du coapprentissage des ganglions de la base et du cortex : l’apprentissage par renforcement et le développement de représentations [Computational model of the co-learning of the basal ganglia and the cortex: reinforcement learning and the development of representations]. Dissertation, Université de Montréal. https://papyrus.bib.umontreal.ca/xmlui/handle/1866/4309. Accessed 5 May 2010
  54. Rivest F, Bengio Y (2011) Adaptive drift-diffusion process to learn time intervals. Cornell University Library, arXiv:1103.2382v1
  55. Rivest F, Kalaska JF, Bengio Y (2010) Alternative time representation in dopamine models. J Comput Neurosci 28(1):107–130
  56. Robinson AJ, Fallside F (1987) The utility driven dynamic error propagation network. Technical report CUED/F-INFENG/TR.1, Cambridge University Engineering Department, Cambridge, England
  57. Romo R, Brody CD, Hernandez A, Lemus L (1999) Neuronal correlates of parametric working memory in the prefrontal cortex. Nature 399(6735):470–473
  58. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, the PDP Research Group (eds) Parallel distributed processing: explorations in the microstructure of cognition, vol 1: Foundations. MIT Press/Bradford Books, Cambridge
  59. Sanabria F, Killeen PR (2007) Temporal generalization accounts for response resurgence in the peak procedure. Behav Process 74(2):126–141
  60. Schneider BA, Ghose GM (2012) Temporal production signals in parietal cortex. PLoS Biol 10(10):e1001413. doi:10.1371/journal.pbio.1001413
  61. Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13(3):900–913
  62. Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275(5306):1593–1599
  63. Simen P, Balci F, de Souza L, Cohen JD, Holmes P (2011) A model of interval timing by neural integration. J Neurosci 31(25):9238–9253. doi:10.1523/JNEUROSCI.3121-10.2011
  64. Steuber V, Willshaw DJ (1999) Adaptive leaky integrator models of cerebellar Purkinje cells can learn the clustering of temporal patterns. Comput Neurosci 26–27:271–276
  65. Suri RE, Schultz W (1998) Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Exp Brain Res 121(3):350–354
  66. Suri RE, Schultz W (1999) A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience 91(3):871–890
  67. Sussillo D, Abbott LF (2009) Generating coherent patterns of activity from chaotic neural networks. Neuron 63(4):544–557
  68. Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44
  69. Sutton RS, Barto AG (1990) Time-derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, pp 497–538
  70. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction (adaptive computation and machine learning). MIT Press, Cambridge
  71. Thibaudeau G, Potvin O, Allen K, Dore FY, Goulet S (2007) Dorsal, ventral, and complete excitotoxic lesions of the hippocampus in rats failed to impair appetitive trace conditioning. Behav Brain Res 185(1):9–20
  72. Yamazaki T, Tanaka S (2007) The cerebellum as a liquid state machine. Neural Netw 20(3):290–297. doi:10.1016/j.neunet.2007.04.004

Copyright information

© Her Majesty the Queen in Right of Canada 2013

Authors and Affiliations

  • Francois Rivest (1, 2)
  • John F. Kalaska (3)
  • Yoshua Bengio (4)
  1. Department of Mathematics and Computer Science, Royal Military College of Canada, Station Forces, Kingston, Canada
  2. Centre for Neuroscience Studies, Queen’s University, Kingston, Canada
  3. Department of Physiology, University of Montreal, Montreal, Canada
  4. Department of Computer Science and Operations Research, University of Montreal, Montreal, Canada
