Conditioning and time representation in long short-term memory networks

Abstract

Dopaminergic models based on the temporal-difference learning algorithm usually do not differentiate trace from delay conditioning. Instead, they use a fixed temporal representation of elapsed time since conditioned stimulus onset. Recently, a new model was proposed in which timing is learned within a long short-term memory (LSTM) artificial neural network representing the cerebral cortex (Rivest et al. in J Comput Neurosci 28(1):107–130, 2010). In this paper, that model’s ability to reproduce and explain relevant data, and to make interesting new predictions, is evaluated. The model reveals a strikingly different temporal representation between trace and delay conditioning, since trace conditioning requires working memory to remember the past conditioned stimulus while delay conditioning does not. On the other hand, the model predicts no important difference in dopamine (DA) responses between those two conditions when trained on one conditioning paradigm and tested on the other. The model predicts that in trace conditioning, animal timing starts with the conditioned stimulus offset as opposed to its onset. In classical conditioning, it predicts that if the conditioned stimulus does not disappear after the reward, the animal may expect a second reward. Finally, the last simulation reveals that the buildup of activity of some units in the networks can adapt to new delays by adjusting their rate of integration. Most importantly, the paper shows that it is possible, with the proposed architecture, to acquire discharge patterns similar to those observed in dopaminergic neurons and in the cerebral cortex on those tasks simply by minimizing a predictive cost function.
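To make the proposed architecture concrete, here is a minimal conceptual sketch (in Python/PyTorch; an assumption of this summary, not the authors' implementation): an LSTM module standing in for the cortex is trained only to predict its next input, and a linear TD critic reads the LSTM state, its TD error standing in for the phasic dopamine response. Layer sizes, the trial structure, and the per-trial gradient truncation are simplifying assumptions.

```python
# Minimal conceptual sketch (assumed, not the authors' code): an LSTM "cortex" trained
# only to predict its next input, plus a linear TD critic on the LSTM state whose TD
# error stands in for the phasic dopamine response.
import torch

torch.manual_seed(0)
GAMMA, ALPHA_TD = 0.98, 0.1        # TD hyper-parameters reported in Appendix 1

lstm = torch.nn.LSTMCell(input_size=2, hidden_size=1)   # inputs: (CS, US); one memory cell
readout = torch.nn.Linear(1, 2)                         # predicts the next (CS, US) pair
opt = torch.optim.SGD(list(lstm.parameters()) + list(readout.parameters()),
                      lr=0.01)                          # alpha_LSTM = 0.01 (Appendix 1)
v_w = torch.zeros(1)               # critic weights: V(state) = v_w . h

def delay_trial(cs_us_steps=10, trial_len=30):
    """One delay-conditioning trial at 10 Hz: CS stays on until the US (reward)."""
    x = torch.zeros(trial_len, 2)
    x[:cs_us_steps + 1, 0] = 1.0   # CS channel
    x[cs_us_steps, 1] = 1.0        # US / reward channel
    return x

h = c = torch.zeros(1, 1)
for _ in range(500):                               # training trials
    x = delay_trial()
    h, c = h.detach(), c.detach()                  # truncate gradients at trial boundaries
    losses, states = [], []
    for t in range(len(x) - 1):
        h, c = lstm(x[t].unsqueeze(0), (h, c))
        states.append(h.squeeze().detach())
        losses.append(((readout(h) - x[t + 1].unsqueeze(0)) ** 2).sum())
    opt.zero_grad()
    torch.stack(losses).sum().backward()           # predictive cost over the trial
    opt.step()

    # TD critic on the learned representation; delta ~ phasic dopamine response
    v_prev = torch.tensor(0.0)
    for t, h_t in enumerate(states):
        v = (v_w * h_t).sum()
        delta = x[t, 1] + GAMMA * v - v_prev
        v_w = v_w + ALPHA_TD * delta * h_t
        v_prev = v
```

The point of the sketch is that the only training target is the next stimulus; the critic, and hence the simulated dopamine signal, sees reward timing only through the representation the LSTM builds while minimizing that predictive cost.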


References

  • Balci F, Gallistel CR, Allen BD, Frank KM, Gibson JM, Brunner D (2009) Acquisition of peak responding: what is learned? Behav Process 80(1):67–75

  • Balsam PD, Drew MR, Yang C (2002) Timing at the start of associative learning. Learn Motiv 33(1):141–155

  • Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166

  • Beylin AV, Gandhi CC, Wood GE, Talk AC, Matzel LD, Shors TJ (2001) The role of the hippocampus in trace conditioning: temporal discontinuity or task difficulty? Neurobiol Learn Mem 76(3):447–461

  • Brody CD, Hernandez A, Zainos A, Romo R (2003) Timing and neural encoding of somatosensory parametric working memory in macaque prefrontal cortex. Cereb Cortex 13(11):1196–1207

  • Brown J, Bullock D, Grossberg S (1999) How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. J Neurosci 19(23):10502–10511

  • Buhusi CV, Meck WH (2000) Timing for the absence of a stimulus: the gap paradigm reversed. J Exp Psychol Anim Behav Process 26(3):305–322

  • Buhusi CV, Meck WH (2005) What makes us tick? Functional and neural mechanisms of interval timing. Nat Rev Neurosci 6(10):755–765

  • Buonomano DV (2005) A learning rule for the emergence of stable dynamics and timing in recurrent networks. J Neurophysiol 94(4):2275–2283

  • Constantinidis C, Steinmetz MA (1996) Neuronal activity in posterior parietal area 7a during the delay periods of a spatial memory task. J Neurophysiol 76(2):1352–1355

  • Daw ND, Courville AC, Touretzky DS (2006) Representation and timing in theories of the dopamine system. Neural Comput 18(7):1637–1677

  • Dominey PF, Boussaoud D (1997) Encoding behavioral context in recurrent networks of the fronto-striatal system: a simulation study. Brain Res Cogn Brain Res 6(1):53–65

  • Dragoi V, Staddon JE, Palmer RG, Buhusi CV (2003) Interval timing as an emergent learning property. Psychol Rev 110(1):126–144

  • Fiorillo CD, Newsome WT, Schultz W (2008) The temporal precision of reward prediction in dopamine neurons. Nat Neurosci 11:966–973

  • Frank M (2010) Interesting Hypothesis, New Finding. Faculty of 1000 Biology

  • Funahashi S, Bruce CJ, Goldman-Rakic PS (1989) Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J Neurophysiol 61(2):331–349

  • Gallistel CR, Gibbon J (2000) Time, rate, and conditioning. Psychol Rev 107(2):289–344

  • Gallistel CR, King AP (2009) Memory and the computational brain: why cognitive science will transform neuroscience. Wiley-Blackwell, New York

  • Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with LSTM. Neural Comput 12(10):2451–2471

  • Gers FA, Schraudolph NN, Schmidhuber J (2002) Learning precise timing with LSTM recurrent networks. J Mach Learn Res 3:115–143

  • Gibbon J (1977) Scalar expectancy theory and Weber’s Law in animal timing. Psychol Rev 84(3):279–325

  • Gibbon J, Church RM, Meck WH (1984) Scalar timing in memory. In: Gibbon J, Allen LG (eds) Timing and time perception. New York Academy of Sciences, New York, pp 52–77

  • Hernandez G, Hamdani S, Rajabi H, Conover K, Stewart J, Arvanitogiannis A, Shizgal P (2006) Prolonged rewarding stimulation of the rat medial forebrain bundle: neurochemical and behavioral consequences. Behav Neurosci 120(4):888–904

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  • Hollerman JR, Schultz W (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1(4):304–309

  • Ivry RB, Schlerf JE (2008) Dedicated and intrinsic models of time perception. Trends Cogn Sci 12(7):273–280

  • Karmarkar UR, Buonomano DV (2007) Timing in the absence of clocks: encoding time in neural network states. Neuron 53(3):427–438

  • Kehoe EJ, Ludvig EA, Sutton RS (2009) Magnitude and timing of conditioned responses in delay and trace classical conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus). Behav Neurosci 123(5):1095–1101. doi:10.1037/a0017112

  • Kirkpatrick-Steger K, Miller SS, Betti CA, Wasserman EA (1996) Cyclic responding by pigeons on the peak timing procedure. J Exp Psychol Anim Behav Process 22(4):447–460

  • Kolodziejski C, Porr B, Worgotter F (2008) Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison. Biol Cybern 98(3):259–272

  • Komura Y, Tamura R, Uwano T, Nishijo H, Kaga K, Ono T (2001) Retrospective and prospective coding for predicted reward in the sensory thalamus. Nature 412(6846):546–549

  • Lebedev MA, O’Doherty JE, Nicolelis MA (2008) Decoding of temporal intervals from cortical ensemble activity. J Neurophysiol 99(1):166–186

  • Leon MI, Shadlen MN (2003) Representation of time by neurons in the posterior parietal cortex of the macaque. Neuron 38(2): 317–327

  • Ljungberg T, Apicella P, Schultz W (1992) Responses of monkey dopamine neurons during learning of behavioral reactions. J Neurophysiol 67(1):145–163

  • Lucchetti C, Bon L (2001) Time-modulated neuronal activity in the premotor cortex of macaque monkeys. Exp Brain Res 141(2):254–260

  • Lucchetti C, Ulrici A, Bon L (2005) Dorsal premotor areas of nonhuman primate: functional flexibility in time domain. Eur J Appl Physiol 95(2–3):121–130

  • Ludvig EA, Sutton RS, Kehoe EJ (2008) Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput 20(12):3034–3054

  • Ludvig EA, Sutton RS, Verbeek E, Kehoe EJ (2009) A computational model of hippocampal function in trace conditioning. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems, vol 21. MIT Press, Vancouver, pp 993–1000

  • Luzardo A, Ludvig EA, Rivest F (2013) An adaptive drift-diffusion model of interval timing dynamics. Behav Process. doi:10.1016/j.beproc.2013.02.003

  • Machado A (1997) Learning the temporal dynamics of behavior. Psychol Rev 104(2):241–265

  • Matell MS, Meck WH (2004) Cortico-striatal circuits and interval timing: coincidence detection of oscillatory processes. Brain Res Cogn Brain Res 21(2):139–170

  • Mauritz KH, Wise SP (1986) Premotor cortex of the rhesus monkey: neuronal activity in anticipation of predictable environmental events. Exp Brain Res 61(2):229–244

  • Miall C (1989) The storage of time intervals using oscillating neurons. Neural Comput 1(3):359–371. doi:10.1162/neco.1989.1.3.359

  • Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive hebbian learning. J Neurosci 16(5):1936–1947

  • Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H (2004) Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43(1):133–143

  • Nakamura K, Ono T (1986) Lateral hypothalamus neuron involvement in integration of natural and artificial rewards and cue signals. J Neurophysiol 55(1):163–181

  • Niki H, Watanabe M (1979) Prefrontal and cingulate unit activity during timing behavior in the monkey. Brain Res 171(2):213–224

  • O’Reilly RC, Frank MJ (2006) Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural Comput 18(2):283–328

  • Otani S, Daniel H, Roisin MP, Crepel F (2003) Dopaminergic modulation of long-term synaptic plasticity in rat prefrontal neurons. Cereb Cortex 13(11):1251–1256

  • Pan WX, Schmidt R, Wickens JR, Hyland BI (2005) Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci 25(26):6235–6242

  • Reutimann J, Yakovlev V, Fusi S, Senn W (2004) Climbing neuronal activity as an event-based cortical representation of time. J Neurosci 24(13):3295–3303

  • Rhodes BJ, Bullock D (2002) A scalable model of cerebellar adaptive timing and sequencing: the recurrent slide and latch (RSL) model. Appl Intell 17(1):35–48

  • Rivest F (2009) Modèle informatique du coapprentissage des ganglions de la base et du cortex : L’apprentissage par renforcement et le développement de représentations. Dissertation, Université de Montréal. https://papyrus.bib.umontreal.ca/xmlui/handle/1866/4309. Accessed 5 May 2010

  • Rivest F, Bengio Y (2011) Adaptive drift-diffusion process to learn time intervals. Cornell University Library, arXiv:1103.2382v1

  • Rivest F, Kalaska JF, Bengio Y (2010) Alternative time representation in dopamine models. J Comput Neurosci 28(1):107–130

  • Robinson AJ, Fallside F (1987) The utility driven dynamic error propagation network. Technical report CUED/F-INFENG/TR.1. Cambridge University, Engineering Department, Cambridge, England

  • Romo R, Brody CD, Hernandez A, Lemus L (1999) Neuronal correlates of parametric working memory in the prefrontal cortex. Nature 399(6735):470–473

  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, the PDP Research Group (eds) Parallel distributed processing: explorations in the microstructure of cognition, vol 1: Foundations. MIT Press/Bradford Books, Cambridge

  • Sanabria F, Killeen PR (2007) Temporal generalization accounts for response resurgence in the peak procedure. Behav Process 74(2):126–141

  • Schneider BA, Ghose GM (2012) Temporal production signals in parietal cortex. PLoS Biol 10(10):e1001413. doi:10.1371/journal.pbio.1001413

  • Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13(3):900–913

  • Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275(5306):1593–1599

  • Simen P, Balci F, de Souza L, Cohen JD, Holmes P (2011) A model of interval timing by neural integration. J Neurosci 31(25):9238–9253. doi:10.1523/JNEUROSCI.3121-10.2011

  • Steuber V, Willshaw DJ (1999) Adaptive leaky integrator models of cerebellar Purkinje cells can learn the clustering of temporal patterns. Comput Neurosci 26–27:271–276

  • Suri RE, Schultz W (1998) Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Exp Brain Res 121(3):350–354

  • Suri RE, Schultz W (1999) A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience 91(3):871–890

  • Sussillo D, Abbott LF (2009) Generating coherent patterns of activity from chaotic neural networks. Neuron 63(4):544–557

  • Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44

  • Sutton RS, Barto AG (1990) Time-derivative models of pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, pp 497–538

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction (adaptive computation and machine learning). MIT Press, Cambridge

  • Thibaudeau G, Potvin O, Allen K, Dore FY, Goulet S (2007) Dorsal, ventral, and complete excitotoxic lesions of the hippocampus in rats failed to impair appetitive trace conditioning. Behav Brain Res 185(1):9–20

  • Yamazaki T, Tanaka S (2007) The cerebellum as a liquid state machine. Neural Netw 20(3):290–297. doi:10.1016/j.neunet.2007.04.004

Acknowledgments

This manuscript profited from the comments of James Bergstra, Paul Cisek, Richard Courtemanche, and anonymous reviewers. F.R. was supported by doctoral studentships from the CIHR New Emerging Team Grant in Computational Neurosciences and from the Groupe de recherche sur le système nerveux central (FRSQ), and by a start-up fund from the Royal Military College of Canada. Y.B. and J.K. were supported by the CIHR New Emerging Team Grant in Computational Neurosciences (NET 54000; J.K., Y.B.) and by an FRSQ infrastructure grant. J.K. was also supported by CIHR operating grant (MOP 84454; J.K.) and CIHR Group Grant in Neurological Sciences (MGC 15176; J.K.). Part of this work also appeared as part of F.R. Ph.D. Thesis (Rivest 2009).

Author information

Corresponding author

Correspondence to Francois Rivest.

Electronic supplementary material

Supplementary material 1 (docx 1653 KB)

Appendices

Appendix 1: Changes from Rivest et al. (2010) and hyper-parameter search

The new simulations are run at 10 Hz instead of 5 Hz, do not use probe trials during training (probe trials are run only on frozen networks after training), and memory blocks have a single memory cell instead of two (see Sect. 4, Experiment 1). In the present study, some experiments tested networks with 1, 2, and 5 memory blocks, whereas previous experiments always used 2 memory blocks \((M = 2)\). The TD hyper-parameters remained the same (\(\alpha_{\mathrm{TD}} = 0.1\), \(\lambda_{\mathrm{TD}} = 0.9\), \(\gamma = 0.98\)) and are similar to those used by other authors (e.g., Suri and Schultz 1999). The LSTM network hyper-parameters were re-optimized for the new tasks and are now \((\alpha_{\mathrm{LSTM}} = 0.01, \beta = 1.0, \lambda_{\mathrm{LSTM}} = 0.0)\).
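For reference, the settings above can be gathered into a single configuration object. This is a hypothetical summary for the reader; the field names are illustrative and do not come from the original code.

```python
# Hypothetical configuration summary of the Appendix 1 settings (field names are
# illustrative; they are not taken from the authors' code).
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    step_rate_hz: int = 10             # simulation rate (10 Hz instead of the earlier 5 Hz)
    cells_per_block: int = 1           # one memory cell per block instead of two
    memory_blocks: tuple = (1, 2, 5)   # values of M tested in the present study
    alpha_td: float = 0.1              # TD learning rate
    lambda_td: float = 0.9             # TD eligibility trace factor
    gamma: float = 0.98                # TD discount factor
    alpha_lstm: float = 0.01           # LSTM learning rate
    beta: float = 1.0                  # second LSTM hyper-parameter (see text)
    lambda_lstm: float = 0.0           # LSTM eligibility trace factor (removed; see below)
```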

We optimized the LSTM hyper-parameters \(\alpha_{\mathrm{LSTM}}\), \(\beta\), and \(\lambda_{\mathrm{LSTM}}\) (the last being the LSTM eligibility trace factor from the previous model) by performing an extensive grid search on all three tasks (Experiment 1) with different numbers of memory blocks \((M \in \{1,2,5\})\). Each hyper-parameter was varied over the set \(\{\sqrt{10^{i}} : i \in \{-4,\ldots,2\}\} \cup \{0\}\). For each hyper-parameter setting, 30 networks were trained on each task as in Experiment 1. We recorded how many networks learned the task in the allocated time and how quickly they learned it. The search revealed no clear improvement in success rate or learning speed for nonzero LSTM eligibility trace factors around the near-optimal \((\alpha_{\mathrm{LSTM}}, \beta)\) pair. We therefore eliminated the LSTM eligibility trace mechanism of our previous paper and, for the two remaining hyper-parameters, selected the values that appeared best across all tasks and numbers of memory blocks combined \((\alpha_{\mathrm{LSTM}} = 0.01, \beta = 1.0)\).
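A compact sketch of this grid search could look as follows; train_networks is an assumed stand-in for the Experiment 1 training routine (returning the number of training blocks needed, or None on failure), not a function from the original code.

```python
# Hypothetical sketch of the Appendix 1 grid search; train_networks() is an assumed
# stand-in for the Experiment 1 training routine, not the authors' code.
import itertools
import math

GRID = sorted({math.sqrt(10.0 ** i) for i in range(-4, 3)} | {0.0})  # {sqrt(10^i)} U {0}

def grid_search(train_networks, n_nets=30,
                tasks=("delay", "trace", "extended"), blocks=(1, 2, 5)):
    """For each (task, M, alpha, beta, lambda) cell, train n_nets networks and record
    the success rate and the mean number of training blocks needed by the successes."""
    results = {}
    for task, m, alpha, beta, lam in itertools.product(tasks, blocks, GRID, GRID, GRID):
        outcomes = [train_networks(task, m, alpha, beta, lam) for _ in range(n_nets)]
        successes = [o for o in outcomes if o is not None]   # None = failed to learn in time
        rate = len(successes) / n_nets
        speed = sum(successes) / len(successes) if successes else float("inf")
        results[(task, m, alpha, beta, lam)] = (rate, speed)
    return results
```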

Appendix 2: Model’s time interval learning capacity

To rapidly evaluate how long an interval the model could learn with the selected hyper-parameters, randomly initialized networks with a single memory block \((M = 1)\) were trained on 4-min training blocks of alternating trials and intertrial intervals, as in Experiment 1. Under each conditioning paradigm, we started a pool of 20 randomly initialized networks for each CS–US onset interval, beginning at 1,000 ms and increasing in steps of 100 ms. For a given task, as soon as one of the networks had a successful block, we cleared the pool and started a new pool with an interval 100 ms longer. A training block was considered successful when the LSTM network properly predicted its next input \((|y_{\mathrm{US},t} - x_{\mathrm{US},t+1}| < 0.5)\) for 300 consecutive time steps (i.e., about 5 consecutive trials). As in Experiment 1, each network was limited to 500 training blocks (5,000 for trace). Each network, independently of the task or interval length, was randomly initialized and therefore trained from scratch. We stopped increasing the CS–US onset interval for a given task when no network succeeded for 5 different intervals. We were able to successfully train a single-memory-block network for CS–US onset intervals of up to 3.2 s (trace), 9.9 s (delay), and 5.1 s (extended).
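A rough sketch of this escalation procedure (again an assumption, not the original code) is given below; train_one_network stands in for a single Experiment-1-style run on one freshly initialized network, and the stopping rule is read here as five consecutive failed intervals.

```python
# Hypothetical sketch of the Appendix 2 interval-escalation procedure.
# train_one_network(task, interval_ms) is an assumed stand-in that trains one freshly
# initialized single-block network and returns True if it reaches a successful block
# (|y_US,t - x_US,t+1| < 0.5 for 300 consecutive steps) within its training-block budget.

def max_learnable_interval(train_one_network, task,
                           pool_size=20, start_ms=1000, step_ms=100, max_failed=5):
    interval, failed_in_a_row, longest_learned = start_ms, 0, None
    while failed_in_a_row < max_failed:
        # A pool succeeds as soon as any of its networks has a successful block;
        # the pool is then discarded and an interval 100 ms longer is tried.
        if any(train_one_network(task, interval) for _ in range(pool_size)):
            longest_learned, failed_in_a_row = interval, 0
        else:
            failed_in_a_row += 1
        interval += step_ms
    return longest_learned   # e.g. about 3200 ms (trace), 9900 ms (delay), 5100 ms (extended)
```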

Cite this article

Rivest, F., Kalaska, J.F. & Bengio, Y. Conditioning and time representation in long short-term memory networks. Biol Cybern 108, 23–48 (2014). https://doi.org/10.1007/s00422-013-0575-1
