Abstract
Asynchrony, overlaps, and delays in sensory–motor signals introduce ambiguity as to which stimuli, actions, and rewards are causally related. Only the repetition of reward episodes helps distinguish true cause–effect relationships from coincidental occurrences. In the model proposed here, a novel plasticity rule employs short- and long-term changes to evaluate hypotheses on cause–effect relationships. Transient weights represent hypotheses that are consolidated in long-term memory only when they consistently predict or cause future rewards. The main objective of the model is to preserve existing network topologies when learning with ambiguous information flows. Learning is also improved by biasing the exploration of the stimulus–response space toward actions that in the past occurred before rewards. The model indicates under which conditions beliefs can be consolidated in long-term memory, suggests a solution to the plasticity–stability dilemma, and proposes an interpretation of the role of short-term plasticity.
Notes
In that case, it is essential that the traces \(E\) are bounded to positive values: negative traces multiplying the negative baseline modulation would lead to unwanted weight increases.
The exact increment depends on the learning rate, on the exact circumstantial delay between activity and reward, and on the intensity of the stochastic reward.
References
Abbott LF, Regehr WG (2004) Synaptic computation. Nature 431:796–803
Abraham WC (2008) Metaplasticity: tuning synapses and networks for plasticity. Nat Rev Neurosci 9:387–399
Abraham WC, Bear MF (1996) Metaplasticity: the plasticity of synaptic plasticity. Trends Neurosci 19:126–130
Abraham WC, Robins A (2005) Memory retention—the synaptic stability versus plasticity dilemma. Trends Neurosci 28:73–78
Alexander WH, Sporns O (2002) An embodied model of learning, plasticity, and reward. Adapt Behav 10(3–4):143–159
Asada M, Hosoda K, Kuniyoshi Y, Ishiguro H, Inui T, Yoshikawa Y, Ogino M, Yoshida C (2009) Cognitive developmental robotics: a survey. IEEE Trans Auton Mental Dev 1(1):12–34
Bailey CH, Giustetto M, Huang YY, Hawkins RD, Kandel ER (2000) Is heterosynaptic modulation essential for stabilizing Hebbian plasticity and memory? Nat Rev Neurosci 1(1):11–20
Baras D, Meir R (2007) Reinforcement learning, spike-time-dependent plasticity, and the BCM rule. Neural Comput 19(8):2245–2279
Ben-Gal I (2007) Bayesian networks. In: Encyclopedia of statistics in quality and reliability. Wiley, London
Berridge KC (2007) The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology 191:391–431
Bosman R, van Leeuwen W, Wemmenhove B (2004) Combining Hebbian and reinforcement learning in a minibrain model. Neural Netw 17:29–36
Bouton ME (1994) Conditioning, remembering, and forgetting. J Exp Psychol Anim Behav Process 20(3):219
Bouton ME (2000) A learning theory perspective on lapse, relapse, and the maintenance of behavior change. Health Psychol 19(1S):57
Bouton ME (2004) Context and behavioral processes in extinction. Learn Mem 11(5):485–494
Bouton ME, Moody EW (2004) Memory processes in classical conditioning. Neurosci Biobehav Rev 28(7):663–674
Brembs B (2003) Operant conditioning in invertebrates. Curr Opin Neurobiol 13(6):710–717
Brembs B, Lorenzetti FD, Reyes FD, Baxter DA, Byrne JH (2002) Operant reward learning in Aplysia: neuronal correlates and mechanisms. Science 296(5573):1706–1709
Clopath C, Ziegler L, Vasilaki E, Büsing L, Gerstner W (2008) Tag-trigger-consolidation: a model of early and late long-term-potentiation and depression. PLoS Comput Biol 4(12):335–347
Cox RB, Krichmar JL (2009) Neuromodulation as a robot controller: a brain inspired strategy for controlling autonomous robots. IEEE Robot Autom Mag 16(3):72–80
Deco G, Rolls ET (2005) Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex. Cereb Cortex 15:15–30
Dudai Y (2004) The neurobiology of consolidations, or, how stable is the engram? Annu Rev Psychol 55:51–86
Farries MA, Fairhall AL (2007) Reinforcement learning with modulated spike timing-dependent synaptic plasticity. J Neurophysiol 98:3648–3665
Fisher SA, Fischer TM, Carew TJ (1997) Multiple overlapping processes underlying short-term synaptic enhancement. Trends Neurosci 20(4):170–177
Florian RV (2007) Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Comput 19:1468–1502
Frémaux N, Sprekeler H, Gerstner W (2010) Functional requirements for reward-modulated spike-timing-dependent plasticity. J Neurosci 30(40):13326–13337
Frey U, Morris RGM (1997) Synaptic tagging and long-term potentiation. Nature 385:533–536
Friedrich J, Urbanczik R, Senn W (2010) Learning spike-based population codes by reward and population feedback. Neural Comput 22:1698–1717
Friedrich J, Urbanczik R, Senn W (2011) Spatio-temporal credit assignment in neuronal population learning. PLoS Comput Biol 7(6):1–13
Fusi S, Senn W (2006) Eluding oblivion with smart stochastic selection of synaptic updates. Chaos: An Interdiscip J Nonlinear Sci 16(2):026112
Fusi S, Drew PJ, Abbott L (2005) Cascade models of synaptically stored memories. Neuron 45(4):599–611
Fusi S, Asaad WF, Miller EK, Wang XJ (2007) A neural circuit model of flexible sensorimotor mapping: learning and forgetting on multiple timescales. Neuron 54(2):319–333
Garris P, Ciolkowski E, Pastore P, Wightman R (1994) Efflux of dopamine from the synaptic cleft in the nucleus accumbens of the rat brain. J Neurosci 14(10):6084–6093
Gerstner W (2010) From Hebb rules to spike-timing-dependent plasticity: a personal account. Front Synaptic Neurosci 2:1–3
Gil M, DeMarco RJ, Menzel R (2007) Learning reward expectations in honeybees. Learn Mem 14:491–496
Goelet P, Castellucci VF, Schacher S, Kandel ER (1986) The long and the short of long-term memory—a molecular framework. Nature 322(6078):419–422
Grossberg S (1971) On the dynamics of operant conditioning. J Theor Biol 33(2):225–255
Grossberg S (1988) Nonlinear neural networks: principles, mechanisms, and architectures. Neural Netw 1:17–61
Hamilton RH, Pascual-Leone A (1998) Cortical plasticity associated with braille learning. Trends Cogn Sci 2(5):168–174
Hammer M, Menzel R (1995) Learning and memory in the honeybee. J Neurosci 15(3):1617–1630
Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20:197–243
Howson C, Urbach P (1989) Scientific reasoning: the Bayesian approach. Open Court Publishing Co, Chicago, USA
Hull CL (1943) Principles of behavior. Appleton Century, New York
Izhikevich EM (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:2443–2452
Jay MT (2003) Dopamine: a potential substrate for synaptic plasticity and memory mechanisms. Prog Neurobiol 69(6):375–390
Jonides J, Lewis RL, Nee DE, Lustig CA, Berman MG, Moore KS (2008) The mind and brain of short-term memory. Annu Rev Psychol 59:193
Kempter R, Gerstner W, Van Hemmen JL (1999) Hebbian learning and spiking neurons. Phys Rev E 59(4):4498–4514
Krichmar JL, Roehrbein F (2013) Value and reward based learning in neurorobots. Front Neurorobot 7(13):1–2
Lamprecht R, LeDoux J (2004) Structural plasticity and memory. Nat Rev Neurosci 5(1):45–54
Legenstein R, Chase SM, Schwartz A, Maass W (2010) A reward-modulated Hebbian learning rule can explain experimentally observed network reorganization in a brain control task. J Neurosci 30(25):8400–8410
Leibold C, Kempter R (2008) Sparseness constrains the prolongation of memory lifetime via synaptic metaplasticity. Cereb Cortex 18(1):67–77
Lin LJ (1993) Reinforcement learning for robots using neural networks. Ph.D. thesis, School of Computer Science. Carnegie Mellon University
Lungarella M, Metta G, Pfeifer R, Sandini G (2003) Developmental robotics: a survey. Connect Sci 15(4):151–190
Lynch MA (2004) Long-term potentiation and memory. Physiol Rev 84(1):87–136
Mayford M, Siegelbaum SA, Kandel ER (2012) Synapses and memory storage. Cold Spring Harbor Perspect Biol 4(6):a005751
McGaugh JL (2000) Memory—a century of consolidation. Science 287:248–251
Menzel R, Müller U (1996) Learning and memory in honeybees: from behavior to natural substrates. Annu Rev Neurosci 19:379–404
Montague PR, Dayan P, Person C, Sejnowski TJ (1995) Bee foraging in uncertain environments using predictive Hebbian learning. Nature 377:725–728
Nguyen PV, Abel T, Kandel ER (1994) Requirement of a critical period of transcription for induction of a late phase of LTP. Science 265(5175):1104–1107
Nitz DA, Kargo WJ, Fleisher J (2007) Dopamine signaling and the distal reward problem. Neuroreport 18(17):1833–1836
O’Brien MJ, Srinivasan N (2013) A spiking neural model for stable reinforcement of synapses based on multiple distal rewards. Neural Comput 25(1):123–156
O’Doherty JP, Kringelbach ML, Rolls ET, Andrews C (2001) Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci 4(1):95–102
Ono K (1987) Superstitious behavior in humans. J Exp Anal Behav 47(3):261–271
Päpper M, Kempter R, Leibold C (2011) Synaptic tagging, evaluation of memories, and the distal reward problem. Learn Mem 18:58–70
Pennartz CMA (1996) The ascending neuromodulatory systems in learning by reinforcement: comparing computational conjectures with experimental findings. Brain Res Rev 21:219–245
Pennartz CMA (1997) Reinforcement learning by Hebbian synapses with adaptive threshold. Neuroscience 81(2):303–319
Redgrave P, Gurney K, Reynolds J (2008) What is reinforced by phasic dopamine signals? Brain Res Rev 58:322–339
Robins A (1995) Catastrophic forgetting, rehearsal, and pseudorehearsal. Connect Sci J Neural Comput Artif Intell Cogn Res 7:123–146
Sandberg A, Tegnér J, Lansner A (2003) A working memory model based on fast Hebbian learning. Netw Comput Neural Syst 14(4):789–802
Sarkisov DV, Wang SSH (2008) Order-dependent coincidence detection in cerebellar Purkinje neurons at the inositol trisphosphate receptor. J Neurosci 28(1):133–142
Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:1–27
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913
Schultz W, Dayan P, Montague PR (1997) A neural substrate for prediction and reward. Science 275:1593–1598
Senn W, Fusi S (2005) Learning only when necessary: better memories of correlated patterns in networks with bounded synapses. Neural Comput 17(10):2106–2138
Skinner BF (1948) “Superstition” in the pigeon. J Exp Psychol 38:168–172
Skinner BF (1953) Science and human behavior. MacMillan, New York
Soltoggio A, Stanley KO (2012) From modulated Hebbian plasticity to simple behavior learning through noise and weight saturation. Neural Netw 34:28–41
Soltoggio A, Steil JJ (2013) Solving the distal reward problem with rare correlations. Neural Comput 25(4):940–978
Soltoggio A, Bullinaria JA, Mattiussi C, Dürr P, Floreano D (2008) Evolutionary advantages of neuromodulated plasticity in dynamic, reward-based scenarios. In: Artificial life XI: proceedings of the eleventh international conference on the simulation and synthesis of living systems. MIT Press, Cambridge
Soltoggio A, Lemme A, Reinhart FR, Steil JJ (2013a) Rare neural correlations implement robotic conditioning with reward delays and disturbances. Front Neurorobot 7:1–16 (Research Topic: Value and Reward Based Learning in Neurobots)
Soltoggio A, Reinhart FR, Lemme A, Steil JJ (2013b) Learning the rules of a game: neural conditioning in human–robot interaction with delayed rewards. In: Proceedings of the third joint IEEE international conference on development and learning and on epigenetic robotics, Osaka, Japan
Sporns O, Alexander WH (2002) Neuromodulation and plasticity in an autonomous robot. Neural Netw 15:761–774
Sporns O, Alexander WH (2003) Neuromodulation in a learning robot: interactions between neural plasticity and behavior. Proc Int Joint Conf Neural Netw 4:2789–2794
Staubli U, Fraser D, Faraday R, Lynch G (1987) Olfaction and the “data” memory system in rats. Behav Neurosci 101(6):757–765
Sutton RS (1984) Temporal credit assignment in reinforcement learning. Ph.D. thesis, Department of Computer Science, University of Massachusetts, Amherst, MA 01003
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA, USA
Swartzentruber D (1995) Modulatory mechanisms in Pavlovian conditioning. Anim Learn Behav 23(2):123–143
Thorndike EL (1911) Animal intelligence. Macmillan, New York
Urbanczik R, Senn W (2009) Reinforcement learning in populations of spiking neurons. Nat Neurosci 12:250–252
Van Hemmen J (1997) Hebbian learning, its correlation catastrophe, and unlearning. Netw Comput Neural Syst 8(3):V1–V17
Wang SSH, Denk W, Häusser M (2000) Coincidence detection in single dendritic spines mediated by calcium release. Nat Neurosci 3(12):1266–1273
Weng J, McClelland J, Pentland A, Sporns O, Stockman I, Sur M, Thelen E (2001) Autonomous mental development by robots and animals. Science 291(5504):599–600
Wightman R, Zimmerman J (1990) Control of dopamine extracellular concentration in rat striatum by impulse flow and uptake. Brain Res Brain Res Rev 15(2):135–144
Wise RA, Rompre PP (1989) Brain dopamine and reward. Annu Rev Psychol 40:191–225
Xie X, Seung HS (2004) Learning in neural networks by reinforcement of irregular spiking. Phys Rev E 69:1–10
Ziemke T, Thieme M (2002) Neuromodulation of reactive sensorimotor mappings as short-term memory mechanism in delayed response tasks. Adapt Behav 10:185–199
Zucker RS (1989) Short-term synaptic plasticity. Annu Rev Neurosci 12(1):13–31
Zucker RS, Regehr WG (2002) Short-term synaptic plasticity. Annu Rev Physiol 64(1):355–405
Acknowledgments
The author thanks William Land, Albert Mukovskiy, Kenichi Narioka, Felix Reinhart, Walter Senn, Kenneth Stanley, and Paul Tonelli for constructive discussions and valuable comments on early drafts of the manuscript. A large part of this work was carried out while the author was with the CoR-Lab at Bielefeld University, funded by the European Community’s Seventh Framework Programme FP7/2007-2013, Challenge 2 Cognitive Systems, Interaction, Robotics under Grant Agreement No. 248311-AMARSi.
Appendices
Appendix 1: Unlearning
Unlearning of the long-term components of the weights can be implemented symmetrically to learning, i.e., when a transient weight is strongly negative (lower than \(-\varPsi \)), the long-term component of that weight is decreased. This process represents the validation of the hypothesis that a certain stimulus–action pair is no longer associated with a reward, or that it is possibly associated with punishment. In such a case, the neural weight that represents this stimulus–action pair is decreased, and so is the probability of that pair occurring. The conversion of negative transient weights to decrements of long-term weights, similar to Eq. (10), can be formally expressed as
No other changes are required to the algorithm described in the paper.
The case can be illustrated by reproducing the preliminary test of Fig. 3, augmented with a phase characterized by a negative average modulation. Figure 10 shows that, when modulatory updates become negative on average (from reward 4,000 to reward 5,000), the transient weight detects this change by becoming negative. The use of Eq. (11) then causes the long-term component to reduce its value, thereby reversing the previous learning.
Preliminary experiments with unlearning on the complete neural model of this study show that the rate of negative modulation drops drastically as unlearning proceeds. In other words, as the network experiences negative modulation, and consequently reduces the frequency of punishing stimulus–action pairs, it also reduces the rate of unlearning, because punishing episodes become sporadic. It appears that unlearning from negative experiences might be slower than learning from positive experiences. Evidence from biology indicates that extinction does not completely remove the previous association (Bouton 2000, 2004), suggesting that dynamics more complex than those proposed here may regulate this process in animals.
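The symmetric consolidation rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the variable names, the consolidation rate, and the exact transfer form (Eqs. 10 and 11 are not reproduced on this page) are assumptions.

```python
# Hypothetical sketch of symmetric consolidation/unlearning.
# PSI and RATE are illustrative values, not the paper's parameters.
PSI = 1.0    # consolidation threshold (Psi in the text)
RATE = 0.1   # fraction of the excess transient weight transferred per step

def consolidate(w_long, w_transient):
    """Transient weights above +PSI increase the long-term component
    (learning, cf. Eq. 10); transient weights below -PSI decrease it
    (unlearning, cf. Eq. 11); in between, nothing is consolidated."""
    if w_transient > PSI:
        delta = RATE * (w_transient - PSI)
    elif w_transient < -PSI:
        delta = RATE * (w_transient + PSI)   # negative: long-term decrement
    else:
        delta = 0.0
    return w_long + delta, w_transient - delta

wl, wt = consolidate(0.5, -1.5)   # transient weight well below -PSI
assert wl < 0.5                   # long-term component decreased
```

The unlearning branch mirrors the learning branch exactly, which is why the paper notes that no other changes to the algorithm are required.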
Appendix 2: Implementation
All implementation details are also available as part of the open source Matlab code provided as support material. The code can be used to reproduce the results in this work, or modified to perform further experiments. The source code can be downloaded from http://andrea.soltoggio.net/HTP.
1.1 Network, inputs, outputs, and rewards
The network is a feed-forward single layer neural network with 300 inputs, 30 outputs, 9,000 weights, and sampling time of 0.1 s. Three hundred stimuli are delivered to the network by means of 300 input neurons. Thirty actions are performed by the network by means of 30 output neurons.
The flow of stimuli consists of a random sequence of stimuli, each lasting between 1 and 2 s. The probabilities of 0, 1, 2, or 3 stimuli being shown to the network simultaneously are given in Table 2.
The agent continuously performs actions chosen from a pool of 30 possibilities. The thirty output neurons may be interpreted as single neurons or as populations. When one action terminates, the output neuron with the highest activity initiates the next action. Once the response action is started, it lasts a variable time between 1 and 2 s. During this time, the neuron that initiated the action receives a feedback signal \(I\) of 0.5. The feedback current enables the output neuron responsible for an action to correlate correctly with the stimulus that is simultaneously active. A feedback signal is also used in Urbanczik and Senn (2009) to improve the reinforcement learning performance of a neural network.
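The winner-take-all action selection with a feedback current can be sketched as below. The network dimensions and the feedback current \(I = 0.5\) follow the text; the weight initialization and noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes follow the text: 300 inputs, 30 outputs, 9,000 weights.
N_IN, N_OUT = 300, 30
W = rng.normal(0.0, 0.1, (N_OUT, N_IN))   # illustrative initialization

def select_action(stimulus, feedback_neuron=None):
    """Winner-take-all over output activities; the neuron that initiated
    the ongoing action receives an extra feedback current I = 0.5."""
    I = np.zeros(N_OUT)
    if feedback_neuron is not None:
        I[feedback_neuron] = 0.5
    activity = W @ stimulus + I + rng.normal(0.0, 0.01, N_OUT)  # small noise
    return int(np.argmax(activity))

stimulus = np.zeros(N_IN)
stimulus[5] = 1.0                 # one active input neuron
action = select_action(stimulus)  # winner initiates the next action
assert 0 <= action < N_OUT
```

The feedback current keeps the winning output neuron active for the duration of the action, so that Hebbian correlation can bind it to the stimulus present at the same time.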
The rewarding stimulus–action pairs are \((i,i)\) with \(1 \le i \le 10\) during scenario 1, \((i,i-5)\) with \(11 \le i \le 20\) in scenario 2, and \((i,i-20)\) with \(21 \le i \le 30\) in scenario 3. When a rewarding stimulus–action pair is performed, a reward is delivered to the network with a random delay in the interval [1, 4] s. Given the delay of the reward, and the frequency of stimuli and actions, a number of stimulus–action pairs could be responsible for triggering the reward. The parameters are listed in Table 2. Table 3 lists the parameters of the neural model.
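The reward schedule described above can be sketched as follows. The pair tables and the [1, 4] s delay follow the text; the function names and the event-scheduling mechanics are assumptions.

```python
import random

def rewarding_action(stimulus, scenario):
    """Return the action index rewarded for a given stimulus (1-based),
    following the three scenarios described in the text."""
    if scenario == 1 and 1 <= stimulus <= 10:
        return stimulus          # pairs (i, i)
    if scenario == 2 and 11 <= stimulus <= 20:
        return stimulus - 5      # pairs (i, i-5)
    if scenario == 3 and 21 <= stimulus <= 30:
        return stimulus - 20     # pairs (i, i-20)
    return None

def schedule_reward(t_now, stimulus, action, scenario):
    """If the pair is rewarding, return the delayed delivery time;
    the delay is drawn uniformly from [1, 4] s as in the text."""
    if rewarding_action(stimulus, scenario) == action:
        return t_now + random.uniform(1.0, 4.0)
    return None

t = schedule_reward(10.0, 15, 10, 2)   # pair (15, 10) is rewarding in scenario 2
assert t is not None and 11.0 <= t <= 14.0
```

Because several stimulus–action pairs occur within any 1–4 s window, the delayed delivery makes the eventual reward ambiguous, which is exactly the credit-assignment problem the model addresses.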
1.2 Integration
The integration of Eqs. (3) and (2) with a sampling time \(\Delta t\) of \(100\) ms is implemented step-wise by
The same integration method is used for all leaky integrators in this study. \(r(t)\) is a signal from the environment: it may be a one-step signal, as in the present study, which is high for a single step when the reward is delivered, or any other function representing a reward. In a test of RCHP on the real robot iCub (Soltoggio et al. 2013a, b), \(r(t)\) was determined by a human teacher pressing skin sensors on the robot's arms.
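The step-wise integration of a leaky integrator can be sketched as below. The sampling time \(\Delta t = 0.1\) s follows the text; since Eqs. (2) and (3) are not reproduced on this page, the standard first-order decay form is an assumption.

```python
# Hypothetical sketch of step-wise (Euler) integration of a leaky
# integrator tau * dx/dt = -x + input, with sampling time DT = 0.1 s.
DT = 0.1  # sampling time in seconds, as in the text

def leaky_step(x, inp, tau):
    """One discrete integration step: x(t + DT) = x(t) + (DT/tau)(-x(t) + inp)."""
    return x + (DT / tau) * (-x + inp)

# Example: a one-step reward signal r(t) feeding a trace with tau = 1 s.
x = 0.0
x = leaky_step(x, 1.0, 1.0)   # reward high for one step: trace jumps up
peak = x
for _ in range(50):           # reward off: trace decays toward zero
    x = leaky_step(x, 0.0, 1.0)
assert 0.0 < x < peak
```

The same update, applied with the appropriate time constants, serves every leaky quantity in the model, which keeps the implementation uniform across traces, weights, and modulatory signals.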
1.3 Rarely correlating Hebbian plasticity
Rarely correlating Hebbian plasticity (RCHP) (Soltoggio and Steil 2013) is a type of Hebbian plasticity that filters out the majority of correlations and produces nonzero values only for a small percentage of synapses. Rate-based neurons can use a Hebbian rule augmented with two thresholds to extract low percentages of correlations and decorrelations. RCHP, expressed by Eq. (4), is simulated with the parameters in Table 4. The rate of correlations can be expressed by a global concentration \(\omega _{c}\). This measure represents how much the activity of the network correlates, i.e., how much the network activity is deterministically driven by connections rather than noise-driven. The instantaneous matrix of correlations \(\mathrm{RCHP}^+\) (i.e., the first row in Eq. (4) computed for all synapses) can be low-pass filtered as
to estimate the level of correlations in the recent past, where \(j\) is the index of input neurons, and \(i\) the index of the output neurons. In the current settings, \(\tau _{c}\) was chosen equal to 5 s. Alternatively, a similar measure of recent correlations \(\omega _{c}(t)\) can be computed in discrete time over a sliding time window of 5 s summing all correlations \(\mathrm{RCHP}^+(t)\)
Similar equations to (14) and (15) are used to estimate decorrelations \(\omega _{d}(t)\) from the detected decorrelations \(\mathrm{RCHP}^-(t)\). The adaptive thresholds \(\theta _{hi}\) and \(\theta _{lo}\) in Eq. (4) are estimated as follows.
and
with \(\eta = 0.001\) and \(\mu \), the target rate of rare correlations, set to 0.1 %/s. If correlations are lower than half of the target, or greater than twice the target, the thresholds are adapted to the new increased or reduced activity. This heuristic maintains the thresholds relatively constant, performing adaptation only when correlations are too high or too low for a prolonged period of time.
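The dead-band heuristic for the adaptive threshold can be sketched as follows. The values \(\eta = 0.001\) and \(\mu = 0.1\,\%/\mathrm{s}\) follow the text; since Eqs. (16) and (17) are not shown on this page, the exact update form is an assumption.

```python
# Hypothetical sketch of the threshold-adaptation heuristic.
ETA = 0.001   # adaptation rate eta (from the text)
MU = 0.001    # target correlation rate mu = 0.1 %/s (from the text)

def adapt_threshold(theta_hi, omega_c):
    """Adapt theta_hi only when the measured correlation rate omega_c
    leaves the band [mu/2, 2*mu]; inside the band the threshold is
    left unchanged, keeping it relatively constant over time."""
    if omega_c > 2 * MU:
        theta_hi += ETA * (omega_c - MU)   # too many correlations: raise
    elif omega_c < MU / 2:
        theta_hi -= ETA * (MU - omega_c)   # too few correlations: lower
    return theta_hi

th = 1.0
assert adapt_threshold(th, MU) == th      # inside the band: no change
assert adapt_threshold(th, 0.01) > th     # too many: threshold rises
assert adapt_threshold(th, 0.0001) < th   # too few: threshold drops
```

The dead band is what distinguishes this heuristic from continuous homeostatic regulation: the thresholds react only to sustained deviations, so brief bursts of correlated activity do not perturb them.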
Soltoggio, A. Short-term plasticity as cause–effect hypothesis testing in distal reward learning. Biol Cybern 109, 75–94 (2015). https://doi.org/10.1007/s00422-014-0628-0