
Short-term plasticity as cause–effect hypothesis testing in distal reward learning

Original Paper · Biological Cybernetics

Abstract

Asynchrony, overlaps, and delays in sensory–motor signals introduce ambiguity as to which stimuli, actions, and rewards are causally related. Only the repetition of reward episodes helps distinguish true cause–effect relationships from coincidental occurrences. In the model proposed here, a novel plasticity rule employs short- and long-term changes to evaluate hypotheses on cause–effect relationships. Transient weights represent hypotheses that are consolidated in long-term memory only when they consistently predict or cause future rewards. The main objective of the model is to preserve existing network topologies when learning with ambiguous information flows. Learning is also improved by biasing the exploration of the stimulus–response space toward actions that in the past occurred before rewards. The model indicates under which conditions beliefs can be consolidated in long-term memory, suggests a solution to the plasticity–stability dilemma, and proposes an interpretation of the role of short-term plasticity.



Notes

  1. In that case, it is essential that the traces \(E\) are bounded to positive values: negative traces multiplying the negative baseline modulation would lead to unwanted weight increases.

  2. The exact increment depends on the learning rate, on the exact circumstantial delay between activity and reward, and on the intensity of the stochastic reward.

References

  • Abbott LF, Regehr WG (2004) Synaptic computation. Nature 431:796–803
  • Abraham WC (2008) Metaplasticity: tuning synapses and networks for plasticity. Nat Rev Neurosci 9:387–399
  • Abraham WC, Bear MF (1996) Metaplasticity: the plasticity of synaptic plasticity. Trends Neurosci 19:126–130
  • Abraham WC, Robins A (2005) Memory retention—the synaptic stability versus plasticity dilemma. Trends Neurosci 28:73–78
  • Alexander WH, Sporns O (2002) An embodied model of learning, plasticity, and reward. Adapt Behav 10(3–4):143–159
  • Asada M, Hosoda K, Kuniyoshi Y, Ishiguro H, Inui T, Yoshikawa Y, Ogino M, Yoshida C (2009) Cognitive developmental robotics: a survey. IEEE Trans Auton Mental Dev 1(1):12–34
  • Bailey CH, Giustetto M, Huang YY, Hawkins RD, Kandel ER (2000) Is heterosynaptic modulation essential for stabilizing Hebbian plasticity and memory? Nat Rev Neurosci 1(1):11–20
  • Baras D, Meir R (2007) Reinforcement learning, spike-time-dependent plasticity, and the BCM rule. Neural Comput 19(8):2245–2279
  • Ben-Gal I (2007) Bayesian networks. In: Encyclopedia of statistics in quality and reliability. Wiley, London
  • Berridge KC (2007) The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology 191:391–431
  • Bosman R, van Leeuwen W, Wemmenhove B (2004) Combining Hebbian and reinforcement learning in a minibrain model. Neural Netw 17:29–36
  • Bouton ME (1994) Conditioning, remembering, and forgetting. J Exp Psychol Anim Behav Process 20(3):219
  • Bouton ME (2000) A learning theory perspective on lapse, relapse, and the maintenance of behavior change. Health Psychol 19(1S):57
  • Bouton ME (2004) Context and behavioral processes in extinction. Learn Mem 11(5):485–494
  • Bouton ME, Moody EW (2004) Memory processes in classical conditioning. Neurosci Biobehav Rev 28(7):663–674
  • Brembs B (2003) Operant conditioning in invertebrates. Curr Opin Neurobiol 13(6):710–717
  • Brembs B, Lorenzetti FD, Reyes FD, Baxter DA, Byrne JH (2002) Operant reward learning in Aplysia: neuronal correlates and mechanisms. Science 296(5573):1706–1709
  • Clopath C, Ziegler L, Vasilaki E, Büsing L, Gerstner W (2008) Tag-trigger-consolidation: a model of early and late long-term-potentiation and depression. PLoS Comput Biol 4(12):335–347
  • Cox RB, Krichmar JL (2009) Neuromodulation as a robot controller: a brain-inspired strategy for controlling autonomous robots. IEEE Robot Autom Mag 16(3):72–80
  • Deco G, Rolls ET (2005) Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex. Cereb Cortex 15:15–30
  • Dudai Y (2004) The neurobiology of consolidations, or, how stable is the engram? Annu Rev Psychol 55:51–86
  • Farries MA, Fairhall AL (2007) Reinforcement learning with modulated spike timing-dependent synaptic plasticity. J Neurophysiol 98:3648–3665
  • Fisher SA, Fischer TM, Carew TJ (1997) Multiple overlapping processes underlying short-term synaptic enhancement. Trends Neurosci 20(4):170–177
  • Florian RV (2007) Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Comput 19:1468–1502
  • Frémaux N, Sprekeler H, Gerstner W (2010) Functional requirements for reward-modulated spike-timing-dependent plasticity. J Neurosci 30(40):13326–13337
  • Frey U, Morris RGM (1997) Synaptic tagging and long-term potentiation. Nature 385:533–536
  • Friedrich J, Urbanczik R, Senn W (2010) Learning spike-based population codes by reward and population feedback. Neural Comput 22:1698–1717
  • Friedrich J, Urbanczik R, Senn W (2011) Spatio-temporal credit assignment in neuronal population learning. PLoS Comput Biol 7(6):1–13
  • Fusi S, Senn W (2006) Eluding oblivion with smart stochastic selection of synaptic updates. Chaos 16(2):026112
  • Fusi S, Drew PJ, Abbott L (2005) Cascade models of synaptically stored memories. Neuron 45(4):599–611
  • Fusi S, Asaad WF, Miller EK, Wang XJ (2007) A neural circuit model of flexible sensorimotor mapping: learning and forgetting on multiple timescales. Neuron 54(2):319–333
  • Garris P, Ciolkowski E, Pastore P, Wightman R (1994) Efflux of dopamine from the synaptic cleft in the nucleus accumbens of the rat brain. J Neurosci 14(10):6084–6093
  • Gerstner W (2010) From Hebb rules to spike-timing-dependent plasticity: a personal account. Front Synaptic Neurosci 2:1–3
  • Gil M, De Marco RJ, Menzel R (2007) Learning reward expectations in honeybees. Learn Mem 14:491–496
  • Goelet P, Castellucci VF, Schacher S, Kandel ER (1986) The long and the short of long-term memory—a molecular framework. Nature 322(6078):419–422
  • Grossberg S (1971) On the dynamics of operant conditioning. J Theor Biol 33(2):225–255
  • Grossberg S (1988) Nonlinear neural networks: principles, mechanisms, and architectures. Neural Netw 1:17–61
  • Hamilton RH, Pascual-Leone A (1998) Cortical plasticity associated with Braille learning. Trends Cogn Sci 2(5):168–174
  • Hammer M, Menzel R (1995) Learning and memory in the honeybee. J Neurosci 15(3):1617–1630
  • Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20:197–243
  • Howson C, Urbach P (1989) Scientific reasoning: the Bayesian approach. Open Court Publishing Co, Chicago
  • Hull CL (1943) Principles of behavior. Appleton-Century, New York
  • Izhikevich EM (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:2443–2452
  • Jay MT (2003) Dopamine: a potential substrate for synaptic plasticity and memory mechanisms. Prog Neurobiol 69(6):375–390
  • Jonides J, Lewis RL, Nee DE, Lustig CA, Berman MG, Moore KS (2008) The mind and brain of short-term memory. Annu Rev Psychol 59:193–224
  • Kempter R, Gerstner W, van Hemmen JL (1999) Hebbian learning and spiking neurons. Phys Rev E 59(4):4498–4514
  • Krichmar JL, Roehrbein F (2013) Value and reward based learning in neurorobots. Front Neurorobot 7(13):1–2
  • Lamprecht R, LeDoux J (2004) Structural plasticity and memory. Nat Rev Neurosci 5(1):45–54
  • Legenstein R, Chase SM, Schwartz A, Maass W (2010) A reward-modulated Hebbian learning rule can explain experimentally observed network reorganization in a brain control task. J Neurosci 30(25):8400–8410
  • Leibold C, Kempter R (2008) Sparseness constrains the prolongation of memory lifetime via synaptic metaplasticity. Cereb Cortex 18(1):67–77
  • Lin LJ (1993) Reinforcement learning for robots using neural networks. PhD thesis, School of Computer Science, Carnegie Mellon University
  • Lungarella M, Metta G, Pfeifer R, Sandini G (2003) Developmental robotics: a survey. Connect Sci 15(4):151–190
  • Lynch MA (2004) Long-term potentiation and memory. Physiol Rev 84(1):87–136
  • Mayford M, Siegelbaum SA, Kandel ER (2012) Synapses and memory storage. Cold Spring Harb Perspect Biol 4(6):a005751
  • McGaugh JL (2000) Memory—a century of consolidation. Science 287:248–251
  • Menzel R, Müller U (1996) Learning and memory in honeybees: from behavior to neural substrates. Annu Rev Neurosci 19:379–404
  • Montague PR, Dayan P, Person C, Sejnowski TJ (1995) Bee foraging in uncertain environments using predictive Hebbian learning. Nature 377:725–728
  • Nguyen PV, Abel T, Kandel ER (1994) Requirement of a critical period of transcription for induction of a late phase of LTP. Science 265(5175):1104–1107
  • Nitz DA, Kargo WJ, Fleisher J (2007) Dopamine signaling and the distal reward problem. NeuroReport 18(17):1833–1836
  • O’Brien MJ, Srinivasan N (2013) A spiking neural model for stable reinforcement of synapses based on multiple distal rewards. Neural Comput 25(1):123–156
  • O’Doherty JP, Kringelbach ML, Rolls ET, Andrews C (2001) Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci 4(1):95–102
  • Ono K (1987) Superstitious behavior in humans. J Exp Anal Behav 47(3):261–271
  • Päpper M, Kempter R, Leibold C (2011) Synaptic tagging, evaluation of memories, and the distal reward problem. Learn Mem 18:58–70
  • Pennartz CMA (1996) The ascending neuromodulatory systems in learning by reinforcement: comparing computational conjectures with experimental findings. Brain Res Rev 21:219–245
  • Pennartz CMA (1997) Reinforcement learning by Hebbian synapses with adaptive threshold. Neuroscience 81(2):303–319
  • Redgrave P, Gurney K, Reynolds J (2008) What is reinforced by phasic dopamine signals? Brain Res Rev 58:322–339
  • Robins A (1995) Catastrophic forgetting, rehearsal, and pseudorehearsal. Connect Sci 7:123–146
  • Sandberg A, Tegnér J, Lansner A (2003) A working memory model based on fast Hebbian learning. Netw Comput Neural Syst 14(4):789–802
  • Sarkisov DV, Wang SSH (2008) Order-dependent coincidence detection in cerebellar Purkinje neurons at the inositol trisphosphate receptor. J Neurosci 28(1):133–142
  • Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:1–27
  • Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913
  • Schultz W, Dayan P, Montague PR (1997) A neural substrate for prediction and reward. Science 275:1593–1598
  • Senn W, Fusi S (2005) Learning only when necessary: better memories of correlated patterns in networks with bounded synapses. Neural Comput 17(10):2106–2138
  • Skinner BF (1948) “Superstition” in the pigeon. J Exp Psychol 38:168–172
  • Skinner BF (1953) Science and human behavior. Macmillan, New York
  • Soltoggio A, Stanley KO (2012) From modulated Hebbian plasticity to simple behavior learning through noise and weight saturation. Neural Netw 34:28–41
  • Soltoggio A, Steil JJ (2013) Solving the distal reward problem with rare correlations. Neural Comput 25(4):940–978
  • Soltoggio A, Bullinaria JA, Mattiussi C, Dürr P, Floreano D (2008) Evolutionary advantages of neuromodulated plasticity in dynamic, reward-based scenarios. In: Artificial life XI: proceedings of the eleventh international conference on the simulation and synthesis of living systems. MIT Press, Cambridge
  • Soltoggio A, Lemme A, Reinhart FR, Steil JJ (2013a) Rare neural correlations implement robotic conditioning with reward delays and disturbances. Front Neurorobot 7:1–16 (Research Topic: Value and Reward Based Learning in Neurorobots)
  • Soltoggio A, Reinhart FR, Lemme A, Steil JJ (2013b) Learning the rules of a game: neural conditioning in human–robot interaction with delayed rewards. In: Proceedings of the third joint IEEE international conference on development and learning and on epigenetic robotics, Osaka, Japan
  • Sporns O, Alexander WH (2002) Neuromodulation and plasticity in an autonomous robot. Neural Netw 15:761–774
  • Sporns O, Alexander WH (2003) Neuromodulation in a learning robot: interactions between neural plasticity and behavior. Proc Int Joint Conf Neural Netw 4:2789–2794
  • Staubli U, Fraser D, Faraday R, Lynch G (1987) Olfaction and the “data” memory system in rats. Behav Neurosci 101(6):757–765
  • Sutton RS (1984) Temporal credit assignment in reinforcement learning. PhD thesis, Department of Computer Science, University of Massachusetts, Amherst
  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA
  • Swartzentruber D (1995) Modulatory mechanisms in Pavlovian conditioning. Anim Learn Behav 23(2):123–143
  • Thorndike EL (1911) Animal intelligence. Macmillan, New York
  • Urbanczik R, Senn W (2009) Reinforcement learning in populations of spiking neurons. Nat Neurosci 12:250–252
  • van Hemmen JL (1997) Hebbian learning, its correlation catastrophe, and unlearning. Netw Comput Neural Syst 8(3):V1–V17
  • Wang SSH, Denk W, Häusser M (2000) Coincidence detection in single dendritic spines mediated by calcium release. Nat Neurosci 3(12):1266–1273
  • Weng J, McClelland J, Pentland A, Sporns O, Stockman I, Sur M, Thelen E (2001) Autonomous mental development by robots and animals. Science 291(5504):599–600
  • Wightman R, Zimmerman J (1990) Control of dopamine extracellular concentration in rat striatum by impulse flow and uptake. Brain Res Rev 15(2):135–144
  • Wise RA, Rompre PP (1989) Brain dopamine and reward. Annu Rev Psychol 40:191–225
  • Xie X, Seung HS (2004) Learning in neural networks by reinforcement of irregular spiking. Phys Rev E 69:1–10
  • Ziemke T, Thieme M (2002) Neuromodulation of reactive sensorimotor mappings as short-term memory mechanism in delayed response tasks. Adapt Behav 10:185–199
  • Zucker RS (1989) Short-term synaptic plasticity. Annu Rev Neurosci 12(1):13–31
  • Zucker RS, Regehr WG (2002) Short-term synaptic plasticity. Annu Rev Physiol 64(1):355–405


Acknowledgments

The author thanks William Land, Albert Mukovskiy, Kenichi Narioka, Felix Reinhart, Walter Senn, Kenneth Stanley, and Paul Tonelli for constructive discussions and valuable comments on early drafts of the manuscript. A large part of this work was carried out while the author was with the CoR-Lab at Bielefeld University, funded by the European Community’s Seventh Framework Programme FP7/2007-2013, Challenge 2 Cognitive Systems, Interaction, Robotics under Grant Agreement No. 248311-AMARSi.

Author information

Correspondence to Andrea Soltoggio.

Appendices

Appendix 1: Unlearning

Unlearning of the long-term components of the weights can be effectively implemented as symmetrical to learning, i.e., when the transient weights are very negative (lower than \(-\varPsi \)), the long-term component of a weight is decreased. This process represents the validation of the hypothesis that a certain stimulus–action pair is no longer associated with a reward, or that it is possibly associated with punishment. In such a case, the neural weight that represents this stimulus–action pair is decreased, and so is the probability of that pair occurring. The conversion of negative transient weights into decrements of long-term weights, analogous to Eq. (10), can be formally expressed as

$$\begin{aligned} \dot{w}^{lt}_{ji}(t) = -\rho \cdot H(-w^{st}_{ji}(t) - \varPsi ). \end{aligned}$$
(11)

No other changes are required to the algorithm described in the paper.
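
To make the rule concrete, the following Matlab fragment sketches one integration step of consolidation and unlearning. It assumes that Eq. (10) is the positive counterpart of Eq. (11) and that the effective weight is the sum of the two components; the numerical values of \(\rho \), \(\varPsi \), and \(\Delta t\) are illustrative placeholders, not the settings of Table 3.

    % One step of long-term consolidation/unlearning (sketch).
    % Assumes Eq. (10) is the positive counterpart of Eq. (11) and that
    % the effective weight is the sum of both components; rho, Psi and
    % dt below are illustrative, not the values of Table 3.
    rho = 0.001;                  % consolidation/unlearning rate
    Psi = 1.0;                    % transient-weight threshold
    dt  = 0.1;                    % sampling time (s)

    w_st = 0.2 * randn(300, 30);  % transient (short-term) components
    w_lt = zeros(300, 30);        % long-term components

    % Logical masks play the role of the Heaviside function H:
    w_lt = w_lt + rho * dt * (w_st >  Psi);  % consolidate validated hypotheses
    w_lt = w_lt - rho * dt * (w_st < -Psi);  % Eq. (11): unlearn refuted ones
    w    = w_st + w_lt;                      % effective weights (assumed sum)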

The case can be illustrated by reproducing the preliminary test of Fig. 3 and augmenting it with a phase characterized by a negative average modulation. Figure 10 shows that, when modulatory updates become negative on average (from reward 4,001 to reward 5,000), the transient weight detects this by becoming negative. The use of Eq. (11) then causes the long-term component to decrease, thereby reversing the previous learning.

Fig. 10 Unlearning dynamics. In this experiment, the model presented in the paper was augmented with Eq. (11), which decreases long-term weights when the transient weights fall below \(-\varPsi \). The stochastic modulatory update (top graph) is set to have a slightly negative average in the last phase (from reward 4,001 to 5,000). The negative average is detected by the short-term component, which becomes negative; the long-term component then decreases due to Eq. (11)

Preliminary experiments with unlearning on the complete neural model of this study show that the rate of negative modulation drops drastically as unlearning proceeds. In other words, as the network experiences negative modulation, and consequently reduces the frequency of punishing stimulus–action pairs, it also reduces the rate of unlearning, because punishing episodes become sporadic. It appears that unlearning from negative experiences might be slower than learning from positive experiences. Evidence from biology indicates that extinction does not completely remove the previous association (Bouton 2000, 2004), suggesting that dynamics more complex than those proposed here may regulate this process in animals.

Appendix 2: Implementation

All implementation details are also available as part of the open source Matlab code provided as support material. The code can be used to reproduce the results in this work, or modified to perform further experiments. The source code can be downloaded from http://andrea.soltoggio.net/HTP.

1.1 Network, inputs, outputs, and rewards

The network is a feed-forward single-layer neural network with 300 inputs, 30 outputs, 9,000 weights, and a sampling time of 0.1 s. Three hundred stimuli are delivered to the network through the 300 input neurons; 30 actions are performed by the network through the 30 output neurons.

The flow of stimuli consists of a random sequence of stimuli, each with a duration between 1 and 2 s. The probabilities of 0, 1, 2, or 3 stimuli being shown to the network simultaneously are given in Table 2.

The agent continuously performs actions chosen from a pool of 30 possibilities. The 30 output neurons may be interpreted as single neurons or as populations. When one action terminates, the output neuron with the highest activity initiates the next action. Once started, a response action lasts a variable time between 1 and 2 s. During this time, the neuron that initiated the action receives a feedback signal \(I\) of 0.5. The feedback current enables the output neuron responsible for an action to correlate correctly with the stimulus that is simultaneously active. A feedback signal is also used in Urbanczik and Senn (2009) to improve the reinforcement learning performance of a neural network.
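
The following Matlab fragment sketches one cycle of this action-selection scheme. The tanh activation, the noise amplitude, and the variable names are assumptions made for illustration; only the winner-take-all choice, the 1–2 s duration, and the feedback current of 0.5 follow the description above.

    % Sketch of the action-selection cycle with feedback current.
    dt   = 0.1;                          % sampling time (s)
    I_fb = 0.5;                          % feedback to the acting neuron
    w    = 0.1 * rand(300, 30);          % weights (placeholder values)
    u    = double(rand(300, 1) < 0.01);  % binary input: few active stimuli

    out = tanh(w' * u + 0.1 * randn(30, 1)); % noisy output activities
    [~, action] = max(out);              % most active neuron acts next
    T = 1 + rand();                      % action duration drawn in [1, 2] s
    for step = 1:round(T / dt)
        I = zeros(30, 1);
        I(action) = I_fb;                % feedback singles out the actor
        out = tanh(w' * u + I + 0.1 * randn(30, 1));
        % ... plasticity (RCHP) and reward handling would go here ...
    end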

The rewarding stimulus–action pairs are \((i,i)\) with \(1 \le i \le 10\) during scenario 1, \((i,i-5)\) with \(11 \le i \le 20\) in scenario 2, and \((i,i-20)\) with \(21 \le i \le 30\) in scenario 3. When a rewarding stimulus–action pair is performed, a reward is delivered to the network with a random delay in the interval [1, 4] s. Given the delay of the reward, and the frequency of stimuli and actions, a number of stimulus–action pairs could be responsible for triggering the reward. The parameters are listed in Table 2. Table 3 lists the parameters of the neural model.

Table 2 Summary of parameters for the input, output and reward signals
Table 3 Summary of parameters of the neural model
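
The delayed reward delivery described above can be sketched with a simple event queue, as in the following Matlab fragment; the queue and the helper predicate is_rewarding are illustrative devices, not structures taken from the released code.

    % Sketch of delayed reward delivery under the three scenarios.
    t = 0;  scenario = 1;                % placeholders: current time, scenario
    s = 3;  a = 3;                       % example stimulus-action pair
    pending = [];                        % delivery times of pending rewards

    % When a stimulus-action pair (s, a) completes at time t:
    if is_rewarding(s, a, scenario)
        pending(end+1) = t + 1 + 3 * rand();  % delay drawn in [1, 4] s
    end

    % At every time step, r(t) is high if a pending reward is due:
    due = pending <= t;
    r = double(any(due));                % one-step reward signal r(t)
    pending(due) = [];                   % delivered rewards leave the queue

    function tf = is_rewarding(s, a, scenario)
    % Scenario 1: (i,i), 1<=i<=10; scenario 2: (i,i-5), 11<=i<=20;
    % scenario 3: (i,i-20), 21<=i<=30.
    switch scenario
        case 1, tf = s >= 1  && s <= 10 && a == s;
        case 2, tf = s >= 11 && s <= 20 && a == s - 5;
        case 3, tf = s >= 21 && s <= 30 && a == s - 20;
        otherwise, tf = false;
    end
    end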

1.2 Integration

The integration of Eqs. (3) and (2) with a sampling time \(\Delta t\) of \(100\) ms is implemented step-wise by

$$\begin{aligned} E_{ji}(t+\Delta t)&= E_{ji}(t) \cdot e^{\frac{-\Delta t}{\tau _{E}}} + \mathrm{RCHP}_{ji}(t)\end{aligned}$$
(12)
$$\begin{aligned} m(t+\Delta t)&= m(t) \cdot e^{\frac{-\Delta t}{\tau _{m}}} + \lambda \, r(t) + b. \end{aligned}$$
(13)

The same integration method is used for all leaky integrators in this study. Given that \(r(t)\) is a signal from the environment, it might be a one-step signal as in the present study, which is high for one step when a reward is delivered, or any other function representing a reward: in a test of RCHP on the real robot iCub (Soltoggio et al. 2013a, b), \(r(t)\) was determined by a human teacher pressing skin sensors on the robot's arms.
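
In Matlab, one such integration step can be sketched as follows; the time constants, the reward gain \(\lambda \), and the baseline \(b\) are placeholders standing in for the values listed in Table 3.

    % One integration step of Eqs. (12) and (13); numerical values are
    % placeholders for the parameters listed in Table 3.
    dt     = 0.1;               % sampling time (s)
    tau_E  = 4;  tau_m = 1;     % assumed time constants (s)
    lambda = 1;  b = 0;         % assumed reward gain and baseline

    E    = zeros(300, 30);      % eligibility traces, one per synapse
    RCHP = zeros(300, 30);      % rare correlations detected this step
    m    = 0;  r = 0;           % modulation and one-step reward signal

    E = E * exp(-dt / tau_E) + RCHP;            % Eq. (12)
    m = m * exp(-dt / tau_m) + lambda * r + b;  % Eq. (13)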

1.3 Rarely correlating Hebbian plasticity

Rarely correlating Hebbian plasticity (RCHP) (Soltoggio and Steil 2013) is a type of Hebbian plasticity that filters out the majority of correlations and produces nonzero values only for a small percentage of synapses. Rate-based neurons can use a Hebbian rule augmented with two thresholds to extract low percentages of correlations and decorrelations. RCHP, expressed by Eq. (4), is simulated with the parameters in Table 4. The rate of correlations can be expressed by a global concentration \(\omega _{c}\). This measure represents how much the activity of the network correlates, i.e., how much the network activity is deterministically driven by connections rather than by noise. The instantaneous matrix of correlations \(\mathrm{RCHP}^+\) (i.e., the first row in Eq. (4) computed for all synapses) can be low-pass filtered as

$$\begin{aligned} \dot{\omega }_{c}(t) = -\frac{\omega _{c}(t)}{\tau _{c}} + \sum _{j = 1}^{300}\sum _{i = 1}^{30} \mathrm{RCHP}_{ji}^+(t), \end{aligned}$$
(14)

to estimate the level of correlations in the recent past, where \(j\) is the index of the input neurons and \(i\) the index of the output neurons. In the current settings, \(\tau _{c}\) was chosen equal to 5 s. Alternatively, a similar measure of recent correlations \(\omega _{c}(t)\) can be computed in discrete time over a sliding window of 5 s, summing all correlations \(\mathrm{RCHP}^+(t)\):

$$\begin{aligned} \omega _{c}(t) = \frac{\Delta t}{5} \sum _{s = t-5}^{t} \mathrm{RCHP}^+(s). \end{aligned}$$
(15)

Similar equations to (14) and (15) are used to estimate decorrelations \(\omega _{d}(t)\) from the detected decorrelations \(\mathrm{RCHP}^-(t)\). The adaptive thresholds \(\theta _{hi}\) and \(\theta _{lo}\) in Eq. (4) are estimated as follows.

$$\begin{aligned} \theta _{hi}(t+\Delta t) = \begin{cases} \theta _{hi}(t) + \eta \, \Delta t &{} \text {if } \omega _{c}(t) > 2\mu \\ \theta _{hi}(t) - \eta \, \Delta t &{} \text {if } \omega _{c}(t) < \mu /2 \\ \theta _{hi}(t) &{} \text {otherwise} \end{cases} \end{aligned}$$
(16)

and

$$\begin{aligned} \theta _{lo}(t+\Delta t) = \begin{cases} \theta _{lo}(t) - \eta \, \Delta t &{} \text {if } \omega _{d}(t) > 2\mu \\ \theta _{lo}(t) + \eta \, \Delta t &{} \text {if } \omega _{d}(t) < \mu /2 \\ \theta _{lo}(t) &{} \text {otherwise} \end{cases} \end{aligned}$$
(17)

with \(\eta = 0.001\) and \(\mu \), the target rate of rare correlations, set to 0.1 %/s. If correlations fall below half the target rate or rise above twice the target rate, the thresholds are adapted to the reduced or increased activity. This heuristic has the purpose of keeping the thresholds relatively constant and of performing adaptation only when correlations are too high or too low for a long period of time.
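
The following Matlab fragment sketches one step of detection and threshold adaptation. The Hebbian product used for Eq. (4), the simplified decorrelation test, the initial thresholds, and the unit conversion of \(\mu \) are assumptions; only \(\eta \), the target-rate test, and the update directions follow Eqs. (16) and (17).

    % Sketch of RCHP detection and threshold adaptation (Eqs. 4, 14, 16-17).
    dt = 0.1;  tau_c = 5;                  % sampling time, filter constant (s)
    eta = 0.001;                           % threshold adaptation rate
    mu  = 0.001;                           % target rate (0.1 %/s, assumed unit)

    u_prev = double(rand(300,1) < 0.01);   % previous-step inputs (placeholder)
    out    = rand(30,1);                   % output activities (placeholder)
    theta_hi = 0.9;  theta_lo = 0.05;      % assumed initial thresholds
    omega_c = 0;  omega_d = 0;             % filtered (de)correlation rates

    hebb  = u_prev * out';                 % 300x30 pre-post products
    RCHPp = hebb > theta_hi;               % rare correlations (Eq. 4, +)
    RCHPm = (hebb > 0) & (hebb < theta_lo);% decorrelations (simplified test)

    % Eq. (14) and its decorrelation analog, integrated step-wise:
    omega_c = omega_c * exp(-dt/tau_c) + sum(RCHPp(:));
    omega_d = omega_d * exp(-dt/tau_c) + sum(RCHPm(:));

    % Eqs. (16)-(17): adapt only when rates leave [mu/2, 2*mu]:
    if omega_c > 2*mu,     theta_hi = theta_hi + eta*dt;
    elseif omega_c < mu/2, theta_hi = theta_hi - eta*dt;
    end
    if omega_d > 2*mu,     theta_lo = theta_lo - eta*dt;
    elseif omega_d < mu/2, theta_lo = theta_lo + eta*dt;
    end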

Table 4 Summary of parameters of the plasticity rules (RCHP and RCHP\(^{+}\) plus HTP)


Cite this article

Soltoggio, A. Short-term plasticity as cause–effect hypothesis testing in distal reward learning. Biol Cybern 109, 75–94 (2015). https://doi.org/10.1007/s00422-014-0628-0
