Phenomenological models of synaptic plasticity based on spike timing
- Cite this article as:
- Morrison, A., Diesmann, M. & Gerstner, W. Biol Cybern (2008) 98: 459. doi:10.1007/s00422-008-0233-1
Synaptic plasticity is considered to be the biological substrate of learning and memory. In this document we review phenomenological models of short-term and long-term synaptic plasticity, in particular spike-timing dependent plasticity (STDP). The aim of the document is to provide a framework for classifying and evaluating different models of plasticity. We focus on phenomenological synaptic models that are compatible with integrate-and-fire type neuron models where each neuron is described by a small number of variables. This implies that synaptic update rules for short-term or long-term plasticity can only depend on spike timing and, potentially, on membrane potential, as well as on the value of the synaptic weight, or on low-pass filtered (temporally averaged) versions of the above variables. We examine the ability of the models to account for experimental data and to fulfill expectations derived from theoretical considerations. We further discuss their relations to teacher-based rules (supervised learning) and reward-based rules (reinforcement learning). All models discussed in this paper are suitable for large-scale network simulations.
Keywords: Spike-timing dependent plasticity, Short term plasticity, Modeling, Simulation, Learning
Synaptic changes are thought to be involved in learning, memory, and cortical plasticity, but the exact relation between microscopic synaptic properties and macroscopic functional consequences remains highly controversial. In experimental preparations, synaptic changes can be induced by specific stimulation conditions defined through presynaptic firing rates (Bliss and Lomo 1973; Dudek and Bear 1992), postsynaptic membrane potential (Kelso et al. 1986; Artola et al. 1990), calcium entry (Lisman 1989; Malenka et al. 1988), or spike timing (Markram et al. 1997; Bi and Poo 2001).
Whereas detailed biophysical models are crucial to understand the biological mechanisms underlying synaptic plasticity, phenomenological models which describe the synaptic changes without reference to mechanism are generally more tractable and less computationally expensive. Consequently, phenomenological models are of great use in analytical and simulation studies. In this manuscript, we will examine a number of phenomenological models with respect to their compatibility with both experimental and theoretical results. In all cases, we consider a synapse from a presynaptic neuron j to a postsynaptic neuron i. The strength of a connection from j to i is characterized by a weight w_ij that quantifies the amplitude of the postsynaptic response, typically measured as the height of the postsynaptic potential or the slope of the postsynaptic current at onset. The conditions for synaptic changes as well as their directions and magnitudes can be formulated as ‘synaptic update rules’ or ‘learning rules’. Such rules can be developed from purely theoretical considerations, or to account for macroscopic phenomena such as the development of receptive fields, or based on findings from electrophysiological experiments manipulating firing rate or voltage. In this manuscript, however, we restrict our scope to rules which have been developed to account for the results of experiments in which synaptic plasticity was observed as a result of pre- and postsynaptic spikes (for more general reviews, see Dayan and Abbott 2001; Gerstner and Kistler 2002; Cooper et al. 2004).
For the classification of the synaptic plasticity rules, it is important to specify the time necessary to induce such a change as well as the time scale of persistence of the change. For both short-term and long-term plasticity, changes can be induced in about 1 s or less. In short-term plasticity (see Sect. 3), a sequence of eight presynaptic spikes at 20 Hz evokes successively smaller (depression) or successively larger (facilitation) responses in the postsynaptic cell. The characteristic feature of short-term plasticity is that this change does not persist for more than a few hundred milliseconds: the amplitude of the postsynaptic response recovers to close-to-normal values within less than a second (Markram et al. 1998; Thomson et al. 1993).
In contrast to short-term plasticity, long-term potentiation and depression (LTP and LTD) refer to persistent changes of synaptic responses (see Sect. 4). Note that the time necessary for induction can still be relatively brief. For example, in spike-timing-dependent plasticity (Bi and Poo 2001; Sjostrom et al. 2001), a change of the synapse can be induced by 60 pairs of pre- and postsynaptic spikes with a repetition frequency of 20 Hz; hence stimulation is over after 3 s. However, this change can persist for more than one hour. The final stabilization of, say, a potentiated synapse occurs only thereafter, in what is called the late phase of LTP (Frey and Morris 1997). An additional aspect is that neurons in the brain must remain within a sustainable activity regime, despite the changes induced by LTP and LTD. This is achieved by homeostatic plasticity, an up- or down-regulation of all synapses converging onto the same postsynaptic neuron which occurs on the time scale of minutes to hours (Turrigiano and Nelson 2004).
The phenomenological models discussed in this manuscript can be classified from a theoretical point of view as unsupervised learning rules. There is no notion of a task to be solved, nor is there any notion of the change being ‘good’ or ‘bad’ for the survival of the animal; learning consists simply of an adaptation of the synapse to the statistics of the activity of pre- and postsynaptic neurons. This is to be contrasted with reward-based learning, also called reinforcement learning (Sutton and Barto 1998). In reward-based learning the direction and amount of change depends on the presence or absence of a success signal that may reflect the current reward or the difference between expected and received reward (Schultz et al. 1997). Reward-based learning rules are distinct from supervised learning since the success signal is considered as a global and unspecific feedback signal that often comes with a delay, whereas in supervised learning the feedback is much more specific. In the theoretical literature, there exists a large variety of update rules that can be classified as supervised, unsupervised or reward-based learning rules.
In this paper, we start with a review of some basic experimental facts that could be relevant for modeling, followed by a list of theoretical concepts arising from fundamental notions of learning and memory formation (Sect. 2). We then review models of short-term plasticity in Sect. 3 and models of long-term potentiation/depression (LTP/LTD), in particular the spike-timing dependent form, in Sect. 4. Throughout the review we discuss spike-based plasticity rules from a computational perspective, giving implementations that are appropriate for analytical and simulation approaches. In the final sections we briefly mention reward driven learning rules for spiking neurons (Sect. 5) and provide an outlook toward current challenges for modeling. The relevance of molecular mechanisms and signaling chains (Lisman 1989; Malenka et al. 1988) for models of synaptic plasticity (Lisman and Zhabotinsky 2001; Shouval et al. 2002; Rubin et al. 2005; Badoual et al. 2006; Graupner and Brunel 2007; Zou and Destexhe 2007), as well as the importance of the postsynaptic voltage (Kelso et al. 1986; Artola et al. 1990; Sjostrom et al. 2001), is acknowledged but not further explored.
2 Perspectives on plasticity
Over the last 30 years, a large body of experimental results on synaptic plasticity has been accumulated. The most important discoveries are summarized in Sect. 2.1. Simultaneously, theoreticians have investigated the role of synaptic plasticity in long-term memory, developmental learning and task-specific learning. The most important concepts arising from this research are described in Sect. 2.2. Many of the plasticity models employed in the theoretical approach were inspired by Hebb’s (1949) postulate that describes how synaptic connections should be modified:
When an axon of cell A is near enough to excite cell B or repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.
In classical Hebbian models, this famous postulate is often rephrased in the sense that modifications in the synaptic transmission efficacy are driven by correlations in the firing activity of pre- and postsynaptic neurons. Even though the idea of learning through correlations dates further back in the past (James 1890), correlation-based learning is now generally called Hebbian learning. Most classic theoretical studies represented the activity of pre- and postsynaptic neurons in terms of rates, expressed as continuous functions. This has led to a sound understanding of rate-based Hebbian learning. However, rate-based Hebbian learning neglects the fine temporal structure between pre- and postsynaptic spikes. Spike-based learning models for temporally structured input need to take this timing information into account (e.g. Gerstner et al. 1993), which leads to models of spike-timing dependent plasticity (STDP) (Gerstner et al. 1996; Kempter et al. 1999; Roberts 1999; Abbott and Nelson 2000) that can be seen as a spike-based generalization of Hebbian learning. The first experimental reports showing both long-term potentiation and depression induced by causal and acausal spike timings on a time scale of 10 ms were published by Markram and Sakmann (1995) and Markram et al. (1997), slightly after the theoretical work; however, potentiation induced by the pairing of EPSPs with postsynaptic depolarization on a time scale of 100 ms was demonstrated considerably earlier (Gustafsson et al. 1987). Timing in rate-based Hebbian learning (although not spike-based) can be traced even further back in the past (Levy and Steward 1983). From a conceptual point of view, all spike-based and rate-based Hebbian learning rules share the feature that only variables that are locally available at the synapse can be used to change the synaptic weight. These local elements that can be used to construct such rules are listed in Sect. 2.3.
2.1 Experimental results
Long-term plasticity is sensitive to the presynaptic firing rate over a time scale of tens or hundreds of seconds. For example, 900 presynaptic stimulation pulses at 1 Hz (i.e. 15 min of total stimulation time) yield a persistent depression of the synapses, whereas the same number of pulses at 50 Hz yields potentiation (Dudek and Bear 1992).
Long-term plasticity depends on the exact timing of the pre- and postsynaptic spikes on the time scale of milliseconds (Markram et al. 1997; Bi and Poo 2001). For example LTP is induced if a presynaptic spike precedes the postsynaptic one by 10 ms, whereas LTD occurs if the order of spikes is reversed. In this context it is important to realize that most experiments are done with repetitions of 50–60 pairs of spikes whereas a single pair has no effect.
STDP depends on the repetition frequency of the pre-post spike pairings. In fact, 60 pre-before-post pairings at low frequency have no effect, whereas the same number of pairs at a repetition frequency of 20 Hz gives strong potentiation (Sjostrom et al. 2001).
Plasticity depends on the postsynaptic potential (Kelso et al. 1986; Artola et al. 1990). If the postsynaptic neuron is clamped to a voltage slightly above rest during presynaptic spike arrival, the synapses are depressed, while at higher depolarization the same stimulation leads to LTP (Artola et al. 1990; Ngezahayo et al. 2000).
On a slow time scale of hours, homeostatic changes of synapses may occur in form of rescaling of synaptic response amplitudes (Turrigiano et al. 1994). These changes can be useful to stabilize neuronal firing rates.
Also on the time scale of hours, early phase LTP is consolidated into late phase LTP. During the consolidation phase heterosynaptic interactions may take place, probably as a result of synaptic tagging and competition for scarce protein supply (Frey and Morris 1997). Consolidation is thought to lead to long-term stability of the synapses.
Distributions of synaptic strength (e.g., the EPSP amplitudes) in data collected across several pairs of neurons are reported to be unimodal (Sjostrom et al. 2001). At first glance, this seems to be at odds with experimental data suggesting that single synaptic contacts are in fact binary (Petersen et al. 1998; O’Connor et al. 2005).
Synapses do not form a homogeneous group, but different types of synapse have different plasticity properties (Abbott and Nelson 2000; Thomson and Lamy 2007). In fact, the same presynaptic neuron makes connections to different types of target neurons with different plasticity properties for short-term (Markram et al. 1998) and long-term plasticity (Lu et al. 2007).
Many other experimental features could be added to this list, e.g., the role of intracellular calcium, of NMDA receptors, etc., but we will not do so; see Bliss and Collingridge (1993) and Malenka and Nicoll (1993) for reviews. We emphasize that, given the heterogeneity of synapses between different brain areas (plasticity has mainly been studied in visual or somatosensory cortex and hippocampus) and between different neuron and synapse types, we cannot expect that a single theoretical model can account for all experimental facts. In the next section, we will instead consider which theoretical principles could guide our search for suitable plasticity rules.
2.2 Theoretical concepts
a mechanism for the development of input selectivity such as receptive fields (Bienenstock et al. 1982; Miller et al. 1989), in the presence of strong input features. This is the essence of developmental learning
the ability to take into account the quality of task performance mediated by a global success signal (e.g. neuro-modulators, Schultz et al. 1997). This is the essence of reinforcement learning (Sutton and Barto 1998).
These items are not necessarily exclusive, and the relative importance of a given aspect may vary from one subsystem to the next; for example, synaptic memory maintenance might be more important for a long-term memory system than for primary sensory cortices. There is so far no rule which exhibits all of the above properties; moreover, theoretical models which reproduce some aspects of experimental findings are generally incompatible with other findings. For example, traditional learning rules that have been proposed as an explanation of receptive field development (Bienenstock et al. 1982; Miller et al. 1989), exhibit a spontaneous separation of synaptic weights into two groups, even if the input shows no or only weak correlations. This is difficult to reconcile with experimental results in visual cortex of young rats where a unimodal distribution was found (Sjostrom et al. 2001). Moreover model neurons that specialize early in development on one subset of features cannot readily re-adapt later on. On the other hand, learning rules that do produce a unimodal distribution of synaptic weights (van Rossum et al. 2000; Rubin et al. 2001; Gütig et al. 2003; Morrison et al. 2007) do not lead to long-term stability of synaptic changes, as the trajectories of individual synaptic weights perform random walks. Hence it appears that long-term stability of memory requires a multimodal synapse distribution (Toyoizumi et al. 2007; Billings and van Rossum 2008) or additional mechanisms to stabilize the synaptic weights contributing to the retention of a memory item.
2.3 Locally computable measures
spontaneous growth or decay (a non-Hebbian zero-order term); this could be a small effect that leads to slow ‘homeostatic’ scaling of weights in the absence of any activity
effects caused by postsynaptic spikes alone, independent of presynaptic spike arrival (a non-Hebbian first-order term). This could be an additional realization of homeostasis: if the postsynaptic neuron spikes at a high rate over hours, all synapses are down-regulated
effects caused by presynaptic spikes, independent of postsynaptic variables (another non-Hebbian first-order term). This is typically the case for short-term synaptic plasticity
effects caused by presynaptic spikes in conjunction with postsynaptic spikes (STDP) or in conjunction with postsynaptic depolarization (Hebbian terms)
all of the above effects may depend on the current value of the synaptic weight. For example, close to a maximum weight synaptic changes could become smaller.
In principle, voltage dependence could be treated in a similar fashion, see, e.g., Brader et al. (2007), but we will focus in the following on learning rules for short-term and long-term plasticity that use spike timing as the relevant variable for inducing postsynaptic changes.
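The notion of a locally available, low-pass filtered spike train can be made concrete with a small sketch. It assumes the common exponential form of a spike trace, which decays with time constant `tau` and is either incremented by 1 at each spike (accumulating) or reset to 1 (saturating). The function name `trace_at` and its arguments are illustrative and do not appear in the text.

```python
import math

def trace_at(spike_times, t, tau, saturating=False):
    """Value at time t of an exponentially decaying trace left by spikes.

    Assumed dynamics (illustrative): dx/dt = -x / tau between spikes; at a
    spike, x -> x + 1 (accumulating) or x -> 1 (saturating).
    Spikes occurring exactly at time t are included.
    """
    x = 0.0
    last = None
    for s in sorted(spike_times):
        if s > t:
            break
        if last is not None:
            x *= math.exp(-(s - last) / tau)  # decay since the previous spike
        x = 1.0 if saturating else x + 1.0    # jump at the spike
        last = s
    if last is None:
        return 0.0                            # no spike has occurred yet
    return x * math.exp(-(t - last) / tau)    # decay from the last spike to t
```

With an accumulating trace every earlier spike contributes to the current value, whereas a saturating trace only remembers the most recent spike; this distinction matters for the spike pairing schemes of Sect. 4.1.2.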
3 Short-term plasticity
Biological synapses have an inherent dynamics, which controls how the pattern of amplitudes of postsynaptic responses depends on the temporal pattern of the incoming spike train. Notably, each successive spike can evoke a response in the postsynaptic neuron that is smaller (depression) or larger (facilitation) than the previous one. The time scale of these changes ranges from 100 ms to about a second. Fast synaptic dynamics is firmly established in biological literature (Markram et al. 1998; Gupta et al. 2000), and well-accepted models exist for it (Abbott et al. 1997; Tsodyks et al. 1998). Neurotransmitter is released in quanta of fixed size, each evoking a contribution to the postsynaptic potential of fixed amplitude; this is known as the quantal synaptic potential (Kandel et al. 2000). The release of an individual quantum is known to be stochastic, but the details of the mechanism underlying this stochasticity remain unclear. However, the following two phenomenological models describe the average response and are therefore entirely deterministic. Both models use the idea of a ‘trace’ left by presynaptic spikes (see previous section), but in slightly different formulations.
3.1 Markram-Tsodyks model
One well-established phenomenological model for fast synaptic dynamics was originally formulated for depression only in Tsodyks and Markram (1997) and later extended to facilitating dynamics in Markram et al. (1998). Here, we discuss the formulation of the model presented in Tsodyks et al. (2000).
Note that the updated value of u is used to update the variables x and y. This reflects the assumption that the effectivity of resource use is determined not just by the history of the synapse but also by the arrival of the new presynaptic spike, thus ensuring a non-zero response to the first spike (Tsodyks et al. 1998).
In many simulation systems synapse models are constrained to transmit a synaptic weight rather than a continuous synaptic current. In such cases, the synaptic weight transmitted to the postsynaptic neuron is w_ij y_0, assuming the postsynaptic neuron reproduces the dynamics of the y variable. It is not necessary for the neuron to reproduce the y dynamics for each individual synapse; due to the linearity of y between increments, all synapses with the same τ_I can be lumped together. This is the implementation used in NEST (Gewaltig and Diesmann 2007). If the postsynaptic neuron also implements an exact integration scheme (for a worked example see Morrison et al. 2007), the dynamics of y can be incorporated into the propagator of the dynamics of the postsynaptic neuron.
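The update order described above (u first, then x and y) can be illustrated with an event-driven sketch. The model equations are not reproduced in this section, so the sketch assumes the standard three-state form of Tsodyks et al. (2000): recovered, active and inactive resource fractions x, y, z with x + y + z = 1, where y decays with τ_I, z recovers with τ_rec, and u decays with τ_fac and jumps by U(1 − u) at each spike. The function name and all parameter values are illustrative.

```python
import math

def tsodyks_markram_amplitudes(spike_times, U, tau_rec, tau_fac, tau_I):
    """Return the increment u*x added to y at each presynaptic spike.

    Times and time constants share one unit (e.g. ms); spike_times must be
    sorted. Assumes the three-state model form (see lead-in), tau_I != tau_rec.
    """
    if tau_I == tau_rec:
        raise ValueError("closed-form solution below needs tau_I != tau_rec")
    x, y, z, u = 1.0, 0.0, 0.0, 0.0   # all resources recovered, no facilitation
    last = None
    amplitudes = []
    for t in spike_times:
        if last is not None:
            h = t - last
            p_yy = math.exp(-h / tau_I)
            p_zz = math.exp(-h / tau_rec)
            # Exact solution of the linear dynamics between two spikes:
            # dy/dt = -y/tau_I, dz/dt = y/tau_I - z/tau_rec, x = 1 - y - z.
            z = z * p_zz + y * tau_rec * (p_yy - p_zz) / (tau_I - tau_rec)
            y = y * p_yy
            x = 1.0 - y - z
            u = u * math.exp(-h / tau_fac)
        u += U * (1.0 - u)            # u is updated first ...
        released = u * x              # ... and the updated u is used for x and y
        x -= released
        y += released
        amplitudes.append(released)
        last = t
    return amplitudes

# Eight presynaptic spikes at 20 Hz (times in ms); parameter values are illustrative.
spikes = [50.0 * k for k in range(8)]
depressing = tsodyks_markram_amplitudes(spikes, U=0.5, tau_rec=800.0, tau_fac=1.0, tau_I=3.0)
facilitating = tsodyks_markram_amplitudes(spikes, U=0.1, tau_rec=100.0, tau_fac=1000.0, tau_I=3.0)
```

Because u is incremented before it is used, the first spike already releases a fraction U of the resources, which is the non-zero first response mentioned above. The returned increments are the quantities a simulator would scale by the synaptic weight before transmitting them to the postsynaptic neuron.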
3.2 Abbott model
4 Long-term plasticity (STDP)
Experimentally reported STDP curves vary qualitatively depending on the system and the neuron type; see Abbott and Nelson (2000) and Bi and Poo (2001) for reviews. It is therefore obvious that we cannot expect that a single STDP rule, be it defined in the framework of temporal traces outlined above or in a more biophysical framework, would hold for all experimental preparations and across all neuron and synapse types. The first spike-timing experiments were performed by Markram and Sakmann on layer 5 pyramidal neurons in neocortex (Markram et al. 1997). In the neocortex, the width of the negative window seems to vary depending on layer, and inhibitory neurons seem to have a more symmetric STDP curve. The standard STDP curve that has become an icon of theoretical research on STDP (Fig. 1 in Bi and Poo 1998) was originally found for pyramidal neurons in rat hippocampal cell culture. Inverted STDP curves have also been reported, for example in the ELL system in electric fish. This gives rise to different functional properties (Bell et al. 1997).
4.1 Pair-based STDP rules
Depending on the definition of the trace dynamics (accumulating or saturating, see Sect. 2.3), different spike pairing schemes can be realized. Before we turn to the consequences of these subtle differences (Sect. 4.1.2) and the implementation of synaptic delays (Sect. 4.1.3), we now discuss the choice of the factors F+ (w) and F− (w), i.e. the weight dependence of STDP.
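A trace-based pair rule of this kind can be sketched in a few lines. The sketch assumes the usual formulation in which a postsynaptic spike potentiates the weight by F+(w) times the current presynaptic trace, and a presynaptic spike depresses it by F−(w) times the current postsynaptic trace, with accumulating traces (an all-to-all pairing). The function name, argument names and the tie-breaking for simultaneous spikes are illustrative choices.

```python
import math

def pair_stdp(pre, post, w, f_plus, f_minus, tau_plus, tau_minus):
    """Apply an all-to-all pair-based STDP rule to spike trains pre and post.

    f_plus and f_minus are callables giving the weight dependence F+(w), F-(w).
    Simultaneous spikes are processed presynaptic first (an arbitrary choice).
    """
    events = sorted([(t, 0) for t in pre] + [(t, 1) for t in post])
    x = 0.0      # presynaptic trace
    y = 0.0      # postsynaptic trace
    last = None
    for t, is_post in events:
        if last is not None:
            x *= math.exp(-(t - last) / tau_plus)
            y *= math.exp(-(t - last) / tau_minus)
        if is_post:
            w += f_plus(w) * x    # pre-before-post pairings: potentiation
            y += 1.0
        else:
            w -= f_minus(w) * y   # post-before-pre pairings: depression
            x += 1.0
        last = t
    return w
```

Passing constant functions for `f_plus` and `f_minus` gives an additive rule; passing weight-dependent functions gives the multiplicative or power law variants discussed next.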
4.1.1 Weight dependence of STDP
Instead of plotting the percentage weight change, Fig. 4b shows the absolute weight change in double logarithmic representation. The exponent of the weight dependence can now be determined from the slope of a linear fit to the data, see Morrison et al. (2007) for more details. A multiplicative update rule (F−(w) ∝ w) is the best fit to the depression data but a poor fit to the potentiation data. The best fit to the potentiation data is a power law update (F+(w) ∝ w^μ). The quality of an additive update (F+(w) = A+) fit is between the power law fit and the multiplicative fit.
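The slope-based estimate of the exponent can be sketched as an ordinary least-squares fit in log-log coordinates: if Δw = c·w^μ, then log Δw = log c + μ log w, so the slope of the fitted line is μ. The data below are synthetic and noise-free, generated only to check the procedure; they are not the experimental values, and the function name is illustrative.

```python
import math

def loglog_slope(weights, changes):
    """Least-squares slope and intercept of log(change) against log(weight)."""
    xs = [math.log(w) for w in weights]
    ys = [math.log(c) for c in changes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic power-law data (illustrative values): change = 0.1 * w**0.4
weights = [10.0, 20.0, 40.0, 80.0, 160.0]
changes = [0.1 * w ** 0.4 for w in weights]
slope, intercept = loglog_slope(weights, changes)
```

In this representation an additive rule corresponds to a slope of 0 and a multiplicative rule to a slope of 1, so the fitted slope places the data between these two cases.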
4.1.1.1 Unimodal versus bimodal distributions
Gütig et al. (2003) further demonstrated that the unimodal distribution is the rule rather than the exception for update rules of this form. A bimodal distribution is only produced by rules with a very weak weight dependence (i.e. μ ≪ 1). Moreover, the critical value for μ at which bimodal distributions appear decreases as the effective population size N r τ increases, where N is the number of synapses converging onto the postsynaptic neuron, r is the rate of the input spike trains in Hz and τ is the time constant of the STDP window (assumed to be equal for potentiation and depression). Figure 5c shows the equilibrium distributions as a function of μ for N = 1,000, r = 10 Hz and τ = 0.02 s. μ_crit is already very low for this effective population size. Because of the high connectivity of the cortex, we may expect that the effective population size in vivo would be an order of magnitude greater, and so the region of bimodal stability would be vanishingly small according to this analysis. It is worth noting that in the case that a sub-group of inputs is correlated, a bimodal distribution develops for all values of μ, whereby the synaptic weights of the correlated group become stronger than those of the uncorrelated group (data not shown; see Gütig et al. 2003). In contrast to a purely additive rule, the peaks of the distributions are not at the extrema of the permitted weight range. Moreover, the bimodal distribution does not persist if the correlations in the input are removed after learning. A unimodal distribution for uncorrelated Poissonian inputs and an ability to develop multimodal distributions in the presence of correlation is also exhibited by the additive/multiplicative update rule proposed by van Rossum et al. (2000): F+(w)=λ, F−(w)=λθw; and by the power law update rule proposed by Morrison et al. (2007) and also Standage et al. (2007): F+(w) ∝ λw^μ, F−(w) ∝ λαw.
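As a small worked illustration, the effective population size and the two weight dependences quoted above can be written down directly. The proportionality constants of the power law rule are not given here and are set to 1 in the sketch, which is an assumption; the function names and the parameter values for λ, α and μ are hypothetical.

```python
def effective_population_size(n_synapses, rate_hz, tau_s):
    """Effective population size N * r * tau (dimensionless)."""
    return n_synapses * rate_hz * tau_s

def additive_multiplicative(lam, theta):
    """F+(w) = lam and F-(w) = lam * theta * w, as quoted in the text."""
    return (lambda w: lam), (lambda w: lam * theta * w)

def power_law(lam, alpha, mu):
    """F+(w) = lam * w**mu and F-(w) = lam * alpha * w.

    The text gives these only up to proportionality; prefactors of 1 are assumed.
    """
    return (lambda w: lam * w ** mu), (lambda w: lam * alpha * w)

# Values from the text: N = 1,000, r = 10 Hz, tau = 0.02 s
n_eff = effective_population_size(1000, 10.0, 0.02)

# Hypothetical parameters, chosen only to show the trend with w
f_plus, f_minus = power_law(lam=0.1, alpha=0.11, mu=0.4)
ratios = [f_plus(w) / f_minus(w) for w in (10.0, 50.0, 100.0)]
```

For the values in the text the effective population size is 1,000 × 10 × 0.02 = 200. For any μ < 1 the ratio F+(w)/F−(w) falls with increasing w, so strong synapses are depressed relatively more than they are potentiated, which is the kind of weight dependence that favors a unimodal distribution.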
4.1.1.2 Fixed point analysis of STDP update rules
Note that the existence of a fixed point and its stability does not crucially depend on the presence of soft or hard bounds on the weight. Equations (18) and (19) can equate to zero for hard-bounded or unbounded rules.
4.1.1.3 Consequences for network stability
Results on the consequences of STDP in large-scale networks are few and far between, and tend to contradict each other. Part of the reason for the lack of simulation papers on this important subject is the fact that simulating such networks consumes huge amounts of memory, is computationally expensive, and potentially requires extremely long simulation times to overcome transients in the weight dynamics which can be of the order of hundreds of seconds of biological time. A lack of theoretical papers on the subject can be explained by the complexity of the interactions between the activity dynamics of the network and the weight dynamics, although some progress is being made in this area (Burkitt et al. 2007).
It was recently shown that power law STDP is compatible with balanced random networks in the asynchronous-irregular regime (Morrison et al. 2007), resulting in a unimodal distribution of weights and no self-organization of structure. This result was verified for Gütig et al. (2003) STDP for an intermediate value of the exponent (μ = 0.4). Although it has not yet been possible to perform systematic tests, it seems likely that all the formulations of STDP with the fixed point structure discussed in the section on fixed point analysis above would give qualitatively similar behavior. The results for additive STDP seem to be more contradictory. Izhikevich et al. (2004) reported self-organization of neuronal groups, whereas the chief feature of the networks investigated by Iglesias et al. (2005) seems to be extensive withering of the synaptic connections. In the former case, it is the existence of many strong synapses which defines the network, in the latter, the presence of many weak ones. This discrepancy may be attributable to different choices for the effective stabilized firing rates (Eq. 20) in combination with different choices of delays in the network, see Sect. 4.1.3.
4.1.2 Spike pairing scheme
It is sometimes assumed that the scheme used makes no difference, as the ISI of cortical network models is typically an order of magnitude larger than the time constant of the STDP window. However, this is not generally true (Kempter et al. 2001; Izhikevich and Desai 2003; Morrison et al. 2007). For a review of a wide variety of schemes and their consequences, particularly with respect to selectivity of higher-frequency inputs, see Burkitt et al. (2004). Experimental results on this issue suggest limited interaction between pairs of spikes. Sjostrom et al. (2001) found that their data was best fit by a nearest neighbor interaction similar to Fig. 7c but giving precedence to LTP, i.e. a postsynaptic spike can only contribute to a post-before-pre pairing if it has not already contributed to a pre-before-post pairing. However, this result may also be due to the limitations of pair-based STDP models to explain the experimentally observed frequency dependence, see Sect. 4.2. More recently, Froemke et al. (2006) demonstrated that the amount of LTD was not dependent on the number of presynaptic spikes following a postsynaptic spike, suggesting nearest-neighbor interactions for depression as in Fig. 7c. However, the amount of LTP was negatively correlated with the number of presynaptic spikes preceding a postsynaptic spike. This suggests that multiple spike pairings contribute to LTP, but not in the linear fashion of the all-to-all scheme, which would predict a positive correlation between the number of spikes and the amount of LTP. Again, these results are good evidence for the limitations of pair-based STDP rules.
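The difference between pairing schemes can be illustrated by summing the pair contributions explicitly. The sketch compares an all-to-all scheme with a symmetric nearest-neighbor scheme, in which each spike is paired only with the most recent preceding spike of the other neuron. It assumes an additive rule with exponential windows; the function names, the amplitudes and the time constant are illustrative, and the nearest-neighbor variant shown is not claimed to correspond to a particular panel of Fig. 7.

```python
import math

def all_to_all(pre, post, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Summed weight change when every pre/post pair contributes."""
    dw = 0.0
    for t_pre in pre:
        for t_post in post:
            dt = t_post - t_pre
            if dt > 0:
                dw += a_plus * math.exp(-dt / tau)    # pre before post
            elif dt < 0:
                dw -= a_minus * math.exp(dt / tau)    # post before pre
    return dw

def nearest_neighbor(pre, post, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Summed weight change when each spike pairs only with the latest
    preceding spike of the other neuron (symmetric nearest-neighbor)."""
    dw = 0.0
    for t_post in post:
        earlier = [t for t in pre if t < t_post]
        if earlier:
            dw += a_plus * math.exp(-(t_post - max(earlier)) / tau)
    for t_pre in pre:
        earlier = [t for t in post if t < t_pre]
        if earlier:
            dw -= a_minus * math.exp(-(t_pre - max(earlier)) / tau)
    return dw

# Two presynaptic spikes before one postsynaptic spike (times in ms)
dw_all = all_to_all([0.0, 5.0], [10.0])
dw_nn = nearest_neighbor([0.0, 5.0], [10.0])
```

For a single pre-post pair the two schemes agree. With an additional preceding presynaptic spike, the all-to-all sum grows while the nearest-neighbor sum is unchanged, which is the positive correlation between spike number and LTP that the all-to-all scheme predicts.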
4.1.3 Synaptic delays
4.2 Beyond pair effects
There is considerable evidence that the pair-based rules discussed above cannot give a full account of STDP. Specifically, they reproduce neither the dependence of plasticity on the repetition frequency of pairs of spikes in an experimental protocol, nor the results of recent triplet and quadruplet experiments.
STDP experiments are usually carried out with about 60 pairs of spikes. The temporal distance of the spikes in the pair is of the order of a few to tens of milliseconds, whereas the temporal distance between the pairs is of the order of hundreds of milliseconds to seconds. In the case of a facilitation protocol (i.e. pre-before-post), standard pair-based STDP models predict that if the repetition frequency is increased, the strength of the depressing interaction (i.e. post-before-pre) becomes greater, leading to less net potentiation. This prediction is independent of whether the spike pairing scheme is all-to-all or nearest neighbor (see Sect. 4.1.2). However, experiments show that increasing the repetition frequency leads to an increase in potentiation (Sjostrom et al. 2001). Other recent experiments employed multiple-spike protocols, such as repeated presentations of symmetric triplets of the form pre-post-pre and post-pre-post (Bi and Wang 2002; Froemke and Dan 2002; Wang et al. 2005; Froemke et al. 2006). Standard pair-based models predict that the two sequences should give essentially the same results, as they each contain one pre-post pair and one post-pre pair. Experimentally, quite different results are observed.
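Both predictions of pair-based models can be checked numerically with a minimal sketch. It assumes an additive all-to-all rule with exponential windows; equal amplitudes and time constants for potentiation and depression are an assumption, as are the 10 ms lag and all names in the code.

```python
import math

def pair_based_change(pre, post, a_plus=1.0, a_minus=1.0,
                      tau_plus=20.0, tau_minus=20.0):
    """Summed all-to-all pair-based weight change (times in ms)."""
    dw = 0.0
    for t_pre in pre:
        for t_post in post:
            dt = t_post - t_pre
            if dt > 0:
                dw += a_plus * math.exp(-dt / tau_plus)
            elif dt < 0:
                dw -= a_minus * math.exp(dt / tau_minus)
    return dw

def pairing_protocol(n_pairs, freq_hz, lag_ms=10.0):
    """Pre-before-post pairs repeated at freq_hz; returns (pre, post) in ms."""
    period = 1000.0 / freq_hz
    pre = [k * period for k in range(n_pairs)]
    post = [t + lag_ms for t in pre]
    return pre, post

# Frequency dependence: 60 pre-before-post pairs at 1 Hz and at 20 Hz
dw_1hz = pair_based_change(*pairing_protocol(60, 1.0))
dw_20hz = pair_based_change(*pairing_protocol(60, 20.0))

# Symmetric triplets with 10 ms spacing
pre_post_pre = pair_based_change(pre=[0.0, 20.0], post=[10.0])
post_pre_post = pair_based_change(pre=[10.0], post=[0.0, 20.0])
```

Under these assumptions the summed change at 20 Hz is smaller than at 1 Hz, because the postsynaptic spike of one pair falls closer to the presynaptic spike of the next pair than the reverse. The two triplets give the same summed change, since each contains one pre-post and one post-pre pair at the same interval. Both outcomes are at odds with the experimental observations cited above.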
Here we review two examples of simple models which account for these experimental findings. For other models which also reproduce frequency dependence or multiple-spike protocol results, see Abarbanel et al. (2002), Senn (2002) and Appleby and Elliott (2005).
4.2.1 Triplet model
4.2.2 Suppression model
This model gives a good fit to triplet and quadruplet protocols in visual cortex slice, and also gives a much better prediction for synaptic modification due to natural spike trains (Froemke and Dan 2002). However, it does not predict the increase of LTP with the repetition frequency observed by Sjostrom et al. (2001). A revised version of the model (Froemke et al. 2006) also accounts for the switch of LTD to LTP at high frequencies by modifying the efficacy functions.
4.3 Voltage dependence
Traditional LTP/LTD experiments employ the following induction paradigm: the postsynaptic neuron is held at a fixed depolarization while one or several presynaptic neurons are activated. Often a presynaptic pathway is stimulated extracellularly, so that several presynaptic neurons are activated. Depending on the level of the postsynaptic membrane potential, the activated synapses increase their efficacy while other non-activated synapses do not change their weight (Artola et al. 1990; Artola and Singer 1993). More recently, depolarization has also been combined with STDP experiments. In particular, Sjostrom et al. (2004) showed a dependence of synaptic weight changes on the synaptic membrane potential just before a postsynaptic spike.
There is an ongoing discussion whether the voltage dependence is more fundamental than the dependence on postsynaptic spiking. Indeed, voltage dependence alone can generate STDP-like behavior (Brader et al. 2007), as the membrane potential behaves in a characteristic way in the vicinity of a spike (high shortly before a spike, and low shortly after). Alternatively, a dependence on the slope of the postsynaptic membrane potential has also been shown to reproduce the characteristic STDP weight change curve (Saudargiene et al. 2003). The voltage effects caused by back-propagating spikes are implicitly contained in the mechanistic formulation of STDP models outlined above. In particular, the fast postsynaptic trace y1 in the above triplet model could be seen as an approximation of a back-propagating action potential. However, the converse is not true: a pure STDP rule does not automatically generate a voltage dependence. Moreover, synaptic effects caused by subthreshold depolarization in the absence of postsynaptic firing cannot be modeled by standard STDP or triplet models.
4.4 Induction versus maintenance
We stress that all the above models concern induction of potentiation and depression, but not their maintenance. The induction of LTP may take only a few seconds: for example, stimulation with 50 pairs of pre- and postsynaptic spikes given at 20 Hz takes less than 3 s. However, afterwards the synapse takes 60 min or more to consolidate these changes, and this process may also be interrupted (Frey and Morris 1997). During this time synapses are ‘tagged’, that is, they are ready for consolidation. Consolidation is thought to rely on a different molecular mechanism than that of induction. Simply speaking, gene transcription is necessary to trigger the building of new proteins that increase the synaptic efficacy.
4.4.1 Functional consequences
Long-term stability of synapses is necessary to retain memories that have been learned, despite ongoing activity of presynaptic neurons. One possibility used in many models is that plasticity is simply switched off once the neuron has learned what it should. This approach makes sense in the context of reward-based learning: the learning rate goes to zero once the actual reward equals the expected reward and learning stops automatically (see Sect. 5.2). It also makes sense in the framework of supervised learning (see Sect. 5.1), where learning is normally driven by the difference between desired output and actual output and therefore stops once the two agree. However, in the context of unsupervised learning it is inconsistent to switch off the dynamics. Nevertheless, receptive field properties should be retained for a fairly long time even if the stimulation characteristic changes.
4.4.2 Bistability model
4.4.3 Biological evidence
Whether single synapses themselves are binary or continuous is a matter of intense debate. Some experiments have suggested that synapses are binary (Petersen et al. 1998; O’Connor et al. 2005). However, this would seem to result in a bimodal distribution of weights, which is at odds with the unimodal distribution reported by other studies (Turrigiano et al. 1998; Sjostrom et al. 2001; Song et al. 2005), and with the finding that the magnitude of LTP/LTD increases with the number of spike pairs in a protocol until saturation is reached (Froemke et al. 2006).
Some possibilities to reconcile these findings include: (i) since pairs of neurons form several contacts with each other, it is likely that in standard plasticity experiments several synapses are measured at the same time; (ii) LTP and STDP results are typically reported as pooled experiments over several pairs of neurons. Under the assumption that the upper bound is not the same for all synapses, a broad distribution could result; (iii) both unimodal and bimodal distributions could be stable. Untrained neurons would show a unimodal distribution whereas neurons that have learned to respond to a specific pattern would develop a bimodal distribution of synaptic weights (Toyoizumi et al. 2007); (iv) all synapses are binary, but the efficacy of the ‘strong’ state is subject to short-term plasticity and homeostasis; (v) some synapses are binary and some are not. Potentially a combination of several of these possibilities must be considered in order to explain the experimental findings.
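Possibility (i) can be made concrete with a small simulation. The following is only an illustrative sketch: the number of contacts per connection, the probability that a contact is in the strong state, and the efficacy values are all hypothetical, and the contacts are assumed to be independent.

```python
import random


def connection_strength(n_contacts, p_strong, rng, strong=1.0, weak=0.0):
    """Summed efficacy of a connection made of n_contacts binary synapses,
    each independently in the strong state with probability p_strong."""
    return sum(strong if rng.random() < p_strong else weak
               for _ in range(n_contacts))


def pooled_strengths(n_connections, n_contacts=5, p_strong=0.3, seed=0):
    """Measured strengths of many connections, as in a pooled data set."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [connection_strength(n_contacts, p_strong, rng)
            for _ in range(n_connections)]


if __name__ == "__main__":
    strengths = pooled_strengths(1000)
    for level in sorted(set(strengths)):
        print(f"strength {level:.0f}: {strengths.count(level)} connections")
```

Although every contact in this sketch is strictly binary, the measured connection strength takes up to n_contacts + 1 graded values, so a recording that sums over several contacts need not look binary.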
5 Supervised and reinforcement learning
All the models considered in Sect. 4 are unsupervised ‘Hebbian’ rules: changes are triggered as a result of combined action of pre- and postsynaptic neurons. The postsynaptic neuron itself is driven by its input arising from presynaptic neurons. There is no notion of whether or not the postsynaptic output is ‘good’ or ‘useful’. If, however, the local variables are combined with global teacher or reinforcement signals, completely different learning paradigms are possible.
5.1 Supervised learning
Supervised plasticity has been demonstrated experimentally by Fregnac and Schulz (2006): the behavior of a (cortical) neuron can be changed by pairing some class of stimuli with an (artificial) increase of neural activity while pairing another class of stimuli with a decrease of responsiveness. Theoretical studies have demonstrated that a teacher-forced STDP approach can be used to learn precise spike times (Legenstein et al. 2005; Pfister et al. 2006). In a natural situation, this would mean that a few strong neural inputs can drive the neuron and therefore drive learning of other inputs. If these strong inputs are controlled in a task-specific way, they act as a teacher for the postsynaptic neuron. For a practical realization of this idea see Brader et al. (2007).
5.2 Reinforcement learning
If neuronal activity leads to actions, feedback may arise from the environment in the form of reward (a piece of pizza) or punishment (burnt fingers). It is thought that success of an action is signaled by neuromodulators, with dopamine as a top candidate (Schultz et al. 1997). Dopamine signals are closely related to a quantity in reinforcement learning known as δ, which can be interpreted as the difference between the received reward and the expected reward. Here ‘reward’ means current or future rewards that can be reliably predicted. In reinforcement learning, the difference between actual and expected rewards plays an important role for the update of weights in Q-learning, SARSA, and related variants of temporal difference learning (Sutton and Barto 1998).
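For readers less familiar with temporal difference learning, the quantity δ can be written down in a few lines. The sketch below uses the standard TD(0) form δ = r + γV(s′) − V(s) from Sutton and Barto (1998); the single-state task, the learning rate and the discount factor are hypothetical choices made only for illustration.

```python
def td_error(reward, v_current, v_next, gamma=0.9):
    """Temporal difference error: received reward plus discounted value of
    the next state, minus the value predicted for the current state."""
    return reward + gamma * v_next - v_current


def learn_value(reward=1.0, alpha=0.1, n_trials=200):
    """Repeatedly visit one state that yields `reward` and then terminates
    (the terminal state has value 0). Returns the learned value and the
    sequence of TD errors."""
    v = 0.0
    deltas = []
    for _ in range(n_trials):
        delta = td_error(reward, v, 0.0)
        v += alpha * delta  # value update driven by the error signal
        deltas.append(delta)
    return v, deltas


if __name__ == "__main__":
    v, deltas = learn_value()
    print(f"learned value: {v:.6f}")
    print(f"first TD error: {deltas[0]:.6f}, last TD error: {deltas[-1]:.2e}")
```

As the predicted value approaches the delivered reward, δ shrinks towards zero, so any update that is proportional to δ stops by itself once the reward is fully expected.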
Under a suitable interpretation of the role of pre- and postsynaptic neurons, the weight update rules can be derived from an optimality framework (Pfister et al. 2006). The learning rule can be interpreted as Hebbian learning based on joint pre- and postsynaptic activity, but conditioned on the presence of a global reward signal. Variants of such reinforcement rules for spiking neurons have been developed (Seung 2003; Pfister et al. 2006; Izhikevich 2007; Florian 2007).
6 Discussion
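The general structure of such rules, a Hebbian term gated by a global reward signal, can be sketched as follows. This is a generic illustration rather than the rule of any of the cited studies: the exponential STDP window, the exponentially decaying eligibility trace, and all parameter values and function names are assumptions.

```python
import math


def stdp_window(delta_t, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Assumed exponential pair-based window; delta_t = t_post - t_pre (ms).
    delta_t <= 0 is treated as post-before-pre (depression)."""
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau)
    return -a_minus * math.exp(delta_t / tau)


def reward_modulated_update(pairings, reward_time, reward,
                            eta=0.01, tau_e=500.0):
    """Weight change produced by a reward delivered at reward_time (ms).
    pairings is a list of (t_pair, delta_t): each spike pairing at time
    t_pair adds its STDP contribution to an eligibility trace that decays
    with time constant tau_e. The weight only changes when the trace
    coincides with a non-zero reward."""
    eligibility = 0.0
    for t_pair, delta_t in pairings:
        if t_pair <= reward_time:  # pairings after the reward are ignored
            decay = math.exp(-(reward_time - t_pair) / tau_e)
            eligibility += stdp_window(delta_t) * decay
    return eta * reward * eligibility


if __name__ == "__main__":
    pairings = [(100.0, 10.0), (150.0, 10.0)]  # two pre-before-post pairings
    for r in (1.0, 0.0, -1.0):
        dw = reward_modulated_update(pairings, reward_time=300.0, reward=r)
        print(f"reward = {r:+.1f}  ->  dw = {dw:+.6f}")
```

In this sketch the same pre-before-post pairing leads to potentiation, no change, or depression depending on whether the subsequent reward signal is positive, absent, or negative.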
Pair-based STDP models can be decomposed into three aspects: weight dependence, spike-pairing scheme and delay partition (Sect. 4.1). We have shown that all of these aspects can have significant consequences for the behavior of the model system under investigation. However, in many cases there is not enough experimental data to settle these questions definitively. Therefore, choices for each aspect should be made consciously and take into consideration the relevant available experimental findings. Moreover, these choices should be explicitly documented and critically addressed: it should be clear to what extent results depend on the specific choices.
In particular, the choice of STDP weight dependence is critical. The available evidence suggests that both potentiation and depression are dependent on the weight. Whereas it is useful to start with very simplified models to gain insight, we now know that STDP models which assume some weight dependence produce qualitatively different behavior from the additive model. Moreover, weight-dependent rules are no harder to implement computationally than additive rules. In the absence of fresh experimental evidence supporting an additive rule, weight-dependent rules should therefore be considered as the standard.
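The point about implementation effort can be seen by placing the two kinds of update side by side. The sketch below uses generic textbook forms, not necessarily the exact parameterization of Sect. 4.1: an additive rule with hard bounds and a multiplicative rule in which potentiation scales with the distance to an upper bound and depression scales with the weight itself. The parameter names lam, alpha and w_max and their values are assumptions.

```python
def additive_update(w, potentiate, lam=0.01, alpha=1.05, w_max=1.0):
    """Weight-independent step; hard bounds keep w inside [0, w_max]."""
    dw = lam if potentiate else -lam * alpha
    return min(max(w + dw, 0.0), w_max)


def multiplicative_update(w, potentiate, lam=0.01, alpha=1.05, w_max=1.0):
    """Weight-dependent step: potentiation proportional to (w_max - w),
    depression proportional to w. No explicit clipping is needed as long
    as lam and lam * alpha are below 1."""
    dw = lam * (w_max - w) if potentiate else -lam * alpha * w
    return w + dw


if __name__ == "__main__":
    w_add = w_mul = 0.5
    for _ in range(2000):  # alternate potentiation and depression events
        w_add = additive_update(additive_update(w_add, True), False)
        w_mul = multiplicative_update(multiplicative_update(w_mul, True), False)
    print(f"additive: {w_add:.4f}   multiplicative: {w_mul:.4f}")
```

Both updates are a single line of arithmetic. With equal numbers of potentiation and depression events and alpha slightly above 1, the additive sketch drifts to the lower bound, whereas the multiplicative sketch settles near the interior fixed point w_max / (1 + alpha).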
Pair-based models of STDP have their limitations. They give incorrect predictions for many experiments such as triplet and quadruplet protocols and cannot account for synaptic modification due to natural spike trains or pairing protocols at different frequencies. Models of STDP that are beyond the pair-based framework (Sect. 4.2) can account for these findings at the cost of only a small number of additional variables, and so should attract increasing theoretical interest.
In this manuscript, we have considered models in which synaptic modifications depend only on spike timing. However, this ignores many aspects of synaptic plasticity which may prove to be of great importance to the functioning of the brain, and will therefore have to be taken into consideration in future phenomenological modeling. Most STDP models assume that the absolute synaptic strength is modified (but see Senn 2002). However, it may turn out that a formulation in terms of the release probability is a more accurate description, thus allowing a unified view of short-term and long-term plasticity. Additionally, STDP has been shown to be sensitive to a number of factors beyond spike timing, for example active dendritic properties and the location of the synapse on the dendrite; see Kampa et al. (2007) for a review. There is also substantial evidence that inhibition is an important physiological feature fine-tuning induction and maintenance of LTP/LTD. Inhibition gates induction of LTP/LTD as a function of physiological conditions and physiologically induced changes in the activity of networks (Larson and Lynch 1986; Pacelli et al. 1989; Radpour and Thomson 1991; Steele and Mauk 1999; Nishiyama et al. 2000; Togashi et al. 2003). Here, the main challenge is to derive appropriate phenomenological models from experiments and detailed biophysical models. Finally, although some progress has been made in investigating the interactions of STDP with other plasticity mechanisms such as homeostasis and heterosynaptic spread of LTP/LTD (van Rossum et al. 2000; Toyoizumi et al. 2005, 2007; Triesch 2007), this complex topic remains largely unexplored. In this area, the main challenge is to perform analytical and simulation studies which can identify and characterize the composite effects of these mechanisms, and investigate their functional consequences.
The idea for this paper grew out of a FACETS workshop on synaptic plasticity held in Lausanne in June 2006. We would therefore like to thank all the participants, especially A. Davison, A. Destexhe, Y. Fregnac, C. Lamy, R. Legenstein, W. Maass, J.-P. Pfister, and A. Thomson for their contributions. We also thank M. Helias for helpful discussions about short-term plasticity and the implementation in NEST, and G. Hennequin for proofreading the manuscript. We are very grateful to G-q. Bi and M-m. Poo for providing us with their original data. This work was partially funded by EU Grant 15879 (FACETS), DIP F1.2 and BMBF Grant 01GQ0420 to the Bernstein Center for Computational Neuroscience Freiburg.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.