# Phenomenological models of synaptic plasticity based on spike timing

DOI: 10.1007/s00422-008-0233-1

- Cite this article as:
- Morrison, A., Diesmann, M. & Gerstner, W. Biol Cybern (2008) 98: 459. doi:10.1007/s00422-008-0233-1

## Abstract

Synaptic plasticity is considered to be the biological substrate of learning and memory. In this document we review phenomenological models of short-term and long-term synaptic plasticity, in particular spike-timing dependent plasticity (STDP). The aim of the document is to provide a framework for classifying and evaluating different models of plasticity. We focus on phenomenological synaptic models that are compatible with integrate-and-fire type neuron models where each neuron is described by a small number of variables. This implies that synaptic update rules for short-term or long-term plasticity can only depend on spike timing and, potentially, on membrane potential, as well as on the value of the synaptic weight, or on low-pass filtered (temporally averaged) versions of the above variables. We examine the ability of the models to account for experimental data and to fulfill expectations derived from theoretical considerations. We further discuss their relations to teacher-based rules (supervised learning) and reward-based rules (reinforcement learning). All models discussed in this paper are suitable for large-scale network simulations.

### Keywords

Spike-timing dependent plasticity, Short term plasticity, Modeling, Simulation, Learning

## 1 Introduction

Synaptic changes are thought to be involved in learning, memory, and cortical plasticity, but the exact relation between microscopic synaptic properties and macroscopic functional consequences remains highly controversial. In experimental preparations, synaptic changes can be induced by specific stimulation conditions defined through presynaptic firing rates (Bliss and Lomo 1973; Dudek and Bear 1992), postsynaptic membrane potential (Kelso et al. 1986; Artola et al. 1990), calcium entry (Lisman 1989; Malenka et al. 1988), or spike timing (Markram et al. 1997; Bi and Poo 2001).

Whereas detailed biophysical models are crucial to understand the biological mechanisms underlying synaptic plasticity, phenomenological models which describe the synaptic changes without reference to mechanism are generally more tractable and less computationally expensive. Consequently, phenomenological models are of great use in analytical and simulation studies. In this manuscript, we will examine a number of phenomenological models with respect to their compatibility with both experimental and theoretical results. In all cases, we consider a synapse from a presynaptic neuron *j* to a postsynaptic neuron *i*. The strength of a connection from *j* to *i* is characterized by a weight \(w_{ij}\) that quantifies the amplitude of the postsynaptic response, typically measured as the height of the postsynaptic potential or the slope of the postsynaptic current at onset. The conditions for synaptic changes as well as their directions and magnitudes can be formulated as ‘synaptic update rules’ or ‘learning rules’. Such rules can be developed from purely theoretical considerations, or to account for macroscopic phenomena such as the development of receptive fields, or based on findings from electrophysiological experiments manipulating firing rate or voltage. In this manuscript, however, we restrict our scope to rules which have been developed to account for the results of experiments in which synaptic plasticity was observed as a result of pre- and postsynaptic spikes (for more general reviews, see Dayan and Abbott 2001; Gerstner and Kistler 2002; Cooper et al. 2004).

For the classification of the synaptic plasticity rules, it is important to specify the time necessary to *induce* such a change as well as the time scale of *persistence* of the change. For both short-term and long-term plasticity, changes can be induced in about 1 s or less. In short-term plasticity (see Sect. 3), a sequence of eight presynaptic spikes at 20 Hz evokes successively smaller (depression) or successively larger (facilitation) responses in the postsynaptic cell. The characteristic feature of short-term plasticity is that this change does not persist for more than a few hundred milliseconds: the amplitude of the postsynaptic response recovers to close-to-normal values within less than a second (Markram et al. 1998; Thomson et al. 1993).

In contrast to short-term plasticity, long-term potentiation and depression (LTP and LTD) refer to persistent changes of synaptic responses (see Sect. 4). Note that the time necessary for induction can still be relatively brief. For example, in spike-timing-dependent plasticity (Bi and Poo 2001; Sjostrom et al. 2001), a change of the synapse can be induced by 60 pairs of pre- and postsynaptic spikes with a repetition frequency of 20 Hz; hence stimulation is over after 3 s. However, this change can persist for more than one hour. The final stabilization of, say, a potentiated synapse occurs only thereafter, in what is called the late phase of LTP (Frey and Morris 1997). An additional aspect is that neurons in the brain must remain within a sustainable activity regime, despite the changes induced by LTP and LTD. This is achieved by homeostatic plasticity, an up- or down-regulation of all synapses converging onto the same postsynaptic neuron which occurs on the time scale of minutes to hours (Turrigiano and Nelson 2004).

The phenomenological models discussed in this manuscript can be classified from a theoretical point of view as unsupervised learning rules. There is no notion of a task to be solved, nor is there any notion of the change being ‘good’ or ‘bad’ for the survival of the animal; learning consists simply of an adaptation of the synapse to the statistics of the activity of pre- and postsynaptic neurons. This is to be contrasted with reward-based learning, also called reinforcement learning (Sutton and Barto 1998). In reward-based learning the direction and amount of change depend on the presence or absence of a success signal that may reflect the current reward or the difference between expected and received reward (Schultz et al. 1997). Reward-based learning rules are distinct from supervised learning since the success signal is considered as a global and unspecific feedback signal that often comes with a delay, whereas in supervised learning the feedback is much more specific. In the theoretical literature, there exists a large variety of update rules that can be classified as supervised, unsupervised or reward-based learning rules.

In this paper, we start with a review of some basic experimental facts that could be relevant for modeling, followed by a list of theoretical concepts arising from fundamental notions of learning and memory formation (Sect. 2). We then review models of short-term plasticity in Sect. 3 and models of long-term potentiation/depression (LTP/LTD), in particular the spike-timing dependent form, in Sect. 4. Throughout the review we discuss spike-based plasticity rules from a computational perspective, giving implementations that are appropriate for analytical and simulation approaches. In the final sections we briefly mention reward driven learning rules for spiking neurons (Sect. 5) and provide an outlook toward current challenges for modeling. The relevance of molecular mechanisms and signaling chains (Lisman 1989; Malenka et al. 1988) for models of synaptic plasticity (Lisman and Zhabotinsky 2001; Shouval et al. 2002; Rubin et al. 2005; Badoual et al. 2006; Graupner and Brunel 2007; Zou and Destexhe 2007), as well as the importance of the postsynaptic voltage (Kelso et al. 1986; Artola et al. 1990; Sjostrom et al. 2001), is acknowledged but not further explored.

## 2 Perspectives on plasticity

Over the last 30 years, a large body of experimental results on synaptic plasticity has been accumulated. The most important discoveries are summarized in Sect. 2.1. Simultaneously, theoreticians have investigated the role of synaptic plasticity in long-term memory, developmental learning and task-specific learning. The most important concepts arising from this research are described in Sect. 2.2. Many of the plasticity models employed in the theoretical approach were inspired by Hebb’s (1949) postulate that describes how synaptic connections should be modified:

When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.

In classical Hebbian models, this famous postulate is often rephrased in the sense that modifications in the synaptic transmission efficacy are driven by correlations in the firing activity of pre- and postsynaptic neurons. Even though the idea of learning through correlations dates further back in the past (James 1890), correlation-based learning is now generally called Hebbian learning. Most classic theoretical studies represented the activity of pre- and postsynaptic neurons in terms of rates, expressed as continuous functions. This has led to a sound understanding of rate-based Hebbian learning. However, rate-based Hebbian learning neglects the fine temporal structure between pre- and postsynaptic spikes. Spike-based learning models for temporally structured input need to take this timing information into account (e.g. Gerstner et al. 1993), which leads to models of spike-timing dependent plasticity (STDP) (Gerstner et al. 1996; Kempter et al. 1999; Roberts 1999; Abbott and Nelson 2000) that can be seen as a spike-based generalization of Hebbian learning. The first experimental reports showing both long-term potentiation and depression induced by causal and acausal spike timings on a time scale of 10 ms were published by Markram and Sakmann (1995) and Markram et al. (1997), slightly after the theoretical work; however, potentiation induced by the pairing of EPSPs with postsynaptic depolarization on a time scale of 100 ms was demonstrated considerably earlier (Gustafsson et al. 1987). Timing in rate-based Hebbian learning (although not spike-based) can be traced even further back in the past (Levy and Steward 1983). From a conceptual point of view, all spike-based and rate-based Hebbian learning rules share the feature that only variables that are *locally* available at the synapse can be used to change the synaptic weight. The local elements that can be used to construct such rules are listed in Sect. 2.3.

### 2.1 Experimental results

- (i)
Short-term plasticity depends on the sequence of presynaptic spikes on a time scale of tens of milliseconds (Markram et al. 1998; Thomson et al. 1993).

- (ii)
Long-term plasticity is sensitive to the presynaptic firing rate over a time scale of tens or hundreds of seconds. For example, 900 presynaptic stimulation pulses at 1 Hz (i.e. 15 min of total stimulation time) yield a persistent depression of the synapses, whereas the same number of pulses at 50 Hz yields potentiation (Dudek and Bear 1992).

- (iii)
Long-term plasticity depends on the exact timing of the pre- and postsynaptic spikes on the time scale of milliseconds (Markram et al. 1997; Bi and Poo 2001). For example LTP is induced if a presynaptic spike precedes the postsynaptic one by 10 ms, whereas LTD occurs if the order of spikes is reversed. In this context it is important to realize that most experiments are done with repetitions of 50–60 pairs of spikes whereas a single pair has no effect.

- (iv)
STDP depends on the repetition frequency of the pre-post spike pairings. In fact, 60 pre-before-post pairings at low frequency have no effect, whereas the same number of pairs at a repetition frequency of 20 Hz gives strong potentiation (Sjostrom et al. 2001).

- (v)
Plasticity depends on the postsynaptic potential (Kelso et al. 1986; Artola et al. 1990). If the postsynaptic neuron is clamped to a voltage slightly above rest during presynaptic spike arrival, the synapses are depressed, while at higher depolarization the same stimulation leads to LTP (Artola et al. 1990; Ngezahayo et al. 2000).

- (vi)
On a slow time scale of hours, homeostatic changes of synapses may occur in form of rescaling of synaptic response amplitudes (Turrigiano et al. 1994). These changes can be useful to stabilize neuronal firing rates.

- (vii)
Also on the time scale of hours, early phase LTP is consolidated into late phase LTP. During the consolidation phase heterosynaptic interactions may take place, probably as a result of synaptic tagging and competition for scarce protein supply (Frey and Morris 1997). Consolidation is thought to lead to long-term stability of the synapses.

- (viii)
Distributions of synaptic strength (e.g., the EPSP amplitudes) in data collected across several pairs of neurons are reported to be unimodal (Sjostrom et al. 2001). At a first glance, this seems to be at odds with experimental data suggesting that single synaptic contacts are in fact binary (Petersen et al. 1998; O’Connor et al. 2005).

- (ix)
Synapses do not form a homogeneous group, but different types of synapse have different plasticity properties (Abbott and Nelson 2000; Thomson and Lamy 2007). In fact, the same presynaptic neuron makes connections to different types of target neurons with different plasticity properties for short-term (Markram et al. 1998) and long-term plasticity (Lu et al. 2007).

Many other experimental features could be added to this list, e.g., the role of intracellular calcium, of NMDA receptors, etc., but we will not do so; see Bliss and Collingridge (1993) and Malenka and Nicoll (1993) for reviews. We emphasize that, given the heterogeneity of synapses between different brain areas (plasticity has mainly been studied in visual or somatosensory cortex and hippocampus) and between different neuron and synapse types, we cannot expect that a single theoretical model can account for all experimental facts. In the next section, we will instead consider which theoretical principles could guide our search for suitable plasticity rules.

### 2.2 Theoretical concepts

- (i)
sensitivity to correlations between pre- and postsynaptic neurons (Hebb 1949) in order to respond to correlations in the input (Oja 1982). This is the essence of all unsupervised learning rules

- (ii)
a mechanism for the development of input selectivity such as receptive fields (Bienenstock et al. 1982; Miller et al. 1989), in the presence of strong input features. This is the essence of developmental learning

- (iii)
a high degree of stability (Fusi et al. 2005) in the synaptic memories whilst remaining plastic (Grossberg 1987). This is the essence of memory formation and memory maintenance

- (iv)
the ability to take into account the quality of task performance mediated by a global success signal (e.g. neuro-modulators, Schultz et al. 1997). This is the essence of reinforcement learning (Sutton and Barto 1998).

These items are not necessarily exclusive, and the relative importance of a given aspect may vary from one subsystem to the next; for example, synaptic memory maintenance might be more important for a long-term memory system than for primary sensory cortices. There is so far no rule which exhibits all of the above properties; moreover, theoretical models which reproduce some aspects of experimental findings are generally incompatible with other findings. For example, traditional learning rules that have been proposed as an explanation of receptive field development (Bienenstock et al. 1982; Miller et al. 1989), exhibit a spontaneous separation of synaptic weights into two groups, even if the input shows no or only weak correlations. This is difficult to reconcile with experimental results in visual cortex of young rats where a unimodal distribution was found (Sjostrom et al. 2001). Moreover model neurons that specialize early in development on one subset of features cannot readily re-adapt later on. On the other hand, learning rules that do produce a unimodal distribution of synaptic weights (van Rossum et al. 2000; Rubin et al. 2001; Gütig et al. 2003; Morrison et al. 2007) do not lead to long-term stability of synaptic changes, as the trajectories of individual synaptic weights perform random walks. Hence it appears that long-term stability of memory requires a multimodal synapse distribution (Toyoizumi et al. 2007; Billings and van Rossum 2008) or additional mechanisms to stabilize the synaptic weights contributing to the retention of a memory item.

### 2.3 Locally computable measures

- (i)
spontaneous growth or decay (a non-Hebbian zero-order term); this could be a small effect that leads to slow ‘homeostatic’ scaling of weights in the absence of any activity

- (ii)
effects caused by postsynaptic spikes alone, independent of presynaptic spike arrival (a non-Hebbian first-order term). This could be an additional realization of homeostasis: if the postsynaptic neuron spikes at a high rate over hours, all synapses are down-regulated

- (iii)
effects caused by presynaptic spikes, independent of postsynaptic variables (another non-Hebbian first-order term). This is typically the case for short-term synaptic plasticity

- (iv)
effects caused by presynaptic spikes in conjunction with postsynaptic spikes (STDP) or in conjunction with postsynaptic depolarization (Hebbian terms)

- (v)
all of the above effects may depend on the current value of the synaptic weight. For example, close to a maximum weight synaptic changes could become smaller.

Such terms can be constructed from hidden variables that are updated at each spike and otherwise decay with a time constant \(\tau\). Hence, the hidden variable implements a low-pass filter. For example, let us denote by \(\delta(t - t^f)\) a spike of a neuron occurring at time \(t^f\). Then an internal variable \(x\) can be defined with dynamics:

\[ \frac{\mathrm{d}x}{\mathrm{d}t} = -\frac{x}{\tau} + \sum_{t^f} A\,\delta(t - t^f) \qquad (1) \]

so that it is updated with each spike by an amount \(A\) and decays between spikes with a time constant \(\tau\) (see Fig. 1, top). If the time constant is sufficiently long and \(A = 1/\tau\), the hidden variable gives an online estimate of the mean firing rate in the spike train. Other variations in the formulation of such a ‘trace’ left by a spike are possible that do not scale linearly with the rate. First, instead of updating by the same amount each time, we may induce saturation,

\[ \frac{\mathrm{d}x}{\mathrm{d}t} = -\frac{x}{\tau} + \sum_{t^f} A\,(1 - x_-)\,\delta(t - t^f) \qquad (2) \]

so that for \(A < 1\) the amount of increase gets smaller as the variable \(x\) before the update (denoted by \(x_-\)) approaches its maximal value of 1. Hence the variable \(x\) stays bounded in the range \(0 \le x \le 1\). An extreme case of saturation is given by \(A = 1\), in which case the reset is always to the value of 1, regardless of the value of \(x\) just before. In this case, the value of the trace \(x\) depends only on the time since the most recent spike (see Fig. 1, bottom). We will see in the following sections that the idea of traces left by pre- or postsynaptic spikes plays a fundamental role in algorithmic formulations of short-term and long-term plasticity. For example, in the case of Hebbian long-term potentiation, traces left by presynaptic spikes need to be combined with postsynaptic spikes, whereas short-term plasticity can be seen as induced by traces of presynaptic spikes, independent of the state of the postsynaptic neuron.
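The trace variants described in this section can be sketched event by event: exponential decay with time constant `tau` between spikes, followed at each spike by a jump of `A` (accumulating) or `A * (1 - x)` (saturating). The function name `trace_after_spikes` and all parameter values below are illustrative assumptions and are not taken from the paper.

```python
import math


def trace_after_spikes(spike_times, tau, A, saturating=False):
    """Return the value of the trace x immediately after each spike.

    Between spikes x decays exponentially with time constant tau.
    At a spike x jumps by A (accumulating) or by A * (1 - x) (saturating).
    """
    x = 0.0
    t_last = None
    values = []
    for t in spike_times:
        if t_last is not None:
            x *= math.exp(-(t - t_last) / tau)  # free decay since the last spike
        x += A * (1.0 - x) if saturating else A
        values.append(x)
        t_last = t
    return values


# Illustrative values only: eight spikes at 20 Hz, tau = 100 ms.
spikes = [0.05 * k for k in range(8)]
accumulating = trace_after_spikes(spikes, tau=0.1, A=1.0)
saturating = trace_after_spikes(spikes, tau=0.1, A=0.5, saturating=True)
last_spike_only = trace_after_spikes(spikes, tau=0.1, A=1.0, saturating=True)
```

With `A = 1` in the saturating variant the trace is reset to 1 at every spike, so its later value depends only on the time since the most recent spike, whereas the accumulating variant keeps growing with the spike count.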

In principle, voltage dependence could be treated in a similar fashion, see, e.g., Brader et al. (2007), but we will focus in the following on learning rules for short-term and long-term plasticity that use spike timing as the relevant variable for inducing postsynaptic changes.

## 3 Short-term plasticity

Biological synapses have an inherent dynamics, which controls how the pattern of amplitudes of postsynaptic responses depends on the temporal pattern of the incoming spike train. Notably, each successive spike can evoke a response in the postsynaptic neuron that is smaller (depression) or larger (facilitation) than the previous one. Its time scale ranges from 100 ms to about a second. Fast synaptic dynamics is firmly established in biological literature (Markram et al. 1998; Gupta et al. 2000), and well-accepted models exist for it (Abbott et al. 1997; Tsodyks et al. 1998). Neurotransmitter is released in quanta of fixed size, each evoking a contribution to the postsynaptic potential of fixed amplitude; this is known as the quantal synaptic potential (Kandel et al. 2000). The release of an individual quantum is known to be stochastic, but the details of the mechanism underlying this stochasticity remain unclear. However, the following two phenomenological models describe the average response and are therefore entirely deterministic. Both models use the idea of a ‘trace’ left by presynaptic spikes (see previous section), but in slightly different formulations.

### 3.1 Markram-Tsodyks Model

One well-established phenomenological model for fast synaptic dynamics was originally formulated for depression only in Tsodyks and Markram (1997) and later extended to facilitating dynamics in Markram et al. (1998). Here, we discuss the formulation of the model presented in Tsodyks et al. (2000).

If neuron \(i\) receives a synapse from neuron \(j\) (see Fig. 2b), the synaptic current (or conductance) in neuron \(i\) is \(w_{ij}\, y_{ij}(t)\), where \(w_{ij}\) is the absolute strength and \(y_{ij}(t)\) is a scaling factor that describes the momentary input to neuron \(i\). Dropping the indices for the rest of this discussion, \(y\) evolves according to:

\[ \frac{\mathrm{d}x}{\mathrm{d}t} = \frac{z}{\tau_{\mathrm{rec}}} - u_+ x_-\,\delta(t - t_j^f) \qquad (3) \]

\[ \frac{\mathrm{d}y}{\mathrm{d}t} = -\frac{y}{\tau_{\mathrm{I}}} + u_+ x_-\,\delta(t - t_j^f) \qquad (4) \]

\[ \frac{\mathrm{d}z}{\mathrm{d}t} = \frac{y}{\tau_{\mathrm{I}}} - \frac{z}{\tau_{\mathrm{rec}}} \qquad (5) \]

where \(x\), \(y\) and \(z\) are the fractions of synaptic resources in the recovered, active, and inactive states respectively, \(t_j^f\) gives the timing of presynaptic spikes, \(\tau_{\mathrm{I}}\) is the decay constant of PSCs and \(\tau_{\mathrm{rec}}\) is the recovery time from synaptic depression. These equations describe the use of synaptic resources by each presynaptic spike: a fraction \(u_+\) of the available resources \(x\) is used by each presynaptic spike. The variable \(u_+\) therefore describes the effective use of the synaptic resources of the synapses, which is analogous to the probability of release in the model described in Markram et al. (1998). The notation \(x_-\) in the update equations (3) and (4) is intended to remind the reader that the value of \(x\) just *before* the update is used. In facilitating synapses, \(u_+\) is not a fixed parameter, but derived from a variable \(u\) which is increased with each presynaptic spike and returns to baseline with a time constant \(\tau_{\mathrm{fac}}\):

\[ \frac{\mathrm{d}u}{\mathrm{d}t} = -\frac{u}{\tau_{\mathrm{fac}}} + U\,(1 - u_-)\,\delta(t - t_j^f) \qquad (6) \]

where the parameter \(U\) determines the increase in \(u\) with each spike. We note that the update is equivalent to the saturated trace (2). The notation \(u_-\) indicates that the value of \(u\) is taken just before the update caused by presynaptic spike arrival. However, in (3) and (4) we use the value \(u_+\) just *after* the update of the variable \(u\). If \(\tau_{\mathrm{fac}} \to 0\), facilitation is not exhibited, and \(u_+\) is identical to \(U\) after each spike, as is the case with depressing synapses between excitatory pyramidal neurons (Tsodyks and Markram 1997). The model described by Eqs. 3–6 gives a very good fit to experimental results: compare Fig. 2a and c. However, it should be noted that the values for the model parameters, including the time constants, are quite heterogeneous, even within a single neural circuit (Markram et al. 1998). The biophysical causes of the heterogeneity are still largely unclear.

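Not only the variable \(u\), but also the variable \(y\) in (4) is essentially a ‘trace’ very similar to the one defined in the preceding section. To see this we eliminate the variable \(x\), which is possible since the total amount of synaptic resources is fixed (\(x + y + z = 1\)). Hence (4) becomes:

\[ \frac{\mathrm{d}y}{\mathrm{d}t} = -\frac{y}{\tau_{\mathrm{I}}} + u_+\,(1 - y_- - z_-)\,\delta(t - t_j^f) \]

Consider the case \(\tau_{\mathrm{rec}} \ll \tau_{\mathrm{I}}\). Then \(z\) decays rapidly back to zero and the above equation becomes the standard saturated trace. Since \(x = 1 - y\) and \(\mathrm{d}x/\mathrm{d}t = -\mathrm{d}y/\mathrm{d}t\), the available synaptic resources have the dynamics:

\[ \frac{\mathrm{d}x}{\mathrm{d}t} = \frac{1 - x}{\tau_{\mathrm{I}}} - u_+ x_-\,\delta(t - t_j^f) \qquad (7) \]

Thus \(x\) is reduced at each presynaptic spike and, in the absence of spikes, approaches an asymptotic value of unity with time constant \(\tau_{\mathrm{I}}\).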

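For a numerical implementation, the variable \(z\) can be eliminated from the system, since \(x + y + z = 1\). Let the state of the synapse be given by:

\[ s = (u, x, y)^T \]

with the dynamics of \(u\) given by (6), that of \(y\) by (4) and that of \(x\) given by \(\mathrm{d}x/\mathrm{d}t = (1 - x - y)/\tau_{\mathrm{rec}} - u_+ x_-\,\delta(t - t_j^f)\). Between two successive presynaptic firing times \(t'\) and \(t''\) the state of the synapse evolves linearly. At \(t''\), the state of the synapse without the effects of the new spike can be calculated as:

\[ (s(t''_-), 1)^T = P(t'' - t')\,(s(t'), 1)^T \]

where \((s(t'), 1)^T\) is a four-dimensional vector. The closed form expression of the propagator matrix (for \(\tau_{\mathrm{rec}} \neq \tau_{\mathrm{I}}\)) is:

\[ P(h) = \begin{pmatrix} e^{-h/\tau_{\mathrm{fac}}} & 0 & 0 & 0 \\ 0 & e^{-h/\tau_{\mathrm{rec}}} & \dfrac{\tau_{\mathrm{I}}\left(e^{-h/\tau_{\mathrm{I}}} - e^{-h/\tau_{\mathrm{rec}}}\right)}{\tau_{\mathrm{rec}} - \tau_{\mathrm{I}}} & 1 - e^{-h/\tau_{\mathrm{rec}}} \\ 0 & 0 & e^{-h/\tau_{\mathrm{I}}} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

The state of the synapse after the spike at \(t''\) is the sum of the linear evolution since \(t'\) and the non-linear modification of the state due to the new spike:

\[ (s(t''), 1)^T = P(t'' - t')\,(s(t'), 1)^T + (u_0, -y_0, y_0, 0)^T, \qquad u_0 = U\,(1 - u_-), \quad y_0 = (u_- + u_0)\,x_- \]

where \(u_-\) and \(x_-\) denote the linearly evolved values at \(t''\) just before the spike.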

Note that the updated value of *u* is used to update the variables *x* and *y*. This reflects the assumption that the effectivity of resource use is determined not just by the history of the synapse but also by the arrival of the new presynaptic spike, thus ensuring a non-zero response to the first spike (Tsodyks et al. 1998).
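The event-driven scheme described above (exact linear propagation between presynaptic spikes, then a non-linear update in which the freshly increased utilization acts on the available resources) can be sketched as follows. This is a minimal illustration under stated assumptions, not the NEST implementation: the function names `propagate`, `spike_update` and `run_synapse` are hypothetical, the parameter values are chosen only for illustration, and the closed-form expression for `x` assumes `tau_rec != tau_I` and `tau_fac > 0`.

```python
import math


def propagate(u, x, y, h, tau_fac, tau_rec, tau_I):
    """Exact evolution of (u, x, y) over a spike-free interval h.

    Uses z = 1 - x - y, so dx/dt = (1 - x - y) / tau_rec between spikes.
    Assumes tau_rec != tau_I.
    """
    e_rec = math.exp(-h / tau_rec)
    e_I = math.exp(-h / tau_I)
    u_new = u * math.exp(-h / tau_fac)
    x_new = (x * e_rec
             + y * tau_I * (e_I - e_rec) / (tau_rec - tau_I)
             + (1.0 - e_rec))
    y_new = y * e_I
    return u_new, x_new, y_new


def spike_update(u, x, y, U):
    """Non-linear update at a presynaptic spike; u is updated first."""
    u_plus = u + U * (1.0 - u)
    y0 = u_plus * x  # resources moved from the recovered to the active state
    return u_plus, x - y0, y + y0, y0


def run_synapse(spike_times, U, tau_fac, tau_rec, tau_I):
    """Return the increment y0 evoked by each presynaptic spike."""
    u, x, y = 0.0, 1.0, 0.0  # fully recovered synapse
    t_last = None
    increments = []
    for t in spike_times:
        if t_last is not None:
            u, x, y = propagate(u, x, y, t - t_last, tau_fac, tau_rec, tau_I)
        u, x, y, y0 = spike_update(u, x, y, U)
        increments.append(y0)
        t_last = t
    return increments


# Illustrative parameters (seconds), eight spikes at 20 Hz.
spikes = [0.05 * k for k in range(8)]
depressing = run_synapse(spikes, U=0.5, tau_fac=0.001, tau_rec=0.8, tau_I=0.003)
facilitating = run_synapse(spikes, U=0.1, tau_fac=1.0, tau_rec=0.1, tau_I=0.003)
```

With a very short `tau_fac` the utilization returns to zero between spikes and each spike uses the fraction `U` of the remaining resources, so successive increments shrink; with a long `tau_fac` and small `U` the second response is larger than the first.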

In many simulation systems synapse models are constrained to transmit a synaptic weight rather than a continuous synaptic current. In such cases, the synaptic weight transmitted to the postsynaptic neuron is \(w_{ij}\, y_0\), assuming the postsynaptic neuron reproduces the dynamics of the \(y\) variable. It is not necessary for the neuron to reproduce the \(y\) dynamics for each individual synapse; due to the linearity of \(y\) between increments, all synapses with the same \(\tau_{\mathrm{I}}\) can be lumped together. This is the implementation used in NEST (Gewaltig and Diesmann 2007). If the postsynaptic neuron also implements an exact integration scheme (for a worked example see Morrison et al. 2007), the dynamics of \(y\) can be incorporated into the propagator of the dynamics of the postsynaptic neuron.

### 3.2 Abbott model

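In this model, the synaptic conductance is \(g_{\mathrm{s}} = \bar{g}_{\mathrm{s}} P_{\mathrm{s}} P_{\mathrm{rel}}\), where \(\bar{g}_{\mathrm{s}}\) is the maximum conductance, \(P_{\mathrm{s}}\) is the fraction of open postsynaptic channels and \(P_{\mathrm{rel}}\) is the fraction of presynaptic sites releasing transmitter. \(P_{\mathrm{s}}\) generates the shape of the postsynaptic conductance, and will not be further considered here. Facilitation and depression can both be modeled as presynaptic processes that modify \(P_{\mathrm{rel}}\). In both cases, between presynaptic action potentials \(P_{\mathrm{rel}}\) decays exponentially with a time constant \(\tau_{\mathrm{P}}\) back to its ‘resting’ level \(P_0\). In the case of facilitation, a presynaptic spike causes \(P_{\mathrm{rel}}\) to be increased by \(f_{\mathrm{F}}(1 - P_{\mathrm{rel}})\):

\[ \frac{\mathrm{d}P_{\mathrm{rel}}}{\mathrm{d}t} = -\frac{P_{\mathrm{rel}} - P_0}{\tau_{\mathrm{P}}} + f_{\mathrm{F}}\,(1 - P_{\mathrm{rel}}^{-})\,\delta(t - t_j^f) \qquad (8) \]

where \(t_j^f\) is the timing of the presynaptic spikes, \(P_{\mathrm{rel}}^{-}\) is the value of \(P_{\mathrm{rel}}\) just before the update, \(f_{\mathrm{F}}\) controls the degree of facilitation (with \(0 \le f_{\mathrm{F}} \le 1\)), and the factor \((1 - P_{\mathrm{rel}})\) prevents the release probability from growing larger than 1. Note that (8) is just a modification of the saturated trace in (2) due to a nonzero ‘resting’ level.

In the case of depression, a presynaptic spike causes \(P_{\mathrm{rel}}\) to be decreased by \(f_{\mathrm{D}} P_{\mathrm{rel}}\):

\[ \frac{\mathrm{d}P_{\mathrm{rel}}}{\mathrm{d}t} = -\frac{P_{\mathrm{rel}} - P_0}{\tau_{\mathrm{P}}} - f_{\mathrm{D}}\,P_{\mathrm{rel}}^{-}\,\delta(t - t_j^f) \qquad (9) \]

where \(f_{\mathrm{D}}\) controls the degree of depression (with \(0 \le f_{\mathrm{D}} \le 1\)), and the factor \(P_{\mathrm{rel}}\) prevents the release probability from becoming less than 0. Note that with an equilibrium value \(P_0 = 1\) (which is always possible) this equation is equivalent to that of the simplified Tsodyks model without the inactive state (7).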
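The two update rules of this section differ only in the sign and form of the jump at a presynaptic spike. A minimal sketch, assuming regular spike trains and illustrative parameter values (the function name `release_probability_after_spikes` is hypothetical):

```python
import math


def release_probability_after_spikes(spike_times, P0, tau_P, f, facilitating):
    """Return P_rel immediately after each presynaptic spike.

    Between spikes P_rel relaxes exponentially to its resting level P0.
    At a spike it jumps by +f * (1 - P_rel) (facilitation)
    or by -f * P_rel (depression).
    """
    P = P0
    t_last = None
    values = []
    for t in spike_times:
        if t_last is not None:
            P = P0 + (P - P0) * math.exp(-(t - t_last) / tau_P)
        if facilitating:
            P += f * (1.0 - P)  # cannot exceed 1 for 0 <= f <= 1
        else:
            P -= f * P  # cannot fall below 0 for 0 <= f <= 1
        values.append(P)
        t_last = t
    return values


# Illustrative values only: eight spikes at 20 Hz, tau_P = 300 ms.
spikes = [0.05 * k for k in range(8)]
fac = release_probability_after_spikes(spikes, P0=0.2, tau_P=0.3, f=0.3, facilitating=True)
dep = release_probability_after_spikes(spikes, P0=1.0, tau_P=0.3, f=0.4, facilitating=False)
```

In the depressing case with `P0 = 1`, the variable plays the same role as the available resources in the simplified model without an inactive state, with `f` corresponding to the fraction of resources used per spike.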

## 4 Long-term plasticity (STDP)

Experimentally reported STDP curves vary qualitatively depending on the system and the neuron type; see Abbott and Nelson (2000) and Bi and Poo (2001) for reviews. We therefore cannot expect a single STDP rule, be it defined in the framework of temporal traces outlined above or in a more biophysical framework, to hold for all experimental preparations and across all neuron and synapse types. The first spike-timing experiments were performed by Markram and Sakmann on layer 5 pyramidal neurons in neocortex (Markram et al. 1997). In the neocortex, the width of the negative window seems to vary depending on layer, and inhibitory neurons seem to have a more symmetric STDP curve. The standard STDP curve that has become an icon of theoretical research on STDP (Fig. 1 in Bi and Poo 1998) was originally found for pyramidal neurons in rat hippocampal cell culture. Inverted STDP curves have also been reported, for example in the ELL system in electric fish. This gives rise to different functional properties (Bell et al. 1997).

### 4.1 Pair-based STDP rules

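In a pair-based STDP rule, the change in weight of a synapse depends on the temporal difference \(\Delta t = t_i^f - t_j^f\) between a postsynaptic and a presynaptic spike:

\[ \begin{aligned} \Delta w_+ &= F_+(w)\,\exp(-|\Delta t|/\tau_+) && \text{if } \Delta t > 0 \\ \Delta w_- &= -F_-(w)\,\exp(-|\Delta t|/\tau_-) && \text{if } \Delta t \le 0 \end{aligned} \qquad (10) \]

where \(\tau_+\) and \(\tau_-\) are the time constants of the potentiation and depression windows and \(F_\pm(w)\) describes the dependence of the update on the current weight of the synapse. A pair-based model is fully specified by defining: (i) the form of \(F_\pm(w)\); (ii) which pairs are taken into consideration to perform an update. In order to incorporate STDP into a neuronal network simulation, it is also necessary to specify how the synaptic delay is partitioned into axonal and dendritic contributions.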

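A pair-based update rule can be implemented with local traces. Consider the synapse between neuron \(j\) and neuron \(i\). Suppose that each spike from presynaptic neuron \(j\) contributes to a trace \(x_j\) at the synapse:

\[ \frac{\mathrm{d}x_j}{\mathrm{d}t} = -\frac{x_j}{\tau_x} + \sum_{t_j^f} \delta(t - t_j^f) \qquad (11) \]

where \(t_j^f\) denotes the firing times of the presynaptic neuron. In other words, the variable is increased by an amount of one at the moment of a presynaptic spike and decreases exponentially with time constant \(\tau_x\) afterwards; see the discussion of traces in Sect. 2.3. Similarly, each spike from postsynaptic neuron \(i\) contributes to a trace \(y_i\):

\[ \frac{\mathrm{d}y_i}{\mathrm{d}t} = -\frac{y_i}{\tau_y} + \sum_{t_i^f} \delta(t - t_i^f) \qquad (12) \]

where \(t_i^f\) denotes the firing times of the postsynaptic neuron. On the occurrence of a presynaptic spike, a decrease of the weight is induced proportional to the momentary value of the postsynaptic trace \(y_i\). Likewise, on the occurrence of a postsynaptic spike a potentiation of the weight is induced proportional to the trace \(x_j\) left by previous presynaptic spikes:

\[ \frac{\mathrm{d}w_{ij}}{\mathrm{d}t} = -F_-(w_{ij})\,y_i(t) \sum_{t_j^f} \delta(t - t_j^f) + F_+(w_{ij})\,x_j(t) \sum_{t_i^f} \delta(t - t_i^f) \qquad (13) \]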

Depending on the definition of the trace dynamics (accumulating or saturating, see Sect. 2.3), different spike pairing schemes can be realized. Before we turn to the consequences of these subtle differences (Sect. 4.1.2) and the implementation of synaptic delays (Sect. 4.1.3), we now discuss the choice of the factors \(F_+(w)\) and \(F_-(w)\), i.e. the weight dependence of STDP.
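The trace-based formulation of a pair-based rule lends itself to an event-driven implementation: both traces decay between events, a presynaptic spike depresses the weight in proportion to the postsynaptic trace, and a postsynaptic spike potentiates it in proportion to the presynaptic trace. The sketch below uses accumulating traces (all pairs contribute), ignores synaptic delays, and passes the weight dependence in as two functions; the name `stdp_pair_based` and all numerical values are assumptions made for illustration.

```python
import math


def stdp_pair_based(pre_times, post_times, w0, F_plus, F_minus, tau_x, tau_y):
    """Return the weight after processing all pre- and postsynaptic spikes.

    x is the trace of presynaptic spikes, y the trace of postsynaptic spikes.
    Both are accumulating traces, so every spike pair contributes.
    """
    # Ties sort "post" before "pre", so a coincident pair is treated as depression.
    events = sorted([(t, "pre") for t in pre_times] +
                    [(t, "post") for t in post_times])
    if not events:
        return w0
    x = y = 0.0
    w = w0
    t_last = events[0][0]
    for t, kind in events:
        dt = t - t_last
        x *= math.exp(-dt / tau_x)
        y *= math.exp(-dt / tau_y)
        if kind == "pre":
            w -= F_minus(w) * y  # depression, read out from the post trace
            x += 1.0
        else:
            w += F_plus(w) * x  # potentiation, read out from the pre trace
            y += 1.0
        t_last = t
    return w


def additive_plus(w):
    return 0.01  # illustrative learning rate


def additive_minus(w):
    return 0.0105  # illustrative, slightly stronger depression


w_pre_first = stdp_pair_based([0.0], [0.01], 0.5, additive_plus, additive_minus, 0.02, 0.02)
w_post_first = stdp_pair_based([0.01], [0.0], 0.5, additive_plus, additive_minus, 0.02, 0.02)
```

A saturating trace with unit jump (reset to 1 at each spike) in place of `x += 1.0` and `y += 1.0` would restrict the interaction to the most recent spike of the other neuron.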

#### 4.1.1 Weight dependence of STDP

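For depression, a percentage weight change that is roughly independent of the initial synaptic strength (\(\Delta w / w \approx\) constant) suggests a multiplicative dependence of depression on the initial synaptic strength (\(\Delta w \propto w\)). For potentiation the picture is less clear.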

Instead of plotting the percentage weight change, Fig. 4b shows the absolute weight change in double logarithmic representation. The exponent of the weight dependence can now be determined from the slope of a linear fit to the data, see Morrison et al. (2007) for more details. A multiplicative update rule (\(F_-(w) \propto w\)) is the best fit to the depression data but a poor fit to the potentiation data. The best fit to the potentiation data is a power law update (\(F_+(w) \propto w^{\mu}\)). The quality of an additive update (\(F_+(w) = A_+\)) fit is between the power law fit and the multiplicative fit.
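Reading an exponent off a double logarithmic plot amounts to an ordinary least-squares line fit to the logarithms of weight and absolute weight change. The sketch below illustrates only the procedure: the data are synthetic and noise-free, generated from assumed rules, and are not the experimental values analysed in the paper; the function name `loglog_slope` is hypothetical.

```python
import math


def loglog_slope(weights, abs_changes):
    """Least-squares slope and intercept of log10(|dw|) against log10(w)."""
    xs = [math.log10(w) for w in weights]
    ys = [math.log10(d) for d in abs_changes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic illustration (arbitrary units, not experimental data).
weights = [20.0, 50.0, 100.0, 200.0, 500.0]
multiplicative_changes = [0.3 * w for w in weights]   # |dw| proportional to w
power_law_changes = [2.0 * w ** 0.4 for w in weights]  # |dw| proportional to w**0.4

slope_mult, _ = loglog_slope(weights, multiplicative_changes)
slope_power, _ = loglog_slope(weights, power_law_changes)
```

A slope of 1 corresponds to a multiplicative rule, a slope of 0 to an additive rule, and intermediate slopes to a power law with that exponent.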

##### 4.1.1.1 Unimodal versus bimodal distributions

[…] (\(F_+(w) = \lambda\), \(F_-(w) = \lambda\alpha\), where \(\lambda \ll 1\) is the learning rate and \(\alpha\) an asymmetry parameter) is compared with the behavior of a multiplicative STDP rule (\(F_+(w) = \lambda(1 - w)\), \(F_-(w) = \lambda\alpha w\), with \(w\) in the range [0, 1]). In the lowest histograms, the equilibrium distributions are shown for a neuron receiving 1,000 uncorrelated Poissonian spike trains at 10 Hz. In the case of additive STDP, a bimodal distribution develops, whereas in the case of multiplicative STDP, the equilibrium distribution is unimodal. Experimental evidence currently suggests that a unimodal distribution of synaptic strengths is more realistic than the extreme bimodal distribution depicted in Fig. 5a, see, for example, Turrigiano et al. (1998) and Song et al. (2005). Gütig et al. (2003) extended this analysis by regarding additive and multiplicative STDP as the two extrema of a continuous spectrum of rules: \(F_-(w) = \lambda\alpha w^{\mu}\), \(F_+(w) = \lambda(1 - w)^{\mu}\). A choice of \(\mu = 0\) results in additive STDP, a choice of \(\mu = 1\) leads to multiplicative STDP, and intermediate values result in rules which have an intermediate dependence on the synaptic strength.
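The one-parameter family of rules just described can be written down directly. The sketch below is an illustration only; the default values of `lam` and `alpha` are hypothetical placeholders, not parameters taken from the cited studies.

```python
def f_plus(w, lam=0.01, mu=1.0):
    """Potentiation factor F_+(w) = lam * (1 - w)**mu for w in [0, 1]."""
    return lam * (1.0 - w) ** mu

def f_minus(w, lam=0.01, alpha=1.05, mu=1.0):
    """Depression factor F_-(w) = lam * alpha * w**mu for w in [0, 1]."""
    return lam * alpha * w ** mu

# mu = 0 removes the weight dependence (additive STDP),
# mu = 1 gives the multiplicative rule, values in between interpolate.
for mu in (0.0, 0.4, 1.0):
    print(mu, f_plus(0.3, mu=mu), f_minus(0.3, mu=mu))
```

Note that Python evaluates `0.0 ** 0.0` as 1.0, so the additive limit also holds at the boundaries of the weight range.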

Gütig et al. (2003) further demonstrated that the unimodal distribution is the rule rather than the exception for update rules of this form. A bimodal distribution is only produced by rules with a very weak weight dependence (i.e. \(\mu \ll 1\)). Moreover, the critical value for \(\mu\) at which bimodal distributions appear decreases as the effective population size \(Nr\tau\) increases, where \(N\) is the number of synapses converging onto the postsynaptic neuron, \(r\) is the rate of the input spike trains in Hz and \(\tau\) is the time constant of the STDP window (assumed to be equal for potentiation and depression). Figure 5c shows the equilibrium distributions as a function of \(\mu\) for \(N = 1{,}000\), \(r = 10\) Hz and \(\tau = 0.02\) s, i.e. an effective population size of \(Nr\tau = 200\). \(\mu_{\rm crit}\) is already very low for this effective population size. Because of the high connectivity of the cortex, we may expect that the effective population size in vivo would be an order of magnitude greater, and so the region of bimodal stability would be vanishingly small according to this analysis. It is worth noting that in the case that a sub-group of inputs is correlated, a bimodal distribution develops for all values of \(\mu\), whereby the synaptic weights of the correlated group become stronger than those of the uncorrelated group (data not shown; see Gütig et al. 2003). In contrast to a purely additive rule, the peaks of the distributions are not at the extrema of the permitted weight range. Moreover, the bimodal distribution does not persist if the correlations in the input are removed after learning. A unimodal distribution for uncorrelated Poissonian inputs and an ability to develop multimodal distributions in the presence of correlation is also exhibited by the additive/multiplicative update rule proposed by van Rossum et al. (2000): \(F_+(w) = \lambda\), \(F_-(w) = \lambda\theta w\); and by the power law update rule proposed by Morrison et al. (2007) and also Standage et al. (2007): \(F_+(w) \propto \lambda w^{\mu}\), \(F_-(w) \propto \lambda\alpha w\).

##### 4.1.1.2 Fixed point analysis of STDP update rules

[…] \(\nu_{i/j} = \langle \rho_{i/j} \rangle\). Assuming stationarity, the raw cross-correlation function is given by

[…] \(t\) while keeping the delay \(\Delta t\) between the two spike trains fixed. The synaptic drift is obtained by integrating the synaptic weight changes given by (10) over \(\Delta t\), weighted by the probability, as expressed by (16), of the temporal difference \(\Delta t\) occurring between a pre- and postsynaptic spike:

[…] \(\langle \rho_i \rangle = \langle \rho_j \rangle = \nu\) and \(\Gamma_{ji}(\Delta t) = \nu^2\). We can therefore write:

[…] \(\nu\) of a neuron is dependent on the weight of its incoming synapses and so the right side of this equation cannot be easily determined. However, we can reformulate the equation as:

[…] \(\dot w = 0\), and therefore also by \(\dot w/\nu^2 = 0\). Figure 6 plots (18) for a range of \(w\) and a variety of STDP models. In all cases except for additive STDP the curves pass through \(\dot w/\nu^2 = 0\) at an intermediate value of \(w\) and with a negative slope, i.e. for weights below the fixed point there is a net potentiating effect, and for weights above the fixed point there is a net depressing effect, resulting in a stable fixed point which is not at an extremum of the weight range. In the case of additive STDP there is no such fixed point, stable or otherwise.
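The fixed point argument can be checked numerically with a short sketch. It rests on assumptions that are not spelled out in this section: exponential STDP windows \(K_\pm(s) = \exp(-|s|/\tau_\pm)\), so that for a constant cross-correlation \(\Gamma_{ji} = \nu^2\) the drift per squared rate reduces to \(\dot w/\nu^2 = F_+(w)\tau_+ - F_-(w)\tau_-\). The rule family is the one attributed to Gütig et al. (2003) above; `LAM`, `ALPHA` and all function names are hypothetical.

```python
import numpy as np

TAU = 0.02               # s, assumed equal for potentiation and depression
LAM, ALPHA = 0.01, 1.05  # hypothetical learning rate and asymmetry

def drift_per_rate_squared(w, f_plus, f_minus, tau_plus=TAU, tau_minus=TAU):
    """Drift divided by nu**2, assuming exponential windows (see lead-in)."""
    return f_plus(w) * tau_plus - f_minus(w) * tau_minus

def guetig_rule(mu):
    """Return (F_plus, F_minus) for a given exponent mu."""
    return (lambda w: LAM * (1.0 - w) ** mu,
            lambda w: LAM * ALPHA * w ** mu)

def stable_fixed_points(f_plus, f_minus, n_grid=10001):
    """Zero crossings of the drift on [0, 1] that have a negative slope."""
    w = np.linspace(0.0, 1.0, n_grid)
    drift = drift_per_rate_squared(w, f_plus, f_minus)
    points = []
    for k in range(n_grid - 1):
        if drift[k] > 0.0 and drift[k + 1] <= 0.0:
            # linear interpolation between the two grid points
            frac = drift[k] / (drift[k] - drift[k + 1])
            points.append(float(w[k] + frac * (w[k + 1] - w[k])))
    return points

for mu in (0.0, 0.4, 1.0):
    print(mu, stable_fixed_points(*guetig_rule(mu)))
```

Under these assumptions the additive case (\(\mu = 0\)) yields an empty list, because the drift has the same sign for every weight, whereas the weight-dependent cases yield a single stable fixed point at \(w^* = 1/(1 + \alpha^{1/\mu})\).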

[…] \(\alpha\), threshold \(\nu_0\) and membrane potential \(u_i(t) = \sum_j w_{ij}\,\varepsilon(t - t_j^f)\), where \(\varepsilon(t)\) denotes the time course of an excitatory postsynaptic potential generated by a presynaptic spike arrival. The notation \([x]_+\) denotes a piecewise linear function: \([x]_+ = x\) for \(x > 0\) and zero otherwise. In the following we assume that the argument of our piecewise linear function is positive so that we can suppress the square brackets. Assuming once again that all input spike trains are Poisson processes with rate \(\nu\), the expected firing rate of the postsynaptic neuron is simply:

[…] \(\int \varepsilon(s)\,\mathrm{d}s\), the total area under an excitatory postsynaptic potential. The conditional rate of firing given an input spike at time \(t_j^f\) is given by

[…] Hence, in addition to the terms in (18), the synaptic dynamics contains a term of the form \(\alpha\nu w F_+(w) \int K_+(s)\,\varepsilon(s)\,\mathrm{d}s\) that is linear rather than quadratic in the presynaptic firing rate (Kempter et al. 1999, 2001). With this additional term, (18) becomes

[…] \(C_{\rm ss} = F_+(w) \int K_+(s)\,\varepsilon(s)\,\mathrm{d}s\) denotes the contribution of the spike-spike correlations. In contrast to the curves in Fig. 6, the slope at the zero-crossing is now positive, indicating instability of the fixed point. This instability leads to the formation of a bimodal weight distribution that is typical for the additive model. Despite the instability of individual weights (which move to their upper or lower bounds), the mean firing rate of the neuron is stabilized (Kempter et al. 2001). To see this we consider the evolution of the output rate \(\mathrm{d}\nu_i/\mathrm{d}t = \alpha\nu\bar\varepsilon \sum_j \mathrm{d}w_{ij}/\mathrm{d}t\). Since \(\mathrm{d}w_{ij}/\mathrm{d}t = -C\nu_i\nu + \alpha\nu w_{ij}C_{\rm ss}\) and \(\alpha\nu\bar\varepsilon \sum_j w_{ij} = \nu_i + \nu_0\), we can write:

[…] \(N\) is the number of synapses converging on the postsynaptic neuron. Thus we have a dynamics of the form:

[…] \(\nu_0 > 0\) and \(C > 0\). The first condition states that, in the absence of any input, the neuron does not show any spontaneous activity, and this is trivially true for all standard neuron models, including the integrate-and-fire model. The latter condition is equivalent to the requirement that the integral over the STDP curve be negative: \(\int \mathrm{d}s\,[F_+(w)K_+(s) - F_-(w)K_-(s)] = -C < 0\). Exact conditions for stabilization of output rates are given in Kempter et al. (2001). Since for constant input rates \(\nu\) we have \(\nu_i = -\nu_0 + \alpha\nu\bar\varepsilon \sum_j w_{ij}\), stabilization of the output rate implies normalization of the summed weights. Hence STDP can lead to a control of total presynaptic input and of the postsynaptic firing rate, a feature that is usually associated with homeostatic processes rather than Hebbian learning per se (Kempter et al. 1999, 2001; Song et al. 2000).
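The rate dynamics can be retraced from the two relations quoted in the text, \(\mathrm{d}w_{ij}/\mathrm{d}t = -C\nu_i\nu + \alpha\nu w_{ij}C_{\rm ss}\) and \(\alpha\nu\bar\varepsilon\sum_j w_{ij} = \nu_i + \nu_0\). The following is a reconstruction sketch under those two relations, not a reproduction of the typeset equations; the labels \(\nu_{\rm FP}\) and \(\tau_\nu\) are introduced here for illustration only.

```latex
\begin{aligned}
\frac{\mathrm{d}\nu_i}{\mathrm{d}t}
  &= \alpha\nu\bar\varepsilon \sum_j \frac{\mathrm{d}w_{ij}}{\mathrm{d}t}
   = \alpha\nu\bar\varepsilon \sum_j
     \left( -C\nu_i\nu + \alpha\nu w_{ij} C_{\rm ss} \right) \\
  &= -\alpha\nu^2\bar\varepsilon\, N C\, \nu_i
     + \alpha\nu\, C_{\rm ss}\, (\nu_i + \nu_0) \\
  &= -\frac{\nu_i - \nu_{\rm FP}}{\tau_\nu},
\qquad
\nu_{\rm FP} = \frac{C_{\rm ss}\,\nu_0}{N C \nu\bar\varepsilon - C_{\rm ss}},
\qquad
\frac{1}{\tau_\nu} = \alpha\nu \left( N C \nu\bar\varepsilon - C_{\rm ss} \right).
\end{aligned}
```

In this form the fixed point \(\nu_{\rm FP}\) is stable and positive when \(NC\nu\bar\varepsilon > C_{\rm ss}\), which requires \(C > 0\), and when \(\nu_0 > 0\) (taking \(C_{\rm ss} > 0\)).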

Note that the existence of a fixed point and its stability do not crucially depend on the presence of soft or hard bounds on the weight. Equations (18) and (19) can equate to zero for hard-bounded or unbounded rules.

##### 4.1.1.3 Consequences for network stability

Results on the consequences of STDP in large-scale networks are few and far between, and tend to contradict each other. Part of the reason for the lack of simulation papers on this important subject is the fact that simulating such networks consumes huge amounts of memory, is computationally expensive, and potentially requires extremely long simulation times to overcome transients in the weight dynamics which can be of the order of hundreds of seconds of biological time. A lack of theoretical papers on the subject can be explained by the complexity of the interactions between the activity dynamics of the network and the weight dynamics, although some progress is being made in this area (Burkitt et al. 2007).

It was recently shown that power law STDP is compatible with balanced random networks in the asynchronous irregular regime (Morrison et al. 2007), resulting in a unimodal distribution of weights and no self-organization of structure. This result was verified for Gütig et al. (2003) STDP for an intermediate value of the exponent (\(\mu = 0.4\)). Although it has not yet been possible to perform systematic tests, it seems likely that all the formulations of STDP with the fixed point structure discussed in Sect. 4.1.1.1 would give qualitatively similar behavior. The results for additive STDP seem to be more contradictory. Izhikevich et al. (2004) reported self-organization of neuronal groups, whereas the chief feature of the networks investigated by Iglesias et al. (2005) seems to be extensive withering of the synaptic connections. In the former case, it is the existence of many strong synapses which defines the network, in the latter, the presence of many weak ones. This discrepancy may be attributable to different choices for the effective stabilized firing rates (20) in combination with different choices of delays in the network, see Sect. 4.1.3.

#### 4.1.2 Spike pairing scheme

[…] \(t_j^f\) and \(t_i^f\) denote the firing times of the pre- and postsynaptic neurons respectively, and \(x_j^-\) gives the value of \(x_j\) just before the update. In other words, the trace is reset to 1 on the occurrence of a presynaptic spike and reset to 0 on the occurrence of a postsynaptic spike. Similarly, the reduced symmetric interpretation shown in Fig. 7c can be implemented by pre- and postsynaptic ‘doubly resetting’ traces of this form.
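How the choice of trace dynamics selects a pairing scheme can be sketched in a few lines. The event-driven code below assumes an exponentially decaying presynaptic trace and samples it just before each postsynaptic spike; the function name, the scheme labels and the spike times are hypothetical.

```python
import math

def pre_trace_at_post_spikes(pre_times, post_times, tau, scheme="all_to_all"):
    """Sample the presynaptic trace x_j just before each postsynaptic spike.

    scheme = "all_to_all":       every presynaptic spike adds 1 (accumulating)
    scheme = "nearest":          a presynaptic spike resets the trace to 1
    scheme = "doubly_resetting": as "nearest", and a postsynaptic spike
                                 additionally resets the trace to 0
    """
    if scheme not in ("all_to_all", "nearest", "doubly_resetting"):
        raise ValueError(f"unknown scheme: {scheme}")
    events = sorted([(t, "pre") for t in pre_times] +
                    [(t, "post") for t in post_times])
    x, t_last, samples = 0.0, None, []
    for t, kind in events:
        if t_last is not None:
            x *= math.exp(-(t - t_last) / tau)  # exponential decay
        t_last = t
        if kind == "pre":
            x = x + 1.0 if scheme == "all_to_all" else 1.0
        else:
            samples.append(x)                   # value before any reset
            if scheme == "doubly_resetting":
                x = 0.0
    return samples

for scheme in ("all_to_all", "nearest", "doubly_resetting"):
    print(scheme, pre_trace_at_post_spikes([0.0, 10.0], [20.0, 25.0], 20.0, scheme))
```

With two presynaptic spikes followed by two postsynaptic spikes, the accumulating trace reflects both presynaptic spikes, the resetting trace only the most recent one, and the doubly resetting trace contributes nothing to the second postsynaptic spike.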

It is sometimes assumed that the scheme used makes no difference, as the ISI of cortical network models is typically an order of magnitude larger than the time constant of the STDP window. However, this is not generally true (Kempter et al. 2001; Izhikevich and Desai 2003; Morrison et al. 2007). For a review of a wide variety of schemes and their consequences, particularly with respect to selectivity of higher-frequency inputs, see Burkitt et al. (2004). Experimental results on this issue suggest limited interaction between pairs of spikes. Sjostrom et al. (2001) found that their data was best fit by a nearest neighbor interaction similar to Fig. 7c but giving precedence to LTP, i.e. a postsynaptic spike can only contribute to a post-before-pre pairing if it has not already contributed to a pre-before-post pairing. However, this result may also be due to the limitations of pair-based STDP models to explain the experimentally observed frequency dependence, see Sect. 4.2. More recently, Froemke et al. (2006) demonstrated that the amount of LTD was not dependent on the number of presynaptic spikes following a postsynaptic spike, suggesting nearest-neighbor interactions for depression as in Fig. 7c. However, the amount of LTP was negatively correlated with the number of presynaptic spikes preceding a postsynaptic spike. This suggests that multiple spike pairings contribute to LTP, but not in the linear fashion of the all-to-all scheme, which would predict a positive correlation between the number of spikes and the amount of LTP. Again, these results are good evidence for the limitations of pair-based STDP rules.

#### 4.1.3 Synaptic delays

[…] \(\Delta t\) as the temporal difference between a pre- and a postsynaptic spike, i.e. \(\Delta t = t_i^f - t_j^f\). However, many classic STDP experiments are expressed in terms of the temporal difference between the start of the EPSP and the postsynaptic spike (Markram et al. 1997; Bi and Poo 1998). In fact, when a presynaptic spike is generated at \(t_j^f\), it must first travel down the axon before arriving at the synapse, thus arriving at \(t_j^s = t_j^f + d_{\rm A}\), where \(d_{\rm A}\) is the axonal propagation delay. Similarly, a postsynaptic spike at \(t_i^f\) must backpropagate through the dendrite before arriving at the synapse at \(t_i^s = t_i^f + d_{\rm BP}\), where \(d_{\rm BP}\) is the backpropagation delay. Consequently, the relevant temporal difference for STDP update rules is \(\Delta t^s = t_i^s - t_j^s\), as initially suggested by Gerstner et al. (1993) and Debanne et al. (1998). Senn et al. (2002) showed that under fairly general conditions, STDP may cause adaptation in the presynaptic and postsynaptic delays in order to optimize the effect of the presynaptic spike on the postsynaptic neuron. In order to calculate the synaptic drift as in (17), we therefore need to integrate the synaptic weight changes over \(\Delta t^s\), weighted by the raw cross-correlation function *at the synapse*. With \(\Gamma_{ji}(\Delta t) = \Gamma_{ji}(\Delta t^s + (d_{\rm A} - d_{\rm BP}))\), we reformulate (17) as:

[…] \((d_{\rm A} - d_{\rm BP})\) has no effect, as \(\Gamma_{ji}(\Delta t)\) is constant. Generally, however, this is not the case. For example, networks of neurons, both in experiment and simulation, typically exhibit oscillations with a period several times larger than the synaptic delay, even when individual spike trains are irregular (see Kriener et al. 2008, for discussion). If the axonal delay is the same as the backpropagation delay, i.e. \(d_{\rm A} = d_{\rm BP} = d/2\), where \(d\) is the total transmission delay of the spike, the raw cross-correlation function at the synapse is the same as the raw cross-correlation at the soma:

[…] \(w_0\) be the synaptic weight for which the synaptic drift given in (21) is 0, i.e. the fixed point of the synaptic dynamics for the cross-correlation shown. If the axonal delay is larger than the backpropagation delay, this results in a shift of the raw cross-correlation function to the left. This is shown in Fig. 8a for the extreme case of \(d_{\rm A} = d\), \(d_{\rm BP} = 0\), resulting in a net shift of \(d\). This increases the value of the first integral in (21) and decreases the second integral, such that \(\dot w < 0\) at \(w_0\). Conversely, if the axonal delay is smaller than the backpropagation delay, the raw cross-correlation function is shifted to the right (Fig. 8c, for the extreme case of \(d_{\rm A} = 0\), \(d_{\rm BP} = d\)). This decreases the value of the first integral in (21) and increases the second integral, such that \(\dot w > 0\) at \(w_0\). Therefore, a given network dynamics may cause systematic depression, systematic potentiation or no systematic change at all to the synaptic weights, depending on the partition of the synaptic delay into axonal and dendritic contributions. Systematic synaptic weight changes can in turn result in qualitatively different network behavior. For example, in Morrison et al. (2007) small systematic biases in the synaptic weight dynamics were applied to a network with an equilibrium characterized by a unimodal weight distribution and medium rate (< 10 Hz) asynchronous irregular activity dynamics. Here, a small systematic depression led to a lower weight, lower rate equilibrium also in the asynchronous irregular regime, whereas a systematic potentiation led to a sudden transition out of the asynchronous irregular regime: the activity was characterized by strongly patterned high-rate peaks of activity interspersed with silence, and the unimodal weight distribution splintered into several peaks.
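The effect of the delay partition on the spike-time difference seen by the synapse follows directly from the definitions \(t_j^s = t_j^f + d_{\rm A}\) and \(t_i^s = t_i^f + d_{\rm BP}\). The sketch below only restates that arithmetic; the function name and the numbers (in ms) are hypothetical.

```python
def delta_t_at_synapse(t_pre, t_post, d_axonal, d_backprop):
    """Spike-time difference at the synapse, Delta t^s = t_i^s - t_j^s."""
    t_pre_at_synapse = t_pre + d_axonal      # presynaptic spike travels down the axon
    t_post_at_synapse = t_post + d_backprop  # postsynaptic spike backpropagates
    return t_post_at_synapse - t_pre_at_synapse

# Somatic difference of +5 ms with a total delay d = 2 ms, split three ways:
print(delta_t_at_synapse(0.0, 5.0, 1.0, 1.0))  # d_A = d_BP: unchanged
print(delta_t_at_synapse(0.0, 5.0, 2.0, 0.0))  # purely axonal: shifted left by d
print(delta_t_at_synapse(0.0, 5.0, 0.0, 2.0))  # purely dendritic: shifted right by d
```

A purely axonal delay makes every pairing look later for the presynaptic spike, favoring depression, whereas a purely dendritic delay has the opposite effect, in line with the shifts of the cross-correlation described above.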

### 4.2 Beyond pair effects

There is considerable evidence that the pair-based rules discussed above cannot give a full account of STDP. Specifically, they reproduce neither the dependence of plasticity on the repetition frequency of pairs of spikes in an experimental protocol, nor the results of recent triplet and quadruplet experiments.

STDP experiments are usually carried out with about 60 pairs of spikes. The temporal distance of the spikes in the pair is of the order of a few to tens of milliseconds, whereas the temporal distance between the pairs is of the order of hundreds of milliseconds to seconds. In the case of a facilitation protocol (i.e. pre-before-post), standard pair-based STDP models predict that if the repetition frequency is increased, the strength of the depressing interaction (i.e. post-before-pre) becomes greater, leading to less net potentiation. This prediction is independent of whether the spike pairing scheme is all-to-all or nearest neighbor (see Sect. 4.1.2). However, experiments show that increasing the repetition frequency leads to an increase in potentiation (Sjostrom et al. 2001). Other recent experiments employed multiple-spike protocols, such as repeated presentations of symmetric triplets of the form pre-post-pre and post-pre-post (Bi and Wang 2002; Froemke and Dan 2002; Wang et al. 2005; Froemke et al. 2006). Standard pair-based models predict that the two sequences should give essentially the same results, as they each contain one pre-post pair and one post-pre pair. Experimentally, quite different results are observed.

Here we review two examples of simple models which account for these experimental findings. For other models which also reproduce frequency dependence or multiple-spike protocol results, see Abarbanel et al. (2002), Senn (2002) and Appleby and Elliott (2005).

#### 4.2.1 Triplet model

[…] \(j\) contributes to a trace \(x_j\) at the synapse:

[…] \(t_j^f\) denotes the firing times of the presynaptic neuron. Unlike pair-based rules, each spike from postsynaptic neuron \(i\) contributes to a fast trace \(y_i^1\) and a slow trace \(y_i^2\) at the synapse:

[…] \(\tau_1 < \tau_2\), see Fig. 9. LTD is induced as in the standard STDP pair model given in (13), i.e. the weight change is proportional to the value of the fast postsynaptic trace \(y_i^1\) evaluated at the moment of a presynaptic spike. The new feature of the rule is that LTP is induced by a triplet effect: the weight change is proportional to the value of the presynaptic trace \(x_j\) evaluated at the moment of a postsynaptic spike and also to the slow postsynaptic trace \(y_i^2\) remaining from previous postsynaptic spikes:

[…] \(t_i^{f-}\) indicates that the function \(y_i^2\) is to be evaluated before it is incremented due to the postsynaptic spike at \(t_i^f\). Analogously to pair-based models, the triplet rule can also be implemented with nearest-neighbor rather than all-to-all spike pairings by an appropriate choice of trace dynamics, see Sect. 4.1.2.
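The verbal description of the triplet rule can be turned into a small event-driven sketch. It makes several assumptions that the text does not fix: all traces decay exponentially and are incremented by 1 per spike (all-to-all pairing), the weight-dependent factors are replaced by constant amplitudes `a_plus` and `a_minus`, and the time constants (in ms) are illustrative placeholders rather than fitted values.

```python
import math

def triplet_stdp(pre_times, post_times, tau_x=17.0, tau_1=34.0, tau_2=114.0,
                 a_plus=0.01, a_minus=0.01):
    """Total weight change for given spike times under a minimal triplet rule.

    x  : presynaptic trace
    y1 : fast postsynaptic trace (drives LTD at presynaptic spikes)
    y2 : slow postsynaptic trace (gates LTP at postsynaptic spikes)
    """
    events = sorted([(t, "pre") for t in pre_times] +
                    [(t, "post") for t in post_times])
    x = y1 = y2 = 0.0
    dw, t_last = 0.0, None
    for t, kind in events:
        if t_last is not None:
            dt = t - t_last
            x *= math.exp(-dt / tau_x)
            y1 *= math.exp(-dt / tau_1)
            y2 *= math.exp(-dt / tau_2)
        t_last = t
        if kind == "pre":
            dw -= a_minus * y1      # LTD: fast post trace at a pre spike
            x += 1.0
        else:
            dw += a_plus * x * y2   # LTP: y2 is read BEFORE its increment
            y1 += 1.0
            y2 += 1.0
    return dw

print("post-pre-post:", triplet_stdp([10.0], [0.0, 20.0]))
print("pre-post-pre: ", triplet_stdp([0.0, 20.0], [10.0]))
```

In this sketch the two symmetric triplets no longer give the same result: only the post-pre-post sequence has a nonzero slow postsynaptic trace at the time of the potentiating pairing.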

#### 4.2.2 Suppression model

[…] *efficacy* of the spikes. Each spike of presynaptic neuron \(j\) sets the presynaptic spike efficacy \(\epsilon_j\) to 0, whereafter it recovers exponentially to 1 with a time constant \(\tau_j\). The efficacy of the \(n\)th presynaptic spike is given by:

[…] \(t_j^n\) denotes the \(n\)th spike of neuron \(j\). In other words, the efficacy of a spike is suppressed by the proximity of a previous spike. Similarly, the postsynaptic spike efficacy is reset to 0 by each spike of postsynaptic neuron \(i\), recovering exponentially to 1 with time constant \(\tau_i\). The model can be implemented with local variables as follows. Each presynaptic spike contributes to an efficacy trace \(\epsilon_j(t)\) with dynamics:

[…] \(\epsilon_j^-\) denotes the value of \(\epsilon_j\) just before the update. The standard presynaptic trace \(x_j\) given in (11) is adapted to take the spike efficacy into account:

[…] \(x_j\) by the value of the spike efficacy before the update. Similarly, each postsynaptic spike contributes to an efficacy trace \(\epsilon_i(t)\) with dynamics:

[…] \(y_i\) with increments weighted by the postsynaptic spike efficacy:

[…]
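The recovery of spike efficacy described above can be sketched as follows. The code assumes that the efficacy is reset to 0 by each spike and relaxes exponentially back to 1, so that the \(n\)th spike has efficacy \(1 - \exp(-(t^n - t^{n-1})/\tau)\); treating the first spike as fully effective is an additional assumption, and the function name is hypothetical.

```python
import math

def spike_efficacies(spike_times, tau):
    """Efficacy of each spike in a sorted spike train.

    The first spike is assumed to be fully effective (efficacy 1); each later
    spike is suppressed according to its distance from the preceding spike.
    """
    efficacies = []
    previous = None
    for t in spike_times:
        if previous is None:
            efficacies.append(1.0)
        else:
            efficacies.append(1.0 - math.exp(-(t - previous) / tau))
        previous = t
    return efficacies

# Spikes at 0, 10 and 15 ms with a recovery time constant of 10 ms
print(spike_efficacies([0.0, 10.0, 15.0], 10.0))
```

The closer a spike follows its predecessor, the smaller its efficacy, and hence the smaller the increment it contributes to the corresponding trace.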

This model gives a good fit to triplet and quadruplet protocols in visual cortex slice, and also gives a much better prediction for synaptic modification due to natural spike trains (Froemke and Dan 2002). However, it does not predict the increase of LTP with the repetition frequency observed by Sjostrom et al. (2001). A revised version of the model (Froemke et al. 2006) also accounts for the switch of LTD to LTP at high frequencies by modifying the efficacy functions.

### 4.3 Voltage dependence

Traditional LTP/LTD experiments employ the following induction paradigm: the postsynaptic neuron is held at a fixed depolarization while one or several presynaptic neurons are activated. Often a presynaptic pathway is stimulated extracellularly, so that several presynaptic neurons are activated. Depending on the level of the postsynaptic membrane potential, the activated synapses increase their efficacy while other non-activated synapses do not change their weight (Artola et al. 1990; Artola and Singer 1993). More recently, depolarization has also been combined with STDP experiments. In particular, Sjostrom et al. (2004) showed a dependence of synaptic weight changes on the synaptic membrane potential just before a postsynaptic spike.

There is an ongoing discussion whether the voltage dependence is more fundamental than the dependence on postsynaptic spiking. Indeed, voltage dependence alone can generate STDP-like behavior (Brader et al. 2007), as the membrane potential behaves in a characteristic way in the vicinity of a spike (high shortly before a spike, and low shortly after). Alternatively, a dependence on the slope of the postsynaptic membrane potential has also been shown to reproduce the characteristic STDP weight change curve (Saudargiene et al. 2003). The voltage effects caused by back-propagating spikes are implicitly contained in the mechanistic formulation of STDP models outlined above. In particular, the fast postsynaptic trace \(y_i^1\) in the above triplet model could be seen as an approximation of a back-propagating action potential. However, the converse is not true: a pure STDP rule does not automatically generate a voltage dependence. Moreover, synaptic effects caused by subthreshold depolarization in the absence of postsynaptic firing cannot be modeled by standard STDP or triplet models.

### 4.4 Induction versus maintenance

We stress that all the above models concern induction of potentiation and depression, but not their maintenance. The induction of LTP may take only a few seconds: for example, stimulation with 50 pairs of pre- and postsynaptic spikes given at 20 Hz takes 2.5 s. However, afterwards the synapse takes 60 min or more to consolidate these changes, and this process may also be interrupted (Frey and Morris 1997). During this time synapses are ‘tagged’, that is, they are ready for consolidation. Consolidation is thought to rely on a different molecular mechanism than that of induction. Simply speaking, gene transcription is necessary to trigger the building of new proteins that increase the synaptic efficacy.

#### 4.4.1 Functional consequences

Long-term stability of synapses is necessary to retain memories that have been learned, despite ongoing activity of presynaptic neurons. A simple possibility used in many models is that plasticity is simply switched off once the neuron has learned what it should. This approach makes sense in the context of reward-based learning: the learning rate goes to zero once the actual reward equals the expected reward and learning stops automatically (see Sect. 5.2). It also makes sense in the framework of supervised learning (see Sect. 5.1). Learning is normally driven by the difference between desired output and actual output. However, in the context of unsupervised learning it is inconsistent to switch off the dynamics. Nevertheless, receptive field properties should be retained for a fairly long time even if the stimulation characteristic changes.

#### 4.4.2 Bistability model

[…] \(w\) is calculated by one of the STDP or short-term plasticity models. Maintenance is implemented by adding on top of the STDP dynamics a slow bistable dynamics (Gerstner and Kistler 2002):

[…] \(\tau_{\rm a}\) is a time constant of consolidation in the range of several minutes of biological time. The result is that in the absence of any stimulation, individual synapses evolve towards binary values of 0 or 1 which are intrinsically stable fixed points of the slow dynamics. As a result, rather strong stimuli are necessary to perturb the synaptic dynamics.
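A slow bistable drift of the kind described here can be illustrated with a cubic flow. The specific form \(\tau_{\rm a}\,\dot w = -w(1 - w)(\theta - w)\), the threshold \(\theta = 0.5\) and the value \(\tau_{\rm a} = 300\) s are assumptions chosen only so that 0 and 1 are stable fixed points; they should not be read as the equation of the cited model.

```python
def consolidate(w0, tau_a=300.0, theta=0.5, dt=1.0, duration=20000.0):
    """Euler integration of tau_a * dw/dt = -w * (1 - w) * (theta - w).

    Fixed points are w = 0 and w = 1 (stable) and w = theta (unstable).
    Times are in seconds; all parameter values are illustrative.
    """
    w = w0
    for _ in range(int(duration / dt)):
        w += dt * (-w * (1.0 - w) * (theta - w)) / tau_a
    return w

for w0 in (0.4, 0.5, 0.6):
    print(w0, "->", round(consolidate(w0), 4))
```

Weights that start below the threshold decay towards 0 and weights above it grow towards 1, so in the absence of stimulation only a perturbation that carries the weight across the threshold changes the final state.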

#### 4.4.3 Biological evidence

Whether single synapses themselves are binary or continuous is a matter of intense debate. Some experiments have suggested that synapses are binary (Petersen et al. 1998; O’Connor et al. 2005). However, this would seem to result in a bistable distribution of weights which is at odds with the unimodal distribution reported by other studies (Turrigiano et al. 1998; Sjostrom et al. 2001; Song et al. 2005), and with the finding that the magnitude of LTP/LTD increases with the number of spike pairs in a protocol until saturation is reached (Froemke et al. 2006).

Some possibilities to reconcile these findings include: (i) since pairs of neurons form several contacts with each other, it is likely that in standard plasticity experiments several synapses are measured at the same time; (ii) LTP and STDP results are typically reported as pooled experiments over several pairs of neurons. Under the assumption that the upper bound is not the same for all synapses, a broad distribution could result; (iii) both unimodal distribution and bimodal distributions could be stable. Untrained neurons would show a unimodal distribution whereas neurons that have learned to respond to a specific pattern would develop a bimodal distribution of synaptic weights (Toyoizumi et al. 2007); (iv) all synapses are binary, but the efficacy of the ‘strong’ state is subject to short-term plasticity and homeostasis; (v) some synapses are binary and some are not. Potentially a combination of several of these possibilities must be considered in order to explain the experimental findings.

## 5 Supervised and reinforcement learning

All the models considered in Sect. 4 are unsupervised ‘Hebbian’ rules: changes are triggered as a result of combined action of pre- and postsynaptic neurons. The postsynaptic neuron itself is driven by its input arising from presynaptic neurons. There is no notion of whether or not the postsynaptic output is ‘good’ or ‘useful’. If, however, the local variables are combined with global teacher or reinforcement signals, completely different learning paradigms are possible.

### 5.1 Supervised learning

Supervised plasticity has been demonstrated experimentally by Fregnac and Schulz (2006): the behavior of a (cortical) neuron can be changed by pairing some class of stimuli with an (artificial) increase of neural activity while pairing another class of stimuli with a decrease of responsiveness. Theoretical studies have demonstrated that a teacher-forced STDP approach can be used to learn precise spike times (Legenstein et al. 2005; Pfister et al. 2006). In a natural situation, this would mean that a few strong neural inputs can drive the neuron and therefore drive learning of other inputs. If these strong inputs are controlled in a task-specific way, they act as a teacher for the postsynaptic neuron. For a practical realization of this idea see Brader et al. (2007).

### 5.2 Reinforcement learning

If neuronal activity leads to actions, feedback may arise from the environment in the form of reward (a piece of pizza) or punishment (burnt fingers). It is thought that success of an action is signaled by neuromodulators, with dopamine as a top candidate (Schultz et al. 1997). Dopamine signals are closely related to a quantity in reinforcement learning known as \(\delta\), which can be interpreted as the difference between the received reward and the expected reward. Here ‘reward’ means current or future rewards that can be reliably predicted. In reinforcement learning, the difference between actual and expected rewards plays an important role for the update of weights in *Q*-learning, SARSA, and related variants of temporal difference learning (Sutton and Barto 1998).
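The role of \(\delta\) can be made concrete with the textbook temporal difference error. The sketch below is a generic TD(0) value update, not a model of dopamine signaling; the state names, learning rate and discount factor are hypothetical.

```python
def td_error(reward, value_state, value_next_state, discount=0.9):
    """delta = received reward plus discounted predicted future value,
    minus the value that was expected for the current state."""
    return reward + discount * value_next_state - value_state

def td0_update(values, state, next_state, reward,
               learning_rate=0.1, discount=0.9):
    """Move the value estimate of `state` along delta; return delta."""
    delta = td_error(reward, values[state], values[next_state], discount)
    values[state] += learning_rate * delta
    return delta

# A cue that is always followed by a reward of 1 and then a terminal state
values = {"cue": 0.0, "end": 0.0}
for trial in range(200):
    delta = td0_update(values, "cue", "end", reward=1.0)
print(values["cue"], delta)
```

As the reward becomes fully predicted, \(\delta\) shrinks towards zero and the value estimate stops changing, which is the sense in which learning switches itself off once the actual reward equals the expected reward.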

Under a suitable interpretation of the role of pre- and postsynaptic neurons, the weight update rules can be derived from an optimality framework (Pfister et al. 2006). The learning rule can be interpreted as a Hebbian learning based on joint pre- and postsynaptic activity, but conditioned on the presence of a global reward signal. Variants of such reinforcement rules for spiking neurons have been developed (Seung 2003; Pfister et al. 2006; Izhikevich 2007; Florian 2007).
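A Hebbian term gated by a global reward signal can be sketched as a product of three factors. This is a generic illustration under simple assumptions, not the rule of any of the cited studies: `eligibility` stands for a locally computed Hebbian quantity (for example an accumulated STDP update), and all names and values are hypothetical.

```python
def reward_modulated_update(w, eligibility, reward, expected_reward,
                            learning_rate=0.01):
    """Three-factor update: Hebbian eligibility times a global reward signal.

    The global factor is the difference between received and expected reward,
    so a fully predicted reward leaves the weight unchanged.
    """
    delta = reward - expected_reward
    return w + learning_rate * delta * eligibility

w = 0.5
print(reward_modulated_update(w, eligibility=1.0, reward=1.0, expected_reward=0.2))
print(reward_modulated_update(w, eligibility=1.0, reward=1.0, expected_reward=1.0))
print(reward_modulated_update(w, eligibility=1.0, reward=0.0, expected_reward=0.6))
```

The same pre- and postsynaptic activity therefore leads to potentiation, no change or depression depending only on the sign of the global signal.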

## 6 Discussion

Pair-based STDP models can be decomposed into three aspects: weight dependence, spike-pairing scheme and delay partition (Sect. 4.1). We have shown that all of these aspects can have significant consequences for the behavior of the model system under investigation. However, in many cases there is not enough experimental data to settle these questions definitively. Therefore, choices for each aspect should be made consciously and take into consideration the relevant available experimental findings. Moreover, these choices should be explicitly documented and critically addressed: it should be clear to what extent results depend on the specific choices.

In particular, the choice of STDP weight dependence is critical. The available evidence suggests that both potentiation and depression are dependent on the weight. Whereas it is useful to start with very simplified models to gain insight, we now know that STDP models which assume some weight dependence produce qualitatively different behavior from the additive model. Moreover, weight dependent rules are no harder to implement computationally than additive rules. In the absence of fresh experimental evidence supporting an additive rule, weight dependent rules should therefore be considered as the standard.

Pair-based models of STDP have their limitations. They give incorrect predictions for many experiments such as triplet and quadruplet protocols and cannot account for synaptic modification due to natural spike trains or pairing protocols at different frequencies. Models of STDP that are beyond the pair-based framework (Sect. 4.2) can account for these findings at the cost of only a small number of additional variables, and so should attract increasing theoretical interest.

In this manuscript, we have considered models in which synaptic modifications depend only on spike timing. However, this ignores many aspects of synaptic plasticity which may prove to be of great importance to the functioning of the brain, and will therefore have to be taken into consideration in future phenomenological modeling. Most STDP models assume that the absolute synaptic strength is modified (but see Senn 2002). However, it may turn out that a formulation in terms of the release probability is a more accurate description, thus allowing a unified view of short-term and long-term plasticity. Additionally, STDP has been shown to be sensitive to a number of factors beyond spike timing, for example active dendritic properties and the location of the synapse on the dendrite; see Kampa et al. (2007) for a review. There is also substantial evidence that inhibition is an important physiological feature fine-tuning induction and maintenance of LTP/LTD. Inhibition gates induction of LTP/LTD as a function of physiological conditions and physiologically-induced changes in the activity of networks (Larson and Lynch 1986; Pacelli et al. 1989; Radpour and Thomson 1991; Steele and Mauk 1999; Nishiyama et al. 2000; Togashi et al. 2003). Here, the main challenge is to derive appropriate phenomenological models from experiments and detailed biophysical models. Finally, although some progress has been made in investigating the interactions of STDP with other plasticity mechanisms such as homeostasis and heterosynaptic spread of LTP/LTD (van Rossum et al. 2000; Toyoizumi et al. 2005, 2007; Triesch 2007), this complex topic remains largely unexplored. In this area, the main challenge is to perform analytical and simulation studies which can identify and characterize their composite effects, and investigate their functional consequences.

## Acknowledgments

The idea for this paper grew out of a FACETS workshop on synaptic plasticity held in Lausanne in June 2006. We would therefore like to thank all the participants, especially A. Davison, A. Destexhe, Y. Fregnac, C. Lamy, R. Legenstein, W. Maass, J.-P. Pfister, and A. Thomson for their contributions. We also thank M. Helias for helpful discussions about short-term plasticity and the implementation in NEST, and G. Hennequin for proofreading the manuscript. We are very grateful to G-q. Bi and M-m. Poo for providing us with their original data. This work was partially funded by EU Grant 15879 (FACETS), DIP F1.2 and BMBF Grant 01GQ0420 to the Bernstein Center for Computational Neuroscience Freiburg.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.