The Impact of Sparse Coding on Memory Lifetimes in Simple and Complex Models of Synaptic Plasticity

Models of associative memory with discrete state synapses learn new memories by forgetting old ones. In the simplest models, memories are forgotten exponentially quickly. Sparse population coding ameliorates this problem, as do complex models of synaptic plasticity that posit internal synaptic states, giving rise to synaptic metaplasticity. We examine memory lifetimes in both simple and complex models of synaptic plasticity with sparse coding. We consider our own integrative, filter-based model of synaptic plasticity, and examine the cascade and serial synapse models for comparison. We explore memory lifetimes at both the single-neuron and the population level, allowing for spontaneous activity. Memory lifetimes are defined using either a signal-to-noise ratio (SNR) approach or a first passage time (FPT) method, although we use the latter only for simple models at the single-neuron level. All studied models exhibit a decrease in the optimal single-neuron SNR memory lifetime, optimised with respect to sparseness, as the probability of synaptic updates decreases or, equivalently, as synaptic complexity increases. This holds regardless of spontaneous activity levels. In contrast, at the population level, even a low but nonzero level of spontaneous activity is critical in facilitating an increase in optimal SNR memory lifetimes with increasing synaptic complexity, but only in filter and serial models. However, SNR memory lifetimes are valid only in an asymptotic regime in which a mean field approximation is valid. By considering FPT memory lifetimes, we find that this asymptotic regime is not satisfied for very sparse coding, violating the conditions for the optimisation of single-perceptron SNR memory lifetimes with respect to sparseness. Similar violations are also expected for complex models of synaptic plasticity.


Introduction
One line of experimental evidence suggests that synapses may occupy only a very limited number of discrete states of synaptic strength (Petersen et al. 1998; Madison 2002, 2004; O'Connor et al. 2005a, b; Bartol et al. 2015), or may change their strengths via discrete, jump-like processes (Yasuda et al. 2003; Bagal et al. 2005; Sobczyk and Svoboda 2007). Discrete state synapses overcome the catastrophic forgetting of the Hopfield model (Hopfield 1982) in associative memory tasks, turning memory systems into so-called palimpsests, which learn new memories by forgetting old ones (Nadal et al. 1986; Parisi 1986). However, memory lifetimes in the simplest such models are rather limited, growing only logarithmically with the number of synapses (Tsodyks 1990; Amit and Fusi 1994; see also Leibold and Kempter 2006; Barrett and van Rossum 2008; Huang and Amit 2010). Memory lifetimes may be extended by considering either sparse coding at the population level (Tsodyks and Feigel'man 1988) or complex models of synaptic plasticity in which synapses can express metaplasticity (changes in internal states) without necessarily expressing plasticity (changes in strength) (Fusi et al. 2005; Leibold and Kempter 2008; Elliott and Lagogiannis 2012; Lahiri and Ganguli 2013). Two previous studies have examined complex models of synaptic plasticity operating in concert with sparse coding (Leibold and Kempter 2008; Rubin and Fusi 2007). For a discussion of the possible roles of the persistence and transience of memories and the synaptic mechanisms underlying synaptic stability, see, for example, Richards and Frankland (2017) and Rao-Ruiz et al. (2021).
We have proposed integrate-and-express models of synaptic plasticity in which synapses act as low-pass filters in order to control fluctuations in developmental patterns of synaptic connectivity (Elliott 2008;Elliott and Lagogiannis 2009). We have also applied these complex models of synaptic plasticity to memory formation, retention and longevity with discrete synapses (Elliott and Lagogiannis 2012), finding that they outperform cascade models (Fusi et al. 2005) in most biologically relevant regions of parameter space (Elliott 2016b). In this paper, we consider the role of sparse coding in the memory dynamics of a filter-based model. For comparison, we also consider the cascade model (Fusi et al. 2005), the serial synapse model (Leibold and Kempter 2008;Rubin and Fusi 2007) and a model of simple synapses (Tsodyks 1990) using our protocols.
Our paper is organised as follows. In Sect. 2, we present our general approach by describing the two memory storage protocols that we study, considering two different definitions of memory lifetimes, and obtaining general, model-independent results. Then, in Sect. 3, we consider both simple and complex models of synaptic plasticity, obtaining the analytical results required to study memory lifetimes in detail. We compare and contrast results for memory lifetimes in simple and complex models in Sect. 4. Finally, in Sect. 5, we briefly discuss our results.

General approach and formulation
We provide a convenient list of the most commonly used mathematical symbols and their meanings, excluding those that appear in the appendices, in Table 1.

Memories and memory lifetimes
We consider a population of P neurons forming a memory system, perhaps performing association or auto-association tasks. Let each neuron receive N synaptic connections from N other neurons that are randomly selected from the entire population. Fully recurrent connectivity would imply that N = P − 1 (excluding self-connections) but in general N ≪ P. Other than the requirement that N < P, N may be regarded as mathematically independent of P. Memories are stored sequentially, one after the other, by this memory system. We take them to be stored at times t ≥ 0 s governed by a Poisson process of rate r Hz. This continuous time approach is more realistic than a discrete time approach in which memories are stored at uniformly spaced time steps. Due to ongoing synaptic plasticity driven by the storage of later memories, the synaptic patterns that embody earlier memories may be degraded, so that the fidelity of recall of earlier memories may fall over time, ultimately falling to an equilibrium or background level of complete amnesia. It is typical in these scenarios to track the fidelity of recall of the first memory as subsequent memories are stored. This first memory is taken to be stored at time t = 0 s on the background equilibrium probability distribution of synaptic strengths.
In previous work, we have focused on a single neuron, or perceptron, in such a system and have examined its recall of stored memories. Here, in a sparse population coding context, we must consider the collective dynamics of the entire population of neurons, but these collective dynamics are nevertheless driven by synaptic processes occurring at the level of single perceptrons in the system. Considering, then, a single perceptron in this population, let its N synapses have strengths S i (t) ∈ {−1, +1}, i = 1, . . . , N , at time t ≥ 0 s. These two strength states should be thought of as low and high rather than inhibitory and excitatory. As memories are presented to the system for storage, the perceptron is exposed to synaptic inputs characterised by the N -dimensional vectors ξ α , α = 0, 1, 2, . . ., where α indexes the memories. The component ξ α i represents the input through synapse i during the presentation of memory α, and for simplicity we assume that these components are independent between synapses and across memories.
In response to each of these memory vectors, the perceptron must generate the correct activation or output. With inputs x_i through its N synapses, the perceptron's activation is defined as usual by

h(t) = (1/N) Σ_{i=1..N} S_i(t) x_i.

The perceptron's output is some possibly nonlinear function of its activation, where this output can correspond to spontaneous activity under conditions of no (or spontaneous) input. We track the fidelity of recall of the first memory ξ0 by examining the perceptron's activation upon re-presentation (but not re-storage) of this memory at later times t > 0 s. We refer to h(t) ≡ h_{ξ0}(t) as the tracked memory signal or just the memory signal. The dynamics of h(t) will determine the lifetime of memory ξ0, at least as far as this single perceptron's capacity to generate the correct output upon re-presentation of ξ0 is concerned. Of course, we are not interested in the lifetime of any particular tracked memory ξ0 stored on any particular pattern of synaptic connectivity and subject to any particular sequence of subsequent, non-tracked memories ξα, α > 0, stored at any particular set of Poisson-distributed times 0 < t1 < t2 < t3 < · · ·. Rather, we are interested only in the lifetime of a typical tracked memory subject to a typical sequence of later memories. Thus, we consider only the statistical properties of h(t) when suitably averaged over all memories.
Table 1 (excerpt): commonly used symbols and their meanings.

T_n: operator describing simultaneous changes in n synapses' states at each (average) non-tracked memory storage step.
A_n: normalised unit eigenstate of T_n, giving the joint equilibrium probability distribution of n synapses' states.
w = (−1ᵀ | +1ᵀ)ᵀ: the vector by which to weight a synapse's internal states by their strengths (1 being the s-dimensional all-ones vector).
P_eff: number of neurons in the population of P neurons that experience evoked activity during tracked memory storage.
μ_p(t), σ_p(t)², SNR_p(t): mean, variance and signal-to-noise ratio of the population memory signal.
τ_pop: SNR memory lifetime of a typical memory for the population of neurons.
p, ψ: probability p of synaptic updates for a simple stochastic updater (SU) synapse; ψ = f p is a convenient shorthand.
κ²: κ² = ψ/(2 − ψ), the equilibrium correlation between pairs of SU synapses' strengths in the Hebb protocol.

Memory lifetimes may be defined in a variety of ways using these statistical properties. The simplest definition is to consider the mean and variance of h(t),

μ(t) = E[h(t)] and σ(t)² = E[h(t)²] − μ(t)²,   (2)

define the signal-to-noise ratio (SNR) as

SNR(t) = |μ(t) − μ(∞)| / σ(t),

and then define the memory lifetime as that value of t, call it τ_snr, that is the (largest) finite, non-negative solution of the equation SNR(τ_snr) = 1 when this solution exists; otherwise, we set τ_snr = 0 s (Tsodyks 1990). This is the last time at which μ(t) is distinguishable from its equilibrium value μ(∞) at the level of one standard deviation. Although an "ideal observer" approach to defining memory lifetimes has also been considered (Fusi et al. 2005; Lahiri and Ganguli 2013), it is essentially equivalent to the SNR approach (Elliott 2016b). The activation h(t) provides a direct read-out of the perceptron's response to the re-presentation of ξ0 at later times, and would correspond to a neuron's membrane potential in a more realistic, integrate-and-fire model.
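As a minimal numerical illustration of this definition, the largest solution of SNR(τ_snr) = 1 can be located by bisection. The sketch below uses a toy exponentially decaying mean over constant equilibrium noise; these functional forms and all parameter values are illustrative assumptions, not derived from any model in the text.

```python
import math

def snr_lifetime(snr, t_max=1e6, tol=1e-9):
    """Largest non-negative t with snr(t) = 1, assuming snr(t) decays
    monotonically through 1; returns 0.0 if snr(0) <= 1 (no solution)."""
    if snr(0.0) <= 1.0:
        return 0.0
    lo, hi = 0.0, t_max
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if snr(mid) > 1.0:   # still above threshold: root lies later
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy signal: exponentially decaying mean over constant equilibrium noise.
mu0, tau0, sigma = 10.0, 100.0, 1.0           # illustrative values only
tau_snr = snr_lifetime(lambda t: (mu0 / sigma) * math.exp(-t / tau0))
# By construction the root here is tau0 * ln(mu0 / sigma).
```

For this toy SNR the bisection recovers the closed-form lifetime tau0 · ln(mu0/sigma) to high precision, and an SNR that never exceeds unity correctly yields a zero lifetime.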
By focusing on this read-out of the perceptron's state, we are naturally led to consider the first passage time (FPT) for the perceptron's activation to fall below firing threshold, and thus to consider the mean first passage time (MFPT) for this process, which is the mean taken over all tracked and non-tracked memories (Elliott 2014). We may then define an alternative memory lifetime, call it τ_mfpt, as this MFPT for the perceptron's activation in response to re-presentation of a typical tracked memory to fall below firing threshold. We have extensively discussed and contrasted the SNR and FPT approaches to defining memory lifetimes elsewhere (Elliott 2014, 2016a, 2017a). In essence, SNR memory lifetimes are only valid in asymptotic, typically large N regimes, while FPT memory lifetimes are valid in all regimes. SNR lifetimes must therefore be interpreted with caution. To compute FPT lifetimes, we require Prob[h_{α+1} | h_α], the transition probability describing the probability that the perceptron's activation (in response to re-presentation of the tracked memory) is h_{α+1} immediately after the storage of average non-tracked memory ξ_{α+1}, given that its activation is h_α immediately before the memory's storage. This transition probability is most easily computed in simple models of synaptic plasticity, for which it is independent of the memory storage step (Elliott 2014, 2017a, 2019). This independence arises because simple synapses with only two strength states are "stateless" (Elliott 2020), having no internal states and not enough strength states to carry information between consecutive memory storage steps. In this case, all the probabilities Prob[h_{α+1} | h_α] over all the possible, discrete values of h_α and h_{α+1} define the elements of a transition matrix in the perceptron's activation between memory storage steps that is independent of the non-tracked memory storage step α + 1 (α ≥ 0).
We can then drop the index α and consider general elements Prob[h′ | h] between any two possible values of the perceptron activation, h and h′. We will therefore examine FPT lifetimes only for simple synapses, and SNR memory lifetimes for both simple and complex synapses, but with the understanding that SNR results must be interpreted cautiously. With the transition probabilities Prob[h′ | h] being independent of the memory storage step, and with the storage of the definite tracked memory ξ0 inducing the definite activation h0 immediately after its storage, the FPT lifetime of the memory ξ0 is the solution of an equation, Eq. (4), in which transitions to activations below the firing threshold ϑ are disallowed (Elliott 2014; see van Kampen 1992, for a general discussion). Then, standard methods (Elliott 2014; see van Kampen 1992, in general) give the MFPT as the solution of the equation

r τ_mfpt(h) = 1 + Σ_{h′ > ϑ} Prob[h′ | h] r τ_mfpt(h′),   (6)

subject to the boundary condition r τ_mfpt(ϑ) = 0. Equations similar to Eqs. (4) and (6) give the higher-order FPT moments (Elliott 2019). Given τ_mfpt(h0), we obtain τ_mfpt by averaging over the distribution of h0 (for values of h0 > ϑ), corresponding to averaging over the tracked memory ξ0, i.e. τ_mfpt = ⟨τ_mfpt(h0)⟩_{h0 > ϑ}.
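Solving for the MFPTs amounts to solving a small linear system over the activation values above threshold. The sketch below does this for a toy birth-death chain on discrete activations with an absorbing threshold; the chain and all its probabilities are assumptions chosen purely to illustrate the structure of the MFPT recursion, not any particular plasticity model.

```python
def mfpt(P, r, theta_idx):
    """Mean first passage times tau[h] for a discrete-activation chain with
    column transition matrix P[h_next][h_prev] and memory arrivals at rate r:
    solves r*tau[h] = 1 + sum_{h' > theta} P[h'][h] * r*tau[h'], with
    r*tau = 0 at and below the threshold index."""
    n = len(P)
    live = list(range(theta_idx + 1, n))      # states above threshold
    m = len(live)
    # assemble (I - Q^T) x = 1, with x[h] = r * tau[h]
    A = [[(1.0 if i == j else 0.0) - P[live[j]][live[i]] for j in range(m)]
         for i in range(m)]
    b = [1.0] * m
    for c in range(m):                        # naive Gaussian elimination
        piv = max(range(c, m), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for k in range(c + 1, m):
            fac = A[k][c] / A[c][c]
            for j in range(c, m):
                A[k][j] -= fac * A[c][j]
            b[k] -= fac * b[c]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, m))) / A[i][i]
    tau = [0.0] * n
    for idx, h in enumerate(live):
        tau[h] = x[idx] / r
    return tau

# Toy chain on activations h = 0..4 with threshold index 0: at each memory
# storage the activation steps down with probability 0.5, up with 0.3, and
# stays put with 0.2 (reflecting at the top); illustrative numbers only.
n, r, theta = 5, 1.0, 0
P = [[0.0] * n for _ in range(n)]
for h in range(n):
    P[min(h + 1, n - 1)][h] += 0.3
    P[max(h - 1, 0)][h] += 0.5
    P[h][h] += 0.2
tau = mfpt(P, r, theta)
```

A convenient internal check is that the solved lifetimes satisfy the defining recursion exactly, and that starting further above threshold yields a longer mean lifetime.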

Hebb protocol
We adopt and adapt the memory storage protocol employed by Leibold and Kempter (2008). Their memory system performs an association task. Within the population of P neurons, a sub-population of "cue" neurons is required to activate a sub-population of "target" neurons. Synapses from cue to target neurons experience potentiating induction signals during memory storage, while those from target to cue experience depressing induction signals; all other synapses do not experience plasticity induction signals. Although Leibold and Kempter (2008) do allow for the possibility of some overlap between cue and target sub-populations, this will not be relevant here. The storage of different memories involves different cue and target sub-populations, so that the entire population of P neurons will be involved in storing many memories over time. If cue and target sub-populations are of equal size, as we assume, then potentiation and depression processes are equally balanced on average. This assumption stands in lieu of realistic neuron models, in which we expect (Elliott 2016a) synaptic plasticity to be dynamically regulated to move to stable dynamical fixed points in which such balancing is achieved automatically (Bienenstock et al. 1982;Burkitt et al. 2004;Appleby and Elliott 2006). While Leibold and Kempter (2008) consider activities ξ α i ∈ {0, 1}, corresponding to inactive (ξ α i = 0) and active (ξ α i = 1) input neurons, we will consider the more general case of ξ α i ∈ {ζ, 1}, with 0 ≤ ζ < 1, where ξ α i = ζ represents a spontaneous, non-evoked, or background level of activity for an input neuron that is in neither cue nor target sub-populations, while ξ α i = 1 represents evoked activity from a cue or target input. We often refer below to active and inactive inputs or neurons, with the understanding that we mean evoked activity and spontaneous activity, respectively. 
Because synaptic plasticity occurs only between cue and target neurons, synapses whose pre- or postsynaptic neuron is only spontaneously active do not undergo synaptic plasticity. This accords with our expectations from known physiology: protocols for long-term potentiation (LTP; Bliss and Lømo 1973) and long-term depression (LTD; Lynch et al. 1977) require sustained bouts of evoked electrical activity rather than just spontaneous levels of activity. On a broadly BCM view of synaptic plasticity (Bienenstock et al. 1982), we would expect two thresholds for synaptic plasticity: as activity levels ramp up from spontaneous to weak to strong tetanisation, plasticity switches from none to LTD to LTP. Since synaptic plasticity can occur only between pairs of active, synaptically coupled neurons in this scenario, we refer to it as the Hebb protocol: Hebbian synaptic plasticity is typically understood to mean activity-dependent, bidirectional synaptic plasticity between active pre- and postsynaptic neurons. Although spontaneous activity has by assumption no impact on synaptic plasticity here, it nevertheless has a direct impact on h(t).
For a particular perceptron, let the probability that it is active during the storage of any particular memory be g. Since the perceptron could be part of either cue or target sub-populations, the probability that it is a cue (or, equally, a target) during the storage of a memory is just g/2. The probability that any one of its synaptic inputs is active during memory storage is also just g. However, for the purposes of clarity it is convenient to distinguish between these two probabilities, so we denote the probability that an input is active as f (≡ g). In this way, the appearance of a factor of g indicates a global, postsynaptic factor due to the perceptron, or postsynaptic cell, being in the cue or target population, while a factor of f indicates a local, presynaptic factor due to an input being in the cue or target population. The probability g, or f, controls the sparseness of the memory representation in this memory system. Considering just a single perceptron, if it is neither cue nor target, then none of its synapses can experience plasticity induction signals. If it is a cue, then only those inputs that correspond to target cells (if any) experience plasticity induction signals, and specifically depressing signals. If it is a target, then similarly only cue inputs experience induction signals, and so only potentiating signals. Without loss of generality, we may therefore just assume that during memory storage, an active perceptron's active inputs are either all cue or all target neurons. This simplifying assumption effectively doubles on average the rate of plasticity induction signals experienced by synapses compared to the scenario in which the perceptron's active inputs could represent a combination of cue and target neurons. We could therefore just scale f accordingly.
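These role and activity probabilities can be checked by direct sampling. The following sketch, with illustrative values of f and g, estimates the overall per-synapse probabilities of potentiating and depressing induction signals, both of which should come out near f g/2:

```python
import random

def induction_probs(f, g, trials, seed=0):
    """Monte Carlo estimate of per-synapse induction-signal probabilities:
    the perceptron is a cue or a target with probability g/2 each, an input
    is active with probability f, and (by the simplifying assumption in the
    text) an active input induces depression on a cue and potentiation on a
    target."""
    rng = random.Random(seed)
    pot = dep = 0
    for _ in range(trials):
        u = rng.random()
        role = "cue" if u < g / 2 else ("target" if u < g else "none")
        if rng.random() < f:                  # input shows evoked activity
            if role == "target":
                pot += 1
            elif role == "cue":
                dep += 1
    return pot / trials, dep / trials

p_pot, p_dep = induction_probs(f=0.2, g=0.2, trials=200_000)
# Both estimates should land near f*g/2 = 0.02.
```

With 200,000 trials the sampling error is of order 3e-4, comfortably resolving the predicted value.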
We summarise the Hebb protocol in Fig. 1, which schematically illustrates a sample of the population of pairs of pre-and postsynaptic neurons, showing all possible combinations of presynaptic activities and postsynaptic roles with their respective probabilities, together with the direction of synaptic plasticity induced by them.
To assess memory lifetimes under this protocol, we may track the ability of the cue sub-population to successfully evoke activity in the target sub-population. Considering a single perceptron in the target sub-population, we may obtain general expressions for μ(t) and σ(t)² in Eq. (2), where these expressions are independent of any particular model of synaptic plasticity. Because the components ξ0_i are independent and identically distributed across synapses, their expectation values lead to

μ(t) = f E[S1(t) | +] + (1 − f) ζ E[S1(t) | ×],   (7a)

σ(t)² = (1/N)[ f + (1 − f)ζ² ] + (1 − 1/N){ f² E[S1(t)S2(t) | ++] + 2 f (1 − f) ζ E[S1(t)S2(t) | +×] + (1 − f)² ζ² E[S1(t)S2(t) | ××] } − μ(t)²,   (7b)

where we could pick any synapse i in Eq. (7a) and any distinct pair of synapses i and j in Eq. (7b) but we restrict without loss of generality to i = 1 and j = 2. In these equations, we condition on whether a synapse has experienced a potentiating induction signal ("+") with probability f or not ("×") with probability 1 − f, during the storage of ξ0.

Fig. 1 (caption): Schematic illustration of the Hebb protocol for memory storage. Six pairs of synaptically coupled neurons are shown. Each cell body is represented by a triangle, with the value (ζ or 1) inside the triangle indicating the neuron's activity during memory storage. A neuron's axon is denoted by a directed line, while two of its dendrites are denoted by the dashed lines. Synaptic coupling is indicated by a small black blob where an axon terminates on a dendrite, with the symbol to the right of the blob indicating the direction of induced synaptic plasticity during memory storage ("↑" indicates potentiation, "↓" depression, and "×" no change). The labels "C", "N" or "T" attached to a postsynaptic cell body indicate that the neuron is a cue cell, neither a cue cell nor a target cell, or a target cell, respectively, in the population. Probabilities of presynaptic activity ( f or 1 − f ) are indicated, as are the joint probabilities of postsynaptic activity and specific role (g/2 or 1 − g). The fact that an active presynaptic neuron synapsing on a cue or target cell always experiences the induction of depression or potentiation, respectively, reflects the simplifying assumption discussed in the main text.
For the models of synaptic plasticity that we consider below, the (marginal) equilibrium probability distribution of any single synapse's strength is uniform, or Prob[S i (∞) = ±1] = 1 2 , so that if a synapse does not experience a plasticity induction signal during the storage of ξ 0 , then E [S i (t) | × ] ≡ 0 at t = 0 s and this remains true for all times t ≥ 0 s when potentiation and depression processes are treated symmetrically, as indicated in Eq. (7a). However, for the pairwise correlations in Eq. (7b) that condition on one or both synapses not having experienced an induction signal, the expectation values do not vanish under the Hebb protocol. This is because of the higher-order equilibrium correlational structure induced by the fact that it is impossible for some of the synapses of an active neuron to experience potentiating induction signals while others experience depressing induction signals during the storage of the same memory under the Hebb protocol.
We may obtain general expressions for the expectation values in Eq. (7) by writing down the transition processes that govern changes in a single synapse's strength or simultaneous changes in a pair of synapses' strengths. Let each synapse have s possible internal states for each of its two possible strengths ±1, so that the possible state of a synapse is described by a 2s-dimensional vector, with the internal states for strength −1 (respectively, +1) corresponding to the first (respectively, last) s components. Given the stochastic nature of the plasticity induction signals, this vector defines a joint probability distribution for a synapse's combined strength and internal state. Let the transition matrix M+ implement the definite change in a synapse's state in response to a potentiating induction signal, and M− that for a depressing induction signal. We then determine the transition matrix governing the change in a single synapse's state in response to the storage of a typical non-tracked memory by conditioning on all possible combinations of presynaptic activity and postsynaptic role. Defining K± = (1 − f)I + f M±, where I is the identity matrix, this transition matrix is

T1 = (1 − g)I + (g/2)K+ + (g/2)K− = (1 − g)I + gK,   (8a)

where K = (K+ + K−)/2 = (1 − f)I + f M, with M = (M+ + M−)/2. The three terms in Eq. (8a) arise from conditioning on the three possible perceptron roles in memory storage (determined by the global factor g), while the two terms in each of K± arise from conditioning on the two possible levels of presynaptic activity (determined by the local factor f). Similarly, the transition operator that governs simultaneous changes in pairs of synapses' states during typical non-tracked memory storage is

T2 = (1 − g)I ⊗ I + (g/2)(K+ ⊗ K+ + K− ⊗ K−),   (8b)

with the generalisation to T_n for any number of synapses n being clear. The (marginal) equilibrium probability distribution of a single synapse's state, denoted by A1, is the (normalised) eigenvector of T1 with unit eigenvalue, which is also just the unit eigenvector of M. That for any pair of synapses, A2, corresponds to the unit eigenvector of T2.
However, because T2 is not of the form (1 − g)I ⊗ I + gK ⊗ K, with K = (K+ + K−)/2, we do not have A2 = A1 ⊗ A1. Rather, A2 must be explicitly computed as the unit eigenstate of (K+ ⊗ K+ + K− ⊗ K−)/2. It is this failure of factorisation that induces the non-trivial pairwise correlational structure in the equilibrium state.
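These operators are straightforward to construct explicitly once M± are specified. The sketch below uses an illustrative simple two-state synapse with update probability p (the stochastic updater, SU, listed in Table 1) and verifies numerically that T1 − I = f g(M − I), that A2 fails to factorise as A1 ⊗ A1, and that the equilibrium pair correlation matches κ² = ψ/(2 − ψ) with ψ = f p, as quoted in Table 1; the parameter values are assumptions for illustration only.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def kron(A, B):
    m = len(B)
    d = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(d)]
            for i in range(d)]

def unit_eigvec(T, iters=4000):
    """Power iteration for the eigenvalue-1 eigenvector of a
    column-stochastic matrix, normalised to sum to one."""
    v = [1.0 / len(T)] * len(T)
    for _ in range(iters):
        v = matvec(T, v)
        s = sum(v)
        v = [x / s for x in v]
    return v

# Illustrative two-state "stochastic updater" synapse: strengths (-1, +1),
# update probability p; columns index the source state.
p, f, g = 0.5, 0.2, 0.2
Mp = [[1 - p, 0.0], [p, 1.0]]        # potentiating induction signal
Mm = [[1.0, p], [0.0, 1 - p]]        # depressing induction signal
I2 = [[1.0, 0.0], [0.0, 1.0]]
M = [[0.5 * (Mp[i][j] + Mm[i][j]) for j in range(2)] for i in range(2)]
Kp = [[(1 - f) * I2[i][j] + f * Mp[i][j] for j in range(2)] for i in range(2)]
Km = [[(1 - f) * I2[i][j] + f * Mm[i][j] for j in range(2)] for i in range(2)]
I4 = kron(I2, I2)
KKp, KKm = kron(Kp, Kp), kron(Km, Km)
T1 = [[(1 - g) * I2[i][j] + 0.5 * g * (Kp[i][j] + Km[i][j])
       for j in range(2)] for i in range(2)]
T2 = [[(1 - g) * I4[i][j] + 0.5 * g * (KKp[i][j] + KKm[i][j])
       for j in range(4)] for i in range(4)]
A1 = unit_eigvec(T1)
A2 = unit_eigvec(T2)
A1A1 = [A1[i // 2] * A1[i % 2] for i in range(4)]
pair_corr = A2[0] - A2[1] - A2[2] + A2[3]   # equilibrium E[S1 S2]
```

The non-factorisation of A2 is small here (of order f²p²) but strictly nonzero, which is exactly the higher-order correlational structure discussed in the text.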
Using T1 and T2, we may write down the conditional expectation values in Eq. (7). We define the vector w = (−1ᵀ | +1ᵀ)ᵀ, where ᵀ denotes the transpose and the s-dimensional vector 1 is a vector all of whose components are unity. This vector weights synaptic states according to their two possible strengths. Then,

E[S1(t) | +] = wᵀ e^{(T1 − I)rt} M+ A1,   (9a)

and

E[S1(t)S2(t) | ++] = (w ⊗ w)ᵀ e^{(T2 − I ⊗ I)rt} (M+ ⊗ M+) A2,   (9b)

and for the other two pairwise expectation values in Eq. (7b), we replace M+ ⊗ M+ in Eq. (9b) by M+ ⊗ I for +× and I ⊗ I for ××. Since T1 − I = f g(M − I), we have μ(t) = f wᵀ e^{(M − I) f g r t} M+ A1, so that sparse coding just introduces a multiplicative factor of f and scales the rate r by the product f g in μ(t). In the equilibrium limit, by definition exp[(T_n − I ⊗ · · · ⊗ I)rt] v → A_n for any state v corresponding to a probability distribution, as t → ∞. Hence, μ(∞) = f wᵀ A1 ≡ 0, which always follows when potentiation and depression processes are treated symmetrically. For the equilibrium variance, we obtain

σ(∞)² = (1/N)[ f + (1 − f)ζ² ] + (1 − 1/N)[ f + (1 − f)ζ ]² (w ⊗ w)ᵀ A2.   (10)

The second, covariance term does not in general vanish because of the equilibrium synaptic pairwise correlations.
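The rescaling of r to f g r in μ(t) can be verified numerically. For the same illustrative two-state stochastic-updater synapse, the mean signal reduces to the closed form f p e^{−p f g r t}; the matrix exponential below is a plain Taylor series (adequate for these tiny matrices), and all parameter values are assumptions for illustration.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(A, terms=60):
    """Plain Taylor-series matrix exponential (fine for tiny, mild matrices)."""
    n = len(A)
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in out]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, A)]
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

p, f, g, r, t = 0.5, 0.2, 0.2, 1.0, 3.0      # illustrative values only
Mp = [[1 - p, 0.0], [p, 1.0]]                # potentiating induction
Mm = [[1.0, p], [0.0, 1 - p]]                # depressing induction
M = [[0.5 * (Mp[i][j] + Mm[i][j]) for j in range(2)] for i in range(2)]
A1 = [0.5, 0.5]                  # equilibrium single-synapse distribution
w = [-1.0, 1.0]                  # weights the two states by their strengths
X = [[(M[i][j] - (1.0 if i == j else 0.0)) * f * g * r * t
      for j in range(2)] for i in range(2)]
E = expm(X)
v0 = [sum(Mp[i][j] * A1[j] for j in range(2)) for i in range(2)]   # M+ A1
vt = [sum(E[i][j] * v0[j] for j in range(2)) for i in range(2)]
mu_t = f * sum(w[i] * vt[i] for i in range(2))
mu_closed = f * p * math.exp(-p * f * g * r * t)   # closed form for this synapse
```

The matrix-exponential evaluation and the closed form agree to machine precision, with the decay rate visibly controlled by the product p f g r.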
These general results allow us to obtain SNR lifetimes when the matrices M± are specified for any particular model of synaptic plasticity. We defer the derivation of the transition matrix elements Prob[h′ | h] that are required for FPT lifetimes until we explicitly discuss simple models of synaptic plasticity in Sect. 3.1.

Hopfield protocol
Although the Hebb protocol is intuitive as a means of exploring memory lifetimes in an associative memory system, its non-trivial equilibrium distribution of synaptic states is awkward. To avoid this awkwardness, we may consider an alternative protocol that is nevertheless equivalent to the Hebb protocol in the limit of small f gN . We first define the protocol and then demonstrate the equivalence.
During memory storage, instead of defining cue and target sub-populations, we now specify the entire activity pattern, representing a memory, across the whole population of neurons. We allow these activities to take values from the set {−1, −ζ, +ζ, +1}, with the evoked values ±1 each occurring with probability f/2 and the spontaneous values ±ζ each occurring with probability (1 − f)/2. Here, the values ±1 represent evoked activity (the neuron is involved in memory storage), with +1 (respectively, −1) representing a strongly (respectively, weakly) tetanising stimulus in the usual LTP (respectively, LTD) sense. In contrast, the values ±ζ represent spontaneous activity (the neuron is not involved in memory storage). For a single perceptron, this amounts to specifying memory vectors ξα with components ξα_i taking one of these four values, and also specifying the perceptron's required output in response to an input vector, where this output is drawn from the same set with the same probabilities, but with f replaced by g as usual. We can track the perceptron's activation when its required output is either +1 or −1, but by symmetry its activation would differ only by a sign between these two cases, so for concreteness we just take the required output to be +1 during the storage of ξ0. As with the Hebb protocol, a synapse does not experience a plasticity induction signal if its presynaptic input or the postsynaptic perceptron itself is only spontaneously active. However, if both input and perceptron are active, then the synapse experiences a plasticity induction signal, either potentiating if both activities are the same or depressing if different. This is just the standard Hopfield rule (Hopfield 1982), so we refer to this protocol as the Hopfield protocol: we obtain a pattern of synaptic plasticity induction signals in response to evoked activity that is identical to the standard Hopfield rule, but supplemented by the presence of spontaneous activity that does not induce synaptic plasticity.
Figure 2 summarises the Hopfield protocol, showing all allowed combinations of pre- and postsynaptic activities during memory storage, together with their associated probabilities and induced plasticity induction signals. Depressing and potentiating induction signals both occur with the same overall probability f g/2 as in the Hebb protocol. Computing μ(t) and σ(t)² for the Hopfield protocol and using the various symmetries, such as E[S1(t) | −] = −E[S1(t) | +] and E[S1(t)S2(t) | −−] = E[S1(t)S2(t) | ++], we obtain

μ(t) = f E[S1(t) | +],   (11a)

σ(t)² = (1/N)[ f + (1 − f)ζ² ] + (1 − 1/N){ f² E[S1(t)S2(t) | ++] + (1 − f)² ζ² E[S1(t)S2(t) | ××] } − μ(t)².   (11b)

These are structurally identical to the expressions in Eq. (7) for the Hebb protocol, except that the linear terms in ζ are absent because of cancellation. Had we instead used a single level ζ of spontaneous activity rather than the two levels ±ζ, we would have obtained identical linear terms, too. Writing down the transition operators T1 and T2 in the Hopfield protocol, we obtain

T1 = (1 − g)I + gK and T2 = (1 − g)I ⊗ I + gK ⊗ K,

where K = (1 − f)I + f M as before, with immediate generalisation to T_n. The (marginal) equilibrium distribution of a pair of synapses' states is therefore determined by the unit eigenstate of K ⊗ K and thus of M ⊗ M, and so is just A2 = A1 ⊗ A1; again, generalisation to A_n is immediate. The result is that all conditional expectation values involving at least one synapse that does not experience a plasticity induction event during the storage of ξ0 vanish, when potentiation and depression processes are treated symmetrically. So, whether we use four-level or three-level activities in the Hopfield protocol, the ζ-dependent contributions to the covariance term in Eq. (11b) drop out, as indicated, so that the variance is affected only by the ζ² term in the first term on the right-hand side (RHS) of Eq. (11b). Moreover, the covariance term vanishes entirely in the large-t, equilibrium limit. The equivalence of the Hebb and Hopfield protocols in the limit of small f gN is now clear. The corresponding transition matrices T1 are in any case identical for both protocols, and hence so are the means.
For T2, in both protocols we have that

T2 = I ⊗ I + f g[(M − I) ⊗ I + I ⊗ (M − I)] + O( f²g ),

and for general T_N the O( f g) term on the RHS contains N terms, each of which contains N − 1 factors of I and just one factor of M − I. This structure reflects the fact that in the limit of small f gN, at most one of the perceptron's synapses experiences a plasticity induction signal, regardless of the protocol. The corresponding unit eigenstate of T_N in this limit is just A1 ⊗ · · · ⊗ A1, regardless of the protocol. Therefore, in the small f gN limit, the equilibrium distribution of synaptic states in the Hebb protocol reduces to that in the Hopfield protocol, and all statistical properties of h(t) must therefore also reduce in the same way. The Hopfield protocol therefore offers a way of extrapolating the small f gN behaviour of the Hebb protocol to larger f without the awkwardness of the Hebb protocol's equilibrium structure in this regime. Furthermore, the simpler form of the results in the Hopfield protocol allows us to use it to extract the scaling properties of memory lifetimes as a function of small f (or g) in both protocols.
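The agreement of the two protocols' pairwise operators up to terms quadratic in f can be verified directly. For the illustrative two-state stochastic-updater synapse below (parameter values are assumptions for illustration), the Hebb and Hopfield forms of T2 differ by an amount exactly proportional to f², so halving f shrinks the maximum entrywise discrepancy by a factor of four:

```python
def kron(A, B):
    m = len(B)
    d = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(d)]
            for i in range(d)]

def pair_operators(p, f, g):
    """Pairwise transition operators T2 for the Hebb and Hopfield protocols,
    using an illustrative two-state synapse with update probability p."""
    I2 = [[1.0, 0.0], [0.0, 1.0]]
    Mp = [[1 - p, 0.0], [p, 1.0]]
    Mm = [[1.0, p], [0.0, 1 - p]]
    Kp = [[(1 - f) * I2[i][j] + f * Mp[i][j] for j in range(2)] for i in range(2)]
    Km = [[(1 - f) * I2[i][j] + f * Mm[i][j] for j in range(2)] for i in range(2)]
    K = [[0.5 * (Kp[i][j] + Km[i][j]) for j in range(2)] for i in range(2)]
    I4 = kron(I2, I2)
    KKp, KKm, KK = kron(Kp, Kp), kron(Km, Km), kron(K, K)
    hebb = [[(1 - g) * I4[i][j] + 0.5 * g * (KKp[i][j] + KKm[i][j])
             for j in range(4)] for i in range(4)]
    hopf = [[(1 - g) * I4[i][j] + g * KK[i][j]
             for j in range(4)] for i in range(4)]
    return hebb, hopf

def maxdiff(f):
    hebb, hopf = pair_operators(p=0.5, f=f, g=0.2)
    return max(abs(hebb[i][j] - hopf[i][j]) for i in range(4) for j in range(4))

d1, d2 = maxdiff(0.2), maxdiff(0.1)
# Halving f should shrink the Hebb-Hopfield discrepancy by a factor of 4.
```

The exactness of the factor of four here follows because the residual reduces to a single term proportional to (K+ − K−) ⊗ (K+ − K−), which is quadratic in f.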
For the non-sparse-coding case of f = 1, spontaneous activity does not contribute to the Hopfield protocol's dynamics, and we recover precisely the Hopfield model with discrete-state synapses. For f < 1, we expand the possible activities of neurons to allow for spontaneously active neurons that are not involved in memory storage. Thus, although the Hopfield protocol provides a convenient tool for examining the small f gN limit of the Hebb protocol, we also regard the Hopfield protocol as a fully fledged protocol in its own right, because it constitutes a very natural way of examining sparse coding with a Hopfield plasticity rule.

Population memory lifetimes
So far we have focused on the memory dynamics of a single perceptron. We now consider the memory dynamics of the entire population of P neurons. We do this only for the Hopfield protocol for simplicity. The tracked memory will evoke activity in a sub-population of on average g P neurons. In an experimental protocol, during the storage of the tracked memory we can at least in principle explicitly identify all those neurons that are active, and then subsequently track all their activities during later re-presentations of the tracked memory. Because of synaptic coupling between these tracked neurons and the other on average (1 − g)P neurons, spontaneous activity in the other neurons will affect and potentially degrade the activation of the tracked neurons upon re-presentation of the tracked memory, affecting the tracked neurons' ability to read out the tracked memory. But, as we are only concerned with the tracked neurons' read-out of the tracked memory, we do not need to explicitly track the activities of all these other neurons: their activities do not directly form part of the memory signal from the tracked neurons.
In the Hopfield protocol, a tracked neuron will by definition have an output of +1 or −1 during memory storage. For a single perceptron, we focused on an output of +1 without loss of generality. A perceptron with an initial output of −1 will have identical dynamics to one with an initial output of +1, except that the activation will be reversed in sign. Therefore, we can just define the memory signal for any active perceptron to be ±h(t), depending on this sign. Denoting the moment generating function (MGF) of +h(t) for a tracked neuron with an initial output of +1 by M(z; t), the MGF for −h(t) for a tracked neuron with an initial output of −1 will also be just M(z; t). All tracked neurons therefore have the same MGF for their memory signals.

Fig. 2 Schematic illustration of the Hopfield protocol for memory storage. The format of this figure is essentially identical to that for the Hebb protocol in Fig. 1, except that labels indicating postsynaptic roles are not required. To avoid duplication, spontaneously active neurons are shown with both possible spontaneous activity levels, ±ζ; the corresponding probability is for each of these levels rather than for both
Suppose that P eff neurons form the sub-population that stores the tracked memory, where P eff is binomially distributed with parameter P and probability g. Although these neurons' activations will not in general evolve independently, as an extremely coarse approximation we assume that their activations do evolve independently during subsequent memory storage (cf. Rubin and Fusi 2007). Population memory lifetimes obtained from this simplifying assumption will therefore only be theoretical, and perhaps very loose, upper bounds on exact memory lifetimes. With this simplification, the MGF for the memory signal from the tracked sub-population is then [M (z; t)] P eff , by independence. Averaging over P eff , the MGF of the population memory signal is then just [(1 − g) + gM (z; t)] P . The mean, μ p (t), and variance, σ p (t) 2 , of this population signal follow directly.
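The composition [(1 − g) + gM(z; t)]^P can be unpacked directly: each of the P neurons contributes its signal independently with probability g. The following sketch is illustrative only, using a hypothetical Gaussian single-neuron signal rather than the model's actual M(z; t); it confirms μ_p = gPμ, and that the exact variance P[gσ² + g(1 − g)μ²] reduces to gPσ² as μ(t) → 0.

```python
import numpy as np

rng = np.random.default_rng(0)
P, g = 2000, 0.05        # population size and coding sparseness (illustrative values)
mu, sigma = 0.5, 1.0     # hypothetical single-neuron signal moments; mu -> 0 as t -> infinity

# Each neuron joins the tracked sub-population independently with probability g
# and contributes an independent signal h_i; the population signal is the sum.
trials = 4000
member = rng.random((trials, P)) < g
h = rng.normal(mu, sigma, size=(trials, P))
pop_signal = (member * h).sum(axis=1)

# Full variance of the compound sum; the g(1-g)mu^2 term vanishes as mu -> 0,
# recovering the asymptotic approximation sigma_p^2 ~ g*P*sigma^2 used in the text.
exact_var = P * (g * sigma**2 + g * (1 - g) * mu**2)
print(pop_signal.mean(), g * P * mu)
print(pop_signal.var(), exact_var)
```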
Ignoring covariance terms (or considering the limit t → ∞ for the variance), we have μ_p(t) = gPμ(t) and σ_p(t)² ≈ gPσ(t)², where μ(t) and σ(t)² are the single-perceptron mean and variance above. Hence, the population SNR is just scaled by the factor √(gP) relative to the single-perceptron SNR. The population SNR memory lifetime, which we denote by τ_pop, depends on the total number of synapses in the memory system, NP, but it also contains the additional factor of √g compared to SNR(t), which modifies its scaling behaviour compared to the single-perceptron results.

Simple synapses: the stochastic updater
The simplest model of synaptic plasticity to consider is one in which synapses lack any internal states, so that s = 1, and, given a plasticity induction signal, they change strength (if possible) with some fixed probability p (Tsodyks 1990). Because a synapse just changes its strength stochastically in this model, we have called such a synapse a "stochastic updater" (SU; Elliott and Lagogiannis 2012). The underlying strength transition matrices follow directly, where we define ψ = fp for convenience. The equilibrium distribution of a single synapse's strength in both protocols is just the normalised unit eigenvector of K, or A_1 = ½(1, 1)^T. For the Hopfield protocol, any pair of synapses' strengths has the equilibrium distribution A_2 = A_1 ⊗ A_1. For the Hebb protocol, we require the unit eigenstate of ½(K_+ ⊗ K_+ + K_− ⊗ K_−); in the resulting state, the quantity κ_2 = ψ/(2 − ψ) determines the pairwise correlations, being the expectation value of T ⊗ T in the state A_2. For f → 0, κ_2 → 0 and A_2 → A_1 ⊗ A_1. With these equilibrium distributions, we may explicitly compute μ(t) and σ(t)² in both protocols using Eqs. (7) and (11). The common mean follows directly, while for the two variances, we need the various correlation functions in Eqs. (7b) and (11b); these take explicit forms for the Hebb protocol, and for the Hopfield protocol, we just set κ_2 = 0 in those equations. These results allow us to determine SNR lifetimes for simple, SU synapses. Approximating the Hopfield variance by its asymptotic form σ(∞)², we obtain the single-perceptron SNR memory lifetime in the Hopfield protocol for a stochastic updater; for the population SNR lifetime τ_pop, we make the replacements discussed above. Turning to FPT lifetimes, we require the jump moments; for the second jump moment, we obtain the corresponding forms for the Hebb and Hopfield protocols, respectively. We have explicitly indicated the dependence of B(h′) on N_eff, where N_eff is the number of a perceptron's synapses that are active during the storage of ξ⁰.
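To make the SU model concrete, here is a minimal sketch of the standard two-state stochastic-updater transition matrices (our own construction, with states ordered (−1, +1) and columns indexing the "from" state): a potentiating signal strengthens a weak synapse with probability p, and depression mirrors this. It checks that A_1 = ½(1, 1)^T is the unit eigenvector of K.

```python
import numpy as np

p = 0.1   # synaptic update probability

# Two strength states, ordered (-1, +1); columns give the "from" state.
# Potentiation: a weak synapse strengthens with probability p; a strong
# synapse cannot strengthen further. Depression is the mirror image.
M_plus  = np.array([[1 - p, 0.0],
                    [p,     1.0]])
M_minus = np.array([[1.0, p],
                    [0.0, 1 - p]])

K = 0.5 * (M_plus + M_minus)   # equal mix of potentiation and depression

A1 = np.array([0.5, 0.5])      # candidate equilibrium distribution
print(np.allclose(K @ A1, A1)) # A1 is the unit eigenvector of K
```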
We write B(h′ | N_eff) in a form that separates out the quadratic dependence on h′, and it is convenient to remove an overall factor of ψg from the definition of B_0(N_eff). Dropping the quadratic term from B(h′ | N_eff) is equivalent to considering dynamics based on the Ornstein-Uhlenbeck process (Uhlenbeck and Ornstein 1930), which we have found to be a very good approximation (Elliott 2014, 2017a, 2019), so we work with just the constant term. For the MIE approach to FPTs, a technical difficulty discussed in Appendix A requires us to restrict to the specific case of ζ = 0 only. We use numerical methods to obtain FPT lifetimes from the MIE approach, but for small fN, the dynamics are dominated by N_eff = 1. For N_eff = 1 and ϑ = 0, Eq. (4) is trivial because the only contribution to the sum involves no transition, occurring with probability 1 − ψg/2 regardless of the protocol. Writing σ_fpt² for the variance in the FPT, we obtain the leading-order results for small f (= g) in both protocols. We see that τ_mfpt scales as 1/f in this regime, but that σ_fpt scales as 1/f^{3/2}. Although σ_fpt swamps τ_mfpt for small f, τ_mfpt is nevertheless robustly positive. We may use our earlier results to obtain the corresponding forms for the FPE approach to FPT lifetimes for small f (see Eqs. (3.29) and (3.30) in Elliott 2017a). In contrast to Eq. (23), now τ_mfpt scales as 1/f² and not 1/f, and σ_fpt scales as 1/f² and not 1/f^{3/2}. Moreover, in the FPE approach, the FPT moments have lost their overall scaling with N. Although the forms in Eq. (24) are obtained using mean field approximations that are expected to be invalid when fN is small, in fact we obtain the same scaling behaviour when the expectation values are obtained by averaging properly over h_0 and N_eff. Our simulation results, discussed in Sect. 4, agree with the behaviour in Eq. (23). Therefore, the failure of the FPE approach for small fN in Eq. (24) is due to the approximations intrinsic to the FPE approach itself, including the diffusion limit and especially the continuum limit. For small fN, the system is nowhere near the continuum limit, so the scaling behaviour must be incorrect there.
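First passage times of this kind can also be estimated by direct Monte Carlo simulation. The sketch below is a deliberately crude caricature rather than the exact Hebb or Hopfield statistics: all parameter values are illustrative, and we simply assume that each synapse receives a ±1 induction signal with probability f² per presented memory (pre- and postsynaptic activity both at coding level f) and flips with probability p when the signal opposes its strength. It illustrates the qualitative point that the MFPT remains robustly positive even when trial-to-trial variability is high.

```python
import numpy as np

rng = np.random.default_rng(1)

def fpt_trial(N=500, f=0.1, p=0.1, t_max=10_000):
    """First passage time of the memory signal to threshold 0 for SU synapses.

    Caricature dynamics: per later memory, a synapse receives a +/- induction
    signal with probability f*f/2 each, and flips with probability p if the
    signal opposes its current strength. Returns 0 if the tracked memory is
    not stored above threshold at t = 0.
    """
    s = rng.choice([-1.0, 1.0], size=N)           # equilibrium strengths
    active = rng.random(N) < f                    # synapses active for tracked memory
    flip = active & (s < 0) & (rng.random(N) < p)
    s[flip] = 1.0                                 # store tracked memory (output +1)
    if s[active].sum() <= 0:
        return 0
    for t in range(1, t_max):
        induced = rng.random(N) < f * f           # induction signal arrives
        sign = rng.choice([-1.0, 1.0], size=N)    # potentiation or depression
        flip = induced & (sign != s) & (rng.random(N) < p)
        s[flip] *= -1.0
        if s[active].sum() <= 0:                  # h(t) crosses theta = 0
            return t
    return t_max

fpts = np.array([fpt_trial() for _ in range(50)])
stored = fpts[fpts > 0]
print(stored.mean(), stored.std())   # MFPT is robustly positive, with large variability
```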

Complex synapses
We now turn to models of complex synapses that have internal states, so that s > 1. In such models, synapses can undergo metaplastic changes in their internal states without expressing changes in synaptic strength. We will only consider SNR lifetimes in relation to complex synapses. We have studied FPT lifetimes for filter-based synaptic plasticity for both bistate (Elliott 2017b) and multistate (Elliott 2020) synapses in a non-sparse-coding context, but we have yet to consider other models of complex synapses. We therefore restrict to SNR lifetimes, but with the caveat that they are valid only in an asymptotic regime. We have discussed filter-based models of synaptic plasticity at length elsewhere (Elliott 2008; Elliott and Lagogiannis 2009, 2012; Elliott 2016b), so we only briefly summarise them here. Synapses are proposed to implement a form of low-pass filtering by integrating plasticity induction signals in an internal filter state. Synapses then filter out high-frequency noise in their induction signals and pass only low-frequency trends, rendering them less susceptible to changes in strength due to fluctuations in their inputs. Potentiating (respectively, depressing) induction signals increment (respectively, decrement) the filter state, with synaptic plasticity being expressed (if possible) only when the filter reaches an upper (respectively, lower) threshold. For symmetric potentiation and depression processes, we may take these thresholds to be ±Θ. The filter can occupy the 2Θ − 1 states −(Θ − 1), . . . , +(Θ − 1), with the thresholds ±Θ not being occupiable states. Several variant filter models are distinguishable by their different dynamics upon reaching threshold (Elliott 2016b), but we consider only the simplest of them here. In the simplest model, the filter always resets to the zero filter state upon reaching threshold, regardless of its strength state and regardless of the type of plasticity induction signal. This filter generalises to any multistate synapse.
If the synapse is saturated at its upper (respectively, lower) strength state and reaches its upper (respectively, lower) filter threshold upon receipt of a potentiating (respectively, depressing) induction signal, the filter resets to zero despite the fact that it cannot increment (respectively, decrement) its strength. The transitions for this filter for the case of Θ = 3 are illustrated in Fig. 3A. Although for clarity we have shown all permitted transitions between all filter and strength states, we stress that each synapse possesses only a single synaptic filter: the filter is not duplicated for each strength state. Transitions in filter state occur independently of strength state. Nevertheless, to describe transitions in the joint strength and filter state, we require 2(2Θ − 1) × 2(2Θ − 1) matrices, so s = 2Θ − 1, although the number of required physical states for filter-based synapses is just 2Θ − 1 for the filter states themselves, plus an additional, binary-valued variable for the bistate strength, so a total of 2Θ states.
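The joint (strength, filter) transition structure just described can be sketched concretely. The indexing convention below is our own; the matrices encode the deterministic filter transitions described above, including the reset-without-plasticity behaviour at a saturated strength state.

```python
import numpy as np

def filter_matrices(theta):
    """Joint (strength, filter) transition matrices for the simplest filter model.

    Filter states run over -(theta-1), ..., +(theta-1); on reaching threshold
    +/-theta the filter resets to zero and the strength steps up/down if
    possible. States are indexed as (w, F) with w in {-1, +1}; columns are
    'from' states. (Our own construction from the transitions in the text.)
    """
    n_f = 2 * theta - 1
    dim = 2 * n_f
    idx = lambda w, F: (0 if w == -1 else 1) * n_f + (F + theta - 1)
    M_plus = np.zeros((dim, dim))
    M_minus = np.zeros((dim, dim))
    for w in (-1, 1):
        for F in range(-(theta - 1), theta):
            # potentiating induction signal: increment the filter
            if F + 1 == theta:                     # upper threshold reached
                M_plus[idx(1, 0), idx(w, F)] = 1   # strength -> +1 (if possible), filter resets
            else:
                M_plus[idx(w, F + 1), idx(w, F)] = 1
            # depressing induction signal: decrement the filter
            if F - 1 == -theta:                    # lower threshold reached
                M_minus[idx(-1, 0), idx(w, F)] = 1
            else:
                M_minus[idx(w, F - 1), idx(w, F)] = 1
    return M_plus, M_minus

Mp, Mm = filter_matrices(theta=3)
print(Mp.shape)                          # (10, 10): 2(2*theta - 1) = 10 joint states
print(Mp.sum(axis=0), Mm.sum(axis=0))    # columns sum to 1: both are stochastic
```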
We state without derivation the result for μ(t) in this filter model, in which ⌊·⌋ denotes the floor function. This expression is obtained from Eq. (4.24) in Elliott (2016b) just by multiplying by f and inserting a factor of fg into the exponents. This result is required for obtaining SNR lifetimes. The pairwise correlation functions required for σ(t)² are computed via numerical matrix methods using the matrices M_± for the filter model (given in Elliott (2016b) or implied by the transitions in Fig. 3A), and we also obtain the Hebb equilibrium distribution A_2 by numerical methods.
To estimate SNR lifetimes in the filter model for the Hopfield protocol, we consider the slowest decaying mode in the first and second terms of Eq. (25). For non-sparse coding, it is usually enough to consider just the slowest mode in the first term, but with sparseness, both terms must be considered for a better approximation. For Θ large enough, we then have the approximation in Eq. (26). Approximating the Hopfield variance by its asymptotic form σ(∞)², we obtain the single-perceptron SNR memory lifetime for the filter model in Eq. (27); in deriving this expression, we have regarded the second term as a correction to the first term, with the first term arising purely from the first term in Eq. (26). The population SNR memory lifetime τ_pop is obtained in the usual way. We also consider the serial synapse model (Leibold and Kempter 2008; Rubin and Fusi 2007). In this model, a synapse performs a symmetric, unbiased, one-step random walk on a set of 2s states between reflecting boundaries. The first (respectively, second) group of s states is identified as corresponding to strength −1 (respectively, +1). For each strength state, there are thus s metastates. If a synapse has strength −1 (respectively, +1) and experiences a sequence of depressing (respectively, potentiating) induction signals, then it is pushed into progressively higher metastates. However, the synapse can only change strength when in the lowest, i = 1 metastate. The transitions are illustrated in Fig. 3B. The transition matrices M_± are banded, with diag_u and diag_l denoting the upper and lower diagonals, respectively. The eigendecomposition of M = ½(M_+ + M_−) is standard (cf. Elliott 2016a, for the eigendecomposition of the similar matrix C there), so we can directly evaluate the mean, giving Eq. (29). For the Hebb protocol, we again use numerical matrix methods to obtain A_2. To estimate SNR lifetimes for the Hopfield protocol, it is sufficient to consider just the slowest decaying term in Eq. (29), giving, for s large enough, the required approximation, with τ_pop^ser obtained in the usual way.
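The serial model's banded transition matrices are simple shift matrices with reflections, which we can sketch directly (the chain ordering below is our own convention: position 0 is the deepest metastate of strength −1, and strength changes on the step between positions s − 1 and s). A quick check confirms that M = ½(M_+ + M_−) is doubly stochastic, so its equilibrium distribution is uniform over the 2s states.

```python
import numpy as np

def serial_matrices(s):
    """Transition matrices for the serial synapse model: a one-step walk on
    2s states between reflecting boundaries (columns are 'from' states).
    Positions 0..s-1 carry strength -1 (position 0 = deepest metastate),
    positions s..2s-1 carry strength +1. (Our own indexing convention.)
    """
    n = 2 * s
    M_plus = np.zeros((n, n))
    M_minus = np.zeros((n, n))
    for i in range(n):
        M_plus[min(i + 1, n - 1), i] = 1   # potentiation: one step up, reflect at top
        M_minus[max(i - 1, 0), i] = 1      # depression: one step down, reflect at bottom
    return M_plus, M_minus

s = 4
Mp, Mm = serial_matrices(s)
M = 0.5 * (Mp + Mm)                        # unbiased, symmetric random walk
w, v = np.linalg.eig(M)
eq = np.real(v[:, np.argmax(np.real(w))]) # unit eigenvector
eq /= eq.sum()
print(np.allclose(eq, np.ones(2 * s) / (2 * s)))  # equilibrium is uniform
```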
In the cascade model of synaptic plasticity (Fusi et al. 2005), there are also 2s metalevels, s for each bistate strength state, but unlike the serial synapse model, a potentiating (respectively, depressing) induction signal for a synapse with strength −1 (respectively, +1) in metastate i can with probability 2^{1−i} (or 2^{2−i} for i = s) cause the synapse to change strength and return to metastate i = 1. The same probabilities govern transitions to higher metastates. The transitions are illustrated in Fig. 3C. The cascade model essentially constitutes a tower of stochastic updaters that progressively render the synapse less labile. We have extensively analysed the cascade model elsewhere (Elliott and Lagogiannis 2012) and compared its memory performance to filter-based synapses, which outperform the cascade model in almost all biologically relevant regions of parameter space (Elliott 2016b). It is possible to obtain analytical results for the Laplace transform of the mean dynamics in the cascade model (Elliott and Lagogiannis 2012), but here we use numerical matrix methods. Rubin and Fusi (2007) give a formula for the SNR based on finding a fit to numerical results; the implied formula for the mean, μ_cas(t), is given in Eq. (32). Taking the asymptotic variance σ(∞)² in the Hopfield protocol, we can then use the expression μ_cas(t)/σ(∞) for the SNR. This still cannot be solved analytically for the SNR lifetime τ_snr^cas (or the population form τ_pop^cas), but we can use it to obtain numerical solutions that can be compared to results obtained from exact matrix methods.
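The cascade transitions can likewise be sketched as matrices. The construction below follows the description above (switching probability 2^{1−i}, adjusted to 2^{2−s} at the deepest level, with the same probabilities governing the metaplastic transitions to higher metastates); the indexing and the boundary handling at i = s are our own assumptions, so this is a sketch rather than a definitive implementation.

```python
import numpy as np

def cascade_matrices(s):
    """Cascade model transition matrices (columns are 'from' states), sketched
    from the description in the text: switching probability q_i = 2^(1-i),
    with q_s = 2^(2-s), and the same probabilities for metaplastic transitions
    one level deeper. Boundary handling at level i = s is our own assumption.
    """
    q = np.array([2.0 ** (1 - i) for i in range(1, s + 1)])
    if s > 1:
        q[s - 1] = 2.0 ** (2 - s)
    dim = 2 * s
    idx = lambda w, i: (0 if w == -1 else 1) * s + (i - 1)
    M_plus = np.eye(dim)                  # default: no transition
    M_minus = np.eye(dim)
    for i in range(1, s + 1):
        qi = q[i - 1]
        # potentiation: a -1 synapse switches strength and resets to level 1 ...
        M_plus[idx(-1, i), idx(-1, i)] -= qi
        M_plus[idx(1, 1), idx(-1, i)] += qi
        # ... while a +1 synapse cascades one level deeper (if not already at s)
        if i < s:
            M_plus[idx(1, i), idx(1, i)] -= qi
            M_plus[idx(1, i + 1), idx(1, i)] += qi
        # depression: the mirror image
        M_minus[idx(1, i), idx(1, i)] -= qi
        M_minus[idx(-1, 1), idx(1, i)] += qi
        if i < s:
            M_minus[idx(-1, i), idx(-1, i)] -= qi
            M_minus[idx(-1, i + 1), idx(-1, i)] += qi
    return M_plus, M_minus

Mp, Mm = cascade_matrices(s=5)
print(np.allclose(Mp.sum(axis=0), 1), np.allclose(Mm.sum(axis=0), 1))  # stochastic
```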
A serial or cascade synapse possesses 2s states, with each set of s metalevels duplicated for each strength. Metalevel i for strength −1 cannot be identified with metalevel i for strength +1 because the transitions induced by plasticity induction signals are in opposite directions. This is in contrast to the filter model, in which the filter transitions are independent of the strength state. Serial and cascade synapses therefore possess fully 2s physical states characterising the state of a synapse, while a filter synapse possesses 2Θ physical states and not 2(2Θ − 1) states. Hence, we may directly compare the performance of a filter synapse with threshold Θ to a serial or cascade synapse with a total of 2s metastates, or s metastates per strength state.

Results
We now turn to a discussion of our results, comparing and contrasting the various models of synaptic plasticity considered above, for the Hebb and Hopfield protocols. For simplicity we consider simulation results only for SU synapses, to confirm and validate our analytical results. Simulations are run according to protocols discussed extensively elsewhere (see, for example, Elliott and Lagogiannis 2012; Elliott 2014), but modified to allow for sparse coding. We first consider single-perceptron memory lifetimes and then population memory lifetimes.

Single-perceptron memory lifetimes
In Fig. 4, we show results for memory lifetimes for SU synapses with no spontaneous activity, ζ = 0, comparing the Hopfield and Hebb protocols. We consider both FPT and SNR lifetimes, and for FPT lifetimes, we show results for both the FPE and MIE approaches. Simulation results are also shown, although only for f ≥ 10 −3 : for smaller values it becomes increasingly difficult to obtain enough statistics for decent averaging due to the longer simulation run times. We select an update probability of p = 1/10, which is our standard choice of p in earlier work (see, for example, Elliott 2014). From Eq. (24), r τ mfpt and r σ fpt are expected to scale as 1/ f 2 for small f for the FPE approach, so we remove this scaling by multiplying by f 2 , which in this figure affords greater clarity and resolution.
Fig. 4 Convergence of Hebb and Hopfield protocol results for stochastic updater synapses in the limit of sparse coding. Scaled single-perceptron memory lifetimes are shown as a function of sparseness, f. Results in red (respectively, blue) correspond to the Hopfield (respectively, Hebb) protocol. Shaded regions indicate f²r(τ_mfpt ± σ_fpt) (with the central solid line showing f²rτ_mfpt) computed using the FPE approach to FPTs, so that we show the (scaled) MFPT τ_mfpt surrounded by the one-standard-deviation region around it, governed by σ_fpt. Short-dashed lines show f²rτ_mfpt obtained using the exact, MIE approach to FPTs. Circular data points correspond to results from simulation, for f ≥ 10^−3. Long-dashed lines show results for f²rτ_snr; rτ_snr = 0 for the Hebb protocol over the whole range of f in panel A. The value of N is indicated in each panel (panels A–D). In all panels, p = 1/10, ζ = 0 and ϑ = 0

Above we showed that the Hopfield and Hebb protocols must coincide for f ≪ 1/√N. For the various choices of N used in Fig. 4, we see this convergence of both protocols' results, which become indistinguishable for f below 1/√N, for all forms of memory lifetime. Focusing first on rτ_mfpt from the FPE approach, for smaller N we clearly see that f²rτ_mfpt asymptotes to a common, N-independent constant as f becomes small; we would see the same behaviour for larger N too, but would need to take smaller values of f than those used in this figure. We also see that f²rτ_mfpt from the MIE approach tracks that from the FPE approach quite closely, and indeed for intermediate values of f and smaller choices of N, it plateaus, so that rτ_mfpt scales as 1/f² in this regime. However, for N = 10^3 and f ≲ 10^−3, we clearly see the MIE f²rτ_mfpt turn downwards and approach zero as f decreases. This behaviour is consistent with the derived form of the exact scaling behaviour in Eq. (23), in which rτ_mfpt ∝ 1/f for small f.
We also just see this change for N = 10^4 for f close to 10^−4, but for larger N we would need to take f smaller to see the 1/f scaling of the exact form of rτ_mfpt. Our simulation results agree with the results from the MIE approach, validating both. Although in our simulations we do not take f small enough to see the switch to 1/f scaling for N = 10^3 in Fig. 4A, we nevertheless do clearly see the start of the down-turn at f = 10^−3.
For f > 1/√N in Fig. 4, we see very significant differences between the Hebb and Hopfield protocols. While for the Hopfield protocol f²rτ_mfpt grows like log_e N for fN large enough, this is not the case for the Hebb protocol. For f in the region of unity, f²rτ_mfpt is, roughly speaking, independent of N. This means that the dynamics are dominated by the correlations between pairs of synapses' strengths in the Hebb protocol. For f = 1, we obtain rτ_mfpt ≈ 5.34 and 5.35 for N = 10^3 and N = 10^6, respectively, from the FPE approach. (The corresponding values from the MIE approach are 6.64 and 6.79, respectively.) In the regime of f not too far from unity, memory lifetimes in the Hebb protocol are therefore significantly reduced by the synaptic correlations induced by this protocol, and the influence of these correlations cannot be removed by increasing N.
We see that rτ_mfpt is robustly positive in Fig. 4 for all choices of N over the whole range of displayed f, and it remains so for small f because of the scaling behaviour discussed above. However, looking at the one-standard-deviation region around rτ_mfpt, it is clear that in some regimes of f, there can be high variability in FPT memory lifetimes. For the Hopfield protocol, this regime of high variability occurs for small f (where what counts as "small" f depends on N), while in the Hebb protocol, there is an additional regime for f close to unity. High variability does not mean that memories cannot be stored: rτ_mfpt is always robustly positive. Rather, high variability simply means that some memories are stored strongly while others are stored weakly or not at all.
Turning to a consideration of rτ_snr, we see from Fig. 4 that rτ_snr exists (i.e. rτ_snr > 0) in precisely those regions of low variability in FPT lifetimes. Indeed, the results for rτ_snr track quite closely those for r(τ_mfpt − σ_fpt) over some range of f, and deviate from them elsewhere. We have shown in a non-sparse-coding context that FPT and SNR lifetimes for simple synapses essentially coincide (up to additive constants) in the regime in which the distribution of h_0 is tightly concentrated around its supra-threshold mean (Elliott 2017a). For the specific case of ϑ = 0, as here, we showed that if we can write the initial variance σ(0)² in the form σ(0)² ≈ B_0(N)/2, then the parameter μ̃ ≡ μ(0)√(2/B_0(N)) must be large enough, which means μ̃ ≳ 2 (Elliott 2017a). We then have that μ̃ ≈ μ(0)/σ(0) ≳ 2, which is just a condition on the initial SNR. Using the pre-averaged form of B_0(N_eff), which scales with N_eff (see Appendix A), this condition reduces to 4/(p²N) ≪ f in the Hopfield protocol for ζ = 0. For the Hebb protocol, the limit of large N with p not too close to unity additionally satisfies the requirement on σ(0)², giving the upper bound f ≪ p/2 for ζ = 0. In the Hebb protocol, we therefore have the interval 4/(p²N) ≪ f ≪ p/2 for equivalence of SNR and FPT memory lifetimes, for ζ = 0. (We must have N ≫ 8/p³ for this interval to exist.) With p = 1/10 in Fig. 4, these conditions are 400/N ≪ f and 400/N ≪ f ≪ 0.05 for the Hopfield and Hebb protocols, respectively. For 400/N ≪ f in both protocols (except for the Hebb protocol for N = 10^3, where the bounding range of f is invalid), we do indeed see that the FPE results for f²rτ_mfpt and those for f²rτ_snr run essentially parallel to each other, but that for f < 400/N, f²rτ_snr peels away from f²rτ_mfpt. The same is true for the Hebb protocol for N > 10^3: as f increases above 0.05, f²rτ_snr also peels away from f²rτ_mfpt.
Thus, these two estimates for the two protocols appear to capture well the region of f for which rτ_snr is a reliable indicator of memory longevity. SNR lifetimes are therefore acceptable surrogates for FPT lifetimes when the latter are subject to low variability, but outside these regions SNR lifetimes fail to capture the possibility of memory storage, albeit with high variability. Importantly, the requirement that f ≫ 4/(p²N) in both protocols means that the SNR approach cannot be extended to small (let alone very small) f, because such values violate the asymptotic regime. Essentially, then, the SNR approach cannot probe the very sparse coding regime in either protocol.
For the Hopfield protocol, Eq. (20) reduces to the form given in Eq. (33). With ζ = 0, we require f > 1/(p²N) for rτ_snr > 0, and we see precisely these threshold values for the different choices of N in Fig. 4. Alternatively, we require N > 1/(fp²) for memories to be stored according to the SNR criterion. However, these conditions do not carry over to FPT memory lifetimes: we need neither a minimum N nor a minimum f for rτ_mfpt > 0, because it is always positive. This failure of SNR conditions to carry over to the FPT case also applies to any optimality conditions derived from rτ_snr. From Eq. (33) with ζ = 0, we may find the value of f, f_opt, that maximises rτ_snr, giving rise to rτ_snr^opt, with the result that f_opt = √e/(p²N). The same value essentially applies to the Hebb protocol, albeit with complicated corrections. However, for the validity of the SNR results, both protocols require f ≫ 4/(p²N). If the SNR optimality condition is valid, then it must satisfy f_opt = √e/(p²N) ≫ 4/(p²N), or √e ≫ 4. This is clearly false, and hence the SNR optimality condition for f is spurious, because at f = f_opt, the asymptotic validity condition is violated. In fact, we may essentially take f as small as we like and rτ_mfpt will continue to grow, albeit with increasing variability in the FPT lifetimes. Thus, although we will shortly consider optimality conditions for SNR memory lifetimes with complex synapses, these conditions must be viewed with extreme caution. Figure 4 considers only the case of exactly zero spontaneous activity, ζ = 0. In Fig. 5, we examine the impact of spontaneous activity on SU memory lifetimes. We show only the case of N = 10^5 to avoid unnecessary clutter, but the results are qualitatively similar for other choices of N. In the Hopfield protocol, ζ appears only through a quadratic term in B(h′) or σ(t)², while in the Hebb protocol, ζ also appears through a linear term.
This difference makes the Hebb protocol much more sensitive to spontaneous activity than the Hopfield protocol, and we see this explicitly in Fig. 5. In the Hopfield protocol, the asymptotic variance takes the form σ(∞)² = [f + (1 − f)ζ²]/N, so ζ exerts a significant influence on memory lifetimes only for f ≲ ζ². We therefore only start to see a divergence of memory lifetimes from those for ζ = 0 at around f ≈ ζ², and this is confirmed in the figure. However, as f is taken small, the dependence of rτ_mfpt (from the FPE) on ζ is lost (just as its dependence on N is lost), so that for very small f, ζ does not affect (FPE) FPT lifetimes, neither their means nor their variances. This is because for small f, the scaling results in Eq. (24) depend only on the A and not the B jump moment, so they depend only on drift and not diffusion; ζ, however, appears only through the diffusion term. In contrast to the Hopfield protocol, even a choice of ζ = 0.01 induces a large reduction in memory lifetimes in the Hebb protocol, at least away from the small f regime. For small f, the Hebb and Hopfield protocols coincide, so we observe the same loss of dependence on ζ in (FPE) FPT lifetimes in the Hebb protocol. However, away from the small f regime, the linear term in ζ in B or σ(t)² significantly impacts memory lifetimes.
Examining Eq. (33) for the Hopfield protocol, for ζ = 0, we have just f to the first power in the logarithm, while for ζ = 1, we have f². Roughly speaking, for intermediate values of ζ, the effective power of f switches rapidly from one to two in the vicinity of f = ζ². This switching can be seen clearly in Fig. 5, where as f decreases, f²rτ_snr (and also f²rτ_mfpt) tracks closely the form for ζ = 0, until it rapidly peels away, following a different power. Although it is still clearly the case that optimality conditions obtained from rτ_snr are invalid, it is nevertheless worth examining f_opt. For ζ = 0, we again obtain f_opt = √e/(p²N), but for ζ = 1, we instead obtain f_opt = √e/(p²N)^{1/2}, so that the N-dependence changes. The corresponding optimal lifetimes are rτ_snr^opt = p³N²/(4e) for ζ = 0 and rτ_snr^opt = pN/(2e) for ζ = 1. Of course, we see explicitly in Fig. 5 that these SNR-derived optimal values of f and thus maximum possible SNR lifetimes are invalid, but SNR lifetimes do at least indicate when FPT lifetimes are subject to lower variability and when they are subject to higher variability.
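The ζ = 0 optimisation can be checked numerically. Assuming a lifetime of the form τ(f) = log(fp²N)/(2pf²), our own reconstruction, consistent with the quoted threshold f > 1/(p²N), the optimum √e/(p²N), and the maximum p³N²/(4e), but with the overall scaling convention an assumption on our part, a direct grid search recovers both the optimal sparseness and the optimal lifetime.

```python
import numpy as np

p, N = 0.1, 1e5   # update probability and synapse number, as in Figs. 4 and 5

def snr_lifetime(f):
    """Reconstructed zeta = 0 Hopfield SNR lifetime (scaling conventions assumed)."""
    return np.log(f * p**2 * N) / (2 * p * f**2)

f = np.logspace(-5, 0, 500_001)          # fine grid over sparseness
tau = snr_lifetime(f)
f_num = f[np.argmax(tau)]                # grid optimum

f_opt = np.sqrt(np.e) / (p**2 * N)       # analytical optimum sqrt(e)/(p^2 N)
tau_opt = p**3 * N**2 / (4 * np.e)       # analytical maximum p^3 N^2/(4e)
print(f_num, f_opt)
print(tau.max(), tau_opt)
```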
Considering ζ = 1 is of course biologically meaningless, as then there is no distinction between spontaneous and evoked electrical activity levels. However, taking either ζ = 0 or ζ = 1 allows explicit optimality results to be obtained for these two cases, while such results are not available for intermediate values of ζ . As just indicated, empirically we observe a very rapid switching in dynamics in the vicinity of ζ = √ f , with the explicit results for ζ = 0 and ζ = 1 therefore indicating the general behaviour prior to and after, respectively, this switching. When we give results for ζ = 1, we therefore do so with this understanding: that the limit is biologically meaningless, but that it nevertheless indicates the general behaviour for ζ in excess of around √ f . We now turn to complex models of synaptic plasticity, considering only SNR lifetimes. In Figs. 6 and 7, we plot SNR lifetimes against sparseness, f , for the three complex models discussed above, for both zero and nonzero spontaneous firing rates, and for the Hopfield (Fig. 6) and Hebb (Fig. 7) protocols. All results are obtained by numerical matrix methods to solve the SNR equation μ(τ snr ) = σ (τ snr ), where the standard deviation σ (t) is computed fully rather than via just its asymptotic form σ (∞).
For the Hopfield protocol in Fig. 6 we see in all cases and for all choices of parameters an onset of SNR lifetimes at a minimum, threshold value of f, the rapid attainment of a peak or optimal value of rτ_snr, followed by a steady fall in lifetimes as f increases further. For all complex models, this onset of SNR lifetimes occurs at increasingly large values of f as Θ or s increases. At least for the parameter ranges in this figure, in the filter and serial models, for a given choice of f, increasing Θ or s increases rτ_snr, although as the number of internal states continues to increase, ultimately rτ_snr will start to fall. In the case of the cascade model, however, the dependence of rτ_snr on s for fixed f is not as simple as for the other complex models. We note that for all models in this figure, the optimal values of rτ_snr decrease with increasing Θ or s, at least for ζ = 0. However, when we increase the spontaneous activity to ζ = 0.1, the optimal values lose most of their dependence on Θ or s in the filter and serial models, although not in the cascade model. This loss of dependence on Θ or s is strongly N- and ζ-dependent. For N = 10^3, we must take ζ close to unity before this loss of dependence is noticeable, while for N = 10^6, even ζ = 10^{−3/2} ≈ 0.0316 is sufficient.
Fig. 5 Impact of spontaneous activity on stochastic updater single-perceptron memory lifetimes. Results are shown for f²rτ_mfpt (from the FPE approach) and f²rτ_snr for both the Hopfield and Hebb protocols, as indicated in the different panels. Different line styles correspond to different levels of spontaneous activity, ζ, as indicated in the common legend in panel D. Some line styles are absent in panel D because there is no corresponding rτ_snr > 0. In all panels we take N = 10^5, with p = 1/10 and ϑ = 0 in all cases

For the Hebb protocol, in Fig. 7, for smaller f we obtain essentially the same results as for the Hopfield protocol because these two protocols must coincide for f ≪ 1/√N, regardless of the model of synaptic plasticity. However, for larger f, the synaptic correlation terms induced by the Hebb protocol again significantly impact SNR memory lifetimes, with the impact being greater for larger Θ or s in the filter and serial models. Thus, as with SU synapses under the Hebb protocol, SNR lifetimes exist only in some interval of f (below f = 1), with this interval shrinking and disappearing as the number of internal states increases (or as p decreases for SU synapses). These dynamics dramatically limit the number of internal states that give rise to positive SNR lifetimes. Nevertheless, as Θ or s increases, rτ_snr in general increases, at least until the permissible range of f becomes very small and then disappears entirely. For the cascade model, however, the upper limit on f is, roughly speaking, independent of s, but we also see that in general, as s increases, rτ_snr decreases for fixed f. This relative insensitivity of the upper limit of the permissible range of f to s in the cascade model occurs because the cascade model has different metastates with different update probabilities, with some synapses residing in the lower metastates and so having larger update probabilities than those residing in higher metastates.
In the presence of spontaneous activity, we see a dramatic change in memory lifetimes. Indeed, such is the sensitivity of the Hebb protocol to ζ, especially for complex synapse models, that in contrast to Fig. 6 for the Hopfield protocol, for which we took ζ = 0.1, in Fig. 7 we take ζ = 0.01. Even with just 1% spontaneous activity, the range of internal states in the filter and serial models that gives rise to positive SNR lifetimes becomes severely restricted. The cascade model under the Hebb protocol is not quite so sensitive, again because of its different metastates, but a 10% level of spontaneous activity would still dramatically restrict the permissible ranges of f and s, compared to the Hopfield protocol.
We quantify these observations by explicitly considering the optimal choices of the parameters f and either Θ or s, so f_opt and either Θ_opt or s_opt, that maximise rτ_snr, giving rise to rτ_snr^opt. In Figs. 8 and 9, we plot f_opt and rτ_snr^opt against Θ or s, for different levels of spontaneous activity, ζ, for the particular choice of N = 10^5. Results are obtained both by numerical matrix methods and by using the approximations for μ(t) and rτ_snr given in Sect. 3.2. For the latter, we maximise rτ_snr as a function of f for fixed Θ or s. For the Hopfield protocol in Fig. 8, we see that for ζ = 0, rτ_snr^opt falls as a function of Θ or s. However, in the filter and serial models, as ζ increases, the fall in rτ_snr^opt with Θ or s reduces and disappears; indeed, the exact results in fact show a very slight increase in rτ_snr^opt with Θ or s for ζ = 1, although this behaviour is not noticeable in Fig. 8. For the displayed choice of N = 10^5, we need only take ζ ≈ 0.1 for the filter and serial models' rτ_snr^opt to be relatively insensitive to Θ or s. This is N-dependent: for N = 10^6, even ζ = 0.01 is sufficient; for N = 10^3, ζ needs to be quite close to unity. In contrast, for the cascade model, rτ_snr^opt always falls with s for any choice of ζ, including ζ = 1. The behaviour of the filter and serial models' rτ_snr^opt is easy to extract from the approximate results in Sect. 3.2. Ignoring for simplicity the correction terms in Eq. (27), both the filter and serial models' rτ_snr can be written in the form (cf. Eq. (33))

rτ_snr = (a q²/f²) ln[b f² N/(q²(f + (1 − f)ζ²))],   (34)

where a and b are numerical constants and q denotes Θ or s. For ζ = 0 we obtain f_opt = q²√e/(bN) and rτ_snr^opt ≈ ab²N²/(2eq²), while for ζ = 1 we obtain f_opt = q√e/√(bN) and rτ_snr^opt ≈ abN/e. Therefore, f_opt scales differently with q and with N in these two cases, and for ζ = 0, rτ_snr^opt falls as q increases, but for ζ = 1, rτ_snr^opt is completely independent of q.
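These closed forms can be checked numerically. The sketch below, assuming purely illustrative values a = b = 1, q = 5 and N = 10^5 (all placeholders), maximises the approximate expression for rτ_snr over f by a log-spaced grid search and compares the result against the ζ = 0 and ζ = 1 closed forms.

```python
import math

def r_tau_snr(f, q, N, zeta, a=1.0, b=1.0):
    # Approximate filter/serial-model SNR lifetime (cf. Eq. (34)):
    # r tau_snr = (a q^2/f^2) ln[b f^2 N/(q^2 (f + (1 - f) zeta^2))]
    arg = b * f * f * N / (q * q * (f + (1.0 - f) * zeta * zeta))
    return (a * q * q / (f * f)) * math.log(arg) if arg > 1.0 else 0.0

def maximise_over_f(q, N, zeta, n_grid=200_000):
    # Log-spaced grid search for f_opt over [1e-6, 1]
    best_f, best_val = 1.0, 0.0
    for i in range(n_grid + 1):
        f = 10.0 ** (-6.0 + 6.0 * i / n_grid)
        val = r_tau_snr(f, q, N, zeta)
        if val > best_val:
            best_f, best_val = f, val
    return best_f, best_val

q, N = 5.0, 10 ** 5
f0, v0 = maximise_over_f(q, N, zeta=0.0)   # closed form: f_opt = q^2 sqrt(e)/(b N)
f1, v1 = maximise_over_f(q, N, zeta=1.0)   # closed form: f_opt = q sqrt(e/(b N))
print(f0, q * q * math.sqrt(math.e) / N)   # numeric vs analytic f_opt, zeta = 0
print(v0, N * N / (2.0 * math.e * q * q))  # numeric vs analytic a b^2 N^2/(2 e q^2)
print(f1, q * math.sqrt(math.e / N))       # numeric vs analytic f_opt, zeta = 1
print(v1, N / math.e)                      # numeric vs analytic a b N/e
```

The agreement of the grid search with the closed forms confirms that the two stated optima follow from Eq. (34) alone.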
Intermediate choices of ζ result in intermediate behaviours between these two extremes, and the correction terms in Eq. (27) provide only corrections to, rather than fundamentally altering, this behaviour. We see in Fig. 8 that the numerical and approximate analytical results agree well for the filter and serial models, and that, moreover, both these models' optimal values are very similar. Unfortunately, in the case of the cascade model, no such simple analysis, even using the fitted form for μ_cas(t) in Eq. (32), is available to explain the fact that rτ_snr^opt falls with s for all values of ζ, including ζ = 1. The numerical and fitted results for rτ_snr^opt agree well in the cascade model, although there are quite large discrepancies between the values of f_opt obtained from the (exact) numerical methods and from the fitted expression, particularly for larger values of s and for ζ closer to zero than to unity. Fitting our numerical matrix results for rτ_snr^opt in the cascade model to power laws in s and N for large enough s, we find that for ζ = 0, f_opt ∼ s²/N and rτ_snr^opt ∼ N²/s⁴, while for ζ = 1, f_opt ∼ s/√N and rτ_snr^opt ∼ N/s². While the scaling behaviour of f_opt is the same as that in the filter and serial models, the dependence of rτ_snr^opt on q ≡ s differs in the cascade model compared to the filter and serial models.
For the Hebb protocol in Fig. 9, again the pairwise correlation structure present in σ(t)², and the Hebb protocol's extreme sensitivity to even very small levels of spontaneous activity ζ, have a significant impact on optimality conditions. In the filter and serial models, the permissible range of Θ or s is considerably reduced, so that even for ζ = 0.01, s cannot exceed 6 in the serial synapse model, nor Θ exceed 5 in the filter model. As N is reduced from the displayed value of N = 10^5, the permissible ranges of Θ and s reduce further. The cascade model in the Hebb protocol is also extremely sensitive to noise, but as discussed, the different metastates' different update probabilities somewhat ameliorate this sensitivity. Nevertheless, increasing ζ from ζ = 0 to just ζ = 0.1 reduces rτ_snr^opt by several orders of magnitude.
In Figs. 10 and 11 we instead examine Θ_opt or s_opt as a function of f, rather than vice versa, so that we maximise rτ_snr with respect to Θ or s while holding f fixed. For the Hopfield protocol in Fig. 10, subject to a minimum, threshold requirement, Θ_opt or s_opt increases as a function of f, for any level of spontaneous activity ζ, in all three complex models considered here. However, as ζ moves from ζ = 0 to ζ = 1, the functional dependence of Θ_opt or s_opt on f changes. We can derive this explicitly by again using the simple expression for rτ_snr in Eq. (34) for the filter and serial models. The optimal value of q (either Θ or s) is

q_opt = f √(bN/(e[f + (1 − f)ζ²])).

Thus, as f increases, q_opt essentially switches from linear growth in f to slower, √f growth, at around f ≈ ζ². This behaviour is clearer for the smaller nonzero choices of ζ used in Fig. 10. The corrections due to the additional terms in Eq. (27) do not fundamentally change this behaviour for the filter model. The corresponding optimal SNR memory lifetime is

rτ_snr^opt = abN/(e[f + (1 − f)ζ²]).

For ζ = 0, rτ_snr^opt decreases as f increases, but for ζ = 1, rτ_snr^opt is independent of f. As f increases, the transition from rτ_snr^opt being independent of f to falling as 1/f is again sharp, occurring around f ≈ ζ². This transition is clear for the filter and serial models in Fig. 10. In the case of the cascade model, however, although s_opt increases with f, albeit according to clearly different power laws than for the filter and serial models, the corresponding value of rτ_snr^opt always decreases as a function of f, regardless of ζ.
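The crossover in q_opt can likewise be verified numerically. The sketch below (again with placeholder constants a = b = 1 and N = 10^5) maximises rτ_snr over q at fixed f, checks the closed form q_opt = f√(bN/(e[f + (1 − f)ζ²])), and exhibits the switch from linear to √f growth around f ≈ ζ².

```python
import math

def r_tau_snr(f, q, N, zeta, a=1.0, b=1.0):
    # r tau_snr = (a q^2/f^2) ln[b f^2 N/(q^2 (f + (1 - f) zeta^2))], cf. Eq. (34)
    arg = b * f * f * N / (q * q * (f + (1.0 - f) * zeta * zeta))
    return (a * q * q / (f * f)) * math.log(arg) if arg > 1.0 else 0.0

def q_opt_closed(f, N, zeta, b=1.0):
    # q_opt = f sqrt(b N/(e [f + (1 - f) zeta^2]))
    return f * math.sqrt(b * N / (math.e * (f + (1.0 - f) * zeta * zeta)))

def q_opt_numeric(f, N, zeta, n_grid=200_000):
    # Log-spaced grid search for the maximising q over [1e-3, 1e3]
    best_q, best_val = None, -1.0
    for i in range(1, n_grid + 1):
        q = 10.0 ** (-3.0 + 6.0 * i / n_grid)
        val = r_tau_snr(f, q, N, zeta)
        if val > best_val:
            best_q, best_val = q, val
    return best_q

N, zeta = 10 ** 5, 0.1
f_small, f_large = 1e-5, 0.4        # well below and well above f = zeta^2 = 0.01
qn = q_opt_numeric(f_small, N, zeta)
qc = q_opt_closed(f_small, N, zeta)
print(qn, qc)                        # numeric vs closed-form q_opt
# growth of q_opt under a doubling of f: ~2 below the switch, ~sqrt(2) above it
ratio_small = q_opt_closed(2 * f_small, N, zeta) / q_opt_closed(f_small, N, zeta)
ratio_large = q_opt_closed(2 * f_large, N, zeta) / q_opt_closed(f_large, N, zeta)
print(ratio_small, ratio_large)
```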
For the Hebb protocol in Fig. 11, again the small f behaviour must be identical to that for the Hopfield protocol in Fig. 10. However, the increase in Θ_opt or s_opt with increasing f is halted and then reversed as f increases further, as the effects of the pairwise synaptic correlations induced by the Hebb protocol are felt. These correlations not only pull down the optimal value of Θ_opt or s_opt, but they also have a deleterious effect on rτ_snr^opt, changing the 1/f behaviour of the filter and serial models in the Hopfield protocol to approximately 1/f³ behaviour in the Hebb protocol (obtained by fitting), for ζ = 0. Furthermore, while spontaneous activity can make rτ_snr^opt independent of f before the switch to 1/f behaviour in the Hopfield protocol for the filter and serial models, in the Hebb protocol, rτ_snr^opt always decreases with increasing f, for all three complex models considered here.

Fig. 10 Optimal synaptic complexity in complex synapse models for single perceptrons in the Hopfield protocol. The format of this figure is very similar to Fig. 8, except that we have optimised with respect to Θ or s rather than f. In panels A and C the lines switch from numerical matrix to approximate analytical results when the corresponding values of Θ_opt or s_opt exceed 20 in the right-hand panels; before this transition, the lines correspond to numerical matrix results and the discrete points to approximate analytical results. We have set N = 10^5 in all panels.

Fig. 11 Optimal synaptic complexity in complex synapse models for single perceptrons in the Hebb protocol. The format of this figure is essentially identical to Fig. 10, except that approximate analytical results are not available for the Hebb protocol. We have set N = 10^5 in all panels.

Population memory lifetimes
We now turn to population SNR memory lifetimes. Because SNR_p(t) ≈ √(gP) SNR(t), optimisation of τ_pop with respect to Θ or s is not affected by the additional overall factor of √(gP). When we instead optimise population SNR lifetimes with respect to f = g, however, the additional factor of √g in μ_p(t)/σ_p(t) compared to μ(t)/σ(t) functionally changes the optima compared to those for a single perceptron, and so we focus on this case. Because of the independence approximation involved in estimating τ_pop in Sect. 2.4, τ_pop is only an upper bound on population SNR lifetimes, and this will be implicit below. For simple, SU synapses, Eq. (20) indicates that τ_snr and τ_pop differ in the logarithmic term, with the former having argument proportional to f²p²N/[f + (1 − f)ζ²] and the latter proportional to f³p²N̄/[f + (1 − f)ζ²]. We therefore see immediately that single-perceptron SNR lifetimes with ζ = 1 and population SNR lifetimes with ζ = 0 have identical f-dependence. For single-perceptron lifetimes with ζ = 0 and ζ = 1 and population lifetimes with ζ = 0 and ζ = 1, the f-dependence under the logarithm is f¹, f², f² and f³, respectively. The effective power of f switches rapidly in the vicinity of f = ζ², in an N- or N̄-dependent way. Because N̄ = PN, we expect very rapid switching in the population case, with only very small, even negligible levels of spontaneous activity being required to induce the change in effective power. Above we found for single-perceptron optimal lifetimes that rτ_snr^opt ≈ p³N²/(4e) at f_opt = √e/(p²N) for ζ = 0, and rτ_snr^opt ≈ pN/(2e) at f_opt = √e/(p²N)^{1/2} for ζ = 1. For optimal population lifetimes these become rτ_pop^opt ≈ pN̄/(2e) at f_opt = √e/(p²N̄)^{1/2} for ζ = 0, and rτ_pop^opt ≈ 3p^{1/3}N̄^{2/3}/(4e) at f_opt = √e/(p²N̄)^{1/3} for ζ = 1. Spontaneous activity thus changes the N-dependence of rτ_snr^opt from N² to N, and the N̄-dependence of rτ_pop^opt from N̄ to N̄^{2/3}, which latter is a smaller overall reduction, although in all cases the dependence on p involves a positive power.
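The switch in the effective power of f under the logarithm can be made concrete with a numerical log-derivative. The sketch below computes d ln[f^n/(f + (1 − f)ζ²)]/d ln f with n = 2 (single perceptron) and n = 3 (population), for the illustrative choice ζ = 0.1.

```python
import math

def log_arg(f, zeta, n):
    # Logarithm of the f-dependent part of the argument:
    # f^n/(f + (1 - f) zeta^2), n = 2 single perceptron, n = 3 population
    return n * math.log(f) - math.log(f + (1.0 - f) * zeta * zeta)

def effective_power(f, zeta, n, eps=1e-6):
    # Central-difference estimate of d ln(arg)/d ln f
    lf = math.log(f)
    return (log_arg(math.exp(lf + eps), zeta, n) -
            log_arg(math.exp(lf - eps), zeta, n)) / (2.0 * eps)

zeta = 0.1
p_lo = effective_power(1e-6, zeta, n=2)   # f << zeta^2: effective power ~2
p_hi = effective_power(0.9, zeta, n=2)    # f >> zeta^2: effective power ~1
q_lo = effective_power(1e-6, zeta, n=3)   # population: ~3
q_hi = effective_power(0.9, zeta, n=3)    # population: ~2
print(p_lo, p_hi, q_lo, q_hi)
```

The effective power interpolates sharply between its two limits in the vicinity of f = ζ², as stated in the text.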
Because the dominant behaviour of SNR lifetimes in the filter and serial models is governed by a similar, single logarithmic term, many of these scaling observations for simple synapses carry over unchanged to these complex synapses.
We examine the behaviour of optimal population SNR lifetimes for complex synapses in Fig. 12. Compared to the single-perceptron optimal SNR lifetimes in Fig. 8, the population results in Fig. 12 are markedly different, particularly for the filter and serial models. For these models, with ζ = 0, rτ_pop^opt is now approximately independent of Θ or s, while with ζ > 0, rτ_pop^opt grows as a function of Θ or s. Even with ζ = 0.01, this growth is present and of almost the same profile as that for ζ = 1, while at the single-perceptron level, for smaller choices of N it is necessary to take ζ close to unity to halt the decrease in rτ_snr^opt with increasing Θ or s. This sensitivity to small, nonzero values of ζ at the population level is N̄-dependent, but even with N̄ = 10^6 (e.g. N = 10^3 and P = 10^3), we only require ζ = 0.1 for rτ_pop^opt to adopt the same profile as that for ζ = 1. For the cascade model, however, optimal population SNR lifetimes fall with s just as they do for single perceptrons. Nonzero ζ does render rτ_pop^opt nearly independent of s for small s (s ≤ 6), but for larger s, rτ_pop^opt falls with s. We may quantify the filter and serial models' population SNR lifetimes as before, using the slowest decaying modes. We obtain

rτ_pop = (a q²/f²) ln[b f³ N̄/(q²(f + (1 − f)ζ²))],

where now we have f³ rather than f² in the numerator of the logarithm, just as for SU synapses. The optimal values of f are now f_opt = q√e/√(bN̄) (cf. q²√e/(bN)) for ζ = 0 and f_opt = q^{2/3}√e/(bN̄)^{1/3} (cf. q√e/√(bN)) for ζ = 1. The corresponding optimal memory lifetimes are rτ_pop^opt = abN̄/e (cf. ab²N²/(2eq²)) and rτ_pop^opt = 3a(bqN̄)^{2/3}/(2e) (cf. abN/e), respectively. The corrections due to the additional terms in the filter model's results again modify but do not fundamentally alter this behaviour. Thus, at the population level, for ζ = 0, the filter and serial models' optimal SNR lifetimes are independent of q, while at the single-perceptron level, they fall as 1/q².
However, for ζ = 1, the population lifetimes grow as q^{2/3}, while for a single perceptron, they are constant. We cannot obtain similar analytical results for the cascade model, so we fit the numerical results for the cascade model in Fig. 12 to power laws in s and N̄. We find that for larger values of s, at the population level rτ_pop^opt ∼ N̄/s² with f_opt ∼ s/√N̄ for ζ = 0 (cf. N²/s⁴ and s²/N, respectively, for a single perceptron), and rτ_pop^opt ∼ N̄^{2/3}/s^{4/3} with f_opt ∼ s^{2/3}/N̄^{1/3} for ζ = 1 (cf. N/s² and s/√N, respectively, for a single perceptron), with the same rapid switching behaviour for intermediate ζ as for the filter and serial models. The population dynamics soften the fall of rτ_pop^opt with s, but not enough to turn the dependence into growth with s.
In Table 2 we summarise the scaling behaviour of rτ^opt and f_opt as functions of either p or q and of either N or N̄, for simple and complex synapses, for both single-perceptron and population results, for ζ = 0 and ζ = 1. In each column, regardless of the model, f_opt scales identically as a function of q or p⁻¹ and of N or N̄. This is not surprising given that the SU, filter and serial model results for f_opt come from the same dominating logarithmic behaviour, but the cascade results are obtained by fitting numerical matrix data to power laws to extract the behaviour. For τ^opt, we also obtain the same scaling behaviour as a function of N or N̄ within each column, again regardless of the model. However, the scaling of τ^opt with q (or p⁻¹) within each column does depend on the particular model of plasticity. Across a row, moving from single-perceptron ζ = 0 and ζ = 1 results to population ζ = 0 and ζ = 1 results, the dependence of τ^opt on q (or p⁻¹) changes in such a way that increasing q (or decreasing p) has an increasingly less deleterious effect on memory lifetimes for SU and cascade synapses. For SU synapses, the power of p reduces from 3 to 1 to 1/3, while for cascade synapses the power of q (or s) changes from −4 to −2 to −4/3. For both SU and cascade synapses, optimal memory lifetimes therefore always decrease as p decreases or s (the number of metastates) increases, regardless of the level of spontaneous activity, and regardless of whether at a single-perceptron or population level. For filter and serial synapses, however, the power of q changes from −2 to 0 (i.e. no dependence) to +2/3. Increasing the number of serial metastates or filter states available to a filter or serial synapse therefore increases optimal population SNR lifetimes, but only in the presence of spontaneous activity. As Fig. 12 indicates, we need only have very low levels of spontaneous activity to induce this growth of optimal population SNR lifetimes with the number of internal states available to filter or serial synapses.

Fig. 12 Optimal sparseness in complex synapse models for neuronal populations in the Hopfield protocol. The format of this figure is essentially identical to that of Fig. 8, which shows results for the single-perceptron case. Lines show numerical solutions of the equation μ_p(τ_pop)/σ_p(τ_pop) = 1 maximised with respect to f, so rτ_pop^opt at f = f_opt, while data points show approximate analytical results. We have set N = 10^4 and P = 10^8, or N̄ = 10^12, in all panels.
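The scalings just quoted can be restated compactly in code. The dictionary below is a transcription of the exponents given in the text (with q = Θ, s or p⁻¹ as appropriate); it records the powers of q and of N (or N̄) in τ^opt for each model, and checks that the f_opt exponents are model-independent within each column.

```python
# Exponents (power_of_q, power_of_N) for tau_opt in the four columns:
# single-perceptron zeta=0, single-perceptron zeta=1,
# population zeta=0, population zeta=1.
tau_opt_scaling = {
    "SU":            [(-3, 2), (-1, 1), (-1, 1), (-1/3, 2/3)],  # q = 1/p
    "cascade":       [(-4, 2), (-2, 1), (-2, 1), (-4/3, 2/3)],
    "filter/serial": [(-2, 2), ( 0, 1), ( 0, 1), ( 2/3, 2/3)],
}
# f_opt scales identically in every model: exponents (power_of_q, power_of_N)
f_opt_scaling = [(2, -1), (1, -1/2), (1, -1/2), (2/3, -1/3)]

# The N- (or N-bar-) dependence of tau_opt is the same within each column,
# regardless of the model
n_powers = [{pw for _, pw in col} for col in zip(*tau_opt_scaling.values())]
print(n_powers)   # each set has a single element: 2, 1, 1, 2/3

# Only filter/serial synapses ever reach a positive power of q (growth of the
# optimal lifetime with synaptic complexity), and only at the population level
# in the presence of spontaneous activity
positive = {m: any(pq > 0 for pq, _ in cols)
            for m, cols in tau_opt_scaling.items()}
print(positive)
```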

Discussion
Memory is a complex, multi-level, system-wide phenomenon involving processes occurring over many time scales and across different brain regions, with integrated and orchestrated control processes coordinating, for example, the transition from short- to long-term memory (Eichenbaum and Cohen 2001). Palimpsest models of memory, in which older memories are forgotten as newer ones are stored (Nadal et al. 1986; Parisi 1986), focus on the dynamics of memory storage and retrieval within a single memory system, such as the hippocampal CA3 recurrent network (Andersen et al. 2007). Sparse population coding (see, for example, Csicsvari et al. 2000; Olshausen and Field 2004) enhances memory lifetimes in these memory models by reducing the overall rate of synaptic plasticity at single synapses, so effectively dilating time, and by decorrelating the synaptic updates induced by overlapping memories (Tsodyks and Feigel'man 1988). Complex synapse models, involving metaplastic changes in synapses' internal states without associated changes in synaptic strength, have also been proposed as a way in which to enhance memory lifetimes in palimpsest models (Fusi et al. 2005), whereas we introduced models of integrate-and-express, filter-based synapses as a means of enhancing the stability of developmental patterns of synaptic connectivity in stochastic models of synaptic plasticity (Elliott 2008; Elliott and Lagogiannis 2009).
Understanding the interaction between sparseness and synaptic complexity in palimpsest memory models is therefore crucial (Leibold and Kempter 2008; Rubin and Fusi 2007). Taken at face value, our results for SNR single-perceptron memory lifetimes support two conclusions. First, when optimised with respect to sparseness, longer optimal single-perceptron SNR lifetimes require lower synaptic complexity. Second, when optimised instead with respect to synaptic complexity, longer optimal single-perceptron SNR lifetimes again require lower synaptic complexity as sparseness increases. These conclusions hold regardless of the level of spontaneous activity, although spontaneous activity can prevent the decrease in optimal single-perceptron memory lifetimes. These conclusions appear to argue in favour of reduced synaptic complexity in real neurons in the presence of sparse population coding, at least at the single-neuron level. However, at the population level, the first of these conclusions is overturned, at least for filter and serial synapses. Critically, even in the presence of low but nonzero levels of spontaneous activity, optimal population SNR lifetimes, optimised with respect to sparseness, increase rather than decrease with synaptic complexity, for filter and serial synapses but not for cascade synapses. At a population level, sparseness, synaptic complexity and, crucially, nonzero spontaneous activity interact to promote increased optimal population SNR memory lifetimes. It is remarkable that non-cascade complex synapse models therefore appear to require the existence of spontaneous activity in a population setting with sparse population coding.
In reaching these conclusions, we have employed two superficially rather different memory storage protocols. First, the Hebb protocol uses two-level inputs ξ_i ∈ {ζ, 1}, while the Hopfield protocol uses four-level inputs ξ_i ∈ {−1, −ζ, +ζ, +1}, although in stripping out spontaneous activity, the latter reduces to the standard conventions of the Hopfield model with its two-level inputs ξ_i ∈ {−1, +1}. However, because we have considered binary strength synapses with S_i ∈ {−1, +1}, the effective contributions of these two sets of inputs to a perceptron's activation are identical: in both protocols, ξ_i S_i takes the same four possible values. Second, the Hebb protocol uses the cue and target sub-populations approach to determine the direction of synaptic plasticity, while the Hopfield protocol uses the standard Hopfield rule governed by the product of evoked pre- and postsynaptic activity. However, in both protocols, synapses experience identical potentiating and depressing plasticity induction signals at the same separate rates, r f g/2. Furthermore, in both protocols, these induction signals are produced by imposing a pattern of electrical activity on the sub-population of active neurons during memory storage, rather than by allowing neurons' activities to be generated via direct, afferent synaptic drive. Both protocols therefore implicitly assume executive control of memory storage by other brain regions (see, for example, Eichenbaum and Cohen 2001). These two differences are indeed therefore just superficial, and this is reflected in the fact that the mean activation μ(t) evolves identically under both protocols.

Table 2 Overall dependence of optimal single-perceptron and population SNR memory lifetimes and the corresponding optimal sparseness on model parameters. Here, q represents Θ or s, depending on the complex model, and is assumed large.
The real difference between the Hebb and Hopfield protocols does not reside in these matters of convention and definition. Rather, it resides in the fact that an active perceptron's synapses with active inputs experience either only potentiating or only depressing induction signals during memory storage under the Hebb protocol, while in the Hopfield protocol some experience potentiating and others depressing induction signals. This difference gives rise to the Hebb protocol's complicated equilibrium structure, with its nonzero pairwise and higher-order synaptic correlation functions. Remove this higher-order structure, and the two protocols would have identical statistics for perceptron activation. Indeed, in the limit of small f gN in which at most one of a perceptron's synapses experiences a plasticity induction signal during memory storage, the dynamical difference between the two protocols vanishes and their statistical structures become identical. Two earlier studies have considered memory lifetimes in complex models of synaptic plasticity in the presence of sparse population coding (Leibold and Kempter 2008;Rubin and Fusi 2007). Leibold and Kempter (2008) used the cue-target protocol that we have adapted and referred to as the Hebb protocol. They employed synaptic strengths S i ∈ {0, 1} rather than our S i ∈ {−1, +1}, although this difference is unimportant because it just amounts to an effective re-definition of the firing threshold ϑ (Elliott and Lagogiannis 2012). They also employed two-level activities, but with ξ i ∈ {0, 1}, so without considering the possible influence of nonzero spontaneous activity, ζ > 0, on memory lifetimes. Rubin and Fusi (2007) used the Hopfield protocol with two-level activities, ξ i ∈ {−1, +1}, interpreting ξ i = −1 as spontaneous activity and ξ i = +1 as evoked activity, and stressed the importance of considering the impact of spontaneous activity on memory lifetimes. 
We have modelled spontaneous activity in the Hopfield protocol by moving to four-level inputs, but as indicated, this approach is essentially equivalent to two-level inputs for synapses with S i ∈ {−1, +1} in terms of the overall statistical structure of perceptron activation. However, by using four activity levels, we are able to consider varying ζ over its allowed range in order to explore the impact of different degrees of spontaneous activity on memory lifetimes. A significant difference between our approach and that of Rubin and Fusi (2007) is that we do not allow spontaneous activity to induce synaptic plasticity, a position that we consider to be mandated by a broadly BCM (Bienenstock et al. 1982) view of synaptic plasticity, as discussed earlier. Finally, our respective definitions of the memory signal, from which SNR memory lifetimes are obtained, differ in a population setting. Rubin and Fusi (2007) define this signal over the entire population of neurons, while we define it over only that sub-population of neurons that are directly involved in memory storage (or the equivalent of Leibold and Kempter (2008)'s target sub-population). This difference leads to different scaling behaviours of optimal population SNR memory lifetimes as a function of the sparseness of the population coding.
The difference between the scaling behaviours of optimal SNR memory lifetimes (optimised with respect to sparseness) in the single-perceptron and population cases is intriguing. Furthermore, the role of even very small levels of spontaneous activity in enhancing optimal population SNR lifetimes with increasing synaptic complexity in non-cascade models is fascinating. However, we have cautioned against over-interpreting results from an SNR analysis of memory lifetimes. This analysis depends on the distribution of h₀ being tightly concentrated around its supra-threshold mean. We have shown in earlier work that this requirement is often not satisfied, and that a FPT approach is required to examine memory lifetimes away from this regime (Elliott 2016a, 2017a, 2020). Here, for simple synapses, we have explicitly seen that the single-perceptron SNR analysis breaks down in the limit of small f, and so it cannot probe the very sparse coding regime. The explanation for this failure is straightforward: as f is reduced, the initial SNR μ(0)/σ(0) reduces, and below some threshold value of f the SNR validity condition μ(0)/σ(0) ≫ 2 fails. For a single perceptron, we saw that this condition is N f²p²/[f + (1 − f)ζ²] ≫ 4 (for either protocol). Plugging in f_opt for SU synapses with ζ = 0 and ζ = 1, this condition becomes √e ≫ 4 and e ≫ 4, respectively, where we saw the former case earlier. Both conditions are violated, although with spontaneous activity the violation is not so great. Although we have not extended our FPT analysis of filter-based synapses (Elliott 2017a, 2020) to the sparse coding regime considered here, the same issues arise with complex synapses. Therefore, we fully expect single-perceptron SNR optimality conditions to be violated for complex synapses, too. Whether population SNR optimality conditions are violated, in either simple or complex models, is unclear.
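The violation of the validity condition at f_opt can be checked directly. The sketch below plugs the SU closed forms for f_opt into N f²p²/[f + (1 − f)ζ²] for the illustrative choice p = 0.1, N = 10^6, recovering √e ≈ 1.65 and e ≈ 2.72, both below 4.

```python
import math

def validity_lhs(f, p, N, zeta):
    # LHS of the SNR validity condition N f^2 p^2/[f + (1 - f) zeta^2] >> 4
    return N * f * f * p * p / (f + (1.0 - f) * zeta * zeta)

p, N = 0.1, 10 ** 6
f_opt_0 = math.sqrt(math.e) / (p * p * N)        # SU f_opt, zeta = 0
f_opt_1 = math.sqrt(math.e) / (p * math.sqrt(N)) # SU f_opt, zeta = 1: sqrt(e)/(p^2 N)^(1/2)

c0 = validity_lhs(f_opt_0, p, N, zeta=0.0)   # evaluates to sqrt(e) ~ 1.65
c1 = validity_lhs(f_opt_1, p, N, zeta=1.0)   # evaluates to e ~ 2.72
print(c0, c1, "both <", 4)
```

Note that the p- and N-dependence cancels entirely at the optimum, so the violation is independent of the particular parameter choices.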
We would need to extend our single-perceptron FPT analysis to a population setting. Furthermore, this extended analysis would need to be reducible to the population SNR analysis with its rather coarse approximation that neurons' activities evolve independently, despite synaptic coupling. However, it is extremely tempting to speculate that the simple synapse condition for population SNR validity is just the obvious generalisation, namely μ_p(0)/σ_p(0) ≫ 2. Using the population results for f_opt for simple synapses, this condition becomes the false e ≫ 4 for ζ = 0 and the true e^{3/2} ≫ 4 for ζ = 1. It is thus quite remarkable that if this speculation is borne out by a more careful analysis, then optimal population SNR memory lifetimes for simple synapses are valid in the presence of spontaneous activity.

A Transition matrix elements and jump moments

Let N_eff of a perceptron's synapses be active during the storage of ξ⁰. Consider first the Hebb protocol. Immediately before the storage of any subsequent non-tracked memory, let i of these N_eff synapses have strength +1 and j of the other N − N_eff synapses have strength +1. Then,

N h′ = (2i − N_eff) + ζ[2j − (N − N_eff)].

Similarly, immediately after the storage of this non-tracked memory, let k and l synapses out of the N_eff and N − N_eff synapses, respectively, have strength +1, so that

N h = (2k − N_eff) + ζ[2l − (N − N_eff)].

Then, the transition operator T_N = (1 − g) I ⊗ ··· ⊗ I + (g/2) K₊ ⊗ ··· ⊗ K₊ + (g/2) K₋ ⊗ ··· ⊗ K₋ induces the corresponding transition probability Prob[k, l | i, j], in which δ_{i,j} is the Kronecker delta function and C(n, m) p^m (1 − p)^{n−m} is the binomial probability distribution, with C(n, m) being the binomial coefficient. The three terms correspond to the three parts of T_N, with the last two just being products of the binomial probabilities for the possible ways in which the different sets of synapses can change strength to give the required transition process. We can also obtain a similar result for the Hopfield protocol. In this case, let i of the N_eff synapses and j of the other N − N_eff synapses contribute positively to h′; and similarly k and l, respectively, to h. For example, a synapse with a component ξ⁰ = +1 (or +ζ) and S(t) = +1 and a synapse with a component ξ⁰ = −1 (or −ζ) and S(t) = −1 both contribute positively to i (or j). We then again have N h′ = (2i − N_eff) + ζ[2j − (N − N_eff)], and similarly for N h. Then, from T_N = (1 − g) I ⊗ ··· ⊗ I + g K ⊗ ··· ⊗ K, we obtain the corresponding transition probability, which is, up to a multiplicative factor, the h′-independent part of B(h | N_eff).
To obtain FPT lifetimes from the FPE approach, we must solve Eq. (6) with its N_eff-dependent second jump moment for any particular choice of N_eff. For this given value, we must average the resulting solution τ_mfpt(h₀ | N_eff) over the initial distribution of h₀, which also depends on N_eff, and then average over N_eff according to its binomial distribution, obtaining τ_mfpt = ⟨⟨τ_mfpt(h₀ | N_eff)⟩_{h₀>ϑ}⟩_{N_eff}. However, for large enough f N, it is sufficient to average B₀(N_eff) over the distribution of N_eff and just use the average jump moment ψ g ⟨B₀(N_eff)⟩_{N_eff} in Eq. (6). This "pre-averaging" method is similar to a mean field approximation, but goes beyond just replacing N_eff with its mean value, f N. We then average the resulting solution τ_mfpt(h₀) over h₀, where the unconditional statistics of h₀ = h(0) are given in Eqs. (7) and (11) with the correlations in Eq. (19). Because the FPE is valid only to second order, it suffices to take the distribution of h₀ as a Gaussian with these first- and second-order statistics.
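To illustrate why pre-averaging goes beyond replacing N_eff by its mean, the toy sketch below uses a hypothetical quadratic jump moment B₀(n) = n² (purely illustrative; the actual B₀ is model-specific and not reproduced here) and compares ⟨B₀(N_eff)⟩ over the binomial distribution of N_eff with the naive plug-in B₀(f N).

```python
import math

def binom_pmf(m, n, p):
    # C(n, m) p^m (1 - p)^(n - m)
    return math.comb(n, m) * p ** m * (1.0 - p) ** (n - m)

def b0_toy(n_eff):
    # Hypothetical, purely illustrative jump moment: quadratic in n_eff
    return float(n_eff * n_eff)

N, f = 100, 0.1
# Pre-averaging: expectation of B0 over N_eff ~ Binomial(N, f)
pre_avg = sum(binom_pmf(n, N, f) * b0_toy(n) for n in range(N + 1))
# Naive mean-field substitution: B0 evaluated at the mean f N
mean_field = b0_toy(f * N)
print(pre_avg, mean_field)   # E[n^2] = (fN)^2 + N f (1 - f) vs (fN)^2
```

For any nonlinear B₀, the two differ by fluctuation terms of order N f(1 − f), which is why pre-averaging retains information that the plain substitution N_eff → f N discards.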
Although Prob[k, l | i, j] determines Prob[h | h′] uniquely, the reverse is in general not the case in the presence of nonzero spontaneous activity. In particular, the equation N h′ = (2i − N_eff) + ζ[2j − (N − N_eff)] may have multiple solutions for i and j, given a value of h′, depending on the value of ζ. To avoid this awkwardness, for determining FPT lifetimes according to the exact, MIE approach of Eq. (4) or its continuum limit, we restrict to the specific case of ζ = 0. Then, the contributions to h(t) from the inactive inputs drop out, and we need only work with Prob[k | i], with the transition processes involving j and l being irrelevant. Since N h′ = 2i − N_eff and N h = 2k − N_eff, Prob[k | i] uniquely determines Prob[h | h′] and vice versa. As with Eq. (6), Eq. (4) is then conditioned on N_eff synapses contributing positively to the perceptron's activation during the storage of ξ⁰, and so we must also solve Eq. (4) for each value of N_eff and compute the same double average, τ_mfpt = ⟨⟨τ_mfpt(h₀ | N_eff)⟩_{h₀>ϑ}⟩_{N_eff}. Where feasible, we always perform this exact calculation. For larger choices of N and values of f closer to unity (for which f N remains sizeable), we move to the mean field approximation, setting N_eff = f N (or its closest integer), which is an excellent approximation that makes the calculations tractable. For even larger N (N = 10^6), we move to the integral equation form of Eq. (4), corresponding to a continuum limit for h. This limit also works well for smaller values of N, but we prefer exact methods where possible. Unlike the FPE approach, in the MIE approach we need the exact distribution of h₀, or a good approximation to it, to average correctly over h₀, and so we need the equilibrium distribution of all synapses' strengths. We give the details of the calculation for the Hebb protocol in Appendix B.
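A minimal sketch of this kind of exact, conditioned first-passage calculation for SU synapses with ζ = 0 is given below, with illustrative parameter values. It assumes the SU transition structure described above: no update with probability 1 − g, and a potentiating or depressing induction signal, each with probability g/2, flipping each eligible synapse independently with probability p. The MFPT to the absorbing set k ≤ k_th (i.e. h at or below the corresponding threshold) is obtained by solving (I − Q)τ = 1 on the transient states.

```python
import math

def binom_pmf(m, n, p):
    # C(n, m) p^m (1 - p)^(n - m), zero outside 0 <= m <= n
    if m < 0 or m > n:
        return 0.0
    return math.comb(n, m) * p ** m * (1.0 - p) ** (n - m)

def transition_row(i, n_eff, g, p):
    # Prob[k | i] for SU synapses with zeta = 0, conditioned on n_eff active synapses
    row = [0.0] * (n_eff + 1)
    for k in range(n_eff + 1):
        val = 0.5 * g * binom_pmf(k - i, n_eff - i, p)   # potentiation: -1 -> +1 flips
        val += 0.5 * g * binom_pmf(i - k, i, p)          # depression: +1 -> -1 flips
        if k == i:
            val += 1.0 - g                               # no induction signal
        row[k] = val
    return row

def mfpt_below_threshold(n_eff, g, p, k_th):
    # Mean number of memory storages until k <= k_th, solving (I - Q) tau = 1
    # on the transient states k_th+1..n_eff by Gauss-Jordan elimination.
    states = list(range(k_th + 1, n_eff + 1))
    idx = {s: r for r, s in enumerate(states)}
    m = len(states)
    A = [[(1.0 if r == c else 0.0) for c in range(m)] + [1.0] for r in range(m)]
    for s in states:
        row = transition_row(s, n_eff, g, p)
        for t in states:
            A[idx[s]][idx[t]] -= row[t]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(m):
            if r != c:
                fac = A[r][c] / A[c][c]
                A[r] = [x - fac * y for x, y in zip(A[r], A[c])]
    return {s: A[idx[s]][m] / A[idx[s]][idx[s]] for s in states}

tau = mfpt_below_threshold(n_eff=20, g=0.1, p=0.1, k_th=10)
print(tau[20], tau[11])   # MFPT from the top state vs just above threshold
```

In the full calculation this solve would be repeated for each N_eff and the results averaged over the binomial distribution of N_eff, as described in the text.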

B Hebb equilibrium structure for simple synapses
We require the unit eigenstate of the operator (1/2) K₊ ⊗ ··· ⊗ K₊ + (1/2) K₋ ⊗ ··· ⊗ K₋ for all N synapses, where K± are given in Eq. (16). This operator induces the transition probabilities for the number of synapses of strength +1 in equilibrium (cf. Eq. (40)). These probabilities are the elements of an (N + 1) × (N + 1) matrix whose unit eigenvector determines the equilibrium distribution. Let the (N + 1)-dimensional vector e(N) with components e_i(N), i = 0, ..., N, be this eigenvector, with Σ_{i=0}^N e_i(N) = 1. Then e_i(N) is the probability that i synapses have strength +1 in equilibrium, where we indicate the dependence of e_i(N) on N for clarity. We define the probability generating function (PGF) for the columns of the transition matrix, and also define the PGF for the components of its unit eigenvector e(N) by writing F_N(z) = Σ_{i=0}^N e_i(N) z^i. The eigenvalue equation can then be written in terms of these PGFs, and forming an ordinary generating function (OGF) by writing G(z, w) = Σ_{N=0}^∞ F_N(z) w^N yields a functional equation for G(z, w). This equation must be solved subject to the two boundary conditions G(1, w) = 1/(1 − w) and G(z, 0) = F₀(z) ≡ 1. Although Eq. (45) is a nasty functional equation, we can exploit very general, model-independent arguments to simplify it. For indistinguishable synapses, marginalising a general equilibrium distribution A_{N+1} over one synapse must give A_N. Let ε_i(N) be the probability that any i out of N synapses have strength +1 in this general equilibrium distribution A_N. The probability of any particular configuration of i such synapses having strength +1 is then just ε_i(N)/C(N, i). Considering an additional synapse added to these N synapses, it could have strength −1 or +1, and these two new, particular configurations have probabilities ε_i(N + 1)/C(N + 1, i) and ε_{i+1}(N + 1)/C(N + 1, i + 1), respectively. So, we must have

ε_i(N)/C(N, i) = ε_i(N + 1)/C(N + 1, i) + ε_{i+1}(N + 1)/C(N + 1, i + 1),

which represents the result of marginalising A_{N+1} over one synapse's strength to obtain A_N.
Rearranging, ε_i(N) = [(N + 1 − i) ε_i(N+1) + (i + 1) ε_{i+1}(N+1)]/(N + 1). This equation has natural boundary conditions: putting i = N + 1 or i = −1 gives zero on the RHS, respecting the convention that ε_i(N) = 0 for i < 0 and i > N. Writing the PGF F_N(z) = Σ_{i=0}^{N} ε_i(N) z^i and using these boundary conditions, the PGF must satisfy the ordinary differential equation (N + 1) F_N(z) = (N + 1) F_{N+1}(z) + (1 − z) dF_{N+1}(z)/dz. Then writing the OGF G′(z, w) = Σ_{N=0}^{∞} F_N(z) w^N, we obtain the partial differential equation (1 − z) ∂G′/∂z = w G′ + w(w − 1) ∂G′/∂w, which is subject to the boundary conditions G′(1, w) = 1/(1 − w) and G′(z, 0) = 1. The general solution of this equation can be written in the form G′(z, w) = H(x)/[2x(1 − w)], with x = (1 − z)w/(1 − w), for an arbitrary (at least once-differentiable) function H(x), where the two boundary conditions at z = 1 and w = 0 impose the same requirement, that H(x)/x → 2 as x → 0. The solution in Eq. (50) imposes a functional constraint on the form of G′(z, w) in any model with indistinguishable synapses.
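This consistency condition, in the rearranged form ε_i(N) = [(N + 1 − i) ε_i(N+1) + (i + 1) ε_{i+1}(N+1)]/(N + 1), holds for any exchangeable distribution, and can be checked numerically on a simple example. The mixture of binomials used here is an arbitrary illustration, not the Hebb equilibrium itself:

```python
from math import comb

def eps(N, i, mixture):
    """Exchangeable synapse distribution built as a mixture of binomials:
    probability that i of N synapses have strength +1, where mixture is a
    list of (weight, success probability) pairs."""
    return sum(w * comb(N, i) * p**i * (1 - p)**(N - i) for w, p in mixture)

mixture = [(0.5, 0.2), (0.3, 0.5), (0.2, 0.9)]  # arbitrary illustrative mixture
N = 7
lhs = [eps(N, i, mixture) for i in range(N + 1)]
# Marginalising the (N+1)-synapse distribution over one synapse:
rhs = [((N + 1 - i) * eps(N + 1, i, mixture)
        + (i + 1) * eps(N + 1, i + 1, mixture)) / (N + 1)
       for i in range(N + 1)]
```

Any distribution in which synapses are conditionally independent given a latent variable satisfies the identity exactly, which is why the two lists agree to machine precision.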
Applying the general form in Eq. (50) to the particular case in Eq. (45), so writing G(z, w) in terms of some function H of a single combined variable, we reduce the functional equation involving a function of two variables to a much simpler and more symmetric functional equation in just one variable. We solve this equation using a power series solution. There are no even terms (because H(0) ≡ 0 guarantees that H(x)/x is finite as x → 0), so we write H(x) as an odd power series in x with coefficients κ_2i, with κ_0 ≡ 1 satisfying the boundary conditions. The coefficients κ_2i must then satisfy an infinite tower of recurrence relations. These equations can be solved iteratively to any desired order. Given the κ_2i coefficients, we then have H(x), with G(z, w) following directly. F_N(z) then follows, and can be written in the form of Eq. (55), where we define the odd coefficients κ_{2i+1} ≡ 0 to make the expression for F_N(z) take its simplest form. By definition F_N(z) ≡ Σ_{i=0}^{N} e_i(N) z^i, so by reading off the coefficient of z^i in Eq. (55) we obtain e_i(N) in terms of the κ_2j coefficients. From the definition of F_N(z) as the PGF for the number of synapses with strength +1 in equilibrium, the equilibrium distribution A_N takes the form of Eq. (56): a sum over all ^N C_i configurations of the tensor products involving i synapses of strength +1 and N − i of strength −1, each configuration weighted by e_i(N)/^N C_i, with the e_i(N) expressed in terms of the κ_2j via Eq. (55). Although this completely solves the problem of finding the Hebb equilibrium distribution, in fact we can write A_N directly in terms of the κ_i coefficients. To see this, we first evaluate F_N(z) at z = −1 using Eq. (55), giving F_N(−1) ≡ κ_N. But F_N(−1) = Σ_{i=0}^{N} (−1)^i e_i(N), and this alternating sum is, up to an overall sign, just the equilibrium correlation coefficient E[S_1(∞) × ··· × S_N(∞)], since E[S_1(∞) × ··· × S_N(∞)] = (−1)^N e_0(N) + (−1)^{N−1} e_1(N) + ··· + (−1)^0 e_N(N). So E[S_1(∞) × ··· × S_N(∞)] = (−1)^N F_N(−1) ≡ (−1)^N κ_N.
But κ_{2i+1} ≡ 0, so we can just drop the parity factor (−1)^N. Hence, κ_N is the equilibrium correlation function between the strengths of N synapses, for any choice of N. We must therefore be able to expand A_N directly in terms of these correlation functions. Equation (56) writes A_N in terms of the two orthogonal vectors (1, 0)^T and (0, 1)^T. Although these definite strength states are the natural ones, we can instead expand the equilibrium state using a different pair of orthogonal vectors. In particular, we may use the pair A_1 = (1/2)(+1, +1)^T and A_⊥1 = (1/2)(+1, −1)^T, where the former is just the monosynaptic equilibrium distribution. Then, we may instead write Eq. (58): A_N = Σ_{i=0}^{N} κ_i Σ_P (tensor products of i copies of A_⊥1 and N − i copies of A_1), with Σ_P again running over all ^N C_i placements. Because κ_{2i+1} = 0, only an even number of A_⊥1 vectors can appear in each term. We may confirm by explicit calculation that Eqs. (56) and (58) are equivalent representations of the same equilibrium state A_N, and we may also confirm that the form in Eq. (58) has the correct marginal and correlational structure. For example, to compute a correlation function involving j out of the N synapses, we need to marginalise over N − j synapses. This marginalisation is achieved just by summing over their states, which we do by dotting through with the 2-dimensional vector 1 in the N − j relevant places in the tensor product. To obtain expectation values of the other j synapses' strengths, we just dot through with the vector of strengths, σ say, in the j relevant places. But 1 ≡ 2 A_1 and σ ≡ −2 A_⊥1, so Eq. (58) is just a disguised expansion in the orthogonal vectors 1 and σ that must be used when computing the equilibrium correlation functions. Equation (58) is therefore the only possible form with the requisite correlational structure.
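The identity E[S_1(∞) × ··· × S_N(∞)] = (−1)^N F_N(−1) is easy to verify in the exactly solvable ψ = 1/2 case, for which (as derived below in this appendix) the equilibrium is uniform, e_i(N) = 1/(N + 1), and κ_2i = 1/(1 + 2i) with odd coefficients vanishing. A short exact check:

```python
from fractions import Fraction

def equilibrium_correlation(N):
    """E[S_1(oo) x ... x S_N(oo)] = (-1)^N F_N(-1) for the psi = 1/2
    equilibrium, where e_i(N) = 1/(N+1) (uniform distribution)."""
    F_at_minus_1 = sum(Fraction(1, N + 1) * (-1)**i for i in range(N + 1))
    return (-1)**N * F_at_minus_1

corrs = {N: equilibrium_correlation(N) for N in range(1, 9)}
```

The computed values reproduce κ_N: zero for odd N, and 1/(N + 1) = 1/(1 + 2i) for even N = 2i, confirming that the parity factor can indeed be dropped.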
Although in obtaining Eq. (53) we have essentially found the Hebb equilibrium distribution, using it to obtain the κ_2i is awkward. The first few κ_2i can be written down explicitly, but they become increasingly complicated as i increases. We can compute these coefficients numerically for any given value of ψ, but high precision is required to obtain stable results, with more precision required as N increases. Rather than using Eq. (53) directly, we instead define the exponential generating function (EGF) of the coefficients κ_2i, K(z) = Σ_{i=0}^{∞} κ_2i z^{2i}/(2i)!. This EGF undoes the convolution structure in Eq. (53), and we obtain the q-like equation (see, for example, Andrews et al. 1999) K(z) = cosh(ψz) K((1 − ψ)z). The solution follows by iteration, so that K(z) = Π_{j=0}^{∞} cosh(ψ(1 − ψ)^j z), where we have used K(0) ≡ 1. This EGF can be evaluated in closed form for the three cases of ψ = 1, ψ = 1/2 and ψ = 0 (or strictly in the limit ψ → 0), giving K(z) = cosh z, K(z) = sinh(z)/z and K(z) = 1, respectively. The coefficients κ_2i follow as: κ_2i = δ_{i,0} for ψ = 0; κ_2i = 1/(1 + 2i) for ψ = 1/2; and κ_2i = 1 for ψ = 1. Plugging these into Eq. (55), we obtain the corresponding e_i(N). Thus, for ψ = 0, e_i(N) = ^N C_i / 2^N, so binomially distributed with probability 1/2; for ψ = 1/2, e_i(N) = 1/(N + 1), so uniformly distributed; for ψ = 1, e_i(N) = (1/2)(δ_{i,0} + δ_{i,N}), so bimodally distributed with equiprobable spikes at i = 0 and i = N only. The case of ψ = 0 (or ψ → 0) corresponds to the distribution A_N = A_1 ⊗ ··· ⊗ A_1.
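A numerical sketch of the iterated solution: truncating the product for K(z) and reading off κ_2i as (2i)! times the z^2i Taylor coefficient recovers the closed-form values quoted above. This assumes the product form Π_j cosh(ψ(1 − ψ)^j z) for the EGF:

```python
from math import factorial

def cosh_series(a, L):
    """Taylor coefficients of cosh(a*z) up to order z^L."""
    c = [0.0] * (L + 1)
    for i in range(0, L + 1, 2):
        c[i] = a**i / factorial(i)
    return c

def series_mul(u, v, L):
    """Product of two truncated power series, kept to order z^L."""
    w = [0.0] * (L + 1)
    for i, ui in enumerate(u):
        if ui:
            for j in range(L + 1 - i):
                w[i + j] += ui * v[j]
    return w

def kappa_coefficients(psi, i_max, m=200):
    """kappa_2i read off from the truncated product
    K(z) ~ prod_{j=0..m} cosh(psi*(1-psi)^j * z),
    via kappa_2i = (2i)! * [z^2i] K(z)."""
    L = 2 * i_max
    K = [1.0] + [0.0] * L
    for j in range(m + 1):
        a = psi * (1 - psi)**j
        if a == 0.0:
            break  # remaining factors are cosh(0) = 1
        K = series_mul(K, cosh_series(a, L), L)
    return [factorial(2 * i) * K[2 * i] for i in range(i_max + 1)]

k_half = kappa_coefficients(0.5, 4)   # expect 1/(1+2i), from sinh(z)/z
k_one = kappa_coefficients(1.0, 4)    # expect all ones, from cosh(z)
k_zero = kappa_coefficients(0.0, 4)   # expect delta_{i,0}, from K(z) = 1
```

Because ψ(1 − ψ)^j decays geometrically, the truncated product converges rapidly, so only modest values of m are needed in practice.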
Away from these exact cases, we approximate K(z) by considering only a finite number of terms in the product for the EGF in Eq. (62), K_m(z) = Π_{j=0}^{m} cosh(ψ_j z), where we write ψ_j = ψ(1 − ψ)^j. Writing cosh in its exponential form and considering all combinations of products of the individual terms, the coefficient of z^i in K_m(z) is (1/i!)(1/2^m) Σ (ψ_0 ± ψ_1 ± ··· ± ψ_m)^i, where the sum is over all 2^m possible combinations of signs. For m = 0, the sum just means ψ_0^i. Thus, we write κ^(m)_i = (1/2^m) Σ (ψ_0 ± ψ_1 ± ··· ± ψ_m)^i, and then lim_{m→∞} κ^(m)_i = κ_i. For ψ > 0, ψ_j falls to zero geometrically fast as j increases, so the convergence is rapid. Thus, a controlled approximation replaces the coefficients κ_i with their truncated forms κ^(m)_i, where we only need to take m large enough for good convergence. For notational simplicity, we write Eq. (67) in the form κ^(m)_i = (1/2^m) Σ_α p_α^i, where the sum over p_α^i is shorthand for the full sum in Eq. (67). Inserting this into Eq. (55), we obtain the PGF F^(m)_N(z), with again lim_{m→∞} F^(m)_N(z) = F_N(z). The approximated equilibrium distribution is therefore an average over 2^{m+1} binomial distributions, all with parameter N but with the 2^{m+1} probabilities (1/2)(1 ± ψ_0 ± ··· ± ψ_m). Although it involves a sum over 2^{m+1} terms, it does not require high numerical precision to obtain stable results, and in general it provides a very efficient method for obtaining the equilibrium distribution for anything but very small N. The approximation can be made even more efficient by replacing a binomial distribution with parameter N and probability (1/2)(1 ± p_α) with a Gaussian distribution of mean (1/2)N(1 ± p_α) and variance (1/4)N(1 − p_α^2) for N large enough.
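The average over 2^{m+1} binomial distributions with probabilities (1/2)(1 ± ψ_0 ± ··· ± ψ_m) is direct to implement. A sketch, checked against the ψ = 1/2 case, where the exact equilibrium is uniform:

```python
from itertools import product
from math import comb

def equilibrium_distribution(N, psi, m):
    """Truncated Hebb equilibrium e_i(N): an equal-weight average over the
    2^(m+1) binomial distributions with success probabilities
    (1 +/- psi_0 +/- ... +/- psi_m)/2, where psi_j = psi*(1-psi)**j."""
    psis = [psi * (1 - psi)**j for j in range(m + 1)]
    e = [0.0] * (N + 1)
    for signs in product((1, -1), repeat=m + 1):
        p = 0.5 * (1 + sum(s * pj for s, pj in zip(signs, psis)))
        for i in range(N + 1):
            e[i] += comb(N, i) * p**i * (1 - p)**(N - i)
    return [x / 2**(m + 1) for x in e]

N = 8
e = equilibrium_distribution(N, 0.5, m=12)
```

For ψ = 1/2 the 2^{m+1} probabilities are the midpoints of a uniform dyadic partition of [0, 1], so the average converges rapidly to the uniform distribution 1/(N + 1), consistent with the exact result.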
In Fig. 13 we illustrate the complexity of the Hebb equilibrium distribution, for various choices of ψ and N. As ψ is reduced from unity to zero, the equilibrium distribution moves from bimodal via uniform to binomial (or Gaussian in the continuum limit, with e_i(N) and i scaled appropriately to permit comparison). We focus on values of ψ not far from ψ = 1/2, for which the distribution is uniform, so as to capture this transition in the overall structure of the distribution, and also ψ = 0.1 and ψ = 0.9 at the extreme ranges. For fixed ψ, increasing N can create more oscillations in the distribution. The distribution respects the overall envelope for smaller N, but the maxima can split apart into multiple new maxima and minima as N increases. For fixed N, the oscillations spread out from the bimodal peaks at i = 0 and i = N as ψ is reduced, then they flip over in the transition through ψ = 1/2, and finally they coalesce as ψ is reduced to zero, where we expect the equilibrium distribution to become exactly binomial (or Gaussian). Even at ψ = 0.1, the equilibrium distribution is not far from Gaussian for larger values of N.
SNR lifetimes are not sensitive to the full equilibrium structure of the Hebb protocol. When obtaining SNR lifetimes from simulations, we therefore only need to ensure that the equilibrium distribution of synaptic strengths has the correct first- and second-order statistical structure. Defining A_± = A_1 ± √κ_2 A_⊥1, we find that A_N can be written exactly in terms of A_+ and A_− for small N. It is not possible in general to write A_N for N ≥ 4 as a similar sum over tensor products with identical factors corresponding to probability distributions. However, the approximation A_N ≈ (1/2)(A_+ ⊗ ··· ⊗ A_+ + A_− ⊗ ··· ⊗ A_−) is exact for N = 1, N = 2 and N = 3, and is guaranteed to have the correct marginal first-, second- and third-order statistical structure for N ≥ 4. In simulations to obtain SNR lifetimes, it therefore suffices to prepare the equilibrium distribution according to Eq. (71) rather than the exact form in Eq. (56). To compute τ_mfpt = ⟨⟨τ_mfpt(h_0 | N_eff)⟩_{h_0 > ϑ}⟩_{N_eff} using the MIE approach in Eq. (4), we require the distribution of h_0 conditioned on exactly N_eff components of ξ^0 being +1. For ζ = 0, for which we can obtain Prob[h′|h], the distribution of h_0 is equivalent to the distribution of the number of synapses of strength +1 after the storage of ξ^0. Before the storage of ξ^0, these N_eff synapses are in equilibrium, with probability e_i(N_eff) that i of them have strength +1. During the storage of ξ^0, the other N_eff − i may potentiate, each with probability p. The probability that k of these N_eff synapses have strength +1 after the storage of ξ^0 is therefore just Σ_{i=0}^{k} e_i(N_eff) ^{N_eff − i}C_{k − i} p^{k−i} (1 − p)^{N_eff − k}. Because of the convolution structure of this sum, we can find the PGF for these probabilities in terms of F_{N_eff}(z), obtaining (cf. Eq. (44)) F⁺_{N_eff}(z) = (1 − p + pz)^{N_eff} F_{N_eff}(z/(1 − p + pz)). Using F⁺_{N_eff}(z), we obtain the first two moments of h_0 in Eq. (73), and by averaging over N_eff, we recover μ(0) and σ(0)² in Eq. (7) (using Eq. (19a) for the correlation function) for ζ = 0 and with t = 0 s.
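For simulation initialisation, Eq. (71) amounts to flipping a fair coin between the two branches A_+ and A_− and then drawing each synapse's strength independently within the chosen branch. A sketch of this preparation and of its correlation functions; κ_2 = 1/3 in the check below corresponds to ψ = 1/2, and which strength state carries probability (1 + √κ_2)/2 is a sign convention that does not affect the even moments checked here:

```python
import random
from math import sqrt

def mixture_moments(kappa2, k_max=4):
    """Correlation functions E[S_1 ... S_k] of the two-branch mixture
    (1/2) A_+^(tensor N) + (1/2) A_-^(tensor N): within each branch the
    synapses are independent with E[S] = +r or -r, r = sqrt(kappa2), so
    E[S_1 ... S_k] = ((+r)**k + (-r)**k) / 2."""
    r = sqrt(kappa2)
    return [(r**k + (-r)**k) / 2 for k in range(1, k_max + 1)]

def sample_equilibrium(N, kappa2, rng=random):
    """Draw one strength configuration from the two-branch mixture, as
    used to initialise simulations: pick a branch at random, then set each
    synapse to +1 independently with probability (1 + r)/2 or (1 - r)/2."""
    r = sqrt(kappa2)
    p = 0.5 * (1 + r) if rng.random() < 0.5 else 0.5 * (1 - r)
    return [1 if rng.random() < p else -1 for _ in range(N)]

m = mixture_moments(1 / 3)  # kappa_2 = 1/3 is the psi = 1/2 value
```

The first three moments (0, κ_2, 0) match the exact equilibrium, but the fourth is κ_2² = 1/9 rather than the true κ_4 = 1/5 at ψ = 1/2, illustrating why Eq. (71) is guaranteed correct only through third order for N ≥ 4.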
For smaller values of N_eff, we require the full conditional distribution of h_0 encoded in F⁺_{N_eff}(z), but for larger values, we can safely replace it by a Gaussian with the moments in Eq. (73).