Introduction

Information and energy are intimately related for all physical systems, because information has to be written on some physical substrate, which always comes at an energy cost (Landauer 1961; Bennett 1982; Leff and Rex 1990; Berut et al. 2012; Parrondo et al. 2015). Brains are physical devices that process information and simultaneously dissipate energy (Levy and Baxter 1996; Laughlin et al. 1998) in the form of heat (Karbowski 2009). This energetic cost is relatively high (Laughlin et al. 1998; Aiello and Wheeler 1995; Attwell and Laughlin 2001; Karbowski 2007), which is the likely cause of the sparse coding strategy in neural circuits (Balasubramanian et al. 2001; Niven and Laughlin 2008). Experimental studies (Shulman et al. 2004; Logothetis 2008; Alle et al. 2009), as well as theoretical calculations based on data (Harris et al. 2012; Karbowski 2012), indicate that fast synaptic signaling, i.e. synaptic transmission, together with neuronal action potentials are the major consumers of metabolic energy. This energy use is of electric origin: it is caused by flows of electric charge due to voltage and concentration gradients, and by the subsequent pumping of ions out of the cell to maintain these gradients (Attwell and Laughlin 2001; Karbowski 2009). This pumping of electric charge requires large amounts of energy.

Brains are also highly adapting objects, which learn and remember by encoding and storing long-term information in excitatory synapses (dendritic spines) (Kasai et al. 2003; Takeuchi et al. 2014). These important slow processes are driven by correlated electric activities of pre- and post-synaptic neurons (Markram et al. 1997; Bienenstock et al. 1982; Miller and MacKay 1994; Song et al. 2000; Van Rossum et al. 2000) and cause plastic modifications in the spine's intrinsic molecular machinery, leading to changes in spine size, conductance (weight), and postsynaptic density (PSD) (Kasai et al. 2003; Bonhoeffer and Yuste 2002; Holtmaat et al. 2005; Meyer et al. 2014). Consequently, synaptic plasticity and the associated writing and storing of information must cost energy, since spines require energy for inserting and maintaining AMPA and NMDA receptors on the spine membrane (Huganir and Nicoll 2013; Choquet and Triller 2013), as well as for powering various molecular processes associated with the PSD (Lisman et al. 2012; Miller et al. 2005). In contrast to fast synaptic transmission and neuronal discharges, which are of electric nature, the slow plastic synaptic processes are of chemical origin (interactions between spine proteins) and thus require chemical energy (Karbowski 2019).

One empirical manifestation of the plasticity-energy relationship appears during mammalian cortical development, when synaptic density changes several fold and strongly correlates with changes in the glucose metabolic rate of cortical tissue (Karbowski 2012). Unfortunately, despite a massive literature on modeling synaptic plasticity (e.g. Bienenstock et al. 1982; Miller and MacKay 1994; Song et al. 2000; Van Rossum et al. 2000; Billings and Van Rossum 2009; Clopath et al. 2010; Pfister and Gerstner 2006; Tetzlaff et al. 2011; Toyoizumi et al. 2014; Costa et al. 2015; Shouval et al. 2002; Graupner and Brunel 2012; Ziegler et al. 2015; Fusi et al. 2005; Benna and Fusi 2016; Gutig et al. 2003; Smolen et al. 2012), our theoretical understanding of the energetic requirements of synaptic plasticity and memory storage is currently lacking. In particular, we do not know the answers to basic questions, such as how the energy consumed by plastic synapses depends on key neurophysiological parameters and, more importantly, whether and to what extent energy restricts the precision of synaptically encoded information and its lifetime. Such knowledge might lead to a deeper understanding of two fundamental problems in neuroscience: one related to the physical cost and control of learning and memory in the brain (Kasai et al. 2003; Takeuchi et al. 2014; Lisman et al. 2012; Costa et al. 2015; Kandel et al. 2014; Chaudhuri and Fiete 2016; Zenke and Gerstner 2017), and another, more practical, related to dissecting the contribution of synaptic plasticity to signals in brain imaging (Attwell and Laughlin 2001; Logothetis 2008; Engl et al. 2017; Magistretti et al. 1999; Shulman et al. 2004).
A recent study by the author (Karbowski 2019) provided some answers to the above questions by analyzing molecular data on synaptic spines and by modeling the energy cost of learning and memory in a cascade model of synaptic plasticity (mimicking molecular interactions in spines). From that study it follows that the average cost of synaptic plasticity constitutes a small fraction, about 4-11%, of the metabolic cost of fast excitatory synaptic transmission, and that storing longer memory traces can be relatively cheap (Karbowski 2019). However, that study left open other questions, e.g., how the energy cost of synaptic plasticity depends on neuronal firing rates, synaptic noise, and other neural characteristics, and what the relationship is between this energy cost and the precise storage of synaptic information.

The main goal of this study is to uncover the relationship between synaptic plasticity, its energetics, and precise information storage at excitatory synapses for one of the best known forms of synaptic plasticity, due to Bienenstock, Cooper, and Munro, the so-called BCM rule (Bienenstock et al. 1982). This is a different (more macroscopic) but complementary level of modeling to the (microscopic) one in Karbowski (2019). Specifically, we want to determine the energy cost associated with the accuracy of the information encoded in a population of plastic synapses about the presynaptic input. Additionally, we want to find the relationship between the plastic energy rate and the duration of the memory of a single event at synapses. In other words, our goal is to find the metabolic requirement of maintaining accurate information at synapses in the face of ongoing variable neural activity and of thermodynamic fluctuations inside spines associated with variation in the number of membrane receptors. The phenomenological BCM rule has been shown to explain several key experimental observations (Cooper and Bear 2012), and it is equivalent to the more microscopic STDP rule (Markram et al. 1997; Song et al. 2000; Van Rossum et al. 2000) under some very general conditions (Pfister and Gerstner 2006; Izhikevich and Desai 2003). Since the BCM rule is believed to describe the initial phases of learning and memory (Zenke and Gerstner 2017), the focus of this work is on the energy cost and coding accuracy of early synaptic plasticity, i.e. early long-term potentiation (e-LTP) and depression (e-LTD), which last from minutes to several hours. We do not consider explicitly the effects of memory consolidation, which operate on much longer time scales and are associated with the late phases of LTP and LTD (l-LTP and l-LTD) (Ziegler et al. 2015; Benna and Fusi 2016; Redondo and Morris 2011). However, we do provide a rough estimate of the energetics of these late processes, and they turn out to be much less energy demanding than the early-phase plasticity.

One can ask whether the approach taken here, with a macroscopic BCM-type model, is reasonable for modeling and calculating the energy cost of synaptic plasticity. Should a more microscopic approach, with explicit molecular interactions between PSD proteins, be used instead? The basic problem with such a detailed microscopic approach is that we do not know most of the molecular signaling pathways in a dendritic spine, we do not know the rates of various reactions, and even the basic mechanism of encoding information at synapses is unclear. For example, for a long time it was thought that persistent CaMKII autophosphorylation provides a basic mechanism of information storage via bistability (Lisman et al. 2012; Miller et al. 2005). However, experimental data indicate that the enhanced CaMKII activity after spine activation is transient and lasts only about 2 min (Lee et al. 2009), which casts doubt on its persistent enzymatic activity and its role as a “memory molecule” (for a review see Smolen et al. 2019). Taking all these uncertainties into account, a more macroscopic approach seems, at least partly, more reliable.

Because synapses/spines are small, they are strongly influenced by thermal fluctuations (Kasai et al. 2003; Choquet and Triller 2013; Statman et al. 2014). For this reason, this paper uses universal methods of stochastic dynamical systems and non-equilibrium statistical mechanics (Nicolis and Prigogine 1977; Van Kampen 2007; Risken 1996; Lan et al. 2012; Mehta and Schwab 2012; Tome 2006; Tome and de Oliveira 2010; Seifert 2012). The latter are generally valid for all physical systems, including the brain, operating out of thermodynamic equilibrium. Regrettably, the methods of non-equilibrium thermodynamics have hardly been used in neuroscience, with two recent exceptions (Goldt and Seifert 2017; Karbowski 2019), despite their large potential for linking the brain's physicality with its information processing capacity. (This should not be confused with equilibrium thermodynamics, whose methods have occasionally been used in neuroscience, although in a different context, e.g., Balasubramanian et al. 2001; Tkacik et al. 2015; Friston 2010.)

General outline of the problem considered

It is generally believed that long-term information in excitatory synapses is encoded in the pattern of synaptic strengths or weights (membrane electric conductance), which is coupled to the molecular structure of the postsynaptic density within dendritic spines (Takeuchi et al. 2014; Lisman et al. 2012; Miller et al. 2005; Kandel et al. 2014; Zhu et al. 2016). This study considers the energy cost associated with maintaining the pattern of synaptic weights. In particular, we analyze the energetics and information capacity of the fluctuations in the number of AMPA and NMDA receptors on the spine membrane, or equivalently, fluctuations in the synaptic conductance. Such variability in the receptor number tends to spread the range of synaptic weights (affecting their structure and distribution), which has negative consequences for the encoded information and can lead to its erasure. In terms of statistical mechanics, the receptor fluctuations increase the entropy associated with the distribution of synaptic weights, and that entropy has to be reduced to preserve the information encoded in the weights. This reduction of synaptic entropy is a nonequilibrium phenomenon that costs some energy, which has to be provided by various processes involving ATP generation (Nicolis and Prigogine 1977).

The BCM type of synaptic plasticity used here is a phenomenological model that does not relate in a straightforward way to the underlying synaptic molecular processes. Empirically speaking, a change in synaptic weight in e-LTP is caused by a sequence of molecular events, the main ones being: activation of proteins in the postsynaptic density, which subsequently stimulates downstream actin filament elongation (responsible for spine enlargement), and AMPA and NMDA receptor trafficking (Huganir and Nicoll 2013; Choquet and Triller 2013). Therefore, it is assumed that the BCM-type rule used here broadly reflects, on a macroscopic level, these three microscopic processes, especially the first and the last. (Spine volume related to actin dynamics is not explicitly included in the model, although it is known experimentally that spine volume and conductance are positively correlated (Kasai et al. 2003).) Thus, it is expected that the synaptic energy rate calculated here is related to ATP used mainly for postsynaptic protein activation through phosphorylation (Zhu et al. 2016), and for receptor insertion and movement along the spine membrane. Obviously, there are many more molecular processes in a typical spine, but they are either not directly involved in spine conductance variability or much faster than the above processes (e.g. release of Ca2+ from internal stores is fast). A detailed empirical estimate based on molecular data suggests that protein activation via enhanced phosphorylation is the dominant contribution to the energy cost (ATP rate) of synaptic plasticity (Karbowski 2019). Therefore, the theoretical energy rate of synaptic plasticity determined here should be viewed as a minimal but reasonable estimate of the energetic requirements of LTP and LTD, and it is strictly associated with the information encoded in synaptic weights.

Experimental data show that excitatory synapses can exist in two or more stable states, characterized by discrete synaptic weights or sizes (Kasai et al. 2003; Montgomery and Madison 2004; Petersen et al. 1998; O’Connor et al. 2005; Loewenstein et al. 2011; Bartol et al. 2015). Data on the single-synapse level indicate that synapses can operate as binary elements with either low or high electric conductance (Petersen et al. 1998; O’Connor et al. 2005). On the other hand, data on the population level, more relevant to this work, show that synapses can assume more than two stable discrete states (Kasai et al. 2003; Loewenstein et al. 2011; Bartol et al. 2015). In either case, the issue of bistability vs. multistability is not yet resolved. In this study, a minimal scenario is considered in which synapses, together with their postsynaptic neuron, effectively act as a binary coupled system characterized by a single variable, the mean-field postsynaptic current, with one or two stable states. The bistability is produced here by an extended BCM model, which in principle allows continuous changes in the weights of individual synapses. The important point is that these continuous weights are correlated, due to plasticity constraints, and thus converge on the mean-field population level either to one or to two stable values.

Synaptic plasticity processes are induced by correlated firing of pre- and post-synaptic neurons, and thus a model of neuronal activity is also needed. This study uses a firing-rate neuron model with a so-called class one nonlinear firing rate curve, which is believed to be a good approximation to biophysical neuronal models (Ermentrout 1998; Ermentrout and Terman 2010); see the Methods for details.

The paper is organized as follows. First, we introduce and solve an extended model of the classical BCM plasticity rule. Then, we derive an effective equation for the mean-field stochastic dynamics of the synaptic currents for that extended plasticity model. Next, we translate this effective equation into the probabilistic Fokker-Planck formalism, and derive an effective potential for the mean-field synaptic current. With the help of the effective potential we find the entropy production and the Fisher information associated with the stochastic dynamics of synaptic plasticity. The entropy production is related to the energy cost of the extended BCM plasticity, while the Fisher information is related to the accuracy of the information encoded in a population of plastic synapses about the presynaptic input. Details of the calculations are provided in the Methods (and some in the Supporting Information).

Results

Model of synaptic plasticity: stochastic BCM type

We consider a sensory neuron with N plastic excitatory synapses (dendritic spines). We assume that the synaptic weights wi (i = 1,...,N), corresponding to spine electric conductances, change due to two factors: correlated activity of presynaptic and postsynaptic firing rates (fi and r, respectively), and noise in spine conductance (\(\sim \sigma _{w}\)). The noise has two basic sources: internal thermodynamic fluctuations in spines, because of their small size (< 1 μm) and relatively small number of molecular components (Kasai et al. 2003; Statman et al. 2014), and presynaptic fluctuations in the firing rates that drive the ionic and molecular fluxes in spines. The dynamics of the synaptic weights are given by a modified BCM plasticity rule (Bienenstock et al. 1982):

$$ \begin{array}{@{}rcl@{}} \frac{dw_{i}}{dt} \!&=&\! \lambda f_{i}r(r - \theta) - \frac{(w_{i}-\epsilon a)}{\tau_{w}} + \frac{\sqrt{2}\sigma_{w}(1 + \tau_{f}f_{i})}{\sqrt{\tau_{w}}} \eta_{i} \end{array} $$
(1)
$$ \begin{array}{@{}rcl@{}} \tau_{\theta}\frac{d\theta}{dt} \!&=&\! -\theta + \alpha r^{2}, \end{array} $$
(2)

where λ is the amplitude of synaptic plasticity controlling the rate of change of synaptic conductance, τw is the weight time constant controlling the decay duration, 𝜃 is the homeostatic variable, the so-called sliding threshold (adaptation for plasticity), related to the interplay of LTP and LTD, with time constant τ𝜃, and α is the coupling intensity of 𝜃 to the postsynaptic firing rate r. The noise term in Eq. (1) is represented by Gaussian white noise ηi with zero mean and delta function correlations, i.e., 〈ηi(t)〉η = 0 and \(\langle \eta _{i}(t)\eta _{j}(t^{\prime })\rangle _{\eta }= \delta _{ij}\delta (t-t^{\prime })\) (Van Kampen 2007). The amplitude of the noise in the weights is proportional to the standard deviation σw (in units of conductance), due to basic thermodynamic fluctuations in spines, and to the factor (1 + τffi), due to additional fluctuations in the presynaptic activities. The latter factor simply amplifies the basic thermodynamic fluctuations. The time scale τf of fluctuations in fi was added in the noise term to keep the amplifying factor unitless. Finally, the product 𝜖a is the minimal synaptic weight in the absence of presynaptic stimulation (fi = 0), where the unitless parameter 𝜖 ≪ 1. There are two modifications to the conventional BCM rule: the stochastic term \(\sim \sigma _{w}\), and the decay term of synaptic weights with time constant τw, which is key for reproducing the binary nature of synapses (Petersen et al. 1998; O’Connor et al. 2005) and for determining the energy used by synaptic plasticity.

The conventional BCM rule (i.e. for \(\tau _{w}\mapsto \infty \) and σw = 0) describes temporal changes in synaptic weights due to correlated activity of pre- and post-synaptic neurons (both fi and r are present on the right in Eq. (1)). These activity changes can either increase the weight, if postsynaptic firing r is greater than the sliding threshold 𝜃 (this corresponds to LTP), or they can decrease the weight if r < 𝜃 (corresponding to LTD). The interesting aspect is that 𝜃 is also time dependent, and it responds quickly to changes in the postsynaptic firing. In effect, when both dynamical processes in Eqs. (1-2) are taken into account, the synapse is potentiated for low r (LTP) and depressed for high r (LTD).

We assume, in accordance with empirical data, that the presynaptic firing rates fi change on a much faster time scale (\(\tau _{f} \sim 0.1-1\) sec) than the synaptic weights wi (which change on a time scale \(\tau _{w} \sim 1\) hr). We further assume that each presynaptic firing rate fi fluctuates stochastically around a mean value fo with standard deviation σf, and that these fluctuations are uncorrelated. This implies a time scale separation between neural activity and synaptic plasticity.

Numerical solution of the stochastic extended BCM plasticity model

In this section we solve numerically the model represented by the N + 1 equations (1-2).

We first consider the model without synaptic noise, i.e., σw = 0. This deterministic system can exhibit collective bistability, regardless of whether σf is 0 or finite. The critical factor in generating bistability is that the time constant τ𝜃 of the homeostatic variable 𝜃 is much smaller than the synaptic plasticity time constant τw. That is, the variable 𝜃 must be much faster than the synaptic weights wi. Typically, bistability is found for τ𝜃/τw ≤ 0.06, and we work in this regime throughout the whole study (for the neurobiological validity of this regime, see the Discussion section). Collective bistability means that all synaptic weights can converge to two different fixed points depending on the initial conditions (Fig. 1). When all synapses start from sufficiently small weights, they all converge to the same small synaptic weight 𝜖a. If the initial weights are much larger than 𝜖a, then all synapses become asymptotically strong. Thus, there is a strong collective behavior of synapses in the deterministic case.
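The two-initial-condition experiment described here can be sketched in a few lines with a forward-Euler integration of the deterministic Eqs. (1-2), using the quasi-stationary class one rate function introduced later in the text (Eq. (6)). A minimal sketch follows; λ and κ are taken from the figure captions, while all other parameter values (β, A, α, εa, the time constants, N) are hypothetical choices made here for illustration, so whether one or two attractors appear depends on them (cf. Fig. 2):

```python
import numpy as np

def simulate(w0, T=2000.0, dt=0.02, N=20, lam=9e-7, tau_w=100.0,
             tau_th=1.0, alpha=0.01, eps_a=3.3e-7, fo=10.0,
             A=10.0, kappa=0.001, beta=3.0):
    """Forward-Euler run of the deterministic model (Eqs. 1-2 with sigma_w = 0);
    the rate r follows Eq. (6). Parameter values are hypothetical."""
    w = np.full(N, w0, dtype=float)
    f = np.full(N, fo)            # presynaptic rates fixed at their mean (sigma_f = 0)
    theta = 0.0
    for _ in range(int(T / dt)):
        v = beta * np.mean(f * w)                          # population current
        r = 0.5 * (-A**2 * kappa + np.sqrt(A**4 * kappa**2 + 4 * A**2 * v))
        w += dt * (lam * f * r * (r - theta) - (w - eps_a) / tau_w)
        theta += dt * (-theta + alpha * r**2) / tau_th     # fast sliding threshold
    return w

w_weak = simulate(w0=1e-7)   # weak initial synapses
w_strong = simulate(w0=2.0)  # strong initial synapses
```

With these (hypothetical) values the two runs settle at widely separated weights, mimicking the bistable regime of Fig. 1 (lower panel); note that τ𝜃/τw = 0.01, inside the regime quoted in the text.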

Fig. 1

Asymptotic behavior of the deterministic extended BCM model. Temporal dependence of the deterministic synaptic weights wi from Eqs. (1-2). (Upper panel) When the mean firing rate fo is smaller than some threshold value, all weights converge to the same asymptotic value regardless of the initial values (red lines for strong initial synapses, and blue lines for weak initial synapses). This case corresponds to monostability. (Lower panel) When fo is in the intermediate interval, the synaptic weights can converge to two separate values, depending on the initial conditions. In this regime, there are two coexisting fixed points (bistability). Strong initial weights lead to a large final value (red lines), and weak initial weights lead to a low final value (blue lines). Parameters used: λ = 9 ⋅ 10− 7 and κ = 0.001

The two main parameters that control the shape of the bifurcation diagram are the synaptic plasticity amplitude λ and the mean firing rate fo (Fig. 2). For fixed fo, synapses can be in either a monostable or a bistable phase, depending on the value of λ (Fig. 2a). Generally, for small and for sufficiently large λ there is monostability, while for intermediate λ there is bistability. For fixed λ, the picture is slightly more complex: bistability can emerge already at fo = 0 (for intermediate λ), or at some finite fo (for small λ), or it can never appear (for large λ) (Fig. 2a). The bifurcation diagram, i.e., the dependence of the asymptotic value of wi on fo, is presented in Fig. 2b.

Fig. 2

Phase and bifurcation diagrams of wi for the deterministic extended BCM model. a Schematic phase diagram λ vs. fo for monostable (mono) and bistable (bi) behavior of synaptic weights wi. Bistability is lost for sufficiently large λ. Note that the critical values of λ and fo, for which bistability emerges and disappears, are inversely related. b Bifurcation diagram of an asymptotic synaptic weight vs. fo for a typical synapse. The bistable regime is indicated by dotted lines. Parameters used: λ = 9 ⋅ 10− 7, and κ = 0.001

Inclusion of synaptic noise, i.e. σw > 0, leads to stochastic fluctuations of individual synapses. In the monostable regime, the fluctuations are around a given fixed point (either the weak or the strong weight). In the bistable regime, individual synapses fluctuate between weak and strong weights (Fig. 3a). Despite the synaptic noise, the collective nature of synapses is statistically preserved, as all synapses have similar weight distributions (Fig. 3b, c). These distributions are much more spread out in the bistable regime than in the monostable one, and appear almost uniform in the bistable case.

Fig. 3

Distributions of synaptic weights wi for the stochastic extended BCM model. a Temporal fluctuations of an individual synapse in the bistable regime. Note stochastic jumps between weak and strong weights. b Distribution of synaptic weights for an individual synapse (the same as in a) in the bi- (fo = 10 Hz; blue solid line) and mono-stable (fo = 1 Hz; red dashed line) regimes. c Cumulative distribution of synaptic weights for all N synapses in the bi- (fo = 10 Hz; blue solid line) and mono-stable (fo = 1 Hz; red dashed line) regimes. Note that both distributions in (b) and (c) are very similar. Parameters used: λ = 9 ⋅ 10− 7, κ = 0.001, and σw = 0.1 nS

Dimensional reduction of the stochastic BCM model: dynamic mean field

The stochastic system of N + 1 equations described by Eqs. (1-2) is not tractable analytically, because it is a coupled nonlinear system. The coupling takes place via the postsynaptic firing rate r, which depends on all synaptic weights wi (in Eqs. 1-2). In this section we present and discuss an effective mean-field model corresponding to the extended BCM model in Eqs. (1-2) that is amenable to analytical treatment. In this dynamical mean field, we focus on a single dynamic variable, the population averaged synaptic current v defined by Eq. (25) in the Methods. The single variable v is sufficient to describe the global stochastic dynamics of the original model given by Eqs. (1-2), because together with the postsynaptic firing rate r it forms a closed mathematical system of just two equations; see below. The practical reason for introducing the dynamic mean field is that this approach enables us to obtain explicit formulae for the synaptic plasticity energy rate and the coding accuracy.

We can reduce the multidimensional system (1-2) to a single effective equation, primarily because of the time scale separation between the neural firing dynamics (which change typically on the order of seconds or less) and the synaptic plasticity (which changes on the time scale of minutes/hours). Moreover, we assume that the two synaptic plasticity processes, described by Eqs. (1) and (2), have two distinct time scales, with τw dominating over τ𝜃 in duration. For the neurophysiological validity of this assumption, see the Discussion. Consequently, for times of the order of τw, we have d𝜃/dt ≈ 0, which implies 𝜃 ≈ αr2. The details of the reduction procedure can be found in the Methods, where we obtain a single plasticity equation for the population averaged excitatory postsynaptic current v per synapse, which is related to wi and fi by \(v= (\upbeta /N){\sum }_{i} f_{i}w_{i}\), where β depends on neurophysiological parameters and is defined in Eq. (25). The result of the reduction procedure is

$$ \begin{array}{@{}rcl@{}} \frac{dv}{dt} = hr^{2}(1-\alpha r) - (v-\epsilon cf_{o})/\tau_{w} + \frac{\sqrt{2}\sigma_{v}}{\sqrt{\tau_{w}}} \overline{\eta}. \end{array} $$
(3)

This equation essentially couples slow synaptic activities with fast neural activities, and gives a single equation describing the mean-field dynamics of the coupled system: synapses plus their postsynaptic neuron. In Eq. (3), the symbol h is the driving-plasticity parameter given by

$$ \begin{array}{@{}rcl@{}} h= \lambda\upbeta({f_{o}^{2}}+{\sigma_{f}^{2}}), \end{array} $$
(4)

with fo and σf denoting the mean and standard deviation of the presynaptic firing rates. Mathematically, the driving-plasticity h is proportional to the product of the plasticity amplitude λ and the presynaptic driving \(({f_{o}^{2}}+{\sigma _{f}^{2}})\), which implies that h grows quickly with the presynaptic firing rate. Physically, h is proportional to the electric charge that, on average, can enter the spine due to correlated activity of pre- and post-synaptic neurons (h has units of electric charge). This means that the magnitude of h is a major determinant of the plasticity (it is the driving force counteracting the synaptic decay), since larger h can experimentally correspond to more Ca2+ entering the spine and a higher chance of invoking a change in synaptic strength, which agrees qualitatively with experimental data (Huganir and Nicoll 2013; Lisman et al. 2012).

The rest of the parameters in Eq. (3) are c = aβ, and \(\overline {\eta }= ({\sum }_{i} \eta _{i})/\sqrt {N}\), which denotes a new (population averaged) Gaussian noise with zero mean and delta function correlations. This population noise has the amplitude σv, which corresponds to a standard deviation of v when h = 0, and it is given by

$$ \begin{array}{@{}rcl@{}} \sigma_{v}= \frac{\upbeta\sigma_{w}}{\sqrt{N}} [f_{o} + \tau_{f}\left( {f_{o}^{2}}+{\sigma_{f}^{2}}\right)]. \end{array} $$
(5)

Note that σv scales as \(1/\sqrt {N}\), and it is a product of the intrinsic synaptic conductance noise and of the presynaptic neural activity. The latter implies that a higher presynaptic activity amplifies the current noise.
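For orientation, the driving-plasticity h of Eq. (4) and the noise amplitude σv of Eq. (5) are straightforward to evaluate numerically; all parameter values in the sketch below are hypothetical and serve only to illustrate the scaling:

```python
import numpy as np

# Hypothetical parameter values, for illustration only
lam, beta, tau_f = 9e-7, 1.0, 0.5     # plasticity amplitude, scaling factor, time scale (s)
fo, sigma_f = 10.0, 5.0               # presynaptic mean rate and SD (Hz)
sigma_w = 0.1                         # intrinsic conductance noise (nS)

# Driving-plasticity parameter, Eq. (4)
h = lam * beta * (fo**2 + sigma_f**2)

def sigma_v(N):
    """Population current-noise amplitude of Eq. (5)."""
    return beta * sigma_w / np.sqrt(N) * (fo + tau_f * (fo**2 + sigma_f**2))

# The 1/sqrt(N) scaling: quadrupling N halves the noise
ratio = sigma_v(100) / sigma_v(400)
```

Here `ratio` evaluates to exactly 2, illustrating the \(1/\sqrt {N}\) suppression of the population current noise.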

In Eq. (3), the postsynaptic firing rate r assumes its quasi-stationary value (due to time scale separation), and is related to v through (for details see the Methods):

$$ \begin{array}{@{}rcl@{}} r= \frac{1}{2}\left( -A^{2}\kappa + \sqrt{A^{4}\kappa^{2} + 4A^{2}v}\right), \end{array} $$
(6)

where A is the postsynaptic firing rate amplitude, and κ is the intensity of firing rate r adaptation. Broadly speaking, the magnitude of κ reflects the strength of neuronal self-inhibition due to adaptation to synaptic stimulation (see Eqs. (21) and (22) in the Methods). Generally, increasing κ leads to decreasing postsynaptic firing rate r (Fig. 4a). For κ = 0, we recover a nonlinear firing rate curve (square root dependence on synaptic current v) that is characteristic for class one neurons (Ermentrout 1998; Ermentrout and Terman 2010), while for sufficiently large κ, i.e. for \(\kappa \gg 2\sqrt {v}/A \), we obtain a linear firing rate curve r(v) ≈ v/κ (Fig. 4a). Equations (3) and (6) form a closed system for determining the stochastic dynamics of the postsynaptic current v.
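The closed system of Eqs. (3) and (6) is easy to integrate numerically with an Euler-Maruyama scheme. A minimal sketch follows, with κ from the figure captions and all other parameter values (h, σv, εcfo, A, τw) hypothetical:

```python
import numpy as np

def r_of_v(v, A=10.0, kappa=0.001):
    """Quasi-stationary postsynaptic rate, Eq. (6)."""
    return 0.5 * (-A**2 * kappa + np.sqrt(A**4 * kappa**2 + 4 * A**2 * v))

def simulate_v(h, v0=0.5, T=3000.0, dt=0.02, tau_w=100.0,
               alpha=0.01, eps_cfo=1e-5, sigma_v=0.05, seed=1):
    """Euler-Maruyama integration of Eq. (3); parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    noise = np.sqrt(2.0 * dt / tau_w) * sigma_v * rng.standard_normal(n)
    trace = np.empty(n)
    v = v0
    for k in range(n):
        r = r_of_v(max(v, 0.0))   # guard: noise could push v slightly below zero
        drift = h * r**2 * (1.0 - alpha * r) - (v - eps_cfo) / tau_w
        v += dt * drift + noise[k]
        trace[k] = v
    return trace

# with h above the critical value, v climbs to the strong ("up") state
trace = simulate_v(h=3e-4)
```

For this (hypothetical) parameter set the trajectory relaxes to the upper stable current and then fluctuates around it with an amplitude set by σv.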

Fig. 4

Firing rates and emergence of bistability in the mean-field model: theory. a Postsynaptic firing rate r as a function of the population averaged synaptic current v for different neuronal adaptation values κ (in nA⋅sec). Increasing κ causes a decrease in r and makes the functional form of r(v) more linear. b Graphical solutions of Eq. (7) and multiple roots for stationary v. For small driving-plasticity h there is only one intersection of g(v) and the line y = v at \(v \sim O(\epsilon )\), corresponding to vd and monostability (dashed red line; f0 = 0.3 Hz, σf = 8 Hz). For higher h (h > hcr) there are three intersections, but the middle one corresponds to an unstable solution, which in effect yields two stable solutions, i.e. bistability (dashed-dotted yellow line; f0 = 5.0 Hz, σf = 10 Hz). When h is very large, there is only one intersection, and it occurs for large v, corresponding to monostability with strong synapses vu only (dotted green line; f0 = 30.0 Hz, σf = 10 Hz). In panel (b) λ = 9 ⋅ 10− 7 and κ = 0.001

Geometric steady state solution of the deterministic mean-field: emergence of bistability in v

We can use geometric considerations to gain some intuitive understanding of the mean-field deterministic behavior represented by Eqs. (3) and (6). If we put dv/dt = 0 and σv = 0 in Eq. (3), we can rearrange it to obtain

$$ \begin{array}{@{}rcl@{}} v = g(v), \end{array} $$
(7)

where the right-hand side g(v) = 𝜖cfo + τwhr2(1 − αr) depends on v only through r, as in Eq. (6) (see Fig. 4a). Moreover, the function g(v) has a maximum whose height is proportional to h. When h is very small, Eq. (7) has only one solution \(v\sim O(\epsilon )\) (i.e. one intersection point of the curves representing the functions on the right and on the left; Fig. 4b). This solution corresponds to weak synapses and the monostable regime. Increasing h, by increasing f0, increases the maximal value of the right-hand side of Eq. (7), such that more solutions become possible (Fig. 4b). In particular, when h grows above a certain critical value hcr, Eq. (7) generates three solutions (one \(\sim O(\epsilon )\) and two others \(\sim O(1)\)), of which the middle one is unstable (Fig. 4b). This case corresponds to the bistable regime with two stable solutions, representing weak and strong synaptic currents, which can be called, respectively, “down” and “up” synaptic states. These two states could hypothetically be related to thin and mushroom dendritic spines, with small and large numbers of AMPA receptors, respectively (Bourne and Harris 2007). For very large driving-plasticity h the two lower solutions disappear and we again have a monostable regime with strong synapses only (Fig. 4b).
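The intersection counting just described can be sketched numerically by scanning v − g(v) for sign changes on a logarithmic grid (the small roots are of order 𝜖, so a uniform grid would miss them). In the sketch below κ follows the figure captions, while all other parameter values are hypothetical:

```python
import numpy as np

def r_of_v(v, A=10.0, kappa=0.001):
    """Quasi-stationary rate r(v) of Eq. (6)."""
    return 0.5 * (-A**2 * kappa + np.sqrt(A**4 * kappa**2 + 4 * A**2 * v))

def n_fixed_points(h, tau_w=100.0, alpha=0.01, eps_cfo=1e-5):
    """Count intersections of y = v with g(v) of Eq. (7) via sign changes
    on a logarithmic grid; parameter values are hypothetical."""
    v = np.logspace(-7, 2.5, 400_000)
    r = r_of_v(v)
    g = eps_cfo + tau_w * h * r**2 * (1.0 - alpha * r)
    return int(np.sum(np.diff(np.sign(v - g)) != 0))

# small, intermediate, and large driving-plasticity h
counts = [n_fixed_points(h) for h in (1e-5, 3e-4, 1e-2)]
```

For these (hypothetical) values the three h regimes yield one, three, and one stationary solutions, reproducing the monostable-bistable-monostable sequence of Fig. 4b.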

A geometrical condition for the emergence of bistability is that the function g(v) in Eq. (7) first touches the line y = v tangentially, i.e. dg/dv = 1 (Fig. 4b). Solving this condition together with Eq. (6) yields, for 𝜖 ≪ 1, the critical value of the driving-plasticity parameter hcr as

$$ \begin{array}{@{}rcl@{}} h_{cr}= \frac{\alpha\kappa}{\tau_{w}}\left( 1 + \sqrt{1 + (\alpha\kappa A^{2})^{-1}}\right)^{2} + O(\epsilon). \end{array} $$
(8)

Note that for very fast decay in Eq. (3), i.e. for τw↦0, bistability is lost, since then \(h_{cr} \mapsto \infty \), and there is only one solution corresponding to weak synapses \(v\sim O(\epsilon )\). Bistability is also lost in the opposite limit of extremely slow decay, \(\tau _{w} \mapsto \infty \), but in this case the only stable solution corresponds to strong synapses. Interestingly, for very strong neural adaptation, \(\kappa \mapsto \infty \), bistability also disappears, since then \(h_{cr} \mapsto \infty \). This case corresponds to extremely small postsynaptic firing rates, r ≈ v/κ ≈ 0 (Fig. 4a), and indicates the absence of a driving force capable of pushing synapses to a higher conducting state. On the other hand, when there is no adaptation, κ↦0, the critical value hcr↦(τwA2)− 1 is finite. This means that it is easier to produce synaptic bistability for neurons with stronger nonlinearity in their firing rate curves (see Eq. 6; Fig. 4a).
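These limits of Eq. (8) can be checked directly. A minimal sketch, again with hypothetical values α = τw = A = 1, so that the predicted κ↦0 limit is (τwA2)− 1 = 1:

```python
import math

def h_cr(kappa, alpha=1.0, tau_w=1.0, A=1.0):
    # Eq. (8) to leading order in epsilon (hypothetical parameter values).
    return (alpha * kappa / tau_w) * (
        1.0 + math.sqrt(1.0 + 1.0 / (alpha * kappa * A**2)))**2

# kappa -> 0: h_cr approaches the finite limit 1/(tau_w * A^2) = 1 here
print(h_cr(1e-8))   # ~1.0002
# kappa -> infinity: h_cr diverges, so bistability is lost
print(h_cr(1e6))    # ~4e6
```

The small-κ value converges to the finite limit quoted in the text, while large κ drives hcr (and hence the input needed for bistability) to infinity.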

Analytic steady state solution of the deterministic mean-field

The above geometric intuition can be supported by an analytic approach. In the deterministic limit, σv = 0 (obtained either for \(N\mapsto \infty \) or for σw = 0), we can solve the mean-field model of Eq. (3) in the stationary state, i.e. we can find its fixed points by setting dv/dt = 0. To achieve this, it is more convenient to work with the postsynaptic firing rate r than with the variable v, due to the nonlinear dependence of r on v given by Eq. (6). Inverting Eq. (6), we find v = κr + (r/A)2, which can be used in the condition dv/dt = 0. This yields a cubic equation for r:

$$ \begin{array}{@{}rcl@{}} \alpha\tau_{w}hr^{3} - (\tau_{w}h - A^{-2})r^{2} + \kappa r - \epsilon cf_{o} = 0. \end{array} $$
(9)

The discriminant Δ of this equation is

$$ \begin{array}{@{}rcl@{}} {\Delta}&=& \kappa^{2}(\tau_{w}h - A^{-2})^{2} - 4\alpha\tau_{w}\kappa^{3}h \\&&-2\epsilon cf_{o}(\tau_{w}h - A^{-2})[2(\tau_{w}h - A^{-2})^{2} - 9\alpha\tau_{w}\kappa h] \\ &&- 27(\epsilon cf_{o}\alpha\tau_{w}h)^{2}. \end{array} $$
(10)

The sign of Δ determines how many real roots Eq. (9) has. Specifically, if Δ < 0, then Eq. (9) has one real root, whereas if Δ > 0 then Eq. (9) has three distinct real roots. The former case corresponds to monostability, the latter to bistability in the mean-field deterministic dynamics of Eq. (3). The transition between these two regimes takes place for Δ = 0. Which regime occurs depends on the values of the various parameters entering the discriminant Δ.
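As a cross-check, the sketch below evaluates Δ from Eq. (10) and counts the real roots of Eq. (9) with numpy.roots, for hypothetical parameter values (not the paper's fits); eps_cfo stands for the product 𝜖cfo.

```python
import numpy as np

# Hypothetical parameter values for Eqs. (9)-(10), not the paper's fits:
alpha, tau_w, h, A, kappa = 1.0, 1.0, 10.0, 1.0, 0.1
B = tau_w * h - A**-2   # recurring combination tau_w*h - A^{-2}

def discriminant(eps_cfo):
    # Eq. (10), with eps_cfo standing for epsilon*c*f_o.
    return (kappa**2 * B**2 - 4.0 * alpha * tau_w * kappa**3 * h
            - 2.0 * eps_cfo * B * (2.0 * B**2 - 9.0 * alpha * tau_w * kappa * h)
            - 27.0 * (eps_cfo * alpha * tau_w * h)**2)

def n_real_roots(eps_cfo):
    # Real roots of the cubic Eq. (9) for the stationary firing rate r.
    roots = np.roots([alpha * tau_w * h, -B, kappa, -eps_cfo])
    return int(np.sum(np.abs(roots.imag) < 1e-9))

for eps_cfo in (1e-4, 1e-2):
    print(discriminant(eps_cfo) > 0, n_real_roots(eps_cfo))
# -> True 3  (bistable), then  False 1  (monostable)
```

For the smaller input the discriminant is positive and the cubic has three real roots (bistability); for the larger input Δ < 0 and only one real root remains, exactly as the sign criterion states.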

The phase diagram of mono- vs. bi-stability in the parameter space of λ,fo (plasticity amplitude vs. mean presynaptic firing) is shown in Fig. 5. A specific bifurcation diagram, computed numerically from Eq. (3) for σv = 0, in which the stationary value of v is plotted as a function of fo, is presented in Fig. 6. The phase and bifurcation diagrams for stationary v in Figs. 5 and 6 look qualitatively similar to the phase and bifurcation diagrams of stationary wi in Fig. 2 for the full N-synaptic system of Eqs. (1)–(2).

Fig. 5
figure 5

Phase diagram of mono- vs. bi-stability in the deterministic mean-field model. Numerical solution of Eqs. (3) and (6) for σv = 0. a κ = 0.001, b κ = 0.012. The solid and dashed lines are the boundaries between mono- (mono) and bistable (bi) regimes, and they correspond to the condition Δ = 0 in the (λ,fo) space

Fig. 6
figure 6

Bifurcation diagram of v in the deterministic mean-field model. Numerical solution of Eqs. (3) and (6) for σv = 0. a Parameters used: λ = 9 ⋅ 10− 7 and κ = 0.001. b Parameters used: λ = 1 ⋅ 10− 5 and κ = 0.012. Note that the diagrams in (a) and (b) are qualitatively similar, and there exists a critical value of fo above which bistability emerges

Equation (9) can be solved analytically for small 𝜖, as a series expansion in 𝜖. The detailed procedure is described in the Suppl. Information S1. Depending on the sign of \({\Delta }_{o}\equiv \lim _{\epsilon \mapsto 0} {\Delta }/\kappa ^{2}\), there can be one or three fixed points, which have the following values

$$ \begin{array}{@{}rcl@{}} v_{d}= cf_{o}\epsilon + \left( \frac{cf_{o}}{\kappa}\right)^{2}h\tau_{w}\epsilon^{2} + O(\epsilon^{3}) \\ v_{u}= \kappa r_{+} + (r_{+}/A)^{2} + O(\epsilon) \\ v_{max}= \kappa r_{-} + (r_{-}/A)^{2} + O(\epsilon), \end{array} $$
(11)

where

$$ \begin{array}{@{}rcl@{}} r_{\pm}= \frac{ h\tau_{w}-A^{-2} \pm \sqrt{{\Delta}_{o}} }{2\alpha\tau_{w}h}. \end{array} $$
(12)

The value vd is the fixed point for weak synapses (down state), while vu is the fixed point for strong synapses (up state). The intermediate value vmax corresponds to an always unstable fixed point, which serves as a boundary between the domains of attraction of the down and up fixed points. Thus all initial values of v in the (0,vmax) interval converge asymptotically to vd, and all initial values of v in the \((v_{max},\infty )\) interval converge asymptotically to vu. From Eqs. (11) and (12) it can be seen that the intermediate point vmax decreases as fo (or h) increases, from the value vu (at the onset of bistability for Δo = 0) to the value vd. This means that the domain of attraction of the vu fixed point increases at the expense of the domain of attraction of the vd fixed point, which shrinks with increasing fo in the bistable regime.
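A short numerical sanity check of Eqs. (11)-(12): with a hypothetical parameter set in the bistable regime (not the paper's values), r± must solve the 𝜖 = 0 limit of the cubic Eq. (9), and the corresponding currents vu and vmax then follow from the inverted Eq. (6).

```python
import math

# Hypothetical parameters (epsilon -> 0 limit of Eqs. (11)-(12)):
alpha, tau_w, h, A, kappa = 1.0, 1.0, 10.0, 1.0, 0.1
B = h * tau_w - A**-2
Delta_o = B**2 - 4.0 * alpha * tau_w * kappa * h   # lim Delta/kappa^2 as eps -> 0

r_plus = (B + math.sqrt(Delta_o)) / (2.0 * alpha * tau_w * h)   # Eq. (12), up state
r_minus = (B - math.sqrt(Delta_o)) / (2.0 * alpha * tau_w * h)  # Eq. (12), barrier

# Both must solve the eps = 0 limit of Eq. (9):
# r * (alpha*tau_w*h*r^2 - B*r + kappa) = 0
for r in (r_plus, r_minus):
    residual = alpha * tau_w * h * r**2 - B * r + kappa
    assert abs(residual) < 1e-12

v_u = kappa * r_plus + (r_plus / A)**2      # up-state current, Eq. (11)
v_max = kappa * r_minus + (r_minus / A)**2  # unstable barrier point, Eq. (11)
print(v_u, v_max)   # v_u ~ 0.879, v_max ~ 0.0013
```

Note that v_max sits close to the down state near zero, illustrating how the up-state basin dominates once the bistable regime is well developed.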

The critical value of the driving-plasticity parameter hcr for the emergence of bistability in Eq. (9) can also be obtained directly from the discriminant Δ in the limit 𝜖↦0. We obtain hcr by setting Δo = 0 and solving this equation for h.

Stochastic mean-field: numerics and effective potential for synaptic current

When synaptic noise is present, σv > 0, the synaptic current v fluctuates. The distribution of v is unimodal for small mean firing rate fo, and bimodal for sufficiently large fo (Fig. 7). The bimodal distribution reflects the bistability found in the deterministic case, with synapses stochastically switching between weak and strong weights.
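These fluctuations can be illustrated by integrating Eq. (3) directly with the Euler-Maruyama method. The sketch below uses a hypothetical bistable parameter set and a weak noise amplitude (neither taken from the paper), so within the simulated window the current simply settles into, and fluctuates around, the up state; larger noise or longer runs would also produce jumps between the two states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters placing the model in the bistable regime:
alpha, A, kappa, tau_w, h, eps_cfo = 1.0, 1.0, 0.1, 1.0, 10.0, 1e-4
sigma_v = 3e-4   # weak effective synaptic noise

def r_of_v(v):
    # Invert Eq. (6): v = kappa*r + (r/A)^2.
    return 0.5 * A**2 * (-kappa + np.sqrt(kappa**2 + 4.0 * v / A**2))

def drift(v):
    # Deterministic part of Eq. (3): decay toward eps_cfo plus plasticity drive.
    r = r_of_v(v)
    return (eps_cfo - v) / tau_w + h * r**2 * (1.0 - alpha * r)

# Euler-Maruyama integration of the Langevin dynamics for v
dt, n_steps = 1e-3, 20000
v = 0.5   # start above the barrier, inside the up-state basin
for _ in range(n_steps):
    v += drift(v) * dt + sigma_v * np.sqrt(dt) * rng.standard_normal()

print(v)   # fluctuates tightly around the up state v_u ~ 0.879
```

Histogramming many such trajectories, with initial conditions in both basins, reproduces the unimodal vs. bimodal distributions of Fig. 7.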

Fig. 7
figure 7

Distribution of synaptic current v in the stochastic mean-field model. Numerical solution of Eqs. (3) and (6) for σv > 0. In the monostable regime (fo = 1 Hz) the distribution is unimodal with a sharp peak around vd ≈ 0. In the bistable regime (fo = 10 Hz) the distribution is bimodal with two maxima around two fixed points vd and vu. a Parameters used: λ = 9 ⋅ 10− 7 and κ = 0.001. b Parameters used: λ = 10− 5 and κ = 0.012. In both (a) and (b) σw = 0.1 nS

The stationary average synaptic current 〈v〉, which is a measure of synaptic weights, increases weakly with the mean presynaptic firing rate fo and its standard deviation σf (Figs. 8 and 9). Mean-field values of 〈v〉 (computed from Eq. (41)) start to deviate from the exact numerical values (computed from Eqs. (1)–(2)) for larger levels of synaptic noise σw and for higher fo (Figs. 8 and 9).

Fig. 8
figure 8

Average synaptic current 〈v〉 as a function of mean presynaptic firing rate: comparison of exact results with mean-field. a Dependence of 〈v〉 on fo for σw = 0.02. b The same for σw = 0.1. c The same for σw = 0.5. For all panels solid lines correspond to exact result (from Eqs. (1)–(2)), dashed lines to mean-field (from Eq. (44)). The results are for λ = 9 ⋅ 10− 7 and κ = 0.001

Fig. 9
figure 9

Average synaptic current 〈v〉 as a function of standard deviation of the presynaptic firing rate: comparison of exact results with mean-field. a Dependence of 〈v〉 on σf for fo = 1 Hz. b The same for fo = 10 Hz. For all panels solid lines correspond to exact numerical result (from Eqs. (1)–(2)), dashed lines to mean-field (from Eq. (44)). The results are for λ = 9 ⋅ 10− 7, κ = 0.001, and σw = 0.1

Stochastic Eq. (3) for the dynamics of v can be mapped into an equation for the dynamics of the probability distribution of v conditioned on fo, i.e. P(v|fo), described by a Fokker-Planck equation (see Eqs. (31-34) in the Methods). In the stochastic stationary state, characterized by the stationary probability distribution Ps(v|fo), we can define a new and important quantity called an effective potential Φ(v|fo), which is a function of the synaptic current v. The effective potential Φ is proportional to the amount of energy associated with the synaptic plasticity described by Eq. (3), and it is equal to minus the integral over v of the right hand side of Eq. (3) with σv = 0 (Van Kampen 2007; see Eq. (36) in the Methods). The explicit form of the effective potential Φ is

$$ \begin{array}{@{}rcl@{}} {\Phi}(v|f_{o})\!&=&\! \frac{v}{\tau_{w}}\left( \frac{1}{2}v-\epsilon cf_{o}\right) \\&&\!- h\left[\kappa r^{3}\left( \frac{1}{3} - \frac{\alpha r}{4}\right) + \frac{r^{4}}{A^{2}}\left( \frac{1}{2} - \frac{2\alpha r}{5}\right)\right]. \end{array} $$
(13)

Note that the second term in Φ (with the large bracket) is proportional to the plasticity amplitude λ through h. This term depends on v through the firing rate r (see Eq. (6)). In general, the functional form of the potential Φ(v|fo) determines the thermodynamics of synaptic memory.

The shape of the potential Φ(v|fo) depends on the relative magnitude of the driving-plasticity h and the inverse of the decay time constant 1/τw (Fig. 10a). In fact, there are two competing terms in Φ that are controlled by 1/τw and h. The first term (\(\sim 1/\tau _{w}\)) maintains monostability, while the second (\(\sim h\)) promotes bistability. For h greater than the critical value hcr (Eq. (8)), there is bistability and Φ has two minima at vu and vd, corresponding to up (strong) and down (weak) synaptic states (Fig. 10a), similar to the result for the deterministic limit. For very large h, there is again only one minimum related to strong synapses (Fig. 10a). The two minima are separated by a maximum corresponding to a potential barrier at vmax. Metastable values of v, i.e. the minima and maximum of the potential, can be found from the condition dΦ/dv = 0, which is equivalent to finding the fixed points of Eq. (3) in the deterministic limit.
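The two-well structure can be verified by tabulating Φ from Eq. (13). The sketch below, for a hypothetical bistable parameter set (not the paper's values), counts the interior local minima of Φ on a grid; it finds exactly two, at vd and vu, separated by the barrier at vmax.

```python
import numpy as np

# Hypothetical bistable parameter set (not the paper's fits):
alpha, A, kappa, tau_w, h, eps_cfo = 1.0, 1.0, 0.1, 1.0, 10.0, 1e-4

def r_of_v(v):
    # Invert Eq. (6): v = kappa*r + (r/A)^2.
    return 0.5 * A**2 * (-kappa + np.sqrt(kappa**2 + 4.0 * v / A**2))

def Phi(v):
    # Effective potential, Eq. (13).
    r = r_of_v(v)
    return (v / tau_w) * (0.5 * v - eps_cfo) - h * (
        kappa * r**3 * (1.0 / 3.0 - alpha * r / 4.0)
        + (r**4 / A**2) * (0.5 - 2.0 * alpha * r / 5.0))

v = np.linspace(0.0, 1.2, 60001)
phi = Phi(v)
# Interior local minima of Phi: the metastable down and up states
is_min = (phi[1:-1] < phi[:-2]) & (phi[1:-1] < phi[2:])
minima = v[1:-1][is_min]
print(minima)   # two wells: v_d ~ 1e-4 and v_u ~ 0.879
```

Scanning the same grid for local maxima locates the barrier vmax between the wells, in agreement with the fixed-point condition dΦ/dv = 0.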

Fig. 10
figure 10

Effective synaptic potential, metastability, and memory lifetime: theory. a The metastable synaptic states can be described in probabilistic terms and correspond to minima of an effective potential Φ(v|fo). For weak presynaptic driving input fo the potential Φ has only one minimum at \(v_{d}\sim O(\epsilon )\), related to weak synapses. If fo is above a certain threshold, then the potential displays two minima, corresponding to bistable coexistence of weak and strong synapses (vd and vu). In the bistable regime, the synapses can jump between weak and strong states due to fluctuations in the input and/or synaptic noise. b Characteristic long times in the up (Tu), down (Td) synaptic states, and memory lifetime Tm as functions of presynaptic firing fo. Curves in (a) and (b) are for λ = 9 ⋅ 10− 7 and κ = 0.001

If we use a mechanical analogy and treat v as a spatial coordinate, then synaptic plasticity can be visualized as a movement in v space (state transitions), which is constrained by the energy related to Φ. This means that the shape of the function Φ(v) determines what kinds of motion in v-space (state space) are possible or more likely. In particular, the binary nature of synaptic plasticity given by Eq. (3) can be described as transitions between two wells of the effective potential Φ(v|f0), corresponding to weak and strong synapses, or down and up synaptic states (e.g. Billings and Van Rossum 2009; Graupner and Brunel 2012). These transitions, caused by intrinsic synaptic noise (σw) and fluctuations in the presynaptic input (σf), can be thought of as a “hill climbing” process in the v space, which requires energy due to a barrier separating the two wells (Fig. 10). The dwelling times in both states (Tu,Td) can be found from the classic Kramers “escape” formula (Eq. 47; Van Kampen 2007), and they are generally much larger than the time constant τw (Fig. 10b).

We define the memory time Tm of the synaptic system as the characteristic time needed to relax synaptic weights to their stationary values following a brief perturbation, or single memory event. Mathematically, it is equivalent to finding the relaxation time of the probability distribution P(v|fo) to its steady state distribution Ps(v|fo) after a brief perturbation; see Eq. (50) in the Methods. The characteristic memory time Tm is strictly related to the dwelling times Tu and Td by Eq. (51), and their mutual relationship is depicted in Fig. 10b. Generally, the memory lifetime Tm is very small in the monostable regime (\(T_{m} \sim \tau _{w}\)), i.e. for small presynaptic firing. However, it jumps by several orders of magnitude when synapses become bistable (i.e. when h > hcr), but then Tm monotonically decreases with increasing fo (Fig. 10b).
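The Kramers picture can be sketched numerically. Below, the escape time from the down well is computed with the standard Kramers expression T ≈ 2π (Φ(2)(vd)|Φ(2)(vmax)|)− 1/2 exp(ΔΦ/D), which we assume corresponds to Eq. (47) in the Methods; the well and barrier positions are approximate fixed points of a hypothetical bistable parameter set, and the curvatures are obtained by finite differences.

```python
import math

# Hypothetical bistable parameter set (not the paper's fits):
alpha, A, kappa, tau_w, h, eps_cfo = 1.0, 1.0, 0.1, 1.0, 10.0, 1e-4

def r_of_v(v):
    return 0.5 * A**2 * (-kappa + math.sqrt(kappa**2 + 4.0 * v / A**2))

def Phi(v):
    # Effective potential, Eq. (13).
    r = r_of_v(v)
    return (v / tau_w) * (0.5 * v - eps_cfo) - h * (
        kappa * r**3 * (1.0 / 3.0 - alpha * r / 4.0)
        + (r**4 / A**2) * (0.5 - 2.0 * alpha * r / 5.0))

def d2Phi(v, dv=1e-5):
    # Second derivative of Phi by central finite differences.
    return (Phi(v + dv) - 2.0 * Phi(v) + Phi(v - dv)) / dv**2

# Approximate down-state minimum and barrier maximum for these parameters:
v_d, v_max = 1.1e-4, 1.25e-3

def T_escape(D):
    # Standard Kramers escape time from the down well over the barrier.
    barrier = Phi(v_max) - Phi(v_d)
    prefac = 2.0 * math.pi / math.sqrt(d2Phi(v_d) * abs(d2Phi(v_max)))
    return prefac * math.exp(barrier / D)

print(T_escape(1e-7), T_escape(5e-8))   # escape slows as noise D decreases
```

The exponential dependence on the barrier-to-noise ratio is what makes the dwelling times, and hence Tm, so much longer than τw in the bistable regime.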

Energy rate of synaptic plasticity

In this section we determine the energy rate, or metabolic rate, associated with stochastic BCM type synaptic plasticity. In a nutshell, energy is provided to the synaptic system to drive plasticity-related transitions between synaptic weights, mainly those that increase the weights. In the steady state this energy input balances the energy dissipated due to synaptic noise, which tends to decrease synaptic weights.

The plasticity related energy rate is determined both numerically, for the whole system of N synapses described by Eqs. (1-2), and analytically for the mean-field approximation described by a single Eq. (3). The numerical procedure for the whole system is described in the Methods (see section “Numerical simulations of the full synaptic system”), and the analytical results are described below.

The power dissipated by synaptic plasticity \(\dot {E}\) in the mean-field approximation is proportional to the average temporal rate of the effective potential decrease, i.e. −〈dΦ(v|f0)/dt〉, where 〈...〉 denotes averaging with respect to the probability distribution P(v|fo). Since the potential Φ(v|f0) depends on time only through v, after rearranging we get \(\dot {E} \sim - \langle (dv/dt)(d{\Phi }/dv) \rangle \). Thermodynamically, this formula is equivalent to the entropy production rate associated with the stochastic process described by Eq. (3), and represented by the effective potential Φ(v|fo) (Nicolis and Prigogine 1977; Lan et al. 2012; Mehta and Schwab 2012; Tome 2006; Tome and de Oliveira 2010). The synaptic plasticity energy rate per synapse \(\dot {E}\) can be found analytically using 1/N expansion, and in the steady state takes the form (see Methods):

$$ \begin{array}{@{}rcl@{}} \dot{E}= p_{d}\dot{E}_{d} + p_{u}\dot{E}_{u} \end{array} $$
(14)

where \(\dot {E}_{d}\) and \(\dot {E}_{u}\) are the energy rates dissipated, respectively, in the down and up synaptic states, which have the occupancies pd and pu. The energy rates \(\dot {E}_{d}\) and \(\dot {E}_{u}\) are given by

$$ \begin{array}{@{}rcl@{}} \dot{E}_{i}= \frac{E_{o}D}{4({\Phi}_{i}^{(2)})^{2}} [3({\Phi}_{i}^{(3)})^{2} + 2{\Phi}_{i}^{(2)}{\Phi}_{i}^{(4)}] + O(D^{2}), \end{array} $$
(15)

where i = d (down state) or i = u (up state). The symbols Φi and \({\Phi }^{(n)}_{i}\) denote values of the potential Φ(v) and its n-th derivative with respect to v for v = vi. The symbol Eo is the characteristic energy scale for variability in synaptic (spine) conductance, and it provides a link with underlying molecular processes (see the Methods). For convenience, we defined a new noise related parameter D, which is

$$ \begin{array}{@{}rcl@{}} D= {\sigma_{v}^{2}}/\tau_{w}. \end{array} $$
(16)

D can be viewed as the effective noise amplitude, and D relates to the number of synapses N as \(D\sim 1/N\).

Note that in Eq. (15) the terms of the order O(1) disappear, and the first nonzero contribution to \(\dot {E}\) is of the order O(1/N), since \(D \sim 1/N\). Moreover, to have nonzero power in this order, the potential Φ(v) must contain at least a cubic nonlinearity.
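Equation (15) is straightforward to evaluate once Φ is known. The sketch below does this for the up state of a hypothetical bistable parameter set, with the molecular energy scale set to Eo = 1 for illustration and the derivatives Φ(2), Φ(3), Φ(4) obtained by central finite differences; the last line confirms the leading-order linearity of the dissipated power in D.

```python
import math

# Hypothetical bistable parameter set and energy scale (not the paper's fits):
alpha, A, kappa, tau_w, h, eps_cfo = 1.0, 1.0, 0.1, 1.0, 10.0, 1e-4
E_o = 1.0

def r_of_v(v):
    return 0.5 * A**2 * (-kappa + math.sqrt(kappa**2 + 4.0 * v / A**2))

def Phi(v):
    # Effective potential, Eq. (13).
    r = r_of_v(v)
    return (v / tau_w) * (0.5 * v - eps_cfo) - h * (
        kappa * r**3 * (1.0 / 3.0 - alpha * r / 4.0)
        + (r**4 / A**2) * (0.5 - 2.0 * alpha * r / 5.0))

def derivs(v, dv=1e-3):
    # Central finite differences for Phi'', Phi''', Phi'''' at v.
    p = [Phi(v + k * dv) for k in range(-2, 3)]
    d2 = (p[3] - 2.0 * p[2] + p[1]) / dv**2
    d3 = (p[4] - 2.0 * p[3] + 2.0 * p[1] - p[0]) / (2.0 * dv**3)
    d4 = (p[4] - 4.0 * p[3] + 6.0 * p[2] - 4.0 * p[1] + p[0]) / dv**4
    return d2, d3, d4

def E_dot(v_i, D):
    # Eq. (15): leading O(D) contribution to the dissipated power in state i.
    d2, d3, d4 = derivs(v_i)
    return E_o * D / (4.0 * d2**2) * (3.0 * d3**2 + 2.0 * d2 * d4)

v_u = 0.8787   # up-state fixed point for these hypothetical parameters
print(E_dot(v_u, 1e-7))
print(E_dot(v_u, 2e-7) / E_dot(v_u, 1e-7))   # the ratio is 2: linear in D
```

Since D ∼ 1/N, the same calculation makes explicit that the per-synapse dissipation vanishes as O(1/N), as stated above.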

Equations (14) and (15) indicate that energy is needed for plasticity processes associated with the potential Φ “hill climbing”, in analogy to the energy needed for a particle trapped in a potential well (of a certain shape) to escape. The energetics of such a “motion” in the v-space depends on the shape of the potential, which is mathematically accounted for by various higher-order derivatives of Φ. Thus, a fraction of synapses that were initially in the down state can move up the potential gradient to the up state by overcoming a potential barrier, but this requires energy that is proportional to \({\sigma _{v}^{2}}\) and to the derivatives of the potential. By analogy, a similar picture holds for synapses that were initially in the up state. The prefactor D in Eq. (15) indicates that the transitions up\(\leftrightarrow \)down, as well as local fluctuations near these states, cost energy that is proportional to the intrinsic synaptic noise (\(\sim \sigma _{w}\)) and to the presynaptic activity (its mean fo and fluctuations σf). The important point is that if there is no intrinsic spine noise (σw = 0), then there are no transitions between the up and down states in the steady state, and consequently there is no energy dissipation (σv = 0), regardless of the fast presynaptic input magnitude. Likewise, for very long decay of synaptic weights, \(\tau _{w}\mapsto \infty \), corresponding to very slow synaptic plasticity (and the lack of the decay term in Eq. 1), no energy is used. In such a noiseless stationary state, the plasticity processes described by Eq. (3) are energetically costless, since there are no net forces that can change synaptic weights, or mathematically speaking, that can push synapses in the v-space. (This is not true under non-stationary conditions, when there is some temporal variability in one or more parameters in Eq. (3), leading to dissipation, but the focus here is on the steady state.)
This situation resembles the so-called “fluctuation-dissipation” theorem known from statistical physics (Nicolis and Prigogine 1977; Van Kampen 2007; Risken 1996), where thermal fluctuations always cause energy loss. In our case, this fluctuation-dissipation relationship underlines a key role of thermodynamic fluctuations for the metabolic load of synaptic plasticity.

We can compare the energy rate coming from the mean-field (Eq. (3)) to the energy rate computed numerically for the full synaptic system described by Eqs. (1)–(2). The results are presented in Figs. 11 and 12. Generally, a better agreement between mean-field and exact results is achieved for intermediate synaptic noise σw and also for intermediate values of the mean presynaptic firing rate fo. For larger σw, in the regions close to mono-bistability transitions, there are peaks in the mean-field \(\dot {E}\) that are absent in the numerical \(\dot {E}\) (Fig. 11). These peaks are artifacts of the approximation methods used in the mean-field. Moreover, Fig. 11 shows that the energy rate \(\dot {E}\) mostly increases steadily with fo (the exact result). The exception is a narrow interval near the mono- to bi-stability regions, where \(\dot {E}\) slightly decreases (Fig. 11). The energy rate also steadily increases with the standard deviation of the presynaptic firing σf (Fig. 12).

Fig. 11
figure 11

Energy rate \(\dot {E}\) as a function of mean presynaptic firing rate: comparison of exact results with mean-field. Results are for σw = 0.02 nS (upper panel), σw = 0.1 nS (middle panel), and σw = 0.5 nS (lower panel). Solid lines correspond to exact numerical results for the whole system of N synapses obtained from Eq. (79). Dashed lines correspond to the mean-field approach (Eqs. (14)–(15)). The best agreement between exact and mean-field results is for the intermediate σw, and not too small fo. Note two peaks in the mean-field result for \(\dot {E}\) (corresponding to mono \(\leftrightarrow \) bistability transitions) for larger noise σw, which are absent in the exact results. These peaks are the artifacts of the approximation method in the mean-field. All plots are for λ = 9 ⋅ 10− 7, κ = 0.001, and σf = 10 Hz

Fig. 12
figure 12

Energy rate \(\dot {E}\) as a function of standard deviation of the presynaptic firing rate: comparison of exact results with mean-field. Results are for fo = 1 Hz (upper panel), and fo = 10 Hz (lower panel). Solid lines correspond to exact numerical results obtained from Eq. (79), and dashed lines correspond to the mean-field results obtained from Eqs. (14-15). A better agreement between exact and mean-field results is obtained for the higher fo. All plots are for λ = 9 ⋅ 10− 7, κ = 0.001, and σw = 0.1 nS

The next interesting question is how the plasticity energy rate depends on the synaptic weights. In Fig. 13 we plot the dependence of \(\dot {E}\) on the average synaptic current 〈v〉, which is proportional to the synaptic weights and spine size (Kasai et al. 2003). It is clear that the synaptic energy rate related to plasticity grows nonlinearly with 〈v〉. For small 〈v〉, the energy rate \(\dot {E}\) depends weakly on 〈v〉, whereas for large 〈v〉 it increases strongly with 〈v〉 (Fig. 13).

Fig. 13
figure 13

Energy rate \(\dot {E}\) as a function of average synaptic current 〈v〉: exact numerical results. Note a sharp increase in \(\dot {E}\) for larger 〈v〉. Energy rate is calculated from Eq. (79), and 〈v〉 is calculated from Eqs. (1-2). All plots are for λ = 9 ⋅ 10− 7, κ = 0.001, and σf = 10 Hz

Which dependence of \(\dot {E}\) is stronger: on fo or on 〈v〉? In Fig. 14 it is shown that the energy rate \(\dot {E}\) increases nonlinearly both with fo and with 〈v〉, but the dependence on the average synaptic current 〈v〉 is much steeper.

Fig. 14
figure 14

Nonlinear dependence of energy rate on presynaptic firing rate and average synaptic current: exact numerical results. a The ratio \(\dot {E}/f_{o}\) as a function of fo. b The ratio \(\dot {E}/\langle v\rangle \) as a function of 〈v〉. The energy rate \(\dot {E}\) depends nonlinearly both on fo and v, but the dependence on the synaptic current 〈v〉 is steeper. Energy rate is calculated from Eq. (79), and 〈v〉 is calculated from Eqs. (1-2). All plots are for λ = 9 ⋅ 10− 7, κ = 0.001, and σf = 10 Hz

Energy cost of plastic synapses as a fraction of neuronal electric energy cost: comparison to experimental data

In order to assess the magnitude of the synaptic plasticity energy rate, we compare it to the rate of energy consumption by a typical cortical neuron for its electric activities related to fast synaptic transmission, action potentials and maintenance of the resting potential (Attwell and Laughlin 2001). The neural spiking activity and synaptic transmission are known to consume the majority of the neural energy budget (Attwell and Laughlin 2001; Harris et al. 2012; Karbowski 2012). The ratio of the total energy rate used by plastic synapses \(N\dot {E}\) to the neuron’s energy rate \(\dot {E}_{n}\) (given by Eq. (63) in the Methods) is computed for different presynaptic firing rates fo, various levels of synaptic noise σw, and for different cortical regions. The results for macaque and human cerebral cortex are shown in Figs. 15 and 16. These plots indicate that the synaptic plasticity contribution depends strongly on the level of synaptic noise σw; the higher the noise the larger the ratio \(N\dot {E}/\dot {E}_{n}\). Higher firing rates fo also tend to increase that ratio but not that strongly, and the dependence is nonmonotonic (Figs. 15 and 16). Generally, the value of \(N\dot {E}/\dot {E}_{n}\) ranges from negligible (\(\sim 10^{-4}-10^{-3}\)) for small/intermediate noise (σw = 0.1 nS), to substantial (\(\sim 10^{-2}-10^{0}\)) for very large noise (σw = 2 nS). The results are qualitatively similar across different cortical regions within one species, as well as between human and macaque cortex, despite large differences in the cortical sizes of both species (Figs. 15 and 16). Small quantitative differences are the result of small differences in the synaptic densities between areas and species.

Fig. 15
figure 15

Energy cost of synaptic plasticity as a fraction of neuron’s electric energy cost in the cerebral cortex of macaque monkey. The ratio of the total energy rate used by plastic synapses \(N\dot {E}\) (chemical energy) to neuron’s energy rate \(\dot {E}_{n}\) (electric energy used mainly for fast synaptic transmission and neural spiking) as a function of presynaptic firing rate fo for different levels of synaptic noise σw, and different regions of the macaque cortex (visual and frontal). Note that the energy contribution of plastic synapses to the neuron’s energy budget depends strongly on the synaptic noise level. For weak and intermediate noise this contribution is mostly marginal. For very large noise (σw = 2 nS) it can be substantial, but only for very large firing rates. The neuron’s energy rate \(\dot {E}_{n}\) was computed using Eq. (63), while the plasticity energy rate of all synapses \(N\dot {E}\) was computed from Eqs. (14-15). All plots are for λ = 9 ⋅ 10− 7, κ = 0.001, and σf = 10 Hz

Fig. 16
figure 16

Similar as in Fig. 15, but for human cerebral cortex. Overall, the ratio of the energy rate of plastic synapses to the neuron’s electric energy rate for human cortex is very similar to the one for macaque cortex

Information encoded by plastic synapses

In our model, information or memory about the mean input fo is written in the population of synapses, represented by the synaptic current v. In the stochastic steady state, the synaptic current v is characterized by probability distribution Ps(v|f0), which is related to the potential Φ(v|f0). This means that information encoded in synapses also depends on the structure of the potential (Eq. (13)).

The accuracy of the encoded information can be characterized by Fisher information IF (Cover and Thomas 2006). In general, larger IF implies a higher coding precision. Fisher information, related to synaptic current v, can be derived analytically (see the Methods). In the limit of small effective noise amplitude D we obtain:

$$ \begin{array}{@{}rcl@{}} \mathit{I}_{\mathit{F}}(f_{o})\!&=&\! p_{u}p_{d} \left( \! \left( \frac{{\Phi}_{u}-{\Phi}_{d}}{D}\right)' + \frac{1}{2}\left[\frac{({\Phi}^{(2)}_{u})'}{{\Phi}^{(2)}_{u}} - \frac{({\Phi}^{(2)}_{d})'}{{\Phi}^{(2)}_{d}} \right] \right)^{2} \\ &&\!+ {\sum}_{i=d,u} \mathit{p}_{i}\!\left( \! \frac{{\Phi}^{(2)}_{i}}{D}(v_{i}^{\prime})^{2} + \frac{1}{2}\left[\!\frac{({\Phi}^{(2)}_{i})'}{{\Phi}^{(2)}_{i}} - \frac{D^{\prime}}{D}\! \right]^{2} \right) \end{array} $$
(17)

where pi denote the fractions of synapses in the up (i = u) and down (i = d) states, and the prime denotes a derivative with respect to fo. Note that the effective noise amplitude D depends on fo, since σv depends on fo and \(D\sim {\sigma _{v}^{2}}\) (see Eq. (5)).

The first term in Eq. (17), proportional to pdpu, is of the order of \( \sim 1/D^{2} \sim N^{2}\), and it appears only in the bistable regime (both pd and pu must be nonzero). This term depends on the difference in the potentials between up and down states. The second term in Eq. (17), proportional to the weighted sums of pd and pu, is of the order of \(\sim 1/D \sim N \), and it is always present regardless of mono- or bistability. Thus, the first term is much bigger than the second for small D, which is the primary reason why IF (and coding accuracy) is several orders of magnitude larger when synapses are bistable (see below). Because Fisher information IF(fo) is proportional either to \(1/D^{2} \sim N^{2}\) (in the bistable regime) or to \(1/D \sim N\) (in the monostable regime), many synapses are much better at coding the presynaptic firing fo than a single synapse.

The equation for Fisher information (Eq. (17)) indicates that there is no simple relationship between IF and the synaptic current v. Rather, IF depends in a nonlinear manner on the derivatives of the synaptic currents in the up and down states, \(v_{u}^{\prime }\) and \(v_{d}^{\prime }\). This follows from the fact that the potential Φ depends in a complicated way on v (see Fig. 10, and Eq. (13)).

Accuracy and lifetime of synaptically stored information vs plasticity energy rate

How does the long-term energy used by synapses relate to the accuracy and persistence of stored information? The above results indicate that \(\dot {E}\) and IF depend on the synaptic noise σv (or D) in opposite ways, suggesting that lowering the noise should be doubly beneficial: a gain in information is accompanied by a decrease in the synaptic energy rate.

A more complicated picture emerges if other parameters are varied, notably the driving presynaptic input fo, across the regimes of mono- and bistability (Fig. 17). At the onset of bistability, Fisher information IF and memory lifetime Tm both increase dramatically, whereas the plasticity energy rate \(\dot {E}\) increases mildly. Approximate mean-field calculations of \(\dot {E}\) produce a small peak at the transition point, but more exact numerical calculations of \(\dot {E}\) based on Eqs. (1-2) indicate a smooth behavior (with a slight decrease), which suggests that the small peak in the mean-field is an artifact of the approximation (Fig. 17). Taken together, this implies that the large improvement in information coding accuracy and retention in the initial region of bistability does not require a huge amount of energy. On the contrary, the corresponding energy cost is rather small.

Fig. 17
figure 17

Comparison of synaptic plasticity energy rate with accuracy and lifetime of stored information as a function of presynaptic firing rate. Dependence of pd, \(\dot {E}\), IF, and Tm on firing rate fo. Fisher information IF and memory lifetime Tm have large peaks at the onset of bistability, whereas the synaptic energy rate \(\dot {E}\) increases only mildly. Beyond the transition point to bistability, IF and Tm exhibit a different dependence on fo than \(\dot {E}\). The former two quantities decrease while the latter increases with fo. In the dependence of \(\dot {E}\) on fo, the solid line corresponds to the mean-field approximation, and the dashed line to the exact numerical result. All plots are for λ = 9 ⋅ 10− 7, κ = 0.001, and σw = 0.1 nS

For higher fo, deeper in the bistability region, there is a different trend. In this coexistence region, \(\dot {E}\) increases monotonically, while IF and Tm decrease, which in turn indicates an inefficiency of information storing. However, even here the huge values of IF and Tm outweigh the growth in \(\dot {E}\). For even higher fo, in the monostable phase with strong synapses only, \(\dot {E}\) still increases monotonically, whereas IF and Tm further decrease to levels similar to those for very small fo. Consequently, the biggest gains in synaptic information precision and lifetime per energy used (\(I_{F}/\dot {E}\) and \(T_{m}/\dot {E}\)) are achieved in the bistable phase only (Fig. 18). Interestingly, the gains in information precision and lifetime depend nonmonotonically on the plasticity amplitude λ, and there are some optimal values of λ that are different for the gains \(I_{F}/\dot {E}\) and \(T_{m}/\dot {E}\) (Fig. 19).

Fig. 18
figure 18

Gains in information accuracy and lifetime per synaptic energy used as functions of presynaptic firing rate. Fraction of synapses in the down state pd (upper panel), the ratio \(I_{F}/\dot {E}\) (middle panel) and \(T_{m}/\dot {E}\) (lower panel) as functions of fo. The biggest gains in information and memory lifetime are at the transition point to the bistability. Solid lines correspond to the energy rate calculated in the mean-field, and dashed lines to the energy rate calculated numerically. All plots are for λ = 9 ⋅ 10− 7, κ = 0.001, and σw = 0.1 nS

Fig. 19
figure 19

Gains in information accuracy and lifetime per synaptic energy used as functions of plasticity amplitude. Fraction of synapses in the down state pd (upper panel), the ratio \(I_{F}/\dot {E}\) (middle panel) and \(T_{m}/\dot {E}\) (lower panel) as functions of λ. Note the sharp peaks for \(I_{F}/\dot {E}\) at the transition point from mono- to bi-stability, and for \(T_{m}/\dot {E}\) at the transition point from bi- to mono-stability. Solid lines correspond to the energy rate calculated in the mean-field approach. All plots are for fo = 10 Hz, κ = 0.001, and σw = 0.1 nS

Taken together, these results suggest that storing of accurate information in synapses can be relatively cheap in the bistable regime, and thus metabolically efficient.

Precision of memory coding is restricted by the sensitivity of the synaptic plasticity energy rate to the driving input

The above results suggest that synaptic energy utilization does not directly limit the coding precision of a stimulus, because there is no simple relationship between Fisher information and the power dissipated by synapses. However, a careful inspection of the curves in Fig. 17 suggests that there might be a link between IF and the derivative of \(\dot {E}\) with respect to the driving input fo. In fact, it can be shown that in the most interesting regime of synaptic bistability, in the limit of very weak effective noise D → 0, we have either (see the Methods)

$$ \begin{array}{@{}rcl@{}} I_{F}(f_{o})= \frac{(\partial p^{(0)}_{d}/\partial f_{o})^{2}}{p^{(0)}_{u}p^{(0)}_{d}} \left[ 1 + O(D) \right], \end{array} $$
(18)

or equivalently

$$ \begin{array}{@{}rcl@{}} I_{F}(f_{o})= \frac{(\partial\dot{E}/\partial f_{o})^{2}}{p^{(0)}_{u}p^{(0)}_{d} (\dot{E}_{u}-\dot{E}_{d})^{2}}\left[ 1 + O(D) \right], \end{array} $$
(19)

where \(p_{d}^{(0)}, p_{u}^{(0)}\) are the fractions of synapses in the down and up states (weak and strong synapses) in the limit D → 0. It is important to stress that the simple formulas (18) and (19) are general, since they do not depend explicitly on the potential Φ, and thus they are independent of the plasticity model. Equation (18) shows that synaptic coding precision increases greatly for sharp transitions from mono- to bi-stability, since then \((\partial p^{(0)}_{d}/\partial f_{o})^{2}\) is large. Additionally, Eq. (19) makes an explicit connection between the precision of synaptic information and nonequilibrium dissipation. Specifically, the latter formula implies that to attain a high fidelity of stored information, the energy used by synapses \(\dot {E}\) does not have to be large, but instead it must change sufficiently quickly in response to changes in the presynaptic input.
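The equivalence of Eqs. (18) and (19) can be checked numerically. The sketch below assumes a mean-field decomposition \(\dot{E} = p^{(0)}_{d}\dot{E}_{d} + p^{(0)}_{u}\dot{E}_{u}\) with fo-independent state energy rates (a simplification; the actual Eq. (14) lies outside this excerpt) and a hypothetical logistic form of \(p^{(0)}_{d}(f_{o})\):

```python
import numpy as np

# Hypothetical smooth down-state occupation near the bistable transition;
# the logistic form is an illustrative assumption, not the paper's p_d.
def p_d(fo, fc=5.0, width=0.5):
    return 1.0 / (1.0 + np.exp((fo - fc) / width))

E_d, E_u = 1.0, 10.0            # assumed state energy rates (arbitrary units)
fo = np.linspace(3.0, 7.0, 401)
pd = p_d(fo)
pu = 1.0 - pd
Edot = pd * E_d + pu * E_u      # assumed mean-field energy rate

dpd = np.gradient(pd, fo)       # d p_d / d f_o
dE = np.gradient(Edot, fo)      # d Edot / d f_o

IF_18 = dpd**2 / (pu * pd)                   # Eq. (18), leading order in D
IF_19 = dE**2 / (pu * pd * (E_u - E_d)**2)   # Eq. (19), leading order in D

# The two expressions agree, and the precision peaks at the transition fo = fc
assert np.allclose(IF_18, IF_19)
print(fo[np.argmax(IF_18)])
```

With \(p_u + p_d = 1\) and constant \(\dot{E}_u, \dot{E}_d\), the identity \(\partial\dot{E}/\partial f_o = -(\partial p_d/\partial f_o)(\dot{E}_u - \dot{E}_d)\) makes the two forms equal term by term.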

We can also estimate the relative error ef in synaptic coding of the average presynaptic firing fo. This error is related to Fisher information by the Cramér-Rao inequality \(e_{f} \ge (f_{o}\sqrt {I_{F}})^{-1}\) (Cover and Thomas 2006). Using Eq. (19), in our case this relation implies

$$ \begin{array}{@{}rcl@{}} e_{f} \ge \frac{\sqrt{p^{(0)}_{u}p^{(0)}_{d}}}{f_{o}|\dot{E}^{\prime}/(\dot{E}_{u}-\dot{E}_{d})|}, \end{array} $$
(20)

where the prime denotes the derivative with respect to fo. The value of the product \(p^{(0)}_{u}p^{(0)}_{d}\) lies between 0 and 1/4. In the worst-case scenario for coding precision, i.e., for \(p^{(0)}_{u}p^{(0)}_{d}= 1/4\), this implies that a 10% coding error (ef = 0.1) corresponds to a relative sensitivity of the plasticity energy rate to presynaptic firing of \(f_{o}|\dot {E}^{\prime }/(\dot {E}_{u}-\dot {E}_{d})|= 5\). Generally, the larger the latter value, the higher the precision of synaptic coding. In our particular case, this high level of synaptic coding fidelity is achieved right after the appearance of bistability (Fig. 17).
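The bound in Eq. (20) reduces to elementary arithmetic; a minimal sketch (the function name is ours):

```python
import math

def coding_error_bound(pu_pd, rel_sensitivity):
    """Lower bound on the relative coding error e_f from Eq. (20):
    e_f >= sqrt(p_u p_d) / (f_o |E'/(E_u - E_d)|)."""
    return math.sqrt(pu_pd) / rel_sensitivity

# Worst case p_u p_d = 1/4: a relative sensitivity of 5 yields a 10% error
assert abs(coding_error_bound(0.25, 5.0) - 0.1) < 1e-12
# A tenfold larger sensitivity tightens the bound tenfold
assert abs(coding_error_bound(0.25, 50.0) - 0.01) < 1e-12
```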

Discussion

Summary of the main results

In this study, the energy cost of long-term synaptic plasticity was determined and compared to the accuracy and lifetime of the information stored at excitatory synapses. The main results of this study are:

  (a) Formulation of the dynamic mean-field of the extended BCM synaptic plasticity model (Eqs. (3)–(5)).

  (b) The energy rate of plastic synapses increases nonlinearly both with the presynaptic firing rate (Figs. 11 and 14) and with the average synaptic current or weights (Figs. 13 and 14).

  (c) Coding of more accurate information in synapses need not require a large energy cost (cheap long-term information). The accuracy of stored information about the presynaptic input can increase by several orders of magnitude with only a mild increase in the plasticity energy rate at the onset of bistability (Figs. 17 and 18).

  (d) The accuracy of information stored at synapses and its lifetime are limited not by the available energy rate, but by the sensitivity of the energy rate to the presynaptic firing. For very weak synaptic noise, the coding accuracy at plastic synapses (Fisher information) is proportional to the square of the derivative of the plasticity energy rate with respect to the mean presynaptic firing (Eq. (19)).

  (e) The energy rate of synaptic plasticity, which is of chemical origin, constitutes in most cases only a tiny fraction of the neuron’s energy rate associated with fast synaptic transmission and action potentials, which are of electric origin. That fraction can be substantial only for very large synaptic noise and presynaptic firing rates (Figs. 15 and 16).

Discussion of the main results

The dynamic mean-field for synaptic plasticity was derived analytically by applying (i) the timescale separation between neural and synaptic plasticity activities, and (ii) a dimensional reduction of the original synaptic system. The formulated mean-field of the synaptic current v seems to work reasonably well for the average 〈v〉 if the intrinsic synaptic noise σw is small and the presynaptic firing rates fo are not too high (Fig. 8). For larger σw and fo, the mean-field value 〈v〉 diverges from the exact numerical average calculated from Eqs. (1)–(2).

The mean-field approximation to the synaptic energy rate \(\dot {E}\) was additionally derived in the limit of small effective noise D (either large N or small σw, or both). Surprisingly, the mean-field approximation for \(\dot {E}\) works better for intermediate noise σw than for its smaller values (Fig. 11). For those intermediate values of σw, the energy rate \(\dot {E}\) calculated in the mean-field is close to that calculated numerically in the whole neurophysiological interval of fo variability (Fig. 11, middle panel). It seems that the primary reason for the breakdown of the mean-field for 〈v〉 and \(\dot {E}\) is the way the integrals in Eqs. (43) and (57) were approximated. In those integrals, it was assumed that xd1 (the lower limit of integration) tends to \(-\infty \) for D → 0, which however is not always true, since \(x_{d1} \sim v_{d} \sim \epsilon \), and vd can be very small for very small 𝜖 (see Eq. 11), especially if fo is small. As a consequence, the real value of xd1 can be of order −(0.1 − 1), even for very small D.

Comparing the mean-field \(\dot {E}\) to its numerical values suggests that the peaks in the mean-field approximation of \(\dot {E}\) are artifacts (Fig. 11). They result from some (small) differences between the mean-field and numerical locations of the mono/bistability transition points. This causes certain errors in the relative magnitudes of pd and pu, which leads to over- or under-estimates in the mean-field values of \(\dot {E}\) (Eq. (14)).

The nonlinear increase of the plasticity energy rate with presynaptic firing rate fo and average synaptic current v (Figs. 11, 13, and 14) suggests that high presynaptic activities and large synapses/spines are metabolically costly (there exists a positive correlation between synaptic current and size; see Kasai et al. 2003). Consequently, it seems that large firing rates and synaptic weights (proportional to average synaptic current) should not be preferred in real neural circuits. This simple conclusion is qualitatively in line with experimental data for cortical neurons, showing low mean firing rates and weak mean synaptic weights, with skewed distributions (Buzsaki and Mizuseki 2014).

The most striking result of this study is that precise memory storing about the presynaptic firing rate fo does not have to be metabolically expensive. Strictly speaking, the accuracy and lifetime of the information encoded at synapses do not have to correlate positively with the energy used by synapses (Fig. 17). Such a correlation is present only at the onset of synaptic bistability, where a large increase in information precision (IF) and lifetime (Tm) is accompanied by only a mild increase in the energy rate. This suggests an energetic efficiency of stored information in the bistable synaptic regime, i.e., a relatively high information gain per energy used (Fig. 18). Moreover, the results in Fig. 18 show that there exists an optimal value of the presynaptic firing rate for which the information gain per energy, as well as the memory lifetime per energy, are maximal. An additional support for the metabolic efficiency of synaptic information comes from the fact that the energy used \(\dot {E}\) and the coding precision IF depend in opposite ways on the effective noise amplitude D (compare Eqs. (15) and (17)): IF increases, while \(\dot {E}\) decreases, with decreasing D. Because \(D\sim 1/N\), this also implies that \(I_{F}\sim N^{2}\) in the bistable regime, i.e., that many synapses (large N) are much better at precise coding of the mean presynaptic firing than a single synapse (N = 1). Taken together, these findings are compatible with a study by Still et al. (2012) showing that abstract stochastic systems with memory, operating far from thermodynamic equilibrium, are the most predictive about an environment if they use minimal energy.

Estimating an external variable is never perfect, and it is shown here that the synaptic coding accuracy (IF) relates to the derivative of the energy rate with respect to the average input. The fundamental relationship linking memory precision and synaptic metabolic sensitivity is given by Eq. (19), which is valid regardless of the specific plasticity mechanism, as long as synapses can exist in two metastable states, in the limit of very small synaptic noise D. This binary synaptic nature is a key feature enabling a high fidelity of long-term synaptic information (Petersen et al. 1998), despite ongoing neural activity, which is generally detrimental to information storing (Fusi et al. 2005). Specifically, for realistic neurophysiological parameters, it is seen from Fig. 17 that the relative coding error in synapses \(e_{f} \sim (f_{o}\sqrt {I_{F}})^{-1}\) can be as small as 0.03 − 0.1 (or 3 − 10%) near the onset of bistability. However, away from that point the error gets larger. Thus, again it seems that there exists an optimal firing rate fo for which the coding accuracy is maximal and quite high, despite large fluctuations in presynaptic neural activities (large σf in relation to fo).

Neural computation is thought to be metabolically expensive (Aiello and Wheeler 1995; Laughlin et al. 1998; Attwell and Laughlin 2001; Karbowski 2007, 2009; Niven and Laughlin 2008; Harris et al. 2012), and it must be supported by cerebral blood flow and constrained by the underlying microvasculature and neuroanatomy (Karbowski 2014, 2015). It is shown here that an important aspect of this computation, namely the long-term synaptic plasticity involved in learning and memory, constitutes in most cases only a small fraction of the neuronal energy cost associated mostly with fast synaptic transmission and spiking (Figs. 15 and 16). Specifically, for intermediate/large synaptic noise (σw = 0.1 and 0.5 nS), the metabolic cost of synaptic plasticity can maximally be on a level of 1 − 10% of the electric neuronal cost, for both human and macaque monkey (Figs. 15 and 16). Higher levels of synaptic plasticity cost (maximally 100% of the electric cost) are possible, but only for very large synaptic noise, σw = 2.0 nS (Figs. 15 and 16). The latter value is, however, unlikely, because it is 20 times larger than the mean values of the synaptic weights wi (see Fig. 2b), and thus it seems that higher costs of synaptic plasticity are physiologically implausible. Taken together, these results suggest that precise memory storing can be relatively cheap, which agrees with empirical estimates presented in Karbowski (2019).

Discussion of other aspects of the plasticity model

In this study, an extended BCM model of synaptic plasticity is introduced and solved. There are three additional elements in our model (Eq. (1)) that are absent in the classical BCM plasticity rule: a weight decay term (\(\sim 1/\tau _{w}\)), synaptic noise (\(\sim \sigma _{w}/\sqrt {\tau _{w}}\)), and a nonlinear dependence of the postsynaptic firing rate on the synaptic input (Eq. (6)). Moreover, it is assumed here that presynaptic firing rates fluctuate stochastically and fast around a common mean fo with standard deviation σf. These features make the behavior of our model significantly different from that of the classical BCM model (Bienenstock et al. 1982). In particular, due to the stochasticity of synaptic weights, our model does not exhibit input selectivity, in contrast to the classical BCM rule. Input selectivity in the classical BCM means that the largest static presynaptic firing rate “selects” its corresponding synapse by increasing its weight, in such a way that the weights of all other synapses decay to zero. In our model this never happens, because all synapses are driven on average by the same input, and more importantly, synaptic noise constantly brings all synapses up and down in an unpredictable fashion. For these reasons, the mean-field approach proposed here, although mathematically correct, does not make sense for the classical BCM rule (no weight decay, no noise) if our goal is studying input selectivity, because in that model only one synapse is effectively present at the steady state, and there is no need for a large-N approach.

The main reasons for choosing the mean-field approach, and constructing a single dynamical equation for the population-averaged synaptic current v, are: (i) we wanted to treat analytically the multidimensional stochastic model given by Eqs. (1)–(2), and (ii) the variable v emerges as a natural choice, since r in Eqs. (1)–(2) depends only on one variable, namely on v (see Eq. (6)). The feature (ii) makes Eqs. (3) and (6) a closed mathematical system of just two equations that can be handled analytically. Another practical reason behind introducing the dynamic mean-field is that it enables us to obtain explicit formulae for the synaptic plasticity energy rate and Fisher information (coding accuracy).

In deriving the dynamic mean-field we assumed that the time constant related to the wi dynamics, i.e. τw, is much larger than the time constant τ𝜃 related to the sliding threshold 𝜃. This is in agreement with empirical observations and estimates, since τw must be of the order of 1 hr to be consistent with slice experiments, showing a wiping out of synaptic potentiation after about 1 hr when presynaptic firing becomes zero (Frey and Morris 1997; Zenke et al. 2013). (Note that τw refers to the decay of synaptic weights to the baseline value 𝜖a, and it should not be confused with the characteristic time of plasticity induction, which is controlled by the product λ fi r in Eq. (1) and which can be much faster, \(\sim \) minutes (Petersen et al. 1998; O’Connor et al. 2005).) On the other hand, the time constant τ𝜃 must be smaller than about 3 min for stability reasons (Zenke and Gerstner 2017; Zenke et al. 2013), and it has even been estimated to be as small as \(\sim 12\) sec (Jedlicka et al. 2015).
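A minimal numerical sketch can illustrate this timescale separation. Eqs. (1)–(2) and (6) are not reproduced in this excerpt, so the BCM-type induction term λ fi r(r − 𝜃), the sliding-threshold relaxation, the saturating rate curve, and all parameter values below are our own assumptions in the spirit of the text, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters (tau_w ~ 1 hr and tau_theta ~ 12 s follow the text;
# everything else is an illustrative placeholder)
N, dt = 100, 0.05                 # number of synapses, time step (s)
tau_w, tau_theta = 3600.0, 12.0   # weight decay and sliding-threshold times (s)
lam, sigma_w = 1e-6, 0.05         # plasticity amplitude, weight noise
eps_a = 0.1                       # weight baseline (hypothetical)
fo, sigma_f = 10.0, 5.0           # mean presynaptic rate and its fluctuation (Hz)

def rate(v):
    # placeholder saturating firing-rate curve, standing in for Eq. (6)
    return 100.0 * v / (1.0 + v)

w, theta = np.full(N, eps_a), 1.0
for _ in range(20000):            # 1000 s of simulated time
    f = np.maximum(fo + sigma_f * rng.standard_normal(N), 0.0)  # rectified rates
    r = rate(np.mean(f * w))      # population-averaged current drives the rate
    # BCM-type induction + decay to baseline + intrinsic weight noise
    w += (lam * f * r * (r - theta) - (w - eps_a) / tau_w) * dt \
         + sigma_w * np.sqrt(dt / tau_w) * rng.standard_normal(N)
    w = np.maximum(w, 0.0)
    # fast sliding threshold, assumed target r^2 / r_ref with r_ref = 100 Hz
    theta += (r**2 / 100.0 - theta) * dt / tau_theta

print(w.mean())
```

For these (assumed) parameters the weights climb well above the baseline 𝜖a; other parameter choices leave the population in the weak state.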

Although individual synapses in the original model Eqs. (1)–(2) exhibit bistability (see Figs. 1 and 2), this bistability has a collective character. That is, if most synaptic weights are initially weak, then they all converge to a lower fixed point. On the other hand, if a sufficient fraction of synaptic weights is initially strong, then they all converge to an upper fixed point (Fig. 1). This means that the majority of synapses participate in a coordinated switching between up and down states, due to effective noise (internal and external). This mechanism is probably different from the mechanism found in Petersen et al. (1998) and O’Connor et al. (2005), where bistability was reported on the level of a single synapse, independent of other synapses. (However, from these papers it is difficult to judge how long the potentiation lasts in the absence of presynaptic stimulation.) Our scenario for bistability is conceptually closer to the model of synaptic bistability proposed by Zenke et al. (2015), which also emerges on a population level. Interestingly, both models, the one presented here and the one in Zenke et al. (2015), exhibit so-called anti-Hebbian plasticity, in the sense that LTP (i.e. \(\dot {v} > 0\)) appears for low firing rates, instead of LTD as in the classical BCM rule. However, in the present model the initial LTP window is very narrow, and appears for very small postsynaptic firing rates \(r < (cf_{o}/\kappa )\epsilon \sim O(\epsilon )\). This feature is necessary for stable bistability, and does not contradict experimental verifications of the BCM rule (Kirkwood et al. 1996), showing LTD for low firing rates. The reason is that those experiments were performed for firing rates above 0.1 Hz, leaving uncertainty about LTP vs. LTD for very low activity levels (or very long times).

The cooperativity in the synaptic bistable plasticity found here is to some extent similar to data showing that neighboring dendritic spines interact and tend to cluster as either strong or weak synapses (Govindarajan et al. 2006, 2011). These clusters can be as long as single dendritic segments, which is known as the “clustered plasticity hypothesis” (Govindarajan et al. 2006, 2011). However, the difference is that in the present model there are no dendritic segments, and the spatial dependence is averaged over, which leads effectively to one synaptic “cluster” in either the up or the down state.

Metabolic cost of synaptic plasticity in the mean-field: intuitive picture

The formula for the plasticity energy rate (Eq. (15)) contains various derivatives of the effective potential Φ, which encodes the plasticity rules for synaptic weights. In this picture, synaptic plasticity corresponds to a driven stochastic motion of the population-averaged postsynaptic current v in the space constrained by the potential Φ, in analogy to a ball moving on a rugged landscape, with the ball's coordinate corresponding to v. Because our potential can exhibit two minima separated by a potential barrier, the plasticity considered here can be viewed as a stochastic process of “hill climbing”, or transitions between the two minima (the idea of a “synaptic potential” was used also in Van Rossum et al. (2000), Billings and Van Rossum (2009), and Graupner and Brunel (2012)).
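The hill-climbing picture can be made concrete with a toy double-well potential (our own illustrative choice of Φ, not the paper's effective potential). Simulating \(\dot{v} = -{\Phi}^{\prime}(v) + \sqrt{2D}\,\xi(t)\) and measuring the mean first-passage time over the barrier shows how the switching time grows as the effective noise D shrinks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy double-well: Phi(v) = v^4/4 - v^2/2, minima at v = -1 and +1,
# barrier of height 1/4 at v = 0 (hypothetical units)
def dPhi(v):
    return v**3 - v

def mean_switch_time(D, n_trials=500, dt=5e-3, t_max=400.0):
    """Mean first-passage time from the lower well (v = -1) over the barrier."""
    v = np.full(n_trials, -1.0)
    t = np.zeros(n_trials)
    alive = np.ones(n_trials, dtype=bool)
    for _ in range(int(t_max / dt)):
        if not alive.any():
            break
        n = alive.sum()
        v[alive] += -dPhi(v[alive]) * dt \
                    + np.sqrt(2.0 * D * dt) * rng.standard_normal(n)
        t[alive] += dt
        alive &= v < 0.0        # freeze trajectories that crossed the barrier
    return t.mean()

t_weak_noise, t_strong_noise = mean_switch_time(0.08), mean_switch_time(0.25)
assert t_weak_noise > 2.0 * t_strong_noise   # smaller D -> longer switching time
print(t_weak_noise, t_strong_noise)
```

The near-exponential (Kramers-like) growth of this escape time with the barrier-to-noise ratio is what underlies long memory lifetimes Tm at small D.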

The energy rate of plastic synapses \(\dot {E}\) (or the power dissipated by plasticity) is the energy used for climbing the potential shape in v-space, and it is proportional to the average temporal rate of decrease in the potential, −〈dΦ/dt〉, due to variability in v. In terms of thermodynamics, the plasticity energy rate \(\dot {E}\) is equivalent to the entropy production rate, because synapses, like all biological systems, operate out of thermodynamic equilibrium with their environment and act as dissipative structures (Nicolis and Prigogine 1977). Dissipation requires a permanent influx of energy from the outside (provided by blood flow; see e.g. Karbowski 2014) to maintain the synaptic structure, which in our case is the distribution of synaptic weights. A physical reason for the energy dissipation in synapses in the steady state is the presence of noise (both internal synaptic \(\sim \sigma _{w}\) and external presynaptic \(\sim \sigma _{f}\)), causing fluctuations that tend to wipe out the pattern of synaptic weights. Thermodynamically speaking, this means reducing the synaptic order and thus increasing the synaptic entropy. To preserve the order, this increased entropy has to be “pumped out”, in the form of heat, by investing some energy in the process, which relates to ATP consumption.

Thermodynamics of memory storing and bistability

The general lack of high energetic demands for sustaining accurate synaptic memory may seem non-intuitive, given the intimate relation between energy and information known from classical physics (Leff and Rex 1990). For example, transmitting 1 bit of information through synapses is rather expensive and costs \(\sim 10^{4}\) ATP molecules (Laughlin et al. 1998), and a comparable number of glucose molecules (Karbowski 2012), which energetically is much higher (\(\sim 10^{5} kT\)) than the thermodynamic minimum set by the Landauer limit (\(\sim 1 kT\)) (Landauer 1961). Additionally, there are classic and recent theoretical results that show a dissipation-error tradeoff for biomolecular processes, i.e., that higher coding accuracy needs more energy (Lan et al. 2012; Mehta and Schwab 2012; Barato and Seifert 2015; Bennett 1979; Lang et al. 2014). How can we understand our result in that light?

First, there is a difference between transmitting information and storing it, primarily in their time scales, and faster processes generally need more power (see also below). Second, it is known from thermodynamics that erasing information can be more energy costly than storing it (Landauer 1961; Bennett 1982), since the former process is irreversible and is always associated with energy dissipation, while the latter can in principle be performed very slowly (i.e. in equilibrium with the environment) without any heat released. In our system, the information is maximal for intermediate presynaptic input generating metastability with two synaptic states (Fig. 2). If we decrease the input below a certain critical value, or increase it above a certain high level, our system becomes monostable, which implies that it does not store much information (the entropy is close to zero). Thus, the transition from bistability to monostability is equivalent to erasing the information stored in synapses, which according to the Landauer principle (Landauer 1961; Berut et al. 2012) should cost energy.

Third, the papers showing energy-error tradeoff in biomolecular systems (Lan et al. 2012; Mehta and Schwab 2012; Lang et al. 2014; Bennett 1979; Barato and Seifert 2015) use fairly linear (or weakly nonlinear) models, while in our model the plasticity dynamics is highly nonlinear (see Eqs. (1), (3), and (6)). Additionally, we consider the prediction of an external variable (average input fo), in contrast to some of the biomolecular models (Bennett 1979; Barato and Seifert 2015), which dealt with estimating errors in an internal variable.

Cost of synaptic plasticity in relation to other neural costs

The energy cost of synaptic plasticity is a new and additional contribution to the overall neural energy budget considered before, associated with fast signaling (action potentials, synaptic transmission, maintenance of the negative resting potential) and slow nonsignaling factors (Attwell and Laughlin 2001; Engl and Attwell 2015). The important distinction between slow synaptic plasticity dynamics and fast signaling is that the former is of chemical origin (protein/receptor interactions), while the latter is of electric origin (ionic movement against gradients of potential and concentration). Consequently, these two phenomena, although coupled, are to a large extent separate and have different characteristic time and energy scales, which results in a rather small energy cost of synaptic plasticity in relation to the fast electric cost (Figs. 15 and 16).

The earlier studies of the neuronal energy cost (Attwell and Laughlin 2001; Engl and Attwell 2015) provided important order-of-magnitude estimates based on ATP turnover rates, but they had a mainly phenomenological character and cannot be directly applied to the nonlinear dynamics underlying synaptic plasticity. By contrast, the current approach and the complementary approach taken in Karbowski (2019) are based on “first principles” taken from non-equilibrium statistical physics, and in combination with neural modeling they can serve as a basis for future, more sophisticated calculations of the energy used in excitatory synapses, possibly with the inclusion of some molecular detail (e.g. Lisman et al. 2012; Miller et al. 2005; Kandel et al. 2014).

The calculations performed here indicate that the energy dissipated by synaptic plasticity increases nonlinearly with presynaptic firing rate (Fig. 11). The dependence on presynaptic firing is consistent with a strong dependence of CaMKII autophosphorylation level on Ca2+ influx frequency to a dendritic spine (De Koninck and Schulman 1998), which should translate to a similar dependence of ATP consumption rate related to protein activation on presynaptic firing. Moreover, these results raise the possibility of observing or measuring the energetics of synaptic plasticity for high firing rates. It is hard to propose a specific imaging technique for detecting enhanced synaptic plasticity, but nevertheless, it seems that techniques relying on spectroscopy, e.g., near-infrared spectroscopy with its high spatial and temporal resolution, could be of help.

Regardless of whether the energetics of synaptic plasticity is observable or not, it could have some functional implications. For example, it was reported that small regional decreases in glucose metabolic rate associated with age, and presumably with synaptic decline, lead to significant cognitive impairment associated with learning (Gage et al. 1984).

A relatively small cost of plasticity in relation to neuronal cost of fast electric signaling (Figs. 15 and 16) is in some part due to relatively slow dynamics of spine conductance decay, quantified by \(\tau _{w}\sim 1\) hr (Frey and Morris 1997; Zenke et al. 2013), since \(\dot {E}\sim 1/\tau _{w}\) in Eqs. (15) and (16). The time scale τw characterizes the duration of early LTP on a single synapse level. On a synaptic population level, characterized by synaptic current v, the duration of early LTP is given by Tm (memory maintenance of a brief synaptic event), which can be of the order of several hours.

Late phases of LTP and LTD, during which memory is consolidated, are much slower than τw and are governed by longer timescales, of the order of days/weeks (Ziegler et al. 2015; Redondo and Morris 2011). Consequently, one can expect that such plasticity processes, as well as the equally slow homeostatic synaptic scaling (Turrigiano and Nelson 2004), should be energetically inexpensive. Nevertheless, there are experimental studies related to long-term memory cost in the fruit fly that claim that memory in general is metabolically costly (Mery and Kawecki 2005; Placais and Preat 2013; Placais et al. 2017). However, the problem with those papers is that they do not directly measure the energy cost related to plasticity in synapses, but instead estimate the global fly metabolism, which indeed affects long-term memory (Mery and Kawecki 2005; Placais and Preat 2013). In a recent paper by Placais et al. (2017) it was found that upregulated energy metabolism in dopaminergic neurons is correlated with long-term memory formation. However, again, no measurement was made directly in synapses, and thus it is difficult to say how much of this enhanced neural metabolism can be attributed to plasticity processes and how much to enhanced neural and synaptic electric signaling (spiking and transmission). It is important to stress that the energy cost of protein synthesis, a process believed to be associated with long-term memory consolidation (Kandel et al. 2014), was estimated to be very small, on a level of \(\sim 0.03-0.1\%\) of the metabolic cost of fast synaptic electric signaling related to synaptic transmission (Karbowski 2019; see also below for an alternative estimate). Consequently, it is possible that memory induction, maintenance, and consolidation involve a significant increase in neural activity and hence metabolism, but it seems that the majority of this energy enhancement goes to upregulating neural electric activity, not to chemical changes in plastic synapses.

The energetics of very slow processes associated with memory consolidation were not included in the budget of the energy scale Eo (present in Eq. (15), and estimated in the Methods), since we were concerned only with the early phases of LTP and LTD, which are believed to be described by BCM model (both standard and extended). Nevertheless, for the sake of completeness, we can estimate the energy cost of the late LTP and LTD, as well as energy requirement of mechanical changing of spine volume (also not included in the budget of Eo).

Protein synthesis, which is associated with l-LTP and l-LTD, underlies synaptic consolidation and scaling (Kandel et al. 2014). There are roughly 10⁴ proteins in the PSD, including their copies (Sheng and Hoogenraad 2007), on average each with \(\sim 400-500\) amino acids, which are bound by peptide bonds. These bonds require 4 ATP molecules to form (Engl and Attwell 2015), which is 4 ⋅ 20 kT of energy (Phillips et al. 2012). This means that the chemical energy associated with PSD proteins is about (3.2 − 4.0) ⋅ 10⁸ kT, i.e. (1.6 − 2.0) ⋅ 10⁷ ATP molecules, or equivalently (1.4 − 1.75) ⋅ 10⁻¹² J. Given that the average lifetime of PSD proteins is 3.7 days (Cohen et al. 2013), we obtain the energy rate of protein turnover as \(\sim (4.6-5.8)\cdot 10^{-18}\) W, or 52 − 65 ATP/s per spine. For the human cerebral cortex, with a volume of 680 cm³ (Hofman 1988) and an average density of synapses of 3 ⋅ 10¹¹ cm⁻³ (Huttenlocher and Dabholkar 1997), we have 2 ⋅ 10¹⁴ synapses. This means that the global energy cost of protein turnover in spines of the human cortex is (9.2 − 11.5) ⋅ 10⁻⁴ W, or equivalently (1 − 1.3) ⋅ 10¹⁶ ATP/s, which is extremely small (\(\sim 0.01 \%\)), as the human cortex uses about 5.7 W of energy (Karbowski 2009).
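This back-of-envelope estimate can be reproduced in a few lines; the exact figures depend slightly on the assumed value of kT and on rounding:

```python
# Back-of-envelope reproduction of the PSD protein-turnover estimate
kT_per_ATP = 20                     # ~20 kT of free energy per ATP hydrolysis
kT_joule = 4.3e-21                  # kT at body temperature, in J
n_proteins = 1e4                    # proteins per PSD, incl. copies
aa_low, aa_high = 400, 500          # amino acids per protein
atp_per_bond = 4                    # ATP per peptide bond
lifetime_s = 3.7 * 24 * 3600        # mean PSD protein lifetime (3.7 days)

# energy per PSD in ATP, and turnover rate per spine in ATP/s
E_atp = [n_proteins * aa * atp_per_bond for aa in (aa_low, aa_high)]
rate_atp = [e / lifetime_s for e in E_atp]          # ~50-63 ATP/s per spine

# whole human cortex: 680 cm^3 at 3e11 synapses per cm^3
n_syn = 680 * 3e11                                  # ~2e14 synapses
cortex_watt = rate_atp[0] * kT_per_ATP * kT_joule * n_syn   # ~1e-3 W

print(rate_atp, cortex_watt)
```

Dividing the last number by the cortical budget of 5.7 W gives a fraction of order 0.01%, in line with the text.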

The changes in spine volume are related directly to the underlying dynamics of the actin cytoskeleton (Honkura et al. 2008; Cingolani and Goda 2008). We can estimate the energy cost of spine size using a mechanistic argument. A dendritic spine grows due to the pressure exerted on the dendrite membrane by actin molecules. The reported membrane tension is in the range (10⁻⁴ − 1) kT/nm² (Phillips et al. 2009), with the upper bound being likely an overestimate, given that it is close to the so-called rupture tension (1 − 2 kT/nm²), at which the membrane breaks (Phillips et al. 2009). A more reasonable value of the membrane tension seems to be 0.02 kT/nm², as it was measured directly (Stachowiak et al. 2013). Taking this value, we find that creating a typical stable spine of 1 μm² requires 2 ⋅ 10⁴ kT, or 10³ ATP molecules. Since the actin turnover rate in a spine is 1/40 s⁻¹ (Honkura et al. 2008), which is also the rate of spine volume dynamics, we obtain that the cost of maintaining spine size is 25 ATP/s. This value is comparable to, but two-fold smaller than, the ATP rate used for PSD protein turnover per spine (52 − 65 ATP/s) given above.

How do the costs of protein turnover and spine mechanical stability relate to the energy cost of e-LTP and e-LTD calculated in this paper using the extended BCM model? From Fig. 11, we get that the latter type of synaptic plasticity uses energy in the range (10⁻³ − 10⁰)Eo (solid lines for exact numerical results) per second per spine, depending mainly on firing rate and synaptic noise. Since the energy scale Eo = 2.3 ⋅ 10⁴ ATP (see the Methods), we obtain that the energy cost of the plasticity related to e-LTP and e-LTD is 23 − 23000 ATP/s, i.e., its upper range can be \(\sim 400\) times larger than the contributions from protein turnover and spine volume changes. This result strongly suggests that the calculations of the energetics of synaptic plasticity based on the extended BCM model provide a large portion of the total energy required for the induction and maintenance of synaptic plasticity.
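The comparison collapses to simple arithmetic on the numbers above:

```python
E_o = 2.3e4                          # energy scale in ATP (from the Methods)
eltp_min, eltp_max = 1e-3 * E_o, 1e0 * E_o   # e-LTP/e-LTD cost, ATP/s per spine
protein_turnover = (52, 65)          # PSD protein turnover, ATP/s per spine
spine_mechanics = 25                 # actin/membrane maintenance, ATP/s per spine

assert abs(eltp_min - 23) < 1e-9 and eltp_max == 23000
# upper range of e-LTP/e-LTD vs. the slow structural contributions
ratio = eltp_max / ((protein_turnover[0] + protein_turnover[1]) / 2)
print(ratio)
```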

Methods

Neuron model

We consider a sensory neuron with a nonlinear firing rate curve (so-called class I, valid for most biophysical models) and with activity adaptation, given by (Ermentrout 1998; Ermentrout and Terman 2010)

$$ \begin{array}{@{}rcl@{}} \tau_{r}\frac{dr}{dt} = -r + \bar{A}\sqrt{I_{syn}-s} \end{array} $$
(21)
$$ \begin{array}{@{}rcl@{}} \tau_{a}\frac{ds}{dt} = -s + \bar{\kappa} r \end{array} $$
(22)

where r is the instantaneous neuron firing rate with mean amplitude \(\bar {A}\), s is the adaptation current (or equivalently self-inhibition) with intensity \(\bar {\kappa }\), τr and τa are the time constants for variability in neural firing and adaptation, and Isyn is the total excitatory synaptic current to the neuron provided by N excitatory synapses, i.e., \(I_{syn} \sim {\sum }_{i} f_{i}w_{i}\). If Isyn < s in Eq. (21), then this equation simplifies to τrdr/dt = −r. In order to ensure saturation of the firing rate r for a very large number of synapses N, and for s to remain relevant in this limit, \(\bar {A}\) and \(\bar {\kappa }\) must scale as \(\bar {A}= A/\sqrt {N}\) and \(\bar {\kappa }= N\kappa \). In a mature brain N can fluctuate due to structural plasticity, but we assume, in agreement with the data (Sherwood et al. 2020; DeFelipe et al. 2002), that there is some well-defined average value of N.

We assume that the neuron is driven by stochastic presynaptic firing rates fi (i = 1,...,N) that change on a much faster time scale τf than the synaptic weights wi. Additionally, we assume that the fast variability in presynaptic firing rates is stationary in a stochastic sense, i.e., the probability distribution of fi does not change in time. Consequently, for each time step t, in the stationary stochastic state we can write

$$ \begin{array}{@{}rcl@{}} f_{i}(t) = f_{o} + \sigma_{f}x_{i}(t) \end{array} $$
(23)

where fo is the mean firing rate of all presynaptic neurons and σf denotes the standard deviation of the variability in fi. The variable xi is a Gaussian random variable, which reflects noise in the presynaptic neuronal activity. For the noise xi we have the following averages (Van Kampen 2007): ⟨xi⟩x = 0 and ⟨xixj⟩x = δij, where the last equality means that different xi are independent, which also implies that fluctuations in different firing rates fi are statistically independent. Equation (23) allows negative values of fi, which is not realistic. However, in analytical calculations this is not a problem, because we use only the average of fi and its standard deviation σf. In numerical simulations, we prevent negative values of fi by setting fi = 0 whenever fi becomes negative.

Given Eq. (23), one can easily verify the following average

$$ \begin{array}{@{}rcl@{}} \langle f_{i}(t)^{2} \rangle_{x} = {f_{o}^{2}} + {\sigma_{f}^{2}}, \end{array} $$
(24)

Equation (24) indicates that presynaptic firing rates fluctuate around the average value fo with standard deviation σf. The important point is that these fluctuations are fast, on the order of τf (\(\sim 0.1-1\) sec), which is much faster than the timescale τw. Equation (24) is also used below.
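The moments in Eqs. (23)-(24) can be illustrated with a small Monte Carlo sketch; the values of fo and σf below are illustrative, not taken from the paper's parameter set:

```python
import random

# Monte Carlo check of Eqs. (23)-(24): rates f_i = f_o + sigma_f * x_i with
# Gaussian x_i have mean f_o and second moment f_o^2 + sigma_f^2.
random.seed(1)
f_o, sigma_f = 5.0, 2.0          # Hz; illustrative values
n = 200_000

samples = [f_o + sigma_f * random.gauss(0.0, 1.0) for _ in range(n)]
mean_f = sum(samples) / n
mean_f2 = sum(f * f for f in samples) / n

print(mean_f, mean_f2)           # ~5.0 and ~29.0 (= f_o^2 + sigma_f^2)
```

In the paper's numerical simulations negative samples are additionally clipped to zero; here only the unclipped analytic moments are verified.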

Definition of synaptic current per spine v

The synaptic current Isyn has two additive components related to AMPA and NMDA receptors, Isyn = Iampa + Inmda, with the receptor currents

$$I_{ampa}= qq_{ampa}|V_{r}|\tau_{ampa}g_{ampa}{\sum}_{i=1}^{N}f_{i}M^{ampa}_{i},$$

and

$$I_{nmda}= qq_{nmda}|V_{r}|\tau_{nmda}g_{nmda}{\sum}_{i=1}^{N}f_{i}M^{nmda}_{i},$$

where q is the probability of neurotransmitter release, Vr is the resting membrane potential of the neuron (we used the fact that the reversal potential for AMPA/NMDA is close to 0 mV; Ermentrout and Terman 2010), gampa and gnmda are single-channel conductances of AMPA and NMDA receptors, and qampa and qnmda are the probabilities of their opening, with characteristic times τampa and τnmda. The symbols \(M^{ampa}_{i}\) and \(M^{nmda}_{i}\) denote AMPA and NMDA receptor numbers for spine i. Data indicate that during synaptic plasticity the most profound changes are in the number of AMPA receptors Mampa and the opening probability of NMDA qnmda (Kasai et al. 2003; Huganir and Nicoll 2013; Matsuzaki et al. 2004). We define the excitatory synaptic weight wi as a weighted average of AMPA and NMDA conductances, i.e.,

$$ \begin{array}{@{}rcl@{}} w_{i}&=& (\tau_{nmda}q_{nmda}M_{i}^{nmda}g_{nmda} \\&&+ \tau_{ampa}q_{ampa}M_{i}^{ampa}g_{ampa}) /(\tau_{nmda}+\tau_{ampa}). \end{array} $$

This enables us to write the synaptic current per spine, i.e. v = Isyn/N (which is more convenient to use than Isyn), as

$$ \begin{array}{@{}rcl@{}} v= \frac{\upbeta}{N} {\sum}_{i=1}^{N} f_{i}w_{i}, \end{array} $$
(25)

where β = q|Vr|(τnmda + τampa). The current per spine v is the key dynamical variable in our dimensional reduction procedure and subsequent analysis (see below).

Dependence of the postsynaptic firing rate r on synaptic current v

The time scales related to neuronal firing rates and firing adaptation, τf, τr and τa, are much faster than the time scale τw associated with synaptic plasticity. Therefore, for long times of the order of τw, the firing rate r and postsynaptic current adaptation s are in a quasi-stationary state, i.e., dr/dt ≈ ds/dt ≈ 0. This implies a set of coupled algebraic equations:

$$ \begin{array}{@{}rcl@{}} r= A\sqrt{v - s/N} \\ s= N\kappa r, \end{array} $$
(26)

which yields a quadratic equation for r, i.e., r^2 + A^2κr − A^2v = 0. The solution for r, which depends on v, is given by

$$ \begin{array}{@{}rcl@{}} r= \frac{1}{2}\left( -A^{2}\kappa + \sqrt{A^{4}\kappa^{2} + 4A^{2}v}\right), \end{array} $$
(27)

Note that r depends nonlinearly on the synaptic current v. Additionally, s/N is always smaller than v in the steady state, which means that r in Eq. (26) is well defined.
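As a sanity check, the closed form of Eq. (27) can be evaluated numerically and verified against the quadratic equation and the fixed-point conditions of Eq. (26); the values of A, κ and v below are illustrative:

```python
import math

# Quasi-stationary firing rate r(v) of Eq. (27); it must solve the quadratic
# r^2 + A^2*kappa*r - A^2*v = 0 obtained from Eq. (26).
def firing_rate(v, A, kappa):
    return 0.5 * (-A**2 * kappa + math.sqrt(A**4 * kappa**2 + 4 * A**2 * v))

A, kappa, v = 2.0, 0.5, 1.5      # illustrative values
r = firing_rate(v, A, kappa)
residual = r**2 + A**2 * kappa * r - A**2 * v
print(r, residual)               # residual ~ 0
```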

Dimensional reduction of the extended BCM model: Dynamic mean-field model

We focus on the population-averaged synaptic current v (Eq. (25)). Since v is proportional to the weights wi, and because r depends directly on v, it is possible to obtain a closed-form dynamic equation for the plasticity of v. Thus, instead of dealing with the N-dimensional dynamics of synaptic weights, we can study the one-dimensional dynamics of the population-average current v. This dimensional reduction is analogous to observing the motion of the center of mass of a many-particle system, which is easier than simultaneous observation of the motions of all particles. Such an approach is amenable to an analytical treatment in which one can directly apply the methods of stochastic dynamical systems and thermodynamics (Van Kampen 2007).

The time derivative of v, given by Eq. (25), is denoted with dot and reads

$$\dot{v}= (\upbeta/N) {\sum}_{i=1}^{N} (\dot{f_{i}}w_{i} + f_{i}\dot{w_{i}}) \approx (\upbeta/N) {\sum}_{i=1}^{N} f_{i}\dot{w_{i}},$$

where we used the fact that fluctuations in fi are much faster than changes in weights wi, and hence fi are in stochastic quasi-stationary states. Now, using Eq. (1) for \(\dot {w}_{i}\) and quasi-stationarity of 𝜃, we obtain the following equation for \(\dot {v}\):

$$ \begin{array}{@{}rcl@{}} \dot{v} &=& \frac{\lambda\upbeta}{N} r^{2}(1-\alpha r) {\sum}_{i=1}^{N} {f_{i}^{2}} - \frac{1}{\tau_{w}}\left( v- \frac{\epsilon c}{N}{\sum}_{i=1}^{N} f_{i}\right) \\&&+ \frac{\sqrt{2}\upbeta\sigma_{w}}{N\sqrt{\tau_{w}}} {\sum}_{i=1}^{N} (f_{i} + \tau_{f}{f_{i}^{2}})\eta_{i}, \end{array} $$
(28)

where c = aβ.

The next step is to perform averaging over fast fluctuations in presynaptic rate fi. We need to find the following three averages with respect to the random variable xi: \(\langle {\sum }_{i=1}^{N} f_{i} \rangle _{x}\), \(\langle {\sum }_{i=1}^{N} {f_{i}^{2}} \rangle _{x}\), and \(\langle {\sum }_{i=1}^{N} {f_{i}^{2}}\eta _{i} \rangle _{x}\).

From Eq. (23) it follows that 〈fix = fo, and thus the first average is

$$ \begin{array}{@{}rcl@{}} \left\langle {\sum}_{i=1}^{N} f_{i} \right\rangle_{x} = N f_{o}. \end{array} $$
(29)

The second average follows from Eq. (24), and we have

$$ \begin{array}{@{}rcl@{}} \left\langle {\sum}_{i=1}^{N} {f_{i}^{2}} \right\rangle_{x} = N({f_{o}^{2}} + {\sigma_{f}^{2}}). \end{array} $$
(30)

The third average can be decomposed as

$$\left\langle {\sum}_{i=1}^{N} {f_{i}^{2}}\eta_{i} \right\rangle_{x} = {\sum}_{i=1}^{N} \langle {f_{i}^{2}}\rangle_{x} \eta_{i}= {\sum}_{i=1}^{N} ({f_{o}^{2}} + {\sigma_{f}^{2}})\eta_{i},$$

where we used the fact that the noise η is independent of the noise x, and again Eq. (24).

The final step is to insert the above averages into the equation for \(\dot {v}\) (Eq. (28)). As a result we obtain Eq. (3) in the main text, which is a starting point for determining energetics of synaptic plasticity and information characteristics.

Distribution of synaptic currents in the stochastic mean-field model: weak and strong synapses

Stochastic Eq. (3) for the population averaged synaptic current v can be written in short notation as

$$ \begin{array}{@{}rcl@{}} \frac{dv}{dt} = F(v) + \sqrt{2D}\overline{\eta}, \end{array} $$
(31)

where the function F(v) is defined as

$$ \begin{array}{@{}rcl@{}} F(v)= hr^{2}(1-\alpha r)- (v-\epsilon cf_{o})/\tau_{w}, \end{array} $$
(32)

and D is the effective noise amplitude (it also includes fluctuations in the presynaptic input), given by

$$ \begin{array}{@{}rcl@{}} D={\sigma_{v}^{2}}/\tau_{w}. \end{array} $$
(33)

Equation (31) corresponds to the following Fokker-Planck equation for the probability distribution of the synaptic current P(v|fo;t) conditioned on fo (Van Kampen 2007):

$$ \begin{array}{@{}rcl@{}} \frac{\partial P(v|f_{o};t)}{\partial t} &=& - \frac{\partial}{\partial v}\left( F(v)P(v|f_{o};t)\right) + D \frac{\partial^{2} P(v|f_{o};t)}{\partial v^{2}} \\ &=& - \frac{\partial J(v)}{\partial v} \end{array} $$
(34)

The function J(v) in the last equality in Eq. (34) is the probability current, given by J(v) = F(v)P(v|fo;t) − D∂P(v|fo;t)/∂v.

The stationary solution of the Fokker-Planck equation (Eq. (34)) is obtained for a constant probability current J (Gardiner 2004; Van Kampen 2007). For monostable systems, which have a unique steady state (fixed point), one usually sets J(v) = 0, which corresponds to detailed balance (Gardiner 2004; Tome 2006). Such a unique steady state corresponds to thermal equilibrium with the environment (Tome 2006), and the solution is of the form (Van Kampen 2007)

$$ \begin{array}{@{}rcl@{}} P_{s}(v|f_{o}) \sim \exp\left( -{\Phi}(v|f_{o})/D\right), \end{array} $$
(35)

where Φ(v|fo) is the effective potential for synaptic current v, and it is obtained by integration of F(v) in Eq. (31), i.e.,

$$ \begin{array}{@{}rcl@{}} {\Phi}(v|f_{o})= - {{\int}_{0}^{v}}dx F(x). \end{array} $$
(36)

The potential can have either one (monostability) or two (bistability) minima, depending on fo and other parameters. The explicit form of Φ(v|fo) is shown in Eq. (13).
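The construction of the potential in Eq. (36) can be sketched numerically: integrate the drift F(v) of Eq. (32), with r(v) from Eq. (27), and check that dΦ/dv = −F(v). All parameter values below (h, α, A, κ, τw, and the lumped term εcfo) are illustrative, not the paper's fitted values:

```python
import math

def r_of_v(v, A=2.0, kappa=0.5):
    # quasi-stationary firing rate, Eq. (27)
    return 0.5 * (-A**2 * kappa + math.sqrt(A**4 * kappa**2 + 4 * A**2 * v))

def F(v, h=1.0, alpha=0.3, tau_w=10.0, eps_c_fo=0.1):
    # drift of Eq. (32); all parameters illustrative
    r = r_of_v(v)
    return h * r**2 * (1.0 - alpha * r) - (v - eps_c_fo) / tau_w

# cumulative trapezoid for Phi(v) = -int_0^v F(x) dx, Eq. (36)
n, v_end = 2000, 2.0
dx = v_end / n
grid = [i * dx for i in range(n + 1)]
Fg = [F(v) for v in grid]
Phi = [0.0]
for i in range(n):
    Phi.append(Phi[-1] - 0.5 * (Fg[i] + Fg[i + 1]) * dx)

# sanity check: dPhi/dv = -F(v) at an interior grid point
i = n // 2
dPhi = (Phi[i + 1] - Phi[i - 1]) / (2 * dx)
print(dPhi, -Fg[i])
```

With the paper's parameters, counting the sign changes of F on such a grid distinguishes the monostable (one minimum) from the bistable (two minima) regime.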

For bistable systems, for which there are two possible steady states (fixed points), the situation is more complicated. In the presence of nonzero effective synaptic noise (D > 0), there can be noise-induced jumps between the two fixed points. In this case the probability current J in the steady state must be a nonzero constant, because of the exchange of probabilities between the two fixed points, or equivalently, because of the stochastic jumps of v between the two potential wells (Gardiner 2004). For small noise D, such jumps between the two potential wells happen on very long time scales. This long-time dynamics is the primary reason that the stationary state of such systems, “driven” by thermal and presynaptic fluctuations, is globally out of thermal equilibrium with the environment; it is called a thermodynamic nonequilibrium steady state, in which detailed balance is broken (Tome 2006). However, locally, close to each fixed point and for not too long times, the system is in local thermal equilibrium. Thus, we can locally approximate the probability distribution of v by the form given in Eq. (35), by expanding the potential Φ(v|fo) around vd and vu. Using a Gaussian approximation, which should be valid for small D (either for large N or for small σw, or both), we can write Ps(v) for v close to vd as \(P_{s}(v) \sim e^{-{\Phi }_{d}/D} \exp \left (-\frac {{\Phi }^{(2)}_{d}}{2D}(v-v_{d})^{2} \right )\), and for v close to vu we have \(P_{s}(v) \sim e^{-{\Phi }_{u}/D} \exp \left (-\frac {{\Phi }^{(2)}_{u}}{2D}(v-v_{u})^{2} \right )\), where \({\Phi }^{(2)}_{i}\) is the second derivative of the potential with respect to v at vi, and the subscript i is either d (down state) or u (up state). For the sake of computations we have to extend these local approximations to longer intervals of v, corresponding to the domains of attraction of the two fixed points vd and vu. Consequently, we assume that the first approximation works for 0 ≤ v ≤ vmax, and the second for v > vmax.
In sum, we approximate the stationary probability density Ps(v|fo) as two Gaussian peaks centered at vd and vu:

$$ \begin{array}{@{}rcl@{}} P_{s}(v|f_{o})= \left\{ \begin{array}{ccl} P_{d}(v)= Z^{-1} e^{-{\Phi}_{d}/D} \exp\left( -\frac{{\Phi}^{(2)}_{d}}{2D}(v-v_{d})^{2} \right); & \text{for} & 0 \le v \le v_{max} \\ P_{u}(v)= Z^{-1} e^{-{\Phi}_{u}/D} \exp\left( -\frac{{\Phi}^{(2)}_{u}}{2D}(v-v_{u})^{2} \right); & \text{for} & v > v_{max} \end{array} \right. \end{array} $$
(37)

where Z is the normalization factor, which can be written as a sum Z = Zd + Zu, with

$$ \begin{array}{@{}rcl@{}} Z_{d} &=& e^{-{\Phi}_{d}/D} {\int}_{0}^{v_{max}} dv \exp\left( -\frac{{\Phi}^{(2)}_{d}}{2D}(v-v_{d})^{2} \right) \\ &=& e^{-{\Phi}_{d}/D}\sqrt{\frac{\pi D}{2{\Phi}_{d}^{(2)}}} [\text{erf}(|x_{d1}|) + \text{erf}(|x_{d2}|)] \end{array} $$
(38)

and

$$ \begin{array}{@{}rcl@{}} Z_{u} &=& e^{-{\Phi}_{u}/D} {\int}_{v_{max}}^{\infty} dv \exp\left( -\frac{{\Phi}^{(2)}_{u}}{2D}(v-v_{u})^{2} \right) \\ &=& e^{-{\Phi}_{u}/D}\sqrt{\frac{\pi D}{2{\Phi}_{u}^{(2)}}} [1 + \text{erf}(|x_{u}|)] \end{array} $$
(39)

where erf(...) is the error function, \(x_{d1}= \sqrt {\frac {{\Phi }_{d}^{(2)}}{2D}}(-v_{d})\), \(x_{d2}= \sqrt {\frac {{\Phi }_{d}^{(2)}}{2D}}(v_{max}-v_{d})\), and \(x_{u}= \sqrt {\frac {{\Phi }_{u}^{(2)}}{2D}}(v_{max}-v_{u})\). Note that because the unstable fixed point vmax depends on fo, the arguments of the error functions in Zd and Zu change with changes in fo. This influences the determination of pd,pu, as well as energy rate and Fisher information (see below).

Fractions of weak and strong synapses

We define the fraction of synapses in the down state pd (fraction of weak synapses) as the probability that synaptic current v is in the domain of attraction of the down fixed point in the deterministic limit. This takes place for v in the range 0 ≤ vvmax, where vmax is the unstable fixed point separating the two stable fixed points vd and vu. By analogy, the fraction of synapses in the up state pu is the probability that v is greater than vmax. We can write this mathematically as

$$ \begin{array}{@{}rcl@{}} p_{d}= {\int}_{0}^{v_{max}} dv P_{d}(v|f_{o}) \equiv Z_{d}/Z \end{array} $$
(40)

and

$$ \begin{array}{@{}rcl@{}} p_{u}= {\int}_{v_{max}}^{\infty} dv P_{u}(v|f_{o}) \equiv Z_{u}/Z \end{array} $$
(41)

where Pd(v|fo) and Pu(v|fo) are given by Eq. (37). Using the expressions for Zd and Zu, we find an explicit form of pd as

$$ p_{d} = \left( 1 + e^{({\Phi}_{d}-{\Phi}_{u})/D}\sqrt{\frac{{\Phi}_{d}^{(2)}}{{\Phi}_{u}^{(2)}}} \frac{[1 + \text{erf}(|x_{u}|)]}{[\text{erf}(|x_{d1}|) + \text{erf}(|x_{d2}|)]} \right)^{-1} $$
(42)

Note that pd and pu sum to unity, since Z = Zd + Zu.
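The consistency of Eqs. (38)-(42) can be verified numerically: compute Zd and Zu directly and compare pd = Zd/Z with the closed form of Eq. (42). The potential values, curvatures, fixed points and noise amplitude below are illustrative placeholders:

```python
import math

# Weak-synapse fraction p_d: Z_d, Z_u from Eqs. (38)-(39), checked against
# the closed form of Eq. (42). All potential parameters are illustrative.
Phi_d, Phi_u = 0.0, -0.2
Phi2_d, Phi2_u = 1.5, 2.5            # second derivatives at v_d, v_u
v_d, v_u, v_max, D = 0.2, 1.0, 0.6, 0.02

x_d1 = math.sqrt(Phi2_d / (2 * D)) * (-v_d)
x_d2 = math.sqrt(Phi2_d / (2 * D)) * (v_max - v_d)
x_u = math.sqrt(Phi2_u / (2 * D)) * (v_max - v_u)

Z_d = math.exp(-Phi_d / D) * math.sqrt(math.pi * D / (2 * Phi2_d)) \
    * (math.erf(abs(x_d1)) + math.erf(abs(x_d2)))
Z_u = math.exp(-Phi_u / D) * math.sqrt(math.pi * D / (2 * Phi2_u)) \
    * (1 + math.erf(abs(x_u)))
p_d = Z_d / (Z_d + Z_u)

# closed form, Eq. (42)
p_d_closed = 1.0 / (1.0 + math.exp((Phi_d - Phi_u) / D)
                    * math.sqrt(Phi2_d / Phi2_u)
                    * (1 + math.erf(abs(x_u)))
                    / (math.erf(abs(x_d1)) + math.erf(abs(x_d2))))
print(p_d, p_d_closed)
```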

Average values of v and r in the mean-field

The average value of the synaptic current v in the mean-field is denoted as 〈v〉 and computed as

$$ \begin{array}{@{}rcl@{}} \langle v \rangle= {\int}_{0}^{v_{max}} dv P_{d}(v) v + {\int}_{v_{max}}^{\infty} dv P_{u}(v) v. \end{array} $$
(43)

Evaluating these integrals yields

$$ \begin{array}{@{}rcl@{}} \langle v \rangle= p_{d}v_{d} + p_{u}v_{u} + O(e^{-x_{d1}^{2}}, e^{-x_{d2}^{2}}, e^{-{x_{u}^{2}}}), \end{array} $$
(44)

where \(O(e^{-x_{d1}^{2}}, e^{-x_{d2}^{2}}, e^{-{x_{u}^{2}}})\) denotes small exponential terms in the limit of very small D.

The standard deviation of v can be found analogously, which yields

$$ \begin{array}{@{}rcl@{}} \sqrt{\langle v^{2} \rangle - \langle v \rangle^{2} } = \sqrt{ (v_{u}-v_{d})^{2}p_{u}p_{d} + D(\frac{p_{d}}{{\Phi}^{(2)}_{d}} + \frac{p_{u}}{{\Phi}^{(2)}_{u}} ) }. \end{array} $$
(45)

The average value of the postsynaptic firing rate r, denoted as 〈r〉, is computed in the limit of large A (see Eq. (6)). In this limit \(r\approx (v/\kappa )\left [1 - v/(A\kappa )^{2}\right ]\), and we find

$$ \begin{array}{@{}rcl@{}} \langle r\rangle \approx p_{d}\left[ \frac{v_{d}}{\kappa} - \frac{1}{A^{2}\kappa^{3}}\left( {v_{d}^{2}} + D/{\Phi}^{(2)}_{d} \right) \right] \\ + p_{u}\left[ \frac{v_{u}}{\kappa} - \frac{1}{A^{2}\kappa^{3}}\left( {v_{u}^{2}} + D/{\Phi}^{(2)}_{u} \right) \right], \end{array} $$
(46)

which means that the form of 〈r〉 is more complicated than that of 〈v〉.
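The large-A expansion used for 〈r〉 can be checked against the exact Eq. (27); the values of A, κ and v below are illustrative:

```python
import math

# Check of the large-A approximation r ~ (v/kappa)*(1 - v/(A*kappa)^2)
# against the exact quasi-stationary rate of Eq. (27).
A, kappa, v = 100.0, 1.0, 1.0    # illustrative; A chosen large
r_exact = 0.5 * (-A**2 * kappa + math.sqrt(A**4 * kappa**2 + 4 * A**2 * v))
r_approx = (v / kappa) * (1.0 - v / (A * kappa)**2)
print(r_exact, r_approx)
```

The discrepancy decays rapidly with A (the next correction is of relative order 1/A^4).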

Transitions between weak and strong synaptic states: Kramers escape rate

For cortical neurons the number of spines per neuron is very large, i.e., \(N \sim 10^{3}-10^{4}\) (Elston et al. 2001; DeFelipe et al. 2002; Sherwood et al. 2020), and thus one can expect that σv is small and consequently the fluctuations around the population-average current v are rather weak. The results described below are obtained in the limit of small σv.

Plastic synapses can jump between the down and up states due to the effective synaptic noise σv (or D). From a physical point of view, this corresponds to a noise-induced “escape” of some synapses through a potential barrier. Average dwelling times in the up (Tu) and down (Td) states can be determined from the Kramers formula (Van Kampen 2007):

$$ \begin{array}{@{}rcl@{}} T_{i}= \frac{2\pi}{\sqrt{{\Phi}^{(2)}_{i}|{\Phi}^{(2)}_{max}|}} \exp\left( \frac{\tau_{w}}{{\sigma_{v}^{2}}} {\Delta}{\Phi}_{i}\right), \end{array} $$
(47)

where the index i = d or i = u, \({\Phi }^{(2)}_{i}\) and \({\Phi }^{(2)}_{max}\) are the second derivatives of the potential at its minima (v = vi) and maximum (v = vmax), and the potential difference ΔΦi = Φ(vmax) −Φ(vi) > 0. Note that for a large number of synapses N, the exponential factor in Eq. (47) can be large, which can lead to very long dwelling times, generally much longer than any time scale in the original Eqs. (1-2). The fact that the times Tu and Td are long but finite is an indication of the metastability of the “locally” stable up and down synaptic states.

There exists a relationship between the fractions of weak/strong synapses and the Kramers escape times Td and Tu in the limit of very weak noise D → 0. Namely, it can be easily verified that in this limit \(p^{(0)}_{d}/p^{(0)}_{u}= T_{d}/T_{u}\), and consequently, we can write

$$ \begin{array}{@{}rcl@{}} p^{(0)}_{d}= \frac{T_{d}}{T_{d} + T_{u}}, \end{array} $$
(48)

where \(p^{(0)}_{d}\) is the fraction of weak synapses for D → 0, given by

$$ \begin{array}{@{}rcl@{}} p^{(0)}_{d} = \left( 1 + e^{({\Phi}_{d}-{\Phi}_{u})/D}\sqrt{\frac{{\Phi}_{d}^{(2)}}{{\Phi}_{u}^{(2)}}} \right)^{-1}. \end{array} $$
(49)

Memory lifetime

Synaptic memory lifetime Tm is defined as the characteristic time over which synapses remember a perturbation to their steady-state distribution. Mathematically, this means that we have to consider a time-dependent solution P(v|fo;t) of the Fokker-Planck equation given by Eq. (34). This solution can be written as (Van Kampen 2007; Risken 1996)

$$ \begin{array}{@{}rcl@{}} P(v|f_{o};t)= P_{s}(v|f_{o}) + {\sum}_{k=0}^{\infty} e^{-\gamma_{k}t}\psi_{k}(v|f_{o}), \end{array} $$
(50)

where γk and ψk(v|fo) are the appropriate eigenvalues and eigenfunctions. The eigenvalues are inverses of characteristic time scales, which describe the relaxation process to the steady state. The smallest eigenvalue, denoted as γ0, determines the longest relaxation time 1/γ0, and we associate that time with the memory lifetime Tm. It has been shown that γ0 = 1/Td + 1/Tu (Van Kampen 2007; Risken 1996), which implies that

$$ \begin{array}{@{}rcl@{}} T_{m}= \frac{T_{u}T_{d}}{T_{u}+T_{d}}. \end{array} $$
(51)

A similar approach to estimating the memory lifetime, through eigenvalues, was also adopted in Fusi and Abbott (2007).
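The Kramers times of Eq. (47), the weak-noise identity of Eqs. (48)-(49), and the memory lifetime of Eq. (51) can be checked together numerically; all potential values below are illustrative placeholders:

```python
import math

# Kramers dwelling times T_d, T_u of Eq. (47), the weak-noise fraction
# p_d^(0) of Eq. (49), and the memory lifetime T_m of Eq. (51).
# All potential parameters are illustrative; D = sigma_v^2 / tau_w.
Phi_d, Phi_u, Phi_max = 0.0, -0.1, 0.3
Phi2_d, Phi2_u, Phi2_max = 1.5, 2.5, -1.0
D = 0.02

def T(Phi_i, Phi2_i):
    dPhi = Phi_max - Phi_i          # barrier height, > 0
    return 2 * math.pi / math.sqrt(Phi2_i * abs(Phi2_max)) * math.exp(dPhi / D)

T_d, T_u = T(Phi_d, Phi2_d), T(Phi_u, Phi2_u)
p_d0 = 1.0 / (1.0 + math.exp((Phi_d - Phi_u) / D) * math.sqrt(Phi2_d / Phi2_u))
T_m = T_u * T_d / (T_u + T_d)       # Eq. (51)
print(T_d / (T_d + T_u), p_d0, T_m)
```

The first two printed numbers agree, which is the identity p_d^(0) = Td/(Td + Tu) of Eq. (48).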

Entropy production rate, entropy flux, and power dissipated by plasticity

Processes underlying synaptic plasticity (e.g. AMPA receptor trafficking, PSD protein phosphorylation, as well as protein synthesis and degradation; see Huganir and Nicoll 2013, Choquet and Triller 2013) operate out of thermodynamic equilibrium, and therefore require energy influx. At a stochastic steady state, this energy is dissipated as heat, which roughly corresponds to a metabolic rate of synaptic plasticity. The rate of dissipated energy is proportional to the average rate of decrease in the effective potential Φ, or equivalently to the entropy production rate (Nicolis and Prigogine 1977).

Given the above, we can write the energy rate for synaptic plasticity \(\dot {E}\) as \(\dot {E} \sim - \langle d{\Phi }(v|f_{o})/dt \rangle = - \langle {\Phi }^{(1)}\dot {v} \rangle \), where Φ(1) is the first derivative of Φ with respect to v, the symbol \(\dot {v}\) is the temporal derivative of v, and the averaging 〈...〉 is performed over the distribution P(v|fo). The second equality follows from the fact that v is the only variable in the potential that changes with time on the time scale τw. Next, we can use Eq. (3) or (31) in the equivalent form, namely \(\dot {v}= -{\Phi }^{(1)} + \sqrt {2/\tau _{w}}\sigma _{v}\overline {\eta }\), and this equation resembles the motion of an overdamped particle (with negligible mass) in the potential Φ, with v playing the role of a spatial coordinate. After that step, we can write the energy rate as \(\dot {E} \sim \langle [{\Phi }^{(1)}]^{2} \rangle - \sqrt {2/\tau _{w}}\sigma _{v} \langle {\Phi }^{(1)}\overline {\eta } \rangle \). The final step is to use the Novikov theorem (Novikov 1965) for the second average, i.e., \(\langle {\Phi }^{(1)}\overline {\eta } \rangle = \frac {1}{2}\sqrt {2/\tau _{w}}\sigma _{v} \langle {\Phi }^{(2)} \rangle \). This leads to

$$\dot{E} \sim \frac{{\sigma_{v}^{2}}}{\tau_{w}} \left( - \langle {\Phi}^{(2)} \rangle + (\tau_{w}/{\sigma_{v}^{2}})\langle[{\Phi}^{(1)}]^{2}\rangle \right).$$

We can obtain a similar result for \(\dot {E}\) using thermodynamic reasoning. The dynamics of synaptic plasticity is characterized by the distribution of synaptic currents per synapse P(v|fo), which evolves in time according to Eq. (34). With this distribution we can associate the entropy S(t), defined as \(S(t)= -{\int \limits }_{0}^{\infty } dv P(v|f_{o})\ln P(v|f_{o})\), measuring the level of order in a typical spine. It can be shown (Nicolis and Prigogine 1977; Tome 2006; Tome and de Oliveira 2010) that the temporal derivative of the entropy, dS/dt, is composed of two competing terms, dS/dt = π −Γ, called the entropy production rate (π) and the entropy flux (Γ), both per synapse. In the case of thermodynamic equilibrium, which is not biologically realistic, one has dS/dt = π = Γ = 0, and there is neither energy influx to a system nor energy dissipated to the environment. However, for processes out of thermodynamic equilibrium, relevant for spine dynamics, we can still find a stationary regime where the entropy of the spine does not change, dS/dt = 0, but the entropy flux Γ and entropy production π are nonzero and balance each other (Nicolis and Prigogine 1977; Tome 2006). It is more convenient to determine the stationary dissipated power by finding the entropy flux, which is given by (Tome 2006; Tome and de Oliveira 2010)

$$ \begin{array}{@{}rcl@{}} {\Gamma} = \frac{\tau_{w}}{{\sigma_{v}^{2}}}\langle[{\Phi}^{(1)}]^{2}\rangle - \langle {\Phi}^{(2)} \rangle \end{array} $$
(52)

Note that Eq. (52) is very similar in form to the energy rate \(\dot {E}\) derived above; the two expressions differ only by the factor \({\sigma _{v}^{2}}/\tau _{w}\), and neither of them has the units of energy (Γ has units of inverse time). Thus, we need to introduce an energy scale into the problem. Generally, the stationary dissipated power per synapse \(\dot {E}\) can be written as \(\dot {E}= E_{o}{\Pi }= E_{o}{\Gamma }\) (Nicolis and Prigogine 1977), where Eo is the characteristic energy scale associated with spine conductance changes; its value is estimated next.

Estimation of the characteristic energy scale for synaptic plasticity

As was said in the Introduction, the BCM model (either classical or extended) is only a phenomenological model of plasticity that does not relate directly to the underlying molecular processes in synapses. Consequently, a single small change of synaptic weight by Δwi in Eq. (1) is in reality accompanied by many molecular transitions in synapse i. This means that a single degree of freedom related to wi is in fact associated with many hidden molecular degrees of freedom. To be realistic in our energy cost estimates, we have to include those hidden degrees of freedom.

If we dealt with a process representing a single degree of freedom, then the energy scale Eo relating the entropy flux Γ and energy rate \(\dot {E}\) would be Eo = kT (Nicolis and Prigogine 1977), where k is the Boltzmann constant and T is the tissue absolute temperature (T ≈ 310 K). However, a dendritic spine is a composite object with multiple components and many degrees of freedom (Bonhoeffer and Yuste 2002; Holtmaat et al. 2005; Meyer et al. 2014; Choquet and Triller 2013), and hence the characteristic energy scale Eo is much bigger than kT. The changes in spine conductance on a time scale of \(\sim \) 1 hr, i.e. for e-LTP and e-LTD, are induced by protein interactions in the PSD (Lisman et al. 2012; Kandel et al. 2014) and subsequent membrane trafficking associated with AMPA and NMDA receptors (Borgdorff and Choquet 2002; Huganir and Nicoll 2013; Choquet and Triller 2013). Protein interactions are powered by phosphorylation, which is one of the main biochemical mechanisms of molecular signal transduction in the PSD relevant for synaptic plasticity (Bhalla and Iyengar 1999; Zhu et al. 2016). Phosphorylation rates in an active LTP phase can be very fast, e.g., for CaMKII autophosphorylation they are in the range 60 − 600 min^-1 (Bradshaw et al. 2002). Other processes in a spine, most notably protein turnover in the PSD (likely involved in l-LTP and l-LTD), are much slower (\(\sim 3.7\) days; Cohen et al. 2013), and therefore their contribution to the energetics of the early phase of spine plasticity seems much less important (see, however, the Discussion for an estimate of the protein turnover energy rate).

The energy scale for protein interactions can be estimated as follows. A typical dendritic spine contains about 10^4 proteins (including their copies) (Sheng and Hoogenraad 2007). One cycle of protein phosphorylation requires the hydrolysis of 1 ATP molecule (Hill 1989; Qian 2007), which costs about 20 kT (Phillips et al. 2012). Each protein has on average 4-6 phosphorylation sites (Collins et al. 2005; Trinidad et al. 2012). If we assume conservatively that only about 20% of all PSD proteins are phosphorylated, then we obtain an energy scale for protein interactions of roughly 2 ⋅ 10^5 kT, which is 8.6 ⋅ 10^-16 J.

The energy scale for receptor trafficking can be broadly decomposed into two parts: the energy required for insertion of the receptors into the spine membrane, and the energy related to their horizontal movement along the membrane to the top, near a presynaptic terminal. The insertion energy for a typical protein is about 3 − 17 kcal/mol (Gumbart et al. 2011) or 8 − 17 kT (Grafmuller et al. 2009), with the combined range spanning 4 − 25 kT, and is caused by a deformation in the membrane structure (Gumbart et al. 2011). Since an average spine contains about 100 AMPA (Matsuzaki et al. 2001; Smith et al. 2003) and 10 NMDA (Nimchinsky et al. 2004) receptors, we obtain a total insertion energy in the range 500 − 3200 kT. The second, movement contribution can be estimated by noting that typical forces that overcome friction and push macromolecules along the membrane are about 10 pN, and they are powered by ATP hydrolysis (Fisher and Kolomeisky 1999). AMPA and NMDA receptors have to travel a spine distance of about 1 μm (Benavides-Piccione et al. 2013), which requires the work of 110 ⋅ 10^-11 N ⋅ 10^-6 m = 1.1 ⋅ 10^-15 J, or 2.5 ⋅ 10^5 kT. The latter figure is 100 times larger than the insertion contribution, which indicates that the energy scale for receptor trafficking is dominated by the horizontal movement and is similar to the above estimate for protein phosphorylation.

To summarize, the total energy scale Eo for spine conductance changes is about Eo = 2 ⋅ 10^-15 J, or equivalently 4.6 ⋅ 10^5 kT (or 2.3 ⋅ 10^4 ATP molecules).
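The unit conversions behind this summary can be reproduced directly (kT at T = 310 K, 1 ATP ≈ 20 kT, trafficking work of 110 receptors × 10 pN × 1 μm):

```python
# Arithmetic check of the characteristic energy scale E_o and of the
# receptor-trafficking work estimate, using the numbers quoted in the text.
k_B = 1.380649e-23                  # Boltzmann constant, J/K
kT = k_B * 310.0                    # thermal energy at body temperature, J

work_J = 110 * 10e-12 * 1e-6        # 110 receptors x 10 pN x 1 um = 1.1e-15 J
work_kT = work_J / kT               # ~2.5e5 kT

E_o = 2e-15                         # total energy scale, J
E_o_kT = E_o / kT                   # ~4.6e5 kT
E_o_ATP = E_o_kT / 20.0             # ~2.3e4 ATP (1 ATP ~ 20 kT)
print(work_kT, E_o_kT, E_o_ATP)
```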

Analytical approximation of the energy rate related to synaptic plasticity

It is not possible to find analytically the entropy flux Γ in Eq. (52) for an arbitrary probability distribution. However, Γ can be determined approximately for the probability distribution Ps(v|fo) in Eq. (37), by the saddle point method as a series expansion in the small noise amplitude D, which is proportional to 1/N. We can write the entropy flux Γ in terms of the probability densities Pd and Pu appearing in Eq. (37) as

$$ \begin{array}{@{}rcl@{}} {\Gamma}= \left\langle \frac{({\Phi}^{(1)})^{2}}{D} - {\Phi}^{(2)} \right\rangle_{d} + \left\langle \frac{({\Phi}^{(1)})^{2}}{D} - {\Phi}^{(2)} \right\rangle_{u} \end{array} $$
(53)

where

$$ \begin{array}{@{}rcl@{}} \left\langle\! \frac{({\Phi}^{(1)})^{2}}{D} - {\Phi}^{(2)}\! \right\rangle_{d} = {\int}_{0}^{v_{max}} dv P_{d}(v) \left( \!\frac{({\Phi}^{(1)})^{2}}{D} - {\Phi}^{(2)}\!\right) \end{array} $$
(54)

and

$$ \begin{array}{@{}rcl@{}} \left\langle\! \frac{({\Phi}^{(1)})^{2}}{D} - {\Phi}^{(2)} \!\right\rangle_{u} = {\int}_{v_{max}}^{\infty} dv P_{u}(v) \left( \!\frac{({\Phi}^{(1)})^{2}}{D} - {\Phi}^{(2)}\!\right) \end{array} $$
(55)

The essence of the saddle-point method lies in noting that for very small D, the probability distributions in Eq. (37) have two sharp maxima corresponding to the two most likely synaptic currents vd and vu. This implies that the values of v closest to vd and vu in (Φ(1))2/D −Φ(2) provide the biggest contributions to the integrals in Eqs. (54) and (55), and hence to the entropy flux Γ. Consequently, we have to expand the function (Φ(1)(v))2/D −Φ(2)(v) around vd and vu.

For v near vd, the expansion is simpler if we introduce a unitless variable x, related to v such that \((v-v_{d})= \sqrt {2D/{\Phi }_{d}^{(2)}}x\), where \({\Phi }_{d}^{(2)}\) is the second derivative of Φ(v) at v = vd. Then to the order \(\sim D\) we have:

$$ \begin{array}{@{}rcl@{}} &&\frac{({\Phi}^{(1)}(x))^{2}}{D} - {\Phi}^{(2)}(x)\\ &=& {\Phi}_{d}^{(2)}(2x^{2}-1) + \sqrt{\frac{2D}{{\Phi}^{(2)}_{d}}}{\Phi}_{d}^{(3)}(2x^{3}-x) \\ &&+ \frac{2D}{{\Phi}^{(2)}_{d}}\left[ {\Phi}_{d}^{(4)}\left(\frac{2}{3}x^{4}-\frac{1}{2}x^{2}\right) + \frac{1}{2}\frac{({\Phi}_{d}^{(3)})^{2}}{{\Phi}_{d}^{(2)}}x^{4} \right] \\&&+ O(D^{3/2}) \end{array} $$
(56)

A similar expression holds for v near vu, with the substitution \({\Phi }_{d}^{(n)} \mapsto {\Phi }_{u}^{(n)}\). Thus for \(\langle \frac {({\Phi }^{(1)})^{2}}{D} - {\Phi }^{(2)} \rangle _{d}\) we have

$$ \begin{array}{@{}rcl@{}} &&\left\langle \frac{({\Phi}^{(1)})^{2}}{D} - {\Phi}^{(2)} \right\rangle_{d}\\&=& Z^{-1} \sqrt{\frac{2D}{{\Phi}^{(2)}_{d}}} e^{-{\Phi}_{d}/D} {\int}_{x_{d1}}^{x_{d2}} dx e^{-x^{2}}\\&& \left( {\Phi}_{d}^{(2)}(2x^{2}-1) + \sqrt{\frac{2D}{{\Phi}^{(2)}_{d}}}{\Phi}_{d}^{(3)}(2x^{3}-x) \right. \\ &&\left. + \frac{2D}{{\Phi}^{(2)}_{d}}\left[ {\Phi}_{d}^{(4)}\left(\frac{2}{3}x^{4}-\frac{1}{2}x^{2}\right) + \frac{1}{2}\frac{({\Phi}_{d}^{(3)})^{2}}{{\Phi}_{d}^{(2)}}x^{4} \right] \right) \end{array} $$
(57)

where the limits of integration are \(x_{d1}= \sqrt {\frac {{\Phi }_{d}^{(2)}}{2D}}(-v_{d})\), and \(x_{d2}= \sqrt {\frac {{\Phi }_{d}^{(2)}}{2D}}(v_{max}-v_{d})\). Evaluating the above integrals yields

$$ \begin{array}{@{}rcl@{}} &&\left\langle \frac{({\Phi}^{(1)})^{2}}{D} - {\Phi}^{(2)} \right\rangle_{d}\\&=& Z^{-1} \sqrt{\frac{2D}{{\Phi}^{(2)}_{d}}} e^{-{\Phi}_{d}/D} \frac{\sqrt{\pi}}{8}[\text{erf}(|x_{d1}|) + \text{erf}(|x_{d2}|)] \\ &&\left( \frac{D}{({\Phi}_{d}^{(2)})^{2}} [3({\Phi}_{d}^{(3)})^{2} + 2{\Phi}_{d}^{(2)}{\Phi}_{d}^{(4)}] \right.\\&&\left.+ O(e^{-x_{d1}^{2}}, e^{-x_{d2}^{2}}) + O(D^{2})\vphantom{\frac{D}{({\Phi}_{d}^{(2)})^{2}}} \right), \end{array} $$
(58)

where in the limit of small D, the exponential terms (\(\sim e^{-x_{d1}^{2}}, e^{-x_{d2}^{2}}\)) are small, and thus negligible. Next, it is easy to note that the prefactor in front of the large bracket simplifies, i.e., \(Z^{-1} \sqrt {\frac {2D}{{\Phi }^{(2)}_{d}}} e^{-{\Phi }_{d}/D} \frac {\sqrt {\pi }}{8}[\text {erf}(|x_{d1}|) + \text {erf}(|x_{d2}|)] = \frac {Z_{d}}{4Z}= p_{d}/4\).

Applying the same procedure for \(\langle \frac {({\Phi }^{(1)})^{2}}{D} - {\Phi }^{(2)} \rangle _{u}\) gives us the total expression for the entropy flux Γ

$$ {\Gamma}= {\sum}_{i=u,d} p_{i} \frac{D}{4({\Phi}_{i}^{(2)})^{2}} [3({\Phi}_{i}^{(3)})^{2} + 2{\Phi}_{i}^{(2)}{\Phi}_{i}^{(4)}] + O(D^{2}), $$
(59)

where i = d (down state) or i = u (up state).
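The Gaussian averages leading to Eq. (59) can be verified symbolically: averaging the expansion (56) over the normalized Gaussian weight \(e^{-x^{2}}/\sqrt {\pi }\), with the integration limits extended to ±∞ (i.e., dropping the exponentially small erf corrections), reproduces the per-well coefficient in Eqs. (58)–(59). A minimal sketch with sympy, denoting \({\Phi }_{i}^{(n)}\) by Phi2, Phi3, Phi4:

```python
import sympy as sp

x = sp.symbols('x', real=True)
D, Phi2, Phi3, Phi4 = sp.symbols('D Phi2 Phi3 Phi4', positive=True)

# expansion (56) of (Phi^(1))^2/D - Phi^(2) around a potential minimum
expr = (Phi2*(2*x**2 - 1)
        + sp.sqrt(2*D/Phi2)*Phi3*(2*x**3 - x)
        + (2*D/Phi2)*(Phi4*(sp.Rational(2, 3)*x**4 - x**2/2)
                      + sp.Rational(1, 2)*Phi3**2/Phi2*x**4))

gauss = sp.exp(-x**2)/sp.sqrt(sp.pi)   # normalized Gaussian weight
avg = sp.integrate(expr*gauss, (x, -sp.oo, sp.oo))

# per-well coefficient appearing in Eqs. (58)-(59)
target = D*(3*Phi3**2 + 2*Phi2*Phi4)/(4*Phi2**2)
assert sp.simplify(avg - target) == 0
```

The zeroth-order term \({\Phi }_{i}^{(2)}(2x^{2}-1)\) and the odd \(O(\sqrt {D})\) term average to zero, so the first nonzero contribution is indeed of order D, as stated below Eq. (61).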

Having the entropy flux, we can determine analytically the power dissipated per synapse \(\dot {E}\) due to synaptic plasticity. The result is

$$ \begin{array}{@{}rcl@{}} \dot{E} = E_{o}{\Gamma}\equiv p_{d}\dot{E}_{d} + p_{u}\dot{E}_{u} \end{array} $$
(60)

where \(\dot {E}_{d}\) and \(\dot {E}_{u}\) are the energy rates dissipated in the down and up states, respectively. They take the form:

$$ \begin{array}{@{}rcl@{}} \dot{E}_{i}= \frac{E_{o}D}{4({\Phi}_{i}^{(2)})^{2}} [3({\Phi}_{i}^{(3)})^{2} + 2{\Phi}_{i}^{(2)}{\Phi}_{i}^{(4)}] + O(D^{2}). \end{array} $$
(61)

Note that the first nonzero contribution to the energy rate is of the order \(\sim D\).

Neuron energy rate related to fast electric signaling

We provide below an estimate of the energy used by a sensory neuron for short-term signaling, for the sake of comparison with the energy requirement of synaptic plasticity. It has been suggested that the majority of neuronal energy goes to pumping out Na+ ions (via the Na+-K+-ATPase), which accumulate mostly due to neural spiking activity, synaptic background activity, and passive Na+ influx through sodium channels at rest (Attwell and Laughlin 2001). It has been shown that this short-term neuronal energy cost can be derived from a biophysical neuronal model, compared across species, and represented by a relatively simple equation (Karbowski 2009, 2012):

$$ \begin{array}{@{}rcl@{}} \text{CMR}_{glu}= a_{0} + a_{1}\langle r\rangle + b\rho_{s}f_{o}, \end{array} $$
(62)

where CMRglu is the glucose metabolic rate [in μ mol/(cm3 ⋅ min)], ρs is the synaptic density, 〈r〉 is the average postsynaptic firing rate, and the parameters a0, a1, and b characterize the magnitude of the above three contributions to the neural metabolism, i.e. resting, firing rate, and synaptic transmission, respectively (Karbowski 2012). The average postsynaptic rate 〈r〉 is found from Eq. (46).

According to biochemical estimates, one oxidized glucose molecule generates about 31 ATP molecules (Rolfe and Brown 1997). In addition, 1 ATP molecule provides about 20 kT of energy (Phillips et al. 2012). This means that the short-term energy rate per neuron, denoted as \(\dot {E}_{n}\), is given by

$$ \begin{array}{@{}rcl@{}} \dot{E}_{n}= 31\cdot 20 \frac{N_{A} kT}{\rho_{n}} \text{CMR}_{glu}, \end{array} $$
(63)

where NA is the Avogadro number, and ρn is the neuron density. We estimate the ratio of the synaptic plasticity power to neural power, i.e. \(\dot {E}/\dot {E}_{n}\) across different presynaptic firing rates for three areas of the adult human cerebral cortex (frontal, temporal, and visual), and two areas of macaque monkey cerebral cortex (frontal and visual).
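As a concrete illustration of Eq. (63), one can plug in the human frontal-cortex values listed below (CMRglu ≈ 0.27 μmol/(cm3⋅min), ρn ≈ 36.7 ⋅ 10^6 cm^−3). The body-temperature value kT (T ≈ 310 K) is an assumption of this sketch, not a number quoted in the text:

```python
# Per-neuron short-term energy rate, Eq. (63), human frontal cortex.
k_B = 1.38e-23           # Boltzmann constant, J/K
T = 310.0                # body temperature in K (assumption of this sketch)
kT = k_B*T
N_A = 6.022e23           # Avogadro number, 1/mol
CMR_glu = 0.27e-6/60     # mol/(cm^3 s), converted from 0.27 umol/(cm^3 min)
rho_n = 36.7e6           # neuron density, 1/cm^3 (frontal cortex)

# Eq. (63): E_n = 31*20 * N_A*kT/rho_n * CMR_glu
E_n = 31*20*N_A*kT/rho_n*CMR_glu
print(f"E_n ~ {E_n:.2e} W per neuron")
```

This gives a short-term energy rate of a few tenths of a nanowatt per neuron, which sets the scale against which the plasticity power \(\dot {E}\) is compared.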

The values of the parameters a0 and a1 in Eq. (62) are species- and area-independent, and they read a0 = 2.1 ⋅ 10^−10 mol/(cm3 s), and a1 = 2.3 ⋅ 10^−9 mol/cm3 (Karbowski 2012). The rest of the parameters take different values for the human and macaque cortex. Most of them are taken from empirical studies, and are given below. The parameter b, present in Eq. (62), is proportional to the neurotransmitter release probability and synaptic conductance, and it was estimated by fitting developmental data for the glucose metabolism CMRglu and synaptic density ρs (which vary during development) to formula (62) (Karbowski 2012).

The following data are for an adult human cortex. The adult CMRglu is 0.27 μmol/(cm3⋅min) (frontal cortex), 0.27 μmol/(cm3⋅min) (visual cortex), and 0.24 μmol/(cm3⋅min) (temporal cortex) (Chugani 1998). The parameter b reads: 1.16 ⋅ 10^−20 mol (frontal), 0.63 ⋅ 10^−20 mol (visual), 0.17 ⋅ 10^−20 mol (temporal) (Karbowski 2012). Note that the value of b is 7 times larger for the frontal cortex than for the temporal cortex, which might suggest that the product of neurotransmitter release probability and synaptic conductance is also 7-fold larger in the frontal cortex. This large difference may seem unlikely; however, it is still plausible, given that the release probability is highly variable and can assume values between 0.05 and 0.7 (Bolshakov and Siegelbaum 1995; Frick et al. 2007; Volgushev et al. 2004; Murthy et al. 2001), and synaptic weights in the cortex are widely distributed (Loewenstein et al. 2011). Neuron density ρn reads: 36.7 ⋅ 10^6 cm^−3 (frontal), 66.9 ⋅ 10^6 cm^−3 (visual), 59.8 ⋅ 10^6 cm^−3 (temporal) (Pakkenberg and Gundersen 1997). Synaptic density ρs reads: 3.4 ⋅ 10^11 cm^−3 (frontal), 3.1 ⋅ 10^11 cm^−3 (visual), 2.9 ⋅ 10^11 cm^−3 (temporal) (Huttenlocher and Dabholkar 1997).

The following data are for an adult (6 years old) macaque monkey cortex. The adult CMRglu is 0.34 μmol/(cm3⋅min) (frontal cortex), and 0.40 μmol/(cm3⋅min) (visual cortex) (Noda et al. 2002). The parameter b reads: 0.4 ⋅ 10^−20 mol (frontal), and 3.8 ⋅ 10^−20 mol (visual) (Karbowski 2012). Neuron density ρn reads: 9 ⋅ 10^7 cm^−3 (frontal), and 31.9 ⋅ 10^7 cm^−3 (visual) (Christensen et al. 2007). Synaptic density ρs reads: 5 ⋅ 10^11 cm^−3 (frontal) (Bourgeois et al. 1994), and 6 ⋅ 10^11 cm^−3 (visual) (Bourgeois and Rakic 1993).

Fisher information and coding accuracy in synapses

Fisher information IF(fo) about the driving input fo is a good approximation of the mutual information between the driving presynaptic activity and postsynaptic current v (Brunel and Nadal 1998). It is also a measure of the coding accuracy and it is defined as (Cover and Thomas 2006)

$$ \begin{array}{@{}rcl@{}} I_{F}(f_{o})= \left\langle \left( \frac{\partial\ln P_{s}(v|f_{o})}{\partial f_{o}} \right)^{2} \right\rangle. \end{array} $$
(64)
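As a quick sanity check of definition (64), consider a pure Gaussian density with an fo-dependent mean μ(fo) and fixed variance σ² (a textbook special case, not the bimodal density used here). Writing u = v − μ, the score is \(\partial \ln P/\partial f_{o} = u\mu ^{\prime }/\sigma ^{2}\), and Eq. (64) reduces to the classical result \(I_{F} = (\mu ^{\prime })^{2}/\sigma ^{2}\):

```python
import sympy as sp

u, dmu = sp.symbols('u dmu', real=True)   # u = v - mu; dmu = d mu/d f_o
sigma = sp.symbols('sigma', positive=True)

# centered Gaussian density P(u)
P = sp.exp(-u**2/(2*sigma**2))/sp.sqrt(2*sp.pi*sigma**2)
score = u*dmu/sigma**2                    # d ln P / d f_o

I_F = sp.integrate(score**2*P, (u, -sp.oo, sp.oo))
assert sp.simplify(I_F - dmu**2/sigma**2) == 0
```

The bimodal density of Eq. (37) generalizes this: each well contributes a Gaussian-like term, plus terms coming from the fo-dependence of the well occupations pd, pu.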

Taking into account the form of the probability density, Eq. (37), we can rewrite this equation as

$$ I_{F}(f_{o})= \langle \left( \frac{\partial\ln P_{d}(v|f_{o})}{\partial f_{o}} \right)^{2} \rangle_{d} + \langle \left( \frac{\partial\ln P_{u}(v|f_{o})}{\partial f_{o}} \right)^{2} \rangle_{u}, $$
(65)

where

$$ \begin{array}{@{}rcl@{}} &&\left\langle \left( \frac{\partial\ln P_{i}(v|f_{o})}{\partial f_{o}} \right)^{2} \right\rangle_{i} \\&=& \left[Z^{\prime}/Z + ({\Phi}_{i}/D)'\right]^{2}p_{i} \\ &&+ 2\left[Z^{\prime}/Z + ({\Phi}_{i}/D)'\right]\left\langle \left( {\Phi}_{i}^{(2)}(v-v_{i})^{2}/(2D)\right)' \right\rangle_{i} \\ &&+ \left\langle \left[\left( {\Phi}_{i}^{(2)}(v-v_{i})^{2}/(2D)\right)'\right]^{2} \right\rangle_{i} \end{array} $$
(66)

Our first goal is to express the factor \(Z^{\prime }/Z\) in terms of the potential and its derivatives. To do so, we compute the following average:

$$\langle \frac{\partial\ln P_{s}}{\partial f_{o}} \rangle = \langle \frac{\partial\ln P_{d}}{\partial f_{o}} \rangle_{d} + \langle \frac{\partial\ln P_{u}}{\partial f_{o}} \rangle_{u}.$$

The left hand side of this equation is zero, since

$$\langle(\ln P_{s})'\rangle = {\int}_{0}^{\infty} dv (P_{s})' = \left( {\int}_{0}^{\infty} dv P_{s}(v)\right)' = 0,$$

where a prime denotes a derivative with respect to fo. Additionally,

$$\langle (\ln P_{i})' \rangle_{i} = - Z^{\prime}/Z - ({\Phi}_{i}/D)' - \langle \left( {\Phi}_{i}^{(2)}(v-v_{i})^{2}/(2D)\right)' \rangle_{i}.$$

Combining the last two equations we obtain a relation between \(Z^{\prime }/Z\) and the potentials:

$$ \begin{array}{@{}rcl@{}} Z^{\prime}/Z &=& -({\Phi}_{d}/D)' p_{d} - ({\Phi}_{u}/D)' p_{u} \\&&- \langle \left( {\Phi}_{d}^{(2)}(v-v_{d})^{2}/(2D)\right)' \rangle_{d} \\ &&- \langle \left( {\Phi}_{u}^{(2)}(v-v_{u})^{2}/(2D)\right)' \rangle_{u}. \end{array} $$
(67)

After insertion of this expression into Eq. (66) and after some algebra, we arrive at the Fisher information

$$ \begin{array}{@{}rcl@{}} I_{F}(f_{o}) &=& \left[ \left( \frac{{\Phi}_{u}-{\Phi}_{d}}{D}\right)' \right]^{2}p_{d}p_{u} + 2 \left( \frac{{\Phi}_{u}-{\Phi}_{d}}{D}\right)' \\ &&\times\left[ p_{d}\langle \left( {\Phi}_{u}^{(2)}(v-v_{u})^{2}/(2D)\right)' \rangle_{u} \right.\\&&\left.- p_{u}\left\langle \left( {\Phi}_{d}^{(2)}(v-v_{d})^{2}/(2D)\right)'\right\rangle_{d} \right] \\ &&- \left( \left\langle \left( {\Phi}_{d}^{(2)}(v-v_{d})^{2}/(2D)\right)' \right\rangle_{d} \right.\\&&\left.+ \left\langle \left( {\Phi}_{u}^{(2)}(v-v_{u})^{2}/(2D)\right)' \right\rangle_{u} \right)^{2} \\ &&+ \left\langle \left[\left( {\Phi}_{d}^{(2)}(v-v_{d})^{2}/(2D)\right)'\right]^{2} \right\rangle_{d} \\&&+ \left\langle \left[\left( {\Phi}_{u}^{(2)}(v-v_{u})^{2}/(2D)\right)'\right]^{2} \right\rangle_{u} \end{array} $$
(68)

The averages in the above equation can be computed to yield:

$$ \begin{array}{@{}rcl@{}} \left\langle \left( {\Phi}_{i}^{(2)}(v-v_{i})^{2}/(2D)\right)' \right\rangle_{i} &=& \frac{1}{2}p_{i}\left[ \frac{({\Phi}_{i}^{(2)})'}{{\Phi}_{i}^{(2)}} - \frac{D^{\prime}}{D} \right] \\&&+ O(e^{-1/D}) \end{array} $$
(69)

and

$$ \begin{array}{@{}rcl@{}} &&\left\langle \left[\left( {\Phi}_{i}^{(2)}(v-v_{i})^{2}/(2D)\right)' \right]^{2} \right\rangle_{i}\\ &=& 2p_{i}(v_{i}^{\prime})^{2} \frac{{\Phi}_{i}^{(2)}}{2D} \\&&+ \frac{3}{4}p_{i} \left[ \frac{({\Phi}_{i}^{(2)})'}{{\Phi}_{i}^{(2)}} - \frac{D^{\prime}}{D} \right]^{2} + O(e^{-1/D}) \end{array} $$
(70)

After insertion of these expressions into Eq. (68), and some algebraic manipulations, we arrive at Eq. (17) for IF in the Results.

Relationship between synaptic energy rate and Fisher information in the limit D → 0

Below we derive the relation given by Eqs. (18) and (19) in the limit of very weak synaptic noise, D → 0. In this limit, it can be noted that the product of the fractions of weak and strong synapses is

$$ \begin{array}{@{}rcl@{}} p^{(0)}_{d}p^{(0)}_{u}= \frac{ e^{({\Phi}_{d}-{\Phi}_{u})/D}\sqrt{{\Phi}_{d}^{(2)}/{\Phi}_{u}^{(2)}} } { \left( 1 + e^{({\Phi}_{d}-{\Phi}_{u})/D}\sqrt{{\Phi}_{d}^{(2)}/{\Phi}_{u}^{(2)}} \right)^{2} }. \end{array} $$
(71)

This expression enables us to write in a compact form the derivative of \(p^{(0)}_{d}\) with respect to fo as

$$ \begin{array}{@{}rcl@{}} \frac{\partial p^{(0)}_{d}}{\partial f_{o}}&=& p^{(0)}_{d}p^{(0)}_{u} \left[ \left( \frac{{\Phi}_{u}-{\Phi}_{d}}{D}\right)'\right. \\&&\left.+ \frac{1}{2}\left[ ({\Phi}_{u}^{(2)})'/{\Phi}_{u}^{(2)} - ({\Phi}_{d}^{(2)})'/{\Phi}_{d}^{(2)} \right] \right] \end{array} $$
(72)

where the prime denotes differentiation with respect to fo. On the other hand it can be noted that, in the bistable regime, the Fisher information in Eq. (17) in the leading order 1/D2 can be written as

$$ \begin{array}{@{}rcl@{}} I_{F}(f_{o})&=& p^{(0)}_{d}p^{(0)}_{u} \left[ \left( \frac{{\Phi}_{u}-{\Phi}_{d}}{D}\right)' \right. \\&&\left.+ \frac{1}{2}\left[ ({\Phi}_{u}^{(2)})'/{\Phi}_{u}^{(2)} - ({\Phi}_{d}^{(2)})'/{\Phi}_{d}^{(2)} \right] \right]^{2} \\&&+ O(1/D), \end{array} $$
(73)

which is similar in form to the expression for \(\partial p^{(0)}_{d}/\partial f_{o}\). This suggests that we can combine the two equations, and arrive at

$$ \begin{array}{@{}rcl@{}} I_{F}(f_{o})= \frac{ \left( \partial p^{(0)}_{d}/\partial f_{o} \right)^{2} } {p^{(0)}_{d} p^{(0)}_{u}} \left[ 1 + O(D) \right], \end{array} $$
(74)

which is Eq. (18) in the Results.
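Equation (74) is precisely the Fisher information of a two-state (Bernoulli) variable with occupation probabilities \(p^{(0)}_{d}\) and \(p^{(0)}_{u} = 1 - p^{(0)}_{d}\): in the weak-noise limit, essentially all information about fo is carried by which basin (down or up) the synapse occupies. A short symbolic check, with p and its derivative p' = dp/dfo treated as free symbols:

```python
import sympy as sp

p, dp = sp.symbols('p dp', positive=True)   # p = p_d, dp = dp_d/df_o

# scores d(ln P)/df_o of the two states: dp/p for "down", -dp/(1-p) for "up";
# averaging the squared score over the two states gives the Fisher information
I_F = p*(dp/p)**2 + (1 - p)*(dp/(1 - p))**2

# Bernoulli Fisher information, i.e. Eq. (74) with p_u = 1 - p_d
assert sp.simplify(I_F - dp**2/(p*(1 - p))) == 0
```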

Next, we want to relate Eq. (74) for IF to the energy rate. The energy rate \(\dot {E}= p^{(0)}_{d}\dot {E}_{d} + p^{(0)}_{u}\dot {E}_{u}\) can be differentiated with respect to fo, which yields

$$ \begin{array}{@{}rcl@{}} \frac{\partial \dot{E}}{\partial f_{o}}= \frac{\partial p^{(0)}_{d}}{\partial f_{o}}(\dot{E}_{d}-\dot{E}_{u}) + p^{(0)}_{d}\frac{\partial\dot{E}_{d}}{\partial f_{o}} + p^{(0)}_{u}\frac{\partial\dot{E}_{u}}{\partial f_{o}}, \end{array} $$
(75)

where we used the relation \(\partial p^{(0)}_{d}/\partial f_{o} = -\partial p^{(0)}_{u}/\partial f_{o}\), which follows from the fact that \(p^{(0)}_{d} + p^{(0)}_{u}= 1\). Now it is crucial to note that the first term in \(\partial \dot {E}/\partial f_{o}\), proportional to \(\partial p^{(0)}_{d}/\partial f_{o}\), is of the order 1/D, whereas the remaining terms (\(\sim p^{(0)}_{d}, p^{(0)}_{u}\)) are of the order of unity. This implies that the first term dominates in the limit D → 0. Thus, we can approximately express \(\partial p^{(0)}_{d}/\partial f_{o}\) in terms of the energy rate as

$$ \begin{array}{@{}rcl@{}} \frac{\partial p^{(0)}_{d}}{\partial f_{o}}= \frac{\partial \dot{E}/\partial f_{o}}{(\dot{E}_{d}-\dot{E}_{u})} \left[ 1 + O(D) \right] \end{array} $$
(76)

Finally, if we combine Eqs. (74) and (76), we obtain Eq. (19) in the Results for the bistable regime.

Numerical simulations of the full synaptic system

Stochastic simulations of the whole synaptic system, given by Eqs. (12), were performed using a stochastic version of the Runge-Kutta scheme (Roberts 2001).
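For readers who wish to reproduce such simulations, a minimal sketch of a Heun-type (predictor-corrector) stochastic Runge-Kutta step for additive noise follows. It is tested on a toy Ornstein-Uhlenbeck "weight" with known stationary mean and variance; the drift F, parameter values, and function names are illustrative assumptions, not the full system of Eq. (77):

```python
import numpy as np

def heun_sde_step(w, F, noise_amp_sqrt_h, rng, h):
    """One Heun-type (predictor-corrector) step for dw/dt = F(w) + additive noise."""
    xi = noise_amp_sqrt_h*rng.standard_normal()
    w_pred = w + h*F(w) + xi                      # Euler predictor
    return w + 0.5*h*(F(w) + F(w_pred)) + xi      # trapezoidal corrector

# toy weight dynamics: dw/dt = -(w - mu)/tau_w + sqrt(2*sigma**2/tau_w)*eta
mu, tau_w, sigma = 1.0, 1.0, 0.5
F = lambda w: -(w - mu)/tau_w
h, n_steps = 0.01, 200_000
rng = np.random.default_rng(0)
noise_amp_sqrt_h = np.sqrt(2*sigma**2/tau_w*h)

w, traj = mu, np.empty(n_steps)
for n in range(n_steps):
    w = heun_sde_step(w, F, noise_amp_sqrt_h, rng, h)
    traj[n] = w

print(traj.mean(), traj.var())  # should approach mu and sigma^2
```

The long-time statistics of the trajectory recover the stationary mean μ and variance σ² of the toy process, which is a convenient convergence check before running the full coupled system.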

Energy dissipated for plasticity by the full synaptic system (Eqs. (1)–(2)) was computed numerically using the approach presented in (Tome 2006; Tome and de Oliveira 2010). We can rewrite Eq. (1) in a more compact form as

$$ \begin{array}{@{}rcl@{}} \frac{dw_{i}}{dt} = F_{w,i} + \frac{\sqrt{2}\sigma_{w}(1 + \tau_{f}f_{i})}{\sqrt{\tau_{w}}} \eta_{i} \end{array} $$
(77)

where

$$ \begin{array}{@{}rcl@{}} F_{w,i} = \lambda f_{i}r(r-\theta) - \frac{(w_{i}-\epsilon a)}{\tau_{w}}. \end{array} $$
(78)

This enables us to write the entropy flux in the steady state (equivalent to entropy production rate) of the full synaptic system in a compact form. Consequently, the numerical entropy flux per synapse of the whole system Γnum is

$$ \begin{array}{@{}rcl@{}} {\Gamma}_{num} = \frac{1}{N} {\sum}_{i=1}^{N} \left( \frac{\tau_{w}}{{\sigma_{w}^{2}}} \langle \left( \frac{F_{w,i}}{(1+\tau_{f}f_{i})}\right)^{2} \rangle + \langle F^{\prime}_{w,i} \rangle \right) \end{array} $$
(79)

where \(F^{\prime }_{w,i}= {\partial F_{w,i}}/{\partial w_{i}}\), and it is given by

$$ \begin{array}{@{}rcl@{}} F^{\prime}_{w,i} = -\frac{1}{\tau_{w}} + \frac{\lambda\upbeta A^{2}{f_{i}^{2}}(2r-\theta)}{N(2r+\kappa A^{2})}. \end{array} $$
(80)

The numerical energy rate \(\dot {E}_{num}\) is \(\dot {E}_{num}= E_{o}{\Gamma }_{num}\). The brackets 〈...〉 in Eq. (79) denote averaging over fluctuations in synaptic noise and presynaptic firing rates (averaging over the η and x stochastic variables). In numerical simulations, these averages are computed as temporal averages over a long simulation time. This equivalence in averaging is guaranteed by the ergodic theorem. The minimal number of time steps for numerical convergence is of the order of 10^5.
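A useful consistency check of the temporal-averaging procedure is to apply the single-weight analog of Eq. (79) to an equilibrium process, where the entropy flux must vanish. The sketch below uses a hypothetical detailed-balance Ornstein-Uhlenbeck "weight" (not the driven system of Eqs. (77)–(78)), integrated by simple Euler-Maruyama:

```python
import numpy as np

# toy equilibrium weight: dw/dt = -(w - mu)/tau_w + sqrt(2*sigma_w^2/tau_w)*eta;
# for this detailed-balance process the steady-state entropy flux must be ~0
tau_w, sigma_w, mu = 1.0, 0.5, 1.0
h, n_steps = 0.01, 400_000
rng = np.random.default_rng(1)
xi = np.sqrt(2*sigma_w**2/tau_w*h)*rng.standard_normal(n_steps)

w, acc_F2 = mu, 0.0
for n in range(n_steps):
    F = -(w - mu)/tau_w
    acc_F2 += F*F              # accumulate temporal average of <F^2>
    w += h*F + xi[n]

# single-weight analog of Eq. (79): Gamma = (tau_w/sigma_w^2)*<F^2> + <F'>
Gamma = (tau_w/sigma_w**2)*(acc_F2/n_steps) - 1.0/tau_w
print(Gamma)  # close to zero
```

For the driven synaptic system, the same time-average converges instead to the positive entropy flux Γnum, from which \(\dot {E}_{num}\) follows.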

Parameters used in computations

The following values of various parameters were used: Vr = −65 mV, q = 0.35 (Volgushev et al. 2004), τnmda = 150 msec (Nimchinsky et al. 2004), τampa = 5 msec (Smith et al. 2003), τf = 1.0 sec, a = 1.0 nS, α = 0.3 sec (Zenke et al. 2013), 𝜖 = 3 ⋅ 10^−4, A = 600 Hz/\(\sqrt {nA}\), τw = 3600 sec (Frey and Morris 1997; Zenke et al. 2013), σf = 10 Hz (Buzsaki and Mizuseki 2014), N = 2 ⋅ 10^3 (average value for many species of primates; see Sherwood et al. 2020; Elston et al. 2001). The amplitude of the synaptic weight noise σw was taken in the range 0.02 ≤ σw ≤ 0.5 nS, which is the range suggested by experimental studies (Matsuzaki et al. 2001; Smith et al. 2003). The two undetermined parameters are λ and κ, and two sets of values were used for them: (i) κ = 0.001 (nA⋅sec), λ = 9 ⋅ 10^−7 (nS⋅sec^2), and (ii) κ = 0.012 (nA⋅sec), λ = 10^−5 (nS⋅sec^2), in order to obtain a transition to the bistable regime for \(f_{o}\sim 1-5\) Hz. The value of A was chosen to keep the postsynaptic firing rate in the range 0.1 − 10 Hz. The value of κ was chosen to obtain vu in the neurophysiological range \(\sim 1\) pA (O'Connor et al. 2005).