Introduction

It is crucial for animals to infer the identity of odors, in situations ranging from foraging to mating1. While some odors are hardwired2, most must be learned. Learning, however, is difficult, especially in natural environments, where odors are rarely presented in isolation, most odors are encountered only a handful of times, and odor identities are rarely supervised. Nevertheless, animals can learn to associate an odor with a reward in a few trials3,4,5. Our goal here is to elucidate the local plasticity mechanisms that orchestrate this rapid learning.

To gain a conceptual understanding of how learning occurs, note that if the affinities of olfactory receptor neurons (OSNs) to odors were known, approximate Bayesian inference could be used to infer which odors are present given OSN activity6. And in a supervised setting—a setting in which the animal is told which odors are present—the affinities (i.e. the weights) could be learned efficiently using recently proposed Bayesian approaches7,8. Here we show that, even when the weights are not known and learning is unsupervised, we can combine these two methods to simultaneously learn the weights and infer the odors.

Our approach is as follows: when inferring which odors are present, average over the uncertainty in the weights; then use the inferred odors to update the estimates of the weights, and, importantly, decrease the uncertainty. As the estimates of the weights become more accurate, inference also improves. However, while straightforward, exact implementation of this learning process is intractable. Consequently, we have to use an approximate method9.
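To illustrate the first half of this loop, the sketch below infers a single odor's concentration from Eq. (1) while averaging over a Gaussian belief about a single weight. All numbers are illustrative, and a flat prior on concentration is used in place of the sparse prior introduced later; marginalizing over the weight inflates the predictive variance, so the posterior over concentration is broad when the weight is uncertain and tightens as the weight is learned.

```python
import numpy as np

# Infer c from x = w * c + noise (Eq. 1), averaging over a Gaussian belief
# w ~ N(w_mean, w_var). Marginalizing over w gives
# x | c ~ N(w_mean * c, sigma_x^2 + w_var * c^2). Values are illustrative;
# the prior on c is taken flat here, unlike the sparse prior of the model.
sigma_x, w_mean, x = 0.1, 1.0, 1.2
c = np.linspace(0.0, 3.0, 601)

for w_var in (0.5, 0.01):                 # uncertain vs. well-learned weight
    var = sigma_x**2 + w_var * c**2       # predictive variance given c
    lik = np.exp(-0.5 * (x - w_mean * c)**2 / var) / np.sqrt(var)
    post = lik / lik.sum()                # normalized posterior on the grid
    mean = (c * post).sum()
    sd = np.sqrt(((c - mean)**2 * post).sum())
    print(f"w_var {w_var:>4}: posterior over c has mean {mean:.2f}, sd {sd:.2f}")
```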

Although inference is approximate, our model still leads to faster learning of olfactory stimuli compared to previously proposed sparse-coding-based approaches10,11,12. It also provides some insight into olfactory circuitry: it reveals the advantage, relative to the rectified linear transfer function13, of the sigmoidal-shaped fI curves typical of biological neurons14,15, and it reproduces the reduction in neuronal input gain16,17 and learning rate18 commonly observed during development. In addition, it predicts that the learning rate of granule cells should decrease as they become more selective, and thus exhibit lower lifetime sparseness19,20, something that is possible (although difficult) to test experimentally. And finally, we extended our model to an odor–reward association task, and found that learning a concentration-invariant representation in piriform cortex supports rapid odor–reward association.

While our approach gives us a model that is reasonably consistent with mammalian olfactory circuitry, the architecture predicted by our approximate Bayesian algorithm does not perfectly match the architecture of the olfactory system. However, a plausible olfactory circuit based on our model, but with the addition of recurrent inhibition among piriform neurons21, still learns to perform reward-based learning quickly. These results suggest that even at the circuit level, approximate Bayesian optimization may underlie rapid biological learning. But at the same time, our study reveals its limitation when applied to a complicated system.

Results

Problem setting

Let us denote odor concentrations by a vector c = (c1, . . . , cM), where cj > 0 if odor j is present and cj = 0 otherwise. By odor, we mean something like the odor of apple or coffee, not a single odorant molecule. In a typical environment, odors are very sparse, in the sense that few of them have a significant presence (i.e. cj > 0 for a small number of j at any time; Fig. 1 left).

Fig. 1: Problem setting.
figure 1

An example odor stimulus, c (left), and the response at the glomeruli, x (right). The mixing weights (i.e., affinities), w (which are unknown to the animal) map odors, with concentration c, to OSN activity accumulated at the glomeruli, x. A goal of the animal is to infer the odor concentrations from the glomeruli activity.

In the olfactory system, odors are first detected by OSNs, and then transmitted to glomeruli as spiking activity22. Neural activity accumulated at a glomerulus, denoted xi for the ith glomerulus (and thus the ith OSN receptor type), is approximately

$${x}_{i}={\sum }_{j}{w}_{ij}{c}_{j}+n,$$
(1)

where n is the noise due to sensory variability and unreliable OSN-spiking activity, and the affinity, or the mixing weight, wij, determines how strongly odor j activates glomerulus i (Fig. 1 right). OSN activity shows a roughly logarithmic dependence on odor concentration23,24. Thus the amplitude, cj, of each odor reflects log-concentration, not concentration. Below a threshold, here taken to be zero, odors are considered undetectable.
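For concreteness, here is a minimal sketch of this generative model for a single trial; the sparsity level, noise scale, and weight statistics are illustrative stand-ins for the exact values given in the Methods section (Eqs. (6)–(8)).

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 100, 400            # odors and glomeruli, as in the simulations below
c_o, sigma_x = 0.03, 0.1   # presence probability and noise scale (assumed here)

w = rng.lognormal(-np.log(c_o * M), 1.0, size=(N, M))       # affinities (cf. Eq. 7)
c = np.zeros(M)
present = rng.random(M) < c_o                               # few odors present
c[present] = rng.gamma(3.0, 1.0 / 3.0, size=present.sum())  # unit-mean amplitudes

x = w @ c + sigma_x * rng.standard_normal(N)                # Eq. (1)
```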

Olfactory learning as Bayesian inference

The goal of the early olfactory system is to infer which odors are present and what their concentrations are, based on OSN activity, x. However, this is a difficult problem because the animal does not know the mixing weights, w, but instead has to learn them, without supervision. One common approach to this type of unsupervised learning is the sparse coding model. Its associated learning algorithm is, however, inefficient, and thus slow, as we will see below (see the subsection “Sparse coding” in the Methods section). We thus turn to Bayesian inference.

The Bayesian approach is efficient because it takes into account uncertainty in both the odors, c, and the weights, w, and it can naturally incorporate a prior that reflects the sparseness of the olfactory environment. The steps are straightforward: first write down, from Eq. (1), an expression for p(c∣x, w), the distribution over odor concentrations given glomeruli activity, x, and weights, w; then marginalize over the distribution of the weights given all the previous inputs, p(w∣ past observations of x) (see Methods section, Eq. (10)). However, exact marginalization is neither computationally tractable nor biologically plausible. We therefore employ a variational Bayesian approximation9, replacing the true joint probability distribution with a fully factorized one. The effect of making a variational approximation is illustrated in Fig. 2c: the true posterior over a pair of odors is typically slightly anti-correlated (Fig. 2c, left), while the variational distribution is independent (Fig. 2c, right). Because the anti-correlation is typically weak, the variational distribution captures the true distribution well.

Fig. 2: Bayesian inference of odors and weights.
figure 2

a Inference of odor concentration. Combining the likelihood q(x∣c) (left) and the prior pc(c) (middle), the posterior distribution q(c∣x) is obtained (right). The orange dashed line is the mean concentration associated with the likelihood, q(x∣c); the black dashed line is the mean associated with the posterior, q(c∣x). Because the prior strongly favors the absence of odors, the latter is shifted to lower concentration. b Illustration of the weight update given the same sensory evidence Δqt(w, x) when the previously estimated probability distribution over the weights, qt−1(w), is broad (left), and narrow (right). Note that the mean of qt−1(w) is the same in both panels. c Illustration of the variational approximation. The true posterior over the joint distribution of odors c1 and c2, p(c1, c2∣x) (left), is approximated by a factorized distribution q(c1∣x)q(c2∣x) (right). The black cross indicates the true concentrations, and colored lines are contours of equal probability.

The derivation of the algorithm for variational inference is described in detail in the Methods section; here we simply give the results. The variational probability distribution of the concentration of odor j is updated iteratively as (see Methods section, Eq. (14b))

$$q({c}_{j}| {\bf{x}})\propto q({\bf{x}}| {c}_{j}){p}_{{\mathrm{{c}}}}({c}_{j})$$
(2)

where q(x∣cj) is the variational likelihood of the concentration of the jth odor, cj, given x, and pc(cj) is the prior distribution over cj. We take the noise, n, in Eq. (1) to be Gaussian, so q(x∣cj) is Gaussian (Fig. 2a, left). And to reflect the sparsity, pc(cj) is taken to be a point mass at zero combined with a continuous piece at positive concentration (Fig. 2a, middle). Because the prior strongly favors the absence of odors, the estimated mean concentration, 〈c〉q(c∣x) (dashed black line in Fig. 2a, right), is typically smaller than the mean over the likelihood function, 〈c〉q(x∣c) (dashed orange line in Fig. 2a, right).
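The sketch below evaluates Eq. (2) on a grid for a single odor, combining a Gaussian likelihood (with an illustrative mean and precision) with the spike-and-slab prior of Eq. (8) in Methods; it reproduces the shrinkage toward zero shown in Fig. 2a.

```python
import numpy as np
from math import gamma

c_o, alpha = 0.03, 3.0       # prior sparsity and slab shape (Methods, Eq. 8)
mu, lam = 0.8, 4.0           # mean and precision of q(x|c) (illustrative)

c = np.linspace(1e-6, 4.0, 4000)
dc = c[1] - c[0]
lik = np.exp(-0.5 * lam * (c - mu)**2)                        # Gaussian likelihood
slab = c_o * alpha**alpha / gamma(alpha) * c**(alpha - 1) * np.exp(-alpha * c)
spike = (1 - c_o) * np.exp(-0.5 * lam * mu**2)                # point mass at c = 0
Z = spike + (lik * slab).sum() * dc
c_mean = (c * lik * slab).sum() * dc / Z                      # posterior mean
print(f"likelihood mean {mu}, posterior mean {c_mean:.3f}")   # shrunk toward zero
```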

Similarly, the update rule for the variational probability distribution of a weight is given by (see Methods section, Eq. (14a))

$${q}_{t}({w}_{ij})\propto \Delta {q}_{t}({w}_{ij},{\bf{x}}){q}_{t-1}({w}_{ij}),$$
(3)

where Δqt(wij, x) is the evidence provided by the new information, carried in x, at trial t (Fig. 2b) and qt(wij) is the variational probability distribution of the weight, wij, given observations up to trial t (we suppress the time dependence to reduce clutter). Importantly, depending on the uncertainty in the weights, the same stimulus causes different amounts of plasticity. In particular, the higher the uncertainty in the estimated weight, wij, at t−1, the larger the change in the mean weight, Δw (left vs. right in Fig. 2b).
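Since both factors in Eq. (3) are Gaussian in the weight, the update is just a product of Gaussians; the toy numbers below reproduce the effect in Fig. 2b: identical evidence moves a broad prior much further than a narrow one.

```python
import numpy as np

def gaussian_update(prior_mean, prior_var, evidence_mean, evidence_var):
    """Combine a Gaussian prior with Gaussian evidence (product of Gaussians)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / evidence_var)
    post_mean = post_var * (prior_mean / prior_var + evidence_mean / evidence_var)
    return post_mean, post_var

# Same evidence, different prior uncertainty (cf. Fig. 2b; numbers illustrative).
evidence = (1.0, 0.5)                  # mean and variance of Delta-q_t(w, x)
for prior_var in (2.0, 0.1):           # broad vs. narrow q_{t-1}(w)
    m, v = gaussian_update(0.0, prior_var, *evidence)
    print(f"prior var {prior_var:>4}: posterior mean shift {m:.3f}, var {v:.3f}")
```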

The update rules given in Eqs. (2) and (3) can be mapped onto neural dynamics and synaptic plasticity that closely mirror the circuitry of the mammalian olfactory bulb (Fig. 3a and b). The firing rate dynamics obeys

$${\tau }_{{\mathrm{{r}}}}\frac{{\mathrm{{d}}}{m}_{i}}{{\mathrm{{d}}}\tau }=-{m}_{i}-\sum_{j = 1}^{M}{w}_{ij}^{{\rm{L}}}{\overline{c}}_{j}+{x}_{i}$$
(4a)
$${\tau }_{{\mathrm{{r}}}}\frac{{\mathrm{{d}}}{\overline{c}}_{j}}{{\mathrm{{d}}}\tau }=-{\overline{c}}_{j}+{F}_{j}\left(\sum_{i = 1}^{N}{w}_{ji}^{{\rm{F}}}{m}_{i}\right)$$
(4b)

where τ denotes time within an odor presentation (not to be confused with t, which refers to trial), mi is the firing rate of the ith M/T (mitral/tufted) cell relative to baseline, and \({\overline{c}}_{j}\) is the firing rate of the jth granule cell. The ith M/T cell is linearly modulated by excitatory input from glomerulus i, via xi, and also by inhibitory input from granule cells, the \({\overline{c}}_{j}\). The granule cells, whose activity corresponds to the expected concentration of the odors, are driven by excitatory input from M/T cells, mediated by a nonlinear transfer function Fj. As we discuss below, this nonlinearity plays a critical role in rapid learning.
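A minimal Euler integration of Eq. (4) is sketched below. The transfer function is a smoothed threshold-linear stand-in for the Bayes-optimal Fj of Eq. (32) in Methods, and the weights, input, and time constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 50, 400                    # odors and glomeruli, as in Fig. 3c
tau_r, dt = 0.05, 0.001           # time constant and Euler step (illustrative)

def F(y):
    # Smoothed threshold-linear stand-in for the Bayes-optimal F of Eq. (32):
    # near zero below threshold, approximately linear above it.
    return np.log1p(np.exp(10.0 * (y - 0.5))) / 10.0

wF = rng.lognormal(-np.log(0.03 * M), 1.0, size=(M, N)) / N   # M/T -> granule
wL = wF.T.copy()                  # granule -> M/T (ideally reciprocal)
x = rng.random(N)                 # glomerular input for one presentation

m, cbar = np.zeros(N), np.zeros(M)
for _ in range(int(0.5 / dt)):    # ~500 ms, roughly one sniff cycle
    dm = -m - wL @ cbar + x                   # Eq. (4a)
    dcbar = -cbar + F(wF @ m)                 # Eq. (4b)
    m, cbar = m + dt / tau_r * dm, cbar + dt / tau_r * dcbar
```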

Fig. 3: Neural implementation of Bayesian learning.
figure 3

a Schematic of the neural architecture. Dotted box represents the internal variables of the brain; the odor, c, comes from the outside world. b The neural implementation of our Bayesian learning model maps almost perfectly onto the circuitry of the olfactory bulb. Dotted circles are glomeruli, green triangles are M/T cells, and blue circles represent olfactory granule cells. Red and blue arrows indicate weights from granule to M/T and M/T to granule cells, respectively. c An example of firing rate dynamics before (left) and after (right) learning (M = 50 odors, N = 400 glomeruli, four odors presented). Different colors represent different neurons. Dotted horizontal lines in the bottom figures represent the true concentrations of the presented odors. d Change in the variance of M/T cell activity during learning (t: trial). The expectation was taken over both population and trials. e Receiver operating characteristic (ROC) curves under different numbers of simultaneously presented odors (M = 100 odors, N = 400 glomeruli). See subsection “ROC curve” in the Methods section for details. f Performance under learning from various odor exposure durations (see subsection “Performance evaluation” in the Methods section), where M = 100, N = 400, and three odors are presented simultaneously, on average. The lines and their error bars are mean and standard deviation over 10 simulations.

The weights in Eq. (4), \({w}_{ij}^{{\rm{F}}}\) and \({w}_{ij}^{{\rm{L}}}\), correspond to M/T-to-granule and granule-to-M/T synapses, respectively (blue and red arrows in Fig. 3b). These synapses jointly form a dendro-dendritic connection between M/T and granule cells25. To keep track of the variational probability distribution qt(wij), both the mean and the variance of each weight need to be updated. The update of the mean is

$${w}_{ji}^{{\mathrm{{F}}},t}=(1-{\delta }_{j}^{w,t}){w}_{ji}^{{\mathrm{{F}}},t-1}+\frac{1/t}{{\rho }_{j}^{t}{\sigma }_{x}^{2}}{\overline{c}}_{j}{m}_{i}$$
(5a)
$${w}_{ij}^{{\mathrm{{L}}},t}=(1-{\delta }_{j}^{w,t}){w}_{ij}^{{\mathrm{{L}}},t-1}+\frac{1/t}{{\rho }_{j}^{t}{\sigma }_{x}^{2}}{m}_{i}{\overline{c}}_{j}$$
(5b)

where mi and \({\overline{c}}_{j}\) are evaluated at the end of the odor presentation. Here \({\delta }_{j}^{w,t}\) is the discount factor and \({\rho }_{j}^{t}\) represents the precision (the inverse of the variance) of the synaptic weights \({w}_{ji}^{{\mathrm{{F}}},t}\) and \({w}_{ij}^{{\mathrm{{L}}},t}\) (see subsection “Synaptic plasticity” in the Methods section for details). This rule is Hebbian, as the update depends on the product of presynaptic and postsynaptic activity mi and \({\overline{c}}_{j}\). It is also adaptive, as the update depends on the precision, \({\rho }_{j}^{t}\): because of the \(1/{\rho }_{j}^{t}\) dependence, low precision (and thus high uncertainty) produces large weight changes while high precision (and thus low uncertainty) produces small weight changes. This is illustrated in Fig. 2b. The precision, \({\rho }_{j}^{t}\), is also updated in an activity-dependent manner (see the Methods section, Eq. (35)). Figure 3c describes typical neural dynamics before and after learning. Before learning, when a mix of four odors is presented, M/T activity quickly converges to constant values with a relatively broad range (Fig. 3c, top-left), and granule cell activity is small and homogeneous (Fig. 3c, bottom-left). After learning, M/T cells exhibit transient activity, followed by convergence to a somewhat smaller range than before learning (Fig. 3c, top-right), as the large input-driven activity is partially canceled by the feedback from the granule cells. Granule cells, on the other hand, show very selective responses, with activity levels roughly matching the concentration of the corresponding odors (Fig. 3c, bottom-right).
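To make the rule concrete, here is a sketch of one plasticity step that combines Eq. (5) with the precision update (Eq. (27a) in Methods); the discount factor \({\delta }_{j}^{w,t}\) is written out explicitly from Eq. (27b), the noise variance is an illustrative default, and \(\langle {c}_{j}^{2}\rangle\) would be supplied by Eq. (33).

```python
import numpy as np

def plasticity_step(wF, wL, rho, m, cbar, c2, t, sigma_x2=0.01):
    """One trial of Eqs. (5)/(27). m: M/T rates (N,); cbar: granule rates,
    i.e. <c_j> (M,); c2: second moments <c_j^2> (M,); rho: precisions (M,)."""
    rho_new = (1 - 1 / t) * rho + (1 / t) * c2 / sigma_x2          # Eq. (27a)
    lr = (1 / t) / (rho_new * sigma_x2)                            # adaptive rate
    keep = ((1 - 1 / t) * rho + (1 / t) * cbar**2 / sigma_x2) / rho_new  # 1 - delta
    wF_new = keep[:, None] * wF + lr[:, None] * np.outer(cbar, m)  # Eq. (5a)
    wL_new = keep[None, :] * wL + lr[None, :] * np.outer(m, cbar)  # Eq. (5b)
    return wF_new, wL_new, rho_new
```

Because the learning rate scales as \(1/(t{\rho }_{j}^{t})\), granule cells with uncertain weights (low precision) change their weights the most, exactly the behavior illustrated in Fig. 2b.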

The activity profiles of cells in our model have many similarities with experimental observations. For instance, as observed in experiments26, M/T cells show both positive and negative responses relative to baseline (Fig. 3c top, here the baseline is 5), and their responses become more transient after learning (Fig. 3c, top-right, and Fig. 3d). Moreover, the response range of M/T cells becomes smaller as the animal learns the odors (Fig. 3d), as observed experimentally27. In addition, after learning, granule cell activity is strongly modulated by odor concentration (Fig. 3c bottom-right; dotted horizontal lines represent the true concentrations of the corresponding odors), as observed experimentally28.

After learning, the circuit can robustly detect odors with very few false positives, even when several odors are presented simultaneously (Fig. 3e). Moreover, the learning performance was robust with respect to odor presentation time: even if the odors were presented for only a few hundred milliseconds, which corresponds roughly to one sniff cycle29,30, performance remained high (Fig. 3f). Learning was also robust to changes in the prior: a large increase in the range of possible odor concentrations had very little effect on learning performance (Supplementary Fig. 1).

The Bayesian approach is optimal if implemented exactly, but in the approximate model used here, learning is necessarily suboptimal. To determine how suboptimal, we would need to compare against exact inference. However, that is not feasible because exact inference is intractable. Our model does, however, do better than the sparse coding model (Fig. 4): it learns much faster (Fig. 4a), and it achieves high performance without fine tuning, whereas the learning rate of the sparse coding model must be fine-tuned (gray lines in Fig. 4a). This advantage was replicated when we assessed the performance by the error in the weights (Fig. 4b). Despite faster learning, the asymptotic performance of the Bayesian model is similar to that of sparse coding when there are a relatively small number of odor sources in the environment, and much better when there are many sources, although the performance of both models deteriorates in that regime (Fig. 4c).

Fig. 4: Performance comparison.
figure 4

a Learning curves for our model (orange) and sparse coding (light gray to black). M = 100 odors, N = 400 glomeruli, and on average, three odors were presented at each trial. See subsection “Performance evaluation” in the Methods section for details. The learning rates of the sparse coding model, ηw, were 0.3, 0.5, and 1.0 from light gray to black. b Same as a, but performance was evaluated by the error in the weights. c Performance (after learning from 4000 trials) of the proposed Bayesian model (orange) and the sparse coding model (gray) versus the number of odors. Shaded regions represent standard deviation over 10 simulations. As in panels a and b, N = 400 glomeruli and three odors were presented on average. Here, ηw was fixed at 0.5.

These results indicate that a variational approximation of Bayesian learning and inference enables data-efficient learning, and does so using biologically plausible learning rules and neural dynamics. How does our model manage to perform fast and robust learning? And is there evidence that the brain uses this strategy? Below, we show that our proposed circuit performs well because it exploits the sparseness of the odors and utilizes the uncertainty in both the weights and the odor concentrations. We then discuss the relationship of our model to experimental observations.

The sparse prior leads to a nonlinear transfer function

An important feature of olfaction, like many real world inference problems, is that the distribution over odors has a mix of discrete and continuous components: an odor may or may not be present (the discrete part), and if it is present its concentration can take on a range of values (the continuous part). In our model, we formalize this with a spike and slab prior (Fig. 2a middle): the spike is the delta function at zero; the slab is the continuous part. In this model, sparseness is ensured by setting the cumulative probability of the slab, denoted co, close to zero.

To see how the prior affects the dynamics, note that the granule cells (\({\overline{c}}_{j}\) in Eq. (4)) represent the expected concentration of the odors, and so take the prior into account. Thus, after learning, most of them have near zero activity, with only a few of them active (Fig. 3c, bottom right panel). To achieve sparsity, the granule cells need a great deal of evidence to report non-negligible concentrations. That is reflected in the transfer functions of the granule cells (the function Fj in Eq. (4b); see orange curve in Fig. 5a). The function exhibits near zero response (corresponding to near zero concentration) for small input, followed by a sharp rise and then an approximately linear response for large input.

Fig. 5: Adaptive transfer functions.
figure 5

a The shapes of the transfer functions of granule cells under different priors on the odor distribution. See subsection “Models with various priors on odor concentration” in the Methods section for the details. b Weight errors under different priors. Shaded regions represent standard deviation over 10 simulations. c The average transfer function \(F[y,\overline{c}=0]\) at the beginning (light gray), middle (gray), and the end (black) of the learning. The x-axis represents the input current y. d The weight error under fixed input gain, compared to the control model with adaptive gain, averaged over 50 simulations. For the gray line, the transfer function was set to the top curve in panel c; for the black line it was set to the bottom curve. In all panels, M = 100 odors, N = 400 glomeruli, and three odors were presented on average.

If we derive update rules using a different prior, the transfer function changes. If we then perform inference and learning using a transfer function derived under the wrong prior, while drawing odors from the true prior, performance is, not surprisingly, sub-optimal (see subsection “Models with various priors on odor concentration” in the Methods section). For example, if we constrain the odors only to be non-negative, the transfer functions are approximately rectified linear, a commonly used nonlinearity in artificial neural networks13 (gray line in Fig. 5a). However, this model failed to learn the input structure generated from the spike-and-slab prior, as it does not take the sparseness of the odor concentrations into account (gray line in Fig. 5b). If we constrain the odors to be non-negative, but also ensure that they are not too large, by introducing an exponential decay10, learning improves initially, but the weight error eventually increases (black lines in Fig. 5a and b). These results suggest that the classic input–output function—sigmoidal at small input and linear at large input—found both in vitro14,31 and in biophysically realistic models of neurons15, reflects the fact that the world is truly sparse—something not captured by classical sparse coding models. These gain functions thus offer a normative explanation for the biophysical responses of typical olfactory neurons to input. The shape of the activation function for the precision update also depends on the choice of prior, but in all cases it closely resembles the squared transfer function, F2 (Supplementary Fig. 2).

As the animal learns a better approximation to the true weights, the olfactory system can extract more information from the OSN activity; this results in a change in the transfer function. In particular, the transfer function exhibits a decrease in gain with learning (mainly a shift to the right), as shown in Fig. 5c (see subsection “The variational weight distribution” in the Methods section for details). Such a decrease in gain is widely observed among diverse neurons during development14,16. It is also consistent with the reduction of input resistance observed in adult-born granule cells during development17,18, as low resistance causes low excitability. When the transfer functions were held fixed during learning, performance deteriorated gradually (gray and black curves vs. orange line in Fig. 5d), though the benefit of the adaptive gain was rather small in our model setting.

Weight uncertainty leads to adaptive synaptic plasticity

A key aspect of our model is that it explicitly takes the uncertainty of the weights into account. This leads to an adaptive learning rate (see Eq. (5)). In particular, the learning rate is the product of two terms: \((1/t)\times 1/{\rho }_{j}^{t}\). The first term, 1/t, is a global decay, and reflects an accumulation of information over time: at the beginning of learning, each olfactory stimulus contains a relatively large amount of information about the weights, and so the learning rate is large; later in learning the reverse is true. The second term, \(1/{\rho }_{j}^{t}\), is the cell-specific contribution to the learning rate. In steady state, it is given approximately by \(1/{\rho }_{j}^{t}\propto 1/{\langle {c}_{j}^{2}\rangle }_{{\rm{odors}}}\) (the subscript “odors” indicates an average over odors).

It turns out that the second term is related to the lifetime sparseness, \({S}_{j}\equiv {\langle {c}_{j}\rangle }_{{\rm{odors}}}^{2}/{\langle {c}_{j}^{2}\rangle }_{{\rm{odors}}}\) (note that smaller Sj means activity is more sparse; see subsection “Lifetime sparseness” in the Methods section and ref. 19). If the mean firing rate, \({\langle {c}_{j}\rangle }_{{\rm{odors}}}\), is approximately constant (as it is in our simulations), then \(1/{\rho }_{j}^{t}\propto {S}_{j}\). When the granule cells have broad, non-selective tuning, the lifetime sparseness is large, and the learning rate is high; when the cells are sparse and have highly selective tuning, the lifetime sparseness is low, and so is the learning rate. Thus, if the mean granule cell responses are similar for all presented odors, the learning rate is large, encouraging neurons to modify their selectivity. If, on the other hand, the granule cell responses are sparse and selective, the learning rate is low, helping the neurons stabilize their acquired selectivity.
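The relation is easy to see numerically: below, lifetime sparseness is computed for a hypothetical broadly tuned cell and a hypothetical selective cell, with the learning rate taken proportional to Sj as derived above; the response distributions are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def lifetime_sparseness(r):
    """S = <r>^2 / <r^2> over odor presentations (smaller = more sparse)."""
    return r.mean() ** 2 / (r ** 2).mean()

broad = rng.gamma(2.0, 0.5, size=10000)              # responds to everything
selective = np.where(rng.random(10000) < 0.05,       # rare, strong responses
                     rng.gamma(2.0, 5.0, size=10000), 0.0)
for name, r in [("broad", broad), ("selective", selective)]:
    print(f"{name:>9}: S = {lifetime_sparseness(r):.3f} -> learning rate ~ S")
```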

We examined the effects of the two factors—1/t and \(1/{\rho }_{j}^{t}\)—on learning. When the learning rate, \(1/t{\rho }_{j}^{t}\), was kept constant throughout learning, learning was slower, even when the learning rate was finely tuned (gray lines vs. orange line in Fig. 6a). This makes sense from a Bayesian perspective: early on, when weight uncertainty is large, learning should be fast (the dark gray line, which has the highest learning rate, drops rapidly), whereas after a large number of trials, when weight uncertainty is low, learning should be slow (the lighter gray lines, which have lower learning rates, have better asymptotic performance). It is also consistent with the fine tuning required for the sparse coding model in Fig. 4a and b. When we fixed 1/ρj but included the global factor 1/t, performance was better than the model with a fixed learning rate (light-green vs. gray in Fig. 6a), yet still worse than the original fully adaptive model (light-green vs. orange in Fig. 6a). This was clearer under a less sparse setting (co = 0.07 in Fig. 6b, versus co = 0.03 in Fig. 6a). Furthermore, as predicted, we found that the learning rate of a cell, \(1/t{\rho }_{j}^{t}\), is positively correlated with the lifetime sparseness at each time point (i.e. at fixed t), as shown in Fig. 6c and d. This correlation becomes weaker as the prior becomes more sparse (compare Fig. 6c and d, for which co = 0.03 and 0.07, respectively). That is because a very sparse prior (low co) helps the granule cells to be highly selective at an early stage, enabling the lifetime sparseness to quickly converge to a small value (vertical cluster on the left edge of Fig. 6c and d). These results indicate that the global and postsynaptic-neuron-specific adaptation of the learning rate cooperatively support fast learning.

Fig. 6: Adaptive synaptic plasticity.
figure 6

a Weight error when \(1/t{\rho }_{j}^{t}\) is fixed (gray lines), \({\rho }_{j}^{t}\) is fixed (light green), and fully adaptive (orange). For the gray lines we used learning rates of 0.01, 0.1, and 1.0, corresponding to light gray through dark gray. The sparsity, co, was 0.03. b Same as panel a, but with a lower sparsity, co = 0.07. c, d Correlations between the lifetime sparseness and the learning rate, after 300 stimuli were presented to the network, under more sparse (c: co = 0.03) and less sparse (d: co = 0.07) conditions. Lines are linear regressions, and each dot represents one granule cell. Correlations were significant for both c and d (p ≪ 10−6). Vertical clusters appearing on the left edges of the panels correspond to neurons with very small lifetime sparseness. In all panels, M = 100 odors, N = 400 glomeruli, and 3 (a, c) or 7 (b, d) odors were presented on average. Light-green and orange lines in a and b are means over 50 simulations, while the rest were calculated from 10 simulations.

Learning concentration invariant representation and valence

Our results so far indicate that olfactory learning is well characterized as an approximate Bayesian learning process. Our circuit estimates odor concentration, which is important for locating an odor source32. However, the perceived concentration depends on factors such as the distance from the odor source, its size, and the wind speed, so odor concentration is not a reliable indicator of the expected reward. Acquisition of a concentration-invariant representation is therefore highly useful for many olfactory-guided behaviors.

A concentration-invariant representation is essentially a representation of the probability that an odor is present, denoted \({\overline{p}}_{j}\). Because of the spike in our prior, \({\overline{p}}_{j}=\Pr [{c}_{j}> 0]\), and this probability is easily decoded from M/T cells using the circuit depicted in Fig. 7a (see subsection “Learning of concentration-invariant representation” in the Methods section). Here, \({\overline{p}}_{j}\) could be represented by neurons in layer 2 of piriform cortex, as that is the main downstream target of M/T cells, and the odor representation in piriform cortex is approximately concentration-invariant21,33. As the granule cells acquire an odor representation, neurons in piriform cortex acquire an odor probability representation (cyan and dark blue lines in Fig. 7e, left).

Fig. 7: Learning a concentration invariant representation and an odor-reward association.
figure 7

a–d A set of increasingly realistic decoding models. a The decoding model associated with our variational Bayesian inference algorithm. Note that the weights need to be copied from wF to wp, something that is not biologically plausible. b Similar circuit, but with the mapping from m to \(\overline{p}\) learned via a local rule. c Same as b, but with lateral inhibition. d Same as c, but with feedback to the granule cells. e Learning performance for the models in a–d when decoding from granule cells (cyan) or piriform cortex (dark blue; see subsection “Odor estimation performance” in the Methods section). f Comparison of performance for model c (gray) and d (orange). Mean and standard deviation over 10 simulations are plotted. g Mean and standard deviation of responses of the granule cells, \(\bar{c}\), and the piriform neurons, \(\bar{p}\), to their selective odors presented at various concentrations. The responses were measured by presenting each odor in isolation at different concentrations, and then averaging over populations. h Schematic of the reward prediction circuit utilizing the concentration-invariant representation in the piriform cells, \(\bar{p}\). i Direct reward prediction from neural activity at the glomeruli. j Performance of odor–reward association measured by the classification performance (left) and the mean-squared error between the predicted reward and the actual reward (right) for the models in panels h (magenta) and i (purple). Lines are means over 100 simulations. k The mean response of neuron ep given an odor associated with the reward. The vertical line at τ = 2.5 s represents the reward presentation, and the dotted horizontal line is the sign-flipped reward value (−R). Different colors represent the different concentrations of the presented odor, from purple (c ≈ 0.1) to yellow (c ≈ 2.0). In all panels, M = 50 odors, N = 200 glomeruli, and three odors were presented on average, except for the go/no go task, where one of two selected odors was presented randomly.

While the circuit shown in Fig. 7a exhibits good performance, it is inconsistent with the mammalian olfactory system in two ways. First, the weights from the M/T cells to the granule cells have to be copied to the corresponding M/T to piriform cortex connections (i.e. wp = wF), something that is not biologically plausible. Second, a direct projection from granule cells to piriform cortex is needed, but such a connection does not exist. These inconsistencies can be circumvented by modifying the circuit heuristically (Fig. 7b–d). Weight copying can be avoided by learning wp with local synaptic plasticity (Fig. 7b), although in the absence of the teaching signal from the granule cells, this naive extension does not work (dark blue line in Fig. 7e, middle-left). However, introducing lateral inhibition among the piriform neurons (Fig. 7c), as observed experimentally21, allows the piriform neurons to acquire an odor representation (Fig. 7c and e, middle-right), although the decoding performance was worse than that of the Bayesian model (Fig. 7e, left vs. middle-right). Finally, if connections from piriform cells to granule cells are added as well, the learning performance of the granule cells becomes slightly better (Fig. 7d and e, right), and more robust to changes in the strength of lateral inhibition (Fig. 7f). As expected, the responses of piriform neurons were mostly concentration-invariant (dark blue line in Fig. 7g), whereas granule cells showed a clear concentration dependence (cyan line in Fig. 7g). Thus, the architecture of the mammalian olfactory circuit indeed supports robust learning of a concentration-invariant representation.

Once the circuit acquires a concentration-invariant representation, a circuit that performs odor–reward association can be constructed simply by taking the circuit depicted in Fig. 7d and adding a region that receives input from both piriform neurons and the reward system (ep in Fig. 7h). The olfactory tubercle could be the site of this odor–reward association5,34, but it could be other regions, such as layer 3 of piriform cortex, as well. To test the performance of this circuit, we implemented a go/no go task in which one odor is associated with a reward (R = 1.0), while another odor is associated with no reward (R = 0.0), regardless of concentration. We simulated this task by randomly presenting the rewarded or unrewarded stimulus with equal probability (see subsection “Go/no go task” in the Methods section). We used the circuit pre-trained with a large number of odors but without reward. When the reward prediction was learned through the projection from piriform cells, \(\overline{p}\), to olfactory tubercle cells, ep (Fig. 7h), classification performance reached 90% after just six trials (Fig. 7j; magenta lines). On the other hand, when the circuit learned the task directly from the glomeruli (Fig. 7i), it still learned to predict the reward, as suggested previously35, but learning was much slower and performance remained worse even after a large amount of training (Fig. 7j; purple lines). After a dozen odor–reward pairings, the input from piriform neurons, \(\overline{p}\), drove the olfactory tubercle cells, ep, to represent the reward prediction given olfactory stimuli, unless the concentration was very small (left half of Fig. 7k; in our model, ep is the reward prediction); once the reward was presented at τ = 2.5 s, the activity went back to near zero (right half of Fig. 7k; in our model, positive ep represents an error, and so drives learning).
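A sketch of the final association step is given below as a trial-level delta rule from \(\overline{p}\) to ep; the piriform responses are idealized as already concentration-invariant, and the rule's exact form, within-trial time course, and parameters are simplifications of the Methods (subsection “Go/no go task”).

```python
import numpy as np

rng = np.random.default_rng(0)
M, eta = 50, 0.5
v = np.zeros(M)                         # pbar -> e_p weights (hypothetical)

def pbar_response(odor):
    """Idealized piriform response: ~1 for the presented odor, regardless of
    concentration (Fig. 7g), plus small background activity."""
    p = 0.05 * rng.random(M)
    p[odor] = 1.0
    return p

for trial in range(20):
    odor = rng.integers(2)              # odor 0 rewarded, odor 1 not
    R = 1.0 if odor == 0 else 0.0
    p = pbar_response(odor)
    error = R - v @ p                   # prediction error, cf. e_p in Fig. 7h
    v += eta * error * p                # delta rule on the readout

print(f"prediction for rewarded odor: {v @ pbar_response(0):.2f}, "
      f"unrewarded: {v @ pbar_response(1):.2f}")
```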

These results indicate that unsupervised learning of odor representations may underlie fast reward-based learning, and the proposed Bayesian learning mechanism improves reward association by enabling robust odor representations in a data-efficient way.

Discussion

We formulated unsupervised olfactory learning in the mammalian olfactory system as a Bayesian optimization problem, then derived a set of local synaptic plasticity rules and neural dynamics that implemented Bayesian inference (Figs. 2 and 3). Our theory provides a normative explanation of the functional roles for the nonlinear transfer function and the developmental adaptation of the neuronal input gain (Fig. 5), both widely observed among sensory neurons. The model also predicts that the learning rate of dendro-dendritic connections should be approximately linear in the lifetime sparseness of the corresponding granule cells (Fig. 6). Finally, we extended the framework to learning of odor identity by piriform cortex, and showed that such learning supports rapid reward association (Fig. 7).

Our results suggest that adaptation of both input gain (Fig. 5) and learning rate (Fig. 6) are important for successful learning. The developmental reduction in input gain can be explained by a decrease in neural excitability, which is partially caused by the increased expression of K+ channels14. Correspondingly, it is known that changes in channel expression at the dendrite modulate the sensitivity of synaptic plasticity36. In particular, it has been reported that elimination of voltage-gated K+ channels enhances the induction of long-term potentiation37. These results suggest that developmental up-regulation of K+ channel expression at the soma and the dendrite may underlie the adaptation of the input gain and learning rate.

The cellular plasticity rules we derived explain multiple developmental changes in adult-born granule cells. Experimentally, relative to young cells, mature granule cells have sparser selectivity20, lower membrane resistance17,18, and are less plastic18, as predicted by our model. In addition, our results provide insight into the functional role of adult neurogenesis. As shown previously8, if each synapse keeps track of its uncertainty, by removing the most uncertain synapses while adding synapses at random positions on the dendritic tree, a neuron can achieve sample-based Bayesian learning, making neurogenesis unnecessary. However, in our unsupervised learning framework, uncertainty is defined at the level of neurons, not synapses. As a result, from a Bayesian perspective, there is no good way to perform synaptogenesis. Thus, the brain should instead remove the most uncertain neurons, while at the same time randomly adding new ones.

The importance of the feedback circuit between M/T cells and granule cells has been noted previously6,38, but plasticity mechanisms that generate this circuit have not been considered. Recently, several groups proposed learning algorithms for unsupervised olfactory learning using stochastic gradient descent11,12,39, as in the case of our sparse coding model. However, as we have seen (Fig. 4), these algorithms are very unlikely to be fast. In addition to the sparse coding model, our problem setting is deeply related to independent component analysis (ICA)40. Indeed, by using sparseness as the measure of non-Gaussianity, unsupervised olfactory learning can be reformulated as an ICA problem11.

The spike-and-slab prior employed here is widely used in machine learning41, and has been applied to the sparse coding model of the early visual system42, and a normative analysis of nonlinear transfer functions has been carried out previously43. A contribution of this work is the establishment of a link between the spike-and-slab prior and nonlinear transfer function of a neuron.

Studies of adaptive learning rates date back many decades44,45; more recent studies have taken a Bayesian approach to adaptive learning in simplified single-neuron models7. In this study, we considered an unsupervised learning problem, and showed that the learning rate of excitatory feedforward connections should depend only on the postsynaptic activity, independent of the presynaptic activity. Moreover, our theory predicted a non-trivial relationship between the learning rate and the lifetime sparseness of the postsynaptic neuron (Fig. 6c and d).

Acceleration of reward-based learning by unsupervised learning (Fig. 7j) has been studied in the context of both semi-supervised learning and model-based reinforcement learning. In particular, the latter approach has been applied to rapid learning by animals, but these were limited to abstract models, not circuit-based implementations46. In the invertebrate literature, Bazhenov and colleagues (2013) studied the combination of unsupervised and reward-based learning in a computational model of the insect brain47, but plasticity was applied only to the output connections (corresponding in our model to \(\overline{p}\to {e}_{\mathrm{{{p}}}}\) in Fig. 7h). Interestingly, in the invertebrate brain, the connections corresponding to \(m\to \overline{p}\) are mostly random and fixed48, so the acceleration shown in Fig. 7j is potentially unique to vertebrates.

While our approach gave us a model that is reasonably consistent with mammalian olfactory circuitry, it is not perfect. In particular, the architecture predicted by our approximate Bayesian algorithm does not perfectly match the architecture of the olfactory bulb, piriform cortex, and olfactory tubercle. We were able to make small modifications to our circuit so that it did match the biology, and still gave decent performance, but performance was about 10% worse than that of the circuit predicted purely by Bayesian inference (blue lines in Fig. 7e, left vs. right). This discrepancy between the predicted and observed architectures highlights a limitation of this approach, especially when applied to complex systems. In particular, it is difficult to include biological constraints, both because we do not know exactly what they are, and because there is no straightforward way to marry those constraints with a normative Bayesian approach. Addressing this is an important avenue for future work.

Methods

Stimulus configuration

On each trial, the response of the ith glomerulus is modeled as

$${x}_{i}=\sum_{j}{w}_{ij}{c}_{j}+{\sigma }_{x}\xi_{i}$$
(6)

where cj is the concentration of odor j, and ξi is a zero mean, unit variance Gaussian random variable. The Gaussian assumption is justified because, although olfactory sensory neurons fire with approximately Poisson statistics, 1000–10,000 sensory neurons converge onto a single glomerulus22, where OSN activity is conveyed to M/T cells as stochastic currents. We take the affinities, or mixing weights, w, to be log-normal, followed by a normalization step

$${\mathrm{log}}\,{\widetilde{w}}_{ij} \sim {\mathcal{N}}(-{\mathrm{log}}\,({c}_{{\mathrm{{o}}}}M),1)$$
(7a)
$${w}_{ij}={\widetilde{w}}_{ij}\times \frac{\frac{1}{NM}\mathop{\sum }\nolimits_{i}^{N}\mathop{\sum }\nolimits_{j}^{M}{\widetilde{w}}_{ij}}{\frac{1}{M}\mathop{\sum }\nolimits_{j}^{M}{\widetilde{w}}_{ij}}$$
(7b)

where, recall, M is the number of odors and N is the number of glomeruli. The factor multiplying \({\widetilde{w}}_{ij}\) is 1 on average, so the normalization step does not have a huge effect on the weights. However, it forces ∑jwij to be strictly independent of i, which makes the learning process less noisy.
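As a sketch, the normalization of Eq. (7) can be written in a few lines; the assert verifies the property just described, that the row sum ∑jwij comes out the same for every glomerulus.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, c_o = 100, 400, 0.03

w_tilde = rng.lognormal(-np.log(c_o * M), 1.0, size=(N, M))   # Eq. (7a)
row_mean = w_tilde.mean(axis=1, keepdims=True)                # (1/M) sum_j w~_ij
w = w_tilde * (w_tilde.mean() / row_mean)                     # Eq. (7b)
assert np.allclose(w.sum(axis=1), w.sum(axis=1)[0])           # independent of i
```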

On each trial, odors cj (j = 1, 2, . . . , M) are generated from the spike-and-slab prior given as

$${p}_{{\mathrm{{c}}}}({c}_{j})=(1-{c}_{{\rm{o}}})\delta ({c}_{j})+{c}_{{\rm{o}}}\frac{{\alpha }^{\alpha }}{\Gamma (\alpha )}{c}_{j}^{\alpha -1}{{\mathrm{{e}}}}^{-\alpha {c}_{j}}\Theta ({c}_{j}),$$
(8)

where Θ(x) is the Heaviside step function. We used α = 3 everywhere except Supplementary Fig. 1, where we used α = 1. Under this prior, each odor is independently presented with probability co, and its amplitude follows a Gamma distribution with unit mean (Fig. 1, left). Note that the amplitude, cj, reflects log-concentration rather than concentration24. To avoid the null stimulus, we resampled the odors if all of the cj were 0 on any particular trial.
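A direct sampler for this prior, including the resampling of null trials, might look as follows (a Gamma distribution with shape α and rate α has unit mean):

```python
import numpy as np

rng = np.random.default_rng(0)
M, c_o, alpha = 100, 0.03, 3.0

def sample_odors():
    """Draw c from the spike-and-slab prior of Eq. (8), resampling if no
    odor is present (the null stimulus is excluded, as described above)."""
    while True:
        present = rng.random(M) < c_o
        if present.any():
            c = np.zeros(M)
            c[present] = rng.gamma(alpha, 1.0 / alpha, size=present.sum())
            return c
```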

Bayesian model

As discussed in the main text, we mainly focus on unsupervised learning, in which animals see only glomeruli activity and must make sense of it. This is essentially a clustering problem: if the same pattern of glomeruli activity occurs multiple times, the brain should recognize it as an odor. The activity patterns at the glomeruli are determined by the product of odorant concentrations in the inhaled air, and the affinities of the OSNs for those odorants. Thus, to recognize an odor, animals have to effectively learn the affinities of the OSNs for each odor, and store them in the olfactory circuitry. As we will see, in our model they are stored as weights between M/T cells and granule cells. Once those weights are stored, if an odor co-occurs with a reward (or punishment), the valence of that odor can be determined. And indeed, we find that unsupervised learning enables rapid learning of odor–reward associations.

More formally, the goal of the olfactory system is to infer the odor at time t, ct, given all past presentations of odors, x1:t ≡ {x1x2, . . . , xt}. Because the weights are not known, they must be integrated out

$$p({{\bf{c}}}_{t}| {{\bf{x}}}_{1:t})=\int\ {\mathrm{{d}}}{\bf{w}}\ p({{\bf{c}}}_{t},{\bf{w}}| {{\bf{x}}}_{1:t}).$$
(9)

Using Bayes’ theorem, this can be written in a more intuitive form

$$p({{\bf{c}}}_{t}| {{\bf{x}}}_{1:t})\propto \int\ {\mathrm{{d}}}{\bf{w}}\ p({{\bf{x}}}_{t}| {{\bf{c}}}_{t},{\bf{w}}){p}_{{\mathrm{{c}}}}({{\bf{c}}}_{t})p({\bf{w}}| {{\bf{x}}}_{1:t-1})$$
(10)

where, recall, pc(ct) is the prior over odors. To derive this expression, we used two facts: given ct and w, xt does not depend on past observations, and ct does not depend on past observations. The first term on the right-hand side, p(xt∣ct, w), is the likelihood given the weights; but because we do not know the weights, we have to marginalize over them given past observations. The marginalization step is intractable, as we have to introduce past odors and then integrate them out. This leaves us with an integral over w (Eq. (10)) that cannot be performed analytically. And even if it could, the circuit would have to memorize all past stimuli, x1, x2, . . . , xt−1. We thus have to perform approximate inference. For that we make a variational approximation.

Variational approximation

The integral in Eq. (10) becomes easier if the distributions factorize. We thus make the variational approximation

$$p({\bf{c}},{\bf{w}}| {{\bf{x}}}_{1:t-1},{\bf{x}})\approx {q}^{t}({\bf{w}},{\bf{c}})\equiv {\prod }_{ij}{q}_{ij}^{w,t}({w}_{ij})\times {\prod }_{j}{q}_{j}^{c}({c}_{j})$$
(11)

where, to avoid a proliferation of subscripts, we suppress the fact that c and \({q}_{j}^{c}\) are to be evaluated at trial t; in line with this, to simplify subsequent equations we replace xt with x; and, as is standard, we suppress the dependence of q on x1:t.

The variational distributions, \({q}_{ij}^{w,t}\) and \({q}_{j}^{c}\), are found by minimizing the KL-divergence with respect to the true distribution, with the KL-divergence given by

$${D}_{{\mathrm{{KL}}}}\left[{q}^{t}({\bf{w}},{\bf{c}})| | p({\bf{c}},{\bf{w}}| {{\bf{x}}}_{1:t-1},{\bf{x}})\right]=\int\ {\mathrm{{d}}}{\bf{c}}{\mathrm{{d}}}{\bf{w}}\ {q}^{t}({\bf{w}},{\bf{c}}){\mathrm{log}}\,\frac{{q}^{t}({\bf{w}},{\bf{c}})}{p({\bf{c}},{\bf{w}}| {{\bf{x}}}_{1:t-1},{\bf{x}})}\ .$$
(12)

As is straightforward to show9, minimizing this quantity leads to the update rules

$${\mathrm{log}}\,{q}_{ij}^{w,t}({w}_{ij}) \sim {\langle {\mathrm{log}}\,p({\bf{x}}| {\bf{c}},{\bf{w}})\rangle }_{\backslash {w}_{ij}}+{\langle {\mathrm{log}}\,p({\bf{w}}| {{\bf{x}}}_{1:t-1})\rangle }_{\backslash {w}_{ij}}$$
(13a)
$${\mathrm{log}}\,{q}_{j}^{c}({c}_{j}) \sim {\langle {\mathrm{log}}\,p({\bf{x}}| {\bf{c}},{\bf{w}})\rangle }_{\backslash {c}_{j}}+{\mathrm{log}}\,{p}_{c}({c}_{j})$$
(13b)

where  ~  indicates equality up to a constant, the subscript \wij indicates an average with respect to the variational distribution over all variables except wij, and, similarly, the subscript \cj indicates an average with respect to the variational distribution over all variables except cj. In the first equation, we approximate p(w∣x1:t−1) with the variational distribution at the previous time step, \({\prod }_{ij}{q}_{ij}^{w,t-1}({w}_{ij})\), which makes the marginalization self-consistent. This approximation breaks down early in the learning process; nevertheless, in practice it works quite well. Using this approximation, we arrive at

$${q}_{ij}^{w,t}({w}_{ij})\propto {q}_{ij}^{w,t-1}({w}_{ij})\exp \left[{\left\langle {\mathrm{log}}\,p({\bf{x}}| {\bf{c}},{\bf{w}})\right\rangle }_{\backslash {w}_{ij}}\right]$$
(14a)
$${q}_{j}^{c}({c}_{j})\propto {p}_{{\mathrm{{c}}}}({c}_{j})\exp \left[{\langle {\mathrm{log}}\,p({\bf{x}}| {\bf{c}},{\bf{w}})\rangle }_{\backslash {c}_{j}}\right]\ .$$
(14b)

In the next two subsections we derive explicit update rules by computing the averages in these expressions.

The variational odor distribution

To find the variational distribution over odors, we need to compute the average over \({\mathrm{log}}\,p({\bf{x}}| {\bf{c}},{\bf{w}})\) that appears on the right-hand side of Eq. (14b). Using the fact that x follows a Gaussian distribution, we have

$${\langle {\mathrm{log}}\,p({{\bf{x}}}_{t}| {\bf{c}},{\bf{w}})\rangle }_{\backslash {c}_{j}} \sim -\frac{1}{2{\sigma }_{x}^{2}}{\left\langle {\sum }_{i}{\left({x}_{i}^{t}-{\sum }_{m}{w}_{im}{c}_{m}\right)}^{2}\right\rangle }_{\backslash {c}_{j}}\\ \sim -\frac{{\sum }_{i}\langle {{w}_{ij}^{t}}^{2}\rangle }{2{\sigma }_{x}^{2}}{\left({c}_{j}-\frac{1}{{\sum }_{i}\langle {{w}_{ij}^{t}}^{2}\rangle }{\sum }_{i}\langle {w}_{ij}^{t}\rangle \left[{x}_{i}^{t}-{\sum }_{m\ne j}\langle {w}_{im}^{t}\rangle \langle {c}_{m}\rangle \right]\right)}^{2},$$
(15)

where the averages are with respect to the variational distribution. This is Gaussian, and it is straightforward to work out the mean and variance. Note that both depend on the first and second moments of the weights (which, as we will see below, determine the variational weight distribution) evaluated, importantly, at time t. However, synaptic plasticity is much slower than neural dynamics, so it is reasonable to update the weights on a slower timescale than concentration. Thus, when evaluating the mean and variance, we use the weight distribution on the previous time step. Using \({\mu }_{j}^{t}\) and \(1/{\lambda }_{j}^{t}\) to denote the mean and variance, and making this approximation, we have

$${\mu }_{j}^{t}\equiv \frac{1}{{\sum }_{i}\langle {{w}_{ij}^{t-1}}^{2}\rangle } \sum_{i}\langle {w}_{ij}^{t-1}\rangle \left[{m}_{i}^{t}+\langle {w}_{ij}^{t-1}\rangle \langle {c}_{j}\rangle \right]$$
(16a)
$${\lambda }_{j}^{t}\equiv \frac{1}{{\sigma }_{x}^{2}}\sum_{i}\langle {{w}_{ij}^{t-1}}^{2}\rangle$$
(16b)

where we made the definition

$${m}_{i}^{t}\equiv {x}_{i}^{t}-\sum_{j = 1}^{M}\langle {w}_{ij}^{t-1}\rangle \langle {c}_{j}\rangle \ .$$
(17)

The distribution \({q}_{j}^{c}({c}_{j})\) can now be written in a very compact form

$${q}_{j}^{{c}}({c}_{j})\propto {p}_{{\mathrm{{c}}}}({c}_{j})\ \exp \left[-\frac{{\lambda }_{j}^{t}}{2}{\left({c}_{j}-{\mu }_{j}^{t}\right)}^{2}\right].$$
(18)

As we will see below, to update the weights we just need the first and second moments of cj (see Eq. (27a)). And for reward-based learning, we need the probability that cj is positive. These quantities are straightforward, if tedious, to compute, and are given as follows.

For the first moment,

$$\langle {c}_{j}\rangle =\frac{1}{{Z}_{j}\sqrt{{\lambda }_{j}}}\left([2+{\alpha }_{j}^{2}]+{\alpha }_{j}[3+{\alpha }_{j}^{2}]\Psi ({\alpha }_{j})\right),$$
(19)

where the average is with respect to the distribution in Eq. (18), Zj is the normalization constant

$${Z}_{j}\equiv \frac{2(1-{c}_{{\rm{o}}})}{27{c}_{{\rm{o}}}}{\lambda }_{j}^{3/2}+{\alpha }_{j}+(1+{\alpha }_{j}^{2})\Psi ({\alpha }_{j}),$$
(20)

and αj and Ψ(αj) are defined by

$${\alpha }_{j}\equiv \sqrt{{\lambda }_{j}}{\mu }_{j}-\frac{3}{\sqrt{{\lambda }_{j}}}$$
(21a)
$$\Psi ({\alpha }_{j})\equiv \sqrt{2\pi }{e}^{{\alpha }_{j}^{2}/2}\Phi ({\alpha }_{j}),$$
(21b)

with Φ the cumulative normal function

$$\Phi (\alpha )\equiv \frac{1}{\sqrt{2\pi }}\int_{-\infty }^{\alpha }{{\mathrm{{e}}}}^{-{x}^{2}/2}{\mathrm{{d}}}x.$$
(22)

Similarly, the second moment is given by

$$\langle {c}_{j}^{2}\rangle =\frac{1}{{Z}_{j}{\lambda }_{j}}\left({\alpha }_{j}(5+{\alpha }_{j}^{2})+(3+6{\alpha }_{j}^{2}+{\alpha }_{j}^{4})\Psi ({\alpha }_{j})\right).$$
(23)

And finally, the probability that an odor is present is written

$$\Pr [{c}_{j}> 0]=\frac{1}{{Z}_{j}}\left({\alpha }_{j}+(1+{\alpha }_{j}^{2})\Psi ({\alpha }_{j})\right)\ .$$
(24)
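For reference, Eqs. (19)–(24) can be evaluated directly as below; μ and λ are the likelihood mean and precision from Eq. (16), and the numerical values passed in at the bottom are illustrative.

```python
from math import sqrt, pi, erf, exp

def posterior_stats(mu, lam, c_o=0.03):
    """<c>, <c^2>, and Pr[c > 0] under Eqs. (19)-(24) (alpha = 3 prior)."""
    a = sqrt(lam) * mu - 3.0 / sqrt(lam)                    # Eq. (21a)
    Phi = 0.5 * (1.0 + erf(a / sqrt(2.0)))                  # Eq. (22)
    Psi = sqrt(2.0 * pi) * exp(a * a / 2.0) * Phi           # Eq. (21b)
    Z = 2 * (1 - c_o) / (27 * c_o) * lam**1.5 + a + (1 + a * a) * Psi  # Eq. (20)
    c1 = ((2 + a * a) + a * (3 + a * a) * Psi) / (Z * sqrt(lam))       # Eq. (19)
    c2 = (a * (5 + a * a) + (3 + 6 * a * a + a**4) * Psi) / (Z * lam)  # Eq. (23)
    p_on = (a + (1 + a * a) * Psi) / Z                                 # Eq. (24)
    return c1, c2, p_on

print(posterior_stats(mu=1.0, lam=25.0))   # strong evidence: Pr[c > 0] near 1
print(posterior_stats(mu=0.1, lam=25.0))   # weak evidence: Pr[c > 0] near 0
```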

The variational weight distribution

To find the variational distribution over weights, we need to compute the average on the right-hand side of Eq. (14a). This is the same as Eq. (15), except that the average now excludes wij rather than cj,

$${\langle {\mathrm{log}}\,p({\bf{x}}| {\bf{c}},{\bf{w}})\rangle }_{\backslash {w}_{ij}} \sim -\frac{1}{2{\sigma }_{x}^{2}}{\left\langle {\left({x}_{i}-\sum_{m}{w}_{im}{c}_{m}\right)}^{2}\right\rangle }_{\backslash {w}_{ij}}\\ \sim -\frac{\langle {c}_{j}^{2}\rangle }{2{\sigma }_{x}^{2}}{\left({w}_{ij}-\frac{\langle {c}_{j}\rangle }{\langle {c}_{j}^{2}\rangle }\left[{x}_{i}-\sum_{m\ne j}\langle {w}_{im}^{t}\rangle \langle {c}_{m}\rangle \right]\right)}^{2}$$
(25)

where the averages are, as above, with respect to the variational distributions. This is a quadratic function of wij; thus, if we assume that \({q}_{ij}^{w,t-1}({w}_{ij})\) is Gaussian, then \({q}_{ij}^{w,t}({w}_{ij})\) is also Gaussian. Using \({\overline{w}}_{ij}^{t}\) and \(1/(t{\rho }_{j}^{t})\) to denote the mean and variance at time t, respectively (the latter to anticipate the 1/t falloff of the variance expected under Bayesian filtering), Eq. (14a) becomes

$$-\frac{t{\rho }_{j}^{t}}{2}{({w}_{ij}-{\overline{w}}_{ij}^{t})}^{2} \sim -\frac{(t-1){\rho }_{j}^{t-1}}{2}{\left({w}_{ij}-{\overline{w}}_{ij}^{t-1}\right)}^{2}-\frac{\langle {c}_{j}^{2}\rangle }{2{\sigma }_{x}^{2}}{\left({w}_{ij}-\frac{\langle {c}_{j}\rangle }{\langle {c}_{j}^{2}\rangle }\left[{x}_{i}-\sum_{m\ne j}{\overline{w}}_{im}^{t}\langle {c}_{m}\rangle \right]\right)}^{2}.$$
(26)

As in Eq. (15), \({\overline{w}}^{t}\) appears on the right-hand side of Eq. (26). However, solving this equation recursively for all the weights would require very fast synaptic plasticity. We thus approximate the right-hand side by using the previous timestep, t−1, rather than the current one, t; an approximation that should be good when the weights change slowly. Doing that, we arrive at the update rules

$${\rho }_{j}^{t}=(1-1/t){\rho }_{j}^{t-1}+\frac{1/t}{{\sigma }_{x}^{2}}\langle {c}_{j}^{2}\rangle$$
(27a)
$${\overline{w}}_{ij}^{t}=(1-1/t)\frac{{\rho }_{j}^{t-1}}{{\rho }_{j}^{t}}{\overline{w}}_{ij}^{t-1}+\frac{1/t}{{\rho }_{j}^{t}{\sigma }_{x}^{2}}\langle {c}_{j}\rangle \left({m}_{i}^{t}+{\overline{w}}_{ij}^{t-1}\langle {c}_{j}\rangle \right)$$
(27b)

where we used Eq. (17) to simplify the second expression. Note that the update rule for \({\overline{w}}_{ij}^{t}\) is local, as it depends only on variables indexed by i and j. The update rule for ρj is also local, and in fact depends only on variables indexed by j.

Finally, it is convenient to write the update rules for the mean and precision of the variational distribution over concentration, Eq. (16), in terms of \({\overline{w}}_{ij}\) and ρj,

$${\mu }_{j}^{t}\equiv \frac{1}{{\sigma }_{x}^{2}{\lambda }_{j}^{t}}\sum_{i}{\overline{w}}_{ij}^{t-1}\left[{m}_{i}^{t}+{\overline{w}}_{ij}^{t-1}\langle {c}_{j}^{t}\rangle \right]$$
(28a)
$${\lambda }_{j}^{t}\equiv \frac{1}{{\sigma }_{x}^{2}}\sum _{i}{\left({\overline{w}}_{ij}^{t-1}\right)}^{2}+\frac{N}{{\sigma }_{x}^{2}(t-1){\rho }_{j}^{t-1}}\ .$$
(28b)

As shown in Fig. 5c, the transfer function shifts to the right with learning. This seems counter-intuitive: because the weights become more certain with learning, it should take less input to the granule cells to produce activity; this suggests that the transfer functions should shift left, not right. However, an increase in certainty is not the only thing that changes with learning; the weights also become more diverse, capturing the diverse responses of glomeruli for each odor. The diversity increases the variance of the input to the granule cells, and so to ensure a sparse response with increasing diversity, the transfer functions need to shift to the right. In our model, increased diversity (the first term in Eq. (28b)) had a larger effect than increased certainty (the second term), resulting in a net rightward shift in the transfer functions.

Network model

The analysis in the previous sections revealed that, under the variational approximation, the distributions of the odors and the weights are updated locally. Thus, we implement the update rules in a network model of the olfactory bulb. The update of the weight distribution, \({q}_{ij}^{w,t}({w}_{ij})\), depends on 〈cj〉 and \(\langle {c}_{j}^{2}\rangle\), as shown in Eq. (27), while the update of the odor distribution, \({q}_{j}^{c,t}({c}_{j})\), depends on \({\overline{w}}_{ij}\) and ρj, as shown in Eq. (28). Ideally, all these parameters should be updated simultaneously. However, as mentioned above, updates to synaptic weights are typically much slower than the neural dynamics, so here we consider a two-step update. First, the relevant parameters of the variational odor distribution, 〈cj〉 and \(\langle {c}_{j}^{2}\rangle\), are updated using the mean and precision of the weight distribution, \({\overline{w}}_{ij}\) and ρj, evaluated at t−1. Then, \({\overline{w}}_{ij}\) and ρj are updated using the first and second moments of the concentrations, 〈cj〉 and \(\langle {c}_{j}^{2}\rangle\), evaluated at time t.

Neural dynamics

Our goal is to write down a set of dynamical equations for 〈cj〉 and \(\langle {c}_{j}^{2}\rangle\) whose fixed points correspond to the values given in Eqs. (19) and (23), respectively. Examining these equations, we see that 〈cj〉 and \(\langle {c}_{j}^{2}\rangle\) depend on αj and λj; after a small amount of algebra (involving the insertion of Eq. (28a) into Eq. (21a)), αj may be written

$${\alpha }_{j}=\frac{1}{\sqrt{{\lambda }_{j}}{\sigma }_{x}^{2}}\left(\sum_{i}{\overline{w}}_{ij}{m}_{i}+\sum_{i}{\overline{w}}_{ij}^{2}\langle {c}_{j}\rangle -3{\sigma }_{x}^{2}\right)\ .$$
(29)

To avoid clutter, we dropped the dependence on time, but the weights should be evaluated at time t−1 and all other variables at time t.

Because neither αj nor λj (the latter given in Eq. (28b)) depends on \(\langle {c}_{j}^{2}\rangle\), we can write down coupled equations for 〈cj〉 and mi; the solution of those equations gives us the values of αj and λj, which in turn give us, via Eq. (23), \(\langle {c}_{j}^{2}\rangle\). Using, for notational ease, \({\overline{c}}_{j}\) rather than 〈cj〉, the simplest such equations (derived from Eqs. (17) and (19)) are

$${\tau }_{{\mathrm{{r}}}}\frac{{\mathrm{{d}}}{m}_{i}}{{\mathrm{{d}}}\tau }={x}_{i}-{m}_{i}-\sum_{j = 1}^{M}{w}_{ij}^{{\rm{L}}}{\overline{c}}_{j}$$
(30)
$${\tau }_{{\mathrm{{r}}}}\frac{{\mathrm{{d}}}{\overline{c}}_{j}}{{\mathrm{{d}}}\tau }=-{\overline{c}}_{j}+{F}_{j}\left[\sum_{i = 1}^{N}{w}_{ji}^{{\rm{F}}}{m}_{i};{\overline{c}}_{j}\right]$$
(31)

where τr is the time constant of the firing rate dynamics, and the nonlinear transfer function, F, is given by the right-hand side of Eq. (19)

$${F}_{j}\left[\sum_{i = 1}^{N}{w}_{ji}^{{\rm{F}}}{m}_{i};{\overline{c}}_{j}\right]\equiv \frac{1}{\sqrt{{\lambda }_{j}}}\frac{(2+{\alpha }_{j}^{2})+{\alpha }_{j}(3+{\alpha }_{j}^{2})\Psi ({\alpha }_{j})}{\frac{2(1-{c}_{{\rm{o}}})}{27{c}_{{\rm{o}}}}{\lambda }_{j}^{3/2}+{\alpha }_{j}+(1+{\alpha }_{j}^{2})\Psi ({\alpha }_{j})}$$
(32)

with αj given in Eq. (29) and λj in Eq. (28b). Note that we have replaced the average weights, \({\overline{w}}_{ij}\), with two different weights, \({w}_{ij}^{{\rm{L}}}\) and \({w}_{ji}^{{\rm{F}}}\). Ideally, we should have \({w}_{ji}^{{\rm{F}}}={w}_{ij}^{{\rm{L}}}={\overline{w}}_{ij}\), but, for biological plausibility, we allow these reciprocal synapses to be learned independently. Note that when evaluating αj, Eq. (29), \({w}_{ji}^{{\rm{F}}}\) should be used. Although the expression for Fj seems complicated, the transfer functions are relatively smooth, and resemble experimentally observed ones (see Fig. 5).

As shown in Fig. 3b, this dynamical system resembles the neural dynamics of the olfactory bulb, under the assumption that mi and \({\overline{c}}_{j}\) are the firing rates of M/T cells and the granule cells, respectively. With this assumption, \({w}_{ji}^{{\rm{F}}}\) is the connection from M/T cell i to granule cell j and \({w}_{ij}^{{\rm{L}}}\) is the connection from granule cell j to M/T cell i.

Finally, the second moment of the concentration is given, via Eq. (23), by

$$\langle {c}_{j}^{2}\rangle ={G}_{j}\left[\sum_{i}^{N}{w}_{ji}^{{\mathrm{{F}}},t-1}{m}_{i};{\overline{c}}_{j}\right]\equiv \frac{1}{{\lambda }_{j}}\frac{{\alpha }_{j}(5+{\alpha }_{j}^{2})+(3+6{\alpha }_{j}^{2}+{\alpha }_{j}^{4})\Psi ({\alpha }_{j})}{\frac{2(1-{c}_{{\rm{o}}})}{27{c}_{{\rm{o}}}}{\lambda }_{j}^{3/2}+{\alpha }_{j}+(1+{\alpha }_{j}^{2})\Psi ({\alpha }_{j})}\ .$$
(33)
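For concreteness, the following NumPy sketch integrates Eqs. (30) and (31) with Euler steps and evaluates the second moment, Eq. (33), at the end of the trial. The weights and the input are random placeholders (all sizes and constants are illustrative), and Ψ is computed with a clipped argument in the spirit of the stabilization introduced later in Eq. (61).

```python
import numpy as np
from scipy.special import ndtr  # Phi, the standard normal CDF

def psi(alpha):
    # Psi(alpha) = sqrt(2*pi) * Phi(alpha) * exp(alpha**2 / 2); the argument
    # is clipped to the range where this is numerically safe (cf. Eq. (61))
    a = np.clip(alpha, -10 * np.sqrt(2), 10 * np.sqrt(2))
    return np.sqrt(2 * np.pi) * ndtr(a) * np.exp(a**2 / 2)

def alpha_of(wF, m, c_bar, lam, sigma_x):
    # Eq. (29), with the feedforward weights w^F in place of w-bar
    drive = wF @ m + (wF**2).sum(axis=1) * c_bar - 3 * sigma_x**2
    return drive / (np.sqrt(lam) * sigma_x**2)

def F_of(alpha, lam, c_o):
    # granule-cell transfer function, Eq. (32)
    P, K = psi(alpha), 2 * (1 - c_o) / (27 * c_o)
    num = (2 + alpha**2) + alpha * (3 + alpha**2) * P
    den = K * lam**1.5 + alpha + (1 + alpha**2) * P
    return num / (np.sqrt(lam) * den)

def G_of(alpha, lam, c_o):
    # second moment of the concentration, Eq. (33)
    P, K = psi(alpha), 2 * (1 - c_o) / (27 * c_o)
    num = alpha * (5 + alpha**2) + (3 + 6 * alpha**2 + alpha**4) * P
    den = K * lam**1.5 + alpha + (1 + alpha**2) * P
    return num / (lam * den)

# one trial of the neural dynamics, Eqs. (30)-(31), by Euler integration
rng = np.random.default_rng(1)
N, M, sigma_x, c_o, t = 40, 10, 0.1, 0.1, 10
tau_r, dt = 1.0, 0.05
wF = rng.lognormal(-2.0, 0.1, size=(M, N))    # granule j <- M/T i
wL = wF.T.copy()                              # ideally w^L = (w^F)^T
rho = np.ones(M)                              # precision factors

c_true = np.zeros(M); c_true[3] = 1.0         # a single odor present
x = wF.T @ c_true + sigma_x * rng.normal(size=N)

lam = (wF**2).sum(axis=1) / sigma_x**2 \
      + N / (sigma_x**2 * (t - 1) * rho)      # Eq. (28b)
m, c_bar = np.zeros(N), np.full(M, c_o)
for _ in range(2000):
    alpha = alpha_of(wF, m, c_bar, lam, sigma_x)
    m += dt / tau_r * (x - m - wL @ c_bar)                  # Eq. (30)
    c_bar += dt / tau_r * (-c_bar + F_of(alpha, lam, c_o))  # Eq. (31)
    c_bar = np.maximum(c_bar, 0.0)   # rates are lower-bounded by zero
c_sq = G_of(alpha, lam, c_o)         # Eq. (33), used by the plasticity step
```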

Synaptic plasticity

After trial t, the average feedforward weights, \({w}_{ji}^{{\rm{F}}}\), and the average lateral weights, \({w}_{ij}^{{\rm{L}}}\), are updated as in Eq. (27b)

$${w}_{ji}^{{\mathrm{{F}}},t}=\left(1-{\delta }_{j}^{w,t}\right){w}_{ji}^{{\mathrm{{F}}},t-1}+\frac{1/t}{{\rho }_{j}^{t}{\sigma }_{x}^{2}}{\overline{c}}_{j}{m}_{i}$$
(34a)
$${w}_{ij}^{{\mathrm{{L}}},t}=\left(1-{\delta }_{j}^{w,t}\right){w}_{ij}^{{\mathrm{{L}}},t-1}+\frac{1/t}{{\rho }_{j}^{t}{\sigma }_{x}^{2}}{m}_{i}{\overline{c}}_{j}$$
(34b)
$${\delta }_{j}^{w,t}\equiv \frac{1}{t}+\left(1-\frac{1}{t}\right)\left(1-\frac{{\rho }_{j}^{t-1}}{{\rho }_{j}^{t}}\right)-\frac{{\overline{c}}_{j}^{2}}{t{\rho }_{j}^{t}{\sigma }_{x}^{2}}\ .$$
(34c)

We used the firing rates mi and \({\overline{c}}_{j}\) at the end of the trial, after the neural dynamics has reached steady state. As the weight updates depend primarily on the product of mi and \({\overline{c}}_{j}\), the learning rules are essentially Hebbian. Note that if the initial conditions are the same (i.e., if \({w}_{ji}^{{\mathrm{{F}}},0}={w}_{ij}^{{\mathrm{{L}}},0}\)), then \({w}_{ji}^{{\mathrm{{F}}},t}\) and \({w}_{ij}^{{\mathrm{{L}}},t}\) will remain the same for all time. This is reasonable given that connections between M/T cells and granule cells are dendro-dendritic.

The variance of the weights, \(1/(t{\rho }_{j}^{t})\), consists of two factors. The first, 1/t, represents the global hyperbolic decay of the learning rate due to the accumulation of information. In our simulations, we started t from \(t={t}_{\min }\) to suppress the influence of the initial samples; this is equivalent to using a trial-dependent discount factor \(1/(t+{t}_{\min })\) instead of 1/t, where t is the actual trial count. The second, \({\rho }_{j}^{t}\), represents the neuron-specific contribution to the precision, and is given, via Eqs. (27) and (23), by

$${\rho }_{j}^{t}=(1-1/t){\rho }_{j}^{t-1}+\frac{1}{t{\sigma }_{x}^{2}}{G}_{j}\left[\sum_{i}^{N}{w}_{ji}^{{\mathrm{{F}}},t-1}{m}_{i};{\overline{c}}_{j}\right]\ ,$$
(35)

where Gj, the second moment of the concentration, is given in Eq. (33).
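Putting Eqs. (34) and (35) together, one post-trial plasticity step can be sketched as follows. The steady-state rates m and \({\overline{c}}_{j}\), and the second moment (the output of Gj), are placeholders standing in for the values produced by the neural dynamics above; sizes and constants are illustrative.

```python
import numpy as np

N, M, sigma_x, t = 40, 10, 0.1, 10
rng = np.random.default_rng(2)
wF = rng.lognormal(-2.0, 0.1, size=(M, N))    # granule j <- M/T i
wL = wF.T.copy()                              # M/T i <- granule j
rho = np.ones(M)
m = rng.normal(0.0, 0.1, size=N)              # steady-state M/T rates (placeholder)
c_bar = rng.random(M) * 0.2                   # steady-state granule rates (placeholder)
c_sq = c_bar**2 + 0.01                        # placeholder for G_j[...]

rho_new = (1 - 1 / t) * rho + c_sq / (t * sigma_x**2)       # Eq. (35)

delta = (1 / t + (1 - 1 / t) * (1 - rho / rho_new)
         - c_bar**2 / (t * rho_new * sigma_x**2))           # Eq. (34c)

# Hebbian updates, Eqs. (34a)-(34b): both are driven by the product m_i * c_bar_j
wF = (1 - delta)[:, None] * wF + np.outer(c_bar, m) / (t * rho_new[:, None] * sigma_x**2)
wL = (1 - delta)[None, :] * wL + np.outer(m, c_bar) / (t * rho_new[None, :] * sigma_x**2)
wF, wL = np.maximum(wF, 0.0), np.maximum(wL, 0.0)  # weights lower-bounded by zero
rho = rho_new
```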

Models with various priors on odor concentration

In our model setting, the prior over concentration, pc(c), enters via Eq. (14b), and affects the transfer functions F and G, given in Eqs. (32) and (33), respectively. Choosing different priors gives different transfer functions. Below we consider two common ones: non-negative, and non-negative with an exponential decay.

The first of these is actually an improper prior, pc(c) ∝ Θ(c). This results in gain functions of the form

$$F[{\mu }_{j};{\lambda }_{j}]={\mu }_{j}+\frac{1}{\sqrt{{\lambda }_{j}}\Psi \left[\sqrt{{\lambda }_{j}}\mu_{j} \right]}$$
(36a)
$$G[{\mu }_{j};{\lambda }_{j}]={\mu }_{j}F[{\mu }_{j};{\lambda }_{j}]+\frac{1}{{\lambda }_{j}}$$
(36b)

where μj and λj are given in Eqs. (28a) and (28b), respectively.

Under the non-negative prior introduced above, all positive concentrations are equally likely. However, that is not the case in a typical environment. Far more realistic is to assume that large concentrations are exponentially unlikely, yielding a prior of the form \({p}_{{\mathrm{{c}}}}(c)=\frac{1}{{c}_{{\rm{o}}}}\exp \left(-c/{c}_{{\rm{o}}}\right)\). (The decay constant, co, was chosen so that the mean is equal to co, the same mean as in the true generative model.) For this prior, the functions F and G are

$$F[{\mu }_{j};{\lambda }_{j}]=\left({\mu }_{j}-\frac{1}{{c}_{{\rm{o}}}{\lambda }_{j}}\right)+\frac{1}{\sqrt{{\lambda }_{j}}\Psi \left[\sqrt{{\lambda }_{j}}{\mu }_{j}-\frac{1}{{c}_{{\rm{o}}}\sqrt{{\lambda }_{j}}}\right]}$$
(37a)
$$G[{\mu }_{j};{\lambda }_{j}]=\left({\mu }_{j}-\frac{1}{{c}_{{\rm{o}}}{\lambda }_{j}}\right)F[{\mu }_{j};{\lambda }_{j}]+\frac{1}{{\lambda }_{j}}\ .$$
(37b)

While this prior is suboptimal for olfactory learning, experimental results from visual cortex indicate that the transfer function there resembles the one in Eq. (37a)49 (black curve in Fig. 5a). Indeed, in early visual regions, where the prior is arguably more continuous10, this shifted rectified-linear transfer function might be more beneficial50.
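The two pairs of gain functions, Eqs. (36) and (37), are straightforward to evaluate; a minimal NumPy sketch is given below, with Ψ computed from the standard normal CDF and a clipped argument for stability (cf. Eq. (61)). The values of λ and co in the demo lines are illustrative.

```python
import numpy as np
from scipy.special import ndtr  # Phi, the standard normal CDF

def psi(alpha):
    # Psi(alpha) = sqrt(2*pi) * Phi(alpha) * exp(alpha**2 / 2), clipped for
    # stability (cf. the piecewise treatment in Eq. (61))
    a = np.clip(alpha, -10 * np.sqrt(2), 10 * np.sqrt(2))
    return np.sqrt(2 * np.pi) * ndtr(a) * np.exp(a**2 / 2)

def F_flat(mu, lam):
    # Eq. (36a): improper non-negative prior, p(c) ~ Theta(c)
    return mu + 1.0 / (np.sqrt(lam) * psi(np.sqrt(lam) * mu))

def G_flat(mu, lam):
    # Eq. (36b)
    return mu * F_flat(mu, lam) + 1.0 / lam

def F_exp(mu, lam, c_o):
    # Eq. (37a): exponential prior with mean c_o; the argument is shifted
    # left by 1/(c_o*lam), so the curve shifts right
    s = mu - 1.0 / (c_o * lam)
    return s + 1.0 / (np.sqrt(lam) * psi(np.sqrt(lam) * s))

def G_exp(mu, lam, c_o):
    # Eq. (37b)
    return (mu - 1.0 / (c_o * lam)) * F_exp(mu, lam, c_o) + 1.0 / lam

# both behave as smoothed rectifiers of mu
mu = np.linspace(-0.5, 1.0, 7)
print(F_flat(mu, 100.0))
print(F_exp(mu, 100.0, 0.1))
```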

Learning concentration-invariant representations

Up to now we focused on the expected concentration, \({\overline{c}}_{j}\). However, in natural environments animals often care more about whether an odor is present in their vicinity than about its concentration. From a Bayesian perspective, this means the animals should compute the probability that an odor is present, denoted \({\overline{p}}_{j}\). Using Eq. (24), \({\overline{p}}_{j}\) can be estimated as the steady state of the following dynamics:

$${\tau }_{{\mathrm{{r}}}}\frac{{\mathrm{{d}}}{\overline{p}}_{j}}{{\mathrm{{d}}}\tau }=-{\overline{p}}_{j}+{H}_{j}\left[\sum_{i}{w}_{ji}^{{\rm{F}}}{m}_{i},{\overline{c}}_{j}\right]$$
(38)

where Hj, which is approximately sigmoidal, is given, via Eq. (24), by

$${H}_{j}\left[\sum_{i}{w}_{ji}^{{\rm{F}}}{m}_{i},{\overline{c}}_{j}\right]=\frac{{\alpha }_{j}+(1+{\alpha }_{j}^{2})\Psi ({\alpha }_{j})}{\frac{2(1-{c}_{{\rm{o}}})}{27{c}_{{\rm{o}}}}{\lambda }_{j}^{3/2}+{\alpha }_{j}+(1+{\alpha }_{j}^{2})\Psi ({\alpha }_{j})}$$
(39)

with αj given in Eq. (29), but with \({\overline{w}}_{ij}\) replaced by \({w}_{ji}^{{\mathrm{{F}}}}\) in that equation as before.

In principle, neurons receiving input, mi, from M/T cells, such as layer 2 piriform cortex neurons, can decode the odor probability, as shown in Fig. 7a and 7e-left. However, to calculate Hj given input from M/T cells, the neuron would need to know the weights, \({w}_{ij}^{{\rm{F}}}\), as well as λj and \({\overline{c}}_{j}\) (the latter because αj depends on \({\overline{c}}_{j}\); see Eq. (29)). This is clearly unrealistic, because there is no known biological mechanism that enables copying weights. Moreover, because granule cells do not have output projections, except for the dendro-dendritic connections with M/T cells, piriform neurons cannot know \({\overline{c}}_{j}\) directly. Nevertheless, piriform neurons can learn to decode the concentration-invariant representation, \({\overline{p}}_{j}\), as follows.

Let us use \({w}_{ji}^{{\mathrm{{p}}}}\) to denote the mean weight from M/T cells to the piriform neurons (see Fig. 7b–d). Assume for the moment that \({w}_{ji}^{{\mathrm{{p}}}}\approx {w}_{ji}^{{\rm{F}}}\); shortly we will write down a learning rule that achieves this (see Eq. (43)). This takes care of the weights, but we also need an approximation to \({\overline{c}}_{j}\). For that, we notice that if the estimation is unbiased, on average both \({\overline{c}}_{j}\) and \({\overline{p}}_{j}\) are equal to co. Thus, the simplest way to approximate \({\overline{c}}_{j}\) with the information available to the jth piriform neuron is to use \({\overline{c}}_{j}\approx {\overline{p}}_{j}\). Under this approximation, and using \({w}_{ji}^{{\mathrm{{p}}}}\) in place of \({w}_{ji}^{{\rm{F}}}\), Eq. (38) becomes

$${\tau }_{{\mathrm{{r}}}}\frac{{\mathrm{{d}}}{\overline{p}}_{j}}{{\mathrm{{d}}}\tau }=-{\overline{p}}_{j}+{H}_{j}\left[\sum_{i}{w}_{ji}^{{\mathrm{{p}}}}{m}_{i},{\overline{p}}_{j}\right]$$
(40)

where Hj is the same as Eq. (39), but with αj replaced by \({\alpha }_{j}^{p}\), the analog of αj computed with \({w}_{ji}^{{\mathrm{{p}}}}\) and with lateral inhibition,

$${\alpha }_{j}^{p}\equiv \frac{1}{\sqrt{{\lambda }_{j}^{p}}{\sigma }_{x}^{2}}\left(\sum_{i}{w}_{ji}^{{\mathrm{{p}}}}{m}_{i}+\sum_{i}{\left({w}_{ji}^{{\mathrm{{p}}}}\right)}^{2}{\overline{p}}_{j}-{\sigma }_{x}^{2}\left[3+{\lambda }_{j}^{p}\sum_{k\ne j}{J}_{jk}{\overline{p}}_{k}\right]\right)$$
(41)

where, by analogy with Eq. (28b), \({\lambda }_{j}^{p}\) is given by

$${\lambda }_{j}^{p}\equiv \frac{1}{{\sigma }_{x}^{2}}\sum_{i = 1}^{N}{\left({w}_{ji}^{{\mathrm{{p}}}}\right)}^{2}+\frac{N}{{\sigma }_{x}^{2}(t-1){\rho }_{j}^{p,t-1}}.$$
(42)

As above, \({\overline{p}}_{j}\) evolves with the weights set to the values updated at the end of the previous trial. Once the neural dynamics reaches steady state, the weights are updated as in Eq. (34)

$${w}_{ji}^{{\mathrm{{p}}},t}=\left(1-{\delta }_{j}^{p,t}\right){w}_{ji}^{{\mathrm{{p}}},t-1}+\frac{1/t}{{\rho }_{j}^{p,t}{\sigma }_{x}^{2}}{F}_{j}\left[\sum_{i}^{N}{w}_{ji}^{{\mathrm{{p}}},t-1}{m}_{i},{\overline{p}}_{j}\right]{m}_{i}$$
(43a)
$${\delta }_{j}^{p,t}\equiv \frac{1}{t}+\left(1-\frac{1}{t}\right)\left(1-\frac{{\rho }_{j}^{p,t-1}}{{\rho }_{j}^{p,t}}\right)-\frac{1}{t{\rho }_{j}^{p,t}{\sigma }_{x}^{2}}{\left({F}_{j}\left[\sum_{i}^{N}{w}_{ji}^{p,t-1}{m}_{i},{\overline{p}}_{j}\right]\right)}^{2}$$
(43b)

and the precision as in Eq. (27a)

$${\rho }_{j}^{p,t}=(1-1/t){\rho }_{j}^{p,t-1}+\frac{1/t}{{\sigma }_{x}^{2}}{G}_{j}\left[\sum_{i}^{N}{w}_{ji}^{p,t-1}{m}_{i},{\overline{p}}_{j}\right].$$
(44)

Here Fj and Gj are the estimated first and second moments given in Eqs. (32) and (33), respectively, but calculated with \({\alpha }_{j}^{p}\) from Eq. (41). In steady state, these two terms approximate \({\overline{c}}_{j}\) and \(\langle {c}_{j}^{2}\rangle\). In addition, to ensure sparse piriform cell firing51, we introduced Hebbian plasticity in the lateral weights Jjk,

$$\Delta {J}_{jk}=0.1{\overline{p}}_{k}\left(-5{c}_{{\rm{o}}}{J}_{jk}+{\overline{p}}_{j}\right)\ ,$$
(45)

while bounding Jjk ≥ 0 and enforcing Jjj = 0. We initialized the lateral weights to Jjk = 0.02; a minimal sketch of this update follows.
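The sketch below applies Eq. (45) with the stated bound and diagonal constraint; the piriform activity \({\overline{p}}_{j}\) is a random placeholder and the sizes are illustrative.

```python
import numpy as np

M, c_o = 10, 0.1
rng = np.random.default_rng(3)
J = 0.02 * (1 - np.eye(M))        # initialization used in the simulations
p_bar = rng.random(M)             # steady-state piriform activity (placeholder)

dJ = 0.1 * p_bar[None, :] * (-5 * c_o * J + p_bar[:, None])   # Eq. (45)
J = np.maximum(J + dJ, 0.0)       # bound J_jk >= 0
np.fill_diagonal(J, 0.0)          # enforce J_jj = 0
```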

In Fig. 7e (panel d), 7f (orange line), 7g, and 7j–k, we modified the transfer function Fj of granule cells by replacing the prior term co with the input from piriform neuron \({\overline{p}}_{j}\). This means that \({F}_{j}^{{\rm{D}}}\) is written as

$${F}_{j}^{{\rm{D}}}\left[\sum_{i = 1}^{N}{w}_{ji}^{{\rm{F}}}{m}_{i};{\overline{c}}_{j},{\overline{p}}_{j}\right]\equiv \frac{1}{\sqrt{{\lambda }_{j}}}\frac{(2+{\alpha }_{j}^{2})+{\alpha }_{j}(3+{\alpha }_{j}^{2})\Psi ({\alpha }_{j})}{\frac{2(1-{\overline{p}}_{j})}{27{\overline{p}}_{j}}{\left({\lambda }_{j}\right)}^{3/2}+{\alpha }_{j}+(1+{\alpha }_{j}^{2})\Psi ({\alpha }_{j})}$$
(46)

where αj is still given by Eq. (29). We modulated the gain function Gj of granule cells, Eq. (33), in the same way, by replacing co with \({\overline{p}}_{j}\). In Fig. 7f, we changed the relative strength of lateral inhibition by replacing Jjk in Eq. (41) with κJJjk, where κJ, the relative strength, ranged from 0 to 3, as shown on the x-axis of Fig. 7f, while using the original Jjk for the weight update.

Reward-based learning

Assuming that the reward amplitude depends only on the identity of the odors, not on their concentrations, the reward, R, on trial t is given by

$$R=\sum_{j = 1}^{M}{a}_{j}\Theta ({c}_{j})+{\sigma }_{\zeta }{\zeta }_{t}$$
(47)

where ζt is a zero-mean, unit-variance Gaussian random variable, and Θ(x) is the Heaviside step function.

To estimate the reward, we augment the circuit in Fig. 7d by introducing a set of neurons, denoted ep, that receive input both from \({\overline{p}}_{j}\) and from the reward, R (see Fig. 7h). Using \({\overline{a}}_{j}\) to denote the weights from the piriform neurons to ep, the natural neural dynamics of ep is

$${\tau }_{{{r}}}\frac{{\mathrm{{d}}}{e}_{{{p}}}}{{\mathrm{d}}\tau }=-{e}_{{{p}}}+{\widehat{R}}_{t}(\tau )-\sum_{j}{\overline{a}}_{j}{\overline{p}}_{j}.$$
(48)

To represent the delay in reward delivery, \({\widehat{R}}_{t}(\tau )\) is zero for the first 2.5 s; after that it is set to the value of the reward,

$${\widehat{R}}_{t}(\tau )=\left\{\begin{array}{ll}0&\,\,\tau <2.5\,{\rm{{s}}} \\ R&\,\,\tau \ge 2.5\,{\rm{{s}}}.\end{array}\right.$$
(49)

Note that for the first 2.5 s of the trial, −ep carries a prediction of the upcoming reward based on the olfactory input, x. Once the reward is provided, the neuron represents the difference between the expected and the actual reward. That difference can be used to drive learning via Hebbian plasticity,

$${\overline{a}}_{j}^{t}={\overline{a}}_{j}^{t-1}+{\eta }_{a}{e}_{{\mathrm{{p}}}}{\overline{p}}_{j}$$
(50)

where \({\overline{a}}_{j}\) is updated only after the reward has been presented. Importantly, ep is evaluated after the reward presentation.
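The following sketch runs one trial of this reward-prediction step, Eqs. (48)–(50), with the piriform activity held fixed at its steady-state value; the trial duration, time step, and activity values are illustrative placeholders.

```python
import numpy as np

M, tau_r, dt, eta_a = 10, 0.2, 0.001, 0.5
rng = np.random.default_rng(4)
a_bar = np.zeros(M)                   # reward weights, initialized to zero
p_bar = rng.random(M)                 # steady-state piriform activity (placeholder)
R = 1.0 + 0.1 * rng.normal()          # noisy reward, Eq. (47)

e_p = 0.0
for step in range(int(5.0 / dt)):     # a 5 s trial (illustrative duration)
    tau = step * dt
    R_hat = R if tau >= 2.5 else 0.0  # delayed reward delivery, Eq. (49)
    e_p += dt / tau_r * (-e_p + R_hat - a_bar @ p_bar)   # Eq. (48)

# before 2.5 s, -e_p predicts the reward; afterwards e_p is the prediction
# error, which drives the Hebbian update of Eq. (50)
a_bar += eta_a * e_p * p_bar
```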

Similarly, for the direct readout from x depicted in Fig. 7i, the reward is predicted by

$${\tau }_{{\mathrm{{r}}}}\frac{{\mathrm{{d}}}{e}_{x}}{{\mathrm{{d}}}\tau }=-{e}_{x}+{\widehat{R}}_{t}-\sum_{i}{h}_{i}{x}_{i}\ ,$$
(51)

with hi again updated via Hebbian plasticity,

$${h}_{i}^{t}={h}_{i}^{t-1}+{\eta }_{h}{e}_{x}{x}_{i}\ ,$$
(52)

after the reward has been presented.

Sparse coding

The sparse coding model originally proposed by Olshausen and colleagues10,52 can be applied to the model of olfactory learning as shown below. The basic idea is that the odor, denoted \(\widehat{{\bf{c}}}\), and the weight matrix, denoted \(\widehat{{\bf{w}}}\), that best explain the input, x, should be close to the real c and w. This means \(\widehat{{\bf{c}}}\) and \(\widehat{{\bf{w}}}\) can be estimated by performing stochastic gradient ascent on the likelihood of the inputs, x.

However, this is sub-optimal, primarily because the uncertainties in \(\widehat{{\bf{c}}}\) and \(\widehat{{\bf{w}}}\) are ignored, even though they are important for data-efficient learning45. In addition, for tractability, the prior over the odors is taken to be a continuous function, making it difficult to capture the fact that at any given time most odors are absent. These constraints make the learning algorithm inefficient.

The log likelihood of the data with respect to an unknown set of weights, denoted \(\widehat{{\bf{w}}}\), is given by

$${\mathrm{log}}\,p({{\bf{x}}}_{t}| \widehat{{\bf{w}}}) = \, {\mathrm{log}}\,\left(\int\ p({{\bf{x}}}_{t}| {{\bf{c}}}_{t},\widehat{{\bf{w}}})p({{\bf{c}}}_{t}){\mathrm{{d}}}{{\bf{c}}}_{t}\right)\\ \approx {\mathrm{log}}\,\left(p({{\bf{x}}}_{t}| {\widehat{{\bf{c}}}}_{t},\widehat{{\bf{w}}})p({\widehat{{\bf{c}}}}_{t})\right)+{\rm{const}}.$$
(53)

In the second line, the integral was approximated with the maximum a posteriori estimate \({\widehat{{\bf{c}}}}_{t}=\arg \,\mathop{\max }\limits_{{\bf{c}}}p({{\bf{x}}}_{t}| {\bf{c}},\widehat{{\bf{w}}})p({\bf{c}})\). The objective function is thus given by

$${E}_{t}\equiv {\mathrm{log}}\,p({{\bf{x}}}_{t}| {\widehat{{\bf{c}}}}_{t},\widehat{{\bf{w}}})+{\mathrm{log}}\,p({\widehat{{\bf{c}}}}_{t}).$$
(54)

Because the noise on xt is Gaussian (see Eq. (6)), the first term is a simple quadratic function. However, the second term, \({\mathrm{log}}\,p({\widehat{{\bf{c}}}}_{t})\), requires further approximation to remove the delta function, and thus ensure differentiability of Et with respect to \({\hat{c}}_{j}\). To this end, we approximated the prior with a Gamma distribution: \({p}_{{\mathrm{{c}}}}({\hat{c}}_{j})\propto {\hat{c}}_{j}^{{k}_{c}-1}{}{{\mathrm{{e}}}}^{-{\hat{c}}_{j}/{\theta }_{c}}\), for which the mean is kcθc. We used kc = 3 and θc = co/3, ensuring a mean of co. Under this approximation, the objective function, Et, becomes

$${E}_{t}=\frac{-1}{2{\sigma }_{x}^{2}}{\sum }_{i}{\left({x}_{i}^{t}-\sum _{j}{\widehat{w}}_{ij}{\hat{c}}_{j}^{t}\right)}^{2}+\sum_{j}\left(({k}_{c}-1){\mathrm{log}}\,{\hat{c}}_{j}^{t}-{\hat{c}}_{j}^{t}/{\theta }_{c}\right).$$
(55)

We maximize the objective function via stochastic gradient ascent, which occurs in two steps. In the first step, we maximize Et with respect to \(\widehat{{\bf{c}}}\),

$$\Delta {\hat{c}}_{j}\propto \frac{\partial {E}_{t}}{\partial {\hat{c}}_{j}}=\frac{1}{{\sigma }_{x}^{2}}\sum_{i}{\hat{m}}_{i}{\widehat{w}}_{ij}+\frac{{k}_{c}-1}{{\hat{c}}_{j}}-\frac{1}{{\theta }_{c}}\ ,$$
(56)

where \({\hat{m}}_{i}\) is the analog of Eq. (17),

$${\hat{m}}_{i}\equiv {x}_{i}-\sum_{j}{\widehat{w}}_{ij}{\hat{c}}_{j}.$$
(57)

Once \({\hat{c}}_{j}\) has converged, we update the weights via

$$\Delta {\widehat{w}}_{ij}\propto \frac{\partial {E}_{t}}{\partial {\widehat{w}}_{ij}}=\frac{1}{{\sigma }_{x}^{2}}{\hat{m}}_{i}{\hat{c}}_{j}.$$
(58)

To prevent divergence of the weights, after each weight update we apply L2 normalization (see Eq. (60b) below).

In summary, on each trial t, the \({\hat{c}}_{j}\,(j=1,2,...,M)\) are first updated,

$${\hat{c}}_{j}^{t}(\tau )={\hat{c}}_{j}^{t}(\tau -1)+{\eta }_{c}\left(\sum_{i}{\hat{m}}_{i}^{t}(\tau -1){\widehat{w}}_{ij}^{t-1}+{\sigma }_{x}^{2}\left[\frac{2}{{\hat{c}}_{j}^{t}(\tau -1)}-\frac{3}{{c}_{{\rm{o}}}}\right]\right),$$
(59)

where the time step τ runs from 0 to 100,000 in each trial. At the end of trial t, the weights are then updated by

$${\widetilde{w}}_{ij}={\widehat{w}}_{ij}^{t-1}+{\eta }_{w}{\hat{m}}_{i}^{t}{\hat{c}}_{j}^{t}$$
(60a)
$${\widehat{w}}_{ij}^{t}=\frac{e}{{c}_{{\rm{o}}}M}\frac{{\widetilde{w}}_{ij}}{\sqrt{{\sum }_{i}{\widetilde{w}}_{ij}^{2}/N}}\ .$$
(60b)

The learning rates, ηc and ηw, were manually tuned. We used ηc = 0.00001 and ηw = 0.5 unless stated otherwise.
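One trial of this baseline can be sketched as follows; the inner loop is truncated relative to the 100,000 steps used in the simulations, the input is a random placeholder, and the small positive floor on \({\hat{c}}_{j}\) is our addition to keep the logarithmic prior term defined.

```python
import numpy as np

N, M, sigma_x, c_o = 40, 10, 0.1, 0.1
eta_c, eta_w = 1e-5, 0.5
rng = np.random.default_rng(5)
w_hat = rng.lognormal(-2.0, 0.1, size=(N, M))
x = rng.normal(0.0, 0.5, size=N)      # placeholder OSN input for one trial

c_hat = np.full(M, c_o)
for _ in range(10_000):               # truncated; the paper uses 100,000 steps
    m_hat = x - w_hat @ c_hat                                   # Eq. (57)
    c_hat = c_hat + eta_c * (w_hat.T @ m_hat
                             + sigma_x**2 * (2.0 / c_hat - 3.0 / c_o))  # Eq. (59)
    c_hat = np.maximum(c_hat, 1e-6)   # small floor (our addition)

# weight update with column-wise L2 normalization, Eqs. (60a)-(60b)
m_hat = x - w_hat @ c_hat
w_tilde = w_hat + eta_w * np.outer(m_hat, c_hat)                # Eq. (60a)
w_hat = (np.e / (c_o * M)) * w_tilde / np.sqrt((w_tilde**2).sum(axis=0) / N)  # Eq. (60b)
```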

Simulation details

The parameters used in the simulations are given in Table 1. Additional details of the simulations, from the implementation of the neural dynamics to the setup of the go/no go task, are provided below.

Table 1 Definitions and values of the parameters.

Implementation of neural dynamics

The M/T cell activity, mi, was defined relative to a baseline, denoted msp; in Fig. 3c, we plotted \({\widetilde{m}}_{i}\equiv {m}_{i}+{m}_{{\rm{sp}}}\). On each trial, mi was initialized to zero and \({\overline{c}}_{j}\) to co: mi(τ = 0) = 0 (i.e., \({\widetilde{m}}_{i}(0)={m}_{{\rm{sp}}}\)) and \({\overline{c}}_{j}(\tau =0)={c}_{{\rm{o}}}\). In addition, the firing rates were lower-bounded by mi ≥ −msp and \({\overline{c}}_{j}\ge 0\).

To avoid numerical instability, Ψ(α) in Eq. (21b) was approximated as

$$1/\Psi (\alpha )\approx \left\{\begin{array}{ll}-\alpha &\,\frac{\alpha }{\sqrt{2}}<-10\\ \frac{\exp (-{\alpha }^{2}/2)}{\sqrt{2\pi }\Phi (\alpha )}&\,-10\le \frac{\alpha }{\sqrt{2}}\le 10\\ 0&\,10<\frac{\alpha }{\sqrt{2}}\ .\end{array}\right.$$
(61)
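In code, this piecewise approximation of 1/Ψ might look as follows; the only external dependency is the standard normal CDF, Φ.

```python
import numpy as np
from scipy.special import ndtr  # Phi, the standard normal CDF

def inv_psi(alpha):
    # 1/Psi(alpha) via the piecewise approximation of Eq. (61)
    a = np.asarray(alpha, dtype=float)
    thr = 10 * np.sqrt(2)
    ac = np.clip(a, -thr, thr)        # keep exp and ndtr in range
    mid = np.exp(-ac**2 / 2) / (np.sqrt(2 * np.pi) * ndtr(ac))
    return np.where(a < -thr, -a, np.where(a > thr, 0.0, mid))

# the middle branch is the exact 1/Psi; the tails use its asymptotes
print(inv_psi(np.array([-20.0, -5.0, 0.0, 5.0, 20.0])))
```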

Implementation of synaptic plasticity

Both the feedforward and lateral weights were initially sampled from a log-normal distribution

$${w}_{ij}^{t = 0}\sim {\mathrm{log}}\,N({\mu }_{g}^{{\rm{init}}},{\sigma }_{g}^{{\rm{init}}})\ ,$$
(62)

with the standard deviation and mean parameters set to

$${\sigma }_{{\rm{{g}}}}^{{\rm{init}}}=0.1$$
(63a)
$${\mu }_{{\mathrm{{g}}}}^{{\rm{init}}}=\frac{1}{2}\left(1-{({\sigma }_{{\mathrm{{g}}}}^{{\rm{init}}})}^{2}\right)-{\mathrm{log}}\,({c}_{{\rm{o}}}M)\ .$$
(63b)

The precision factors, ρj, were initialized as

$${\rho }_{j}^{t = 0}=\frac{{c}_{{\rm{o}}}}{{\sigma }_{x}^{2}{Z}_{\rho }}\ .$$
(64)

We used Zρ = 0.5, except in Fig. 6b and d, where we used Zρ = 0.3. The weights were lower-bounded by zero. As mentioned above, in the simulations we started t from \(t={t}_{\min }\) to suppress the influence of the initial samples. Recurrent inhibition, J, was initialized to Jjk = 0.02 × (1−δjk).
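Collecting Eqs. (62)–(64), the initialization can be sketched as follows; sampling the feedforward and lateral weights independently is an assumption, and the network sizes are illustrative.

```python
import numpy as np

N, M, sigma_x, c_o, Z_rho = 40, 10, 0.1, 0.1, 0.5
rng = np.random.default_rng(6)

sigma_g = 0.1                                           # Eq. (63a)
mu_g = 0.5 * (1 - sigma_g**2) - np.log(c_o * M)         # Eq. (63b)
wF = rng.lognormal(mu_g, sigma_g, size=(M, N))          # Eq. (62), feedforward
wL = rng.lognormal(mu_g, sigma_g, size=(N, M))          # Eq. (62), lateral
rho = np.full(M, c_o / (sigma_x**2 * Z_rho))            # Eq. (64)
J = 0.02 * (1 - np.eye(M))                              # recurrent inhibition
```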

Learning with a fixed gain function

In Fig. 5d, we fixed all λj at 200 (gray) and 342 (black), while the \({\rho }_{j}^{t}\) were updated at each trial as in Eq. (35).

Learning with a fixed learning rate

Fixing the learning rate, \(1/(t{\rho }_{j}^{t})\), to a constant, denoted η, the learning rules for \({w}_{ji}^{{\rm{F}}}\) and \({w}_{ij}^{{\rm{L}}}\) become

$$\begin{array}{rcl}{w}_{ji}^{{\rm{{F}}},t}&=&{w}_{ji}^{{\rm{{F}}},t-1}+\frac{\eta }{{\sigma }_{x}^{2}}{\overline{c}}_{j}\left[{m}_{i}+{\overline{c}}_{j}{w}_{ji}^{{\rm{{F}}},t-1}\right]\\ {w}_{ij}^{{\rm{{L}}},t}&=&{w}_{ij}^{{\rm{{L}}},t-1}+\frac{\eta }{{\sigma }_{x}^{2}}{\overline{c}}_{j}\left[{m}_{i}+{\overline{c}}_{j}{w}_{ij}^{{\rm{{L}}},t-1}\right]\end{array}$$
(65)

and λj is given by

$${\lambda }_{j}=\frac{1}{{\sigma }_{x}^{2}}\left(\sum_{i = 1}^{N}{\left({w}_{ji}^{{\rm{F}}}\right)}^{2}+N\eta \right).$$
(66)

Go/no go task

In the simulation of the go/no go task, we selected two odors (\({j}_{+}\) and \({j}_{-}\)) out of M total odors, then randomly presented one or the other with concentrations drawn from a Gamma distribution (as in Eq. (8), but cj > 0 and co = 1). The reward associated with \({j}_{+}\) was R = 1.0 + ζ (i.e. \({a}_{{j}_{+}}=1.0\)), where ζ is the noise in the observed reward, sampled from a zero-mean Gaussian with variance 0.01. The reward associated with \({j}_{-}\) was R = ζ (i.e. \({a}_{{j}_{-}}=0.0\)).

Learning of the circuit shown in Fig. 7h was done in two steps. First, the weights, \({w}_{ij}^{{\rm{F}}},{w}_{ij}^{{\rm{L}}}\) and \({w}_{ij}^{{\rm{{p}}}}\), and the precisions, ρj and \({\rho }_{j}^{{\rm{{p}}}}\), were learned with the unsupervised learning rules. During this unsupervised period, the reward, R, was kept at zero. After 4000 trials of unsupervised learning, we fixed \({w}_{ij}^{{\rm{F}}},{w}_{ij}^{{\rm{L}}},{w}_{ij}^{p}\), ρj, and \({\rho }_{j}^{{\rm{{p}}}}\), then trained the weights \({\overline{a}}_{j}\) using Eq. (50).

The reward weights for the circuits in both Fig. 7h and i, \({\overline{a}}_{j}\) and hj, respectively, were initialized to zero, and the learning rates were manually tuned to the largest stable rates (ηa = 0.5 and ηh = 0.0015). The latter learning rate was smaller because ∥x∥ is typically much larger than \(\parallel \bar{{\boldsymbol{p}}}\parallel\), and also because the update of the hj was more susceptible to instability.

The classification performance was measured by the probability that the predicted and actual reward were both above 0.5 or both below 0.5,

$${\rm{performance}}\equiv \langle \Theta [({R}_{t}-0.5)(-{\widehat{e}}_{{\rm{{p}}}}-0.5)]\rangle \ ,$$
(67)

where \({\widehat{e}}_{{\rm{{p}}}}\) is the value of ep right before the reward delivery (\({\widehat{e}}_{{\rm{{p}}}}={e}_{{\rm{{p}}}}(\tau =2.45\,{\rm{{s}}})\)). Note that, as mentioned above, \({\widehat{e}}_{{\rm{{p}}}}\) should converge to -Rt. Thus, the average error was defined to be

$$\,\text{Average} \, \text{error}\,\equiv {\left\langle {({R}_{t}+{\widehat{e}}_{{\rm{{p}}}})}^{2}\right\rangle }^{1/2}\ .$$
(68)

Performance evaluation

In the following sections, we summarize the performance evaluation methods employed in this study.

Selectivity of granule cells

Because the network is trained with an unsupervised learning rule, we cannot know which neuron encodes which odor. We thus estimated the selectivity of a neuron from the incoming synaptic weights using a bootstrap method. Specifically, on each trial, the odor o(j) encoded by granule cell j is determined by choosing the odor that yields the maximum covariance between the estimated weights, wF, and the true mixing weight, w,

$$o(j)=\arg \,\mathop{\max }\limits_{m}\sum_{i = 1}^{N}\left({w}_{ji}^{{\rm{{F}}},t}-{\langle {w}_{ji}^{{\rm{{F}}},t}\rangle }_{i}\right)\left({w}_{im}-{\langle {w}_{im}\rangle }_{i}\right).$$
(69)

The selectivity can also be estimated from the activity of a neuron directly, by assuming that the granule cell with the highest activity to odor j codes for odor j. Essentially the same result holds when we take this approach, although accurate readout of selectivity requires a large number of trials. After learning, most neurons learn to encode one odor stably.
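A sketch of the weight-based selectivity assignment, Eq. (69), follows; we read the shapes of wF and w as (M, N) and (N, M), respectively, and the demo inputs are random placeholders.

```python
import numpy as np

def selectivity(wF, w_true):
    # Eq. (69): assign to granule cell j the odor whose true mixing weights
    # have the largest covariance (over glomeruli i) with the learned weights.
    # wF: (M, N), granule <- M/T; w_true: (N, M), glomerulus x odor
    wF_c = wF - wF.mean(axis=1, keepdims=True)       # center over i
    w_c = w_true - w_true.mean(axis=0, keepdims=True)
    return np.argmax(wF_c @ w_c, axis=1)             # o(j) for each cell j

rng = np.random.default_rng(7)
print(selectivity(rng.random((10, 40)), rng.random((40, 10))))
```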

Odor estimation performance

Given the odor selectivity, o(j), the original odors can be reconstructed by

$${\hat{c}}_{j}=\frac{{\sum }_{o(m) = j}{\overline{c}}_{m}}{{\sum }_{o(m) = j}1}\ .$$
(70)

The denominator is the number of neurons that encode odor j, which converges to one after successful learning. If the denominator was zero (i.e., no neuron encoded odor j), we set \({\hat{c}}_{j}\) to 0. Performance was defined to be the correlation between the estimated odor concentration, \({\hat{c}}_{j}\), and the true concentration, cj. Evaluation of performance on trial t used o(j) calculated from wF,t−1, not from wF,t. In Fig. 7e and f, we instead calculated the correlation between \({\widehat{p}}_{j}\) and the true value of Θ[cj] using the same method, where

$${\widehat{p}}_{j}=\frac{{\sum }_{{o}_{p}(m) = j}{\overline{p}}_{m}}{{\sum }_{{o}_{p}(m) = j}1}\ ,$$
(71)

with \({o}_{p}(j)\) the piriform neuron selectivity.
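A sketch of the reconstruction step, Eq. (70), is given below (Eq. (71) is identical with \({\overline{p}}_{j}\) and op in place of \({\overline{c}}_{j}\) and o); the demo assignments are hypothetical.

```python
import numpy as np

def reconstruct(c_bar, o):
    # Eq. (70): average the activity of all cells m assigned to odor j;
    # zero if no cell encodes j. c_bar: (M,) cell activities, o: (M,) o(m)
    c_est = np.zeros(len(c_bar))
    for j in range(len(c_bar)):
        mask = o == j
        if mask.any():
            c_est[j] = c_bar[mask].mean()
    return c_est

o = np.array([1, 1, 0, 2, 3, 4, 5, 6, 7, 8])   # hypothetical assignments
print(reconstruct(np.linspace(0.0, 0.9, 10), o))
# performance = np.corrcoef(c_est, c_true)[0, 1]
```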

ROC curve

We calculated the generalized ROC curves as in Fig. 7 of Grabska-Barwińska et al. (2017)6 using \({\hat{c}}_{j}\). We first separated the trials based on the total number of odors presented, and then for each trial we calculated the number of true/false positives under various thresholds θth. The true positive fraction is the fraction of presented odors above the threshold, θth, whereas the false positive count is the number of absent odors above the same threshold. The threshold, θth, was varied from 10−6 to 101 on a log scale, increasing by roughly 20% per step.
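For a single trial, the threshold sweep might look as follows; the number of log-spaced steps is chosen so that consecutive thresholds differ by roughly 20%, and the demo inputs are placeholders.

```python
import numpy as np

def roc_points(c_est, c_true, n_steps=89):
    # thresholds from 1e-6 to 1e1; 89 log-spaced points give a ~20%
    # increase per step
    thresholds = np.logspace(-6, 1, n_steps)
    present = c_true > 0
    tp_frac = np.array([(c_est[present] > th).mean() for th in thresholds])
    fp_count = np.array([(c_est[~present] > th).sum() for th in thresholds])
    return tp_frac, fp_count

rng = np.random.default_rng(9)
c_true = np.zeros(10); c_true[:3] = 1.0
tp, fp = roc_points(c_true + 0.05 * rng.random(10), c_true)
```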

Weight error

Given o(j), the error between the learned feedforward weight, \({w}_{ij}^{{\rm{F}}}\), and the true mixing weight, wij, was calculated by

$${d}_{w}^{{\rm{{F}}},t}\equiv \frac{1}{M}\sum_{j}^{M}\sqrt{\frac{1}{N}\sum_{i}^{N}{\left({w}_{ji}^{{\rm{{F}}},t}/{Z}_{j,t}^{w}-{w}_{i,o(j)}\right)}^{2}},$$
(72)

where \({Z}_{j,t}^{w}={\sum }_{i}{w}_{ji}^{{\rm{{F}}},t}/{\sum }_{i}{w}_{i,o(j)}\). For ease of comparison, in Fig. 6b the weight errors were scaled by 7/3, so that the initial error was similar to the errors shown in Fig. 6a.

Lifetime sparseness

For the measurement of the lifetime sparseness19, we first presented individual odors m = 1, 2, . . . , M, then recorded the activity of granule cells \(\{{\overline{c}}_{j}^{(m)}\}\). Subsequently, we calculated the sparseness using

$${S}_{j}\equiv \frac{{\left(\frac{1}{M}\mathop{\sum }\nolimits_{m = 1}^{M}{\overline{c}}_{j}^{(m)}\right)}^{2}}{\frac{1}{M}\mathop{\sum }\nolimits_{m = 1}^{M}{\left({\overline{c}}_{j}^{(m)}\right)}^{2}}.$$
(73)

The lifetime sparseness, Sj, takes a small value (Sj ≃ 0) if the activity is sparse, and approaches its upper bound, Sj = 1, if the activity is uniform across odors. Because of this, the lifetime sparseness is sometimes defined as \({\widetilde{S}}_{j}\equiv 1-{S}_{j}\)53.
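A sketch of this computation, applied to a matrix of single-odor responses, is given below; the small floor in the denominator is our addition to avoid division by zero for silent cells, and the demo responses are random placeholders.

```python
import numpy as np

def lifetime_sparseness(C):
    # Eq. (73). C: (M_odors, n_cells); C[m, j] is cell j's response to
    # odor m presented alone. Returns S_j per cell.
    mean_sq = C.mean(axis=0)**2
    sq_mean = np.maximum((C**2).mean(axis=0), 1e-12)  # floor (our addition)
    return mean_sq / sq_mean

rng = np.random.default_rng(8)
C = rng.random((10, 50))**8     # sparse-ish responses
print(lifetime_sparseness(C)[:5])
```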

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.