# High-Dimensional Brain: A Tool for Encoding and Rapid Learning of Memories by Single Neurons


## Abstract

Codifying memories is one of the fundamental problems of modern Neuroscience. The functional mechanisms behind this phenomenon remain largely unknown. Experimental evidence suggests that some of the memory functions are performed by stratified brain structures such as the hippocampus. In this particular case, single neurons in the CA1 region receive a highly multidimensional input from the CA3 area, which is a hub for information processing. We thus assess the implication of the abundance of neuronal signalling routes converging onto single cells on the information processing. We show that single neurons can selectively detect and learn arbitrary information items, given that they operate in high dimensions. The argument is based on stochastic separation theorems and the concentration of measure phenomena. We demonstrate that a simple enough functional neuronal model is capable of explaining: (i) the extreme selectivity of single neurons to the information content, (ii) simultaneous separation of several uncorrelated stimuli or informational items from a large set, and (iii) dynamic learning of new items by associating them with already “known” ones. These results constitute a basis for organization of complex memories in ensembles of single neurons. Moreover, they show that no a priori assumptions on the structural organization of neuronal ensembles are necessary for explaining basic concepts of static and dynamic memories.

## Keywords

Neural memories · Single-neuron learning · Perceptron · Stochastic separation theorems

## 1 Introduction

The human brain is arguably among the most sophisticated and enigmatic of nature's creations. Over millions of years it has evolved to amass billions of neurons, featuring on average \(86 \times 10^9\) cells (Herculano-Houzel 2012). This remarkable figure is several orders of magnitude higher than that of most mammals and several times larger than in other primates (Herculano-Houzel 2011). While accounting for roughly \(2\%\) of the body mass, the human brain consumes about \(20\%\) of the total energy (Clark and Sokoloff 1999).

The significant metabolic cost associated with a larger brain in humans, as opposed to a merely larger body (a path that great apes might have followed; Herculano-Houzel 2011), must be justified by evolutionary advantages. Some of the benefits may be related to the development of the remarkably rich social life of humans, which, in particular, requires extensive abilities for the formation of complex memories. Indirectly, this hypothesis is supported by the significant differences among species in the number of neurons in the cortex (Herculano-Houzel 2009) and the hippocampus (Andersen et al. 2007). For example, the CA1 area of the hippocampus contains \(0.39\times 10^6\) pyramidal neurons in rats, \(1.3\times 10^6\) in monkeys, and \(14\times 10^6\) in humans.

Evolutionary implications in relation to cognitive functions have been widely discussed in the literature (see, e.g., Platek et al. 2007; Sherwood et al. 2012; Sousa et al. 2017). Recently, it has been shown that in humans new memories can be learnt very rapidly, supposedly by individual neurons, from a limited number of experiences (Ison et al. 2015). Moreover, some neurons can exhibit remarkable selectivity to complex stimuli, evidence that has led to debates around the existence of the so-called "grandmother" and "concept" cells (Quiroga et al. 2005; Viskontas et al. 2009; Quiroga 2012) and their role as elements of declarative memory. These findings suggest that not only can the brain learn rapidly, but it can also respond selectively to "rare" individual stimuli. Moreover, experimental evidence indicates that such cognitive functionality can be delivered by single neurons (Ison et al. 2015; Quiroga et al. 2005; Viskontas et al. 2009). The fundamental questions, hence, are: How is this possible? And what could be the underlying functional mechanisms?

Recent theoretical advances achieved within the Blue Brain Project show that the brain can operate in many dimensions (Reimann et al. 2017). It is claimed that the brain has structures operating in up to eleven dimensions. Groups of neurons can form so-called *cliques*, i.e., networks of specifically interconnected neurons that generate precise representations of geometric objects; the dimension then grows with the number of neurons in the clique. A multidimensional representation of spatiotemporal information in the brain is also implied in the concept of generalized cognitive maps (see, e.g., Villacorta-Atienza et al. 2015; Calvo et al. 2016; Villacorta-Atienza et al. 2017). Within this theory, spatiotemporal relations between objects in the environment are encoded as static (cognitive) maps and represented as elements of an *n*-dimensional space (\(n\gg 1\)). The cognitive maps, as information items, can be learnt, classified, and retrieved on demand (Villacorta-Atienza and Makarov 2013). However, the questions of how the brain or individual neurons can distinguish among a huge number of different maps and select an appropriate one remain open.

In this work we propose that brain areas with a predominantly laminar topology and abundant signalling routes simultaneously converging on individual cells (e.g., the hippocampus) are propitious for high-dimensional processing and learning of complex information items. We show that a canonical neuronal model, the perceptron (Rosenblatt 1962), in combination with a Hebbian type of learning, may provide answers to the above-mentioned fundamental questions. In particular, starting from stochastic separation theorems (Gorban and Tyukin 2017, 2018), we demonstrate that individual neurons gathering multidimensional stimuli through a sufficiently large number of synaptic inputs can exhibit extreme selectivity either to individual information items or to groups of items. Moreover, neurons are capable of associating and learning uncorrelated information items. Thus, a large number of signalling routes simultaneously converging on a large number of single cells, as is widely observed in laminar brain structures, provides a natural environment for the rapid formation and maintenance of extensive memories. This is vital for social life and hence may constitute a significant evolutionary advantage, albeit at the cost of high metabolic expenditure.

## 2 Fundamental Problems of Encoding Memories

Experimental findings show that multiple CA1 pyramidal cells distributed in the rostro-caudal direction are activated near-synchronously by assemblies of simultaneously firing CA3 pyramidal cells (Ishizuka et al. 1990; Li et al. 1994; Benito et al. 2014). Thus, an ensemble of single neurons in CA1 can simultaneously receive the same synaptic input (Fig. 1b, left). Since these neurons have different topologies and functional connectivity (Finnerty and Jefferys 1993), their responses to the same input can differ. Moreover, experimental *in vivo* results show that long-term potentiation can significantly increase the spike transfer rate in the CA3–CA1 pathway (Fernandez-Ruiz et al. 2012). This suggests that the efficiency of individual synaptic contacts can be increased selectively.

In this work we follow a conventional and rather general functional representation of signalling in the neuronal pathways. We assume that upon receiving an input, a neuron can either generate a response or remain silent. The forms of the neuronal responses, as well as the definitions of synaptic inputs, vary from one model to another. Therefore, here we adopt a rather general functional approach. By a stimulus we understand a number of excitations arriving at a neuron simultaneously (or within a short time window) through several axons and thus transmitting some "spatially coded" information items (Benito et al. 2016). If a neuron responds to a stimulus (e.g., generates output spikes or increases its firing rate), we say that the neuron *detects* the informational content of the given stimulus.

We follow standard machine learning assumptions (Vapnik and Chapelle 2000; Cucker and Smale 2002). The stimuli are generated in accordance with some distribution or a set of distributions ("Outer World Models"). All stimuli that a neuron may receive are samples from this distribution. The sampling itself may be a complicated process; for simplicity we assume that all samples are independently and identically distributed (i.i.d.). Once a sample is generated, a sub-sample of stimuli is independently selected for testing purposes. If more than one neuron is considered, we will assume that a rule (or a set of rules) is in place that determines how a neuron is selected from the set. The rules can be either deterministic or randomized; in the latter case we will specify the process.

We can now formulate three fundamental problems of encoding memories:

1. *Selectivity: detection of one stimulus from a set* (Fig. 1C.1). Pick an arbitrary stimulus from a reasonably large set such that a single neuron from a neuronal ensemble detects this stimulus. What is the probability that this neuron is stimulus-specific, i.e., that it rejects all the other stimuli from the set?
2. *Clustering: detection of a group of stimuli from a set* (Fig. 1C.2). Within a set of stimuli we select a smaller subset, i.e., a group of stimuli. What is the probability that a neuron detecting all stimuli from this subset stays silent for all remaining stimuli in the set?
3. *Acquiring memories: learning a new stimulus by associating it with one already known* (Fig. 1C.3). Consider two different stimuli \(\varvec{s}_1\) and \(\varvec{s}_2\) such that for \(t\le t_0\) they do not overlap in time and a neuron detects \(\varvec{s}_1\) but not \(\varvec{s}_2\). In the next interval \((t_0,t_1]\), \(t_1>t_0\), the stimuli start to overlap in time (i.e., they stimulate the neuron together). For \(t> t_1\) the neuron receives only stimulus \(\varvec{s}_2\). What is the probability that for some \(t_2\ge t_1\) the neuron detects \(\varvec{s}_2\)?

## 3 Formal Statement of the Problem

In this section we specify the information content to be processed by neurons and define a mathematical model of a generic neuron equipped with synaptic plasticity. Before going any further, let us first introduce notational agreements used throughout the text. Given two vectors \(\varvec{x},\varvec{y}\in \mathbb {R}^n\), their inner product \(\langle \varvec{x}, \varvec{y} \rangle \) is: \(\langle \varvec{x}, \varvec{y} \rangle =\sum _{i=1}^n x_i y_i\). If \(\varvec{x}\in \mathbb {R}^n\) then \(\Vert \varvec{x}\Vert \) stands for the usual Euclidean norm of \(\varvec{x}\): \(\Vert \varvec{x}\Vert =\langle \varvec{x},\varvec{x}\rangle ^{1/2}\). By \(B_n(1)=\{\varvec{x}\in \mathbb {R}^n | \ \ \Vert \varvec{x}\Vert \le 1\}\) we denote a unit *n*-ball centered at the origin; \(\mathcal {V}(\Xi )\) is the Lebesgue volume of \(\Xi \subset \mathbb {R}^n\), and \(|{\mathcal {M}}|\) is the cardinality of a finite set \({\mathcal {M}}\). Symbol \({\mathcal {C}}({\mathcal {D}})\), \({\mathcal {D}}\subseteq \mathbb {R}^m\) stands for the space of continuous real-valued functions on \({\mathcal {D}}\).

### 3.1 Information Content and Classes of Stimuli

An information item *i* is modeled by a function \(\varvec{s}:\mathbb {R}\times \mathbb {R}^n\rightarrow \mathbb {R}^n\), whose *n* components describe the excitations transmitted through *n* individual "axons". An example of an information item could be an \(l\times k\) image (see Fig. 2); in this case the dimension of each information item is \(n = l\times k\). Each stimulus also has a *context*, i.e., the time window in which the stimulus arrives at the neuron; for the sake of simplicity we use a rectangular window.

The number of background items *M* is large but finite, and \(m\ge 1\) is in general smaller than *M*. The set \({\mathcal {M}}\) contains the *background* content for a given neuron, whereas the set \({\mathcal {Y}}\) models the informational content *relevant* to the task at hand. In other words, to accomplish a static memory task the neuron should be able to detect all elements from \({\mathcal {Y}}\) and to reject all elements from \({\mathcal {M}}\).

### 3.2 Neuronal Model

We consider a neuron with *n* synaptic inputs (Fig. 2), where \(\varvec{w}\in \mathbb {R}^n\) is the vector of synaptic weights and \(\bar{\varvec{s}}(t)\) denotes the stimulus content arriving at time *t*. Its membrane potential, \(y \in \mathbb {R}\), is given by the weighted sum of its inputs,
$$\begin{aligned} y(t) = \langle \varvec{w}, \bar{\varvec{s}}(t) \rangle , \end{aligned}$$(8)
and its output is determined by the threshold activation
$$\begin{aligned} \mathrm {out}(t) = f\big (y(t)-\theta \big ), \end{aligned}$$(9)
where \(\theta \) is the firing threshold and *f* is locally Lipschitz, \(f(u)=0\) for \(u\in (-\infty ,0]\), and \(f(u)>0\) for \(u\in (0,\infty )\).

Model (8), (9) captures the summation of postsynaptic potentials and the threshold nature of the neuronal activation but disregards the specific dynamics accounted for in other more advanced models. Nevertheless, as we will show in Sect. 4, this phenomenological model is already sufficient to explain the fundamental properties of information processing discussed in Sect. 2.
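The model just described admits a compact computational sketch. The snippet below is an illustration under stated assumptions (an inner-product membrane potential and a ReLU as one admissible choice of *f*), not the authors' implementation:

```python
import numpy as np

def neuron_response(w, s, theta, f=lambda u: np.maximum(u, 0.0)):
    """Perceptron-type neuron: membrane potential <w, s>, output f(<w, s> - theta).

    The ReLU default satisfies the stated conditions on f: locally Lipschitz,
    f(u) = 0 for u <= 0 and f(u) > 0 for u > 0.
    """
    y = np.dot(w, s)      # membrane potential: weighted sum of synaptic inputs
    return f(y - theta)   # nonzero output only above the firing threshold

# A neuron tuned to an item x responds to x and stays silent for an
# orthogonal item (a toy 3-dimensional example).
x = np.array([1.0, 0.0, 0.0])
w = x / np.linalg.norm(x)
print(neuron_response(w, x, theta=0.5))                     # 0.5 (detected)
print(neuron_response(w, np.array([0.0, 1.0, 0.0]), 0.5))   # 0.0 (silent)
```

Any other activation with the same one-sided-zero property (e.g., a saturating sigmoid shifted to vanish below threshold) would serve equally well here.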

### 3.3 Synaptic Plasticity

We assume that plastic changes of the synaptic weights are gated by the neuronal response, *v*. We thus arrive at a modified classical Oja rule (Oja 1982):
$$\begin{aligned} \dot{\varvec{w}} = \alpha \, v(\bar{\varvec{s}},\varvec{w},\theta ) \left[ \langle \bar{\varvec{s}},\varvec{w} \rangle \bar{\varvec{s}} - \langle \bar{\varvec{s}},\varvec{w} \rangle ^2 \varvec{w} \right] , \quad \varvec{w}(t_0)=\varvec{w}_0, \ \varvec{w}_0\ne 0, \end{aligned}$$(10)
where \(\alpha >0\) is the learning rate. The gating term *v* in (10) ensures that plastic changes of \(\varvec{w}\) occur only when an input stimulus evokes a nonzero neuronal response. The fact that \(\varvec{w}_0\ne 0\) reflects the assumption that synaptic connections have already been established, albeit their efficacy may be subject to plastic changes. In addition to capturing the general principle of the classical Hebbian rule, model (10) guarantees that the synaptic weights \(\varvec{w}\) remain bounded in forward time (see "Appendix A") and hence conforms with physiological plausibility.
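A discrete-time sketch illustrates the two features claimed for the gated rule: plasticity only for supra-threshold responses, and boundedness of the weights. The step size, the binary gate, and the exact update form below are our own illustrative assumptions, not the paper's precise discretization:

```python
import numpy as np

def oja_step(w, s, theta, alpha=0.1):
    """One response-gated, Oja-type update (an illustrative discretization)."""
    y = np.dot(w, s)                  # membrane potential
    v = 1.0 if y > theta else 0.0     # gate: plastic change only when active
    # Hebbian growth term y*s balanced by the Oja decay y**2 * w, which
    # keeps the weight norm bounded.
    return w + alpha * v * (y * s - y**2 * w)

rng = np.random.default_rng(0)
s = rng.standard_normal(50)
s /= np.linalg.norm(s)                # a fixed unit-norm stimulus
w = 0.6 * s                           # initial weights, already supra-threshold
for _ in range(200):
    w = oja_step(w, s, theta=0.3)
print(round(float(np.linalg.norm(w)), 3))   # 1.0: the norm settles at the Oja fixed point
```

Repeated supra-threshold stimulation drives the weight norm to unity rather than letting it grow without bound, which is the physiological-plausibility point made in the text.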

## 4 Formation of Memories in High Dimensions

In Sect. 2 we formulated three fundamental problems of organization of memories in laminar brain structures. Let us now show how they can be treated given that pyramidal neurons operate in high dimensions.

### 4.1 Extreme Selectivity of a Single Neuron to Single Stimuli

Consider the case when the set \({\mathcal {Y}}\) in (6) contains only one element, i.e., \(|{\mathcal {Y}}|=1\), \({\mathcal {Y}}=\{\varvec{x}_{M+1}\}\), whereas the set \({\mathcal {M}}\) is allowed to be sufficiently large (\(|{\mathcal {M}}|=M\gg 1\)). Let us also assume that the stimuli with different information content, \(\varvec{s}(\cdot ,\varvec{x}_i)\), do not overlap in time, i.e., we present them to a neuron one by one.

A neuron is said to detect a stimulus if it generates a nonzero response whenever the stimulus is present at time *t*, and vice-versa.

Once a neuron that detects the relevant information item, i.e., \(\varvec{x}_{M+1}\), is specified, we can proceed with assessing its selectivity properties.

### Definition 1

(*Neuronal Selectivity*) We say that a *neuron is selective to the information content* \({\mathcal {Y}}\) iff it detects the relevant stimuli from the set \({\mathcal {S}}({\mathcal {Y}})\) and ignores all the others from the set \({\mathcal {S}}({\mathcal {M}})\).

Figure 4 shows typical responses of neurons parameterized by different pairs \((\varvec{w},\theta )\) and subjected to stimulation by different information items \(\varvec{x}_i\). Here \(\varvec{x}_i\) correspond to \((30 \times 38)\)-pixel color images (i.e., \(\varvec{x}_i \in \mathbb {R}^{3420}\)). The firing thresholds \(\theta \) were chosen at random, and the weights \(\varvec{w}\) were set in accordance with (13), with the first three images serving as the relevant information items for the three corresponding neurons. No plastic changes in \(\varvec{w}\) were allowed. The neurons detect their own (relevant) stimuli, as expected. Moreover, they do not respond to stimulation by the other background information items (4 out of \(10^3\) images are shown in Fig. 4). Thus, the neurons indeed exhibit high stimulus selectivity.

The following theorem provides theoretical justification for these observations.

### Theorem 1

1. The probability that the neuron is silent for all background stimuli \(\varvec{s}_i\in {\mathcal {S}}({\mathcal {M}})\) is bounded from below by
$$\begin{aligned} \begin{aligned}&P( \varvec{s}_i \in \mathrm {Silent}({\mathcal {S}}({\mathcal {M}}),(\varvec{w},\theta )) \ \forall \varvec{s}_i\in {\mathcal {S}}({\mathcal {M}}) \big | \ \varvec{w},\theta ) \ge \\&\quad \ge \left[ 1-\frac{1}{2} \left( 1 - \frac{\theta ^2}{\Vert \varvec{w}\Vert ^2} \right) ^\frac{n}{2} \right] ^M. \end{aligned} \end{aligned}$$(15)
2. There is a family of sets parametrized by *D* (\(0<D<\min \{\frac{1}{2}, \Vert \varvec{x}_{M+1}\Vert \}\)),
$$\begin{aligned} \varOmega _D=\Big \{ (\varvec{w},\theta ) \big | \ \ \Vert \varvec{w}-\varvec{w}^{*} \Vert <D, \ D \le \Vert \varvec{x}_{M+1}\Vert - \theta \le 2D \Big \}, \end{aligned}$$(16)
where \(\varvec{w}^{*}=\varvec{x}_{M+1}/\Vert \varvec{x}_{M+1}\Vert \), such that \(\varvec{s}_{M+1}\in \mathrm {Activated}({\mathcal {S}}({\mathcal {Y}}),(\varvec{w},\theta ))\) for all \((\varvec{w},\theta )\in \varOmega _D\), and
$$\begin{aligned} \begin{aligned}&P\big ( \varvec{s}_i \in \mathrm {Silent}({\mathcal {S}}({\mathcal {M}}),(\varvec{w},\theta )) \ \forall \varvec{s}_i\in {\mathcal {S}}({\mathcal {M}})\big | \ \forall (\varvec{w},\theta )\in \varOmega _D\big ) \ge \\&\quad \ge \max _{\varepsilon \in (0,1-2D)} (1-(1-\varepsilon )^n) \left[ 1-\frac{1}{2} \rho (\varepsilon ,D)^{\frac{n}{2}} \right] ^M, \end{aligned} \end{aligned}$$(17)
where
$$\begin{aligned} \rho (\varepsilon ,D)= 1 - \left( \frac{1-\varepsilon -2D}{1+D}\right) ^2. \end{aligned}$$

The proof is provided in “Appendix B”.
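The bound in part 1 is straightforward to evaluate. The following sketch (with parameter values chosen by us purely for illustration) shows how quickly it approaches one as *n* grows:

```python
def silence_bound(n, M, theta, w_norm):
    """Right-hand side of bound (15): a lower bound on the probability that
    the neuron rejects all M background stimuli from the unit n-ball."""
    return (1.0 - 0.5 * (1.0 - theta**2 / w_norm**2) ** (n / 2)) ** M

# Even with M = 1000 background items, the bound approaches 1 as n grows.
for n in (10, 50, 200):
    print(n, round(silence_bound(n, M=1000, theta=0.5, w_norm=1.0), 4))
```

At low dimension the bound is vacuous (close to zero), while already at a few hundred dimensions it exceeds 0.999, matching the qualitative picture in Fig. 5a.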

### Remark 1

For an admissible fixed \(D>0\), the volume \({\mathcal {V}}(\varOmega _D)>0\). Therefore, the estimate provided by Theorem 1 is robust to small perturbations of \((\varvec{w},\theta )\), and slight fluctuations of neuronal characteristics are not expected to affect neuronal functionality.

### Remark 2

Theorem 1 (part 2) specifies a non-iterative procedure for constructing sets of selective neurons. Such neurons detect given stimuli and reject the others, with high probability. Figure 3 (in brown) shows examples of three projections of the hypercylinders (16) ensuring robust selective stimulus detection. The smaller the cylinder, the higher the selectivity.

In numerical experiments we varied the neuronal dimension *n* and generated two random sets of information items comprising \(10^3\) elements each, i.e., \(\{\varvec{x}_i\}_{i=1}^{10^3}\). One set was sampled from the equidistribution in a unit ball \(B_n(1)\) centered at the origin (i.e., \(\Vert \varvec{x}_i\Vert _2 \le 1\)), and the other from the equidistribution in the hypercube \(\Vert \varvec{x}_i\Vert _{\infty } \le 1\) (a product distribution). For each set of informational items, a neuronal ensemble of \(10^3\) single neurons parameterized by \((\varvec{w}_i,\theta _i)\) was created. Each neuron was assigned a fixed firing threshold \(\theta _i = 0.5\), \(i=1,\dots ,10^3\), whereas the synaptic efficiencies were set as \(\varvec{w}_i=(\theta _i+\epsilon )\varvec{x}_i/\Vert \varvec{x}_i\Vert \), \(\epsilon =0.05\). For these neuronal ensembles and their corresponding stimulus sets we evaluated the output of each neuron and assessed the neuronal selectivity (see Def. 1). The procedure was repeated 10 times, followed by evaluation of the frequencies of selective neurons in the pool for each *n*.

Figure 5a shows the frequencies of selective neurons in an ensemble, for \(10^3\) stimuli taken from (i) a unit ball (red) and (ii) a hypercube (blue), together with (iii) the estimate provided by Theorem 1 (dashed). For small *n* (\(n < 6\)) neurons exhibit no selectivity, i.e., they confuse different stimuli and generate nonspecific responses. As expected, when the neuronal dimensionality *n* increases, the neuronal selectivity grows rapidly, and at around \(n = 20\) it approaches \(100\%\).
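This experiment is easy to reproduce in miniature. The sketch below follows the description above (uniform sampling in \(B_n(1)\), \(\theta _i=0.5\), \(\varvec{w}_i=(\theta _i+\epsilon )\varvec{x}_i/\Vert \varvec{x}_i\Vert \)) but with a smaller ensemble, for speed; exact frequencies will differ from the paper's:

```python
import numpy as np

def selectivity_frequency(n, n_items=200, theta=0.5, eps=0.05, seed=0):
    """Fraction of neurons that detect their own item and reject all others,
    for items equidistributed in the unit n-ball (a reduced-size version of
    the experiment in the text)."""
    rng = np.random.default_rng(seed)
    # Uniform sampling in B_n(1): uniform direction, radius ~ U**(1/n).
    d = rng.standard_normal((n_items, n))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    x = d * (rng.random(n_items) ** (1.0 / n))[:, None]
    # One neuron per item: w_i = (theta + eps) * x_i / ||x_i||.
    w = (theta + eps) * x / np.linalg.norm(x, axis=1, keepdims=True)
    responds = (w @ x.T) > theta            # responds[i, j]: neuron i fires for item j
    own = np.diag(responds)                 # neuron i detects its own item
    others = responds.sum(axis=1) - own.astype(int)
    return float((own & (others == 0)).mean())

for n in (3, 10, 30):
    print(n, selectivity_frequency(n))      # selectivity grows with dimension
```

With these reduced sizes the frequency is near zero in 3 dimensions and close to one by a few tens of dimensions, mirroring the trend in Fig. 5a.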

### 4.2 Extreme Selectivity of a Single Neuron and Ensemble Memory Capacity

The property of a neuron to respond selectively to a single element from a large set of stimuli can be related to the notion of *memory capacity* of a neuronal ensemble comprising a set of selective neurons.

Recall that in the framework of associative memory (Hopfield 1982), for each informational item (pattern) \(\varvec{x}_i\) from the set \({\mathcal {M}}\) there is a vicinity \({\mathcal {V}}_i\) associated with \(\varvec{x}_i\) and corresponding to all admissible perturbations of \(\varvec{x}_i\). Suppose that for each \(\varvec{x}_i\) there is a neuron in the ensemble that is activated for all stimuli with informational content \(\varvec{x}\) in \({\mathcal {V}}_i\) and is silent for all other stimuli, i.e., for stimuli with \(\varvec{x}\) in \(\cup _{j\ne i}{\mathcal {V}}_j\). The maximal size of the set \({\mathcal {M}}\) for which this property holds will be referred to as the *(absolute) memory capacity* of the ensemble (cf. Hopfield 1982; Barrett et al. 2004; Leung et al. 1995).

This conventional mechanistic definition of memory capacity, however, is too restrictive to account for the variability and uncertainty that biological neuronal ensembles and systems have to deal with. Indeed, informational items themselves may bear a degree of uncertainty, resulting in \({\mathcal {V}}_i\cap {\mathcal {V}}_j\ne \varnothing \) for some *i*, *j*, \(i\ne j\). Furthermore, errors in memory retrieval are known to occur in classical artificial associative memory models too (see, e.g., Hopfield 1982; Amit et al. 1985; Leung et al. 1995). To be able to formally quantify such errors in relation to the number of informational items an ensemble is to store, we extend the classical notion as follows.

Suppose that for each \(\varvec{x}_i\) there is a neuron in the ensemble that is activated for all stimuli with informational content \(\varvec{x}\in {\mathcal {V}}_i\) and, with probability \(\phi \), is silent for all stimuli with \(\varvec{x}\in {\mathcal {V}}_j\), \(j\ne i\). The maximal size of the set \({\mathcal {M}}\) for which this property holds will be referred to as the *memory capacity with reliability* \(\phi \) of the ensemble.

Assuming that \({\mathcal {V}}_i\) are sufficiently small, an estimate of the memory capacity with reliability \(\phi \) of a neuronal ensemble follows from Theorem 1.

### Corollary 1

The memory capacity with reliability \(\phi \) of the ensemble grows at least exponentially with the neuronal dimension *n*.

The proof is given in “Appendix C”.

Figure 5b illustrates how the memory capacity with reliability \(\phi \) grows with the neuronal dimension *n*. For each neuronal dimension *n* we generated i.i.d. samples \({\mathcal {M}}\) with \(|{\mathcal {M}}|=M\) from the equidistribution in \(B_n(1)\) and the *n*-cube \([-1,1]^n\). For each sample, we defined neuronal ensembles comprising *M* neurons with synaptic weights \(\varvec{w}_i=\varvec{x}_i/\Vert \varvec{x}_i\Vert \) and thresholds \(\theta _i=0.5\), and calculated the proportion of neurons in the ensemble activated by each stimulus. If the proportion was smaller than 0.05 of the total number of neurons, we incremented the value of *M*, generated a new sample \({\mathcal {M}}\) with increased cardinality *M*, and repeated the experiment. The values of *M* corresponding to samples at which the process stopped were recorded and retained. These constituted empirical estimates of the maximal number of stimuli for which the proportion of neurons responding to a single stimulus is at most \(0.05=1-\phi \). Figure 5b shows the empirical means of these numbers for the unit ball and the hypercube. As follows from these observations, memory capacity grows exponentially with the neuronal dimension in both cases. Such fast growth can easily cover quite demanding memory needs.
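Solving the part-1 bound of Theorem 1 for *M* at a fixed reliability gives a quick feel for this exponential growth. The expression below is derived by us from that bound and is not the exact Corollary 1 formula; parameter choices are illustrative:

```python
import math

def capacity_estimate(n, theta=0.5, w_norm=1.0, phi=0.95):
    """Largest M with (1 - z)**M >= phi, where
    z = 0.5 * (1 - theta**2 / w_norm**2)**(n / 2).
    Obtained by solving bound (15) for M at reliability phi."""
    z = 0.5 * (1.0 - theta**2 / w_norm**2) ** (n / 2)
    return math.floor(math.log(phi) / math.log(1.0 - z))

for n in (20, 40, 80):
    print(n, capacity_estimate(n))   # the admissible M grows exponentially with n
```

Doubling the dimension multiplies the admissible number of items by orders of magnitude, consistent with the empirical curves in Fig. 5b.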

### 4.3 Selectivity of a Single Neuron to Multiple Stimuli

To organize memories, the ability to associate different information items is essential (Fig. 1C.2). To determine whether such associations are feasible at the level of single neurons, we assess neuronal selectivity to multiple stimuli. In particular, we consider the set \({\mathcal {Y}}\) [Eq. (6)] containing \(m>1\) random vectors: \({\mathcal {Y}}=\{\varvec{x}_{M+1},\dots , \varvec{x}_{M+m}\}\). As in Sect. 4.1, we assume here that the stimuli do not overlap in time and arrive at the neuron separately. The question of interest is: Can we find a neuron [i.e., parameters \((\varvec{w},\theta )\)] that would generate a nonzero response to all \(\varvec{s}_i\in {\mathcal {S}}({\mathcal {Y}})\) and, with high enough probability, would be silent for all \(\varvec{s}_i \in {\mathcal {S}}({\mathcal {M}})\)?

It turns out that such a separation is possible, with probability close to one, provided that the neuronal dimension, *n*, is large enough. Moreover, the separation can be achieved by a neuron with the vector of synaptic weights, \(\varvec{w}=\varvec{w}^*\), closely aligned with the normalized mean vector of the stimulus set \({\mathcal {Y}}\): \(\varvec{w}^*=\bar{\varvec{x}}/\Vert \bar{\varvec{x}}\Vert \), where \(\bar{\varvec{x}}\) denotes the mean of the elements of \({\mathcal {Y}}\).

### Theorem 2

The proof is provided in “Appendix D”. The theorem admits the following corollary.

### Corollary 2

### Remark 3

Estimates (21), (22) hold for all feasible values of \(\varepsilon \) and \(\delta \). Maximizing the r.h.s. of (21), (22) over the feasible domain of \(\varepsilon \) and \(\delta \) provides the lower-bound "optimistic" estimates of the neuron's performance.

### Remark 4

The term \(\theta ^*\) in Theorem 2 and Corollary 2 is an upper bound for the firing threshold \(\theta \). The larger the value of \(\theta \), the higher the neuronal selectivity to multiple stimuli. The value of \(\theta ^*\), however, decays with the number of stimuli *m*.

The extent to which the decay mentioned in Remark 4 affects neuronal selectivity to a group of stimuli depends largely on the neuronal dimension, *n*. Note also that the probability of neuronal selective response to multiple stimuli, as provided by Theorem 2, can be much larger if elements of the set \({\mathcal {Y}}\) are spatially close to each other or positively correlated (Tyukin et al. 2017) (see also Lemma 4 in “Appendix F”).

### Remark 5

A similar exponential growth of the memory capacity with the neuronal dimension *n* holds here as well. Indeed, denoting \(\phi =(1-z)^{\overline{M}}\), letting \(z=1/2 \varDelta ^{n/2}\) (with \(\varDelta \) defined in Theorem 2) and invoking (34), (35) from the proof of Corollary 1, we observe that the corresponding capacity estimate follows.

To illustrate Theorem 2 we conducted several numerical experiments. For each *n* we generated \(M=10^3\) background information items \(\varvec{x}_i\) (the set \({\mathcal {M}}\)) and \(m=2, 5, 8\) relevant vectors (the sets \({\mathcal {Y}}\)). In the first group of experiments all \(M+m\) i.i.d. random vectors were drawn from the equidistribution in \(B_n(1)\). Neuronal parameters were set in accordance with Theorem 2 [i.e., Eqs. (19)–(21)]. Figure 6a illustrates the results.

Similarly to the case of neuronal selectivity to a single item (Fig. 5a), we observe a steep growth of the selectivity index with the neuronal dimension. The sharp increase occurs, however, at significantly higher dimensions. The number of random and uncorrelated stimuli, *m*, to which a neuron should be able to respond selectively is fundamentally linked to the neuron dimensionality. For example, the probability that a neuron is selective to \(m=5\) random stimuli becomes sufficiently high only at \(n > 400\). This contrasts sharply with \(n=120\) for \(m=2\).

Our numerical experiments also show that the firing threshold specified in Theorem 2 for arbitrarily chosen fixed values of \(\delta \) and \(\varepsilon \) is not optimal in the sense of providing the best possible probability estimates. Varying \(\theta \), one can observe that the values of *n* at which neuronal selectivity to multiple stimuli starts to emerge are in fact significantly lower than those predicted by Eq. (22). This is not surprising. First, since estimate (22) holds for all admissible values of \(\delta \) and \(\varepsilon \), it must also hold for the maximizer of \(p(\varepsilon ,\delta ,D,m)\). Second, the estimate is conservative in the sense that it is based on conservative estimates of the volume of spherical caps \({\mathcal {C}}_n\) (see, e.g., the proof of Theorem 1). Deriving more accurate numerical expressions for the latter is possible, although at the expense of simplicity.

To demonstrate that the dependence of the selectivity index on the firing threshold is likely to hold qualitatively for broader classes of distributions from which the sets \({\mathcal {M}}\) and \({\mathcal {Y}}\) are drawn, we repeated the simulation for the equidistribution in an *n*-cube centered at the origin. In this case, Theorem 2 does not formally apply; yet an equivalent statement can still be produced (cf. Gorban and Tyukin 2017). In these experiments the synaptic weights were set to \(\varvec{w}=\bar{\varvec{x}}/\Vert \bar{\varvec{x}}\Vert \) and \(\theta = 0.5\Vert \bar{\varvec{x}}\Vert \). The results are shown in Fig. 6b. The neuron's performance in the cube is markedly better than that in \(B_n(1)\). Interestingly, this is somewhat contrary to expectations that might have been induced by our earlier experiments (shown in Fig. 5), in which neuronal selectivity to a single stimulus was more pronounced for \(B_n(1)\).

Overall, these results suggest that single neurons can indeed separate random uncorrelated information items from a large set of background items with probability close to one. This gives rise to a possibility for a neuron to respond selectively to various arbitrary uncorrelated information items simultaneously. The latter property provides a natural mechanism for accurate and precise grouping of stimuli in single neurons.
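The weight construction used in the cube experiment (\(\varvec{w}=\bar{\varvec{x}}/\Vert \bar{\varvec{x}}\Vert \), \(\theta =0.5\Vert \bar{\varvec{x}}\Vert \)) can be checked directly. A toy sketch follows, with sizes and seed of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, m = 600, 1000, 5                       # dimension, background items, relevant items
X_bg = rng.uniform(-1.0, 1.0, size=(M, n))   # background set (n-cube)
X_rel = rng.uniform(-1.0, 1.0, size=(m, n))  # relevant group

xbar = X_rel.mean(axis=0)                    # mean vector of the relevant group
w = xbar / np.linalg.norm(xbar)              # weights aligned with the group mean
theta = 0.5 * np.linalg.norm(xbar)           # threshold as in the cube experiment

detects_all = bool((X_rel @ w > theta).all())  # fires for every relevant item?
false_pos = int((X_bg @ w > theta).sum())      # background items crossing threshold
print(detects_all, false_pos)
```

With these sizes the neuron fires for the whole relevant group while essentially no background item crosses the threshold, in line with the behavior reported for Fig. 6b.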

### 4.4 Dynamic Memory: Learning New Information Items by Association

In the previous sections we dealt with a static model of neuronal functions, i.e., when the synaptic efficiency \(\varvec{w}\) either did not change at all or the changes were negligibly small over large intervals of stimuli presentation. In the presence of synaptic plasticity (10), the latter case corresponds to \(0\le \alpha \ll 1\) in (10). In this section we explicitly account for the time evolution of the synaptic efficiency, \(\varvec{w}(t,\varvec{w}_0)\) [Eq. (10)]. As we will see below, this may give rise to dynamic memories in single neurons.

We assume that the stimulation consists of a learning phase, in which the relevant stimuli may overlap in time, followed by a retrieval phase, and that the neuronal dimension *n* is large enough.

The question is: What is the probability that, during the learning phase, the synaptic weights \(\varvec{w}(t,\varvec{w}_0)\) evolve in time so that the neuron becomes responsive to all \(\varvec{s}_i\in {\mathcal {S}}({\mathcal {Y}})\) while remaining silent for all \(\varvec{s}_i\in {\mathcal {S}}({\mathcal {M}})\) (Fig. 1C.3)? In other words, does the neuron learn new items and recognize them in the retrieval phase? The following theorem provides an answer to this question.

### Theorem 3

1. There exist \(L,\kappa >0\) such that
$$\begin{aligned} \int _{t}^{t+L} v(\bar{\varvec{s}}(\tau ),\varvec{w}(\tau ,{\varvec{w}_0}),\theta ) \langle \bar{\varvec{s}}(\tau ),\varvec{w}(\tau ,{\varvec{w}_0}) \rangle ^2 {d\tau } > \kappa , \ \ \forall \ t\ge t_0. \end{aligned}$$
2. The firing threshold, \(\theta \), satisfies
$$\begin{aligned} 0<\theta < \frac{(1-\varepsilon )^3 - \delta (m-1)}{\sqrt{m(1-\varepsilon )[(1-\varepsilon )+\delta (m-1)]}}={\theta ^*}. \end{aligned}$$

Figure 7 illustrates the theorem numerically. First, we assumed that the relevant set \({\mathcal {Y}}\) consists of \(m =2\) items. One of them is considered "known" to the neuron (Fig. 7a, green). Its informational content, \(\varvec{x}_{M+1}\), satisfies \(\langle \varvec{w}_0, \varvec{x}_{M+1}\rangle >\theta \), i.e., this stimulus evokes a membrane potential above the threshold at \(t=t_0\). Consequently, the neuron detects this stimulus selectively, as described in Sect. 4.1. For the second relevant stimulus (Fig. 7a, orange), however, we have \(\langle \varvec{w}_0,\varvec{x}_{M+2} \rangle < \theta \). Therefore, the neuron cannot detect this stimulus alone. The background stimuli from the set \({\mathcal {S}}({\mathcal {M}})\) are also sub-threshold (Fig. 7a, black curves).

During the learning phase, the neuron receives \(M=500\) background and \(m=2\) relevant stimuli. The relevant stimuli from the set \({\mathcal {S}}({\mathcal {Y}})\) appear simultaneously, i.e., they are temporally associated. The synaptic efficiency changes during the learning phase under the action of the relevant stimuli. Therefore, the membrane potential, \(y(t) = \langle \varvec{w}(t,{\varvec{w}_0}),\bar{\varvec{s}}(t) \rangle \), progressively increases when the relevant stimuli arrive (Fig. 7a, green area). These neuronal adjustments give rise to a new functionality.

At some time instant (marked by the red circle in Fig. 7a) the neuron becomes responsive to the new relevant stimulus (Fig. 7a, orange), which is synchronized with the “known” one. Note that all other background stimuli, which show no temporal associativity, remain below the threshold (Fig. 7a, black traces). Thus, after a transient period, the neuron learns the new stimulus. Once the learning is over, the neuron selectively detects either of the two relevant stimuli.
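The association mechanism just described can be sketched in a few lines of Python. This is a minimal illustration, not the exact model of Eqs. (9)–(10): the dimension, threshold, learning rate, and the Oja-style weight normalization are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 400, 500            # dimension and number of background items (assumed)
theta, alpha = 0.5, 0.1    # firing threshold and learning rate (assumed)

def unit(v):
    return v / np.linalg.norm(v)

# Informational content: random directions in R^n, a stand-in for the
# paper's equidistribution in the unit ball B_n(1)
background = rng.standard_normal((M, n))
background /= np.linalg.norm(background, axis=1, keepdims=True)
known = unit(rng.standard_normal(n))
new = unit(rng.standard_normal(n))

w = known.copy()   # w0 chosen so that the "known" item is supra-threshold

# Learning phase: the known and new items arrive simultaneously (temporal
# association), so the input is their superposition. The Hebbian update is
# gated by the firing threshold and normalized Oja-style to keep ||w|| = 1
# (an assumption of this sketch).
s = unit(known + new)
for _ in range(50):
    y = w @ s
    if y > theta:
        w = unit(w + alpha * y * s)

print("known supra-threshold:", known @ w > theta)
print("new   supra-threshold:", new @ w > theta)
print("max background response:", (background @ w).max())
```

After learning, both relevant items evoke responses near \(1/\sqrt{2}\) (above the threshold), while the responses to the 500 independent background items stay of order \(1/\sqrt{n}\) and remain sub-threshold.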

The procedure just described can be used to associate more than two relevant stimuli. Figure 7b shows examples for \(m=4\) and \(m=12\). In both cases the neuron was able to learn all relevant stimuli, while rejecting all background ones. We observed, however, that increasing the number of uncorrelated information items to be learnt, i.e., the value of *m*, reduces the gap between the firing threshold and the membrane potentials evoked by background stimuli. In other words, the neuron does detect the assigned group of new stimuli, but with lower accuracy. This behavior is consistent with the theoretical bound on \(\theta \) prescribed in the statement of Theorem 3.
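The shrinking margin can be read off directly from the bound \(\theta ^*\) in Theorem 3: as *m* grows, the admissible range \((0,\theta ^*)\) for the firing threshold contracts. A quick check, with \(\varepsilon \) and \(\delta \) set to illustrative values (they are parameters of the theorem, not quantities fixed by the paper):

```python
# Upper bound theta* on the firing threshold from Theorem 3.
# eps = 0.1 and delta = 0.05 are illustrative assumptions.
def theta_star(m, eps=0.1, delta=0.05):
    num = (1 - eps) ** 3 - delta * (m - 1)
    den = (m * (1 - eps) * ((1 - eps) + delta * (m - 1))) ** 0.5
    return num / den

for m in (2, 4, 12):
    print(m, theta_star(m))   # theta* decreases monotonically with m
```

For these parameter values the bound drops by an order of magnitude between \(m=2\) and \(m=12\), mirroring the loss of detection accuracy seen in Fig. 7b.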

## 5 Discussion

Theorems 1–3 and our numerical simulations demonstrate that the extreme neuronal selectivity to single and multiple stimuli, and the capability to learn uncorrelated stimuli observed in a range of empirical studies (Quiroga et al. 2005; Viskontas et al. 2009; Ison et al. 2015), can be explained by simple functional mechanisms implemented in single neurons. The following basic phenomenological properties have been used to arrive at this conclusion: (i) the dimensionality *n* of the information content and of the neurons is sufficiently large, (ii) a perceptron neuronal model, Eq. (9), is an adequate representation of the neuronal response to stimuli, and (iii) plasticity of the synaptic efficiency is governed by the Hebbian rule (10). A crucial consequence of our study is that no a priori assumptions on the structural organization of neuronal ensembles are necessary for explaining basic concepts of static and dynamic memories.

Our approach does not take into account more advanced neuronal behaviors reproduced by, e.g., models of spike-timing-dependent plasticity (Markram et al. 1997) and firing threshold adaptation (Fontaine et al. 2014). Nevertheless, our model captures essential properties of neuronal dynamics and as such is generic enough for the purpose of functional description of memories.

Firing threshold adaptation, as reported in Fontaine et al. (2014), steers the firing activity of a stimulated neuron to a homeostatic state. In this state, the value of the threshold is just large or small enough to maintain a reasonable firing rate without over- or under-excitation. In our model, such a mechanism could be achieved by setting the value of \(\theta \) sufficiently close to the highest feasible values specified in Theorems 1 and 2.

In addition to a rather general model of neuronal behavior, another major theoretical assumption of our work was that the informational content of stimuli is drawn from an equidistribution in a unit ball \(B_n(1)\). This assumption, however, can be relaxed, and the results of Theorems 1–3 generalized to product measures. Key ingredients of such generalizations are provided in Gorban and Tyukin (2017), and their practical feasibility is illustrated by numerical simulations with information items randomly drawn from a hypercube (Figs. 5, 6, 7).

Our theoretical and numerical analysis revealed an interesting hierarchy of cognitive functionality implementable at the level of single neurons. We have shown that cognitive functionality develops with the dimensionality or connectivity parameter *n* of single neurons. This reveals explicit relationships between levels of the neural connectivity in living organisms and different cognitive behaviors such organisms can exhibit (cf. Lobov et al. 2017). As we can see from Theorems 1, 2 and Figs. 5 and 6, the ability to form static memories increases monotonically with *n*. The increase in cognitive functionality, however, occurs in steps.

For small *n* (\(n\in [1,10]\)), neuronal selectivity to a single stimulus does not form. It emerges rapidly when the dimension parameter *n* exceeds a critical value of around \(n=10{-}20\) (see Fig. 5a). This constitutes the first critical transition: single neurons become selective to single information items. The second critical transition occurs at significantly larger dimensions, around \(n=100{-}400\) (see Fig. 6). At this second stage, neuronal selectivity to multiple *uncorrelated* stimuli develops. The ability to respond selectively to a given set of multiple uncorrelated information items is apparently crucial for rapid learning “by temporal association” in such neuronal systems. This learning ability, as well as the formation of dynamic memories, is justified by Theorem 3 and illustrated in Fig. 7.
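The first transition is easy to probe numerically. The sketch below makes simplifying assumptions (unit-norm Gaussian-direction items rather than the paper's ball or hypercube samples, and a neuron whose weight vector simply equals the target item) and estimates how often such a neuron rejects all background items as *n* grows:

```python
import numpy as np

rng = np.random.default_rng(1)
M, theta, trials = 500, 0.5, 100   # background items, threshold, repetitions

def unit_rows(a):
    # Normalize each row of a matrix to unit Euclidean norm
    return a / np.linalg.norm(a, axis=-1, keepdims=True)

def p_selective(n):
    """Fraction of trials in which a neuron tuned to one random unit-norm
    item (w = x) stays silent for all M random background items."""
    ok = 0
    for _ in range(trials):
        x = unit_rows(rng.standard_normal((1, n)))[0]
        bg = unit_rows(rng.standard_normal((M, n)))
        ok += (bg @ x).max() < theta
    return ok / trials

probs = {n: p_selective(n) for n in (10, 50, 200)}
print(probs)   # selectivity emerges as n grows
```

In low dimension the maximal background response almost always exceeds the threshold; in high dimension the inner products concentrate near zero and selectivity becomes near-certain, qualitatively matching the step-like transition of Fig. 5a.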

At the core of our mathematical arguments are the concentration of measure phenomena exemplified in Gorban et al. (2016), Gorban and Tyukin (2018) and stochastic separation theorems (Gorban and Tyukin 2017; Gorban et al. 2016). Some of these results, which have been central in the proofs of Theorems 2 and 3, namely, the statements that random i.i.d. vectors drawn from equidistributions in \(B_n(1)\) and from product measures are almost orthogonal with probability close to one, are closely related to the notion of effective dimensionality of spaces based on \(\epsilon \)-*quasiorthogonality* introduced in Hecht-Nielsen (1994), Kainen and Kurkova (1993). In these works the authors demonstrated that in high dimensions there exist exponentially large sets of quasiorthogonal vectors. In Gorban et al. (2016), however, as well as in our current work (see Lemma 3), we demonstrated that such sets not only exist but are also typical.
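The typicality of quasiorthogonal sets is simple to verify empirically. In the sketch below the dimension \(n=400\) and sample size \(k=1000\) are arbitrary illustrative choices; any i.i.d. isotropic sample behaves similarly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 400, 1000   # dimension and number of i.i.d. random directions

# Draw k random unit vectors in R^n
v = rng.standard_normal((k, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)

# Pairwise inner products: in high dimension they concentrate near zero
# (typical magnitude ~ 1/sqrt(n)), so even a large random set is
# epsilon-quasiorthogonal without any special construction.
gram = v @ v.T
pairwise = gram[np.triu_indices(k, 1)]
max_cos = np.abs(pairwise).max()
print("max |<v_i, v_j>| over", pairwise.size, "pairs:", max_cos)
```

Even over roughly half a million pairs, the largest cosine stays a small multiple of \(1/\sqrt{n}\), far from 1: quasiorthogonality is the typical, not the exceptional, configuration.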

Finally, we note that the number of multiple stimuli that can be selectively detected by single neurons is not extraordinarily large. In fact, as we have shown in Figs. 6 and 7, memorizing 8 information items at the level of single neurons requires more than 400 connections. This suggests not only that new memories are naturally packed *in quanta*, but also that there is a limit on this number, associated with the cost of implementing such a functionality. This cost is the number of individual functional synapses. Balancing these costs in living beings is, of course, a subject of selection and evolution. Nevertheless, as our study has shown, there is a clear functional gain for which these costs may be paid.

## 6 Conclusion

In this work we analyzed the striking consequences of the abundance of signalling routes for functionality of neural systems. We demonstrated that complex cognitive functionality derived from extreme selectivity to external stimuli and rapid learning of new memories at the level of single neurons can be explained by the presence of multiple signalling routes and simple physiological mechanisms. At the basic level, these mechanisms can be reduced to a mere perceptron-like behavior of neurons in response to stimulation and a Hebbian-type learning governing changes of the synaptic efficiency.

The observed phenomenon is robust. Remarkably, a simple generic model offers a clear-cut mathematical explanation of a wealth of empirical evidence related to *in vivo* recordings of “Grandmother” cells, “concept” cells, and rapid learning at the level of individual neurons (Quiroga et al. 2005; Viskontas et al. 2009; Ison et al. 2015). The results can also shed light on the question of why Hebbian learning may give rise to neuronal selectivity in the prefrontal cortex (Lindsay et al. 2017) and explain why adding single neurons to deep layers of artificial neural networks is an efficient way to acquire novel information while preserving previously trained data representations (Draelos et al. 2016).

Finding simple laws explaining complex behaviors has always been the driver of progress in Mathematical Biology and Neuroscience. Numerous examples of such simple laws can be found in the literature (see, e.g., Roberts et al. 2014; Jurica et al. 2013; Gorban et al. 2016; Perlovsky 2006). Our results not only provide a simple explanation of the reported empirical evidence but also suggest that such a behavior might be inherent to neuronal systems and hence organisms that operate with high-dimensional informational content. In such systems, complex cognitive functionality at the level of elementary units, i.e., single neurons, occurs naturally. The higher the dimensionality, the stronger the effect. In particular, we have shown that the memory capacity in ensembles of single neurons grows exponentially with the neuronal dimension. Therefore, from the evolutionary point of view, accommodating a large number of signalling routes converging onto single neurons is advantageous despite the increased metabolic costs.

The considered class of neuronal models, being generic, is of course a simplification. It does not capture spontaneous firing, signal propagation in dendritic trees, and many other physiologically relevant features of real neurons. Moreover, in our theoretical assessments we assumed that the informational content processed by neurons is sampled from an equidistribution in a unit ball. The results, however, can already be generalized to product measure distributions (see, e.g., Gorban and Tyukin 2017). Generalizing the findings to models offering better physiological realism is the focus of our future work.

## Notes

### Acknowledgements

This work has been supported by Innovate UK Grants KTP009890 and KTP010522, by the Spanish Ministry of Economy and Competitiveness under Grant FIS2014-57090-P, the Russian Federation Ministry of Education state assignment (No. 8.2080.2017/4.6), “Initiative scientific project” of the main part of the state plan of the Ministry of Education and Science of Russian Federation (Task No. 2.6553.2017/BCH Basic Part), and by the Russian Science Foundation Project 15-12-10018 (numerical assessment and results). Alexander N. Gorban was supported by the Ministry of Education and Science of Russian Federation (Project No. 14.Y26.31.0022).

## Supplementary material

## References

- Andersen P, Morris R, Amaral D, Bliss T, O’Keefe J (eds) (2007) The hippocampus book. Oxford University Press, Oxford
- Amaral DG, Witter MP (1989) The three-dimensional organization of the hippocampal formation: a review of anatomical data. Neuroscience 31:571–591
- Amit DJ, Gutfreund H, Sompolinsky H (1985) Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys Rev Lett 55:1530–1533
- Barrett LF, Tugade MM, Engle RW (2004) Individual differences in working memory capacity and dual-process theories of the mind. Psychol Bull 130(4):553
- Benito N, Fernandez-Ruiz A, Makarov VA, Makarova J, Korovaichuk A, Herreras O (2014) Spatial modules of coherent activity in pathway-specific LFPs in the hippocampus reflect topology and different modes of presynaptic synchronization. Cereb Cortex 11(7):1738–1752
- Benito N, Martin-Vazquez G, Makarova J, Makarov VA, Herreras O (2016) The right hippocampus leads the bilateral integration of gamma-parsed lateralized information. eLife 5:e16658. https://doi.org/10.7554/eLife.16658
- Calvo C, Villacorta-Atienza JA, Mironov VI, Gallego V, Makarov VA (2016) Waves in isotropic totalistic cellular automata: application to real-time robot navigation. Adv Complex Syst 19(4):1650012–18
- Clark DD, Sokoloff L (1999) Circulation and energy metabolism of the brain. In: Siegel GJ, Agranoff BW, Albers RW, Fisher SK, Uhler MD (eds) Basic neurochemistry: molecular, cellular and medical aspects. Lippincott, Philadelphia, pp 637–670
- Cucker F, Smale S (2002) On the mathematical foundations of learning. Bull Am Math Soc 39(1):1–49
- Draelos TJ, Miner NE, Lamb CC, Vineyard CM, Carlson KD, James CD, Aimone JB (2016) Neurogenesis deep learning. arXiv preprint arXiv:1612.03770
- Fernandez-Ruiz A, Makarov VA, Herreras O (2012) Sustained increase of spontaneous input and spike transfer in the CA3-CA1 pathway following long-term potentiation in vivo. Front Neural Circuits 6:71
- Finnerty CT, Jefferys JGR (1993) Functional connectivity from CA3 to the ipsilateral and contralateral CA1 in the rat dorsal hippocampus. Neuroscience 56(1):101
- Fontaine B, Peña JL, Brette R (2014) Spike-threshold adaptation predicted by membrane potential dynamics in vivo. PLoS Comput Biol 10(4):e1003560
- Gorban AN, Tyukin IY, Romanenko I (2016) The blessing of dimensionality: separation theorems in the thermodynamic limit. IFAC-PapersOnLine 49(24):64–69. 2nd IFAC Workshop on Thermodynamic Foundations for a Mathematical Systems Theory (TFMST 2016)
- Gorban AN, Tyukin IY (2018) Blessing of dimensionality: mathematical foundations of the statistical physics of data. Philos Trans R Soc A. https://doi.org/10.1098/rsta.2017.0237
- Gorban AN, Tyukin IY (2017) Stochastic separation theorems. Neural Netw 94:255–259
- Gorban AN, Tyukin IY, Prokhorov DV, Sofeikov KI (2016) Approximation with random bases: pro et contra. Inf Sci 364–365:129–145
- Gorban AN, Tyukina TA, Smirnova EV, Pokidysheva LI (2016) Evolution of adaptation mechanisms: adaptation energy, stress, and oscillating death. J Theor Biol 405:127–139
- Hecht-Nielsen R (1994) Context vectors: general-purpose approximate meaning representations self-organized from raw data. In: Zurada J, Marks R, Robinson C (eds) Computational intelligence: imitating life. IEEE Press, London
- Herculano-Houzel S (2009) The human brain in numbers: a linearly scaled-up primate brain. Front Hum Neurosci 3:31
- Herculano-Houzel S (2011) Gorilla and orangutan brains conform to the primate cellular scaling rules: implications for human evolution. Brain Behav Evol 77:33–44
- Herculano-Houzel S (2012) The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost. Proc Natl Acad Sci 109:10661–10668
- Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79(8):2554–2558
- Ishizuka N, Weber J, Amaral DG (1990) Organization of intrahippocampal projections originating from CA3 pyramidal cells in the rat. J Comp Neurol 295:580–623
- Ison MJ, Quian Quiroga R, Fried I (2015) Rapid encoding of new memories by individual neurons in the human brain. Neuron 87(1):220–230
- Jurica P, Gepshtein S, Tyukin I, van Leeuwen C (2013) Sensory optimization by stochastic tuning. Psychol Rev 120(4):798–816
- Kainen PC, Kurkova V (1993) Quasiorthogonal dimension of Euclidean spaces. Appl Math Lett 6(3):7–10
- Khalil H (2002) Nonlinear systems, 3rd edn. Prentice Hall, Upper Saddle River
- Leung C-S, Chan L-W, Lai E (1995) Stability, capacity, and statistical dynamics of second-order bidirectional associative memory. IEEE Trans Syst Man Cybern 25(10):1414–1424
- Li XG, Somogyi P, Ylinen A, Buzsaki G (1994) The hippocampal CA3 network: an in vivo intracellular labeling study. J Comp Neurol 339:181–208
- Lindsay GW, Rigotti M, Warden MR, Miller EK, Fusi S (2017) Hebbian learning in a random network captures selectivity properties of prefrontal cortex. bioRxiv, p 133025
- Lobov SA, Zhuravlev MO, Makarov VA, Kazantsev VB (2017) Noise enhanced signaling in STDP driven spiking-neuron network. Math Model Nat Phenom 12(4):109–124
- Markram H, Lubke J, Frotscher M, Sakmann B (1997) Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275(5297):213–215
- Oja E (1982) A simplified neuron model as a principal component analyzer. J Math Biol 15:267–273
- Perlovsky LI (2006) Toward physics of the mind: concepts, emotions, consciousness, and symbols. Phys Life Rev 3(1):23–55
- Platek M, Keenan JP, Shackelford TK (2007) Evolutionary cognitive neuroscience. MIT Press, Cambridge
- Quian Quiroga R (2012) Concept cells: the building blocks of declarative memory functions. Nat Rev Neurosci 13(8):587–597
- Quian Quiroga R, Reddy L, Kreiman G, Koch C, Fried I (2005) Invariant visual representation by single neurons in the human brain. Nature 435(7045):1102–1107
- Reimann MW, Nolte M, Scolamiero M, Turner K, Perin R, Chindemi G, Dlotko P, Levi R, Hess K, Markram H (2017) Cliques of neurons bound into cavities provide a missing link between structure and function. Front Comput Neurosci 11:48
- Roberts A, Conte D, Hull M, Merrison-Hort R, al Azad AK, Buhl E, Borisyuk R, Soffe SR (2014) Can simple rules control development of a pioneer vertebrate neuronal network generating behavior? J Neurosci 34(2):608–621
- Rosenblatt F (1962) Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Spartan Books, Washington, DC
- Sherwood CC, Bauernfeind AL, Bianchi S, Raghanti MA, Hof PR (2012) Human brain evolution writ large and small. Prog Brain Res 195:237–254
- Sousa AM, Meyer KA, Santpere G, Gulden FO, Sestan N (2017) Evolution of the human nervous system function, structure, and development. Cell 170(2):226–247
- Tyukin IY, Gorban AN, Sofeikov K, Romanenko I (2017) Knowledge transfer between artificial intelligence systems. arXiv preprint arXiv:1709.01547
- Vapnik V, Chapelle O (2000) Bounds on error expectation for support vector machines. Neural Comput 12(9):2013–2036
- Villacorta-Atienza JA, Makarov VA (2013) Neural network architecture for cognitive navigation in dynamic environments. IEEE Trans Neural Netw Learn Syst 24(12):2075–2087
- Villacorta-Atienza JA, Calvo C, Makarov VA (2015) Prediction-for-compaction: navigation in social environments using generalized cognitive maps. Biol Cybern 109(3):307–320
- Villacorta-Atienza JA, Calvo C, Lobov S, Makarov VA (2017) Limb movement in dynamic situations based on generalized cognitive maps. Math Model Nat Phenom 12(4):15–29
- Viskontas IV, Quian Quiroga R, Fried I (2009) Human medial temporal lobe neurons respond preferentially to personally relevant images. Proc Natl Acad Sci 106(50):21329–21334
- Wittner L, Henze DA, Zaborszky L, Buzsaki G (2007) Three-dimensional reconstruction of the axon arbor of a CA3 pyramidal cell recorded and filled in vivo. Brain Struct Funct 212(1):75–83

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.