# High-Dimensional Brain: A Tool for Encoding and Rapid Learning of Memories by Single Neurons


## Abstract

Codifying memories is one of the fundamental problems of modern Neuroscience. The functional mechanisms behind this phenomenon remain largely unknown. Experimental evidence suggests that some of the memory functions are performed by stratified brain structures such as the hippocampus. In this particular case, single neurons in the CA1 region receive a highly multidimensional input from the CA3 area, which is a hub for information processing. We thus assess the implication of the abundance of neuronal signalling routes converging onto single cells on the information processing. We show that single neurons can selectively detect and learn arbitrary information items, given that they operate in high dimensions. The argument is based on stochastic separation theorems and the concentration of measure phenomena. We demonstrate that a simple enough functional neuronal model is capable of explaining: (i) the extreme selectivity of single neurons to the information content, (ii) simultaneous separation of several uncorrelated stimuli or informational items from a large set, and (iii) dynamic learning of new items by associating them with already “known” ones. These results constitute a basis for organization of complex memories in ensembles of single neurons. Moreover, they show that no a priori assumptions on the structural organization of neuronal ensembles are necessary for explaining basic concepts of static and dynamic memories.

## Keywords

Neural memories · Single-neuron learning · Perceptron · Stochastic separation theorems

## 1 Introduction

The human brain is arguably among the most sophisticated and enigmatic of nature's creations. Over millions of years it has evolved to amass billions of neurons, featuring on average \(86 \times 10^9\) cells (Herculano-Houzel 2012). This remarkable figure is several orders of magnitude higher than that of most mammals and several times larger than in other primates (Herculano-Houzel 2011). While accounting for roughly \(2\%\) of the body mass, the human brain consumes about \(20\%\) of the total energy (Clark and Sokoloff 1999).

The significant metabolic cost associated with a larger brain in humans, as opposed to a merely larger body (a path that great apes might have followed; Herculano-Houzel 2011), must be justified by evolutionary advantages. Some of the benefits may be related to the development of the remarkably rich social life of humans, which, in particular, requires extensive abilities for the formation of complex memories. Indirectly, this hypothesis is supported by the significant differences among species in the number of neurons in the cortex (Herculano-Houzel 2009) and the hippocampus (Andersen et al. 2007). For example, the CA1 area of the hippocampus contains \(0.39\times 10^6\) pyramidal neurons in rats, \(1.3\times 10^6\) in monkeys, and \(14\times 10^6\) in humans.

Evolutionary implications in relation to cognitive functions have been widely discussed in the literature (see, e.g., Platek et al. 2007; Sherwood et al. 2012; Sousa et al. 2017). Recently, it has been shown that in humans new memories can be learnt very rapidly, supposedly by individual neurons, from a limited number of experiences (Ison et al. 2015). Moreover, some neurons can exhibit remarkable selectivity to complex stimuli, evidence that has led to debates around the existence of the so-called "grandmother" and "concept" cells (Quiroga et al. 2005; Viskontas et al. 2009; Quiroga 2012) and their role as elements of declarative memory. These findings suggest that not only can the brain learn rapidly, but it can also respond selectively to "rare" individual stimuli. Moreover, experimental evidence indicates that such cognitive functionality can be delivered by single neurons (Ison et al. 2015; Quiroga et al. 2005; Viskontas et al. 2009). The fundamental questions, hence, are: How is this possible? And what could be the underlying functional mechanisms?

Recent theoretical advances achieved within the Blue Brain Project show that the brain can operate in many dimensions (Reimann et al. 2017). It is claimed that the brain has structures operating in up to eleven dimensions. Groups of neurons can form so-called *cliques*, i.e., networks of specifically interconnected neurons that generate precise representations of geometric objects; the dimension then grows with the number of neurons in the clique. A multidimensional representation of spatiotemporal information in the brain is also implied in the concept of generalized cognitive maps (see, e.g., Villacorta-Atienza et al. 2015; Calvo et al. 2016; Villacorta-Atienza et al. 2017). Within this theory, spatiotemporal relations between objects in the environment are encoded as static (cognitive) maps and represented as elements of an *n*-dimensional space (\(n\gg 1\)). The cognitive maps, as information items, can be learnt, classified, and retrieved on demand (Villacorta-Atienza and Makarov 2013). However, the questions of how the brain or individual neurons can distinguish among a huge number of different maps and select an appropriate one remain open.

In this work we propose that brain areas with a predominantly laminar topology and abundant signalling routes simultaneously converging on individual cells (e.g., the hippocampus) are propitious for high-dimensional processing and learning of complex information items. We show that a canonical neuronal model, the perceptron (Rosenblatt 1962), in combination with a Hebbian type of learning, may provide answers to the above-mentioned fundamental questions. In particular, starting from stochastic separation theorems (Gorban and Tyukin 2017, 2018), we demonstrate that individual neurons gathering multidimensional stimuli through a sufficiently large number of synaptic inputs can exhibit extreme selectivity either to individual information items or to groups of items. Moreover, neurons are capable of associating and learning uncorrelated information items. Thus, a large number of signalling routes simultaneously converging on a large number of single cells, as is widely observed in laminar brain structures, provides a natural environment for the rapid formation and maintenance of extensive memories. This is vital for social life and hence may constitute a significant evolutionary advantage, albeit at the cost of high metabolic expenditure.

## 2 Fundamental Problems of Encoding Memories

Experimental findings show that multiple CA1 pyramidal cells distributed in the rostro-caudal direction are activated near-synchronously by assemblies of simultaneously firing CA3 pyramidal cells (Ishizuka et al. 1990; Li et al. 1994; Benito et al. 2014). Thus, an ensemble of single neurons in CA1 can simultaneously receive the same synaptic input (Fig. 1b, left). Since these neurons have different topologies and functional connectivity (Finnerty and Jefferys 1993), their responses to the same input can differ. Moreover, experimental *in vivo* results show that long-term potentiation can significantly increase the spike transfer rate in the CA3–CA1 pathway (Fernandez-Ruiz et al. 2012). This suggests that the efficiency of individual synaptic contacts can be increased selectively.

In this work we follow a conventional and rather general functional representation of signalling in the neuronal pathways. We assume that upon receiving an input, a neuron can either generate a response or remain silent. The forms of the neuronal responses, as well as the definitions of synaptic inputs, vary from one model to another. Therefore, here we adopt a rather general functional approach. By a stimulus we understand a number of excitations arriving at a neuron simultaneously (or within a short time window) through several axons and thus transmitting some "spatially coded" information items (Benito et al. 2016). If a neuron responds to a stimulus (e.g., generates output spikes or increases its firing rate), we say that the neuron *detects* the informational content of the given stimulus.

We follow standard machine learning assumptions (Vapnik and Chapelle 2000; Cucker and Smale 2002). The stimuli are generated in accordance with some distribution or a set of distributions ("Outer World Models"). All stimuli that a neuron may receive are samples from this distribution. The sampling itself may be a complicated process; for simplicity we assume that all samples are independently and identically distributed (i.i.d.). Once a sample is generated, a sub-sample of stimuli is independently selected for testing purposes. If more than one neuron is considered, we will assume that a rule (or a set of rules) is in place that determines how a neuron is selected from the set. The rules can be either deterministic or randomized; in the latter case we will specify the process.

We can now formulate three fundamental problems of encoding memories:

1. *Selectivity: detection of one stimulus from a set* (Fig. 1C.1). Pick an arbitrary stimulus from a reasonably large set such that a single neuron from a neuronal ensemble detects this stimulus. What is the probability that this neuron is stimulus-specific, i.e., that it rejects all the other stimuli from the set?
2. *Clustering: detection of a group of stimuli from a set* (Fig. 1C.2). Within a set of stimuli we select a smaller subset, i.e., a group of stimuli. What is the probability that a neuron detecting all stimuli from this subset stays silent for all remaining stimuli in the set?
3. *Acquiring memories: learning a new stimulus by associating it with one already known* (Fig. 1C.3). Consider two different stimuli \(\varvec{s}_1\) and \(\varvec{s}_2\) such that for \(t\le t_0\) they do not overlap in time and a neuron detects \(\varvec{s}_1\) but not \(\varvec{s}_2\). In the next interval \((t_0,t_1]\), \(t_1>t_0\), the stimuli start to overlap in time (i.e., they stimulate the neuron together). For \(t> t_1\) the neuron receives only stimulus \(\varvec{s}_2\). What is the probability that for some \(t_2\ge t_1\) the neuron detects \(\varvec{s}_2\)?

## 3 Formal Statement of the Problem

In this section we specify the information content to be processed by neurons and define a mathematical model of a generic neuron equipped with synaptic plasticity. Before going any further, let us first introduce notational agreements used throughout the text. Given two vectors \(\varvec{x},\varvec{y}\in \mathbb {R}^n\), their inner product \(\langle \varvec{x}, \varvec{y} \rangle \) is: \(\langle \varvec{x}, \varvec{y} \rangle =\sum _{i=1}^n x_i y_i\). If \(\varvec{x}\in \mathbb {R}^n\) then \(\Vert \varvec{x}\Vert \) stands for the usual Euclidean norm of \(\varvec{x}\): \(\Vert \varvec{x}\Vert =\langle \varvec{x},\varvec{x}\rangle ^{1/2}\). By \(B_n(1)=\{\varvec{x}\in \mathbb {R}^n | \ \ \Vert \varvec{x}\Vert \le 1\}\) we denote a unit *n*-ball centered at the origin; \(\mathcal {V}(\Xi )\) is the Lebesgue volume of \(\Xi \subset \mathbb {R}^n\), and \(|{\mathcal {M}}|\) is the cardinality of a finite set \({\mathcal {M}}\). Symbol \({\mathcal {C}}({\mathcal {D}})\), \({\mathcal {D}}\subseteq \mathbb {R}^m\) stands for the space of continuous real-valued functions on \({\mathcal {D}}\).

### 3.1 Information Content and Classes of Stimuli

An information item *i* is modeled by a function \(\varvec{s}:\mathbb {R}\times \mathbb {R}^n\rightarrow \mathbb {R}^n\), whose *n* components describe the excitations transmitted through *n* individual "axons". An example of an information item could be an \(l\times k\) image (see Fig. 2); in this case the dimension of each information item is \(n = l\times k\). Each stimulus also has a *context*, i.e., the time window in which the stimulus arrives at the neuron; for the sake of simplicity we use a rectangular window.

The number of background items *M* is large but finite, and \(m\ge 1\) is in general smaller than *M*. The set \({\mathcal {M}}\) contains the *background* content for a given neuron, whereas the set \({\mathcal {Y}}\) models the informational content *relevant* to the task at hand. In other words, to accomplish a static memory task the neuron should be able to detect all elements from \({\mathcal {Y}}\) and to reject all elements from \({\mathcal {M}}\).

### 3.2 Neuronal Model

We consider a neuron with *n* synaptic inputs (Fig. 2), where \(\varvec{w}\in \mathbb {R}^n\) is the vector of synaptic weights and \(\bar{\varvec{s}}(t)\) denotes the stimulus content arriving at time *t*. Its membrane potential, \(y \in \mathbb {R}\), is given by the weighted sum of its inputs,
$$\begin{aligned} y(t) = \langle \varvec{w}, \bar{\varvec{s}}(t) \rangle , \end{aligned}$$(8)
and its output is determined by the threshold activation
$$\begin{aligned} \mathrm {out}(t) = f\big (y(t)-\theta \big ), \end{aligned}$$(9)
where \(\theta \) is the firing threshold and *f* is locally Lipschitz, \(f(u)=0\) for \(u\in (-\infty ,0]\), and \(f(u)>0\) for \(u\in (0,\infty )\).

Model (8), (9) captures the summation of postsynaptic potentials and the threshold nature of the neuronal activation but disregards the specific dynamics accounted for in other more advanced models. Nevertheless, as we will show in Sect. 4, this phenomenological model is already sufficient to explain the fundamental properties of information processing discussed in Sect. 2.
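The model just described admits a compact computational sketch. The snippet below is an illustration under stated assumptions (an inner-product membrane potential and a ReLU as one admissible choice of *f*), not the authors' implementation:

```python
import numpy as np

def neuron_response(w, s, theta, f=lambda u: np.maximum(u, 0.0)):
    """Perceptron-type neuron: membrane potential <w, s>, output f(<w, s> - theta).

    The ReLU default satisfies the stated conditions on f: locally Lipschitz,
    f(u) = 0 for u <= 0 and f(u) > 0 for u > 0.
    """
    y = np.dot(w, s)      # membrane potential: weighted sum of synaptic inputs
    return f(y - theta)   # nonzero output only above the firing threshold

# A neuron tuned to an item x responds to x and stays silent for an
# orthogonal item (a toy 3-dimensional example).
x = np.array([1.0, 0.0, 0.0])
w = x / np.linalg.norm(x)
print(neuron_response(w, x, theta=0.5))                     # 0.5 (detected)
print(neuron_response(w, np.array([0.0, 1.0, 0.0]), 0.5))   # 0.0 (silent)
```

Any other activation with the same one-sided-zero property (e.g., a saturating sigmoid shifted to vanish below threshold) would serve equally well here.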

### 3.3 Synaptic Plasticity

We assume that plastic changes of the synaptic weights are gated by the neuronal response, *v*. We thus arrive at a modified classical Oja rule (Oja 1982):
$$\begin{aligned} \dot{\varvec{w}} = \alpha \, v(\bar{\varvec{s}},\varvec{w},\theta ) \left[ \langle \bar{\varvec{s}},\varvec{w} \rangle \bar{\varvec{s}} - \langle \bar{\varvec{s}},\varvec{w} \rangle ^2 \varvec{w} \right] , \quad \varvec{w}(t_0)=\varvec{w}_0, \ \varvec{w}_0\ne 0, \end{aligned}$$(10)
where \(\alpha >0\) is the learning rate. The gating term *v* in (10) ensures that plastic changes of \(\varvec{w}\) occur only when an input stimulus evokes a nonzero neuronal response. The fact that \(\varvec{w}_0\ne 0\) reflects the assumption that synaptic connections have already been established, albeit their efficacy may be subject to plastic changes. In addition to capturing the general principle of the classical Hebbian rule, model (10) guarantees that the synaptic weights \(\varvec{w}\) remain bounded in forward time (see "Appendix A") and hence conforms with physiological plausibility.
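A discrete-time sketch illustrates the two features claimed for the gated rule: plasticity only for supra-threshold responses, and boundedness of the weights. The step size, the binary gate, and the exact update form below are our own illustrative assumptions, not the paper's precise discretization:

```python
import numpy as np

def oja_step(w, s, theta, alpha=0.1):
    """One response-gated, Oja-type update (an illustrative discretization)."""
    y = np.dot(w, s)                  # membrane potential
    v = 1.0 if y > theta else 0.0     # gate: plastic change only when active
    # Hebbian growth term y*s balanced by the Oja decay y**2 * w, which
    # keeps the weight norm bounded.
    return w + alpha * v * (y * s - y**2 * w)

rng = np.random.default_rng(0)
s = rng.standard_normal(50)
s /= np.linalg.norm(s)                # a fixed unit-norm stimulus
w = 0.6 * s                           # initial weights, already supra-threshold
for _ in range(200):
    w = oja_step(w, s, theta=0.3)
print(round(float(np.linalg.norm(w)), 3))   # 1.0: the norm settles at the Oja fixed point
```

Repeated supra-threshold stimulation drives the weight norm to unity rather than letting it grow without bound, which is the physiological-plausibility point made in the text.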

## 4 Formation of Memories in High Dimensions

In Sect. 2 we formulated three fundamental problems of organization of memories in laminar brain structures. Let us now show how they can be treated given that pyramidal neurons operate in high dimensions.

### 4.1 Extreme Selectivity of a Single Neuron to Single Stimuli

Consider the case when the set \({\mathcal {Y}}\) in (6) contains only one element, i.e., \(|{\mathcal {Y}}|=1\), \({\mathcal {Y}}=\{\varvec{x}_{M+1}\}\), whereas the set \({\mathcal {M}}\) is allowed to be sufficiently large (\(|{\mathcal {M}}|=M\gg 1\)). Let us also assume that the stimuli with different information content, \(\varvec{s}(\cdot ,\varvec{x}_i)\), do not overlap in time, i.e., we present them to a neuron one by one.

A neuron is said to detect a stimulus if it generates a nonzero response whenever the stimulus is present at time *t*, and vice-versa.

Once a neuron that detects the relevant information item, i.e., \(\varvec{x}_{M+1}\), is specified, we can proceed with assessing its selectivity properties.

### Definition 1

(*Neuronal Selectivity*) We say that a *neuron is selective to the information content* \({\mathcal {Y}}\) iff it detects the relevant stimuli from the set \({\mathcal {S}}({\mathcal {Y}})\) and ignores all the others from the set \({\mathcal {S}}({\mathcal {M}})\).

Figure 4 shows typical responses of neurons parameterized by different pairs \((\varvec{w},\theta )\) and subjected to stimulation by different information items \(\varvec{x}_i\). Here \(\varvec{x}_i\) correspond to \((30 \times 38)\)-pixel color images (i.e., \(\varvec{x}_i \in \mathbb {R}^{3420}\)). The firing thresholds \(\theta \) were chosen at random, and the weights \(\varvec{w}\) were set in accordance with (13), with the first three images serving as the relevant information items for the three corresponding neurons. No plastic changes in \(\varvec{w}\) were allowed. The neurons detect their own (relevant) stimuli, as expected. Moreover, they do not respond to stimulation by the other background information items (4 out of \(10^3\) images are shown in Fig. 4). Thus, the neurons indeed exhibit high stimulus selectivity.

The following theorem provides theoretical justification for these observations.

### Theorem 1

1. The probability that the neuron is silent for all background stimuli \(\varvec{s}_i\in {\mathcal {S}}({\mathcal {M}})\) is bounded from below by
$$\begin{aligned} \begin{aligned}&P( \varvec{s}_i \in \mathrm {Silent}({\mathcal {S}}({\mathcal {M}}),(\varvec{w},\theta )) \ \forall \varvec{s}_i\in {\mathcal {S}}({\mathcal {M}}) \big | \ \varvec{w},\theta ) \ge \\&\quad \ge \left[ 1-\frac{1}{2} \left( 1 - \frac{\theta ^2}{\Vert \varvec{w}\Vert ^2} \right) ^\frac{n}{2} \right] ^M. \end{aligned} \end{aligned}$$(15)
2. There is a family of sets parametrized by *D* (\(0<D<\min \{\frac{1}{2}, \Vert \varvec{x}_{M+1}\Vert \}\)),
$$\begin{aligned} \varOmega _D=\Big \{ (\varvec{w},\theta ) \big | \ \ \Vert \varvec{w}-\varvec{w}^{*} \Vert <D, \ D \le \Vert \varvec{x}_{M+1}\Vert - \theta \le 2D \Big \}, \end{aligned}$$(16)
where \(\varvec{w}^{*}=\varvec{x}_{M+1}/\Vert \varvec{x}_{M+1}\Vert \), such that \(\varvec{s}_{M+1}\in \mathrm {Activated}({\mathcal {S}}({\mathcal {Y}}),(\varvec{w},\theta ))\) for all \((\varvec{w},\theta )\in \varOmega _D\), and
$$\begin{aligned} \begin{aligned}&P\big ( \varvec{s}_i \in \mathrm {Silent}({\mathcal {S}}({\mathcal {M}}),(\varvec{w},\theta )) \ \forall \varvec{s}_i\in {\mathcal {S}}({\mathcal {M}})\big | \ \forall (\varvec{w},\theta )\in \varOmega _D\big ) \ge \\&\quad \ge \max _{\varepsilon \in (0,1-2D)} (1-(1-\varepsilon )^n) \left[ 1-\frac{1}{2} \rho (\varepsilon ,D)^{\frac{n}{2}} \right] ^M, \end{aligned} \end{aligned}$$(17)
where
$$\begin{aligned} \rho (\varepsilon ,D)= 1 - \left( \frac{1-\varepsilon -2D}{1+D}\right) ^2. \end{aligned}$$

The proof is provided in “Appendix B”.
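The bound in part 1 is straightforward to evaluate. The following sketch (with parameter values chosen by us purely for illustration) shows how quickly it approaches one as *n* grows:

```python
def silence_bound(n, M, theta, w_norm):
    """Right-hand side of bound (15): a lower bound on the probability that
    the neuron rejects all M background stimuli from the unit n-ball."""
    return (1.0 - 0.5 * (1.0 - theta**2 / w_norm**2) ** (n / 2)) ** M

# Even with M = 1000 background items, the bound approaches 1 as n grows.
for n in (10, 50, 200):
    print(n, round(silence_bound(n, M=1000, theta=0.5, w_norm=1.0), 4))
```

At low dimension the bound is vacuous (close to zero), while already at a few hundred dimensions it exceeds 0.999, matching the qualitative picture in Fig. 5a.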

### Remark 1

For an admissible fixed \(D>0\), the volume \({\mathcal {V}}(\varOmega _D)>0\). Therefore, the estimate provided by Theorem 1 is robust to small perturbations of \((\varvec{w},\theta )\), and slight fluctuations of neuronal characteristics are not expected to affect neuronal functionality.

### Remark 2

Theorem 1 (part 2) specifies a non-iterative procedure for constructing sets of selective neurons. Such neurons detect given stimuli and reject the others, with high probability. Figure 3 (in brown) shows examples of three projections of the hypercylinders (16) ensuring robust selective stimulus detection. The smaller the cylinder, the higher the selectivity.

In numerical experiments we varied the neuronal dimension *n* and generated two random sets of information items comprising \(10^3\) elements each, i.e., \(\{\varvec{x}_i\}_{i=1}^{10^3}\). One set was sampled from the equidistribution in a unit ball \(B_n(1)\) centered at the origin (i.e., \(\Vert \varvec{x}_i\Vert _2 \le 1\)), and the other from the equidistribution in the hypercube \(\Vert \varvec{x}_i\Vert _{\infty } \le 1\) (a product distribution). For each set of informational items, a neuronal ensemble of \(10^3\) single neurons parameterized by \((\varvec{w}_i,\theta _i)\) was created. Each neuron was assigned a fixed firing threshold \(\theta _i = 0.5\), \(i=1,\dots ,10^3\), whereas the synaptic efficiencies were set as \(\varvec{w}_i=(\theta _i+\epsilon )\varvec{x}_i/\Vert \varvec{x}_i\Vert \), \(\epsilon =0.05\). For these neuronal ensembles and their corresponding stimulus sets we evaluated the output of each neuron and assessed the neuronal selectivity (see Def. 1). The procedure was repeated 10 times, followed by evaluation of the frequencies of selective neurons in the pool for each *n*.

Figure 5a shows the frequencies of selective neurons in an ensemble, for \(10^3\) stimuli taken from (i) a unit ball (red) and (ii) a hypercube (blue), together with (iii) the estimate provided by Theorem 1 (dashed). For small *n* (\(n < 6\)) neurons exhibit no selectivity, i.e., they confuse different stimuli and generate nonspecific responses. As expected, when the neuronal dimensionality *n* increases, the neuronal selectivity grows rapidly, and at around \(n = 20\) it approaches \(100\%\).
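This experiment is easy to reproduce in miniature. The sketch below follows the description above (uniform sampling in \(B_n(1)\), \(\theta _i=0.5\), \(\varvec{w}_i=(\theta _i+\epsilon )\varvec{x}_i/\Vert \varvec{x}_i\Vert \)) but with a smaller ensemble, for speed; exact frequencies will differ from the paper's:

```python
import numpy as np

def selectivity_frequency(n, n_items=200, theta=0.5, eps=0.05, seed=0):
    """Fraction of neurons that detect their own item and reject all others,
    for items equidistributed in the unit n-ball (a reduced-size version of
    the experiment in the text)."""
    rng = np.random.default_rng(seed)
    # Uniform sampling in B_n(1): uniform direction, radius ~ U**(1/n).
    d = rng.standard_normal((n_items, n))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    x = d * (rng.random(n_items) ** (1.0 / n))[:, None]
    # One neuron per item: w_i = (theta + eps) * x_i / ||x_i||.
    w = (theta + eps) * x / np.linalg.norm(x, axis=1, keepdims=True)
    responds = (w @ x.T) > theta            # responds[i, j]: neuron i fires for item j
    own = np.diag(responds)                 # neuron i detects its own item
    others = responds.sum(axis=1) - own.astype(int)
    return float((own & (others == 0)).mean())

for n in (3, 10, 30):
    print(n, selectivity_frequency(n))      # selectivity grows with dimension
```

With these reduced sizes the frequency is near zero in 3 dimensions and close to one by a few tens of dimensions, mirroring the trend in Fig. 5a.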

### 4.2 Extreme Selectivity of a Single Neuron and Ensemble Memory Capacity

The property of a neuron to respond selectively to a single element from a large set of stimuli can be related to the notion of *memory capacity* of a neuronal ensemble comprising a set of selective neurons.

Recall that in the framework of associative memory (Hopfield 1982), for each informational item (pattern) \(\varvec{x}_i\) from the set \({\mathcal {M}}\) there is a vicinity \({\mathcal {V}}_i\) associated with \(\varvec{x}_i\) and corresponding to all admissible perturbations of \(\varvec{x}_i\). Suppose that for each \(\varvec{x}_i\) there is a neuron in the ensemble that is activated for all stimuli with informational content \(\varvec{x}\) in \({\mathcal {V}}_i\) and is silent for all other stimuli, i.e., for stimuli with \(\varvec{x}\) in \(\cup _{j\ne i}{\mathcal {V}}_j\). The maximal size of the set \({\mathcal {M}}\) for which this property holds will be referred to as the *(absolute) memory capacity* of the ensemble (cf. Hopfield 1982; Barrett et al. 2004; Leung et al. 1995).

This conventional mechanistic definition of memory capacity, however, is too restrictive to account for the variability and uncertainty that biological neuronal ensembles and systems have to deal with. Indeed, informational items themselves may bear a degree of uncertainty, resulting in \({\mathcal {V}}_i\cap {\mathcal {V}}_j\ne \varnothing \) for some *i*, *j*, \(i\ne j\). Furthermore, errors in memory retrieval are known to occur in classical artificial associative memory models too (see, e.g., Hopfield 1982; Amit et al. 1985; Leung et al. 1995). To be able to formally quantify such errors in relation to the number of informational items an ensemble is to store, we extend the classical notion as follows.

Suppose that for each \(\varvec{x}_i\) there is a neuron in the ensemble that is activated for all stimuli with informational content \(\varvec{x}\in {\mathcal {V}}_i\) and, with probability \(\phi \), is silent for all stimuli with \(\varvec{x}\in {\mathcal {V}}_j\), \(j\ne i\). The maximal size of the set \({\mathcal {M}}\) for which this property holds will be referred to as the *memory capacity with reliability* \(\phi \) of the ensemble.

Assuming that \({\mathcal {V}}_i\) are sufficiently small, an estimate of the memory capacity with reliability \(\phi \) of a neuronal ensemble follows from Theorem 1.

### Corollary 1

The memory capacity with reliability \(\phi \) of the ensemble grows at least exponentially with the neuronal dimension *n*.

The proof is given in “Appendix C”.

Figure 5b illustrates how the memory capacity with reliability \(\phi \) grows with the neuronal dimension *n*. For each neuronal dimension *n* we generated i.i.d. samples \({\mathcal {M}}\) with \(|{\mathcal {M}}|=M\) from the equidistribution in \(B_n(1)\) and the *n*-cube \([-1,1]^n\). For each sample, we defined neuronal ensembles comprising *M* neurons with synaptic weights \(\varvec{w}_i=\varvec{x}_i/\Vert \varvec{x}_i\Vert \) and thresholds \(\theta _i=0.5\), and calculated the proportion of neurons in the ensemble activated by each stimulus. If the proportion was smaller than 0.05 of the total number of neurons, we incremented the value of *M*, generated a new sample \({\mathcal {M}}\) with increased cardinality *M*, and repeated the experiment. The values of *M* corresponding to samples at which the process stopped were recorded and retained. These constituted empirical estimates of the maximal number of stimuli for which the proportion of neurons responding to a single stimulus is at most \(0.05=1-\phi \). Figure 5b shows the empirical means of these numbers for the unit ball and the hypercube. As follows from these observations, memory capacity grows exponentially with the neuronal dimension in both cases. Such fast growth can easily cover quite demanding memory needs.
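Solving the part-1 bound of Theorem 1 for *M* at a fixed reliability gives a quick feel for this exponential growth. The expression below is derived by us from that bound and is not the exact Corollary 1 formula; parameter choices are illustrative:

```python
import math

def capacity_estimate(n, theta=0.5, w_norm=1.0, phi=0.95):
    """Largest M with (1 - z)**M >= phi, where
    z = 0.5 * (1 - theta**2 / w_norm**2)**(n / 2).
    Obtained by solving bound (15) for M at reliability phi."""
    z = 0.5 * (1.0 - theta**2 / w_norm**2) ** (n / 2)
    return math.floor(math.log(phi) / math.log(1.0 - z))

for n in (20, 40, 80):
    print(n, capacity_estimate(n))   # the admissible M grows exponentially with n
```

Doubling the dimension multiplies the admissible number of items by orders of magnitude, consistent with the empirical curves in Fig. 5b.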

### 4.3 Selectivity of a Single Neuron to Multiple Stimuli

To organize memories, the ability to associate different information items is essential (Fig. 1C.2). To determine whether such associations are feasible at the level of single neurons, we assess neuronal selectivity to multiple stimuli. In particular, we consider the set \({\mathcal {Y}}\) [Eq. (6)] containing \(m>1\) random vectors: \({\mathcal {Y}}=\{\varvec{x}_{M+1},\dots , \varvec{x}_{M+m}\}\). As in Sect. 4.1, we assume here that the stimuli do not overlap in time and arrive at the neuron separately. The question of interest is: Can we find a neuron [i.e., parameters \((\varvec{w},\theta )\)] that would generate a nonzero response to all \(\varvec{s}_i\in {\mathcal {S}}({\mathcal {Y}})\) and, with high enough probability, would be silent for all \(\varvec{s}_i \in {\mathcal {S}}({\mathcal {M}})\)?

It turns out that such a separation is possible, with probability close to one, provided that the neuronal dimension, *n*, is large enough. Moreover, the separation can be achieved by a neuron with the vector of synaptic weights, \(\varvec{w}=\varvec{w}^*\), closely aligned with the normalized mean vector of the stimulus set \({\mathcal {Y}}\): \(\varvec{w}^*=\bar{\varvec{x}}/\Vert \bar{\varvec{x}}\Vert \), where \(\bar{\varvec{x}}\) denotes the mean of the elements of \({\mathcal {Y}}\).

### Theorem 2

The proof is provided in “Appendix D”. The theorem admits the following corollary.

### Corollary 2

### Remark 3

Estimates (21), (22) hold for all feasible values of \(\varepsilon \) and \(\delta \). Maximizing the r.h.s. of (21), (22) over the feasible domain of \(\varepsilon \) and \(\delta \) provides the lower-bound "optimistic" estimates of the neuron's performance.

### Remark 4

The term \(\theta ^*\) in Theorem 2 and Corollary 2 is an upper bound for the firing threshold \(\theta \). The larger the value of \(\theta \), the higher the neuronal selectivity to multiple stimuli. The value of \(\theta ^*\), however, decays with the number of stimuli *m*.

The extent to which the decay mentioned in Remark 4 affects neuronal selectivity to a group of stimuli depends largely on the neuronal dimension, *n*. Note also that the probability of neuronal selective response to multiple stimuli, as provided by Theorem 2, can be much larger if elements of the set \({\mathcal {Y}}\) are spatially close to each other or positively correlated (Tyukin et al. 2017) (see also Lemma 4 in “Appendix F”).

### Remark 5

A similar exponential growth of the memory capacity with the neuronal dimension *n* holds here as well. Indeed, denoting \(\phi =(1-z)^{\overline{M}}\), letting \(z=1/2 \varDelta ^{n/2}\) (with \(\varDelta \) defined in Theorem 2) and invoking (34), (35) from the proof of Corollary 1, we observe that the corresponding capacity estimate follows.

To illustrate Theorem 2 we conducted several numerical experiments. For each *n* we generated \(M=10^3\) background information items \(\varvec{x}_i\) (the set \({\mathcal {M}}\)) and \(m=2, 5, 8\) relevant vectors (the sets \({\mathcal {Y}}\)). In the first group of experiments all \(M+m\) i.i.d. random vectors were drawn from the equidistribution in \(B_n(1)\). Neuronal parameters were set in accordance with Theorem 2 [i.e., Eqs. (19)–(21)]. Figure 6a illustrates the results.

Similarly to the case of neuronal selectivity to a single item (Fig. 5a), we observe a steep growth of the selectivity index with the neuronal dimension. The sharp increase occurs, however, at significantly higher dimensions. The number of random and uncorrelated stimuli, *m*, to which a neuron should be able to respond selectively is fundamentally linked to the neuron dimensionality. For example, the probability that a neuron is selective to \(m=5\) random stimuli becomes sufficiently high only at \(n > 400\). This contrasts sharply with \(n=120\) for \(m=2\).

Our numerical experiments also show that the firing threshold specified in Theorem 2 for arbitrarily chosen fixed values of \(\delta \) and \(\varepsilon \) is not optimal in the sense of providing the best possible probability estimates. Varying \(\theta \), one can observe that the values of *n* at which neuronal selectivity to multiple stimuli starts to emerge are in fact significantly lower than those predicted by Eq. (22). This is not surprising. First, since estimate (22) holds for all admissible values of \(\delta \) and \(\varepsilon \), it must also hold for the maximizer of \(p(\varepsilon ,\delta ,D,m)\). Second, the estimate is conservative in the sense that it is based on conservative estimates of the volume of spherical caps \({\mathcal {C}}_n\) (see, e.g., the proof of Theorem 1). Deriving more accurate numerical expressions for the latter is possible, although at the expense of simplicity.

To demonstrate that the dependence of the selectivity index on the firing threshold is likely to hold qualitatively for broader classes of distributions from which the sets \({\mathcal {M}}\) and \({\mathcal {Y}}\) are drawn, we repeated the simulation for the equidistribution in an *n*-cube centered at the origin. In this case, Theorem 2 does not formally apply; yet an equivalent statement can still be produced (cf. Gorban and Tyukin 2017). In these experiments the synaptic weights were set to \(\varvec{w}=\bar{\varvec{x}}/\Vert \bar{\varvec{x}}\Vert \) and \(\theta = 0.5\Vert \bar{\varvec{x}}\Vert \). The results are shown in Fig. 6b. The neuron's performance in the cube is markedly better than that in \(B_n(1)\). Interestingly, this is somewhat contrary to expectations that might have been induced by our earlier experiments (shown in Fig. 5), in which neuronal selectivity to a single stimulus was more pronounced for \(B_n(1)\).

Overall, these results suggest that single neurons can indeed separate random uncorrelated information items from a large set of background items with probability close to one. This gives rise to a possibility for a neuron to respond selectively to various arbitrary uncorrelated information items simultaneously. The latter property provides a natural mechanism for accurate and precise grouping of stimuli in single neurons.
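The weight construction used in the cube experiment (\(\varvec{w}=\bar{\varvec{x}}/\Vert \bar{\varvec{x}}\Vert \), \(\theta =0.5\Vert \bar{\varvec{x}}\Vert \)) can be checked directly. A toy sketch follows, with sizes and seed of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, m = 600, 1000, 5                       # dimension, background items, relevant items
X_bg = rng.uniform(-1.0, 1.0, size=(M, n))   # background set (n-cube)
X_rel = rng.uniform(-1.0, 1.0, size=(m, n))  # relevant group

xbar = X_rel.mean(axis=0)                    # mean vector of the relevant group
w = xbar / np.linalg.norm(xbar)              # weights aligned with the group mean
theta = 0.5 * np.linalg.norm(xbar)           # threshold as in the cube experiment

detects_all = bool((X_rel @ w > theta).all())  # fires for every relevant item?
false_pos = int((X_bg @ w > theta).sum())      # background items crossing threshold
print(detects_all, false_pos)
```

With these sizes the neuron fires for the whole relevant group while essentially no background item crosses the threshold, in line with the behavior reported for Fig. 6b.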

### 4.4 Dynamic Memory: Learning New Information Items by Association

In the previous sections we dealt with a static model of neuronal functions, i.e., when the synaptic efficiency \(\varvec{w}\) either did not change at all or the changes were negligibly small over large intervals of stimuli presentation. In the presence of synaptic plasticity (10), the latter case corresponds to \(0\le \alpha \ll 1\) in (10). In this section we explicitly account for the time evolution of the synaptic efficiency, \(\varvec{w}(t,\varvec{w}_0)\) [Eq. (10)]. As we will see below, this may give rise to dynamic memories in single neurons.

We assume that the stimulation consists of a learning phase, in which the relevant stimuli may overlap in time, followed by a retrieval phase, and that the neuronal dimension *n* is large enough.

The question is: What is the probability that, during the learning phase, the synaptic weights \(\varvec{w}(t,\varvec{w}_0)\) evolve in time so that the neuron becomes responsive to all \(\varvec{s}_i\in {\mathcal {S}}({\mathcal {Y}})\) while remaining silent for all \(\varvec{s}_i\in {\mathcal {S}}({\mathcal {M}})\) (Fig. 1C.3)? In other words, does the neuron learn new items and recognize them in the retrieval phase? The following theorem provides an answer to this question.

### Theorem 3

1. There exist \(L,\kappa >0\) such that
$$\begin{aligned} \int _{t}^{t+L} v(\bar{\varvec{s}}(\tau ),\varvec{w}(\tau ,{\varvec{w}_0}),\theta ) \langle \bar{\varvec{s}}(\tau ),\varvec{w}(\tau ,{\varvec{w}_0}) \rangle ^2 {d\tau } > \kappa , \ \ \forall \ t\ge t_0. \end{aligned}$$
2. The firing threshold, \(\theta \), satisfies
$$\begin{aligned} 0<\theta < \frac{(1-\varepsilon )^3 - \delta (m-1)}{\sqrt{m(1-\varepsilon )[(1-\varepsilon )+\delta (m-1)]}}={\theta ^*}. \end{aligned}$$

Figure 7 illustrates the theorem numerically. First, we assumed that the relevant set \({\mathcal {Y}}\) consists of \(m =2\) items. One of them is considered "known" to the neuron (Fig. 7a, green). Its informational content, \(\varvec{x}_{M+1}\), satisfies \(\langle \varvec{w}_0, \varvec{x}_{M+1}\rangle >\theta \), i.e., this stimulus evokes a membrane potential above the threshold at \(t=t_0\). Consequently, the neuron detects this stimulus selectively, as described in Sect. 4.1. For the second relevant stimulus (Fig. 7a, orange), however, we have \(\langle \varvec{w}_0,\varvec{x}_{M+2} \rangle < \theta \). Therefore, the neuron cannot detect this stimulus alone. The background stimuli from the set \({\mathcal {S}}({\mathcal {M}})\) are also sub-threshold (Fig. 7a, black curves).

During the learning phase, the neuron receives \(M=500\) background and \(m=2\) relevant stimuli. The relevant stimuli from the set \({\mathcal {S}}({\mathcal {Y}})\) appear simultaneously, i.e., they are temporally associated. The synaptic efficiency changes during the learning phase under the action of the relevant stimuli. Therefore, the membrane potential, \(y(t) = \langle \varvec{w}(t,{\varvec{w}_0}),\bar{\varvec{s}}(t) \rangle \), progressively increases when the relevant stimuli arrive (Fig. 7a, green area). These neuronal adjustments give rise to a new functionality.

At some time instant (marked by the red circle in Fig. 7a) the neuron becomes responsive to the new relevant stimulus (Fig. 7a, orange), which is synchronized with the “known” one. Note that all other background stimuli, which show no temporal associativity, remain below the threshold (Fig. 7a, black traces). Thus, after a transient period, the neuron learns the new stimulus. Once the learning is over, the neuron selectively detects either of the two relevant stimuli.
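The association mechanism just described can be sketched in a few lines of Python. This is a minimal illustration, not the exact model of Eqs. (9)–(10): the dimension, threshold, learning rate, and the Oja-style weight normalization are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 400, 500            # dimension and number of background items (assumed)
theta, alpha = 0.5, 0.1    # firing threshold and learning rate (assumed)

def unit(v):
    return v / np.linalg.norm(v)

# Informational content: random directions in R^n, a stand-in for the
# paper's equidistribution in the unit ball B_n(1)
background = rng.standard_normal((M, n))
background /= np.linalg.norm(background, axis=1, keepdims=True)
known = unit(rng.standard_normal(n))
new = unit(rng.standard_normal(n))

w = known.copy()   # w0 chosen so that the "known" item is supra-threshold

# Learning phase: the known and new items arrive simultaneously (temporal
# association), so the input is their superposition. The Hebbian update is
# gated by the firing threshold and normalized Oja-style to keep ||w|| = 1
# (an assumption of this sketch).
s = unit(known + new)
for _ in range(50):
    y = w @ s
    if y > theta:
        w = unit(w + alpha * y * s)

print("known supra-threshold:", known @ w > theta)
print("new   supra-threshold:", new @ w > theta)
print("max background response:", (background @ w).max())
```

After learning, both relevant items evoke responses near \(1/\sqrt{2}\) (above the threshold), while the responses to the 500 independent background items stay of order \(1/\sqrt{n}\) and remain sub-threshold.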

The procedure just described can be used to associate more than two relevant stimuli. Figure 7b shows examples for \(m=4\) and \(m=12\). In both cases the neuron was able to learn all relevant stimuli, while rejecting all background ones. We observed, however, that increasing the number of uncorrelated information items to be learnt, i.e., the value of *m*, reduces the gap between the firing threshold and the membrane potentials evoked by background stimuli. In other words, the neuron does detect the assigned group of new stimuli, but with lower accuracy. This behavior is consistent with the theoretical bound on \(\theta \) prescribed in the statement of Theorem 3.
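The shrinking margin can be read off directly from the bound \(\theta ^*\) in Theorem 3: as *m* grows, the admissible range \((0,\theta ^*)\) for the firing threshold contracts. A quick check, with \(\varepsilon \) and \(\delta \) set to illustrative values (they are parameters of the theorem, not quantities fixed by the paper):

```python
# Upper bound theta* on the firing threshold from Theorem 3.
# eps = 0.1 and delta = 0.05 are illustrative assumptions.
def theta_star(m, eps=0.1, delta=0.05):
    num = (1 - eps) ** 3 - delta * (m - 1)
    den = (m * (1 - eps) * ((1 - eps) + delta * (m - 1))) ** 0.5
    return num / den

for m in (2, 4, 12):
    print(m, theta_star(m))   # theta* decreases monotonically with m
```

For these parameter values the bound drops by an order of magnitude between \(m=2\) and \(m=12\), mirroring the loss of detection accuracy seen in Fig. 7b.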

## 5 Discussion

Theorems 1–3 and our numerical simulations demonstrate that the extreme neuronal selectivity to single and multiple stimuli, and the capability to learn uncorrelated stimuli observed in a range of empirical studies (Quiroga et al. 2005; Viskontas et al. 2009; Ison et al. 2015), can be explained by simple functional mechanisms implemented in single neurons. The following basic phenomenological properties have been used to arrive at this conclusion: (i) the dimensionality *n* of the information content and of the neurons is sufficiently large, (ii) a perceptron neuronal model, Eq. (9), is an adequate representation of the neuronal response to stimuli, and (iii) plasticity of the synaptic efficiency is governed by the Hebbian rule (10). A crucial consequence of our study is that no a priori assumptions on the structural organization of neuronal ensembles are necessary for explaining basic concepts of static and dynamic memories.

Our approach does not take into account more advanced neuronal behaviors reproduced by, e.g., models of spike-timing-dependent plasticity (Markram et al. 1997) and firing threshold adaptation (Fontaine et al. 2014). Nevertheless, our model captures essential properties of neuronal dynamics and as such is generic enough for the purpose of functional description of memories.

Firing threshold adaptation, as reported in Fontaine et al. (2014), steers the firing activity of a stimulated neuron to a homeostatic state. In this state, the value of the threshold is just large or small enough to maintain a reasonable firing rate without over- or under-excitation. In our model, such a mechanism could be achieved by setting the value of \(\theta \) sufficiently close to the highest feasible values specified in Theorems 1 and 2.

In addition to a rather general model of neuronal behavior, another major theoretical assumption of our work was that the informational content of stimuli is drawn from an equidistribution in a unit ball \(B_n(1)\). This assumption, however, can be relaxed, and the results of Theorems 1–3 generalized to product measures. Key ingredients of such generalizations are provided in Gorban and Tyukin (2017), and their practical feasibility is illustrated by numerical simulations with information items randomly drawn from a hypercube (Figs. 5, 6, 7).

Our theoretical and numerical analysis revealed an interesting hierarchy of cognitive functionality implementable at the level of single neurons. We have shown that cognitive functionality develops with the dimensionality or connectivity parameter *n* of single neurons. This reveals explicit relationships between levels of the neural connectivity in living organisms and different cognitive behaviors such organisms can exhibit (cf. Lobov et al. 2017). As we can see from Theorems 1, 2 and Figs. 5 and 6, the ability to form static memories increases monotonically with *n*. The increase in cognitive functionality, however, occurs in steps.

For small *n* (\(n\in [1,10]\)), neuronal selectivity to a single stimulus does not form. It emerges rapidly when the dimension parameter *n* exceeds a critical value of around \(n=10{-}20\) (see Fig. 5a). This constitutes the first critical transition: single neurons become selective to single information items. The second critical transition occurs at significantly larger dimensions, around \(n=100{-}400\) (see Fig. 6). At this second stage, neuronal selectivity to multiple *uncorrelated* stimuli develops. The ability to respond selectively to a given set of multiple uncorrelated information items is apparently crucial for rapid learning “by temporal association” in such neuronal systems. This learning ability, as well as the formation of dynamic memories, is justified by Theorem 3 and illustrated in Fig. 7.
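The first transition is easy to probe numerically. The sketch below makes simplifying assumptions (unit-norm Gaussian-direction items rather than the paper's ball or hypercube samples, and a neuron whose weight vector simply equals the target item) and estimates how often such a neuron rejects all background items as *n* grows:

```python
import numpy as np

rng = np.random.default_rng(1)
M, theta, trials = 500, 0.5, 100   # background items, threshold, repetitions

def unit_rows(a):
    # Normalize each row of a matrix to unit Euclidean norm
    return a / np.linalg.norm(a, axis=-1, keepdims=True)

def p_selective(n):
    """Fraction of trials in which a neuron tuned to one random unit-norm
    item (w = x) stays silent for all M random background items."""
    ok = 0
    for _ in range(trials):
        x = unit_rows(rng.standard_normal((1, n)))[0]
        bg = unit_rows(rng.standard_normal((M, n)))
        ok += (bg @ x).max() < theta
    return ok / trials

probs = {n: p_selective(n) for n in (10, 50, 200)}
print(probs)   # selectivity emerges as n grows
```

In low dimension the maximal background response almost always exceeds the threshold; in high dimension the inner products concentrate near zero and selectivity becomes near-certain, qualitatively matching the step-like transition of Fig. 5a.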

At the core of our mathematical arguments are the concentration of measure phenomena exemplified in Gorban et al. (2016), Gorban and Tyukin (2018) and stochastic separation theorems (Gorban and Tyukin 2017; Gorban et al. 2016). Some of these results, which have been central in the proofs of Theorems 2 and 3, namely, the statements that random i.i.d. vectors drawn from equidistributions in \(B_n(1)\) and from product measures are almost orthogonal with probability close to one, are closely related to the notion of effective dimensionality of spaces based on \(\epsilon \)-*quasiorthogonality* introduced in Hecht-Nielsen (1994), Kainen and Kurkova (1993). In these works the authors demonstrated that in high dimensions there exist exponentially large sets of quasiorthogonal vectors. In Gorban et al. (2016), however, as well as in our current work (see Lemma 3), we demonstrated that such sets not only exist but are also typical.
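The typicality of quasiorthogonal sets is simple to verify empirically. In the sketch below the dimension \(n=400\) and sample size \(k=1000\) are arbitrary illustrative choices; any i.i.d. isotropic sample behaves similarly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 400, 1000   # dimension and number of i.i.d. random directions

# Draw k random unit vectors in R^n
v = rng.standard_normal((k, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)

# Pairwise inner products: in high dimension they concentrate near zero
# (typical magnitude ~ 1/sqrt(n)), so even a large random set is
# epsilon-quasiorthogonal without any special construction.
gram = v @ v.T
pairwise = gram[np.triu_indices(k, 1)]
max_cos = np.abs(pairwise).max()
print("max |<v_i, v_j>| over", pairwise.size, "pairs:", max_cos)
```

Even over roughly half a million pairs, the largest cosine stays a small multiple of \(1/\sqrt{n}\), far from 1: quasiorthogonality is the typical, not the exceptional, configuration.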

Finally, we note that the number of multiple stimuli that can be selectively detected by single neurons is not extraordinarily large. In fact, as we have shown in Figs. 6 and 7, memorizing 8 information items at the level of single neurons requires more than 400 connections. This suggests not only that new memories are naturally packed *in quanta*, but also that there is a limit on this number, associated with the cost of implementing such a functionality. This cost is the number of individual functional synapses. Balancing these costs in living beings is, of course, a subject of selection and evolution. Nevertheless, as our study has shown, there is a clear functional gain for which these costs may be paid.

## 6 Conclusion

In this work we analyzed the striking consequences of the abundance of signalling routes for functionality of neural systems. We demonstrated that complex cognitive functionality derived from extreme selectivity to external stimuli and rapid learning of new memories at the level of single neurons can be explained by the presence of multiple signalling routes and simple physiological mechanisms. At the basic level, these mechanisms can be reduced to a mere perceptron-like behavior of neurons in response to stimulation and a Hebbian-type learning governing changes of the synaptic efficiency.

The observed phenomenon is robust. Remarkably, a simple generic model offers a clear-cut mathematical explanation of a wealth of empirical evidence related to *in vivo* recordings of “Grandmother” cells, “concept” cells, and rapid learning at the level of individual neurons (Quiroga et al. 2005; Viskontas et al. 2009; Ison et al. 2015). The results can also shed light on the question of why Hebbian learning may give rise to neuronal selectivity in the prefrontal cortex (Lindsay et al. 2017) and explain why adding single neurons to deep layers of artificial neural networks is an efficient way to acquire novel information while preserving previously trained data representations (Draelos et al. 2016).

Finding simple laws explaining complex behaviors has always been the driver of progress in Mathematical Biology and Neuroscience. Numerous examples of such simple laws can be found in the literature (see, e.g., Roberts et al. 2014; Jurica et al. 2013; Gorban et al. 2016; Perlovsky 2006). Our results not only provide a simple explanation of the reported empirical evidence but also suggest that such a behavior might be inherent to neuronal systems and hence organisms that operate with high-dimensional informational content. In such systems, complex cognitive functionality at the level of elementary units, i.e., single neurons, occurs naturally. The higher the dimensionality, the stronger the effect. In particular, we have shown that the memory capacity in ensembles of single neurons grows exponentially with the neuronal dimension. Therefore, from the evolutionary point of view, accommodating a large number of signalling routes converging onto single neurons is advantageous despite the increased metabolic costs.

The considered class of neuronal models, being generic, is of course a simplification. It does not capture spontaneous firing, signal propagation in dendritic trees, and many other physiologically relevant features of real neurons. Moreover, in our theoretical assessments we assumed that the informational content processed by neurons is sampled from an equidistribution in a unit ball. The results, however, can already be generalized to product measure distributions (see, e.g., Gorban and Tyukin 2017). Generalizing the findings to models offering better physiological realism is the focus of our future work.

## Notes

### Acknowledgements

This work has been supported by Innovate UK Grants KTP009890 and KTP010522, by the Spanish Ministry of Economy and Competitiveness under Grant FIS2014-57090-P, the Russian Federation Ministry of Education state assignment (No. 8.2080.2017/4.6), “Initiative scientific project” of the main part of the state plan of the Ministry of Education and Science of Russian Federation (Task No. 2.6553.2017/BCH Basic Part), and by the Russian Science Foundation Project 15-12-10018 (numerical assessment and results). Alexander N. Gorban was supported by the Ministry of Education and Science of Russian Federation (Project No. 14.Y26.31.0022).

## Supplementary material

## References

- Andersen P, Morris R, Amaral D, Bliss T, O’Keefe J (eds) (2007) The hippocampus book. Oxford University Press, Oxford
- Amaral DG, Witter MP (1989) The three-dimensional organization of the hippocampal formation: a review of anatomical data. Neuroscience 31:571–591
- Amit DJ, Gutfreund H, Sompolinsky H (1985) Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys Rev Lett 55:1530–1533
- Barrett LF, Tugade MM, Engle RW (2004) Individual differences in working memory capacity and dual-process theories of the mind. Psychol Bull 130(4):553
- Benito N, Fernandez-Ruiz A, Makarov VA, Makarova J, Korovaichuk A, Herreras O (2014) Spatial modules of coherent activity in pathway-specific LFPs in the hippocampus reflect topology and different modes of presynaptic synchronization. Cereb Cortex 11(7):1738–1752
- Benito N, Martin-Vazquez G, Makarova J, Makarov VA, Herreras O (2016) The right hippocampus leads the bilateral integration of gamma-parsed lateralized information. eLife 5:e16658. https://doi.org/10.7554/eLife.16658
- Calvo C, Villacorta-Atienza JA, Mironov VI, Gallego V, Makarov VA (2016) Waves in isotropic totalistic cellular automata: application to real-time robot navigation. Adv Complex Syst 19(4):1650012–18
- Clark DD, Sokoloff L (1999) Circulation and energy metabolism of the brain. In: Siegel GJ, Agranoff BW, Albers RW, Fisher SK, Uhler MD (eds) Basic neurochemistry: molecular, cellular and medical aspects. Lippincott, Philadelphia, pp 637–670
- Cucker F, Smale S (2002) On the mathematical foundations of learning. Bull Am Math Soc 39(1):1–49
- Draelos TJ, Miner NE, Lamb CC, Vineyard CM, Carlson KD, James CD, Aimone JB (2016) Neurogenesis deep learning. arXiv preprint arXiv:1612.03770
- Fernandez-Ruiz A, Makarov VA, Herreras O (2012) Sustained increase of spontaneous input and spike transfer in the CA3-CA1 pathway following long-term potentiation in vivo. Front Neural Circuits 6:71
- Finnerty CT, Jefferys JGR (1993) Functional connectivity from CA3 to the ipsilateral and contralateral CA1 in the rat dorsal hippocampus. Neuroscience 56(1):101
- Fontaine B, Peña JL, Brette R (2014) Spike-threshold adaptation predicted by membrane potential dynamics in vivo. PLoS Comput Biol 10(4):e1003560
- Gorban AN, Tyukin IY, Romanenko I (2016) The blessing of dimensionality: separation theorems in the thermodynamic limit. IFAC-PapersOnLine 49(24):64–69. 2nd IFAC Workshop on Thermodynamic Foundations for a Mathematical Systems Theory (TFMST 2016)
- Gorban AN, Tyukin IY (2018) Blessing of dimensionality: mathematical foundations of the statistical physics of data. Philos Trans R Soc A. https://doi.org/10.1098/rsta.2017.0237
- Gorban AN, Tyukin IY (2017) Stochastic separation theorems. Neural Netw 94:255–259
- Gorban AN, Tyukin IY, Prokhorov DV, Sofeikov KI (2016) Approximation with random bases: pro et contra. Inf Sci 364–365:129–145
- Gorban AN, Tyukina TA, Smirnova EV, Pokidysheva LI (2016) Evolution of adaptation mechanisms: adaptation energy, stress, and oscillating death. J Theor Biol 405:127–139
- Hecht-Nielsen R (1994) Context vectors: general-purpose approximate meaning representations self-organized from raw data. In: Zurada J, Marks R, Robinson C (eds) Computational intelligence: imitating life. IEEE Press, London
- Herculano-Houzel S (2009) The human brain in numbers: a linearly scaled-up primate brain. Front Hum Neurosci 3:31
- Herculano-Houzel S (2011) Gorilla and orangutan brains conform to the primate cellular scaling rules: implications for human evolution. Brain Behav Evol 77:33–44
- Herculano-Houzel S (2012) The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost. Proc Natl Acad Sci 109:10661–10668
- Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79(8):2554–2558
- Ishizuka N, Weber J, Amaral DG (1990) Organization of intrahippocampal projections originating from CA3 pyramidal cells in the rat. J Comp Neurol 295:580–623
- Ison MJ, Quian Quiroga R, Fried I (2015) Rapid encoding of new memories by individual neurons in the human brain. Neuron 87(1):220–230
- Jurica P, Gepshtein S, Tyukin I, van Leeuwen C (2013) Sensory optimization by stochastic tuning. Psychol Rev 120(4):798–816
- Kainen PC, Kurkova V (1993) Quasiorthogonal dimension of Euclidean spaces. Appl Math Lett 6(3):7–10
- Khalil H (2002) Nonlinear systems, 3rd edn. Prentice Hall, Upper Saddle River
- Leung C-S, Chan L-W, Lai E (1995) Stability, capacity, and statistical dynamics of second-order bidirectional associative memory. IEEE Trans Syst Man Cybern 25(10):1414–1424
- Li XG, Somogyi P, Ylinen A, Buzsaki G (1994) The hippocampal CA3 network: an in vivo intracellular labeling study. J Comp Neurol 339:181–208
- Lindsay GW, Rigotti M, Warden MR, Miller EK, Fusi S (2017) Hebbian learning in a random network captures selectivity properties of prefrontal cortex. bioRxiv, p 133025
- Lobov SA, Zhuravlev MO, Makarov VA, Kazantsev VB (2017) Noise enhanced signaling in STDP driven spiking-neuron network. Math Model Nat Phenom 12(4):109–124
- Markram H, Lubke J, Frotscher M, Sakmann B (1997) Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275(5297):213–215
- Oja E (1982) A simplified neuron model as a principal component analyzer. J Math Biol 15:267–273
- Perlovsky LI (2006) Toward physics of the mind: concepts, emotions, consciousness, and symbols. Phys Life Rev 3(1):23–55
- Platek M, Keenan JP, Shackelford TK (2007) Evolutionary cognitive neuroscience. MIT Press, Cambridge
- Quian Quiroga R (2012) Concept cells: the building blocks of declarative memory functions. Nat Rev Neurosci 13(8):587–597
- Quian Quiroga R, Reddy L, Kreiman G, Koch C, Fried I (2005) Invariant visual representation by single neurons in the human brain. Nature 435(7045):1102–1107
- Reimann MW, Nolte M, Scolamiero M, Turner K, Perin R, Chindemi G, Dlotko P, Levi R, Hess K, Markram H (2017) Cliques of neurons bound into cavities provide a missing link between structure and function. Front Comput Neurosci 11:48
- Roberts A, Conte D, Hull M, Merrison-Hort R, al Azad AK, Buhl E, Borisyuk R, Soffe SR (2014) Can simple rules control development of a pioneer vertebrate neuronal network generating behavior? J Neurosci 34(2):608–621
- Rosenblatt F (1962) Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Spartan Books, Washington, DC
- Sherwood CC, Bauernfeind AL, Bianchi S, Raghanti MA, Hof PR (2012) Human brain evolution writ large and small. Prog Brain Res 195:237–254
- Sousa AM, Meyer KA, Santpere G, Gulden FO, Sestan N (2017) Evolution of the human nervous system function, structure, and development. Cell 170(2):226–247
- Tyukin IY, Gorban AN, Sofeikov K, Romanenko I (2017) Knowledge transfer between artificial intelligence systems. arXiv preprint arXiv:1709.01547
- Vapnik V, Chapelle O (2000) Bounds on error expectation for support vector machines. Neural Comput 12(9):2013–2036
- Villacorta-Atienza JA, Makarov VA (2013) Neural network architecture for cognitive navigation in dynamic environments. IEEE Trans Neural Netw Learn Syst 24(12):2075–2087
- Villacorta-Atienza JA, Calvo C, Makarov VA (2015) Prediction-for-compaction: navigation in social environments using generalized cognitive maps. Biol Cybern 109(3):307–320
- Villacorta-Atienza JA, Calvo C, Lobov S, Makarov VA (2017) Limb movement in dynamic situations based on generalized cognitive maps. Math Model Nat Phenom 12(4):15–29
- Viskontas IV, Quian Quiroga R, Fried I (2009) Human medial temporal lobe neurons respond preferentially to personally relevant images. Proc Natl Acad Sci 106(50):21329–21334
- Wittner L, Henze DA, Zaborszky L, Buzsaki G (2007) Three-dimensional reconstruction of the axon arbor of a CA3 pyramidal cell recorded and filled in vivo. Brain Struct Funct 212(1):75–83

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.