Recruitment and Consolidation of Cell Assemblies for Words by Way of Hebbian Learning and Competition in a Multi-Layer Neural Network
Garagnani, M., Wennekers, T. & Pulvermüller, F. Cogn Comput (2009) 1: 160. doi:10.1007/s12559-009-9011-1
Current cognitive theories postulate either localist representations of knowledge or fully overlapping, distributed ones. We use a connectionist model that closely replicates known anatomical properties of the cerebral cortex and neurophysiological principles to show that Hebbian learning in a multi-layer neural network leads to memory traces (cell assemblies) that are both distributed and anatomically distinct. Taking the example of word learning based on action-perception correlation, we document mechanisms underlying the emergence of these assemblies, especially (i) the recruitment of neurons and consolidation of connections defining the kernel of the assembly along with (ii) the pruning of the cell assembly’s halo (consisting of very weakly connected cells). We found that, whereas a learning rule mapping covariance led to significant overlap and merging of assemblies, a neurobiologically grounded synaptic plasticity rule with fixed LTP/LTD thresholds produced minimal overlap and prevented merging, exhibiting competitive learning behaviour. Our results are discussed in light of current theories of language and memory. As simulations with neurobiologically realistic neural networks demonstrate here spontaneous emergence of lexical representations that are both cortically dispersed and anatomically distinct, both localist and distributed cognitive accounts receive partial support.
Keywords: Competitive recruitment learning · Language acquisition · Memory trace · Consolidation · LTP/LTD · Localist distributed representations · Neurocomputation · Neurobiologically plausible modelling · Synaptic plasticity · Perisylvian cortex · Perception-action associations
The field of cognitive neuroscience has been flooded with experimental findings produced by new and exciting imaging techniques. The development of brain-based theories, however, has not kept pace with this large and unexpected flow of experimental data: although the mapping of cognitive processes to cortical areas may be motivated by empirical evidence, a fundamental issue that any theory should address is a principled explanation of why specific areas become active when specific cognitive processes are being performed. The major “label and conquer” approach to cognitive neuroscience has, in many cases, fallen short of providing such a mechanistic explanation.
Recently, researchers have started to use computational modelling in conjunction with experimental techniques with a view to combining cognitive and brain theories and linking neuronal circuits to functional systems, especially in the domain of visual, auditory and language processing (e.g. [13, 14, 16, 33, 37, 86, 93]). When built so as to closely replicate neuroanatomical structure and neurophysiological characteristics of the cortex, computational models can make precise, quantitative predictions about when and where in the brain specific cognitive processes are expected to take place. Such predictions can be tested with experimental methods, which can provide evidence in support of the neurophysiological validity of the models or lead to their further refinement. Crucially, neurobiologically realistic models can help address a fundamental (and generally neglected) question in the field of cognitive neuroscience, i.e. the “how/why” question: by shifting the level of investigation from that of abstract mechanisms down to that of cortical circuits, models can provide a mechanistic explanation of how human cognition might emerge from neurobiological structure and function. In the work described here this approach was successfully applied in the domain of language to simulate and explain, at the neuronal level, the mechanisms underlying early word acquisition.
In psycholinguistics, most existing computational approaches explain language processes either as the activation and long-term storage of localist elements [17, 18, 45, 51, 57, 58] or on the basis of fully distributed activity patterns [28, 39, 52, 64, 72, 73, 76]. Localist approaches typically assume, a priori, the existence of separate nodes for separate items (words), and of pre-established, “hard-wired” connections between them; the adoption of anatomically distinct nodes allows different item representations to be active at the same time while avoiding cross-talk. Distributed accounts, on the other hand, do not make such a priori assumptions: according to them, the representations of the relevant items emerge as distributed patterns of strengthened connections over all nodes in a layer (the hidden layer). In this approach, the same set of hidden nodes is used to encode different items as different patterns of graded activation; this, however, makes it impossible to keep different item representations separate when they are simultaneously active. In general, cognitive arguments (e.g. our proven ability to maintain multiple item representations distinct) favour localist representations, whereas neuroscience arguments weigh in favour of distributedness [22, 58].
The results presented here suggest that these two approaches are both partly correct and partly misleading: distributed and anatomically distinct representations can emerge spontaneously in the cortex solely as a result of Hebbian synaptic plasticity mechanisms, and do not need to be assumed a priori.
On the basis of neuroanatomical and imaging studies, six interconnected cortical areas are modelled: primary auditory cortex (A1), auditory belt (AB) and parabelt (PB) areas (Wernicke’s area), inferior prefrontal (PF) and premotor (PM) cortex (Broca’s area) and primary motor cortex (M1).
Neurons are modelled as graded-response cells with adaptation, whose output represents the average firing rate of a local pool of pyramidal cells.
- (iv) Both local and global (non-specific) cortical inhibition mechanisms are realized:
Inhibitory cells reciprocally connected with neighbouring excitatory cells simulate the action of a pool of inter-neurons surrounding a cortical pyramidal cell, providing lateral inhibition and local activity control;
Area-specific inhibitory loops implement a mechanism of self-regulation, preventing the overall network activity from falling into non-physiological states (total saturation or inactivity).
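The area-specific inhibitory loop can be illustrated as a non-specific damping term subtracted from every cell's input, proportional to the area's summed activity. The gain constant and array sizes below are illustrative assumptions, not the model's actual parameters:

```python
import numpy as np

def self_regulated_input(external_input, area_rates, k_global=0.5):
    """Area-specific inhibitory loop: every cell in the area receives
    the same non-specific inhibition, proportional to the area's summed
    activity, keeping the network away from total saturation."""
    global_inhibition = k_global * area_rates.sum()
    return external_input - global_inhibition

# Toy example: three cells driven by a unit input; the more active the
# area as a whole, the more strongly each cell is damped.
quiet_area = np.array([0.0, 0.1, 0.0])
busy_area = np.array([0.9, 0.8, 0.9])
print(self_regulated_input(np.ones(3), quiet_area))  # mild damping
print(self_regulated_input(np.ones(3), busy_area))   # strong damping
```

Because the inhibition term grows with total activity, a highly active area suppresses all of its cells, while a quiescent area is left almost undisturbed.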
In our previous work, this architecture has been used to investigate the emergence of cell assemblies for words and the effects of attention on language processes [26, 27]; here, we specifically focus on the mechanisms of cell assembly formation and on how different computational implementations of synaptic plasticity affect such mechanisms and the network’s ability to spontaneously develop separate input pattern representations.
It should be noted that we did not model individual spiking neurons but chose to use a mean-field approach, where each cell of the network represents the average activity of a local pool of neurons, or cortical column [21, 94]. Although spiking neurons would have made the model more biologically realistic, their introduction would have imposed a significant computational cost; thus, we decided to start simple and leave the implementation of this level of detail to a possible second phase, if necessary. As it turned out, modelling the cortical interactions at the level of cortical columns was sufficient to reproduce the phenomena of interest here.
With regard to point (v), we postulate that the emergence of specialized cell assemblies for words is driven by the repeated presentation of the same action-perception patterns in the presence of Hebbian mechanisms of associative learning. LTP and LTD, consisting of long-term increase or decrease in synaptic strength resulting from pairing presynaptic activity with specific levels of postsynaptic membrane potentials, are believed to play a key role in experience-dependent plasticity, memory and learning [48, 71]. The network implemented two different computational abstractions of LTP and LTD, one based on Sejnowski’s covariance rule [77, 78], the other on the Artola–Bröcher–Singer (ABS) model of LTP/LTD [3, 4]. Both algorithms are described in the next section.
After the training, the network was tested to reveal the properties of the cell assemblies which had emerged for the given auditory-motor pattern pairs. More precisely, for each of the four patterns presented to the network, the time-average of the response (output value, or “firing rate”) of each cell in the network was computed. These averages were used to identify the CAs that developed in the network in response to the four input pairs, as follows: a CA was defined simply as the subset of cells exhibiting average output above a given threshold γ∈[0,1] during stimulus presentation. Using the above functional definition, we then measured, for different values of γ, (i) CA size (averaged across the CAs) and (ii) distinctiveness of a CA, quantified as the average overlap (number of cells that two CAs shared) between one randomly chosen CA and the other three (this is also a measure of the amount of cross-talk between pairs of CAs). We repeated the above process and collected these measures for two sets of ten networks, each set trained using one of the two rules, and each network randomly initialized and trained with a different set of stimulus pairs.
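The functional CA definition and overlap measure just described can be sketched as follows. The rates and the value of γ below are illustrative toy values, not the actual simulation data:

```python
import numpy as np

def extract_ca(avg_rates, gamma):
    """Functional CA definition: the set of cells whose time-averaged
    output exceeds the threshold gamma during stimulus presentation."""
    return set(np.flatnonzero(avg_rates > gamma))

def ca_overlap(ca_a, ca_b):
    """Distinctiveness measure: number of cells shared by two CAs
    (a proxy for the amount of cross-talk between them)."""
    return len(ca_a & ca_b)

# Toy example: time-averaged rates of 10 cells for two input patterns.
rates_w1 = np.array([0.9, 0.8, 0.7, 0.1, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0])
rates_w2 = np.array([0.0, 0.0, 0.7, 0.8, 0.9, 0.0, 0.6, 0.0, 0.0, 0.0])

ca1 = extract_ca(rates_w1, gamma=0.5)   # cells {0, 1, 2, 6}
ca2 = extract_ca(rates_w2, gamma=0.5)   # cells {2, 3, 4, 6}
print(ca_overlap(ca1, ca2))             # 2 shared cells
```

Sweeping γ over [0, 1] and averaging these two quantities across CAs and across the ten randomly initialized networks yields the size and distinctiveness curves reported in the Results.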
As the training progressed, we observed the emergence of distributed cell assemblies; the distinctiveness of the CAs, however, differed significantly depending on the learning rule adopted. This becomes apparent by examining the time-averaged response that each input pattern induced in the network at the different stages of the learning process.
Impacts of the Learning Rule
The merging phenomenon prevented the formation of distinct CAs in several of the networks, and was a symptom of the covariance learning rule’s inability to separate, or “pull apart”, the representations of two or more input patterns that happened to produce overlapping activations (see below and the “Discussion” section). This illustrates an argument commonly raised against the cell assembly theory: learning can “lump together” different representations (see, for example, ).
The number of cells activated and involved in the binding is significantly larger than that observed in the previous simulations. In particular, a number of weakly active cells (widespread grey areas in the background) now accompany CA formation. Although a higher number of cells responding to an input pattern should increase the probability of CA overlap (and, thus, of CA merging), this did not happen (Fig. 5). In this example, too, a small overlap did develop between the CAs responding to W3 and W4 (e.g. compare the lower corners of area PB). This overlap, however, is maintained within limits (e.g. the responses to W3 and W4 still differ in areas AB and PM).
Although this may not be easily visible, the weakly active cells (which can be considered either as weak members of the CA or as part of what  called the “halo” of the assembly) become less numerous (and/or less active) between stages 100 and 5,000: this phenomenon is more apparent in the central areas (e.g. compare the responses to patterns W1–W3 in area PB at these two time points). On the other hand, cells that are already strongly active after 100 presentations (very bright or white dots) still respond equally (if not more) strongly after 5,000 presentations; this indicates that the CAs have reached a stable and robust configuration, with strongly and reciprocally connected sets of cells forming their “kernel” .
The reduction in the sizes of the CAs’ haloes suggests that, subsequent to the initial period of CA growth, the links connecting a CA to the set of potential candidates (cells that could become part of the CA kernel, i.e. be “recruited” by it) undergo a process of weakening, or “pruning”. Such a process could play a role in limiting CA merging and in “separating” initially overlapping CAs. To address this issue, it is necessary to look at the way in which CA overlap changes (in the areas where the pruning takes place) as a function of learning.
Pruning and CA Separation (with ABS Rule)
In this section we report evidence that the pruning and reduction in number of weakly active cells visually observed in the central areas is indeed a phenomenon that occurs reliably (on average) in all networks trained with the ABS rule, and that the amount of overlap between the CAs in these areas decreases as the learning progresses.
By far the largest group (shown in cyan) consists of cells that are either not active or have very weak output (below 0.01). A direct comparison of the bar graphs for 100 and 5,000 training presentations shows that, in the two central areas (PB and PF), a significant number of cells initially active in the 0.01–0.02 interval are pushed to the lower activation level (0–0.01). This result indicates that the weakly active cells (grey areas in Fig. 6) indeed become less numerous (and/or less active) between stages 100 and 5,000 in the two central areas. As the weaker activations can only be explained by synaptic-weight reduction, these numbers illustrate the phenomenon of pruning during the learning process.
Emergence (Cell Recruitment) and Consolidation of CAs
Action-perception patterns are stored as distributed sets of neurons in multilayer networks with neuroanatomical constraints;
During learning, neurons are recruited and gradually bound together into a single CA; such recruitment and CA consolidation processes proceed from the sensory and motor areas inwards, towards the central, or “amodal”, associative areas;
Whereas networks adopting the covariance rule  struggle to produce input-specific, distinct lexical representations, the adoption of a neurobiologically grounded Hebbian rule with fixed thresholds, based on the Artola–Bröcher–Singer  model of LTP/LTD, leads to CA overlap minimization (<5%) and anatomically distinct CAs;
In the networks adopting the ABS rule, the process of growth and merging of CAs is countered by a process of competition and pruning of the CAs’ halo; we conjecture that this reduction in CA size and overlap reflects a strong weight decrease in the connections between kernel and halo.
It should be added that the strong binding of the articulatory-acoustic activation patterns and distinctiveness of CAs was confirmed by additional simulations (not reported here) in which the network was stimulated, in area A1 only, with the auditory component of a word pattern; the results showed that, while the input-specific CA was strongly activated and led to partial reconstruction of the associated motor pattern in M1 (on average, approximately 30% of the motor pattern was reproduced), the other CAs remained almost completely silent (see Garagnani et al. , their Figs. 10 and 11).
Assume that during training, the weights are modified according to the covariance rule, as summarized by Table 1. The difference between current and average activity of one cell is larger when cells are fully active than when they are silent: in a sparsely active network, a cell’s average activity is much closer to zero than to its maximum level of activation.
Note that links between two cells that are simultaneously silent are strengthened (case (d) in Table 1). This leads to an overall merging effect. Setting Δw = 0 in case (d) is not sufficient to solve this problem. In fact, because of the differences in magnitude, the net effect produced by the alternated strengthening (a) and weakening (cases (b) or (c)) of a link is an increase in strength. In the example of Fig. 10, alternation of inputs A and B means alternated increase (homosynaptic LTP, (a)) and decrease (heterosynaptic LTD, (b)) of w3 and w4: the net effect is a weight increase in both, which, in the long run, causes the two cell assemblies to merge into a single one.
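Using illustrative activity levels (not the paper's actual parameters), the magnitude asymmetry behind this merging effect can be demonstrated directly:

```python
def covariance_dw(pre, post, pre_avg, post_avg, lr=0.01):
    """Sejnowski-style covariance rule: the weight change is proportional
    to the product of the pre- and postsynaptic deviations from their
    respective average activities."""
    return lr * (pre - pre_avg) * (post - post_avg)

# In a sparsely active network, average activity sits near zero
# (say 0.1), far below the maximum rate of 1.0.
avg = 0.1

dw_both_active = covariance_dw(1.0, 1.0, avg, avg)  # case (a): large LTP
dw_hetero      = covariance_dw(0.0, 1.0, avg, avg)  # case (b): small LTD
dw_both_silent = covariance_dw(0.0, 0.0, avg, avg)  # case (d): small LTP!

# Alternating (a) and (b) yields a net strengthening, and simultaneously
# silent cells are also weakly bound -- the two effects behind merging.
print(dw_both_active + dw_hetero > 0)  # True
print(dw_both_silent > 0)              # True
```

Because the deviation of a fully active cell (1.0 − 0.1 = 0.9) dwarfs that of a silent one (0.0 − 0.1 = −0.1), one potentiation step outweighs several depression steps, so links such as w3 and w4 keep growing even under alternated inputs.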
It uses the same amount of weight change Δw per unit time for both LTP and LTD (this implies that weakening and strengthening produce weight changes of equal magnitude);
It does not strengthen links between cells that are simultaneously silent;
It uses a single parameter’s value (the postsynaptic membrane potential) to determine whether LTP or LTD should occur—see Eq. 2.
The last feature (based on neurobiological evidence) allows one to precisely define the ranges of values of the postsynaptic membrane potential for which either LTP or LTD will occur. We speculate that the ratio between the widths of these ranges allows the modulation of the total amount of competitive learning that takes place in the network. The validity of this hypothesis requires further computational testing, motivated also by the presence of a residual CA overlap in the ABS rule simulations.
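Putting the three features together, one possible reading of the implemented rule is the following sketch. The threshold values, the step size and the boolean presynaptic-activity criterion are illustrative simplifications; the actual rule is given by Eq. 2:

```python
def abs_dw(pre_active, v_post, theta_plus, theta_minus, delta=0.001):
    """Fixed-threshold abstraction of the ABS LTP/LTD model.

    Equal-magnitude LTP/LTD steps; a single postsynaptic potential
    selects between potentiation and depression; links between
    simultaneously silent cells are never strengthened."""
    if v_post >= theta_plus:
        # Strong depolarization: LTP for active inputs,
        # heterosynaptic LTD for inactive ones.
        return delta if pre_active else -delta
    if v_post >= theta_minus and pre_active:
        # Intermediate depolarization: homosynaptic LTD.
        return -delta
    # Below theta_minus, or both cells silent: no change.
    return 0.0

print(abs_dw(True,  0.9, 0.5, 0.2))  # LTP
print(abs_dw(False, 0.9, 0.5, 0.2))  # heterosynaptic LTD
print(abs_dw(True,  0.3, 0.5, 0.2))  # homosynaptic LTD
print(abs_dw(False, 0.0, 0.5, 0.2))  # no change
```

In this reading, widening the homosynaptic LTD band [θ−, θ+) relative to the LTP range would increase the amount of depression, which is the ratio we conjecture modulates the total amount of competitive learning.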
Also note that as words usually have “neighbours” with which they share part of their form [12, 50], the model correlate of words should also partially overlap in their articulatory-acoustic signatures. Input patterns with overlapping activations, however, are expected to lead to an increase in CA overlap and merging; to maintain CA distinctiveness and neurobiological realism, we envisage the use, in future, of temporally dynamic patterns, spiking neurons and time-dependent synaptic plasticity rules.
The fact that the computational abstraction of the ABS model implemented here exhibits both competitive and recruitment learning behaviour [19, 24, 80] is worthy of note. Perhaps the best-known example of a Hebbian learning rule exhibiting both of these properties is the Bienenstock–Cooper–Munro (BCM) rule , which has been successfully used to model and explain the spontaneous emergence of orientation selectivity and ocular dominance in the visual cortex . It should be noted that, although many of the BCM rule’s properties have been shown to arise from spike-time dependent plasticity rules (e.g. ), that rule was originally developed to account for cortical organization and receptive field properties during development. The ABS model, instead, was derived from neurophysiological data obtained in the mature cortex. Below we discuss in detail additional aspects that distinguish the ABS rule implemented here from the classical BCM rule.
First of all, in the BCM rule the LTP/LTD threshold—corresponding to parameter θ+ in Eq. 2—is not, like here, a predefined, fixed value, but a sliding threshold that changes according to the running average of the postsynaptic cell’s activity.2 As pointed out by , although evidence suggesting that the LTP/LTD threshold may be affected by the activity of the cell does exist (e.g. [5, 41]), it has been established that this effect is input (i.e. synapse) specific, and that it depends on the pattern of pre-synaptic rather than post-synaptic activity . Thus, the assumption of a single, postsynaptic-driven LTP/LTD threshold that applies to all the synapses of a cell is not entirely justified.3 Second, in the BCM rule LTD occurs even with very small postsynaptic potentials, whereas experimental evidence suggests that if postsynaptic depolarization remains below a certain threshold, the synaptic efficacy should remain unchanged, regardless of any presynaptic activity . This aspect was implemented in the present ABS rule using a second (fixed) threshold, parameter θ− in Eq. 2. Finally, the BCM rule is unable to model heterosynaptic LTD (the weakening of synaptic inputs that are themselves inactive), as it requires at least some presynaptic activity to be present at a synapse for LTD to take place. This form of LTD has been observed in the hippocampus and neocortex ; the induction protocols require strong postsynaptic activation (e.g. high frequency stimulation of the cell through excitatory inputs). Accordingly, the ABS rule implemented here allows heterosynaptic LTD to occur, subject to the postsynaptic cell being strongly depolarized (condition V(y,t) ≥ θ+ in Eq. 2).
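To make the contrast concrete, a classical BCM update can be sketched with a squared-running-average sliding threshold (the learning rate and values below are illustrative assumptions):

```python
def bcm_dw(pre, post, post_avg, lr=0.01):
    """Classical BCM rule: the sign of the weight change flips at a
    sliding threshold set by the (here, squared) running average of
    postsynaptic activity -- contrast the fixed thresholds of Eq. 2."""
    theta_m = post_avg ** 2           # sliding LTP/LTD threshold
    return lr * pre * post * (post - theta_m)

# Without presynaptic activity the change is always zero, so the rule
# cannot express heterosynaptic LTD, unlike the ABS rule:
print(bcm_dw(0.0, 0.9, 0.5))       # 0.0
# An active synapse below the sliding threshold is depressed:
print(bcm_dw(1.0, 0.2, 0.8) < 0)   # True
```

Note also that the threshold here depends only on postsynaptic activity and applies to all of the cell's synapses at once, which is precisely the assumption questioned above.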
In view of the above point, we submit that the ABS rule that we adopted is more neurobiologically accurate than the BCM rule; furthermore, it does not make any assumptions about the existence of a global sliding threshold (or conservation of the cell’s total synaptic strength). At the time of writing, we are not aware of any other examples of biologically realistic learning rules with fixed (non-sliding), input-specific (local) LTP/LTD thresholds (and no synaptic-weight conservation) that have been reported to exhibit both recruitment and competitive learning (but see  for a biologically grounded recruitment learning algorithm).
Relevant here is the innovative recent study by , which explores the effects of the BCM learning rule on the dynamics of neural populations in a model of the hippocampus and surrounding cortical areas. Like the simulations presented here, Seth and Edelman’s network uses mean-firing rate units and Hebbian synaptic plasticity, and carefully replicates neuroanatomical structure and connectivity of the relevant areas. In this study, however, the authors investigate the spontaneous emergence of mesoscale (i.e. cell assembly level) causal structures during the execution of a spatial navigation task. To analyse the network’s behaviour in terms of population dynamics, for each specific network output, or neural reference (NR),4 a corresponding “context network” is identified, consisting of the set of neuronal interactions that could have potentially caused the observed NR (i.e. the set of cells that were active before the NR and are connected to it, either directly or via a number of synaptic links smaller than six). From this context, a “causal core” of interactions is then extracted, comprised of only those links that are causally significant (according to Granger’s concept of causality ) and that form a chain of activations causing (or predicting) the specific NR. Interestingly, the results of this study indicate that the size of the causal cores diminishes as the learning progresses, a process which the authors refer to as “refinement”, and which they interpret as possibly reflecting the selection of a specific causal pathway from diverse neuronal repertoires.
In view of the apparent similarities with the present work, one might be tempted to draw parallels between the concepts of causal core and cell assembly, or, even further, between the process of refinement and that of pruning of a CA’s halo. However, this analogy would not be very appropriate. To begin with, the concept of causal core (or context network) introduced by Seth and Edelman is not equivalent to that of cell assembly kernel (or halo). In fact, a CA kernel consists of a set of strongly and reciprocally connected cells; the presence of positive feedback loops within the CA’s circuits is a crucial feature, as it allows reverberation and persistence of activity even in absence of a stimulus. Instead, causal cores, more akin to Abeles’ synfire chains , are formed by mono-directional chains of (not necessarily strongly connected) cells, whose sequential activation is a good predictor of the activation of a single output cell at a particular time point (due to the combinatorial growth in the number of cells to be included in the context network, the causal core analysis cannot be easily extended to include a set of NRs rather than just one ). Second, while in the results reported here the pruning process that gradually separates the CAs takes place in the CA’s halo, in Seth & Edelman’s simulations the refinement takes place in the causal core (the equivalent of the CA kernel), and not in the context network, or “periphery” of the circuit. Third, Seth and Edelman’s results do not provide evidence of an ongoing process of recruitment of cells from the periphery (or halo) and consequent consolidation of the links between them: the context network is mostly unaffected by the learning process , their Fig. 3a). 
Finally, to achieve competitive learning, in addition to adopting a BCM-based rule (which, as discussed earlier, makes some assumptions that are not biologically motivated), Seth & Edelman’s model of synaptic plasticity uses weight normalization , a mechanism not fully justified on neurobiological grounds. In spite of these differences, Seth & Edelman’s significant contribution is still close, in spirit, to the present and other works (e.g. [16, 33, 37, 86]), which attempt to explain the emergence of high-level behavioural and cognitive processes in terms of neural population dynamics in neurobiologically realistic neural networks.
In conclusion, the present results provide evidence in support of the hypothesis that words, like other units of cognitive processing (e.g. objects, faces), are represented in the human brain as distributed and anatomically distinct action-perception circuits. Existing theoretical and computational accounts of knowledge representation in the brain explain memory either as the activation of a priori-established localist elements, or on the basis of fully overlapping, distributed patterns (see “Introduction” section). However, neither of these accounts is entirely compatible with an approach grounded in neuroanatomy and neurophysiology: localist networks with one cell (neuronal pool, or cortical column) coding for one cognitive trace may have difficulty in explaining (or making predictions about) the experimentally observed spreading of activity in cortex when words or concepts are recognized. Fully distributed networks, on the other hand, predict very global brain activity if their layers are not firmly related to specific cortical areas, and struggle to explain our ability to maintain more than one representation active at the same time within the same sensory modality. The present results suggest that anatomically distinct and distributed action-perception circuits can emerge spontaneously in the cortex as a result of synaptic plasticity. Our model predicts and explains the formation of lexical representations consisting of strongly interconnected, anatomically distinct cortical circuits distributed across multiple cortical areas, allowing two or more lexical items to be active at the same time.
Crucially, our simulations provide a principled, mechanistic explanation of where and why such representations should emerge in the brain, making predictions about the spreading of activity in large neuronal assemblies distributed over precisely defined areas, thus paving the way for an investigation of the physiology of language and memory guided by neurocomputational and brain theory.
The output value O(x,t) of a cell x represents the cumulative output (number of action potentials per time unit) of neuronal cluster (column) x at time t, and is a piecewise-linear sigmoid function of the cell’s membrane potential V(x,t). See Appendix 1 for details.
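A minimal sketch of such a piecewise-linear sigmoid transfer function (the threshold and slope values are illustrative assumptions, not the model's actual parameters):

```python
def output(v, threshold=0.0, slope=1.0):
    """Piecewise-linear sigmoid: zero below threshold, rising linearly
    with the membrane potential, saturating at the maximum rate of 1."""
    return min(max(slope * (v - threshold), 0.0), 1.0)

print(output(-0.5))  # 0.0 (subthreshold)
print(output(0.4))   # 0.4 (linear regime)
print(output(2.0))   # 1.0 (saturated)
```

The saturation at 1 reflects the bounded firing rate of a neuronal pool, while the threshold keeps subthreshold membrane potentials from producing any output.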
More precisely, for the BCM rule to exhibit stable learning behaviour, the threshold must be a more-than-linear function of the cell's average output rate (simulations typically use the power of 2).
Although evidence in support of the existence of homeostatic plasticity mechanisms exists (see  for a review), phenomena such as that of synaptic scaling—showing that prolonged changes in the cell’s activity lead to the multiplicative scaling of all the amplitudes of the miniature excitatory postsynaptic currents —do not constitute direct evidence for the presence of a single sliding LTP/LTD cell-threshold. Equally, synaptic scaling does not justify assuming that the norm of the vector of the synaptic strengths is conserved and equal for all cells, as often presupposed by neurobiologically inspired implementations of Hebbian learning (e.g. ).
A “neural reference” is defined as the activation, at a particular time, of a particular cell in area CA1; in the model in question, the activity in area CA1 directly affected motor output.
M.G. and F.P. acknowledge support by the UK Medical Research Council (U1055.04.003.00001.01, U1055.09.001.00002.01) and the European Community under the ‘‘New and Emerging Science and Technologies’’ Programme (NEST-2005-PATH-HUM contract 043374, NESTCOM). T.W. acknowledges support by the Engineering and Physical Sciences Research Council (grant EP/C010841/1, COLAMN—A Novel Computing Architecture for Cognitive Systems based on the Laminar Microcircuitry of the Neocortex).
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.