Biological Neuron Coding Inspired Binary Word Embeddings
Word embeddings are semantic representations of words. They are derived from large corpora and work well on many natural language tasks, with the downside of occupying large memory space. In this paper, we propose binary word embedding models inspired by biological neuron coding mechanisms: the spike timing of neurons during specific time intervals is converted into binary codes, reducing space and speeding up computation. We build three types of models to post-process the original dense word embeddings, namely, a homogeneous Poisson process-based rate coding model, a leaky integrate-and-fire neuron-based model, and an Izhikevich neuron-based model. We test our binary embedding models on word similarity and text classification tasks over five public datasets. The experimental results show that the brain-inspired binary word embeddings (which reduce space by approximately 68.75%) achieve results similar to the original embeddings on the word similarity task and better performance than traditional binary embeddings on the text classification task.
Keywords: Word embeddings · Neuron coding · Spiking neural networks
Word embedding models convert both semantic and syntactic information of words into dense vectors; examples include Word2Vec and GloVe. Recently, they have attracted a lot of attention due to their good performance in various natural language processing tasks, such as language modeling, parsing, sentence classification, and machine translation.
However, these dense representations are mostly derived from the statistical properties of large corpora and lack interpretability in each dimension of the word vectors. Several works have tried to transform dense word embeddings into sparse ones to improve interpretability. Murphy et al. introduced a matrix factorization algorithm named non-negative sparse embeddings (NNSE), applied to a co-occurrence matrix, to obtain sparse, effective, and interpretable embeddings. Faruqui et al. defined an L1-regularized objective function and proposed a post-processing optimization algorithm to convert the original dense embeddings into sparse or binary embeddings, which they call sparse or binary overcomplete word vectors. Sun et al. introduced an algorithm that obtains sparse embeddings while training the Word2Vec model, using an L1 regularizer on the cost function and the regularized dual averaging optimization algorithm. For binary word embeddings, there are also rounding algorithms that convert dense vectors into discrete integer values to reduce memory. Ling et al. proposed post-processing rounding, stochastic rounding, and auxiliary update vector algorithms for word embeddings with limited memory, named truncated word embeddings. The interpretability issue is mentioned in these works but not demonstrated clearly. In this paper, we improve on it via a brain-inspired approach, explaining each dimension of the word embeddings with neuron coding models.
In biological brains, information in areas such as the inferior temporal visual cortex, hippocampus, orbitofrontal cortex, and insula is encoded with sparse distributed representations. Much experimental evidence indicates that biological neural systems use the timing of spikes to encode information [12, 13, 14]. The spike trains of cell activities during information transmission inspire us to combine traditional word embeddings and neuron coding models into binary embeddings. In this paper, we perform post-processing operations on the original dense word embeddings to obtain binary ones, drawing on biological neuron coding models; the proposed binary embeddings occupy less space and are more interpretable than those of previous models.
Neuron coding is concerned with describing the relationship between a stimulus and the neuronal responses. Great effort has been dedicated to developing techniques for recording the brain's electrical activity at different spatial scales, such as single-cell spike train recording, local field potentials (LFP), and electroencephalography (EEG). Neuron coding models mainly concern how neurons encode, transmit, and decode information; their focus is to understand how neurons respond to a wide variety of stimuli and to construct models that predict responses to other stimuli.
Neurons propagate signals by generating electrical pulses called action potentials: voltage spikes that travel down nerve fibers. For example, in the presence of external sensory stimuli such as light, sound, taste, smell, and touch, sensory neurons change their activities by firing sequences of action potentials in various temporal patterns. Information about the stimulus is encoded in these action potentials and transmitted through connected neurons in our brains.
There are various hypotheses on neuron coding based on neurophysiological findings in biological nervous systems, mainly spike rate coding and spike time coding. In spike rate coding, only the firing rate in an interval is taken as the measure of the information carried. Rate coding was first motivated by the observation of frog cutaneous receptors by Adrian et al. in 1926, which showed that physiological neurons tend to fire more often for stronger stimuli. Spike rate coding has been the main paradigm in artificial neural networks, such as sigmoidal neurons. Meanwhile, Poisson-like rate coding is widely used by physiologists to describe how neurons transmit information. Recently, some neurophysiological results have shown that, in some biological neural systems, efficient information processing is more likely based on the precise timing of action potentials than on firing rates [18, 19, 20]. Time coding hypotheses mostly concentrate on the timing of individual spikes; typical ones are time to first spike [22, 23], rank order coding [20, 24], latency coding, and phase coding.
In our study, we use Poisson-like coding for spike rate coding and various spiking neuron models for time coding. We try to apply these biological neuron coding hypotheses to build binary word embedding models.
Spiking Neural Network Models
Spiking neural networks (SNNs), inspired by recent advances in neuroscience, are often referred to as the third generation of neural network models. Different from traditional neural networks, SNNs use the timing of individual spikes as the means of communication and neural computation.
Spiking neuron models, the basis of SNNs, describe the properties of certain cells in the nervous system that generate spikes across their cell membranes. The most well-known is the Hodgkin-Huxley model (H-H model). In 1952, Hodgkin and Huxley experimented on the giant axon of the squid with the voltage clamp technique, which punctures the cell membrane and allows a specific membrane voltage or current to be imposed. The model they fitted to these recordings describes well the changes in ion channels and neuron behavior after stimulation.
In the H-H model, the semipermeable cell membrane separates the interior of the cell from the extracellular liquid and acts as a capacitor. Because of active ion transport through the cell membrane, the ion concentration inside the cell differs from that in the extracellular liquid, and the Nernst potential generated by this difference in ion concentration is represented by a battery.
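This capacitor-and-batteries analogy corresponds to the standard H-H membrane equation, quoted here in its common textbook form (the notation is not reproduced from this paper):

$$C_m \frac{dV}{dt} = -\bar{g}_{\mathrm{Na}}\, m^3 h\, (V - E_{\mathrm{Na}}) - \bar{g}_{\mathrm{K}}\, n^4 (V - E_{\mathrm{K}}) - g_L (V - E_L) + I_{\mathrm{ext}},$$

where $C_m$ is the membrane capacitance, the $E$ terms are the Nernst (reversal) potentials of the sodium, potassium, and leak channels, and $m$, $h$, $n$ are gating variables governed by their own first-order kinetics.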
In addition to the H-H model, other types of spiking neuron models have been proposed, such as integrate-and-fire models and their variants, the Izhikevich neuron model, and the spike response model (SRM). Recently, SNN-based models have been applied in various AI applications, such as character recognition [30, 31], object recognition, image segmentation, speech recognition, robotics, knowledge representation, and symbolic reasoning. In this paper, we use the leaky integrate-and-fire model and the Izhikevich neuron model to convert word embeddings into more explainable binary embeddings.
Word Embedding Models Based on Inspirations from Biological Neuron Coding
We build unsupervised models that post-process word embeddings into binary form, based on two types of brain-inspired models: the homogeneous Poisson process and spiking neural networks. Starting from pre-trained word embeddings such as Word2Vec and GloVe, these models convert the original dense embeddings into binary codes. Different from traditional work on binary word representations, our models are inspired by neuroscience and are therefore biologically plausible and more interpretable.
Homogeneous Poisson Process-Based Binary Word Embeddings
Poisson-like rate coding is a major algorithm for simulating spiking responses to stimuli. Biological recordings from the medial temporal [38, 39] and primary visual cortex of macaque monkeys have shown good evidence for Poisson process-based coding.
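As a minimal sketch of this idea, the snippet below maps each embedding dimension to a firing probability and draws independent spikes in a fixed number of time bins, so one 32-bit float becomes a short binary code. The rescaling, bin count, and `max_rate` parameter are our assumptions, not the paper's exact settings; `n_bins=10` is chosen because a 10-bit code per dimension matches the 68.75% space reduction (1 − 10/32) quoted in the abstract.

```python
import numpy as np

def poisson_binarize(embedding, n_bins=10, max_rate=0.9, rng=None):
    """Convert a dense embedding to a binary code via binned
    homogeneous Poisson-like rate coding (illustrative sketch).

    Each dimension's value is rescaled to a firing probability per
    time bin; a spike (1) or silence (0) is drawn independently in
    each of n_bins bins, so one float becomes an n_bins-bit code.
    """
    rng = np.random.default_rng(rng)
    v = np.asarray(embedding, dtype=float)
    # Rescale values to [0, max_rate]: stronger "stimulus" -> higher rate.
    lo, hi = v.min(), v.max()
    rates = (v - lo) / (hi - lo + 1e-12) * max_rate
    # Independent Bernoulli draws per bin approximate a binned
    # homogeneous Poisson process.
    spikes = rng.random((v.size, n_bins)) < rates[:, None]
    return spikes.astype(np.uint8)

emb = np.random.randn(300)
code = poisson_binarize(emb, n_bins=10, rng=0)
# 300 dims x 10 bits = 3000 bits, vs. 300 x 32-bit floats = 9600 bits:
# a 1 - 10/32 = 68.75% reduction.
```

The binary matrix can then be flattened into the word's final code; the dimension with the smallest value gets rate 0 and never spikes under this scaling.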
Spiking Neural Networks Based Binary Word Embeddings
The LIF-Based Binary Word Embedding Model
The leaky integrate-and-fire (LIF) neuron model, a simplified version of the H-H model, is one of the simplest spiking neuron models. The LIF model is widely used because it is biologically realistic yet computationally simple to analyze and simulate [31, 42, 43].
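A minimal LIF-based binarization can be sketched as follows: one LIF neuron per dimension is driven by a constant current proportional to that dimension's value, and its spike/no-spike pattern over a fixed number of Euler steps is the binary code. All numeric parameters (`tau`, `v_th`, `i_boost`, the min-max current rescaling) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lif_binarize(embedding, n_bins=10, dt=1.0, tau=2.0,
                 v_rest=0.0, v_th=1.0, i_boost=2.0):
    """Binarize one embedding with a leaky integrate-and-fire neuron
    per dimension (sketch; parameters are our assumptions).

    Each dimension d drives one LIF neuron with a constant input
    current proportional to w_d; the spike pattern over n_bins
    integration steps is that dimension's binary code.
    """
    w = np.asarray(embedding, dtype=float)
    # Rescale to non-negative currents so stronger dims fire sooner.
    I = i_boost * (w - w.min()) / (w.max() - w.min() + 1e-12)
    v = np.full(w.shape, v_rest)
    code = np.zeros((w.size, n_bins), dtype=np.uint8)
    for t in range(n_bins):
        # Euler step of: tau * dv/dt = -(v - v_rest) + I
        v += dt / tau * (-(v - v_rest) + I)
        fired = v >= v_th
        code[fired, t] = 1
        v[fired] = v_rest  # reset membrane potential after a spike
    return code
```

Under this scaling the minimum-valued dimension receives zero current and stays silent, while the maximum-valued one fires repeatedly, so spike density encodes the original magnitude ordering.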
The Izhikevich Neuron-Based Binary Word Embedding Model
In the Izhikevich neuron model, the meanings of v, vth, and vr are the same as in the LIF model, while u represents the membrane recovery variable, and a, b, c, and d are four important hyper-parameters. The parameter a describes the time scale of u; b describes the sensitivity of u to subthreshold fluctuations of v; c describes the after-spike reset value of v, caused by fast high-threshold K+ conductances; and d describes the after-spike reset of u, caused by slow high-threshold Na+ and K+ conductances.
As Izhikevich et al. show, different choices of these four parameters simulate different types of neurons in the mammalian brain, such as excitatory cortical cells, inhibitory cortical cells, and thalamocortical cells. In this paper, we mainly focus on excitatory and inhibitory cortical neurons. According to intracellular recordings, cortical cells can be divided into different types: for example, regular spiking (RS), intrinsically bursting (IB), and chattering (CH) for excitatory neurons, and fast spiking (FS) and low-threshold spiking (LTS) for inhibitory neurons.
In our Izhikevich neuron model-based binary word embedding models, we combine excitatory and inhibitory neurons at a ratio of 4:1, motivated by the ratio found in mammalian cortex. As mentioned before, for each word we set |D| neurons and regard the product of the original word embedding value wid and a factor Iboost as the input current of the model. We assign each neuron to an excitatory or inhibitory sub-model and, for each dimension of each word, obtain the spike timings according to its sub-model.
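The scheme above can be sketched with Izhikevich's standard two-variable dynamics. The RS and FS parameter sets are the well-known values from Izhikevich (2003); the `i_boost` scaling, the step size, and the random 4:1 excitatory/inhibitory assignment are our illustrative assumptions.

```python
import numpy as np

# Izhikevich (2003) parameters for two cortical cell types.
RS = dict(a=0.02, b=0.2, c=-65.0, d=8.0)  # regular spiking (excitatory)
FS = dict(a=0.1,  b=0.2, c=-65.0, d=2.0)  # fast spiking (inhibitory)

def izhikevich_spikes(I, params, n_steps=100, dt=0.5):
    """Simulate one Izhikevich neuron with constant input current I
    and return its binary spike train (illustrative sketch)."""
    a, b, c, d = params["a"], params["b"], params["c"], params["d"]
    v, u = c, b * c
    spikes = []
    for _ in range(n_steps):
        # v' = 0.04 v^2 + 5 v + 140 - u + I ;  u' = a (b v - u)
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:            # spike, then after-spike reset
            spikes.append(1)
            v, u = c, u + d
        else:
            spikes.append(0)
    return spikes

def izhikevich_binarize(embedding, i_boost=10.0, rng=None):
    """One neuron per dimension, with an ~4:1 RS:FS mix mirroring the
    excitatory/inhibitory ratio cited above; the input current for
    dimension d is i_boost * w_d (our assumed scaling)."""
    rng = np.random.default_rng(rng)
    w = np.asarray(embedding, dtype=float)
    excitatory = rng.random(w.size) < 0.8  # ~80% excitatory neurons
    return np.array([izhikevich_spikes(i_boost * x, RS if e else FS)
                     for x, e in zip(w, excitatory)], dtype=np.uint8)
```

Concatenating each dimension's spike train then yields the word's binary code, with the neuron type shaping the firing pattern per dimension.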
Validation Tasks and Datasets
We evaluate our binary embeddings on word similarity and text classification tasks. The word similarity task is widely used to measure to what degree word embeddings capture the similarity between two words, while text classification is a traditional NLP application. In our experiments, all binary word embedding models are based on two well-accepted original word embeddings, namely Word2Vec and GloVe.
For the word similarity task, we find similar words via Hamming distance, which is faster than the traditional cosine distance on dense embeddings, and we evaluate embeddings on three public datasets: (1) WordSim-353, the most widely used dataset for word similarity tests, consisting of 353 pairs of words; (2) SimLex-999, which consists of 999 pairs of words and measures how well word embeddings capture similarity, rather than relatedness or association; (3) Rare Words, 2,034 word pairs proposed by Luong et al., focusing on rare words to complement existing datasets. All these word pairs come with human-assigned similarity scores, and we compute Spearman's rank correlation coefficient between the embedding-based ranks and the human-labeled ranks.
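The evaluation loop above can be sketched as follows. The Hamming-based similarity and the tie-free Spearman formula are standard; the `emb` word-to-code mapping is a hypothetical interface, not the paper's code.

```python
import numpy as np

def hamming_sim(a, b):
    """Similarity of two binary codes as the fraction of matching bits.
    On packed bit-words this reduces to XOR + popcount, much cheaper
    than cosine distance over dense floats."""
    a, b = np.asarray(a), np.asarray(b)
    return 1.0 - np.count_nonzero(a != b) / a.size

def spearman(x, y):
    """Spearman's rank correlation coefficient (no tie correction,
    kept short for illustration)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(); ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def evaluate_similarity(pairs, human_scores, emb):
    """Rank correlation between model and human similarity, as on
    WordSim-353 / SimLex-999 / Rare Words; emb maps a word to its
    flat binary code (hypothetical interface)."""
    model = [hamming_sim(emb[w1], emb[w2]) for w1, w2 in pairs]
    return spearman(model, human_scores)
```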
For the text classification task, we apply an OR operation over the binary embeddings of a text's words to generate its representation, and use a k-nearest neighbors (kNN) classifier to measure accuracy. We validate our algorithms on two public text datasets: (1) Search Snippets, a short-text dataset collected by Phan et al., selected from the results of web search transactions using predefined phrases from 8 different domains; (2) Sentiment Analysis, proposed by Socher et al., a treebank of sentences from movie reviews annotated with sentiment labels. The sentences in the treebank are split into train (8,544), dev (1,101), and test (2,210) sets. We merge the train and dev parts for the kNN classifier, ignore neutral sentences, and analyze performance on only the positive and negative classes.
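The OR-pooling and kNN steps can be sketched like this (a minimal illustration of the setup described above, with `k` and the tie-breaking behavior as our assumptions):

```python
import numpy as np

def text_code(word_codes):
    """Represent a text as the bitwise OR over its words' binary
    codes, as in the classification setup above (a sketch)."""
    return np.bitwise_or.reduce(np.asarray(word_codes, dtype=np.uint8),
                                axis=0)

def knn_predict(query, train_codes, train_labels, k=5):
    """Majority vote among the k nearest training texts under
    Hamming distance."""
    dists = np.count_nonzero(train_codes != query, axis=1)
    top = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[top],
                               return_counts=True)
    return labels[np.argmax(counts)]
```

Because the pooled representation stays binary, the classifier's distance computations remain XOR/popcount operations rather than floating-point arithmetic.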
Experiment Details and Results
In our experiments, we use pre-trained GloVe and Word2Vec embeddings, both of 300 dimensions. We set up three comparative baselines: the original dense embeddings; "Overcomplete-B", binary embeddings derived from Faruqui et al.'s work; and "Rude Binarization", which converts the original embeddings into binary ones via a simple sign function.
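The "Rude Binarization" baseline is a one-liner (a sketch; the paper does not specify how zeros are handled, so mapping non-positive values to 0 is our assumption):

```python
import numpy as np

def rude_binarize(emb):
    """'Rude Binarization' baseline: a per-dimension sign function,
    1 for positive coordinates and 0 otherwise (zero handling is an
    assumption)."""
    return (np.asarray(emb) > 0).astype(np.uint8)
```

This gives exactly one bit per dimension, at the cost of discarding all magnitude information that the spiking models above retain through spike counts and timings.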
[Table: Results of word embeddings on the word similarity tasks, reported as mean ± standard deviation.]
[Table: Summarized results of the two tasks.]
In this paper, we propose three kinds of biological neuron coding-inspired models to generate binary word embeddings, which show better performance and interpretability than existing works on word similarity evaluation and text classification tasks. To the best of our knowledge, this is the first attempt to convert dense embeddings into binary ones via spike timing, and we have demonstrated its feasibility on several natural language processing applications.
Due to limitations in the performance of supervised SNNs, in this paper we post-process given word embeddings. In the future, however, we intend to build an SNN-based language model that learns brain-inspired word embeddings from raw corpora, by adjusting the cost function of supervised SNNs and adding biological mechanisms such as STDP. Furthermore, in contrast to excitatory neocortical neurons, which fall into stereotypical morphological and electrophysiological classes, inhibitory neocortical interneurons comprise wildly diverse classes with various firing patterns that cannot all be classified as FS or LTS. In this paper, we focus on FS and LTS inhibitory neurons because their parameters in Izhikevich's neuron model are easy to obtain; in the future, we will pay more attention to more detailed types of inhibitory neuron models.
This study is supported by the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB32070100), the Beijing Municipality of Science and Technology (Grant No. Z181100001518006), the CETC Joint Fund (Grant No. 6141B08010103), and the Major Research Program of Shandong Province 2018CXGC1503.
Compliance with Ethical Standards
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of Interest
The authors declare that they have no conflict of interest.
- 1. Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems; 2013. p. 3111–9.
- 2. Pennington J, Socher R, Manning C. GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014.
- 3. Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model. J Mach Learn Res 2003;3:1137–55.
- 4. Socher R, Bauer J, Manning CD, et al. Parsing with compositional vector grammars. Meeting of the Association for Computational Linguistics; 2013. p. 455–65.
- 5. Kim Y. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882; 2014.
- 6. Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems; 2014. p. 3104–12.
- 7. Murphy B, Talukdar P, Mitchell T. Learning effective and interpretable semantic models using non-negative sparse embedding. COLING; 2012. p. 1933–50.
- 8. Faruqui M, Tsvetkov Y, Yogatama D, et al. Sparse overcomplete word vector representations. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing; 2015. p. 1491–1500.
- 9. Sun F, Guo J, Lan Y, et al. Sparse word embeddings using L1 regularized online learning. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence; 2016. p. 2915–21.
- 10. Ling S, Song Y, Roth D. Word embeddings with limited memory. Meeting of the Association for Computational Linguistics; 2016. p. 387–92.
- 14. Rieke F, Warland D, de Ruyter van Steveninck R, Bialek W. Spikes: exploring the neural code. Cambridge: MIT Press; 1999.
- 19. VanRullen R, Thorpe SJ. Rate coding versus temporal order coding: what the retinal ganglion cells tell the visual cortex. Neural Comput 2001;13(6):1255–83.
- 21. Ponulak F, Kasinski A. Introduction to spiking neural networks: information processing, learning and applications. Acta Neurobiol Exp 2011;71(4):409.
- 24. Thorpe SJ. Spike arrival times: a highly efficient coding scheme for neural networks. Parallel Processing in Neural Systems and Computers; 1990.
- 27. Maass W. Networks of spiking neurons: the third generation of neural network models. Trans Soc Comput Simul Int 1997;14(4):1659–71.
- 29. Gerstner W, Kistler WM. Spiking neuron models: single neurons, populations, plasticity. Cambridge: Cambridge University Press; 2002.
- 30. Gupta A, Long LN. Character recognition using spiking neural networks. International Joint Conference on Neural Networks; 2007. p. 53–8.
- 31. Tavanaei A, Maida AS. Bio-inspired spiking convolutional neural network using layer-wise sparse coding and STDP learning. arXiv preprint arXiv:1611.03000; 2016.
- 33. Azhar H, Iftekharuddin K, Kozma R. A chaos synchronization-based dynamic vision model for image segmentation. IEEE International Joint Conference on Neural Networks (IJCNN '05); 2005.
- 34. Loiselle S, Rouat J, Pressnitzer D, et al. Exploration of rank order coding with spiking neural networks for speech recognition. IEEE International Joint Conference on Neural Networks (IJCNN '05); 2005. p. 2076–80.
- 37. Stewart TC, Xuan C, Eliasmith C. Symbolic reasoning in spiking neurons: a model of the cortex/basal ganglia/thalamus loop. Meeting of the Cognitive Science Society; 2010.
- 38. O'Keefe LP, Bair W, Movshon JA. Response variability of MT neurons in macaque monkey. Soc Neurosci Abstr 1997;23:1125.
- 43. Hunsberger E, Eliasmith C. Spiking deep networks with LIF neurons; 2015.
- 45. Izhikevich EM. Dynamical systems in neuroscience: the geometry of excitability and bursting. Cambridge: MIT Press; 2007.
- 46. Agirre E, Alfonseca E, Hall K, et al. A study on similarity and relatedness using distributional and WordNet-based approaches. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics; 2009. p. 19–27.
- 47. Hill F, Reichart R, Korhonen A. SimLex-999: evaluating semantic models with (genuine) similarity estimation. Computational Linguistics; 2016.
- 48. Luong T, Socher R, Manning CD. Better word representations with recursive neural networks for morphology. CoNLL; 2013. p. 104–13.
- 49. Socher R, Perelygin A, Wu JY, et al. Recursive deep models for semantic compositionality over a sentiment treebank. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP); 2013. p. 1642.
- 50. Phan XH, Nguyen LM, Horiguchi S. Learning to classify short and sparse text & web with hidden topics from large-scale data collections. Proceedings of the 17th International Conference on World Wide Web; 2008. p. 91–100.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.