1 Introduction

Membrane computing, also known as P systems, was proposed by academician Gheorghe Păun [1] at the Turku Centre for Computer Science. P systems are theoretical computing models derived from the architecture and information-interaction mechanisms of biological cells, tissues, and organs. As an emerging research field, the significance of membrane computing is mainly reflected in two aspects [2,3,4]. First, the computing capacity of membrane computing models is theoretically as powerful as that of the Turing machine. Second, the models offer high computational efficiency due to their inherent maximal parallelism. Accordingly, some computationally hard problems, e.g., NP problems, can be processed using membrane computing in a feasible time frame.

Extensive research has demonstrated the application of membrane computing in various areas such as biology and biomedicine [5], computer graphics [6], cryptography [7], robot control [8], image processing [9], power system fault diagnosis [10], and other real-life complex problems. Equipping membrane computing with strong learning and application capabilities remains one of the research hotspots. Recent research has focused on implementing learning algorithms, similar to back-propagation, within membrane computing [11, 12], and studies applying membrane computing with learning functions to practical problems are also in progress. These papers are primarily based on the spiking neural P (SNP) system, a significant branch of membrane computing. For instance, Wu and Pan [12] designed a numerical SNP system based on the SNP system, which possesses favorable numerical representation abilities. Song and Pang [13] proposed an SNP model with learning functions that successfully resolves the task of English letter recognition.

SNP systems can be expressed as a directed graph, in which the nodes represent neurons and the arcs characterize the synapses between them. Each neuron in an SNP system has two components: data and spiking rules. The data reflect the internal state, and the spiking rules describe the system's dynamic behaviors. The working mechanism of spiking rules involves two processes: spike consumption and spike generation. Additionally, various biological mechanisms have been introduced previously, such as delays on synapses [14], dynamic threshold mechanisms [15], nonlinear coupling mechanisms [16], inhibitory rules [17], communication on request [18], etc. Nonlinear spiking neural P (NSNP) systems [19], nonlinear versions of traditional SNP systems, are currently being investigated; in these systems, neurons consume and generate spikes via predefined nonlinear functions of the neuron's states. Therefore, NSNP systems are appropriate for capturing nonlinear characteristics in complex systems.

The long short-term memory network (LSTM) [20] is a class of recurrent neural network (RNN) able to extract data features with long-term dependencies. An LSTM model incorporates a hidden state that records information about the current time step and passes it on to the next one. The network has three gates, i.e., the forget gate, input gate, and output gate, which regulate the information transmission of neurons. As previously mentioned, each spiking neuron in an SNP system has an internal state and two spiking mechanisms (spike consumption and spike generation). Motivated by the states and mechanisms of NSNP systems and LSTM models, Liu and Peng [21] developed the LSTM-SNP model, a parameterized NSNP system, to solve time series prediction problems effectively. An LSTM-SNP consists of only one nonlinear spiking neuron with nonlinear spiking mechanisms (nonlinear spike consumption and nonlinear spike generation) and nonlinear gate functions (reset, consumption, and generation).

Surveys such as that conducted by Ma et al. [22] have shown that recurrent-type models perform exceptionally well on sequence problems, such as time series forecasting. LSTM-SNP is a generic term for recurrent-type NSNP systems based on nonlinear spiking neuron mechanisms, so their ability to deal with time-series problems is to be expected. Named entity recognition (NER) is a critical task in many natural language processing (NLP) sequence applications, such as information extraction, question answering, and machine translation. However, a systematic understanding of how to leverage LSTM-SNP for NER or other NLP sequence tasks is still lacking. Hence, this paper designs a convolutional LSTM-SNP (CLSTM-SNP) model to study the adaptability of the LSTM-SNP, a recent general-purpose neural model, to the NER problem. In this study, the CLSTM-SNP approach utilizes the text representation ability of GloVe vectors [23] and the character feature representation of convolutional neural networks [24].

Recently, the development of LLMs has shifted from in-context learning to addressing task-specific challenges [25,26,27], particularly in the domains of NER and relation extraction (RE). Despite this progress, the performance of LLMs on NER is still well below supervised baselines. This is because of the intrinsic gap between the two formulations: NER is in nature a sequence labeling task, in which the model must assign an entity-type label to each token within a sentence, whereas LLMs are formalized as text generation models. This gap between sequence labeling and text generation leads to inferior performance when LLMs are applied to the NER task.

Bottlenecks in NER lie in the scarcity of data and the vague definition of entity boundaries. Most current NER solutions focus on data processing to compensate for scarce data, commonly by transforming text words into graphs prior to processing [28], active learning [29], or adversarial learning strategies [30]. Are there ways to combine the existing named entity datasets with more substantial model computing power to find a feasible solution? A membrane system is a computational model rigorously defined from biological principles [1], and the computational power of membrane systems is in theory comparable with that of the Turing machine. Another purpose of this paper is therefore to investigate a competent method for working out NER problems within the field of membrane computing. The code is available at this website.Footnote 1 The main motivations and contributions of this study can be summarized as follows:

  • A model, CLSTM-SNP, combining LSTM-SNP with convolutional neural networks (CNNs) is proposed. CLSTM-SNP is the first attempt to combine a CNN with LSTM-SNP to enhance its feature extraction capability. CLSTM-SNP can extract features from highly sparse text datasets more accurately than LSTM-SNP.

  • This research examines, for the first time, the role of the CLSTM-SNP and LSTM-SNP models in the task of named entity recognition, aiming to extend the application of membrane computing to natural language processing. Both models offer powerful computing capabilities in theory but are difficult to apply in practice. Our model-construction approach is designed to overcome the difficulties of applying SNP systems, and the training procedure recommended in this article provides new insights into SNP applications.

  • Our research shows that membrane computing is viable for solving NER, even in sparse-data scenarios. While more research is needed to perfect this technique, it offers considerable potential for future NER research.

2 Related Work

Named entity recognition (NER) is an NLP task that identifies and classifies named entities in text. Typical entity types include people, organizations, locations, events, etc. NER is a versatile tool that supports many downstream applications, such as information extraction and question answering. NER can be accomplished through rule-based approaches, machine learning approaches, or deep learning approaches. Rule-based approaches use hand-crafted rules to identify named entities in text; these rules are typically based on patterns in the text, such as specific character sequences or word forms. Machine learning methods use algorithms such as hidden Markov models (HMMs) [31] or conditional random fields (CRFs) [32] to identify named entities based on labeled training data; such statistical methods can be effective but require a large training corpus. Deep learning methods apply neural networks to learn how to identify named entities from text; these methods can be effective with little manual feature engineering but require more computational resources. This section provides a brief overview of each type of NER method.

2.1 Rules-Based NER

A few approaches to NER rely primarily on grammar rules. Two classic methods can be found in the studies [33] and [34], which build rules from place-name dictionaries and lexical-syntactic patterns, respectively. These approaches can be practical in many cases, but they also have limitations. One of the main drawbacks is that they can be complex and rely heavily on annotation by domain experts as part of the learning process. Developing comprehensive rules that cover all potential entities takes considerable time and effort. In addition, entities can be intricate and may not always adhere to standard grammar rules.

2.2 Machine Learning NER

Machine learning algorithms have become increasingly sophisticated and can now learn from data much more effectively than before. This has made them powerful tools for solving NER problems, which are generally treated as a multi-class sequence annotation issue. Typical approaches include maximum entropy (MaxEnt) [35], support vector machines (SVMs) [36], hidden Markov models (HMMs) [31], and conditional random fields (CRFs) [32]. For example, Makino et al. [37] construct artificial features based on parts of speech and word forms, extract features with an HMM, combine them, and compute entity recognition results using an SVM. Krishnan et al. [38] use two coupled CRFs for entity recognition, where the second CRF exploits the local features and output information obtained by the first CRF. These models overcome the drawbacks of rule design. However, their performance declines when sentences are too long, owing to their inability to capture more contextual information.

2.3 Deep Learning NER

Many NLP studies underline the significance of deep neural network approaches for addressing NER. Long short-term memory networks (LSTMs) and convolutional neural networks (CNNs) are two popular types of neural networks that have achieved excellent results on NER tasks. LSTMs are more effective for data with a temporal component, while CNNs are more suitable for data with spatial structure. Neural network approaches do not necessitate a manual feature extraction process. Some related work has achieved beneficial performance gains by extending these two types of models. Luo et al. [39] developed an attention-based bidirectional LSTM with a CRF layer (Att-BiLSTM-CRF) to train a robust model for NER. The BiLSTM-CNN model of Li et al. [40] has shown that a CNN component can significantly improve entity recognition accuracy. The research conducted by Devlin et al. [41] indicates that using BERT as a pretrained language model can significantly enhance the performance of NER approaches. Yang et al. [42] proposed XLNet, a language model based on autoregressive pretraining that eliminates BERT's disadvantage of ignoring dependencies among masked positions. Li et al. [28] presented a novel alternative, W2NER, which models unified NER as word-word relation classification; NER problems can be effectively solved by combining BERT, LSTM, and multiple two-dimensional dilated convolutions (DConv).

Recurrent neural networks (RNNs) are suitable for processing sequential data, especially in named entity recognition. Several RNN-like architectures have been developed, such as LSTMs [20] and gated recurrent units (GRUs) [43]. RNN-like neural networks are prone to vanishing and exploding gradients when dealing with the NER task. In contrast, spiking neural P systems have inherently distributed and powerful numerical computing capabilities. Among the neural network studies reviewed here, a proper combination of RNN-like structures and SNP system mechanisms is recognized as a potential solution to the NER problem. The idea for the proposed solution comes from recognizing that current RNN-like architectures cannot explicitly identify the syntactic relationships between words in a sentence, whereas SNP systems can construct such relationships by combining SNP mechanisms within a neural network model.

2.4 Large Language Models with NER

The progress of large language models (LLMs) marks a pivotal moment in the field of natural language processing (NLP) [25, 26, 44,45,46,47], significantly influencing various NLP tasks. This subsection distills essential insights from three key research papers at the intersection of LLMs and named entity recognition (NER).

Commencing our exploration with GPT-NER [44], the introduction of LLMs, especially the GPT family, has initiated a revolution in NLP through the paradigm of in-context learning. GPT-NER illustrates this adaptability by reformulating NER as a text generation task, achieving performance levels comparable to supervised baselines. Notably, a self-verification strategy addresses inherent limitations in LLMs, marking a significant stride in enhancing the effectiveness of real-world NER applications.

Shifting focus to GPT-RE [45], the paper delves into challenges associated with Relation Extraction (RE) despite the capabilities of LLMs. It introduces nuanced in-context learning for RE, proposing innovative approaches such as task-aware retrieval and gold label-induced reasoning. GPT-RE’s state-of-the-art performance underscores the untapped potential of LLMs in tasks demanding a deep understanding of language relationships, representing a notable leap forward in leveraging LLMs for intricate linguistic tasks.

The evolutionary trajectory takes a strategic turn with UniversalNER [46], which emphasizes targeted distillation for open named entity recognition (NER). Acknowledging the prohibitive costs of deploying full-scale LLMs, the paper explores mission-focused instruction tuning to distill LLMs into cost-efficient student models. UniversalNER, distilled from ChatGPT outputs, not only demonstrates remarkable generalizability but also surpasses the NER accuracy of ChatGPT. This approach signals the immense potential of targeted distillation for producing highly efficient models with a reduced parameter footprint.

In summary, the development of LLMs has shifted from in-context learning to addressing task-specific challenges [25, 26], particularly in the domains of NER and RE. The paradigm shift represented by targeted distillation, as exemplified by UniversalNER, provides valuable insights into the future landscape of utilizing LLMs for a myriad of applications in NLP, ranging from diverse to complex and resource-intensive scenarios.

3 Problem Formalization

Named entity recognition (NER) is the task of identifying and classifying named entities within text; it pertains to NLP and information retrieval (IR). This study uses the notation \(\Delta \) to denote a set of labels such as persons, organizations, locations, products, etc. Let \({\mathbb {S}}=\left\{ S_1,S_2,..., S_n\right\} \) denote the training set of sentences. For any element \({S_i}\in {\mathbb {S}}\), \(S_i=\left\{ x_1,x_2,...,x_h\right\} \) represents a sentence (instance) of length h, i.e., h words (tokens). Each word \(x_t\in S_i\) is assigned a label \(\varepsilon \in \Delta \). NER systems are designed to take a text sequence \(S_i\), output a set of entities, and predict their labels. We address the NER problem by constructing a mapping model that takes a text sequence \({S_i}\in {\mathbb {S}}\) as input and predicts the labels of the resulting set of entities. As an example, given the sentence “John Wilson is a professor of XXX University”, an NER system based on rule-based, machine learning, or deep learning algorithms should assign the entity labels “John Wilson” (person) and “XXX University” (organization). In particular, the label for a word is the category of the entity if that word is indeed part of a named entity; otherwise, the word falls outside the scope of named entities.
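As an illustration only (the label names below are hypothetical and not tied to any particular tagging scheme in this paper), the mapping described above turns a token sequence into a label sequence of the same length:

```python
# Hypothetical illustration of the mapping S_i -> labels described above.
sentence = ["John", "Wilson", "is", "a", "professor", "of", "XXX", "University"]
labels   = ["PER",  "PER",    "O",  "O", "O",         "O",  "ORG", "ORG"]  # 'O' marks tokens outside any entity
assert len(sentence) == len(labels)
```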

To be specific, we assume that the dataset \({\mathbb {S}}=\left\{ S_1,S_2,..., S_n\right\} \) has n sentences, each sentence \(S_i=\left\{ x_1,x_2,...,x_h\right\} \) has h words, and each word \(x_t=\left\{ c_{1},c_{2},...,c_{l}\right\} \) has l characters. The dataset is processed according to Algorithm 1. The time complexity of Algorithm 1 is \(O(h\cdot l)\), where h is the number of words in the sentence and l is the average number of characters per word.

Algorithm 1 NER data pre-processing
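Algorithm 1 itself is only provided as a figure in the original article; the sketch below is therefore one plausible reading of the described pre-processing (splitting every word into characters and padding symmetrically to the longest word, cf. Sect. 4.2.1), not the authors' exact procedure. It runs in \(O(h\cdot l)\) as stated above.

```python
def preprocess_sentence(sentence, pad="<PAD>"):
    """Split each word into characters and pad symmetrically to the longest word."""
    chars = [list(word) for word in sentence]          # O(h * l) character splits
    max_len = max(len(c) for c in chars)
    padded = []
    for c in chars:
        left = (max_len - len(c)) // 2                 # placeholders on the left ...
        right = max_len - len(c) - left                # ... and on the right (Sect. 4.2.1)
        padded.append([pad] * left + c + [pad] * right)
    return padded

print(preprocess_sentence(["John", "University"]))
```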

4 CLSTM-SNP Model Specification

4.1 Overview

This work presents a novel neural network model, CLSTM-SNP, to address named entity recognition (NER). The proposed model combines convolutional neural networks (CNNs) and an LSTM-SNP module to obtain improved accuracy and efficiency, allowing syntactic and semantic information to be integrated in the NER task. The CLSTM-SNP model is an effective sequence analysis tool consisting of two components: multi-feature embeddings and an LSTM-SNP module processor. The multi-feature embeddings represent sequence features of characters, word semantics, and word capitalization, and several classical methods are combined to extract these three features. The CNN module is responsible for extracting character-level features of entities from the input text. The GloVe [23] module provides a way to learn vector representations of the input sequence that accurately capture the semantic features of words. We adopt Collobert et al.'s [48] approach to identify word capitalization characteristics.

The LSTM-SNP module processor then analyzes the text sequence, fed with the text representations created by combining the embeddings of the three features (character-level, word semantic, and word capitalization). The LSTM-SNP module processor allows CLSTM-SNP to be used for a variety of sequence analysis tasks, especially NER. Finally, the model produces the predicted sequence of entity labels. Figure 1 illustrates the overall structure of the CLSTM-SNP model.

Fig. 1 Overview of the CLSTM-SNP model

4.2 Multi-feature Embeddings

In this section, the multi-feature embeddings are discussed in detail. These representations allow the encoding of multiple features in a single representation, which can be used for solving NER issues. Section 4.2.1 depicts the workflow of collecting the character-level features. Sections 4.2.2 and 4.2.3 indicate the mechanisms of word semantic features and word capitalization features, respectively.

4.2.1 Character-Level Features Embedding

The architecture of convolutional neural networks (CNNs) developed by LeCun et al. [24] is widely recognized as one of the most influential neural network designs in deep learning. By utilizing convolutional and pooling layers, CNNs can learn hierarchical representations of visual data, making them very suitable for image recognition and other computer vision tasks. Furthermore, CNNs have been successfully applied to natural language processing, audio processing, and other domains. The experimental setup in this paper utilizes a CNN module to extract character-level features from English text data pertaining to named entities. English words consist of fine-grained letters with hidden features such as prefixes and suffixes. The CNN module is trained on a labeled dataset and can learn the complex relationship between the letters and the associated named entity; the extracted features are then used to classify the entity. This technique is advantageous compared to traditional methods because it captures the contextual information surrounding the entity, leading to improved accuracy in entity recognition. In the implementation, we assign random character vectors to different characters to distinguish between characters and character types (letters, numbers, punctuation, and special symbols). For example, uppercase ‘A’ and lowercase ‘a’ correspond to two different character vectors. Figure 2 provides an example of how the CNN module extracts the character-level features of an individual word.

Fig. 2 Overview of the CNN module used in text data

Let \({c_k} \in {{\mathbb {R}}^d}\) be the d-dimensional character vector corresponding to the kth character in a word \(x_t\). A character vector \(c_k\) is formed by querying the character lookup table T (Collobert et al. [48]). A word of length l is represented as

$$\begin{aligned} \begin{aligned}&{\mathrm{{c}}_{1:l}} = {c_1} \oplus {c_2} \oplus ... \oplus {c_l}, \end{aligned} \end{aligned}$$
(1)

where the operation ‘\(\oplus \)’ indicates concatenation. Accordingly, the character vector matrix \(C_t\) (\({C_t} \in {{\mathbb {R}}^{d \times l}}\)) of the word \(x_t\) is generated, where d represents the dimension of a character vector \(c_k\) and l signifies the length of the word \(x_t\). The problem of uneven word lengths in the character vector matrix is addressed by adding extra placeholders to the left and right sides of the words in the sequence, padding every word to the length of the longest word. In general, let \(c_{k:k+j}\) refer to the concatenation of characters \(c_k,c_{k+1},...,c_{k+j}\). The CLSTM-SNP model employs one convolutional layer to obtain character-level feature vectors, which also helps to reduce the complexity of the model and speed up feature extraction. The convolution takes a filter \(w\in {{\mathbb {R}}^{d \times m}}\) and applies it to a span of m characters to generate a new feature. For instance, a feature \(F_k\) is generated from a window of characters \(c_{k:k+m-1}\) by

$$\begin{aligned} \begin{aligned}&{F_k} = f(w\cdot {c_{k:k + m - 1}} + b). \end{aligned} \end{aligned}$$
(2)

where \(b\in {\mathbb {R}}\) is a bias term and f is a non-linear function such as the hyperbolic tangent or ReLU. This filter is applied to each possible window of characters in the word \(\left\{ c_{1:m}, c_{2:m+1},..., c_{l-m+1:l}\right\} \) to obtain a feature map

$$\begin{aligned} \begin{aligned}&{F} = [F_1,F_2,..., F_{l-m+1}], \end{aligned} \end{aligned}$$
(3)

with \({F} \in {{\mathbb {R}}^{l-m+1}}\). Afterward, we apply the max pooling operation to the feature map and pick the maximum value \({\hat{F}} = \max \{ F\} \) as the feature linked to that particular filter; the goal is to acquire the most significant feature from each feature map. This describes the extraction of one feature with a single filter. The CLSTM-SNP model utilizes multiple filters (each with a distinct window size) to create multiple feature maps, allowing for a better understanding of the underlying text information.

Finally, we construct a representation of word \(x_t\) by concatenating the pooled features of all filters.

$$\begin{aligned} \begin{aligned}&{{\hat{F}}_{word}} = {{\hat{F}}_1} \oplus {{\hat{F}}_2} \oplus ... \oplus {{\hat{F}}_n}, \end{aligned} \end{aligned}$$
(4)

where \({\hat{F}}_{word}\) is the character-level feature embedding produced by the convolutional neural network, ‘\(\oplus \)’ is the concatenation operator, and n specifies the number of filters.
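As a minimal, framework-free sketch of Eqs. (2)-(4), the following Python/NumPy function computes the max-pooled feature of one word for a bank of filters; the dimensions (d = 30, m = 3, 20 filters) are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def char_cnn_features(C, filters, biases, f=np.tanh):
    """Character-level feature vector of one word (Eqs. (2)-(4)).

    C       : (d, l) matrix whose columns are the character vectors c_1..c_l (Eq. (1))
    filters : list of (d, m) filter matrices w
    biases  : list of scalar bias terms b
    """
    d, l = C.shape
    pooled = []
    for w, b in zip(filters, biases):
        m = w.shape[1]
        # Eq. (2): slide the filter over every window of m consecutive characters
        feature_map = [f(np.sum(w * C[:, k:k + m]) + b) for k in range(l - m + 1)]  # Eq. (3)
        pooled.append(max(feature_map))                  # max pooling picks \hat{F}
    return np.array(pooled)                              # Eq. (4): one entry per filter

rng = np.random.default_rng(0)
C = rng.normal(size=(30, 7))                             # a 7-character word, d = 30
filters = [rng.normal(size=(30, 3)) for _ in range(20)]  # 20 filters with window m = 3
print(char_cnn_features(C, filters, [0.0] * 20).shape)   # (20,)
```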

The CLSTM-SNP model incorporates a CNN module for character-level feature embedding. This module is implemented with the Keras framework on top of TensorFlow, providing an effective way to extract and encode character features. The specific process is given in Algorithm 2. Its time complexity has two main components: text data initialization, denoted O(f(n)), where f(n) represents the complexity related to the text data, and the iterative execution of the CNN module, controlled by total_iterations, which contributes O(g(m)), where g(m) signifies the complexity of one CNN pass and m is the input size. In summary, the overall time complexity is \(O(f(n) + total\_iterations \cdot g(m))\), with n and m representing the scale of the text data and the input size of the CNN module, respectively.

Algorithm 2 Character-Level Data Processing with CNN Module Construction
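Since Algorithm 2 is only given as a figure, the following Keras sketch shows one plausible realization of the character-level module described above (an embedding of random character vectors, one Conv1D layer, and max pooling, applied to every word of a padded sentence via TimeDistributed). All sizes (character dimension 30, word length 20, sentence length 60) are assumptions for illustration.

```python
import tensorflow as tf

def build_char_cnn(num_chars, char_dim=30, word_len=20, n_filters=20, kernel_size=3):
    """Character-level embedding module: Embedding -> Conv1D -> max pooling."""
    char_ids = tf.keras.layers.Input(shape=(word_len,), dtype="int32")   # one word as character indices
    x = tf.keras.layers.Embedding(num_chars, char_dim)(char_ids)         # random character vectors
    x = tf.keras.layers.Conv1D(n_filters, kernel_size, padding="same",
                               activation="tanh")(x)                     # convolution over character windows
    x = tf.keras.layers.GlobalMaxPooling1D()(x)                          # max pooling, one value per filter
    return tf.keras.Model(char_ids, x, name="char_cnn")

# Applied to every word of a padded sentence (60 words x 20 characters).
sentence_chars = tf.keras.layers.Input(shape=(60, 20), dtype="int32")
char_features = tf.keras.layers.TimeDistributed(build_char_cnn(num_chars=100))(sentence_chars)
```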

4.2.2 Word Semantic Features Embedding

Word embeddings convey the semantic implications of words in a fashion analogous to the way humans conceptualize them. In recent years, NLP has seen significant advancements in NER. Several powerful tools, such as word2vec [49, 50] and GloVe [23], have been developed and widely adopted to facilitate the extraction of entities from text. These tools analyze the context of words and phrases in order to identify and classify entities accurately. We conducted experiments with the CLSTM-SNP model using a set of published embeddings, specifically Stanford's GloVe embeddings, which were trained on roughly 6 billion tokens of Wikipedia and newswire text. By combining global matrix factorization with a local context window technique, GloVe produces a vector space with a meaningful substructure and lower dimensionality compared with sparse one-hot representations. The specifics and inner workings of the GloVe embedding are discussed below.

The GloVe module primarily draws on the latent semantic analysis (LSA) approach proposed by Hofmann [51], which is based on matrix decomposition, and on the word2vec technique based on local context windows. The LSA algorithm decomposes the term-document matrix (whose entries are typically TF-IDF weights) to create vector representations for the terms and documents; TF-IDF mainly captures the terms' global statistical characteristics. Word2vec algorithms, first introduced by Mikolov et al. [49, 50], generate vector representations of words known as word embeddings. These algorithms can be divided into two main categories: skip-gram and continuous bag of words (CBOW) [49]. Skip-gram predicts the context words given a target word, while CBOW predicts the target word given its context words [49]. Both techniques rely on a local sliding window to determine the context. Word2vec embeddings are used for various NLP tasks, such as sentiment analysis and text classification.

LSA and word2vec are representatives of two kinds of word embedding techniques: one relies on matrix decomposition of global features, whereas the other takes into account the local context. The GloVe module merges the two characteristics, the global statistics of the corpus and the nearby context feature (for instance, a sliding window), to generate semantic features. To this end, the GloVe module introduces the co-occurrence probability matrix; we first describe the word-word co-occurrence matrix.

The quantity \(Y_{ab}\) refers to the total number of occurrences of word b in the vicinity of word a across the whole corpus. The total number of co-occurrences of word a with all other words is denoted by \({Y_a}\):

$$\begin{aligned} \begin{aligned}&{Y_a} = \sum \nolimits _c {{Y_{ac}}} \end{aligned} \end{aligned}$$
(5)

The probability \(P_{ab}\) that word b appears in the context of word a can be calculated by:

$$\begin{aligned} \begin{aligned}&{P_{ab}} = P(b\mid a)= \frac{{{Y_{ab}}}}{{{Y_a}}} \end{aligned} \end{aligned}$$
(6)

A simple example of the co-occurrence matrix [23] is shown in Table 1. The first entry is the probability of the word ‘solid’ appearing in the context of the word ‘ice’; similarly, the probability of the word ‘gas’ appearing in the context of ‘ice’ is recorded in the second entry.

Table 1 A simple example of the co-occurrence matrix

The ratio of \(P_{ac}\) and \(P_{bc}\) is denoted by:

$$\begin{aligned} \begin{aligned}&{ratio} = \frac{{{P_{ac}}}}{{{P_{bc}}}} \end{aligned} \end{aligned}$$
(7)

This ratio measures the relative size of the two probabilities; its values in the co-occurrence matrix follow the rules outlined in Table 2.

Table 2 Relevance description among the words a, b, and c
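The toy sketch below illustrates Eqs. (5)-(7) on a three-sentence corpus: co-occurrence counts within a sliding window, the conditional probabilities, and their ratio. The corpus, the window size of 2, and the word pair used are arbitrary choices for illustration.

```python
from collections import Counter

def cooccurrence_counts(corpus, window=2):
    """Word-word co-occurrence counts Y_ab within a symmetric sliding window."""
    Y = Counter()
    for sentence in corpus:
        for i, a in enumerate(sentence):
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if j != i:
                    Y[(a, sentence[j])] += 1
    return Y

corpus = [["ice", "is", "a", "solid"],
          ["steam", "is", "a", "gas"],
          ["ice", "and", "steam", "are", "water"]]
Y = cooccurrence_counts(corpus)
Y_ice = sum(n for (a, _), n in Y.items() if a == "ice")        # Eq. (5)
Y_steam = sum(n for (a, _), n in Y.items() if a == "steam")
P_is_given_ice = Y[("ice", "is")] / Y_ice                      # Eq. (6)
P_is_given_steam = Y[("steam", "is")] / Y_steam
print(P_is_given_ice / P_is_given_steam)                       # Eq. (7): the ratio
```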

The ratio can be used to gauge the relationship between words, and the GloVe module takes advantage of this metric to generate vector representations of each word. Given the word vectors \(w_a\), \(w_b\), and \(w_c\) of words a, b, and c, respectively, the relationship represented by the three vectors in the GloVe module, through the action of a function F, is required to be consistent with \({ratio} = \frac{{{P_{ac}}}}{{{P_{bc}}}}\) [23]. Essentially, the word vectors encapsulate the information found in the co-occurrence matrix. The function, whose form is yet to be determined, is as follows:

$$\begin{aligned} \begin{aligned}&F({w_a},{w_b},{w_c}) = \frac{{{P_{ac}}}}{{{P_{bc}}}}\ \end{aligned} \end{aligned}$$
(8)

The ratio of \({P_{ac}}\) to \({P_{bc}}\) can be determined from the corpus statistics as in Eq. 7, which considers the relationship of the three words a, b, and c. In the GloVe module, we first consider the two words a and b; their similarity can be captured by the difference of the two vectors \(w_a\) and \(w_b\). The form of F can therefore be altered to the format presented in Eq. 9.

$$\begin{aligned} \begin{aligned}&F({w_a}-{w_b},{w_c}) = \frac{{{P_{ac}}}}{{{P_{bc}}}}\ \end{aligned} \end{aligned}$$
(9)

The ratio of \(P_{ac}\) to \(P_{bc}\) is a scalar quantity, whereas the arguments of F are vectors. Hence, the inner product is naturally introduced, and the form of F can be altered to match Eq. 10.

$$\begin{aligned} \begin{aligned}&F({({w_a} - {w_b})^T}{w_c}) = F({w_a}^T{w_c} - {w_b}^T{w_c}) = \frac{{{P_{ac}}}}{{{P_{bc}}}}\ \end{aligned} \end{aligned}$$
(10)

The GloVe module establishes a connection (Eq. 11) between the difference on the left and the ratio on the right by employing the exponential function as F.

$$\begin{aligned} \begin{aligned}&F({w_a}^T{w_c} - {w_b}^T{w_c}) = \frac{{\exp ({w_a}^T{w_c})}}{{\exp ({w_b}^T{w_c})}} = \frac{{{P_{ac}}}}{{{P_{bc}}}}\ \end{aligned} \end{aligned}$$
(11)

If the left and right sides of the equation have the same numerators and denominators, we can in theory obtain the vector representations of words. In practice, the GloVe module must also ensure symmetry, i.e., exchanging \({w_a}^T{w_c}\) and \({w_c}^T{w_a}\) should not change the result. Because the two sides of Eq. 11 only need to be approximately equal, a cost function J is designed to optimize the word vectors. Further information on obtaining word vectors through optimization of this loss function can be found in the literature [23].
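In our setting the GloVe vectors are used as pre-trained weights rather than retrained. The sketch below shows a common way to build an embedding matrix from a published GloVe file (e.g. glove.6B.100d.txt) and plug it into a Keras Embedding layer; the file path, the word_index mapping, and the out-of-vocabulary initialization are assumptions of this sketch.

```python
import numpy as np

def load_glove_matrix(glove_path, word_index, dim=100):
    """Embedding matrix whose row i holds the GloVe vector of the word with id i."""
    matrix = np.random.uniform(-0.25, 0.25, (len(word_index) + 1, dim))  # random init for OOV words
    matrix[0] = 0.0                                                      # row 0 reserved for padding
    with open(glove_path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], np.asarray(parts[1:], dtype="float32")
            if word in word_index:
                matrix[word_index[word]] = vec
    return matrix

# word_index maps each corpus word to an integer id >= 1 (assumed to be built elsewhere):
# matrix = load_glove_matrix("glove.6B.100d.txt", word_index)
# embedding = tf.keras.layers.Embedding(matrix.shape[0], 100,
#                 embeddings_initializer=tf.keras.initializers.Constant(matrix))
```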

4.2.3 Word Capitalization Features Embedding

The CLSTM-SNP model applies the method reported by Collobert et al. [48] to capture capitalization information during word embedding. The idea is to attach to each word a special capitalization option (numeric, allLower, allUpper, initialUpper, other, mainly_numeric, contains_digit, Padding_token), taken from a lookup table C, indicating how the word is capitalized. Each capitalization option corresponds to an index number, as presented in Table 3. These indexes allow the model to learn the capitalization information of each word during the word embedding process.

Table 3 Word capitalization information and index: Lookup Table C

Using the lookup table C of length \(l'\), we obtain a capitalization feature embedding \(emb\_cap\) of dimension \({{\mathbb {R}}^{l' \times l'}}\). Our method unifies the embeddings of character-level features, word semantic features, and capitalization features to produce a complete representation of a word, which effectively captures the semantic relationship between words while preserving the information of the three individual features. An example of capitalization information embeddings for a sentence is shown in Fig. 3: all the words in the sentence “Only France and Britain backed the 199 proposal (Padding_token)” are assigned capitalization categories according to these rules. We use different colors for the distinct case categories in Fig. 3. The capitalization of each word is thus preserved when performing word embeddings.

Fig. 3 An example of capitalization information embeddings
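A minimal sketch of the lookup described above is given below; the category names follow Table 3, but the exact decision rules and index order are assumptions of this sketch rather than the paper's implementation.

```python
CAP_INDEX = {"numeric": 0, "allLower": 1, "allUpper": 2, "initialUpper": 3,
             "other": 4, "mainly_numeric": 5, "contains_digit": 6, "Padding_token": 7}

def cap_feature(word):
    """Map a word to one of the capitalization categories of lookup table C (Table 3)."""
    digits = sum(ch.isdigit() for ch in word)
    if word.isdigit():
        return CAP_INDEX["numeric"]
    if digits / max(len(word), 1) > 0.5:
        return CAP_INDEX["mainly_numeric"]
    if digits > 0:
        return CAP_INDEX["contains_digit"]
    if word.islower():
        return CAP_INDEX["allLower"]
    if word.isupper():
        return CAP_INDEX["allUpper"]
    if word[0].isupper():
        return CAP_INDEX["initialUpper"]
    return CAP_INDEX["other"]

print([cap_feature(w) for w in "Only France and Britain backed the 199 proposal".split()])
```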

4.3 LSTM-SNP Module Processor

In recent years, the field of NLP has benefited dramatically from the development of recurrent neural networks (RNNs). In many applications, RNNs have become the preferred model for processing sequence data and have achieved significant results in tasks such as text classification, machine translation, and speech recognition. The advantage of RNNs is that they remember information from previous steps in the sequence and automatically learn long-term dependencies through their recurrent structure, which allows them to handle natural language data better and improve prediction accuracy. However, gradient computation in RNNs is limited by the length of the time sequence: when the sequence is long, gradients vanish or explode and the RNN cannot learn correctly.

Long short-term memory networks (LSTMs), a variant of RNNs, are more successful than plain RNNs at distinguishing between information to be discarded and information to be retained. An LSTM introduces a hidden state that refers to the information stored internally by the network at the current time step, and its three gates (forget gate, input gate, output gate) act as regulators of neural information transmission. A neuron in a spike-based SNP system has an internal state and two modes of spike evolution (spike consumption and spike generation). Liu and Peng [21] introduced a parameterized nonlinear SNP system, the LSTM-SNP, inspired by the states and mechanisms of SNP systems and LSTMs.

Fig. 4 Model neuron cell structure of the LSTM-SNP module

LSTM-SNPs [21] redesign the gate mechanisms, state equations, and input-output equations of LSTMs by using SNP nonlinear gate functions and SNP-based membrane computation rules. Figure 4 shows the novel gate mechanisms: the reset gate \(r_t\), consumption gate \(c_t\), and generation gate \(o_t\). The reset gate considers the current input, the previous state, and a bias to determine how much of the preceding state is reset. We feed the series \(S_i=\left\{ x_1,x_2,...,x_h\right\} \), processed by the CNN and word embedding modules, into the LSTM-SNP module. The reset gate \(r_t\) is used to select the relevant information for named entity recognition and is formalized as follows:

$$\begin{aligned} \begin{aligned}&{r_t} = \tanh ({g_r}({x_t},{u_{t - 1}})) \end{aligned} \end{aligned}$$
(12)

where \(x_t\) is the data input at time step t, and \(u_{t-1}\) is the hidden state at time step \(t-1\). \(g_r()\) describes the interaction between \(x_t\) and \(u_{t-1}\) through weight matrices, controlling which redundant information is reset.

We rely on the consumption gate \(c_t\) to decide how much of the previous state will be consumed based on the present input, the previous state, and the bias.

$$\begin{aligned} \begin{aligned}&{\mathrm{{c}}_t} = \tanh ({g_c}({x_t},{u_{t - 1}})) \end{aligned} \end{aligned}$$
(13)

where the function \(g_c()\) describes the consumption mechanism between \(x_t\) and \(u_{t-1}\), through weight matrices, applied to the previous information. In named entity recognition, the consumption gate \(c_t\) handles abnormal circumstances in conjunction with the neurons, thereby creating new cases.

The generation gate \(o_t\) is a gating mechanism that determines how many of the generated spikes are output, depending on the current input, the previous state, and the bias. The gate \(o_t\) is implemented with a hyperbolic tangent (tanh) function, where \(g_o(\cdot )\) computes the relationship between \(x_t\) and \(u_{t-1}\) through weight matrices to influence the output.

$$\begin{aligned} \begin{aligned}&{\mathrm{{o}}_t} = \tanh ({g_o}({x_t},{u_{t - 1}})) \end{aligned} \end{aligned}$$
(14)

Neuron \(\sigma \) generates spikes as follows:

$$\begin{aligned} \begin{aligned}&{\mathrm{{\alpha }}_t} = \sigma ({g_\alpha }({x_t},{u_{t - 1}})) \end{aligned} \end{aligned}$$
(15)

Using the three nonlinear gates and the spikes they produced, the state and output of neuron \(\sigma \) at time t can be determined as follows:

$$\begin{aligned} \begin{aligned}&u_t = r_t \odot u_{t - 1} - c_t \odot \alpha _t\\&h_t = o_t \odot \alpha _t, \end{aligned} \end{aligned}$$
(16)

where ‘\( \odot \)’ denotes the element-wise (Hadamard) product of two vectors. The quantity \(h_t\) represents the output of either a stand-alone LSTM-SNP module or an LSTM-SNP layer within a combined model, encoding the contextual information of an NLP task.
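To make Eqs. (12)-(16) concrete, the sketch below implements them as a custom Keras recurrent cell; it is a reading of the published equations, not the authors' code. In particular, each gate function \(g_{\cdot }(x_t, u_{t-1})\) is assumed to be an affine map \(Wx_t + Uu_{t-1} + b\), and the spike nonlinearity \(\sigma \) in Eq. (15) is taken to be the logistic sigmoid.

```python
import tensorflow as tf

class LSTMSNPCell(tf.keras.layers.Layer):
    """Nonlinear spiking neuron cell following Eqs. (12)-(16) (sketch)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units      # the neuron state u_t
        self.output_size = units     # the output h_t

    def build(self, input_shape):
        dim = input_shape[-1]
        # one (W, U, b) triple per gate: reset, consumption, generation, spike
        self.W = [self.add_weight(shape=(dim, self.units), name=f"W{i}") for i in range(4)]
        self.U = [self.add_weight(shape=(self.units, self.units), name=f"U{i}") for i in range(4)]
        self.b = [self.add_weight(shape=(self.units,), initializer="zeros", name=f"b{i}")
                  for i in range(4)]

    def call(self, x_t, states):
        u_prev = states[0]
        g = [tf.matmul(x_t, W) + tf.matmul(u_prev, U) + b
             for W, U, b in zip(self.W, self.U, self.b)]
        r_t, c_t, o_t = tf.tanh(g[0]), tf.tanh(g[1]), tf.tanh(g[2])   # Eqs. (12)-(14)
        alpha_t = tf.sigmoid(g[3])                                    # Eq. (15), sigma taken as sigmoid (assumption)
        u_t = r_t * u_prev - c_t * alpha_t                            # Eq. (16), element-wise products
        h_t = o_t * alpha_t
        return h_t, [u_t]

# Usage: wrap the cell in a Keras RNN layer to process a whole sentence.
lstm_snp_layer = tf.keras.layers.RNN(LSTMSNPCell(256), return_sequences=True)
```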

The diagram in Fig. 4 and the formulas above describe the activity of individual neurons. Numerous LSTM-SNP neurons are connected together to form an LSTM-SNP neural network module, allowing complex computations to be performed. The overall flow of the CLSTM-SNP model is shown in Algorithm 3. The time complexity of the CLSTM-SNP model construction algorithm has two main components: text data initialization, O(f(n)), where f(n) reflects the complexity related to the text data, and the construction of the CNN module with its various layers and parameters. The overall time complexity is expressed as \(O(f(n) + L \cdot h + l \cdot h + k \cdot f \cdot h + L \cdot h + h + total\_iterations \cdot g(m))\), where n is the scale of the text data, h is the number of words, l is the average number of characters per word, L is the sentence length, and m is the input size of the CNN module.

Algorithm 3 CLSTM-SNP Model Construction
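The following Keras sketch assembles the components of Fig. 1 in the spirit of Algorithm 3, reusing the `build_char_cnn` function (Sect. 4.2.1) and the `LSTMSNPCell` (above) from the earlier sketches. The softmax output layer, the 8-dimensional capitalization embedding, and the concrete sizes are assumptions; they follow the hyperparameters reported in Sect. 5.3 where possible but are not guaranteed to match the authors' implementation.

```python
import tensorflow as tf

def build_clstm_snp(vocab_size, num_chars, num_labels, sent_len=60, word_len=20,
                    word_dim=100, snp_units=256, glove_matrix=None):
    """CLSTM-SNP assembly: multi-feature embeddings -> LSTM-SNP processor -> labels."""
    word_ids = tf.keras.layers.Input(shape=(sent_len,), dtype="int32")
    char_ids = tf.keras.layers.Input(shape=(sent_len, word_len), dtype="int32")
    cap_ids = tf.keras.layers.Input(shape=(sent_len,), dtype="int32")

    init = (tf.keras.initializers.Constant(glove_matrix)
            if glove_matrix is not None else "uniform")
    words = tf.keras.layers.Embedding(vocab_size, word_dim,
                                      embeddings_initializer=init)(word_ids)       # GloVe semantics
    chars = tf.keras.layers.TimeDistributed(
        build_char_cnn(num_chars, word_len=word_len))(char_ids)                    # CNN character features
    caps = tf.keras.layers.Embedding(8, 8)(cap_ids)                                # lookup table C (Table 3)

    x = tf.keras.layers.Concatenate()([words, chars, caps])                        # multi-feature embedding
    x = tf.keras.layers.Dropout(0.15)(x)
    x = tf.keras.layers.RNN(LSTMSNPCell(snp_units), return_sequences=True)(x)      # LSTM-SNP processor
    out = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(num_labels, activation="softmax"))(x)                # per-token label scores
    return tf.keras.Model([word_ids, char_ids, cap_ids], out)
```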

5 Experiment Analysis

5.1 Datasets

This study uses two classical NER datasets, CoNLL-2003Footnote 2 and OntoNotes 5.0,Footnote 3 to evaluate the performance of the proposed CLSTM-SNP model. Both datasets are publicly available online and have been used extensively in the field of NER, so they offer an appropriate evaluation setting for the proposed model. The details of the datasets are as follows.

  • CoNLL-2003 [52]: The dataset was designed for the CoNLL-2003 shared task, focusing on language-independent named entity recognition. This newswire corpus is composed of data that has been labeled with four distinct categories: location, organization, person, and miscellaneous. Since the data was limited, we performed hyperparameter optimization on the development set and then trained the model using both the training and development sets.

  • OntoNotes 5.0 [53]: The dataset is established for the CoNLL-2012 shared task and describes a standard train/dev/test split. We followed the example of Durrett and Klein [54] and applied our model to the part of the dataset which had been given named entity annotations of the highest quality. A portion of the New Testament was excluded due to the lack of gold-standard annotations. This dataset contains more information than CoNLL-2003 and text from various sources, including broadcast conversations, news, newswires, magazines, telephone conversations, and web posts.

The statistics of the two datasets are summarized in Table 4, including the total number (All) of sentences, the average sentence length (Avg.Len), and the sizes of the training (Train), development (Dev), and test (Test) sets. The table shows that OntoNotes 5.0 has more sentences and a slightly longer average sentence length than CoNLL-2003, which highlights the differences between the two datasets and provides insight into their respective structures. Table 5 illustrates the entity types contained in the CoNLL-2003 dataset, which has four distinct groups: location, organization, person, and miscellaneous. Taking the first sentence “The television also said Netanyahu had sent messages to reassure Syria via Cairo, the United States and Moscow.” as an illustration, it contains four location entities described as City or Country. We utilized the CoNLL-2003 dataset in our experiments to demonstrate the validity of the CLSTM-SNP model.

Table 4 Corpus sentence statistics
Table 5 Entity types contained in the dataset CoNLL-2003

5.2 Performance Metrics

This study thoroughly assessed the CLSTM-SNP model and the LSTM-SNP module on NER tasks using standard metrics such as precision, recall, and macro F1. The results suggest that both models hold promise as valuable tools for natural language processing applications. Test samples are evaluated by comparing the predicted entities against the actual (gold-standard) entities.

The calculations of precision (P), recall (R), F1-score (F1), and accuracy (Acc) are represented as follows.

$$\begin{aligned} P= & {} \frac{TP}{TP+FP}\times 100\% \end{aligned}$$
(17)
$$\begin{aligned} R= & {} \frac{TP}{TP+FN}\times 100\% \ \end{aligned}$$
(18)
$$\begin{aligned} F1= & {} \frac{2\times P\times R}{P+R} \times 100\% \ \end{aligned}$$
(19)
$$\begin{aligned} Acc= & {} \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \ \end{aligned}$$
(20)

Precision gauges the ratio of true positives to all positive predictions, while recall gauges the ratio of true positives to all actual positive cases. F1-score signifies the model’s equilibrium between precision and recall. Accuracy quantifies the proportion of correct predictions among all predictions. These metrics offer a holistic view of the model’s performance.

The NER problem is a typical multi-class classification task. Therefore, we also utilize macro F1 as a performance measure, which is determined by the harmonic mean of the macro-averaged precision and recall:

$$\begin{aligned} Precision= & {} \frac{1}{n}\sum \limits _{i = 1}^{n} {P_i} \end{aligned}$$
(21)
$$\begin{aligned} Recall= & {} \frac{1}{n}\sum \limits _{i = 1}^{n} {R_i} \ \end{aligned}$$
(22)
$$\begin{aligned} Macro\quad F1= & {} \frac{{2 \times Precision \times Recall}}{{Precision + Recall}} \ \end{aligned}$$
(23)

Macro F1 provides a more informative measure of the model’s performance than accuracy, as it considers false positives and false negatives. This metric is especially valuable in addressing imbalanced datasets, revealing the model’s capacity to accurately identify both majority and minority classes.
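The snippet below computes the macro-averaged scores of Eqs. (21)-(23) at the token level from gold and predicted label sequences. Note that it is a simplified sketch: the official CoNLL-2003 evaluation script scores complete entity spans rather than individual tokens.

```python
def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1 over the entity classes (Eqs. (21)-(23))."""
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)   # Eq. (17) per class
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)      # Eq. (18) per class
    P = sum(precisions) / len(labels)                           # Eq. (21)
    R = sum(recalls) / len(labels)                              # Eq. (22)
    f1 = 2 * P * R / (P + R) if P + R else 0.0                  # Eq. (23)
    return P, R, f1

gold = ["PER", "O", "ORG", "ORG", "O"]
pred = ["PER", "O", "ORG", "O", "O"]
print(macro_scores(gold, pred, labels=["PER", "ORG"]))
```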

5.3 Parameter Configuration

In our experiments, we applied the CLSTM-SNP model to the NER problem; the results of this implementation were encouraging, and we expect the model to remain a valuable asset for our research. Parameter configuration is a critical component of any learning model, and proper configuration of the parameters can have a significant impact on performance. The hyperparameters of the LSTM-SNP module are listed in Table 6, and the kernel size, number of filters, and other hyperparameters of the convolutional neural network module are given in Table 7.

CLSTM-SNP is effective when certain parameters are properly specified, such as the word embedding size, iteration count, dropout rate, CNN module hyperparameters, the number of neurons in the LSTM-SNP module, and the sentence length, all of which need to be tuned. The experimental results of the parameter configuration are evaluated with the evaluation script of the CoNLL-2003 shared task of the Conference on Computational Natural Language Learning [52].

Table 6 Hyperparameters and their values used in the LSTM-SNP module
Table 7 Hyperparameters and their values used in the CNN module

5.3.1 Hyperparameter Settings for Word Embeddings

We first use the CoNLL-2003 dataset as an example to illustrate the hyperparameter configuration and the evaluation process; the parameter setup for OntoNotes 5.0 follows a similar approach. We provide a range of near-optimal parameters. The capability of GloVe word vectors to extract various features has been demonstrated by many studies and downstream applications, and pre-trained models are available in several dimensions (50, 100, 200, or 300) for different requirements. For CoNLL-2003, we set the number of iterations to 80, dropout to 0.5, the number of LSTM-SNP neurons to 256, sentence length to 60, kernel size to 3, number of filters to 20, and strides to 1. All trial runs are conducted with these hyperparameters; when one variable is changed, the others are kept fixed. A GloVe model trained on an x-billion-token corpus with y-dimensional vectors is labeled GloVe.xB.yd. The CLSTM-SNP model was tested with GloVe.6B.50d, GloVe.6B.100d, and GloVe.6B.200d, achieving macro F1 scores of 86.23\(\%\), 88.2\(\%\), and 88.36\(\%\), respectively. The macro F1 scores of the 100- and 200-dimensional models exceed that of the 50-dimensional model. Because the 100- and 200-dimensional models produce similar macro F1 scores, and the 100-dimensional model is faster and uses fewer computing resources, we use GloVe.6B.100d as the word embeddings for the CLSTM-SNP model. The experimental results are recorded in Table 8, and the results are visualized in Fig. 5.

Table 8 Performance evaluation of the CLSTM-SNP model with different configurations of GloVes
Fig. 5 Performance details of precision, recall, and macro F1 for different GloVe embedding dimensions

5.3.2 Hyperparameter Settings for Iterations, Dropout, and Sentence Length

A CLSTM-SNP model is developed on the dataset to determine the iterations, dropout, and sentence length hyperparameters. We again take CoNLL-2003 as an example and employ a controlled-variable approach, which allows us to account for variables that are not the main focus of our inquiry but may still affect the outcomes, and gives a clearer picture of the results. Based on the analysis above, the GloVe.6B.100d word embeddings are used. The testing results are shown in Fig. 6. We analyze below how the three hyperparameters affect the entire model.

  • Hyperparameter setting for iterations

We evaluated the performance of the CLSTM-SNP model for various iteration settings, including 60, 70, 80, 85, and 90. Through a comparative analysis of the results, we gained insight into how varying iteration settings affect the performance of the model. The results of the experiment are shown in Table 9 and Fig. 6a, which list the scores of the four entity types (Location, Miscellaneous, Organization, and Person) as well as the overall entity classification.

The overall macro F1 remains relatively unchanged, at about 87\(\%\), when the number of iterations is between 60 and 70, and increases to 88.21\(\%\) at 80 iterations. However, the macro F1 gradually decreases as the number of iterations grows further, and at 80 and 90 iterations the performance of the CLSTM-SNP model is roughly the same. After carefully evaluating the available computing resources, we chose 80 iterations, which allows the task to be completed in a reasonable amount of time while still achieving the desired results.

  • Hyperparameter setting for dropout

We assess the performance (macro F1) of the CLSTM-SNP model under varying dropout settings. Five macro F1 results are presented in Table 10 and Fig. 6b for dropout rates of 0, 0.15, 0.4, 0.6, and 0.65. The overall macro F1 reaches 88.83\(\%\) when dropout is set to 0.15, which is 0.72\(\%\) higher than the macro F1 score without dropout. The CLSTM-SNP model's performance is noticeably weaker at dropout rates of 0.4, 0.6, and 0.65. Based on these data, we determined that a dropout value of 0.15 is the most advantageous.

  • Hyperparameter setting for sentence length

The sentence length of an input to a convolutional neural network or padding algorithm is an important hyperparameter that directly influences the model's accuracy and the speed of the training process, and it plays a significant role in the success of the algorithm. If the sentence length is inadequate, the model may struggle to identify the key characteristics of the data that are necessary for reliable predictions; if the sentences are excessively long, the model may be overwhelmed with too much information, causing a decrease in macro F1 scores. We analyze the impact of sentence length on macro F1 scores using five values near the mean: 50, 52, 53, 60, and 67. The experimental results show that changing the input sentence length has little effect on the model's overall performance; the overall macro F1 peaks at 89.2\(\%\) for length 53. Hence, we set the sentence length to 53. The details can be seen in Table 11 and Fig. 6c.

Table 9 The experimental results for the number of iterations
Table 10 The experimental results for dropout
Table 11 The experimental results for sentence length
Fig. 6 The influence of iterations, dropout, and sentence length on the CLSTM-SNP model, measured by macro F1

5.3.3 Hyperparameter Settings for LSTM-SNP Neurons

The number of neurons in the LSTM-SNP module put forward by Liu et al. [21] corresponds to the number of time steps used when processing temporal problems. LSTM-SNP neurons can recognize and process long-term dependencies between adjacent and nonadjacent words, allowing entity boundaries to be determined more accurately. We evaluate the influence of the number of LSTM-SNP neurons on the CLSTM-SNP model with five configurations: 50, 200, 256, 275, and 300. Based on the experimental results, we selected three groups showing distinct differences and present them in Table 12 and Fig. 7. The overall macro F1 is 87.83\(\%\) when the number of LSTM-SNP neurons is set to 50. We use 256 LSTM-SNP neurons owing to their macro F1 score of 88.80\(\%\), the highest among the three groups.

Fig. 7 The influence of the number of LSTM-SNP neurons on the CLSTM-SNP model

5.3.4 Hyperparameter Settings for CNN Module

A one-dimensional convolution layer (Conv1D) is added to the CLSTM-SNP model; in one-dimensional convolution over sequential data, the convolution kernel slides along a single dimension. The size of the Conv1D convolution kernel is determined by the hyperparameter kernel size, which we changed by 2 in each experiment. The experimental data reveal that the model with a kernel size of 3 yields the best performance. The number of filters determines the output dimension of the CNN module.

The filters are the primary units of the CNN module: the number of filters determines the output dimension of the CNN module, thereby influencing the model's performance and accuracy, and adjusting it allows the model to be tailored for the most accurate and reliable results. Keeping the kernel shape constant, we measure the influence on the CLSTM-SNP model of setting the number of filters to 15, 20, 53, and 60, respectively. Using Table 13 and Fig. 8 as our parameter selection guide, we pick a kernel size of 3 and 53 filters, resulting in a macro F1 score of 88.80\(\%\).

Table 12 The influence of the number of LSTM-SNP neurons on the CLSTM-SNP model
Table 13 The experimental results of the CNN module in the CLSTM-SNP model
Fig. 8 Influence of the CNN module parameters kernel size and filter on the CLSTM-SNP model

5.4 Experimental Results and Discussion

The data presented in Table 14 and Fig. 9 show the performance of the proposed CLSTM-SNP model in recognizing named entities. The results demonstrate the effectiveness of the model and provide further evidence of its potential to serve as a general underlying model for natural language processing. The FFNN model proposed by Collobert et al. [48] is taken as a baseline for comparison. The results in Table 14 suggest that although FFNN does well on the CoNLL-2003 dataset, it is less suitable for OntoNotes 5.0, which spans seven different domains. Our CLSTM-SNP model outperformed the FFNN baseline, achieving a macro F1 score of 89.2\(\%\) on the CoNLL-2003 dataset. Moreover, on CoNLL-2003 the CLSTM-SNP model's macro F1 score is 9.11\(\%\) higher than that of the model proposed by Li and Du et al. [29], which uses active learning to address the NER problem.

Sequence-to-sequence models generally perform better on the CoNLL-2003 English corpus than on the OntoNotes 5.0 dataset because CoNLL-2003 has four types of named entities while OntoNotes 5.0 has eighteen fine-grained categories. Similar performance differences can be found in the systems proposed by Chiu and Nichols [55] and Strubell et al. [56]. The CLSTM-SNP model achieved a macro F1 score of 75.5\(\%\) when evaluated on OntoNotes 5.0, a less satisfactory outcome than the models listed in Table 14. The CLSTM-SNP model's macro F1 score decreased by 13.7\(\%\) on the OntoNotes 5.0 dataset compared with CoNLL-2003, demonstrating its sensitivity to the variety of domains and entity types.

In contrast to the prevailing approach of employing large language models for named entity recognition, the performance of CLSTM-SNP seems to be suboptimal across both datasets. This discrepancy may be attributed to the inherent advantages of large language models. These models, through extensive training on vast datasets and parameters, attain a profound and comprehensive understanding of language, showcasing robust generality and adaptability.

The CLSTM-SNP model excels in resource efficiency, surpassing GPT-3 and LLaMA in the key aspects listed in Table 15. With a compact size of 13.52M parameters, it ensures efficient storage and optimized memory utilization, which is crucial for overall performance. Its training data scale, ranging from 20,000 to 80,000 tokens, is notably lighter than GPT-3's 300 billion tokens and LLaMA's roughly 1.4 trillion tokens, streamlining the model architecture and reducing reliance on extensive datasets. In terms of hardware, CLSTM-SNP's practical advantage lies in requiring only a 12 GB 2080 Ti GPU, compared with LLaMA's need for 2048 specialized and expensive 80 GB A100 GPUs, making it a cost-effective option. Additionally, CLSTM-SNP's training time, ranging from 0.5 to 3 h, is far shorter than LLaMA's 21 days, emphasizing its efficiency and responsiveness for applications with time constraints. In summary, CLSTM-SNP's resource-efficient profile makes it an attractive choice for applications constrained by model size, data scale, and training time.

The objective of evaluating the performance of the CLSTM-SNP model against existing models (Collobert et al. [48], Chiu and Nichols [55], and Li and Du et al. [29]) is not to demonstrate that CLSTM-SNP is superior to already prominent neural models, but to show that CLSTM-SNP can be employed for named entity recognition with satisfactory performance compared with these neural-like computing models. The findings of this study may lead to potential uses or ideas for addressing real-world issues in certain circumstances through SNP systems or in collaboration with other models.

5.4.1 Influence of the LSTM-SNP Module

Ablation experiments were conducted with the same parameter settings to understand each component's impact on the overall performance of the CLSTM-SNP model; the results of these experiments inform design decisions and further improvements of the system. The particular parameter settings are shown in Table 16. The LSTM-SNP module alone performed reasonably well on the CoNLL-2003 dataset, achieving a macro F1 score of 71\(\%\). Its unsatisfactory performance on the OntoNotes 5.0 dataset indicates that the model is quite sensitive to multi-domain knowledge.

Through its performance on the two datasets, the LSTM-SNP model has demonstrated its efficacy on named entity recognition tasks with fewer entity classes. The LSTM-SNP module is essential for extracting word relationships, and it is the component of the CLSTM-SNP model that contributes most to achieving a high macro F1 score on NER.

5.4.2 Influence of the Word Embeddings

Word embeddings are an important component of the CLSTM-SNP model, allowing it to capture semantic meaning from text and better understand the context of words. When the 100-dimensional GloVe word vectors are added to the LSTM-SNP model, all indicators improve significantly on the CoNLL-2003 dataset, with a 13.2\(\%\) increase in accuracy, a 14.1\(\%\) increase in recall, and a 13.9\(\%\) increase in macro F1. The OntoNotes 5.0 dataset shows similarly improved performance, with a 39.2\(\%\) rise in precision, a 30.3\(\%\) increase in recall, and a 34.5\(\%\) enhancement in macro F1. This experiment demonstrates the GloVe word vectors' capability to improve the CLSTM-SNP model's performance.

5.4.3 Influence of the CNN Module and the Capitalization Features

Convolutional neural networks are commonly employed to capture character-level characteristics when identifying named entities. Compared with the GloVe embeddings, the impact of the CNN module on the overall model is relatively small. The recall of the CLSTM-SNP model on the CoNLL-2003 dataset increased by 7.5\(\%\), and its macro F1 score rose by 3.9\(\%\) following the incorporation of the CNN module. On the OntoNotes 5.0 dataset, the precision decreased by 4.9\(\%\), while the recall and macro F1 increased by 6.8\(\%\) and 1.2\(\%\), respectively. In general, the addition of the CNN module plays a positive role in the whole model, especially on datasets with fewer entity categories to identify.

We assess the influence of incorporating capitalization features into the CLSTM-SNP model by comparing CLSTM-SNP with the combination of LSTM-SNP, GloVe embeddings, and the CNN module. The inclusion of capitalization features had only a modest effect: the macro F1 on the two datasets improved by approximately 0.4\(\%\) and 0.5\(\%\), respectively.

Table 14 Precision, recall, and macro F1 score of CLSTM-SNP with different feature sets compared to baselines
Fig. 9 Precision, recall, and macro F1 of CLSTM-SNP with different feature sets

Table 15 Comparison of training resources with large models
Table 16 Hyper-parameter search space and final values used for all experiments

6 Conclusion and Future Work

This paper presents the CLSTM-SNP model (a variant of LSTM-SNP), which combines LSTM-SNP, GloVe word vectors, and convolutional neural networks. The CLSTM-SNP model and its LSTM-SNP module demonstrate remarkable feature extraction abilities and offer a promising solution for dealing with sequential text, for example in named entity recognition tasks. The CLSTM-SNP model achieved macro F1 scores of 89.2\(\%\) on CoNLL-2003 and 75.5\(\%\) on OntoNotes 5.0, marking the first successful attempt to tackle the named entity recognition problem in the field of SNP systems. The CLSTM-SNP model also offers a potential solution to the issue of sparse NER data through the application of membrane computing.

The LSTM-SNP module, in turn, is the initial effort to combine long short-term memory networks with the SNP system through deep learning. Research applying the LSTM-SNP model to practical problems is still relatively scarce; therefore, we assess the effectiveness of the LSTM-SNP module and enhance its feature representation capacity with the GloVe and CNN modules when performing the NER task. The experimental results indicate that LSTM-SNP shows great application potential for named entity tasks, given careful data preprocessing.

In this paper, we have employed the CLSTM-SNP model and the LSTM-SNP module to resolve the NER problem, thereby broadening the application of SNP systems to practical problems. Converting the unidirectional LSTM-SNP module in the CLSTM-SNP model to a bidirectional LSTM-SNP could further improve performance, since context would be captured in both the forward and backward directions. Exploring how to effectively utilize CLSTM-SNP and other SNP models to address NLP issues, such as text sentiment classification, remains a highly sought-after area of research.