Abstract
Joint entity and relation extraction methods typically integrate entity extraction and relation classification by sharing an encoding layer. However, the two subtasks attend to different contextual information, which can cause feature conflicts and degrade model performance. To address this, we propose a novel joint entity and relation extraction method that incorporates multi-module feature information enhancement (MFIE) (https://github.com/liyao345496280/Relation-extraction). For the entity extraction task, we employ a relation awareness enhancement module, which uses a potential relation extraction module and an attention mechanism to direct the model’s focus towards entities closely related to potential relations. For the relation extraction task, we implement an entity information enhancement module that uses the entity extraction results to augment the original feature information through a gating mechanism, thereby enhancing relation classification performance. Experiments on the NYT and WebNLG datasets demonstrate the effectiveness of our method: compared to the state-of-the-art method, the F1 score on the NYT dataset improves by 0.7%.
Introduction
Relational triple extraction is a core task in natural language processing, with applications spanning information extraction, knowledge graph creation [1], question-answering systems [2], and recommendation systems [3]. The task focuses on identifying entities and their semantic relations in unstructured text and representing them as triples \(<s,r,o>\), as shown in Fig. 1, where r is drawn from a predefined set of relations, while s and o denote the subject and object entities, respectively. Two principal categories of models have been devised for this purpose: pipeline models and joint models. Pipeline models address entity recognition and relation extraction sequentially, which can lead to error propagation. In contrast, end-to-end joint extraction models for entities and relations have gained prominence because they exploit the implicit interactions between named entity recognition and relation extraction, thereby mitigating error propagation.
Joint entity and relation extraction models can be categorized into two groups based on their decoding techniques: multi-step decoding models [4,5,6,7] and single-step decoding models [8,9,10]. Multi-step decoding models use multiple interconnected processing steps and modules to extract entities and relations sequentially. These modules share parameters but require a distinct decoding algorithm at each step, so decoding errors accumulate and can adversely affect the model’s performance. Conversely, single-step decoding models identify entities and relations independently and then combine them into triples based on their potential relevance. Because it employs a single decoding process, this approach mitigates the accumulation of decoding errors to some extent. However, it also reduces the correlation between sub-tasks. Moreover, existing joint models often use shared encoding layers, but the contextual feature information that entity extraction and relation extraction focus on is not identical [11]. Consequently, performing identification without constraints can result in computational redundancy and further reduce the model’s performance.
Therefore, we propose a novel joint entity and relation extraction approach incorporating Multi-Module Feature Information Enhancement (MFIE). We designed two distinct modules to optimize entity extraction and relation extraction. For entity extraction, we introduced a relation awareness enhancement module. This module initially extracts potential relation information from the input sentence and then integrates it with the feature data obtained from the model’s encoding layer. This enables the entity extraction model to prioritize entities relevant to the relations of interest, reducing computational redundancy. Conversely, for relation extraction, we designed an entity information enhancement module. This module leverages the outcomes of entity extraction by incorporating them as feature information. A gating mechanism combines this entity feature information with the feature data from the encoding layer, enhancing the input information for the relation extraction model and improving its performance. The effectiveness of our approach was validated through extensive experiments on two commonly used datasets: NYT and WebNLG, consistently delivering robust performance even in complex scenarios characterized by overlapping triplets and multiple triplets.
The primary contributions of this study are as follows:
- We propose a relation-aware enhancement module that uses the potential relation information in a sentence to constrain entity extraction, steering the model away from entities that are irrelevant to those relations and reducing computational overhead.
- We design an entity information enhancement module that incorporates the entity information obtained from entity extraction to enrich the features used at the relation extraction layer, enhancing the relation extraction capability of the model.
- To mitigate the sparsity of entities and relations in text, we employ a global pointer network at the decoding layer together with a sparse cross-entropy loss function, which improves the model’s relation extraction performance.
Related work
Traditional approaches to entity relation extraction often adopt a pipeline methodology [12,13,14], breaking down the task into distinct sub-tasks of named entity recognition and relation extraction. However, this conventional method neglects to harness the inherent interactions between these two tasks, leading to a growing preference for the joint entity and relation extraction approach in current research.
Early joint extraction methods were usually categorized as multi-task learning methods. These methods essentially constructed two separate models for entity extraction and relation extraction and optimized them jointly by sharing parameters. The initial investigation [4] introduced sequence labeling for entity prediction and employed a tree-based LSTM (Long Short-Term Memory) model for relation extraction, with parameter sharing achieved through shared LSTM layers. Building on this, the authors of [15] adopted a joint model of attention-based recurrent neural networks to address the shortcomings of tree structures. In [16], the authors enhanced the performance of the relation classification model by framing it as a multi-head selection problem, effectively resolving the challenge of handling multiple overlapping relations. The CasRel (Cascade Binary Tagging Framework) model [17] divided entity relation extraction into two stages: head entity recognition and relation extraction. First, head entities were identified through binary classification at the head entity recognition layer; then, relation extraction and the corresponding tail entity recognition were performed based on each head entity.
While the CasRel model improved the generalization of entity relation extraction tasks, these methods were essentially pipelined methods, specifically multi-step decoding methods, and encountered issues of error propagation. Joint decoding models, which are designed based on table-filling methods [18], have introduced a fresh perspective to the joint entity and relation extraction task. In [9], the authors merged a graph neural network with a table-filling model to capture the associative information between the two subtasks. Additionally, in [10], a table-filling model was proposed that employs two independent encoders: a sequence encoder to gather information for the entity recognition task and a table encoder to acquire information for the relation extraction task. In contrast, the TPlinker model [19] treated joint entity-relation extraction as a token pair linking problem. It introduced a specialized handshake marking scheme tailored to relations, facilitating the alignment of boundary tokens between entity pairs. The model exhibited certain advantages when dealing with multiple relations. However, it constructed matching matrices based on sentence sequences for global relations, leading to data sparsity and slower training. The EmRel (Joint Representation of Entities and Embedded Relations) model [20] introduced a relational representation and exploited rich interactions across relations, entities, and contexts. Nevertheless, it used newly initialized relations for direct embedding, resulting in noisier information and ignoring heterogeneity between entities.
Although the joint decoding algorithm alleviates the error propagation problem of pipelined models to a certain extent, it still suffers from computational redundancy when combining triples. We argue that this issue arises because the aforementioned models still split the model into separate modules and extract triples step by step, whereas entity extraction and relation extraction do not focus on the same contextual feature information. Thus, the shared encoding layer approach leads to feature information conflicts and an excessive number of negative samples when the model computes features.
To tackle these challenges, we designed a joint entity and relation extraction approach combined with Multi-Module Feature Information Enhancement (MFIE). We introduced a relation awareness enhancement module and an entity information enhancement module, tailored for the entity extraction and relation extraction tasks, respectively. Specifically, we first generate sentence-level relation feature vectors through an encoder and extract potential relation feature representations of sentences through a classification task. Subsequently, these are combined with the original entity feature information to perform entity recognition via a global pointer network. Afterwards, the obtained entity-extracted feature representation is combined with the feature information of the relation extraction model through a gating mechanism to obtain a new feature representation with enhanced entity information. Finally, relation extraction is performed using a global pointer network. Experimental results corroborate the efficacy of our model on public datasets.
Methodology
The overall process of the MFIE model is illustrated in Fig. 2. The algorithm consists of two main phases. In the first phase, the relation-aware enhancement module incorporates relation feature information into the input of the entity extraction model. In the second phase, the entity information enhancement module fuses the entity extraction results with the relation extraction model’s input, so that the relation extraction model receives enhanced entity feature information at its input.
Task definition
Given a sentence \(S = \left\{ W_1, W_2, \ldots , W_n\right\} \) and a set of predefined relations \(R = \left\{ r_1, r_2, \ldots , r_m\right\} \), where n represents the sentence length and m denotes the number of relations, the entity-relation joint extraction task aims to identify all possible triples within the sentence, \(T = \left\{ (s, r, o) \mid s, o \in E,\ r \in R\right\} \). Here, s represents the head entity, o signifies the tail entity, r corresponds to the relation, and E constitutes the set of entities.
Encoder module
In the encoding layer, we employ the BERT pre-trained model as a sentence encoder to transform each token of the sentence into embeddings, as illustrated in Eq. (1).
Here, \(D = \left\{ d_1, d_2, \ldots , d_n\right\} \) represents the sequence of embedding vectors acquired for each sentence, where n denotes the sequence length, and \(d_i \in \mathbb {R}^l\) signifies the embedding vector representation of each token, with l representing the dimension of the embedding vector.
Entity extraction layer based on relation awareness enhancement
In order to enhance the model’s capacity to filter out irrelevant relation information and mitigate computational redundancy during the entity extraction process, we introduce a relation awareness enhancement module to bolster the model’s entity extraction capabilities. The complete module design is depicted in Fig. 3.
To obtain the relation feature information associated with each word in a sentence, we initially construct a sentence relation representation vector \(V = \left\{ v_1, v_2, \ldots , v_m\right\} \). In this context, m refers to the total number of potential relations, \(v_i\) signifies the relations present within the sentence, and i serves as the relation identifier (provided in the dataset). If the sentence does not include the relation numbered i, the corresponding \(v_i\) is set to 0; if the sentence includes the relation numbered i, then \(v_i\) is set to i.
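The construction of the sentence relation representation vector can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `relation_indicator` is hypothetical, and relation identifiers are assumed to be 1-based so that 0 can be reserved for "relation absent".

```python
def relation_indicator(present_relations, num_relations):
    """Build V = [v_1, ..., v_m]: v_i = i if relation i occurs in the sentence, else 0.

    Assumption: relation identifiers run from 1 to m, so 0 unambiguously
    marks an absent relation.
    """
    present = set(present_relations)
    for r in present:
        if not 1 <= r <= num_relations:
            raise ValueError(f"relation id {r} is outside 1..{num_relations}")
    return [i if i in present else 0 for i in range(1, num_relations + 1)]


# A sentence that expresses relations 2 and 5 out of m = 6 predefined relations.
V = relation_indicator({2, 5}, num_relations=6)
print(V)  # [0, 2, 0, 0, 5, 0]
```

Because most sentences express only a few of the m relations, V is mostly zeros, which is the sparsity that the subsequent self-attention step is meant to compensate for.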
Following that, we encode the relational representation of the sentence to generate a relational embedding vector \(V_s\). Given that the number of relations in the sentence is significantly lower than the total number of relations, the embedding vector acquired may contain a noticeable level of error. To procure a more precise relation embedding vector, it is essential to input it into the self-attention mechanism for attention calculation, as delineated in Eqs. (2) and (3).
The relational feature vector \(V_s \in \mathbb {R}^{m \times l}\) encompasses all relations in a sentence. To combine the relational feature vector with the word vectors, we employ a sentence-level relation extraction module to process the embedding vectors obtained from the BERT model. This module yields the word-level relation prediction set, as shown in Eq. (4).
Here, \(W_r\) and \(b_r\) represent the trainable weights and biases, respectively, \(\text {Relu}\) denotes the activation function, and \(P_r\) signifies the resulting word-level representation of the relation prediction.
Finally, the relational feature vector is multiplied by the word-level relational prediction representation to obtain word-level relational feature information. This obtained word-level relational feature information is then added to the original BERT word vector. This process enhances the relational information in the input vectors of the final entity extraction model, thereby improving the accuracy of entity extraction, as illustrated in Eqs. (5) and (6).
where \(D_r\) is the relation feature information at the word level, \(D_{ent}=\{h_1,h_2,...,h_n\}\) is the input of the entity extraction layer, and \(h_i\) is the vector at position i of that sequence. We feed this relation-enhanced input into the entity extraction decoding layer. Guided by the relation information, the model can obtain more accurate entity recognition results.
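The data flow described for Eqs. (4) to (6) can be sketched in plain Python. This is an illustration under assumptions rather than the authors' code: the shapes (\(D\) as n × l, \(W_r\) as l × m, \(V_s\) as m × l) and the helper names `matmul` and `relation_aware_enhance` are hypothetical, chosen only so that the product of \(P_r\) and \(V_s\) can be added to \(D\).

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relation_aware_enhance(D, V_s, W_r, b_r):
    """Return (P_r, D_r, D_ent) for token embeddings D and relation features V_s.

    Assumed shapes: D is n x l, W_r is l x m, b_r has length m, V_s is m x l.
    """
    # Word-level relation prediction: P_r = ReLU(D W_r + b_r), shape n x m.
    P_r = [[max(0.0, x + b) for x, b in zip(row, b_r)] for row in matmul(D, W_r)]
    # Word-level relation features: D_r = P_r V_s, shape n x l.
    D_r = matmul(P_r, V_s)
    # Relation-enhanced input of the entity extraction layer: D_ent = D + D_r.
    D_ent = [[d + r for d, r in zip(d_row, r_row)] for d_row, r_row in zip(D, D_r)]
    return P_r, D_r, D_ent


# Toy example with n = 2 tokens, l = 2 dimensions, m = 2 relations.
D = [[1.0, 0.0], [0.0, 1.0]]
W_r = [[1.0, -1.0], [0.0, 2.0]]
b_r = [0.0, 0.0]
V_s = [[0.5, 0.5], [1.0, 0.0]]
P_r, D_r, D_ent = relation_aware_enhance(D, V_s, W_r, b_r)
print(D_ent)  # [[1.5, 0.5], [2.0, 1.0]]
```

The addition in the last step keeps the original token embedding intact, so the relation signal acts as a bias on top of the BERT features rather than a replacement for them.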
At the entity extraction decoding layer, we adopt the global pointer network decoding method, inspired by [21]. Unlike the ordinary pointer network, as shown in Fig. 4, the global pointer network identifies the head and tail of an entity as a whole, as shown in Eqs. (7–9).
In this case, W and b denote the trainable parameters. In addition, \(q_{i,a}\in \mathbb {R}^l\) and \(k_{i,a}\in \mathbb {R}^l\) are vector representations used to identify entities of type a; specifically, \(q_{i,a}\) and \(k_{i,a}\) are the vector representations of the head token and tail token of the entity, respectively. Here, l denotes the vector dimension, and i and j denote the initial and final positions of the entity in the sequence, respectively. o denotes the relative position information of each entity in the sequence, while \(E_a \in \mathbb {R}^{a \times n \times n}\) denotes the score matrix of the entity, where a denotes the number of types, n corresponds to the length of the sequence, and \(E_a(i,j)\) denotes the entry at position (i, j) in the matrix, which corresponds to the entity spanning positions i to j. Since entity extraction in the entity relation extraction task does not need to identify specific entity types, but only needs to classify entities as head entities or tail entities, we treat entity classification as a binary task with \(a=2\).
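The span scoring performed by a global pointer decoder can be sketched as below. This is a simplified illustration, not the formulation of Eqs. (7–9): it omits the relative position term, and the masking of spans with i > j, the decoding threshold of 0, and the names `linear`, `global_pointer_scores`, and `decode_spans` are assumptions made for the example.

```python
def linear(h, W, b):
    """Apply y = W h + b to one token vector h; W is a list of rows."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]


def global_pointer_scores(H, params):
    """Score every (start, end) token pair for every type.

    H holds n token vectors; params holds one (W_q, b_q, W_k, b_k) tuple per
    type. Returns E with E[a][i][j] = q_{i,a} . k_{j,a} for i <= j, and -inf
    for i > j (assumption: a span cannot end before it starts).
    """
    n = len(H)
    E = []
    for W_q, b_q, W_k, b_k in params:
        Q = [linear(h, W_q, b_q) for h in H]  # head-token representations
        K = [linear(h, W_k, b_k) for h in H]  # tail-token representations
        E.append([
            [sum(q * k for q, k in zip(Q[i], K[j])) if i <= j else float("-inf")
             for j in range(n)]
            for i in range(n)
        ])
    return E


def decode_spans(E, threshold=0.0):
    """Return (type, start, end) for every span scoring above the threshold."""
    return [(a, i, j)
            for a, matrix in enumerate(E)
            for i, row in enumerate(matrix)
            for j, score in enumerate(row)
            if score > threshold]


# Toy input: n = 3 tokens, l = 2 dimensions, a = 2 types (head / tail entity).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2, zero = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
two_I2 = [[2.0, 0.0], [0.0, 2.0]]
E = global_pointer_scores(H, [(I2, zero, I2, zero), (I2, zero, two_I2, zero)])
print(E[0][0][2], E[1][0][2])  # 1.0 2.0
```

Scoring the head and tail token jointly is what lets the decoder treat an entity as a single unit: each cell of the n × n matrix stands for one candidate span, rather than two independent boundary predictions that must be paired afterwards.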
Relation extraction layer based on entity information enhancement
The structural diagram of the entity information enhancement module is depicted in Fig. 5.
Specifically, the output of the entity extraction layer is a score matrix for whole entities, \(E_a\in \mathbb {R}^{a \times n \times n}\), where \(a=1\) corresponds to the head entity and \(a=2\) corresponds to the tail entity. We need to split this matrix to obtain the head token and tail token of the head entity, as well as the head token and tail token of the tail entity.
First, the matrix is split into a head entity matrix and a tail entity matrix. This is shown in Eq. (10).
where \(E_1\in \mathbb {R}^{n \times n}\) is the score matrix of the head entity, and \(E_2\in \mathbb {R}^{n \times n}\) is the score matrix of the tail entity. \(\text {chunk}\) is the matrix split function, applied here along dimension a.
Due to the nature of the global pointer network, we can consider each \(n \times n\) matrix as representing the positional relations among words in a sentence sequence of length n. Therefore, we can treat the words along the row coordinates as head tokens and the words along the column coordinates as tail tokens. Subsequently, we reduce the dimensions of the matrices through aggregation. Specifically, we reduce matrices \(E_1\) and \(E_2\) row-wise into vectors \(D_{h,s}\) and \(D_{h,e}\), and column-wise into vectors \(D_{t,s}\) and \(D_{t,e}\). In this context, \(D_{h,s}\) and \(D_{t,s}\) denote the head and tail tokens of the head entity, respectively, while \(D_{h,e}\) and \(D_{t,e}\) represent the head and tail tokens of the tail entity.
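The split and the row-wise and column-wise reductions can be sketched as follows. The text does not state which aggregation function is used, so taking the maximum is purely an assumption here (a sum or mean would fit the description equally well), and the function name `split_and_aggregate` is hypothetical.

```python
def split_and_aggregate(E, reduce=max):
    """Split a 2 x n x n entity score tensor and reduce it to four length-n vectors.

    E[0] is the head-entity matrix E_1 and E[1] is the tail-entity matrix E_2.
    Rows index head tokens and columns index tail tokens.
    Assumption: `reduce` (max by default) stands in for the unspecified
    aggregation function.
    """
    E_1, E_2 = E  # the 'chunk' step: split along the type dimension

    def row_wise(M):
        return [reduce(row) for row in M]

    def column_wise(M):
        return [reduce(col) for col in zip(*M)]

    D_hs, D_ts = row_wise(E_1), column_wise(E_1)  # head / tail tokens of the head entity
    D_he, D_te = row_wise(E_2), column_wise(E_2)  # head / tail tokens of the tail entity
    return D_hs, D_ts, D_he, D_te


# Toy score matrices for a sentence of length n = 2.
E_1 = [[1.0, 2.0], [0.0, 3.0]]
E_2 = [[4.0, 0.0], [1.0, 1.0]]
D_hs, D_ts, D_he, D_te = split_and_aggregate([E_1, E_2])
print(D_hs, D_ts)  # [2.0, 3.0] [1.0, 3.0]
print(D_he, D_te)  # [4.0, 1.0] [4.0, 1.0]
```

Each resulting vector has one value per token, which is the shape needed before these entity signals can be combined with the token-level BERT features.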
Subsequently, the obtained \(D_{h,s}\), \(D_{h,e}\) and \(D_{t,s}\), \(D_{t,e}\) are combined to obtain the head token pair vector and the tail token pair vector, as shown in Eqs. (11, 12).
Here, \(D_h\) is the vector representation of the head token pair, while \(D_t\) is the vector representation of the tail token pair. Then, a gating mechanism uses \(D_h\) and \(D_t\) as gate functions to enhance the original BERT word vector representation with entity information, as shown in Eqs. (13, 14). In this way, the input of the relation extraction model includes the entity information obtained from entity extraction, thereby improving the performance of relation extraction.
where D is the original BERT word vector representation. \(D_{relh}\) and \(D_{relt}\) are the head token pair vector representation and the tail token pair vector representation to be input to the relational extraction model to get entity information enhancement, respectively.
In the relation extraction decoding layer, we continue to employ the global pointer network for decoding, but with a different configuration. In this case, the number of types is set to \(a=m\), where m represents the total number of relations in the dataset. The head token pairs and tail token pairs are separately fed into the global pointer network for decoding, resulting in the score matrices \(E_h\in \mathbb {R}^{a \times n \times n}\) and \(E_t\in \mathbb {R}^{a \times n \times n}\).
Training strategies
Influenced by prior research [22], we utilize a sparse variant of the multi-label cross-entropy loss function for model training. The traditional multi-label cross-entropy loss function, by contrast, is computed by enumerating the positive and negative samples separately, and its general structure is shown in Eq. (15). Specifically, the prediction errors of the positive class T and the negative class F are calculated and summed to obtain the final loss value. Here, positive samples are the samples that belong to the class and negative samples are those that do not, as shown in Fig. 4, where 1 corresponds to a positive sample and 0 corresponds to a negative sample.
where T denotes the group of positive classes, F signifies the group of negative classes, and \(D^i\) signifies each of the score matrices acquired previously.
Since the number of negative samples is much larger than that of positive samples, enumerating the negative samples incurs a large computational burden, so we chose the sparse multi-label cross-entropy loss function. The sparse version of multi-label cross-entropy is calculated by substituting the whole sample set for the negative samples, as shown in Eq. (16). Specifically, the whole sample set comprises both positive and negative samples. If the loss were calculated directly on the positive and negative samples, it would require an index-based check for each sample. Calculating across the whole sample set bypasses this indexing phase and allows direct computation on the entire prediction matrix. Therefore, replacing the more numerous negative samples with the whole sample set can significantly reduce the computational burden.
where \(A = T \cup F\), so the loss for the set of negative classes can be calculated from A and T, greatly reducing the amount of computation. Ultimately, based on the requirements of the entity-relation extraction task, the final loss is obtained by computing the loss on each of the three score matrices (entity extraction, head token-to-relation extraction, and tail token-to-relation extraction) and averaging the results, as shown in Eq. (17).
where \(E_a\) is the score matrix for entity extraction, \(E_h\) is the score matrix for head token-to-relation extraction, and \(E_t\) is the score matrix for tail token-to-relation extraction.
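The idea of recovering the negative-class term from A and T can be illustrated with a short sketch. It assumes one common log-sum-exp form of multi-label cross-entropy, which may differ in detail from Eqs. (15) and (16); the function names are hypothetical, and the scores are a flattened score matrix.

```python
import math


def multilabel_ce(scores, positives):
    """Dense form: enumerate positive (T) and negative (F) entries separately."""
    pos = set(positives)
    pos_term = math.log(1.0 + sum(math.exp(-scores[i]) for i in pos))
    neg_term = math.log(1.0 + sum(math.exp(s) for i, s in enumerate(scores)
                                  if i not in pos))  # index test on every entry
    return pos_term + neg_term


def sparse_multilabel_ce(scores, positives):
    """Sparse form: only the positive indices are needed.

    The negative sum is recovered as sum over A minus sum over T, where
    A = T u F is the whole score matrix.
    """
    pos = set(positives)
    pos_term = math.log(1.0 + sum(math.exp(-scores[i]) for i in pos))
    all_sum = sum(math.exp(s) for s in scores)        # over A, no index test
    pos_sum = sum(math.exp(scores[i]) for i in pos)   # over T only
    neg_term = math.log(1.0 + max(all_sum - pos_sum, 0.0))
    return pos_term + neg_term


def joint_loss(entity_loss, head_loss, tail_loss):
    """Average the three task losses, as described for Eq. (17)."""
    return (entity_loss + head_loss + tail_loss) / 3.0


scores = [-2.0, 3.0, -1.0, 0.5]  # flattened toy score matrix
positives = [1]                  # indices of the positive entries (set T)
dense = multilabel_ce(scores, positives)
sparse = sparse_multilabel_ce(scores, positives)
print(abs(dense - sparse) < 1e-9)  # True
```

Both functions return the same value up to floating-point error; the sparse form only needs the list of positive indices as labels, which is much shorter than a dense 0/1 label matrix of size a × n × n.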
Experimental results and analysis
Dataset
We performed experiments on two publicly accessible datasets: NYT and WebNLG. The aim was to assess the efficacy of our MFIE model by comparing it with classical models. Table 1 provides specific dataset details. In Table 2, “Normal” indicates datasets without overlapping triples, “SEO” signifies scenarios with only one shared entity among triples, “EPO” designates datasets where triples share the same entity pair, and “N” represents the number of triples.
Assessment of indicators
For an equitable comparison with prior research, we employ Precision, Recall, and F1 scores as evaluation metrics for our model. The formulas are shown in Eqs. (18, 19, 20).
where TP is the number of entities correctly identified, FP is the number of non-entities incorrectly identified as entities, and FN is the number of entities incorrectly identified as non-entities.
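The three metrics follow the standard definitions and can be computed from the TP, FP, and FN counts as sketched below. The helper `score_triples` and the example triples are made up for illustration, and it assumes that a predicted triple counts as correct only when it matches a gold triple exactly.

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def score_triples(predicted, gold):
    """Exact-match scoring of predicted (s, r, o) triples against gold triples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    return prf1(tp, fp=len(predicted - gold), fn=len(gold - predicted))


# Made-up example: one correct triple, one spurious triple, one missed triple.
gold = {("Obama", "born_in", "Hawaii"), ("Hawaii", "part_of", "USA")}
predicted = {("Obama", "born_in", "Hawaii"), ("Obama", "born_in", "USA")}
print(score_triples(predicted, gold))  # (0.5, 0.5, 0.5)
```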
Experimental setup
In our experiments, we trained the model on an RTX 4090 GPU running on the Windows 10 OS. We employed the BERT-Base-Cased version for our pre-trained model, featuring 12 transformer layers with a hidden layer size of 768 and 12 self-attention heads. The hyper-parameters of the model are specified in Table 3.
Baseline model
We compare our model with the following baseline models:
1. The CasRel model [17] first extracts information about the head entity and subsequently extracts information about the tail entity and entity relations using the head entity information as a criterion.
2. The TPlinker model [19] approaches the joint extraction of entity relations as a token-pair linking problem and incorporates a handshake marking scheme to align entity pairs.
3. The EmRel model [20] introduces a relational representation and exploits rich interactions across relations, entities and contexts to enhance the learning model.
4. The PRGC model [23] presents a joint extraction framework that emphasizes potential relations and global correspondence.
5. The SPN4RE model [24] formulates joint extraction as a set prediction problem.
6. The OneRel model [25] employs a single module and a single-stage approach to extract triplets directly from text.
7. The PMEI model [26] introduces an incremental multi-task learning strategy that leverages early predicted information interactions to enhance representations specific to each task.
8. The PARE model [27] proposes a learning model based on distant supervision.
9. The ERGM model [28] suggests a unified extraction approach founded on a global entity matching strategy, incorporating a relational attention mechanism for embedding relational representations, similar to EmRel.
10. The PRDCEM model [29] utilizes a cross-attention mechanism to detect relational information and obtain relational information embeddings, and a negative sampling strategy to reduce error propagation, thereby improving the model’s relation extraction performance.
11. The TERS model [30] introduces a sequence of relations to connect the two triple extraction steps, filters out irrelevant information, and uses iterations to interact with the information.
Analysis of experimental results
To assess the effectiveness of the MFIE model, we carried out experiments on both the NYT and WebNLG datasets. Figure 6 displays the F1 curve of our model over the first 20 training epochs, with the red curve representing the NYT dataset and the green curve the WebNLG dataset. The curves show that training converges rapidly overall, with the model reaching a fitting state in just 15 epochs. It should be noted that the model fits more slowly on the WebNLG dataset than on the NYT dataset. We attribute this to the fact that the WebNLG dataset has a smaller amount of data but a larger number of relations, which places a higher learning burden on the relation-aware enhancement module in our MFIE model and slows fitting.
Comparison experiment
To evaluate the efficacy of our model, we performed a comparative analysis with the baseline on two separate datasets, as depicted in Table 4.
Table 4 presents a comparison between our model’s experimental outcomes and those of the baseline models on both the NYT and WebNLG datasets. The best results are highlighted in bold, while the second-best ones are underlined. On the NYT dataset, our model achieves the best result, with a 0.7 increase in the final F1 value compared to the runner-up. On the WebNLG dataset, our model performs comparably to the top baseline model and ranks second, improving the F1 value over the other baseline models. The smaller gain on the WebNLG dataset is primarily due to its relatively small size combined with a large number of relations, which constrains the learning capacity of our relation-aware enhancement module. Despite our efforts to strengthen the module’s ability to learn sentence relations by integrating the relation attention mechanism, the improvement remains limited. In contrast, the NYT dataset provides an ample volume of training data and features significantly fewer relations than the WebNLG dataset. Consequently, the relation-aware enhancement module can learn more effectively, leading to better overall performance. Our future research will therefore explore more effective methods for extracting relational information from sentences.
Efficiency analysis
In this section, we compare the efficiency of our model with that of the best-performing OneRel model mentioned above on the NYT dataset, as shown in Table 5. The batch size is uniformly set to 6 and the maximum length to 512, and for convenience we compare the two models at the first epoch. Here, Tt denotes the training time, Dt denotes the validation time, and Sum denotes the memory usage during training.
As can be seen in Table 5, the training time of our model in the first epoch is 55 s, which is 10 s less than that of the OneRel model. The validation time of our model on the validation set is 4 s, which is 3 s less than that of the OneRel model. In terms of memory usage, our model uses 5592 MB less than the OneRel model for the same batch size. These results show that our model is more computationally efficient and more competitive in real-world application environments. We attribute this to the fact that our model decodes entity extraction and relation extraction separately at the decoding end, which reduces the dimensionality of the score matrices to be computed. The OneRel model, on the other hand, although better in terms of effectiveness, uses a single-step decoding method whose score matrix contains both entity and relation components, resulting in a higher dimensionality and a greater computational burden.
Detailed results for complex scenarios
To confirm our model’s capability to handle sentences with overlapping and multiple triples, we conducted extension experiments inspired by CasRel. These experiments were performed using the NYT and WebNLG datasets, and we selected the same four models as baselines. Tables 6 and 7 display the detailed experimental comparisons. Bold text highlights the best results obtained.
The results reveal that our model achieves the highest F1 scores in 13 out of 16 subsets. This demonstrates our model’s distinct advantage in handling cases involving straightforward overlapping and multiple triples.
Ablation experiment
To further affirm the efficiency of our model and assess the influence of the two modules on its performance, we conducted ablation experiments. Table 8 presents the F1 score results from these ablation experiments on the overall model. We conducted comparative experiments by removing modules, and the models in the table are the original model (Ours), the model with the entity information enhancement module (EIE) removed, the model with the relation awareness enhancement module (RAE) removed, and finally the model with both modules removed, only retaining the embedding layer and encoding layer.
From Table 8, it can be observed that both the relation awareness enhancement module and the entity information enhancement module have a significant impact on the model. However, the entity information enhancement module contributes less to the model than the relation awareness enhancement module. We suggest that this is due to error accumulation: although the module facilitates the interaction between subtasks to a certain extent, it also increases the error accumulation of the whole model, which limits its enhancement effect. The improvement from the relation awareness enhancement module is less pronounced on the WebNLG dataset than on the NYT dataset. This observation aligns with our earlier argument that the WebNLG dataset has a larger number of relations but a smaller dataset size, which limits the module’s learning capacity and consequently results in a smaller enhancement effect. In future research, we will explore methods to leverage entity extraction results more effectively to enhance relation extraction, addressing the issue of error accumulation in the entire model. Additionally, we will investigate techniques to enhance the learning ability of the relation-aware module when working with smaller datasets.
To further verify the effect of the relation awareness enhancement module, we first remove the entity information enhancement module from the model, retain the relation awareness enhancement module as the reference model “R”, and conduct additional ablation experiments on the WebNLG dataset. As shown in Fig. 7, “R-w/o attention” represents the model without the attention mechanism, and “R-w/o RAE” represents the model without the entire relation awareness enhancement (RAE) module. Here, s represents the F1 value of head entity extraction, o represents the F1 value of tail entity extraction, (s, o) represents the F1 value of entity extraction, and (s, r, o) represents the F1 value of the entire triple extraction. The results show that the relation-aware enhancement module enhances the entity extraction capability of the model. Furthermore, the attention mechanism in the module further enhances entity extraction and improves sentence relation extraction. As depicted in the figure, the improvement in entity extraction also positively impacts the model’s triple extraction performance.
Impact of different pre-training models
To verify the scalability of our method, we selected four pre-trained models, BERT-small, BERT-base, BERT-large, and RoBERTa-base, and ran experiments on both the NYT and WebNLG datasets. As shown in Table 9, our method achieves an F1 score above 90% on both datasets with the different pre-trained models, which indicates that it scales well across encoders.
Case studies
In this section, we present specific examples from the NYT and WebNLG datasets to analyze our model, as depicted in Fig. 8. In the figure, green marks all the gold triples in the sentence, red marks incorrectly recognized triples, and blue marks correctly recognized triples. We compare the recognition results under four conditions: “Ours”, “w/o EIE”, “w/o RAE”, and “w/o RAE and EIE”.
According to Fig. 8, the “w/o RAE” model lacks the relation awareness enhancement module, so it cannot filter out entities that have no relation in the sentence during entity extraction. As a result, it extracts wrong entities together with wrong relations. For example, in the first sentence, the relation “/people/deceased_person/place_of_death” is not expressed in the original sentence; however, because irrelevant relations cannot be filtered out in advance, the model still extracts this relation and the corresponding entities. The “w/o EIE” model lacks the entity information enhancement module, so it fails to identify the correct entities when extracting relations. For example, in the second sentence, the triple [“Memorial”, “leaderTitle”, “Azerbaijan”] is incorrectly extracted, where “Memorial” is the wrong entity. The “w/o RAE and EIE” model has no interaction between subtasks once both modules are removed, so the errors of the entity extraction subtask accumulate with those of the relation extraction subtask. Ultimately, this model performs poorly on triple extraction and suffers from incomplete recognition. In the first sentence, it recognizes only a single type of relation between an entity pair and struggles to extract multiple relations for the same pair. In the second sentence, without the support of the two modules for their respective subtask layers, it again extracts incompletely and fails to filter out erroneous triples. Overall, the results suggest that the two proposed modules strengthen the interaction between subtasks and improve the model’s triple extraction ability, yielding the best extraction results among the compared variants.
Conclusion
This paper presents a method for joint entity and relation extraction combined with multi-module feature information enhancement (MFIE). First, we use a pre-trained BERT encoder to obtain word embedding vectors from the text. We then incorporate two specialized modules that enhance entity extraction and relation extraction: the relation awareness enhancement module and the entity information enhancement module. The relation awareness enhancement module captures potential relation information from sentences through a potential relation extraction module and an attention mechanism. It then integrates this information with the BERT encoding so that the input to the entity extraction layer includes relation information while containing less irrelevant content. The entity information enhancement module combines the entity extraction results with the BERT encoding via a gating mechanism. This enriches the input to the relation extraction layer with entity information and thereby improves relation extraction performance. In the decoding layer, we employ a global pointer network and sparse multi-label cross-entropy to decode features and train the model, which yields the final triple extraction results. We conducted comparative and ablation experiments on the NYT and WebNLG datasets to validate the effectiveness of our MFIE model.
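To make the gating idea concrete, the following is a minimal sketch of one common form of gated fusion between an encoder feature vector and an entity feature vector. It is an assumption for illustration only: the function `gated_fusion`, the element-wise (diagonal) weights `w_h`, `w_e`, and the bias `b` are hypothetical, and the actual gate used in MFIE may be parameterized differently.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h: Vector, e: Vector, w_h: Vector, w_e: Vector, b: Vector) -> Vector:
    """Fuse encoder features h with entity features e through a gate.

    Assumed form (not taken from the paper), applied element-wise:
        g     = sigmoid(w_h * h + w_e * e + b)
        fused = g * h + (1 - g) * e
    """
    if not (len(h) == len(e) == len(w_h) == len(w_e) == len(b)):
        raise ValueError("all vectors must have the same length")
    fused = []
    for h_i, e_i, wh_i, we_i, b_i in zip(h, e, w_h, w_e, b):
        g = sigmoid(wh_i * h_i + we_i * e_i + b_i)  # gate value in (0, 1)
        fused.append(g * h_i + (1.0 - g) * e_i)     # convex mix of both inputs
    return fused


# Toy inputs (hypothetical): zero weights give g = 0.5, i.e. a plain average.
h = [1.0, -2.0]
e = [0.0, 2.0]
zeros = [0.0, 0.0]
print(gated_fusion(h, e, zeros, zeros, zeros))
```

With a large positive bias the gate saturates near 1 and the output stays close to the encoder features, while a large negative bias lets the entity features dominate. A learned gate would choose this balance per dimension and per token.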
Nonetheless, in scenarios characterized by a higher quantity of relations and fewer available training samples, our model demonstrates limited enhancements compared to the baseline model. In future research, we will delve deeper into optimizing the synergy between entity and relational information, striving for greater efficiency in the model’s capacity to identify intricate entity and relation categories.
Data availability
The datasets analyzed in this study are accessible from [25]. Data can also be obtained directly from the authors upon reasonable request and with their permission.
Notes
Get and preprocess NYT following CasRel [17]: https://github.com/weizhepei/CasRel/tree/master/data/NYT.
Get and preprocess WebNLG following CasRel [17]: https://github.com/weizhepei/CasRel/tree/master/data/WebNLG.
Abbreviations
- MFIE: Joint entity and relation extraction combined with multi-module feature information enhancement
- LSTM: Long short-term memory
- BERT: Bidirectional encoder representations from transformers
- EIE: Entity information enhancement
- RAE: Relation awareness enhancement
- SEO: Single entity overlap
- EPO: Entity pair overlap
References
Liu Q, Li Y, Duan H et al (2016) A survey of knowledge mapping construction techniques. J Comput Res Dev 53(3):582–600
Dwivedi SK, Singh V (2013) Research and reviews in question answering system. Procedia Technol 10:417–424
Guo Q, Zhuang F, Qin C et al (2020) A survey on knowledge graph-based recommender systems. IEEE Trans Knowl Data Eng 34(8):3549–3568
Miwa M, Bansal M (2016) End-to-end relation extraction using LSTMs on sequences and tree structures. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics (volume 1: long papers). Association for Computational Linguistics
Fu T-J, Li P-H, Ma W-Y (2019) Graphrel: modeling text as relational graphs for joint entity and relation extraction. In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics, pp 1409–1418
Yu B, Zhang Z, Shu X et al (2020) Joint extraction of entities and relations based on a novel decomposition strategy. In: Proceedings of the 24th European Conference on Artificial Intelligence. Santiago de Compostela, pp 2282–2289
Yuan Y, Zhou X, Pan S et al (2021) A relation-specific attention network for joint entity and relation extraction. In: International joint conference on artificial intelligence. International Joint Conference on Artificial Intelligence
Zhong Z, Chen D (2021) A frustratingly easy approach for entity and relation extraction. In: 2021 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021. Association for Computational Linguistics (ACL), pp 50–61
Sun C, Gong Y, Wu Y et al (2019) Joint type inference on entities and relations via graph convolutional networks. In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics, pp 1361–1370
Wang J, Lu W (2020) Two are better than one: joint entity and relation extraction with table-sequence encoders. In: Proceedings of the 2020 conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics
Zhong Z, Chen D (2021) A frustratingly easy approach for entity and relation extraction. In: Proceedings of the 2021 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics
Zhu H, Lin Y, Liu Z et al (2019) Graph neural networks with generated parameters for relation extraction. In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics, pp 1331–1339
Xiao M, Liu C (2016) Semantic relation classification via hierarchical recurrent neural network with attention. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: technical papers, pp 1254–1263
Li Z, Sun Y, Tang S et al (2021) Dual attention guided graph convolutional networks for relation extraction. Acta Electron Sin 49(2):315
Katiyar A, Cardie C (2017) Going out on a limb: joint extraction of entity mentions and relations without dependency trees. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (volume 1: long papers), pp 917–928
Bekoulis I, Deleu J, Demeester T et al (2018) Adversarial training for multi-context joint entity and relation extraction. In: EMNLP2018, the conference on Empirical Methods in Natural Language Processing, pp 1–7
Wei Z, Su J, Wang Y et al (2020) A novel cascade binary tagging framework for relational triple extraction. In: Proceedings of the 58th annual meeting of the Association for Computational Linguistics, pp 1476–1488
Zhang M, Zhang Y, Fu G (2017) End-to-end neural relation extraction with global optimization. In: Proceedings of the 2017 conference on empirical methods in natural language processing, pp 1730–1740
Wang Y, Yu B, Zhang Y et al (2020) TPLinker: single-stage joint extraction of entities and relations through token pair linking. In: Proceedings of the 28th International Conference on Computational Linguistics, pp 1572–1582
Xu B, Wang Q, Lyu Y et al (2022) EmRel: joint representation of entities and embedded relations for multi-triple extraction. In: Proceedings of the 2022 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 659–665
Zhang Y, Li J, Xin Y et al (2023) A model for Chinese named entity recognition based on global pointer and adversarial learning. Chin J Electron 32(4):854–867
Sun Y, Cheng C, Zhang Y et al (2020) Circle loss: a unified perspective of pair similarity optimization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6398–6407
Zheng H, Wen R, Chen X et al (2021) PRGC: potential relation and global correspondence based joint relational triple extraction. In: Proceedings of the 59th annual meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (volume 1: long papers), pp 6225–6235
Sui D, Zeng X, Chen Y et al (2023) Joint entity and relation extraction with set prediction networks. IEEE Trans Neural Netw Learn Syst
Shang Y-M, Huang H, Mao X (2022) OneRel: joint entity and relation extraction with one module in one step. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 36, no 10, pp 11285–11293
Sun K, Zhang R, Mensah S et al (2021) Progressive multi-task learning with controlled information flow for joint entity and relation extraction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 35, no 15, pp 13851–13859
Chen T, Zhou L, Wang N et al (2022) Joint entity and relation extraction with position-aware attention and relation embedding. Appl Soft Comput 119:108604
Gao C, Zhang X, Li L et al (2023) ERGM: a multi-stage joint entity and relation extraction with global entity match. Knowl Based Syst 271:110550
Zhou X, Zhang Q, Gao M, Wang G (2023) Joint relational triple extraction based on potential relation detection and conditional entity mapping. Appl Intell 53(24):29656–29676
Zhanjun Z, Haoyu Z, Qian W, Jie L (2023) Entity-relation triple extraction based on relation sequence information. Expert Syst Appl 238:121561
Funding
Funding was provided by The National Natural Science Foundation of China (Grant No. 61173184).
Author information
Contributions
Yao Li: Conceptualization, Methodology, Software, Validation, Data Curation, Writing - Original Draft, Writing - Review and Editing. Yan He: Methodology, Formal analysis, Validation, Data Curation, Writing-Review and Editing. Ye Zhang and Xu Wang: Formal analysis, Validation, Writing - Review and Editing.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethics approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, Y., Yan, H., Zhang, Y. et al. Joint entity and relation extraction combined with multi-module feature information enhancement. Complex Intell. Syst. (2024). https://doi.org/10.1007/s40747-024-01518-9
DOI: https://doi.org/10.1007/s40747-024-01518-9