1 Introduction

Relational triple extraction aims to recognize all entities and the semantic relations between them in unstructured text, and it is widely used in various downstream tasks, such as knowledge graph construction [1] and question answering [2].

Traditional pipeline approaches [3, 4] first identify all entities in a sentence and then classify the relations between each entity pair. Although these methods are flexible, they suffer from error propagation. To address this shortcoming, joint feature-based extraction models [5, 6] have been proposed. However, these methods often rely heavily on external NLP tools and require sophisticated feature engineering.

Recently, joint extraction models based on deep neural networks have attracted considerable interest from researchers. Sun et al. [7] developed a graph convolutional network over an entity-relation bipartite graph, which allows joint inference of entity and relation types. Wei et al. [8] and Ren et al. [9] tackled the triple extraction task in two steps, first identifying the head entities and then detecting multiple tail entities under specific relations. TDEER [10] employed a translation decoding strategy that treats relations as translation operations from head entities to tail entities. RIFRE [11] represented words and relations as nodes on a graph and fused them to obtain a more effective node representation for the relation extraction task. RFBFN [12] transformed triple extraction into a two-stage process that first detects relations and then recognizes the corresponding entities.

Although the above methods achieve promising performance, most of them detect entities and relations sequentially, which may lead to error accumulation. Inspired by the above ideas, we propose a parallel model for jointly extracting entities and relations (PRE-Span), which consists of two mutually independent submodules. Specifically, for a given sentence, our method generates candidate entities and relations by enumerating token sequences based on span length. Then, the Entity Extraction Module and the Relation Detection Module perform entity recognition and relation detection, respectively. Finally, the predictions of the two submodules are filtered to retain only the spans predicted as entities and relations, which are then decoded jointly. However, enumerating token sequences in this way generates a large number of negative samples. To overcome this problem, we randomly discard a portion of the negative samples through downsampling. Extensive experiments are conducted on public datasets (WebNLG*, NYT*, NYT and WebNLG), and the results demonstrate that our method achieves highly competitive performance.

In summary, the main contributions of our work are as follows:

  1. We propose an end-to-end model that transforms the relational triple extraction task into two mutually independent and parallel-executed submodules, which effectively alleviates error accumulation.

  2. Unlike most previous methods, the proposed PRE-Span simultaneously detects both entities and relations in sentences, and the features of the two submodules do not interfere with each other. This method extracts all the triples in a sentence in just one step.

  3. Extensive experiments are conducted on several datasets (WebNLG*, NYT*, NYT and WebNLG), and the results show that our method outperforms previous baselines.

2 Related Work

In recent years, various neural network models based on joint learning have been proposed by researchers. According to the relational triple extraction procedure, related works can be broadly divided into the following three categories [13]: sequence labeling, table filling and text generation.

The first class is sequence labeling, which converts entity recognition and relation classification into a sequence labeling problem. Zheng et al. [14] and Luo et al. [15] proposed sophisticated tagging schemes that allow entities and relations to be tagged simultaneously, without the need to identify them separately. Zheng et al. [16] predicted potential relations and constrained subsequent entity extraction to the predicted relation subset instead of all relations. Wang et al. [17] improved the accuracy of the triple extraction task by integrating a semantic role attention mechanism with position awareness and an attention mechanism based on semantic feature vectors. To effectively leverage correlations between semantic relations, Wang et al. [18] proposed a tensor learning model based on Tucker decomposition, which uses a three-dimensional word relation tensor to depict the relations between words within a sentence. Jiang et al. [19] designed an entity and relation heterogeneous graph attention network comprising word nodes, subject nodes, and relation nodes, which aims to learn and enhance the semantic information between entities and relations.

The second class is table filling, which treats relational triple extraction as a table filling problem. Fu et al. [20] proposed a relation-weighted graph convolutional network to improve relation extraction by accounting for the interaction of information between named entities and relations. Wang et al. [21] designed a one-stage joint extraction model, TPLinker, that converts joint extraction into a token pair linking problem and introduces a novel handshaking tagging scheme to align boundary tokens of entity pairs under each relational type. Wang et al. [22] used a unified classifier to predict the label of each cell so that information between entities and relations could be better learned. Shang et al. [23] treated the joint extraction task as a fine-grained triple classification to tackle the challenge posed by the interdependence and indivisibility of the three components within a triple. Ren et al. [9] proposed a global feature-oriented model for relational triple extraction that enhances the global associations between relations and token pairs. Gao et al. [24] proposed a novel lightweight joint extraction model based on a global entity matching strategy, which uses relation attention to fuse candidate relations into the entity recognition module to identify entities in sentences more accurately. Wang et al. [25] devised a W-shaped DNN (WNet) to capture coarse-level high-order connections, aiming to encompass more comprehensive information than first-order word-by-word interactions.

The third class is text generation, which uses an encoder-decoder framework to generate relational triples. Zeng et al. [26] developed an end-to-end model that generates a relation and its corresponding entities through a copy mechanism, but it is limited to predicting only the last word of an entity. To address this limitation, CopyMTL [27] was proposed, which can effectively identify multi-token entities. TransRel [28] is a unified translation framework that addresses redundant predictions, overlapping triples, and relational connections simultaneously. Huang et al. [29] used an encoder-decoder architecture to decompose relational triple extraction into two subtasks and captured the connection information between them via a partition filter network.

However, most existing methods detect entities and relations sequentially, which may result in error accumulation if the first step produces incorrect predictions. Unlike previous methods, our proposed model consists of two mutually independent submodules that both take the output features of the BERT encoder as input. Therefore, it effectively alleviates the above problem.

3 Proposed Model

In this section, we first define the relational triple extraction task in Sect. 3.1. Next, the generating principle for candidate entities and relations is introduced in Sect. 3.2. Subsequently, the Entity Extraction Module and the Relation Detection Module are described in detail in Sects. 3.3 and 3.4, respectively. To effectively train our model, a joint learning method is introduced in Sect. 3.5. Finally, a specific decoding process is described for two submodules in Sect. 3.6. Figure 1 shows an overview architecture of the proposed model.

Fig. 1 The overall architecture of the end-to-end model

3.1 Task Definition

The purpose of relational triple extraction is to recognize all entities and their corresponding relations in sentences. Given a sentence with \(\textit{n}\) tokens \(X=\left( x_{1},x_{2},...,x_{n}\right) \), our model is designed to detect all possible triples \(T(X)=\left\{ \left( s,r,o\right) |s,o\in \textrm{E},r\in \mathcal {R}\right\} \), where \(\textrm{E}\) is the set of head and tail entities of the triples, and \(\mathcal {R}\) is the set of predefined relation types.

3.2 Constructing Candidate Entities and Relations

We generate candidate entities and relations by enumerating all consecutive token sequences with a span length less than the sentence length. For example, the sentence “The BBC broadcasted Bananaman which starred Bill Oddie” has two triples: (Bananaman, starring, Bill Oddie) and (Bananaman, broadcastedBy, BBC). All candidate entities {“The”, “The BBC”, “The BBC broadcasted”, ..., “Bill Oddie”, “Oddie”} are generated based on the span length, as described in previous works [12, 30]. Candidate relations are also generated by enumerating token sequences, except that no span-length limit is set. It is worth noting that positive and negative samples are separated when enumerating token sequences, and the threshold for candidate entities and relations is set to 100. If the total number of samples exceeds this threshold, \(\textit{N}\) negative samples (\(\textit{N}\) = 100 - number of positive samples) are randomly selected and mixed with the positive samples. Otherwise, all negative samples are mixed with the positive samples. We denote the resulting sets of candidate entities and relations as \(\varepsilon ^{e}\) and \(\varepsilon ^{r}\), respectively.
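To make the construction concrete, the following minimal sketch enumerates candidate spans and downsamples negatives under the stated budget of 100. The maximum entity span length used in the example call is an illustrative assumption, and the gold spans are taken from the example sentence above.

```python
import random

def build_candidates(tokens, gold_spans, max_span_len=None, budget=100):
    """Enumerate consecutive token spans and downsample negative samples.

    gold_spans: set of (start, end) index pairs (inclusive) that are positives.
    max_span_len: span-length limit for entities; None (no limit) for relations.
    """
    n = len(tokens)
    limit = max_span_len if max_span_len is not None else n
    positives, negatives = [], []
    for start in range(n):
        for end in range(start, min(n, start + limit)):
            (positives if (start, end) in gold_spans else negatives).append((start, end))
    # Keep all positives; randomly keep at most (budget - #positives) negatives.
    keep = max(budget - len(positives), 0)
    if len(negatives) > keep:
        negatives = random.sample(negatives, keep)
    return positives + negatives

# Entity candidates for the example sentence (max span length of 4 is illustrative).
tokens = "The BBC broadcasted Bananaman which starred Bill Oddie".split()
gold = {(1, 1), (3, 3), (6, 7)}          # "BBC", "Bananaman", "Bill Oddie"
entity_candidates = build_candidates(tokens, gold, max_span_len=4)
```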

3.3 Entity Extraction Module

To improve entity recognition performance, this module consists of Multi-Head Self-Attention and a Bi-LSTM, which effectively capture contextual representations. Specifically, we use the BERT encoder to obtain the contextual representation of each token. Let \(\textit{S}=\left[ s_{1},s_{2},\dots ,s_{n}\right] \) denote the feature representations of \(\textit{X}\), where \(\textit{S}\in \mathbb {R}^{n\times {d}}\), \(\textit{n}\) is the length of the sentence and \(\textit{d}\) is the embedding dimension. The output \(\textit{S}\) of the BERT encoder is fed into the Multi-Head Self-Attention layer, which projects the hidden representation into different subspaces and learns them individually. The formulas for the Multi-Head Self-Attention layer are presented below:

$$\begin{aligned} head_{l}&=Attention\left( QW_{l}^{q},KW_{l}^{k},VW_{l}^{v} \right) \nonumber \\ \text {Multihead}&=\text {Concat}\left( \text {head}_{1},\text {head}_{2},...,\text {head}_{l}\right) W^{o} \end{aligned}$$
(1)

where \(\textit{Q}\), \(\textit{K}\) and \(\textit{V}\) are derived from the matrix \(\textit{S}\), \(W_{l}^{q}\), \(W_{l}^{k}\), \(W_{l}^{v}\) and \(W^{o}\) are trainable weights, and \(\textit{l}\) is the number of heads in Multi-Head Self-Attention. We denote the outputs as \(M =\left[ m_{1},m_{2},\dots ,m_{n}\right] \), where \(\textit{n}\) is the number of tokens in the sentence. Then, \(\textit{M}\) is fed into a Bi-directional Long Short-Term Memory (Bi-LSTM) network for encoding, which computes the forward hidden state \(\overrightarrow{h_{i}^{e}}\) and the backward hidden state \(\overleftarrow{h_{i}^{e}}\) from the previous hidden state \(h_{i-1}\), the memory cell \(c_{i-1}\) and the current word vector \(m_{i}\). The detailed formulas for the Bi-LSTM are as follows:

$$\begin{aligned} \overrightarrow{h_{i}^{e}}&=\text {LSTM}\left( m_{i},\overrightarrow{h_{i-1}},\overrightarrow{c_{i-1}}\right) \nonumber \\ \overleftarrow{h_{i}^{e}}&=\text {LSTM}\left( m_{i},\overleftarrow{h_{i-1}},\overleftarrow{c_{i-1}}\right) \end{aligned}$$
(2)

Then \(\overrightarrow{h_{i}^{e}}\) and \(\overleftarrow{h_{i}^{e}}\) are concatenated to form the sequence-level representation of \(m_{i}\), denoted as:

$$\begin{aligned} h_{i}^{e}=\left[ \overrightarrow{h_{i}^{e}};\overleftarrow{h_{i}^{e}}\right] \end{aligned}$$
(3)

The final outputs of the Bi-LSTM are \(H^{e}=\left[ h_{1}^{e},h_{2}^{e},\dots ,h_{n}^{e}\right] \) after computing the hidden state for each token in a sentence. Finally, based on the candidate entities \(e_{i}\in \varepsilon ^{e}\) mentioned in Sect. 3.2, the corresponding word vectors from Bi-LSTM are selected and subjected to a max pooling operation before passing through the linear layer for classification. The max pooling and linear classification formulas are as follows:

$$\begin{aligned} e_{i}&=\text {Maxpool}\left( \left[ X_{start\left( i\right) }^{e};X_{end\left( i\right) }^{e}\right] \right) \end{aligned}$$
(4)
$$\begin{aligned} \text {Ent}_{i}&=W_{e}e_{i}+b_{e} \end{aligned}$$
(5)

where \(X_{start\left( i\right) }^{e}\) and \(X_{end\left( i\right) }^{e}\) are the contextual representations of the boundary tokens, \(W_{e}\in \mathbb {R}^{d\times {n_{e}}}\) and \(b_{e}\in \mathbb {R}^{1\times {n_{e}}}\) are trainable weights, \(\textit{d}\) is the word vector dimension and \(n_{e}\) is the size of the tag set.
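A minimal PyTorch sketch of this module is given below. It follows the pipeline described above (Multi-Head Self-Attention, Bi-LSTM, boundary-token max pooling, linear classifier); the head count and hidden sizes are illustrative assumptions, and the max pooling in Eq. (4) is read here as an element-wise max over the two boundary representations.

```python
import torch
import torch.nn as nn

class EntityExtractionModule(nn.Module):
    """Sketch of the Entity Extraction Module: Multi-Head Self-Attention,
    Bi-LSTM, boundary-token max pooling and a linear classifier.
    The head count and hidden sizes are illustrative assumptions."""

    def __init__(self, d_model=768, num_heads=8, num_tags=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.bilstm = nn.LSTM(d_model, d_model // 2, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(d_model, num_tags)        # Eq. (5)

    def forward(self, S, spans):
        # S: BERT output of shape (batch, n, d); spans: list of (start, end)
        # candidate entity spans per batch item, as built in Sect. 3.2.
        M, _ = self.attn(S, S, S)      # Eq. (1): Q, K, V all derived from S
        H, _ = self.bilstm(M)          # Eqs. (2)-(3): concatenated directions
        logits = []
        for b, span_list in enumerate(spans):
            for start, end in span_list:
                # Eq. (4): element-wise max over the boundary representations
                e = torch.max(torch.stack([H[b, start], H[b, end]]), dim=0).values
                logits.append(self.classifier(e))
        return torch.stack(logits)     # one score vector per candidate entity
```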

3.4 Relation Detection Module

Considering the potential mutual influence between the two submodules, the Relation Detection Module combines a Bi-LSTM with a Feed-Forward Network. This combination captures local contextual representations within sentences while avoiding the introduction of additional noise. Specifically, we feed the features extracted by the BERT encoder into the Bi-LSTM to obtain contextual representations. For a vectorized token \(r_{i}\), the final hidden state \(h_{i}^{r}\) is obtained by concatenating the features of the forward LSTM \(\overrightarrow{h_{i}^{r}}\) and the backward one \(\overleftarrow{h_{i}^{r}}\), as follows:

$$\begin{aligned} h_{i}^{r}=\left[ \overrightarrow{h_{i}^{r}};\overleftarrow{h_{i}^{r}}\right] \end{aligned}$$
(6)

Therefore, the final output representation of the Bi-LSTM is denoted as \(H^{r} =\left[ h_{1}^r,h_{2}^r,\dots ,h_{n}^r\right] \), where \(h_{i}^r\) is the hidden state of the i-th token and \(\textit{n}\) is the length of the sentence. Next, a Feed-Forward Network (FFN) is connected after the Bi-LSTM, with ReLU as the activation function. The formula for the FFN is as follows:

$$\begin{aligned} \text {Re}=\text {ReLU}\left( WH^{r}+b\right) \end{aligned}$$
(7)

where \(W\in \mathbb {R}^{d\times {n_r}}\) and \(b\in \mathbb {R}^{1\times {n_r}}\) are trainable weights. Then, the candidate relations constructed in Sect. 3.2 are used to select the corresponding FFN feature representations, which are max-pooled. Finally, a linear layer is used to predict the type of each candidate relation. The detailed formulas are as follows:

$$\begin{aligned} t_i&=\text {Maxpool}\left( \left[ X_{start\left( i\right) }^r;X_{end\left( i\right) }^r\right] \right) \end{aligned}$$
(8)
$$\begin{aligned} Relat_i&=W_rt_i+b_r \end{aligned}$$
(9)

where \(X_{start\left( i\right) }^r\) and \(X_{end\left( i\right) }^r\) are the contextual representations of the boundary tokens, \(W_r\in \mathbb {R}^{d\times {n_r}}\) and \(b_r\in \mathbb {R}^{1\times {n_r}}\) are trainable weights, \(\textit{d}\) is the word vector dimension and \(n_r\) is the number of relation types.
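A corresponding sketch of this module, under the same assumptions about hidden sizes, the dropout rate (mentioned in Sect. 4.3) and the reading of the max pooling in Eq. (8), is shown below; the default relation count is only a placeholder.

```python
import torch
import torch.nn as nn

class RelationDetectionModule(nn.Module):
    """Sketch of the Relation Detection Module: Bi-LSTM, feed-forward layer
    with ReLU (plus the dropout mentioned in Sect. 4.3), boundary-token max
    pooling and a linear classifier. Sizes and dropout rate are illustrative."""

    def __init__(self, d_model=768, num_relations=24, dropout=0.1):
        super().__init__()
        self.bilstm = nn.LSTM(d_model, d_model // 2, batch_first=True,
                              bidirectional=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Dropout(dropout))           # Eq. (7)
        self.classifier = nn.Linear(d_model, num_relations)     # Eq. (9)

    def forward(self, S, spans):
        # S: BERT output of shape (batch, n, d); spans: candidate relation
        # segments per batch item, as built in Sect. 3.2.
        H, _ = self.bilstm(S)                                   # Eq. (6)
        F = self.ffn(H)
        logits = []
        for b, span_list in enumerate(spans):
            for start, end in span_list:
                # Eq. (8): element-wise max over the boundary representations
                t = torch.max(torch.stack([F[b, start], F[b, end]]), dim=0).values
                logits.append(self.classifier(t))
        return torch.stack(logits)     # one score vector per candidate relation
```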

3.5 Joint Training

To enable the two submodules to learn the features of the BERT encoder, different learning rates are set for them. We adopt cross-entropy loss as the loss function for the two submodules, and the total loss can be divided into two parts, as follows:

$$\begin{aligned} \mathcal {L}_{ent}&=-\sum _{i=1}^k \log P\left( y_i^*=\hat{l}^*\right) \end{aligned}$$
(10)
$$\begin{aligned} \mathcal {L}_{rel}&=-\sum _{j=1}^n\log P\left( y_j^*=\hat{t}^*\right) \end{aligned}$$
(11)

where k is the number of entity types, \(\hat{l}^*\) is the ground-truth label of the candidate entity, n is the number of relation types, and \(\hat{t}^*\) is the true tag of the candidate relation. The total loss is the sum of the Entity Extraction Module and Relation Detection Module losses, as follows:

$$\begin{aligned} \mathcal {L} =\mathcal {L}_{ent}+\mathcal {L}_{rel} \end{aligned}$$
(12)
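A minimal sketch of the joint objective (Eqs. (10)-(12)), treating both submodule losses as standard cross-entropy over their candidate spans:

```python
import torch.nn.functional as F

def joint_loss(entity_logits, entity_labels, relation_logits, relation_labels):
    """Total loss as the sum of the two submodules' cross-entropy losses."""
    loss_ent = F.cross_entropy(entity_logits, entity_labels)      # Eq. (10)
    loss_rel = F.cross_entropy(relation_logits, relation_labels)  # Eq. (11)
    return loss_ent + loss_rel                                    # Eq. (12)
```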

3.6 Joint Decoder

For the sentence discussed in Sect. 3.2, the spans (“BBC”, “BBC”), (“Bananaman”, “Bananaman”) and (“Bill”, “Oddie”) are predicted as label 1, which means that “BBC”, “Bananaman” and “Bill Oddie” are entities. For relations, (“Bananaman”, “BBC”) and (“Bananaman”, “Oddie”) are predicted as labels 68 and 52, which means that the relation between “Bananaman” and “BBC” is “broadcastedBy” and the relation between “Bananaman” and “Bill Oddie” is “starring”. To form triples, we first construct entity pairs: < “BBC”, “BBC” >, < “BBC”, “Bananaman” >, ..., < “Bananaman”, “Bill Oddie” > and < “Bill Oddie”, “Bill Oddie” >. Then, the longest segments are identified as “BBC”, “BBC broadcasted Bananaman”, ..., “Bananaman which starred Bill Oddie” and “Bill Oddie”. Finally, if a segment matches a predicted relation, the corresponding entity pair and relation form a relational triple. Based on the above steps, two triples can be decoded: (Bananaman, broadcastedBy, BBC) and (Bananaman, starring, Bill Oddie).
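The following sketch illustrates one simplified reading of this decoding procedure: each predicted relation segment is represented by its two boundary tokens (head side first, an assumption made purely for illustration) and is matched to the predicted entities that contain those boundary tokens.

```python
def decode_triples(entity_spans, relation_segments, tokens):
    """Match predicted relation segments to predicted entities to form triples.

    entity_spans:      (start, end) spans predicted as entities.
    relation_segments: ((head_idx, tail_idx), relation) pairs, where the two
                       indices are the boundary tokens of a segment labelled
                       with a relation; the head-first ordering is assumed.
    """
    def span_text(start, end):
        return " ".join(tokens[start:end + 1])

    triples = []
    for (head_idx, tail_idx), relation in relation_segments:
        head = next((s for s in entity_spans if s[0] <= head_idx <= s[1]), None)
        tail = next((s for s in entity_spans if s[0] <= tail_idx <= s[1]), None)
        if head is not None and tail is not None:
            triples.append((span_text(*head), relation, span_text(*tail)))
    return triples

tokens = "The BBC broadcasted Bananaman which starred Bill Oddie".split()
entities = [(1, 1), (3, 3), (6, 7)]                   # BBC, Bananaman, Bill Oddie
relations = [((3, 1), "broadcastedBy"), ((3, 7), "starring")]
print(decode_triples(entities, relations, tokens))
# [('Bananaman', 'broadcastedBy', 'BBC'), ('Bananaman', 'starring', 'Bill Oddie')]
```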

4 Experiments

4.1 Datasets

For a fair and comprehensive comparison with previous works, we evaluate the performance of our model on the NYT [31] and WebNLG [32] datasets.

  • NYT: The dataset was generated for a distantly supervised relation extraction task by automatically aligning Freebase relational facts with the New York Times (NYT) corpus. It contains 150 business articles from the New York Times, with 56k training sentences and 5k test sentences.

  • WebNLG: It was created for the Natural Language Generation (NLG) task, which contains 5k training sentences and 703 test sentences.

Both of the above datasets have another version in which only the last word of each entity is annotated. Following the convention of previous works [12, 13], we refer to them as NYT* and WebNLG*. The datasets used in our experiments were provided by Zheng et al. [16], and detailed statistics are shown in Table 1.

Table 1 Statistics on WebNLG, NYT, WebNLG* and NYT* datasets

4.2 Metrics

To be consistent with prior works [33,34,35], we use the standard Precision (Prec), Recall (Rec) and F1-score (F1) as evaluation metrics for our model, defined as follows:

$$\begin{aligned} \text {Prec}&=\frac{\text {TP}}{\text {TP}+\text {FP}} \nonumber \\ \text {Rec}&=\frac{\text {TP}}{\text {TP}+\text {FN}} \nonumber \\ \text {F1}&=\frac{2\times \text {Prec}\times \text {Rec}}{\text {Prec}+\text {Rec}} \end{aligned}$$
(13)

where TP is the number of correctly predicted triples, FP is the number of mispredicted triples, and FN is the number of true triples in the corpus that are not predicted.
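For reference, a small helper that computes these micro-averaged scores over per-sentence sets of predicted and gold triples:

```python
def micro_prf(pred_sets, gold_sets):
    """Micro-averaged Precision, Recall and F1 as in Eq. (13).
    pred_sets/gold_sets: per-sentence sets of (head, relation, tail) triples."""
    tp = sum(len(p & g) for p, g in zip(pred_sets, gold_sets))
    fp = sum(len(p - g) for p, g in zip(pred_sets, gold_sets))
    fn = sum(len(g - p) for p, g in zip(pred_sets, gold_sets))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```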

4.3 Experimental Settings

Our model is implemented in PyTorch and uses AdamW to optimize the network weights. We use bert-base-cased [36] as the sentence feature encoder and fine-tune its parameters during training. In addition, dropout is applied in the Relation Detection Module to prevent overfitting. To ensure experimental rigor, all experiments are performed on the same device with an RTX 3060 GPU, an AMD R5-5500 CPU and 24 GB of RAM. The hyperparameter settings of our model are shown in Table 2.
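Since Sect. 3.5 mentions that different learning rates are set for the two submodules, one way to realize this with AdamW is via parameter groups, as sketched below (reusing the module sketches from Sects. 3.3 and 3.4). The learning-rate values shown are placeholders, not the settings in Table 2.

```python
import torch
from transformers import BertModel

bert_encoder = BertModel.from_pretrained("bert-base-cased")
entity_module = EntityExtractionModule(d_model=768, num_tags=2)
relation_module = RelationDetectionModule(d_model=768, num_relations=24)

# Separate learning rates for the fine-tuned encoder and the two submodules
# (the concrete values are illustrative placeholders).
optimizer = torch.optim.AdamW([
    {"params": bert_encoder.parameters(),    "lr": 1e-5},
    {"params": entity_module.parameters(),   "lr": 1e-4},
    {"params": relation_module.parameters(), "lr": 2e-4},
])
```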

Table 2 Hyperparameter settings in our model

4.4 Compared Method

We compare our method with the following baseline models:

  • NovelTagging [14]: The model converted the joint extraction task into a tagging problem, which can extract triples directly without independently identifying entities and their relations.

  • CopyRE [26]: A sequence-to-sequence model with a copy mechanism designed to handle different types of triples.

  • MultiHead [37]: The model first identified all entities in a given sentence and then transformed the relation extraction task into a multi-headed selection problem.

  • GraphRel [20]: The model facilitated the interaction between entities and relations through a relation-weighted GCN for better relation extraction.

  • OrderCopyRE [38]: The method applied reinforcement learning to a sequence-to-sequence model to generate multiple triples in a specific order.

  • ETL-Span [39]: The model decomposed the joint extraction task into two associated subtasks and implemented triple extraction by a hierarchical boundary tagger and a multi-span decoding algorithm.

  • ImGraph [40]: The GCN model based on a relation-aware attention mechanism was designed to connect entities and relations graphically.

  • RSAN [41]: A relation-specific attention network model was proposed to address redundancy in relation prediction.

  • CasRel [42]: To tackle the challenge of triples overlap, a novel cascade binary tagging framework (CASREL) has been proposed, treating relations as functions that map subjects to objects within a sentence.

  • GCN\(^2\)-NAA [43]: A joint entity-relation extraction model that extracts relational triples in stages using Graph Convolutional Networks and a Node-Aware Attention mechanism.

  • CBCapsule [44]: A Cascade Bidirectional Capsule Network model was proposed, which first aggregates context representations dynamically and then uses bidirectional routing mechanisms to facilitate information interaction between entities and relations.

  • RMAN [45]: A joint extraction model of entities and relations, called RMAN, was proposed that encodes sentence representations and decodes sequence annotations through multiple feature fusion.

  • ERHGA [46]: A heterogeneous graph attention network with a gate mechanism that improves model performance by incorporating word nodes, subject nodes, and relation nodes.

4.5 Main Results

The comparative results of our model against the baseline models on all datasets are shown in Table 3. The experimental results demonstrate that our method achieves better Precision, Recall and F1 scores, outperforming almost all other models. Compared to the best baseline, ERHGA, PRE-Span achieves a significant performance improvement on the WebNLG* dataset, with an increase of 1.1% in F1 score. In addition, among all baselines, our method also achieves better performance on the NYT* and NYT datasets. These comparative experiments validate the rationality of the proposed method. We attribute this success to the design of two independent submodules, whose feature representations do not interfere with each other.

We further observe that the performance on the WebNLG dataset is marginally inferior to GCN\(^2\)-NAA. By analyzing the dataset, we find that some sentences contain nested entities. For example, in the sentence “NK University is nicknamed Cornell Big Red”, “Cornell” and “Cornell Big Red” are both entities, and the latter includes the former. The test set contains 85 sentences with nested entities, involving a total of 205 triples. We hypothesize that the performance of our model is affected by nested entities. To test this hypothesis, we remove the sentences with nested entities and re-evaluate the trained model; the F1 score rises significantly to 86.5%. This indicates that although our method has limitations in extracting Subject Object Overlap triples, the other types of relational triples can be effectively identified and extracted by our model without error accumulation. Given these characteristics, we expect our method to be widely applicable in practical scenarios.

Table 3 Results on NYT*, WebNLG*, NYT and WebNLG datasets

4.6 Detailed Results on Complex Scenarios

To further analyze the performance of our model on different overlapping patterns and different numbers of triples, we conduct experiments on the WebNLG* and NYT* datasets.

We evaluate sentences containing triples of the Normal, SEO and EPO types, and the results are shown in Fig. 2. Our model outperforms all baselines on the WebNLG* dataset, improving on Normal, SEO and EPO triples by 3.8%, 3.3% and 7.8%, respectively, compared to ETL-Span. On the NYT* dataset, our method improves by 5.5% on triples of the Normal type, but performs relatively worse on the other types.

Fig. 2 F1 score of different overlapping patterns on sentences

In addition, we explore the performance of the model on sentences containing varying numbers of triples. Based on the number of triples in a sentence, the WebNLG* and NYT* datasets can be divided into five groups: 1, 2, 3, 4 and \(\ge \)5, as shown in Fig. 3. The results show that, with the exception of sentences with four triples in NYT*, our F1 score outperforms all baseline models, which indicates that our method is able to extract multiple triples.

Fig. 3 F1 score for different numbers of triples per sentence

4.7 Efficiency Comparison

To show the training efficiency of the methods more clearly, experiments are conducted on the NYT* and WebNLG* datasets, with comparisons made against baselines, as illustrated in Fig. 4.

It is evident from the figure that ETL-Span and CopyRE, which do not use BERT, require 165 s and 122 s per training epoch on the NYT* dataset and 11 s and 9 s on the WebNLG* dataset, respectively. CasRel and PRE-Span both use the BERT model on the same datasets; in comparison, PRE-Span takes only 814 s and 62 s. This indicates that, owing to the parallelizability of its two submodules, PRE-Span requires less time during the training phase.

Fig. 4 Training time per epoch on the NYT* and WebNLG* datasets

4.8 Results on Different SubModules

To further investigate the detection capability of each component in our model, we conduct more detailed evaluations on the NYT* and WebNLG* datasets. Table 4 shows the results in terms of Precision, Recall and F1. The Entity Extraction Module demonstrates a strong identification capability, with all evaluation indices above 96%, indicating that this submodule effectively recognizes entities within sentences. The Relation Detection Module performs well on the WebNLG* dataset, but has a lower F1 score of 88.1% on the NYT* dataset.

We also observe that the Entity Extraction Module outperforms the other submodule across all evaluation indices. We believe this is because the module focuses solely on identifying entities in sentences, regardless of their type, whereas the Relation Detection Module must not only accurately identify the start and end positions of candidate relations but also determine their types.

Table 4 Results from different submodules of the WebNLG* and NYT* datasets

4.9 Ablation Experiment

To gain insight into the impact of individual submodules on model performance, we conduct an ablation study on the WebNLG* dataset, and the results are shown in Table 5. When the Entity Extraction Module is removed, the F1 score decreases by 0.6%, indicating that the module contributes to improved model performance. Removing both submodules simultaneously results in a 0.9% decrease in F1 score, suggesting that each submodule contributes to model performance. In addition to probing these components, we study the impact of dropout in the Relation Detection Module. The F1 score drops by 0.4% when dropout is removed, suggesting that the module is prone to overfitting when neurons are not randomly dropped during training. We also study the impact of freezing the parameters of the BERT encoder on the downstream task. The result shows that all evaluation indices decrease, with Precision particularly affected, declining by 2.7%.

Table 5 Performance of different ablations on the WebNLG* dataset. A bold mark indicates the highest score

5 Conclusion and Future Work

In this paper, we propose an end-to-end parallel model comprising two mutually independent modules that can detect both entities and relations in sentences, allowing relational triple extraction in a single step. To verify the validity of our model, extensive experiments are conducted on WebNLG*, NYT*, NYT and WebNLG datasets, and the results show that our method outperforms other baselines. Furthermore, we also explore the impact of submodules on the model, and the ablation study shows that they all contribute to model performance.

However, our model has some limitations. On the one hand, if a sentence contains nested entities, such as “Cornell” and “Cornell Big Red”, where “Cornell Big Red” encompasses “Cornell”, redundant or erroneous triples may be generated during decoding. On the other hand, the Relation Detection Module encounters challenges in accurately determining the types of candidate relations. We argue that when a segment contains a large number of tokens, this submodule may introduce additional noise, potentially leading to inaccurate detection results.

In future work, we plan to explore other techniques to address the problem of nested entities in relational triples, and implement more advanced approaches to mitigate the challenge of long-distance dependencies in a relation extraction task.