NEDORT: a novel and efficient approach to the data overlap problem in relational triples

Relation triple extraction combines named entity recognition and relation prediction. Early works ignore the problem of data overlap when extracting triples, resulting in poor extraction performance. Subsequent works improve the capability of models to extract overlapping triples through generative and extractive methods. These works achieve considerable performance but still suffer from defects such as poor extraction of individual triples and an inappropriate spatial distribution of the data. To solve these problems, we perform a sequence-to-matrix transformation and propose the NEDORT model. NEDORT first predicts all subjects in the sentence and then extracts the corresponding relation–object pairs. Because relation–object pairs may overlap, we convert the sequence into a matrix. We design the Differential Amplified Multi-head Attention method to extract subjects; it highlights the locations of entities and captures sequence features from multiple dimensions. When extracting relation–object pairs, we fuse subject and sequence information through the Biaffine method and generate relation–sequence matrices. In addition, we design a multi-layer U-Net network to optimize the matrix representation and improve extraction performance. Experimental results on two public datasets show that our model outperforms other baseline models on triples of all categories.


Introduction
Relation triple extraction is a combination of named entity recognition [1,9,22,27] and relation prediction [13,18,35]. Previous works perform these two tasks separately [24,30], which is inconvenient and ineffective. Subsequently, joint extraction methods were proposed, bringing a breakthrough for the extraction of triples.
Early works on relation triple extraction first extract all entities in a sentence and then predict the relations [14,20,50]. With the development of extraction methods [8,28,36], the performance of such models has been further improved. Although these models can extract relational triples, they fail to deal with the problem of data overlap. As shown in Fig. 1, relational triples are divided into three categories: Normal, SEO (SingleEntityOverlap), and EPO (EntityPairOverlap). The Normal category contains triples without overlapping patterns, while triples of the other two categories contain one and two overlapping entities, respectively. In addition, the SEO and EPO overlapping patterns can appear simultaneously in a sentence, which brings significant challenges to the extraction of relational triples. The above models do not consider overlapping data and perform poorly on triples of the SEO and EPO categories. [47] points out the problem of data overlap for the first time and classifies relational triples according to their overlapping patterns. To deal with overlapping data, [47] proposes a generative model, which extracts relational triples through an encoding–decoding structure. Subsequent works optimize on this basis and propose various generative methods [10,25,45]. Generative methods generate the entire triple through an end-to-end framework, and performance can be improved by optimizing the encoder and decoder [7,10,46]. Generative models obtain the entire triple directly without explicitly handling the data overlap problem; however, they rely on the extraordinary inference capability of the model, which limits the extraction effect. Extractive methods can also deal with overlapping data through ingenious framework design. These works divide the triple into several parts and complete the extraction through multiple steps [16,44]. This task division can deal with overlapping data more effectively. Extractive methods transform triple extraction into a classification problem, which is easier to implement. Recently, CGT [42] and GraphJoint [40] perform well on the extraction of relational triples. CGT improves the quality of generated triples by constructing negative samples and introducing a dynamic masking mechanism. GraphJoint proposes a relation-based triple extraction model, which predicts all relations in the sentence with a graph neural network and then performs entity recognition on this basis. Although the above works achieve good performance in dealing with the problem of data overlap, some defects remain. The encoder–decoder framework is designed to handle overlapping data, but it does not perform well in extracting individual triples. Moreover, the above methods extract triples in one-dimensional space, which can only improve the inference capability of the model for overlapping data but cannot change the spatial distribution of the data to eliminate the problem of data overlap fundamentally.
To solve the above problems, we perform a transformation from sequence to matrix (two-dimensional space) and propose the NEDORT model. NEDORT divides the extraction of relational triples into two steps: the first step extracts all subjects in the sentence, and the second step predicts the relation–object pairs corresponding to each subject. Overlapping data arise from expressing data of a multidimensional space in one dimension. The first step does not involve the overlapping phenomenon, so extraction can be performed directly on the sequence. Numerous overlapping cases occur among relation–object pairs, so we convert the sequence to a matrix for extraction. There are no overlapping relation–object pairs in the matrix, so the problem of data overlap can be fundamentally eliminated. Moreover, both extraction steps perform binary classification tasks. When extracting subjects, this paper proposes the Differential Amplified Multi-head Attention method to highlight the locations of entities. The relation–object pairs to be extracted are subject-related, so we incorporate the relevant information of the subject into the subsequent extraction. After fetching the start and end positions of the subject, we generate its comprehensive representation through self-attention and apply it to the extraction of relation–object pairs. In this paper, the problem of data overlap is solved by matrix representation, so we design the Biaffine method to complete the conversion from sequence to matrix. This method fuses subject and sequence information and generates a relation–sequence matrix without overlapping data. Traditional sequence-based methods (LSTM, Transformer, etc.)
are powerless to deal with matrix problems, so we design the U-Net network to optimize matrix features. U-Net combines up-sampling and down-sampling blocks with information interaction between different layers, which gives it significant advantages in extracting matrix features. Finally, we combine the extraction results of the two steps to construct relational triples.
We conduct experiments on two public datasets, WebNLG and NYT. Experimental results show that our proposed NEDORT model outperforms the previous state-of-the-art models in the extraction of triples. Further analysis demonstrates that our model has advantages over other models in dealing with the problem of data overlap.
The main contributions of this paper are as follows:
1. We propose the sequence-to-matrix transformation method to eliminate the problem of data overlap.
2. We propose the Differential Amplified Multi-head Attention method to highlight the position of the subject, which is more conducive to the prediction of the start and end indices.
3. We propose the Biaffine method to generate a relation–sequence matrix and optimize it through the U-Net network. The optimized matrix contains richer entity and relation features.
To show the paper organization more clearly, we summarize the contents of each section [23]. "Introduction" summarizes the motivation and contributions of this paper. "Related work" describes related works on this task. "Methods" introduces the implementation details of the proposed methods. "Experiments" presents the experimental results and conducts comparative analysis. "Conclusion" summarizes the work of this paper.

Related work
Relation triple extraction is a combination of named entity recognition [15,31] and relation prediction [5,30,43]. Early works extract all entities in a sentence and then predict the relations between them [2,20,32,48]. This effectively performs named entity recognition and relation prediction separately, but the two parts are jointly trained and share the underlying modules [26,50]. To enhance the link between entity recognition and relation prediction, models based on table filling have been proposed [12,21]. These models construct a connection table between tokens and define new labels to identify the relationships between tokens [19,37,38]. Constructing reasonable labels and improving the inference capability of the model are the keys to improving performance. Span-based methods perform well on named entity recognition, so some works extract relational triples based on spans [8,17,36]. To obtain better extraction performance, [52] adds new annotations on the basis of spans; the new entity-related annotations are only used when predicting relations. Span-based methods have significant advantages in extracting triples. However, these methods need to traverse and classify all spans in the sequence, which is computationally intensive. Despite the excellent performance of the above methods, the data overlap problem is not considered when performing relation triple extraction. Overlapping triples are abundant in this task and become an important factor hindering extraction performance. These models perform poorly when dealing with sentences containing overlapping data.
Subsequent works discover this phenomenon and focus on dealing with overlapping data. [47] points out the problem of data overlap for the first time and divides triples into three categories. Extracting triples of the SEO and EPO categories is the key to solving the problem. Current works are mainly divided into two types: generative and extractive. Generative works produce relational triples through an encoder–decoder framework; better extraction performance can be achieved by optimizing the encoder and decoder. [10] constructs a two-stage end-to-end model through Bi-GCN, which can capture the implicit features between word pairs. [45] improves the quality of the generated entities through multi-task learning. [7] optimizes the model for information interaction between relation categories. To achieve a better training effect, [42] constructs negative samples by random sampling and proposes a dynamic masking mechanism. The generation order also has an impact on model performance; therefore, [46] determines the generation order of triples through reinforcement learning. The entities generated by the above models are all represented by words. However, [25] adopts the idea of spans and generates the start and end indexes corresponding to the entities. Although generative models can solve the problem of data overlap, their implementation is relatively complicated. In comparison, extractive models are easier to implement. [16] learns the connections between relation categories through a multi-head attention mechanism. [39] regards the relation between entities as a function mapping.
[33] extracts entities and relations simultaneously and re-labels multiple relation labels between entities to solve the problem of data overlap. [49] introduces a cascading capsule network to aggregate context representations and proposes a two-way routing mechanism to encourage interactions between relations and entities. [44] first extracts the head entity and then predicts the relation corresponding to each boundary of the tail entity. Different from [44], [40] predicts all relations in the sentence and then extracts the entity pairs corresponding to a particular relation. Extractive models perform well in dealing with the problem of data overlap. This paper conducts a transformation from sequence to matrix and proposes an extractive model that does not need to deal with overlapping data.

Methods
This section presents the specifics of relation triple extraction. First, we provide a task definition to summarize the entire extraction process. Then we introduce the extraction method employed in each step. An overview illustration of NEDORT is shown in Fig. 2.

Task definition
The overlapping phenomenon is abundant in relational triples. To eliminate the interference of the data overlap problem, this paper decomposes the entire extraction task into two subtasks and converts the second subtask from sequence to matrix. We define the decomposition of the extraction task by the following formula:

p((S, R, O) | C) = p(S | C) · p((R, O) | C, S),  S ∈ Iter(Ss), R ∈ Iter(Rs), O ∈ Iter(Os)

where C is the input context sequence, Rs denotes all the relations to be extracted and R is one of them (Ss and Os are defined analogously for subjects and objects), and Iter denotes iterating over all elements in a collection. The above formula defines the whole extraction process: we first extract all subjects contained in the context and then predict the relation–object pairs corresponding to each subject. Due to the SEO and EPO phenomena, a subject may correspond to multiple relation–object pairs. Moreover, there may be more than one object for a given subject and relation. Therefore, the extraction of relation–object pairs is essentially a multi-classification performed on the matrix.
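The two-step decomposition above can be sketched in plain Python. This is a minimal sketch: the function names `extract_subjects` and `extract_relation_object_pairs` are illustrative placeholders, not the paper's actual implementation.

```python
# Sketch of NEDORT's two-step task decomposition: step 1 finds all subjects
# in the sequence; step 2 predicts relation-object pairs for each subject.
def extract_triples(context, extract_subjects, extract_relation_object_pairs):
    triples = []
    for subject in extract_subjects(context):            # step 1 (sequence level)
        for relation, obj in extract_relation_object_pairs(context, subject):
            triples.append((subject, relation, obj))     # step 2 (matrix level)
    return triples
```

Because step 2 runs once per subject, a subject shared by several triples (SEO) or an entity pair carrying several relations (EPO) is handled without any collision in either step.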

NEDORT encoder
C = [t_1, t_2, ..., t_n] is the input sentence sequence, and n is the number of tokens. To capture the semantic features of the sequence, we use pre-trained BERT [6] to encode the input sentence:

S_e = BERT(C) ∈ R^(n×d)

where S_e is the sentence representation after encoding, n is the length of the sentence, and d is the embedding dimension.

Subject extraction
This section aims to extract all subjects in the sentence.
To obtain better extraction performance, we design the Differential Amplified Multi-head Attention method. This method combines data features and optimizes on the basis of Multi-head Self-attention [34]. The Multi-head Self-attention method generates the sequence representation in one dimension by the formula softmax(QK^T / √d)V, where Q = K = V and √d is a scaling constant derived from the hyper-parameters of the model. According to the properties of the softmax function, multiplying the input data makes the large values in the output more prominent. The purpose of this section is to predict the start and end positions of each entity, and prominent output values at these two positions are beneficial to the prediction of subjects [41]. Therefore, we perform attention without the scaling factor √d. Context semantics are crucial for the extraction of subjects, so we use a BiLSTM to obtain representations that incorporate sentence context. Different from the Multi-head Self-attention method, we set Q, K, and V to different values to fuse more sentence information. The specific implementation of the Differential Amplified Multi-head Attention method is as follows:

head^(i) = softmax(Q_i K_i^T) V_i
S_m = [head^(1); ...; head^(k)] W_o    (5)

where Q_i, K_i, and V_i are different projections of the sentence representation generated in Sect. "NEDORT encoder", k is the number of heads, W_o is a trainable weight, and [•; •] denotes vector concatenation.
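As a rough NumPy illustration of the attention described above, the sketch below omits the 1/√d scaling so that large logits (entity positions) stay prominent after softmax, and uses distinct Q, K, and V inputs. The projection layout (column-sliced head projections) is a simplifying assumption, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def amplified_attention(Q, K, V):
    # Differential amplification: no 1/sqrt(d) scaling, so large attention
    # logits become more prominent after softmax.
    return softmax(Q @ K.T) @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo, k):
    # Q, K, V are distinct (e.g. BiLSTM-enriched) sequence representations;
    # heads are concatenated and mixed: S_m = [head_1; ...; head_k] Wo.
    d = Wq.shape[1] // k
    heads = []
    for i in range(k):
        sl = slice(i * d, (i + 1) * d)
        heads.append(amplified_attention(Q @ Wq[:, sl], K @ Wk[:, sl], V @ Wv[:, sl]))
    return np.concatenate(heads, axis=-1) @ Wo
```

The output has one row per token, ready for the start/end position classifiers in the next step.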
We have obtained a sentence representation that incorporates rich semantic features. Then we complete the extraction of subjects by predicting the start and end positions:

Sub_s = σ(W^s_sub S_m + b^s_sub)
Sub_e = σ(W^e_sub S_m + b^e_sub)

where W^s_sub, W^e_sub, b^s_sub, and b^e_sub are trainable weights and σ is the activation function (ReLU here). A sentence may contain multiple subjects, so we obtain multiple start positions and multiple end positions.
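A minimal sketch of this per-token position prediction, assuming (as the paper states) a ReLU activation and a 0.5 decision threshold; the weight shapes are illustrative:

```python
import numpy as np

def predict_positions(S_m, W, b, threshold=0.5):
    # One score per token; indices whose score exceeds the threshold are
    # predicted start (or end) positions, so several subjects can be found.
    scores = np.maximum(S_m @ W + b, 0.0).squeeze(-1)   # ReLU activation
    return [i for i, s in enumerate(scores) if s > threshold]
```

The same routine is applied twice, once with the start-position weights and once with the end-position weights.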

Relation-object pair extraction
This section completes the extraction of relation–object pairs. To improve extraction performance, we introduce subject information at this stage. S = [w_1, w_2, ..., w_n] is the extracted subject sequence, and w_i is one of its tokens. Each token contained in S contributes differently to the global information, so we generate the ensemble representation of the subject by the following formula:

S_en = MeanPool(σ(S W_e1) ⊙ σ(S W_e2))

where W_e1 and W_e2 are trainable parameters, σ is the activation function (ReLU here), ⊙ denotes element-wise multiplication, and MeanPool denotes mean pooling. The problem of data overlap is abundant in relation–object pairs. To eliminate the interference of overlapping data, we perform the sequence-to-matrix transformation. The relation–object pair to be extracted is related to the subject, so subject-related information should be included in the generated matrix. This section employs the Biaffine method to fuse subject features and generate the relation–sequence matrix. As shown in Fig. 3, the labels 1 to 7 correspond to seven matrices M_1, M_2, M_3, M_4, M_5, M_6, and M_7, respectively. M_1 = S_en is the subject representation generated earlier. M_5 = S_e is the encoded sentence representation generated in Sect. "NEDORT encoder". We then fuse the information of these two matrices to construct the relation–sequence matrix. To complete the sequence-to-matrix transformation and introduce subject features, we expand M_1 to generate the relation matrix M_2.
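As a rough NumPy sketch of this Biaffine fusion, the snippet below expands the pooled subject vector across relations, applies a trainable transform standing in for the randomly initialized matrix M_3, and fuses the result with the transposed sentence representation. The tensor shapes are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def biaffine_matrix(s_en, S_e, M3):
    """Sketch of the Biaffine sequence-to-matrix step.

    s_en : (d,)      pooled subject representation (M_1)
    S_e  : (n, d)    encoded sentence representation (M_5)
    M3   : (r, d, d) trainable, randomly initialized relation transforms
    Returns an (r, n) relation-sequence matrix (M_7).
    """
    r = M3.shape[0]
    M2 = np.tile(s_en, (r, 1))               # expand the subject across relations
    M4 = np.einsum('rd,rde->re', M2, M3)     # optimize the repeated rows (M_4 = M_2 M_3)
    return M4 @ S_e.T                        # fuse with the sequence (M_6 = M_5^T)
```

Each row of the result scores one relation against every token of the sequence, so overlapping relation–object pairs occupy distinct cells instead of colliding in one dimension.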
Fig. 3 Details of the Biaffine method. There are seven matrices in total. Matrix 1 is the obtained subject representation. Matrix 5 is the vector representation corresponding to the input sequence. Matrix 7 is the generated relation–sequence matrix.

There are numerous repeated expressions in the expanded matrix, so we design the matrix M_3 to optimize it. M_3 is generated by random initialization and can be dynamically adjusted during training. The matrix generated by this dynamic adjustment strategy is more conducive to the extraction of relational triples. M_4 = M_2 M_3 is the matrix expression after optimization (matrix multiplication in this section includes dimension expansion operations). The matrix to be generated is related to both relations and the sequence. Therefore, we combine the relation-related matrix M_4 and the sequence-related matrix M_5 to generate the final relation–sequence matrix M_7, computed as M_7 = M_4 M_6, where M_6 = (M_5)^T. The whole process above can be expressed by

M_bia = Biaffine(S_en, S_e)

where S_en is the subject representation, S_e is the sequence representation, and M_bia is the generated matrix representation. The above formula completes the conversion from sequence to matrix and incorporates the subject features. The extraction of relation–object pairs is more complicated than that of subjects. Although the design of M_3 improves the inference capability of the model, it is difficult for the generated matrix alone to accurately capture the relation–object pairs corresponding to the subject. To optimize the matrix representation, we design the U-Net network. U-Net
is a multi-layer convolutional neural network that can extract the deep features of the matrix. As shown in Fig. 4, the U-Net network consists of seven parts. Part 1 is the relation–sequence matrix obtained by the Biaffine method. Each of the remaining parts is composed of a two-layer convolutional neural network, expressed as follows:

M_out = σ(Conv(σ(Conv(M_in))))

where Conv is the convolution operation (all parts use 3 × 3 convolution kernels), σ is the activation function (ReLU here), M_in is the input matrix, and M_out is the output matrix; the input and output of each part are different. Figure 4 annotates the matrix dimension of the output for each step. Parts 2, 3, and 4 are down-sampling blocks whose output channels constantly increase. More channels enlarge the receptive field of the matrix embedding, thus providing rich global information between relations and sequences. Parts 5, 6, and 7 are up-sampling blocks, and their output channels trend downward. Reducing the number of extracted features makes the model pay more attention to the token span corresponding to the object rather than the entire sequence. Moreover, there are information interactions between the down-sampling blocks and the up-sampling blocks. The combined matrix representation considers its contextual semantics while focusing on the object.

Fig. 4 The structure of the U-Net network. Part 1 is the input matrix. The remaining parts are convolution blocks with different numbers of channels. "cat" represents the information interaction between two blocks. The content in the lower right corner is the matrix dimension of the output.

Different from feature extraction in images, the obtained output matrix must have the same dimensions as the input matrix. Therefore, we design two Maxpooling layers and two Deconv layers in the down-sampling and up-sampling stages, respectively. The entire process can be expressed by

M_u = UNet(M_bia)

where M_bia is the relation–sequence matrix generated by the Biaffine method and M_u is the matrix representation after optimization.
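The down-sample/up-sample/skip pattern can be illustrated with a toy NumPy pass. This sketch keeps only the shape-changing operations: 2 × 2 max pooling for down-sampling, nearest-neighbour upsampling standing in for the Deconv layers, and channel concatenation for the "cat" skip connection; the 3 × 3 convolutions are omitted for brevity.

```python
import numpy as np

def max_pool2(M):
    # 2x2 max pooling over a (channels, height, width) array (even dims assumed)
    c, h, w = M.shape
    return M.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2(M):
    # nearest-neighbour upsampling standing in for a Deconv layer
    return M.repeat(2, axis=1).repeat(2, axis=2)

def unet_sketch(M):
    # One down block, one up block, one skip connection ("cat"). The output
    # has the same spatial dimensions as the input, as the text requires.
    skip = M
    down = max_pool2(M)                      # down-sampling stage
    up = upsample2(down)                     # up-sampling stage
    return np.concatenate([up, skip], axis=0)
```

Pooling then upsampling restores the original height and width, so the skip connection can be concatenated channel-wise, mixing global context with local detail.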
The optimized relation–sequence matrix has been obtained, and we then extract relation–object pairs on this basis. We complete the extraction by predicting the start and end positions of the object corresponding to each relation:

Ob_s = σ(W^s2_o M_u + b^s2_o)
Ob_e = σ(W^e2_o M_u + b^e2_o)

where W^s2_o, W^e2_o, b^s2_o, and b^e2_o are trainable weights and σ is the activation function (ReLU here). The object corresponding to a specific relation may be None, indicating that no relation–object pair containing this relation exists.

Relational triple generation
The previous sections perform the prediction of the start and end positions. This section combines these predicted values to generate relational triples. We set a threshold and select the indices whose corresponding values are higher than the threshold as start or end positions. For a specific input sentence, our model can obtain multiple start positions and multiple end positions. We match each start position with the first end position at or after it (the start and end positions can share an index).
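The matching rule above can be sketched in a few lines of Python (a simplified illustration of the described heuristic, not the paper's code):

```python
def match_spans(starts, ends):
    # Pair each start position with the first end position at or after it;
    # start and end may share an index, covering single-token entities.
    spans = []
    for s in starts:
        candidates = [e for e in ends if e >= s]
        if candidates:
            spans.append((s, min(candidates)))
    return spans
```

A start with no end position at or after it yields no span, so spurious start predictions are silently dropped.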
In Fig. 5, we set the values above the threshold to 1 and the rest to 0. S = [T_0, T_1, ..., T_7] is the input sentence sequence, and two subjects, sub_0 and sub_1, are extracted by the model. Then we predict the relation–object pairs for each subject. Taking sub_1 as an example, the model extracts two objects: obj_0 = {T_6, T_7} and obj_1 = {T_1}. Combining the corresponding relations, three relation–object pairs are obtained: p_0 = (rel_0, obj_0), p_1 = (rel_1, obj_0), p_2 = (rel_n, obj_1). The relational triples containing sub_1 are then t_0 = (sub_1, rel_0, obj_0), t_1 = (sub_1, rel_1, obj_0), and t_2 = (sub_1, rel_n, obj_1). The same entity pair in t_0 and t_1 corresponds to different relations, which belongs to the EPO overlapping pattern. obj_1 and sub_0 are the same entity and t_2 contains obj_1, so t_2 and all triples containing sub_0 share an entity, which belongs to the SEO overlapping pattern. As shown in Fig. 5, our model performs perfectly when dealing with the above overlapping data. This paper conducts the sequence-to-matrix conversion when extracting relation–object pairs, and the transformed expression avoids the interference of the data overlap problem.

Fig. 5 Consecutive tokens marked in red are entities extracted from the sequence. "sub", "Rel", and "obj" denote subject, relation, and object, respectively.

Loss function
Our model outputs multiple prediction results, so the training loss consists of multiple parts. Each part is a binary cross-entropy loss; for example, the subject start-position loss is

L^s_sub = − Σ_i [ y_s,i log Sub_s,i + (1 − y_s,i) log(1 − Sub_s,i) ]

and L^e_sub, L^s_ob, and L^e_ob are defined analogously, where y_s, y_e, ȳ_s, and ȳ_e are the ground-truth labels corresponding to Sub_s, Sub_e, Ob_s, and Ob_e, respectively. We combine the above four parts to calculate the total loss:

L = α L^s_sub + β L^e_sub + λ L^s_ob + μ L^e_ob

where α, β, λ, μ ∈ [0, 1] are hyper-parameters used to control the contribution of each part.
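The weighted combination of the four binary cross-entropy terms can be sketched as follows (a NumPy illustration; the actual model computes these losses over batched PyTorch tensors):

```python
import numpy as np

def bce(pred, gold, eps=1e-9):
    # Binary cross-entropy; predictions are clipped away from 0/1 for stability.
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(gold * np.log(pred) + (1 - gold) * np.log(1 - pred))

def total_loss(preds, golds, alpha, beta, lam, mu):
    # Four terms: subject start/end and object start/end positions,
    # weighted by the hyper-parameters alpha, beta, lambda, mu.
    sub_s, sub_e, ob_s, ob_e = (bce(p, g) for p, g in zip(preds, golds))
    return alpha * sub_s + beta * sub_e + lam * ob_s + mu * ob_e
```

Setting a weight toward 0 de-emphasizes that prediction head during training, which is how the hyper-parameters control each part's contribution.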

Experiments
This section first elaborates the details of the experiment and then presents and analyzes the experimental results.

Datasets
We evaluate our model on two public datasets, WebNLG [11] and NYT [29]. The WebNLG dataset was originally created for the natural language generation task, and the NYT dataset was initially built for distant supervision; both were adapted for relation triple extraction by [47]. According to the overlapping pattern, we divide the test set into three categories: Normal, SEO (SingleEntityOverlap), and EPO (EntityPairOverlap). In addition, we divide the test set into five subsets according to the number of triples in a sentence to verify the capability of the model in dealing with multiple triples. The statistics of the two datasets are described in Table 1.

Implementation details
We implement our model with the PyTorch library and set the hyper-parameters as follows. The batch size on WebNLG and NYT is 6 and 10, respectively; all other hyper-parameters are the same on both datasets. We encode the input sentence through BERT with its default parameters. The BiLSTM we employ has only one layer, and the designed Differential Amplified Multi-head Attention method contains 8 heads. We set the output dimension of Biaffine to 256 and the hidden size to 768. The input and output channels of U-Net are both 256, with a 3 × 3 convolution kernel and a 2 × 2 pooling kernel.
The thresholds for predicting the start and end positions of entities are both set to 0.5. All activation functions used in this paper are ReLU. We optimize the model with SGD and an initial learning rate of 0.1.

Baselines
We consider the following strong baselines for comparison. NovelTagging [51] constructs a new annotation method to complete the extraction of relational triples. CopyR [47] designs an encoder–decoder model with three decoders. GraphRel [10] proposes an end-to-end extraction model based on graph convolutional networks. CopyR_RL [46] determines the extraction order of triples by reinforcement learning. Att-as-Rel [16] designs a supervised multi-head self-attention module to learn the correlation of each relation category. RIN [32] proposes a multi-task learning model, which effectively extracts task-specific features through dynamic interaction. CasRel [39] regards relations as functional mappings between entities. ETL-Span [44] first extracts the head entity and then predicts the relation corresponding to each boundary of the tail entity. WDec [25] also proposes an encoder–decoder model; different from CopyR, it decodes the boundary positions of entities instead of token spans. Similar to the previous encoder–decoder models, CopyMTL, MA-DCGCN, and CGT_UniLM all extract relational triples through generative methods. CopyMTL [45] proposes a generative model based on multi-task learning. MA-DCGCN [7] optimizes the model for information interaction between relation categories. CGT_UniLM [42] proposes a novel triplet contrastive training objective and designs a dynamic masking mechanism to improve the quality of generated triples. GraphJoint [40] proposes a relation-based two-step extractive model: the first step predicts the relations in the sentence through a graph neural network, and the second step uses the obtained relations to extract entities.

(Table note for Tables 2 and 3: bold text indicates the best results; § marks results reported by [47]; ‡ marks results quoted directly from the original papers; the remaining marked results are produced with the official implementation.)

Results and discussions
We conduct experiments on the two public datasets WebNLG and NYT. For a fair comparison, we present the scores of all models on Precision (Prec.), Recall (Rec.), and F1. F1 combines Precision and Recall and better reflects the overall performance of the model [3,4]. Tables 2 and 3 show the comparison between our proposed NEDORT model and the other baseline models. The NEDORT model achieves the best results on both datasets, with F1 scores exceeding 90%. CasRel and CGT_UniLM obtain the highest Precision scores, but their poor Recall results in relatively low F1 scores. In contrast, our model performs in a relatively balanced way on the three indicators, and its comprehensive performance is more prominent.
CGT_UniLM is a generative model, and GraphJoint is an extractive model. The extraction performance of CGT_UniLM is better than that of GraphJoint on the NYT dataset. Generative models require a large amount of training data to achieve good results; the NYT dataset contains 56,195 training sentences and satisfies this condition. Furthermore, NYT has few relation categories (24), which is beneficial for generative models. On the WebNLG dataset, with less training data (5,000 sentences) and more relation categories (246), CGT_UniLM performs much worse than GraphJoint. In general, the extraction performance of GraphJoint is slightly better than that of CGT_UniLM.
The above two models show certain advantages on different datasets, but both are inferior to our model in triple extraction. Compared with CGT_UniLM, the F1 score improvement of our model on the two datasets is 8.5% and 1.6%, respectively. The output of CGT_UniLM is not restricted to the input tokens, and its extraction result depends on the powerful inference capability of the model. In this paper, we only need to perform binary classification for each input token. Compared with the generative task, the binary classification task is simpler, so the model proposed in this paper achieves better extraction results. Compared with GraphJoint, the F1 score improvement of our model on the two datasets is 4.0% and 4.5%, respectively. GraphJoint extracts triples in a one-dimensional space, where multiple overlapping cases occur between triples, and overlapping data seriously interferes with the extraction results. Different from GraphJoint, NEDORT expresses the input data in two-dimensional space. After dimension expansion, there is no overlap between triples, which brings great convenience to triple extraction. The transformation from sequence to matrix is the key to our model's good extraction results.

Results on triples of different categories
The problem of data overlap is a significant obstacle for relation triple extraction. To demonstrate the performance of the model in dealing with overlapping data, we divide triples into three categories: Normal, SEO, and EPO. The Normal category has no data overlap problem, while triples of the other two categories contain overlapping parts. Therefore, performance on the SEO and EPO categories is the key to evaluating the capability of a model to address the data overlap problem. We conduct experiments on these three categories and present the comparison results in Figs. 6, 7, and 8. The data in the figures are the F1 scores of each model on the different categories of triples.
The comparison shows that our model achieves the best results on triples of all three categories for both datasets. Figure 6 presents the extraction results on triples of the Normal category. Compared with the GraphJoint model, NEDORT achieves an F1 score improvement of only 0.1% on the NYT dataset; the extraction performance on the WebNLG dataset is better, with the F1 score improved by 0.8%. Our model performs better, but the advantage is not obvious. Figures 7 and 8 present the extraction results on triples of the overlapping patterns. Our model significantly outperforms the other baseline models in extracting triples of these two categories. For the SEO category, our model obtains F1 score improvements of 0.8% and 4.2% on the two datasets, respectively. The advantage is more obvious for triples of the EPO category: compared with the other baseline models, NEDORT improves the F1 score by 12.9% and 19.8%, respectively. There is only one overlapping entity in a triple of the SEO category, so it is relatively easy to extract such triples. A triple of the EPO category contains two overlapping entities, making it difficult for baseline models to judge the overlapping part correctly. In this paper, the interference of overlapping data is eliminated by the matrix representation, so the categories of overlapping triples have no influence on our model. Therefore, the F1 score improvement of NEDORT on triples of the EPO category is more pronounced. In conclusion, our model is superior to the other baseline models in dealing with triples of all categories, and the advantage is more prominent on overlapping data.

Results on triples of different numbers
The previous sections focus on the data overlap problem. However, a sentence containing many triples also makes extraction more difficult, so this section verifies the capability of the model to handle sentences containing multiple triples. We divide each test set into 5 subsets according to the number of triples per sentence; the last subset contains all sentences with more than 4 triples. As this division shows, a sentence may contain 5 or more triples, which brings great challenges to relational triple extraction. We conduct experiments on the 10 subsets across both datasets and compare the F1 score of each model. Table 4 shows the extraction results of all models on the WebNLG dataset; our model achieves the best results on all subsets. The comparison results on the NYT dataset are shown in Table 5; our model again performs best on all subsets, and the advantage is more prominent. These experiments demonstrate the superiority of our model on sentences containing multiple triples.
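The data division described above can be sketched as a simple bucketing helper (our own illustration, assuming each sample pairs a sentence with its triple list; the function name and sample data are hypothetical):

```python
from collections import defaultdict

def bucket_by_triple_count(samples):
    """Split samples into subsets N = 1, 2, 3, 4, and '>4' (all larger counts)."""
    buckets = defaultdict(list)
    for sentence, triples in samples:
        n = len(triples)
        key = n if n <= 4 else ">4"  # the last subset pools everything above 4
        buckets[key].append((sentence, triples))
    return buckets

# Toy data: one sentence each with 1, 2, and 6 triples.
samples = [
    ("s1", ["t1"]),
    ("s2", ["t1", "t2"]),
    ("s3", ["t1", "t2", "t3", "t4", "t5", "t6"]),
]
buckets = bucket_by_triple_count(samples)
print({k: len(v) for k, v in buckets.items()})  # {1: 1, 2: 1, '>4': 1}
```

Applying this split to each of the two test sets yields the 10 subsets on which the per-subset F1 scores are compared.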

Complexity analysis
This section analyzes the computational complexity of the proposed methods and presents the results in Table 7. In the table, n is the length of the input sequence, d is the dimension of the word embedding, and r is the number of relations in the dataset. The Differential Amplified Attention method captures the subject features contained in the sequence, with a computational complexity of O(dn² + d²n); it performs feature extraction in multiple dimensions, so its computation is relatively heavy. The Biaffine method performs the transformation from sequence to matrix, and the U-Net method optimizes the representation of the matrix; both have a computational complexity of O(drn), which is modest. The last two methods predict the start and end positions of entities. Both perform binary classification, so they perform well in terms of computational complexity. NEDORT combines all of the above methods to extract triples, so the overall computational complexity of the model is O(dn² + d²n + drn + rn + n). For a given dataset, both d and r are constants, so the computational complexity of NEDORT reduces to O(n²). This analysis shows that NEDORT performs well in terms of computational complexity; combined with its excellent performance on overlapping data, NEDORT is well suited to triple extraction.
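The two dominant terms can be traced to the standard attention computation. Below is a generic scaled dot-product self-attention sketch (not the paper's exact Differential Amplified variant; all weights are random placeholders) annotated with where O(d²n) and O(dn²) arise:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16                         # sequence length, embedding dimension
X = rng.random((n, d))               # token embeddings
Wq, Wk, Wv = (rng.random((d, d)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv     # three (n x d)(d x d) products: O(d^2 n)
scores = Q @ K.T / np.sqrt(d)        # (n x d)(d x n) product: O(d n^2)

# Row-wise softmax over the n x n score matrix.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V                    # (n x n)(n x d) product: O(d n^2)
```

Summing the projection cost O(d²n) and the score/aggregation cost O(dn²) gives the O(dn² + d²n) attention term in Table 7; since d is fixed for a given dataset, the sequence length n dominates.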

Ablation study
This section verifies the effectiveness of the proposed methods through ablation studies. We conduct experiments on the WebNLG dataset and show the results in Table 8. The Differential Amplified Multi-head Attention method is used to extract subjects, and we remove it first. The results show that the extraction performance on the entire triple decreases significantly after this ablation. The method extracts entity features from multiple dimensions and highlights the start and end indices of entities, which is essential for subject extraction.
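The start/end tagging that this method feeds can be sketched as two independent binary classifiers over the encoded tokens (a simplified stand-in with random, untrained weights; the decoding heuristic of pairing each start with the nearest following end is our assumption, not necessarily the paper's exact scheme):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n, d = 6, 16
H = rng.random((n, d))                     # encoded token features
w_start, w_end = rng.random(d), rng.random(d)

p_start = sigmoid(H @ w_start)             # P(token opens a subject)
p_end = sigmoid(H @ w_end)                 # P(token closes a subject)

# Decode spans: pair each predicted start with the nearest end at or after it.
starts = np.where(p_start > 0.5)[0]
ends = np.where(p_end > 0.5)[0]
spans = [(s, ends[ends >= s][0]) for s in starts if ends[ends >= s].size]
```

Each classifier is a single sigmoid over the token representation, which is why the position-prediction steps contribute only the low-order O(rn) and O(n) terms in the complexity analysis.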
Each relation-object pair to be extracted is associated with a subject, so we incorporate subject information into the second extraction step. To verify the necessity of this information, we ablate the subject: the ablated model receives no subject information when extracting relation-object pairs. Table 8 shows that the F1 score drops by 26.5% after subject removal. Such an obvious degradation indicates that subject information is crucial for the extraction of relation-object pairs: without it, our model cannot associate the extracted relation-object pairs with a specific subject.
The Biaffine method is employed to generate relation-sequence matrices. We replace it with simple matrix expansion to complete the ablation. Compared with NEDORT, the ablated model's extraction capability decreases significantly, indicating that this method is irreplaceable: the Biaffine method fully fuses the subject with the sentence sequence, which lays the foundation for extracting relation-object pairs. Finally, we remove the U-Net network. Surprisingly, the model without U-Net fails to converge. U-Net optimizes the matrix representation and extracts features between entities and relations. The extraction of relation-object pairs is more complicated than that of subjects, and a matrix without optimization cannot capture the complex connections between relations and objects. Therefore, the U-Net network is key to the excellent performance of our model and cannot be removed.
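A biaffine fusion of a subject vector with every token vector can be sketched as follows (a generic biaffine scorer with random weights; the paper's exact parameterization and dimensions may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, r = 6, 16, 4                    # tokens, hidden size, relations
H = rng.random((n, d))                # sequence representation
s = rng.random(d)                     # pooled subject representation
U = rng.random((r, d, d))             # bilinear term: one d x d map per relation
W = rng.random((r, 2 * d))            # linear term over [subject; token]
b = rng.random((r, 1))                # per-relation bias

# Bilinear part: score_bi[r, i] = s^T U_r h_i
bilinear = np.einsum("d,rde,ne->rn", s, U, H)

# Linear part: concatenate the subject with each token, then project.
pairs = np.concatenate([np.tile(s, (n, 1)), H], axis=1)   # (n, 2d)
linear = W @ pairs.T                                      # (r, n)

scores = bilinear + linear + b        # (r x n) relation-sequence matrix
```

Each cell (relation, token) of `scores` jointly reflects the subject and one token, which is the sense in which the fusion produces a relation-sequence matrix; computing it touches r relations for each of the n tokens, matching the O(drn) term in the complexity analysis.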

Conclusion
The problem of data overlap is a significant obstacle to the extraction of relational triples. This paper performs a sequence-to-matrix transformation to fundamentally eliminate the interference of overlapping data. We design the Biaffine method to fuse subject information and generate relation-sequence matrices, and we design a U-Net network to optimize the matrix representation. For subject extraction, we employ the Differential Amplified Multi-head Attention method to highlight the start and end positions of entities and to extract sequence features from multiple dimensions. The experimental results show that the proposed NEDORT model outperforms other baseline models on triples of all categories. Our model completes the extraction of relational triples in two steps, which are linked only by subject information; if there were more connections between the two steps, the extraction performance could be further improved. Future work will explore new models that increase the information interaction between these two steps. The above experiments show that the data overlap problem disappears in multidimensional space; in the future, we will explore more effective multidimensional representations to further improve the extraction of triples. In addition, more efficient word embedding generation methods can be combined with our model to provide better initialization for the input data.

Fig. 2
Fig. 2 An overview of the proposed NEDORT framework. The Differential Amplified Multi-head Attention method is employed in the extraction of subjects. The right part presents the generation and optimization of the relation-sequence matrix. Lin and Relu represent linear and activation functions, respectively

Fig. 5
Fig. 5 Implementation details for the extraction of relational triples. "Head" and "Tail" represent predictions for start and end positions, respectively. Consecutive tokens marked in red are entities extracted from the sequence. "Sub", "Rel" and "Obj" denote subject, relation and object, respectively

Fig. 7 Extraction results (%) for different models on triples of the SEO category. The data in the figure is the F1 score on the specified dataset

Fig. 8 Extraction results (%) for different models on triples of the EPO category. The data in the figure is the F1 score on the specified dataset

Fig. 1 Examples of Normal, SEO and EPO overlapping patterns. Subject, Object and Relation are marked in red, green and orange, respectively. The entities marked in purple are both Subject and Object. Overlapping triples are contained in the SEO and EPO categories. (Example sentence from the figure: "The Asser Levy Public Baths are located in New York City, New York, United States.")

Therefore, NEDORT performs better than the encoder-decoder framework in extracting individual triplets.

Table 2
Results (%) of different models on WebNLG

Table 3
Results (%) of different models on NYT

Fig. 6 Extraction results (%) for different models on triples of the Normal category. The data in the figure is the F1 score on the specified dataset

Table 4
F1-score (%) of extracting relational triples from sentences with different numbers (denoted as N) of triples on WebNLG

In this paper, the extraction of relational triples is divided into two steps. The extraction results of the previous experiments are based on the entire triple; this section verifies the performance of our model on each extraction step separately. We conduct experiments on the two datasets and present the results of the two steps in Table 6. For the WebNLG dataset, the F1 score of our model in the first step is 96.7%. The extraction of relation-object pairs is much more difficult than that of subjects, yet our model still obtains an F1 score of 93.3%. In addition, we also present the respective extraction results for relation and object. The F1 scores for relation and

Table 5
F1-score (%) of extracting relational triples from sentences with different numbers (denoted as N) of triples on NYT

Table 6
Results (%) of single step on WebNLG and NYT

Table 7
Complexity analysis of the proposed method