Abstract
Extractive approaches have been the mainstream paradigm for overlapping entity–relation extraction. However, owing to inherent methodological limitations, they can hardly deal with three issues: hierarchically dependent entity–relations, implicit entity–relations, and entity normalization. Recent advances have proposed an effective solution based on generative language models, which cast entity–relation extraction as a sequence-to-sequence text generation task. Inspired by the observation that humans learn by getting to the bottom of things, we propose a novel framework, GenRE (Generative multi-turn question answering with contrastive learning for entity–relation extraction). Specifically, template-based question prompt generation is first designed to produce the questions asked in different turns. We then formulate entity–relation extraction as a generative question answering task based on the general language model instead of span-based machine reading comprehension. Meanwhile, a contrastive learning strategy is introduced during fine-tuning to add negative samples and mitigate the exposure bias inherent in generative models. Our extensive experiments demonstrate that GenRE performs competitively on two public datasets and a custom dataset, highlighting its strengths in entity normalization and implicit entity–relation extraction. (The code is available at https://github.com/lovelyllwang/GenRE).
Introduction
Relation extraction (RE) aims to extract explicit and implicit entity–relations from unstructured text and transform them into a structured knowledge base. It is an essential component of natural language processing (NLP) tasks, such as question answering (QA) [1], text summarization [2], and knowledge graph construction [3].
Current approaches to relation extraction can be roughly divided into two categories: (1) pipelined approaches that decompose the task into separate named entity recognition and relation classification steps [4, 5]; and (2) joint approaches that perform the two subtasks simultaneously in a joint learning manner [6,7,8]. However, as mentioned by Zeng et al. [9], these models cannot handle the overlapping triple problem. As shown in Fig. 1, multiple relational triples share the overlapping entity “Frances E. Allen”. To address this limitation, subsequent works introduced various strategies, such as tagging-based methods [10,11,12], span-based methods [13, 14], and table-filling-based methods [15, 16]. Though these approaches have achieved promising success, they follow an extractive paradigm, which makes it difficult to deal with three key challenges.
First, hierarchically dependent entity–relations exist between different tags in real-life scenarios, especially in the scholar profiling task. Traditional entity–relation methods may not adequately capture the complexity and richness of these relations. As shown in Fig. 1, a model may easily obtain the triple (Robert M. Metcalfe, obtain, PhD). However, this triple alone does not express that the PhD was obtained from Harvard University. This inadequacy arises from the inherent dependency between the head entity “Harvard University” and the dependent entity “PhD”. Existing studies on this issue usually treat relation extraction as a multi-turn question answering problem [17]. This idea employs extractive machine reading comprehension (MRC), which predicts the start and end positions of the answer span given the context. However, its success relies heavily on annotation tools to label accurate information and incurs expensive labeling costs.
Second, a context may contain implicit entity–relations. Most methods can only extract entities that appear in the context and cannot infer implicit entity–relations. Taking Fig. 1 as an example, Robert M. Metcalfe entered MIT in 1964, where the year 1964 is inferred from the phrase “after five years”. Moreover, based on the given context, it is reasonable to infer that Robert M. Metcalfe initiated his master’s degree at Harvard University around 1969 and pursued his Ph.D. in 1970, respectively.
Third, an entity may have different surface forms and needs to be normalized. Taking the context in Fig. 1 as an example, an extractor easily identifies the entity MIT, but it is difficult to normalize it to Massachusetts Institute of Technology unless the extractor incorporates a reference set. Although normalization is common in entity linking, obtaining normalized entities is particularly difficult in entity–relation extraction when the entities are not explicitly mentioned in the context.
To address the above limitations, we propose a novel paradigm, GenRE, which performs entity–relation extraction by generating entity–relations from the context through generative multi-turn question answering with contrastive learning. Specifically, to make it easy for GenRE to move from one domain to another, template-based question prompt generation is first designed to produce the questions asked in different turns. We then formulate entity–relation extraction as a generative QA task that generates explicit and implicit entity–relations using the general language model (GLM) [18] instead of extractive question answering, where we introduce a special candidate answer “unknown” to address the early stop problem caused by unanswerable questions in multi-turn QA. In this way, we not only answer questions when possible but also determine when no answer is supported by the given context. Meanwhile, we introduce self-supervised contrastive learning to improve the faithfulness of the generated answers, which guides the model to increase the distance between positive and negative samples. Figure 2 provides an example and overview of our approach. The key insight of GenRE is to simulate how humans acquire knowledge by getting to the bottom of things.
To summarize, our main contributions are as follows:
-
We introduce a novel framework that solves relation extraction tasks by casting them as a generative multi-turn QA problem that takes into account the rich semantic information of hierarchical dependency relations.
-
By incorporating contrastive learning, we significantly improve the model's ability to discriminate between positive and negative answers.
-
We conduct extensive experiments on two public datasets, each in two versions, and a custom dataset to verify the effectiveness and flexibility of our approach, especially in entity normalization and inference.
The remainder of this paper is organized as follows. Related work is summarized in the “Related works” section. We formalize the RE task and present the proposed GenRE model in detail in the “Preliminaries” section. A series of experiments evaluating the performance of GenRE is reported in the “Experimental setups” and “Results and discussion” sections. Finally, we conclude our work in the “Conclusion” section.
Related works
Relation extraction
Traditionally, relation extraction has been tackled in a pipeline manner, with separate subtasks for named entity recognition and relation classification [5, 19, 20]. However, pipelined systems suffer significantly from error propagation. To overcome this problem, joint learning methods have been proposed to exploit the interrelation between the two tasks [8, 21, 22].
Joint learning methods follow various strategies, such as treating the two tasks as a single sequence-labeling problem [8, 23]. Wei et al. [10] utilized a cascaded binary tagging framework to extract subject entities and their corresponding relations and object entities in two stages. Despite their initial success, these methods cannot precisely identify overlapping triples in a sentence, because an entity pair may have multiple relations or two relation triples may share an overlapping entity. Ren et al. [12] further improved on [10] by designing a bidirectional extraction framework to reduce entity extraction omissions.
A series of studies have formulated joint entity and relation extraction as a table-filling problem. Wang et al. [15] designed a table encoder and a sequence encoder that collaborate to facilitate representation learning. TPLinker [24] is a single-step model for jointly extracting entities and overlapping relations, in which a handshaking tagging scheme is proposed. UNIRE [25] presented a novel table-filling approach, where entities and relations are represented as squares and rectangles. Ren et al. [16] proposed a joint extraction method that considers global information to improve table modeling. OneRel [26] is similar to TPLinker [24]; the difference is that it reduces the number of matrices and improves model efficiency.
The span-based approach is another widely used method for joint entity and relation extraction. Eberts and Ulges [27] introduced SpERT, a span-based model for joint entity–relation extraction. Zhong et al. [14] presented a simple approach that learns two independent encoders for entity recognition and relation extraction.
Some recent efforts have focused on identifying overlapping triples, usually through strategies such as seq2seq with copy mechanisms [9, 28, 29], graph convolutional networks [22, 30, 31], and reinforcement learning [32, 33]. Unfortunately, these approaches may struggle with semantic dependencies, as they still predict the relationship between each pair of entities separately, ignoring the effects of other entity–relations.
Another interesting thread is to cast information extraction as an MRC task [17, 34,35,36]. Li et al. [17] adopted a similar idea to frame relation extraction as an MRC problem, which can not only identify overlapping triples but also deal with hierarchical dependencies among multiple triples. Nevertheless, this approach relies on identifying entity spans, making it difficult to deduce implicit relationships between entities.
More recently, researchers have proposed utilizing generative pre-trained models, such as BART [37] and T5 [38], for relation extraction. TANL [39] treated the task as a translation problem between augmented natural languages, while CGT [40] and REBEL [41] framed triple extraction as a sequence-generation task. UIE [42] introduced a unified text-to-structure generation framework that dynamically generates target extractions via a schema-based prompt mechanism. Our approach is similar to these methods in that we use generative pre-trained models, but it distinguishes itself by utilizing a multi-turn QA method to generate entity–relations with hierarchical dependencies.
Pre-trained language models
Recently, pre-trained language models have achieved great success in the field of NLP. Vaswani et al. [43] proposed the self-attention-based Transformer architecture, which soon became the backbone of many subsequent language models pre-trained on large-scale corpora. Existing pre-trained language models can be categorized into three types. First, autoencoding models [44, 45] learn a bidirectional contextualized encoder via denoising objectives; they are suitable for natural language understanding (NLU) tasks but cannot be directly applied to text generation. Second, autoregressive models are trained with a left-to-right language-modeling objective [46, 47]. They perform well in unconditional generation tasks and can be applied to NLU and conditional generation tasks. Third, encoder–decoder models are pre-trained for sequence-to-sequence tasks [37, 38]. These models are typically deployed in conditional text generation and can be applied to NLU and unconditional generation tasks. However, none of these pre-training frameworks is suitable for all NLP tasks. Recently, Du et al. [18] proposed a general language model based on autoregressive blank-filling to address this challenge.
Contrastive learning
Contrastive learning has been widely used in self-supervised, semi-supervised, and supervised settings to learn representations by contrasting positive pairs against negative pairs, especially in computer vision [48]. Recently, contrastive learning has been used in NLP to train language models with self-supervision [49], learn sentence representations [50], and improve machine translation [51]. A paradigm that introduces a self-supervised contrastive loss when fine-tuning a pre-trained model has been used for conditional text-generation tasks [52], including machine translation, question generation, and summarization. Inspired by these studies, we incorporated contrastive learning into generative QA.
Preliminaries
This section formally details the relation extraction task and then introduces generative multi-turn QA techniques with contrastive learning based on GLM pre-trained language models. Figure 3 describes the architecture of GenRE, which consists of two stages. (1) The training stage in which we train the generative QA model based on GLM while utilizing contrastive learning. (2) The inference stage in which we generate the answers based on the generative QA model in a multi-turn QA method and then integrate the answers into a structural knowledge base.
Problem formalization
Before presenting our model, we first formalize the entity–relation extraction task. Formally, given a context \(C=\{c_1,c_2,\ldots ,c_n\}\) with n tokens, let E denote the set of entities included in C. An entity \(e_i\in E\) consists of one or multiple consecutive words in the context. Let \(r_{i,j}\in R\) indicate one type of relationship between the entity pair \((e_i,e_j)\), where \(R=\{r_1,r_2,\ldots ,r_m\}\). Hence, the traditional RE task is to extract the explicit and implicit relation triples \(T=\{(e_i,r_{i,j},e_j)\}\) between entity pairs \((e_i,e_j)\). However, the relation \(r_{i,j}\) may depend on another entity \(e_k\) owing to hierarchical dependency in one context.
Inspired by MRC-based RE, we treat RE as a multi-turn QA problem and solve the task with a generative model trained with contrastive learning. Specifically, the extraction of explicit and implicit relations is transformed into multi-turn QA. We formally define the generative QA problem as follows. Given a context \(C=\{c_1,c_2,\ldots ,c_n\}\) with n tokens and a question \(Q=\{q_1,q_2,\ldots ,q_m\}\) with m tokens, the model must generate a target answer \(A=\{a_1,a_2,\ldots ,a_l\}\) with l tokens for answerable or unanswerable questions. Note that the “unknown” string is returned when the question is unanswerable.
We summarize how to transform the entity–relation extraction task into a multi-turn generative QA task in Algorithm 1.
Template-based question prompt generation
For each turn of QA that we would like to predict, we need to generate an appropriate question q. To make it easy for GenRE to move from one domain to another, we designed a question generation template that is not only effective for the task but also simple and efficient to implement.
For the normal relation extraction task, we transformed it into two turns of QA. Specifically, the first turn involves identifying all candidate entities (entity tagger), while the second turn involves classifying the relation types between each possible entity pair (relation extractor). However, in some specific scenarios, RE may need to be treated as a QA task with more than two turns. In this case, the head entities with the highest priority are extracted in the first-turn QA and then fed to the question template to generate the second-turn questions, which obtain tail entities and relations, and so on, until the generated answers are in one-to-one correspondence.
Table 1 presents the question generation templates for different scenarios. It is worth noting that each question may have multiple answers that are necessary to iteratively generate the next-turn question.
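The turn-by-turn procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1: the two template strings, the `answer_fn` callable that stands in for the generative QA model, the function name `multi_turn_extract`, and the toy context are all hypothetical.

```python
from typing import Callable, List, Tuple

UNKNOWN = "unknown"  # special answer returned for unanswerable questions

# Hypothetical templates: turn 1 asks for head entities,
# turn 2 asks for the tail entity of each (head, relation) pair.
HEAD_TEMPLATE = "Which {entity_type} are mentioned in the text?"
TAIL_TEMPLATE = "What is the {relation} of {head}?"


def multi_turn_extract(
    context: str,
    entity_type: str,
    relations: List[str],
    answer_fn: Callable[[str, str], List[str]],
) -> List[Tuple[str, str, str]]:
    """Run two QA turns and collect (head, relation, tail) triples."""
    triples: List[Tuple[str, str, str]] = []
    # Turn 1: one question may yield several answers, one per head entity.
    heads = answer_fn(context, HEAD_TEMPLATE.format(entity_type=entity_type))
    for head in heads:
        if head == UNKNOWN:
            continue
        # Turn 2: each head is fed back into the template.
        for relation in relations:
            question = TAIL_TEMPLATE.format(relation=relation, head=head)
            for tail in answer_fn(context, question):
                # "unknown" skips this question without stopping the loop.
                if tail != UNKNOWN:
                    triples.append((head, relation, tail))
    return triples


def toy_answer_fn(context: str, question: str) -> List[str]:
    """Toy stand-in for the generative model: a table of canned answers."""
    canned = {
        "Which person are mentioned in the text?": ["Frances E. Allen"],
        "What is the employer of Frances E. Allen?": ["IBM"],
    }
    return canned.get(question, [UNKNOWN])


triples = multi_turn_extract(
    "Frances E. Allen worked at IBM.",
    "person",
    ["employer", "birthplace"],
    toy_answer_fn,
)
```

Because the unanswerable "birthplace" question returns “unknown”, it is dropped without interrupting the remaining turns, which is the role the special answer plays in multi-turn QA.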
Generative QA model with contrastive learning
In a generative QA task, the goal is to generate answers by autoregressively predicting tokens, where every answer has its corresponding rationale. The rationale can usually be located in a certain continuous area of the context or inferred from semantic reasoning based on the context. In this paper, we consider explicit and implicit answers to obtain complete entity–relations.
Inspired by the current trend of formulating NLP tasks as generation tasks [18], we proposed a generative QA model based on GLM with contrastive learning to obtain more complete entity–relations. Figure 3(1) illustrates the training process of our model. The GLM is a general language model pre-trained with an autoregressive blank-filling objective. It has shown superior performance when compared to state-of-the-art models in various NLP tasks, such as NLU, unconditional generation, and conditional generation. We used the pre-trained GLM as the main structure for the generative QA model.
Learning representations
Suppose that we are given a context \(C=\{c_1,\ldots ,c_n\}\) and a question \(Q=\left\{ q_1,\ldots ,\ q_m\right\} \), which together form the source text \(x=\left\{ x_1,\ldots ,\ x_{n+m}\right\} \), and an answer \(y=\left\{ y_1,\ldots ,\ y_o\right\} \) as the target output text, where n, m, and o indicate the number of words in the context, question, and answer, respectively. To align with the GLM framework, we must convert the input and target text into word tokens.
To achieve this, we introduce the prompt tokens “\(<\textit{Context}>\)”, “\(<\textit{Question}>\)”, “\(<\textit{Answer}>\)” at the beginning of sequence representations of context, question, and answer, respectively. Figure 4 shows an illustrative example of single-turn QA and its representation of the source and target text. For each token, the input embedding comprises token embedding \(E_{T}\), position embedding \(E_{P}\), and block position embedding \(E_{B}\).
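As an illustration of this representation, the sketch below assembles the source and target strings for one QA turn and computes the two position indices. The three prompt tokens come from the text. The placement of a single [MASK] at the end of the source is an assumption. The two-dimensional positions follow the scheme described for GLM [18]: source tokens keep their own position and block position 0, and answer tokens share the position of the [MASK] and count upward inside the span. The function names are hypothetical.

```python
from typing import List, Tuple

CONTEXT_TOKEN = "<Context>"
QUESTION_TOKEN = "<Question>"
ANSWER_TOKEN = "<Answer>"
MASK_TOKEN = "[MASK]"


def build_example(context: str, question: str, answer: str) -> Tuple[str, str]:
    """Return (source, target) strings for one QA turn."""
    # Assumption: the answer is generated in place of one trailing [MASK].
    source = f"{CONTEXT_TOKEN} {context} {QUESTION_TOKEN} {question} {MASK_TOKEN}"
    target = f"{ANSWER_TOKEN} {answer}"
    return source, target


def two_dimensional_positions(
    num_source: int, mask_index: int, num_target: int
) -> Tuple[List[int], List[int]]:
    """Position ids and block position ids for source + target tokens."""
    # Source tokens: their own position. Target tokens: the [MASK] position.
    positions = list(range(num_source)) + [mask_index] * num_target
    # Source tokens: block position 0. Target tokens: 1..num_target.
    block_positions = [0] * num_source + list(range(1, num_target + 1))
    return positions, block_positions


source, target = build_example(
    "Chris was educated at Plymouth University.",
    "Where was Chris educated?",
    "Plymouth University",
)
positions, block_positions = two_dimensional_positions(4, 3, 2)
```

The model input is then the sum of the token, position, and block position embeddings looked up from these indices.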
Fine-tuning GLM for answer generation
Taking the above token representations as input, the GLM first computes the hidden states of the input through L stacked Transformer layers, where each layer consists of a multi-head self-attention layer and a fully connected feedforward network.
In contrast to other models, the number of missing tokens in a span is unknown to the model, and a span may contain multiple tokens. Therefore, the GLM predicts the answers indicated by [MASK] in an autoregressive manner and generates the tokens in the masked spans in left-to-right order. For source sequence x, the probability of generating the target [MASK] span y is expressed as follows:
where \(y_{z_{<i}}\) denotes the first \(i-1\) elements of a permutation \(z \in Z_{T}\), and \(Z_{T}\) is the set of all possible permutations of the length-\(T\) index sequence \([1,2,\ldots ,T]\). \(y_j\) corresponds to the jth token of y, and \(h_L\) represents the Lth hidden state.
Contrastive learning for faithful answers
Generative models have become increasingly popular in natural language processing, particularly in text generation. However, a major challenge for these models is that they are generally trained with teacher forcing, which involves providing ground-truth answers at each time step during training. In other words, the generative model is not exposed to negative examples and may not learn to distinguish between correct and incorrect answers. This issue is commonly referred to as the “exposure bias” problem.
In the QA task, exposure bias can result in incorrect or insufficient answers. For example, given the input sentence “Chris was educated at Plymouth University in the UK and holds an honor degree in Geography.” and the question “Where was Chris educated?”, we expect the model to generate the answer “Plymouth University” rather than “UK”, which is a correct answer but not the one we expect. Therefore, we need to find a way to enhance the faithfulness and accuracy of the generated answers.
To address this problem, contrastive learning has emerged as a promising solution. This approach has been successful in mitigating the “exposure bias” problem by increasing the distance between positive and negative samples. In light of this, we introduce a naïve contrastive learning framework to train the GLM model, which incorporates in-batch non-target sequences as negative examples, as illustrated in Fig. 3(1).
Our approach involves considering the answer \(y^{(i)}\) as the positive instance and adopting in-batch sampling to obtain the negative instance \({{y}^{\left( j\right) }},\ \forall j\ne i \). First, we train the GLM model to learn a joint embedding space. Then, we apply average pooling to \(X^{\left( i\right) }\) and \(Y^{\left( i\right) }\) and project them into a two-dimensional space using t-SNE. Our objective is to maximize the similarity between the source and the N true target sentence embeddings in the batch, while minimizing the similarity between the source and the remaining \({N}^{2}-N\) negative pairs. We use cross-entropy loss to optimize the similarity scores, as shown in the following loss function:
where \(z_*^{(i)}\) denotes the fixed-size sentence representation \(z\in {\mathbb {R}}^d\) computed by mean pooling, and \(X^{\left( i\right) }\) and \(Y^{\left( i\right) }\) denote the concatenated hidden states of the source text x and the target text y, respectively. \(\text {sim}(\cdot ,\cdot )\) calculates the cosine similarity between two representations. Furthermore, \(D=\{z_y^{\left( j\right) }:j\ne i\}\) is the set of hidden representations of in-batch negative targets that are not paired with the source text \(x^{(i)}\), and \(\tau \) is the temperature, set to 1.0.
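The contrastive objective can be illustrated with a small, dependency-free sketch. It assumes the common InfoNCE-style form suggested by the definitions above: for each source, a softmax cross-entropy over temperature-scaled cosine similarities to all in-batch targets, with the paired target as the correct class. The function names are hypothetical, and the code operates on plain Python lists rather than model tensors.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(hidden_states: List[Vector]) -> Vector:
    """Average token-level hidden states into one fixed-size vector."""
    count = len(hidden_states)
    return [sum(column) / count for column in zip(*hidden_states)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(
    sources: List[Vector], targets: List[Vector], tau: float = 1.0
) -> float:
    """Mean cross-entropy where targets[i] is the positive for sources[i]
    and every other in-batch target is a negative."""
    total = 0.0
    for i, z_x in enumerate(sources):
        logits = [cosine(z_x, z_y) / tau for z_y in targets]
        log_denominator = math.log(sum(math.exp(value) for value in logits))
        total += log_denominator - logits[i]
    return total / len(sources)


pooled = mean_pool([[1.0, 3.0], [3.0, 5.0]])
batch_sources = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = in_batch_contrastive_loss(batch_sources, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = in_batch_contrastive_loss(batch_sources, [[0.0, 1.0], [1.0, 0.0]])
```

In the toy batch, pairing each source with its own target gives a lower loss than pairing it with the other target, which is the behavior the objective is meant to reward.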
Joint generative and contrastive training
In the fine-tuning stage, given a training dataset, we formulate the cross-entropy loss function of this QA task as follows:
Therefore, our model is trained by minimizing a combination of the generation cross-entropy loss and the contrastive loss:
where \(\lambda \) is a hyperparameter that controls the weight of the generative QA tasks. During the training process, we used a linear decay schedule on the value of \(\lambda \), to rely more on contrastive learning to generate faithful answers at the early stage, followed by a subsequent focus on the target generation task.
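The scheduling behavior can be sketched as follows. The sketch is written in terms of the weight on the contrastive term so that the stated behavior (more emphasis on contrastive learning early, more on generation later) is explicit. The convex-combination form, the start and end values, and the function names are assumptions made for illustration and are not taken from the text.

```python
def contrastive_weight(
    step: int, total_steps: int, start: float = 0.5, end: float = 0.0
) -> float:
    """Linearly decay the contrastive weight from start to end."""
    if total_steps <= 1:
        return end
    clipped = min(max(step, 0), total_steps - 1)
    progress = clipped / (total_steps - 1)
    return start + (end - start) * progress


def joint_loss(
    generation_loss: float, contrastive_loss: float, step: int, total_steps: int
) -> float:
    """Assumed form: (1 - w) * generation loss + w * contrastive loss."""
    weight = contrastive_weight(step, total_steps)
    return (1.0 - weight) * generation_loss + weight * contrastive_loss


early = joint_loss(2.0, 4.0, step=0, total_steps=11)
late = joint_loss(2.0, 4.0, step=10, total_steps=11)
```

With these assumed endpoints, the two losses contribute equally at the first step and only the generation loss remains at the last step.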
Constrained decoding for answer generation
The quality of the generated text in autoregressive language models is influenced by several factors, one of which is the decoding strategy. This strategy is employed to select the next word to generate based on the probability distribution over the entire vocabulary. In this study, we explore three constrained decoding methods.
In greedy decoding, the decoder generates the token with the highest probability given all previous tokens at each step. In search-based decoding with beam search, the decoder keeps track of the num_beam most probable sequences and finds a better one at each step, where num_beam is the number of sequences of tokens to track as candidate completion sequences. In sampling decoding, the decoder samples from either the top_k number of tokens or the number of tokens corresponding to the top_p probability.
In the decoding strategies mentioned above, the decoder generates tokens from the entire vocabulary. However, for explicit entity–relation extraction, only answers that appear in the context are considered valid. By limiting the generation space to valid tokens, we can prevent the model from generating invalid tokens, which ultimately improves accuracy. Therefore, we experimented with two kinds of constrained generation, namely, context-constrained generation and vocabulary-constrained generation, using different decoding strategies.
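Context-constrained generation can be sketched with greedy decoding over a toy scorer. The sketch assumes the constraint is applied per token, by discarding candidates outside an allowed set before taking the maximum. The scorer table, the `<eos>` token, and the function names are hypothetical stand-ins for the real model and tokenizer.

```python
from typing import Callable, Dict, List, Sequence, Set

EOS = "<eos>"


def constrained_greedy_decode(
    score_fn: Callable[[Sequence[str]], Dict[str, float]],
    allowed: Set[str],
    max_len: int = 8,
) -> List[str]:
    """Greedy decoding restricted to tokens in `allowed` (plus EOS)."""
    output: List[str] = []
    for _ in range(max_len):
        scores = score_fn(output)
        # Discard every candidate outside the allowed set.
        valid = {tok: s for tok, s in scores.items() if tok in allowed or tok == EOS}
        if not valid:
            break
        best = max(valid, key=lambda tok: valid[tok])
        if best == EOS:
            break
        output.append(best)
    return output


# Toy next-token scores keyed by the tokens generated so far.
TOY_SCORES: Dict[tuple, Dict[str, float]] = {
    (): {"Oxford": 0.5, "Plymouth": 0.4, "UK": 0.1},
    ("Plymouth",): {"College": 0.6, "University": 0.3, EOS: 0.1},
    ("Plymouth", "University"): {EOS: 0.9, "in": 0.1},
}


def toy_score_fn(prefix: Sequence[str]) -> Dict[str, float]:
    return TOY_SCORES.get(tuple(prefix), {EOS: 1.0})


context = "Chris was educated at Plymouth University in the UK"
context_vocab = set(context.split()) | {"unknown"}
full_vocab = context_vocab | {"Oxford", "College"}

constrained = constrained_greedy_decode(toy_score_fn, context_vocab)
unconstrained = constrained_greedy_decode(toy_score_fn, full_vocab)
```

Vocabulary-constrained generation would use the same loop with `allowed` set to a predefined list, such as the relation types. Token-level membership does not by itself guarantee that the output is a contiguous span of the context, so a stricter variant would track positions as well.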
Experimental setups
Dataset
To fairly evaluate the performance of the proposed model, we conduct experiments on two tasks: the traditional relation extraction task and a domain-specific structured prediction task, education information extraction. For the former, we use two popular public relation extraction datasets known for the presence of overlapping relations. For the latter, a more complex custom dataset is employed, incorporating instances of both overlapping and dependency relations.
Relation extraction
To extract overlapping relations, we conducted experiments on two popular datasets, NYT [53] and WebNLG [54], to evaluate the proposed framework and all baselines. NYT was constructed using a distant supervision method and is widely used for relation extraction. It contains 24 relations, 56,195 sentences for training, 5000 sentences for validation, and 5000 sentences for testing. WebNLG was originally created for natural language generation but was later used by Zeng et al. [9] for triplet extraction. It contains 171 relations and 5019/500/703 sentences for training, validation, and testing, respectively. Note that both NYT and WebNLG have two different versions according to the following two annotation criteria: (1) annotating only the last token of each entity and (2) annotating the span of the entire entity. We evaluated our model using both versions of these datasets for a fair comparison. The first versions are denoted as \(\hbox {NYT}^*\) and \(\hbox {WebNLG}^*\), and the second versions are denoted as NYT and WebNLG, respectively.
Education information extraction
For scholar profiling tasks, current entity–relation extraction methods cannot extract more specific and complex entity–relations from text. To support this task, we constructed a Profiling-Edu dataset from the AMiner system, which contains more than 300 million scholars. We sampled the education records of 2053 scholars and annotated the start-date, end-date, university, degree, and major. The dataset was randomly split into a testing set of 500 samples and a training set consisting of the remaining samples.
Note that we treat each task as a generative multi-turn QA task, and all the datasets used for relation extraction cannot be directly used in QA-based models. Hence, some preprocessing is performed to construct QA pairs according to the given context. Concretely, following the question generation method outlined in the “Template-based question prompt generation” section, we obtain a relation extraction dataset that can be used for QA. Our approach differs from other methods that tag the start and end position of the answer. Instead, our model directly provides the answer itself. In cases where the answer is unknown or not present in the given context, we denote it as “unknown”.
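The preprocessing step can be sketched as a conversion from annotated triples to QA pairs whose answers are plain strings. The question template, the “; ” separator for multiple answers, the relation names, and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

UNKNOWN = "unknown"
# Hypothetical second-turn template.
TEMPLATE = "What is the {relation} of {head}?"


def build_qa_pairs(
    context: str,
    triples: List[Tuple[str, str, str]],
    relation_types: List[str],
) -> List[Dict[str, str]]:
    """Create one QA pair per (head, relation type); label gaps as unknown."""
    # Keep heads in order of first appearance, without duplicates.
    heads = list(dict.fromkeys(head for head, _, _ in triples))
    pairs: List[Dict[str, str]] = []
    for head in heads:
        for relation in relation_types:
            tails = [t for h, r, t in triples if h == head and r == relation]
            # The answer is the text itself, not start and end positions.
            answer = "; ".join(tails) if tails else UNKNOWN
            pairs.append(
                {
                    "context": context,
                    "question": TEMPLATE.format(relation=relation, head=head),
                    "answer": answer,
                }
            )
    return pairs


qa_pairs = build_qa_pairs(
    "Robert M. Metcalfe obtained his PhD from Harvard University.",
    [("Robert M. Metcalfe", "degree", "PhD")],
    ["degree", "major"],
)
```

The second pair asks for a major that the toy context never states, so its answer is “unknown”, giving the model explicit training signal for unanswerable questions.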
Evaluation metric
For traditional relation extraction, we followed common practice and report micro-F1 scores, precision, and recall on entities and relations.
In the domain-specific relation extraction task, each record in the Profiling-Edu dataset contains multiple items that must be in one-to-one correspondence. Therefore, we adopted adjusted precision, recall, and F1-score to evaluate the results of education information extraction:
where m is the number of extracted education records, n is the number of ground-truth education records, and k denotes the number of items in the record. If an item is consistent with the annotated item, \(x_i\) receives a value of 1; otherwise, it is assigned a value of 0.
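One way to compute such record-level scores is sketched below. It assumes that the number of matching items is divided by m times k for precision and by n times k for recall, and that extracted and ground-truth records are already aligned by position. Both the normalization and the alignment rule are assumptions, as are the item keys and the function name.

```python
from typing import Dict, List, Sequence, Tuple

ITEMS = ("start_date", "end_date", "university", "degree", "major")


def adjusted_prf(
    predicted: List[Dict[str, str]],
    gold: List[Dict[str, str]],
    items: Sequence[str] = ITEMS,
) -> Tuple[float, float, float]:
    """Item-level precision, recall, and F1 over aligned education records."""
    m, n, k = len(predicted), len(gold), len(items)
    # x_i = 1 when a predicted item equals the annotated item, else 0.
    correct = sum(
        1
        for pred, ref in zip(predicted, gold)
        for item in items
        if item in ref and pred.get(item) == ref[item]
    )
    precision = correct / (m * k) if m else 0.0
    recall = correct / (n * k) if n else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold_records = [
    {"start_date": "1964", "end_date": "1969", "university": "MIT",
     "degree": "BS", "major": "EE"},
    {"start_date": "1969", "end_date": "1970", "university": "Harvard University",
     "degree": "MS", "major": "Applied Mathematics"},
]
predicted_records = [
    {"start_date": "1964", "end_date": "1969", "university": "MIT",
     "degree": "BS", "major": "Management"},
]
precision, recall, f1 = adjusted_prf(predicted_records, gold_records)
```

The toy records are invented for the example. With one extracted record that gets four of five items right against two annotated records, precision is 4/5 and recall is 4/10.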
Implementation details
We adopted GLM-Doc (see Footnote 1), which has 24 layers, 1024 hidden units, and 16 attention heads, as the MRC backbone. We optimized our model using label smoothing and the AdamW optimizer with \(\beta _1=0.9\) and \(\beta _2=0.999\). The batch size was 64 and the learning rate was 2e-5, with a weight decay of 1e-1. We applied a linear warm-up learning rate scheduler with a warm-up ratio of 0.06. We trained our model for a maximum of 50 epochs. All experiments were performed on an Intel(R) Xeon(R) Gold 6240 CPU and an NVIDIA V100 32 GB GPU.
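For reference, the reported settings can be collected in one place together with a sketch of the warm-up schedule. The values are those stated above. The dictionary keys, the function name, and the choice to hold the learning rate constant once warm-up ends are assumptions, since the behavior after warm-up is not specified.

```python
CONFIG = {
    "num_layers": 24,
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "optimizer": "AdamW",
    "betas": (0.9, 0.999),
    "batch_size": 64,
    "learning_rate": 2e-5,
    "weight_decay": 1e-1,
    "warmup_ratio": 0.06,
    "max_epochs": 50,
}


def learning_rate_at(
    step: int,
    total_steps: int,
    base_lr: float = CONFIG["learning_rate"],
    warmup_ratio: float = CONFIG["warmup_ratio"],
) -> float:
    """Linear warm-up from near zero to base_lr over the first steps."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Assumption: the rate is held constant after warm-up.
    return base_lr


first_lr = learning_rate_at(0, 1000)
mid_warmup_lr = learning_rate_at(29, 1000)
after_warmup_lr = learning_rate_at(500, 1000)
```

With 1000 total steps, the warm-up covers the first 60 steps, after which the rate stays at 2e-5 in this sketch.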
Comparison methods
To demonstrate its effectiveness, we compared our method with several baselines as follows, which can be summarized in two groups: extractive methods and generative methods.
-
1.
Extractive methods
-
NovelTagging [8] introduced a novel tagging scheme and modeled the relational triple extraction problem as a sequence-labeling problem.
-
CopyRE [9] adapted a seq2seq model with a copy mechanism that can effectively extract overlapping triples in a sentence.
-
GraphRel [30] is a two-stage model based on a graph convolutional network (GCN) for jointly learning named entities and relations.
-
RSAN [23] proposed a sequence-labeling approach that utilizes a relation-specific attention mechanism.
-
CasRel [10] employed a cascade binary tagging framework, which first extracts all possible head entities in a sentence with span-based MRC. Then, for each head entity, all possible relations and corresponding tail entities are identified.
-
TPLinker [24] iterates all token pairs and uses matrices to tag token links to recognize the relations between token pairs.
-
RIFRE [31] proposed the construction of heterogeneous graphs for iterative representation fusion by treating relations as nodes on the graph and applying them to relation extraction tasks.
-
TransRel [55] proposed a novel translation based framework, which contains an entity tagger and a relation extractor.
-
2.
Generative methods
-
TANL [39] frames structured prediction language tasks as a task of translation between augmented natural languages, which makes it easy to encode structured information in the input and decode the output text into structured information.
-
CGT [40] treats triple extraction as a sequence-generation task and employs contrastive training to generate faithful triplets.
-
REBEL [41] frames triple extraction into a seq2seq task and leverages the BART as the base model.
-
UIE [42] introduced a unified text-to-structure generation framework that adaptively generates target extractions via a schema-based prompt mechanism. For a fair comparison, we fine-tuned UIE with T5-v1.1-large as the backbone in the following experiments.
Results and discussion
In this section, we present the experimental results on the WebNLG and NYT datasets for the relation extraction task and on the Profiling-Edu dataset for education information extraction. In addition, we analyze and discuss the performance of our model in detail.
Main result
Relation extraction
Table 2 presents the performance of all the baselines and our model on the WebNLG and NYT datasets. According to the results, our model is competitive with the best baselines on all evaluation metrics and all datasets. Whether a model is generative matters here: generative models achieve much better results on the fully annotated (exact-match) datasets and slightly inferior but still competitive results on the partially matched datasets. A possible reason is that generative models are better suited to preserving semantic integrity but are more easily confused by the ambiguity of incomplete annotations. For example, given the instance “Joe Buck’s father is Jack Buck.”, the exact matched triple is (Jack Buck, Children, Joe Buck), whereas the partially matched triple is (Buck, Children, Buck). Thus, the generative model may not know which Buck is the head entity because of the ambiguity of incomplete annotations. This is meaningful, because it indicates that generative models perform well when deployed in real scenarios.
Compared with UIE [42], a model that frames all IE tasks as text-to-structure transformations, our model achieves an absolute improvement of 1.8 in precision on the NYT datasets. UIE can handle overlapping triples and is flexible, but it cannot deal with redundant predictions. In contrast, our framework effectively reduces redundant predictions by filtering out the candidate relation “unknown”. Moreover, our model performs much better than other generative models on the NYT* dataset, which we attribute to transforming relation extraction into a multi-turn QA task.
Education information extraction
Because the Profiling-Edu dataset that we constructed contains cases with hierarchical dependencies between entities, we also conducted experiments on it and report the results in Table 3. In contrast to the above datasets, the Profiling-Edu dataset has five types of entities and requires multi-turn QA, as illustrated in Table 1. We compared our model with four baselines: MQARE [17], TANL [39], REBEL [41], and UIE [42]. Specifically, MQARE casts entity–relation extraction as an MRC task and uses BERT as the backbone. TANL is a seq2seq model that frames structured prediction language tasks as translation between augmented natural languages. According to the results, our model and MQARE significantly outperform TANL, because MRC methods can fully model the rich interactions between entities and relations and generalize to new scenarios. Compared with MQARE, our model still achieves competitive performance, because we adopt a generative method with a special candidate answer “unknown”, which bridges the gap between no answer and a wrong answer and does not affect the answer of the next turn. Although REBEL and UIE perform well on the custom dataset, GenRE performs much better than both. These results verify the effectiveness of the proposed model.
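The role of the “unknown” answer in multi-turn QA can be sketched as follows. This is an illustration under stated assumptions rather than the GenRE implementation: the question templates, the `degree` relation, the `extract` function, and the stub `toy_answer_fn` (which stands in for the fine-tuned generative model) are all hypothetical. A turn answered with “unknown” produces no triple, and only later turns whose templates depend on that answer are skipped.

```python
from typing import Callable, Dict, List, Tuple

UNKNOWN = "unknown"

# Hypothetical question templates, one per turn. Each template may refer to
# answers obtained in earlier turns through named placeholders.
TURNS: List[Tuple[str, str]] = [
    ("edu_at", "Which university did {person} attend?"),
    ("degree", "Which degree did {person} obtain at {edu_at}?"),
    ("start_date", "When did {person} start studying at {edu_at}?"),
]


def extract(person: str, context: str,
            answer_fn: Callable[[str, str], str]) -> List[Tuple[str, str, str]]:
    """Run the turns in order and collect (person, relation, answer) triples."""
    slots: Dict[str, str] = {"person": person}
    triples: List[Tuple[str, str, str]] = []
    for relation, template in TURNS:
        try:
            question = template.format(**slots)
        except KeyError:
            # The template needs an answer that an earlier turn left unknown.
            continue
        answer = answer_fn(question, context).strip()
        if answer.lower() == UNKNOWN:
            # No triple is emitted, and the slot stays empty.
            continue
        slots[relation] = answer
        triples.append((person, relation, answer))
    return triples


def toy_answer_fn(question: str, context: str) -> str:
    """Stub standing in for the generative QA model (assumption for illustration)."""
    if "Which university" in question:
        return "MIT"
    if "start studying" in question:
        return "1990"
    return UNKNOWN


context = "Alice studied at MIT from 1990."
print(extract("Alice", context, toy_answer_fn))
```

In this sketch the unanswered `degree` turn is simply dropped, while the `start_date` turn still runs because it depends only on the person and the university.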
Detailed analysis and discussion
Ablation study
To better understand the effectiveness of our model, we conducted ablation studies by removing the key modules individually. As shown in Table 4, removing the question prompts causes an obvious performance gap, with an average drop of 4.54% in F1, indicating that question prompt generation plays an important role in generating high-quality answers. To investigate the effect of our learning objective, we further ablated the contrastive learning loss. Model performance decreases in this setting, demonstrating that contrastive learning helps boost performance compared with the corresponding model versions without the additional objective.
Effect of contrastive loss margin
Next, we investigate the effect of the contrastive learning loss margin \(\lambda \) in our framework, which controls the balance between the generation loss and contrastive learning loss. To this end, we fine-tuned the GLM by varying the value of \(\lambda \) from 0.1 to 1.0 and measured the evaluation metric. The results are shown in Fig. 5. When \(\lambda \) is set to 0, it refers to the model with only a cross-entropy loss. Interestingly, contrastive learning always helps boost performance compared to the corresponding model versions without the additional objective. This indicates that the proposed model can generate semantically valid answers that are beneficial for training the QA model. In addition, we observed that the best performance was achieved when \(\lambda =0.6\) for the relation extraction task.
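One way to read the role of \(\lambda \) is as a weight on the contrastive term added to the generation loss. The sketch below assumes this additive form and a softmax-style contrastive term over sequence scores of the gold answer and the negative samples; the exact objective is the one defined in the method section and may differ. The function names and the example scores are hypothetical.

```python
import math
from typing import Sequence


def contrastive_term(pos_score: float, neg_scores: Sequence[float]) -> float:
    """Softmax-style contrastive loss (assumed form).

    Equals -log( exp(pos) / (exp(pos) + sum_j exp(neg_j)) ), computed with the
    log-sum-exp trick for numerical stability. It is small when the gold
    answer scores higher than the negative samples.
    """
    scores = [pos_score, *neg_scores]
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - pos_score


def total_loss(generation_loss: float, pos_score: float,
               neg_scores: Sequence[float], lam: float) -> float:
    """Generation (cross-entropy) loss plus the lambda-weighted contrastive term."""
    return generation_loss + lam * contrastive_term(pos_score, neg_scores)


# lam = 0 leaves only the cross-entropy loss; larger values weight the contrastive term more.
for lam in (0.0, 0.3, 0.6, 1.0):
    print(lam, total_loss(1.2, pos_score=2.0, neg_scores=[0.5, -1.0], lam=lam))
```

With \(\lambda =0\) the sketch reduces to the cross-entropy loss alone, which matches the baseline setting described above.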
Effect of constrained decoding
In this section, we provide a more in-depth comparison between context-constrained generation and vocabulary-constrained generation under different decoding strategies, including greedy decoding, beam search, top-k sampling, and top-p sampling. Table 5 presents the F1-score of GenRE on the Profiling-Edu dataset for each decoding strategy discussed in the “Constrained decoding for answer generation” section. The differences between the decoding strategies are marginal in the context-constrained generation case, which suggests that context-constrained generation is well suited to explicit entity–relation extraction. Additionally, we observed that beam search brings larger improvements than the other strategies in both cases and can effectively guide answer generation. This suggests that it is suitable for answer generation, particularly for implicit entity–relation extraction.
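The idea behind context-constrained generation can be sketched with a toy greedy decoder. The sketch assumes that context-constrained generation restricts each step to tokens that occur in the input context (plus “unknown” and an end-of-sequence marker), whereas vocabulary-constrained generation allows the whole vocabulary. The scoring table, the token names, and the function names are made up for illustration and do not come from the model.

```python
from typing import Callable, Dict, List, Set

EOS = "<eos>"


def allowed_from_context(context: str, extra: Set[str]) -> Set[str]:
    """Tokens permitted under the assumed context constraint."""
    return set(context.split()) | extra | {EOS}


def constrained_greedy_decode(step_scores: Callable[[List[str]], Dict[str, float]],
                              allowed: Set[str], max_len: int = 8) -> List[str]:
    """Greedy decoding that ignores every candidate outside `allowed`."""
    output: List[str] = []
    for _ in range(max_len):
        scores = step_scores(output)
        candidates = {tok: s for tok, s in scores.items() if tok in allowed}
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if best == EOS:
            break
        output.append(best)
    return output


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Made-up next-token scores indexed by the length of the prefix."""
    table = [
        {"Harvard": 0.6, "MIT": 0.3, EOS: 0.1},
        {"University": 0.5, EOS: 0.4, "unknown": 0.1},
        {EOS: 0.9, "MIT": 0.1},
    ]
    return table[min(len(prefix), len(table) - 1)]


context = "She studied at MIT in 1990"
context_allowed = allowed_from_context(context, {"unknown"})
full_vocabulary = {"Harvard", "MIT", "University", "unknown", EOS}

constrained = constrained_greedy_decode(toy_scores, context_allowed)
unconstrained = constrained_greedy_decode(toy_scores, full_vocabulary)
print(constrained, unconstrained)
```

In this toy setting the context constraint rules out a higher-scoring token that never appears in the sentence, which is one reason the choice of decoding strategy may matter less once the candidate set is restricted to the context.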
Inference of GenRE
One straightforward solution for RE is to extract explicit entity–relations from the context. However, in real-life scenarios, a sentence may also contain implicit entity–relations. To verify that our model has the ability to reason, we further conducted detailed experiments on the Profiling-Edu dataset, which includes implicit entity–relations on start-date and end-date. The results are provided in Table 6. GenRE performs best among the compared models, outperforming REBEL and UIE by 16.3% and 2.3% in F1, respectively, on implicit entity–relations. This shows that the generated answers are strongly correlated with rationales, demonstrating the inference capability of the GenRE model.
Normalization of GenRE
For information retrieval or question answering, named entities are expected to be normalized, that is, different names that refer to the same entity are mapped to a single canonical form. For example, MIT refers to Massachusetts Institute of Technology and should be normalized accordingly. However, traditional BERT-based MRC models extract entities by predicting start and end indexes, so they cannot normalize the extracted entities directly and require further operations. In contrast, the proposed generative model can normalize entities as it generates them, providing a more direct solution to this issue.
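The “further operations” needed by a span-based model can be illustrated with a small sketch. It assumes a hand-built alias table as the post-processing step; the table, the example sentence, and the function names are hypothetical. A span-based model can only return a substring of the context, so the canonical name has to be looked up afterwards, whereas a generative model could emit the canonical name directly.

```python
from typing import Dict

# Hypothetical alias table mapping lower-cased surface forms to canonical names.
ALIASES: Dict[str, str] = {
    "mit": "Massachusetts Institute of Technology",
}


def extract_span(context: str, start: int, end: int) -> str:
    """What a span-based MRC model returns: the text between two predicted indexes."""
    return context[start:end]


def normalize(mention: str, aliases: Dict[str, str]) -> str:
    """Post-processing step: map a surface form to its canonical name if one is known."""
    return aliases.get(mention.strip().lower(), mention)


context = "She graduated from MIT in 1990."
start = context.index("MIT")
span = extract_span(context, start, start + len("MIT"))

print(span)                      # surface form copied from the context
print(normalize(span, ALIASES))  # canonical form after the extra lookup
```

Mentions missing from the alias table pass through unchanged, which is the main limitation of this kind of lookup-based normalization.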
To verify the effectiveness of normalization, we conducted a detailed analysis on the Profiling-Edu dataset where the university tag was normalized, such as \(<\textit{person}, \textit{edu\_at}, \textit{univ}>\). Table 7 presents the results of the study. As can be seen, the normalization ability of REBEL and UIE is weak, and they only obtain an F1-score of 74.0% and 73.0%, respectively. This is in line with our expectations, since they cannot adequately capture context and entity affinity. The generative MRC model, which generates the answer to a given question, has more generalization and normalization capabilities, resulting in acceptable results.
Conclusion
In this study, we cast the RE task as a generative MRC task using contrastive learning. Specifically, we propose an effective generative MRC framework that generates entity–relations in multi-turn QA and a contrastive learning algorithm for efficient model learning. The experimental results show that our model can achieve competitive performance with the previous SOTA models using only coarse annotation. Based on our findings, we believe that generative modeling is highly promising and capable of implicit entity–relation inference and entity normalization.
In the future, we will explore using reinforcement learning to reward correct answers and penalize errors, with the aim of limiting error accumulation in multi-turn QA, which might improve extraction performance on the recall metric. Additionally, we will consider extending our approach to other information-extraction tasks, such as event extraction.
Data availability
The relation extraction datasets NYT*, WebNLG*, NYT, and WebNLG are accessible through the following link: https://drive.google.com/file/d/1RxBVMSTgBxhGyhaPEWPdtdX1aOmrUPBZ/view?pli=1. Additionally, the Profiling-Edu dataset is available from the corresponding author upon reasonable request.
References
Fader A, Zettlemoyer L, Etzioni O (2014) Open question answering over curated and extracted knowledge bases. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1156–1165. https://doi.org/10.1145/2623330.2623677
Gupta V, Lehal GS (2010) A survey of text summarization extractive techniques. J Emerg Technol Web Intell 2:258–268. https://doi.org/10.4304/jetwi.2.3.258-268
Riedel S, Yao L, McCallum A, Marlin BM (2013) Relation extraction with matrix factorization and universal schemas. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 74–84. https://aclanthology.org/N13-1008
Chan YS, Roth D (2011) Exploiting syntactico-semantic structures for relation extraction. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp 551–560. https://aclanthology.org/P11-1056
Lin Y, Shen S, Liu Z, Luan H, Sun M (2016) Neural relation extraction with selective attention over instances. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol 1, pp 2124–2133. https://doi.org/10.18653/v1/p16-1200
Li Q, Ji H (2014) Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp 402–412. https://doi.org/10.3115/v1/p14-1038
Ren X, Wu Z, He W, Qu M, Voss CR, Ji H, Abdelzaher TF, Han J (2017) CoType: joint extraction of typed entities and relations with knowledge bases. In: Proceedings of the 26th International Conference on World Wide Web, pp 1015–1024. https://doi.org/10.1145/3038912.3052708
Zheng S, Wang F, Bao H, Hao Y, Zhou P, Xu B (2017) Joint extraction of entities and relations based on a novel tagging scheme. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 1227–1236. https://doi.org/10.18653/v1/P17-1113
Zeng X, Zeng D, He S, Liu K, Zhao J (2018) Extracting relational facts by an end-to-end neural model with copy mechanism. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 506–514. https://doi.org/10.18653/v1/p18-1047
Wei Z, Su J, Wang Y, Tian Y, Chang Y (2020) A novel cascade binary tagging framework for relational triple extraction. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 1476–1488. https://doi.org/10.18653/v1/2020.acl-main.136
Zheng H, Wen R, Chen X, Yang Y, Zhang Y, Zhang Z, Zhang N, Qin B, Xu M, Zheng Y (2021) PRGC: potential relation and global correspondence based joint relational triple extraction. In: ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp 6225-6235. https://doi.org/10.18653/v1/2021.acl-long.486
Ren F, Zhang L, Zhao X, Yin S, Liu S, Li B (2022) A simple but effective bidirectional framework for relational triple extraction. In: WSDM 2022 - Proceedings of the 15th ACM International Conference on Web Search and Data Mining, pp 824–832. https://doi.org/10.1145/3488560.3498409
Dixit K, Al-Onaizan Y (2019) Span-level model for relation extraction. In: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pp 5308–5314. https://doi.org/10.18653/v1/p19-1525
Zhong Z, Chen D (2021) A frustratingly easy approach for entity and relation extraction. In: NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, pp 50–61. https://doi.org/10.18653/v1/2021.naacl-main.5
Wang J, Lu W (2020) Two are better than one: Joint entity and relation extraction with table-sequence encoders. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1706–1721. https://doi.org/10.18653/v1/2020.emnlp-main.133
Ren F, Zhang L, Yin S, Zhao X, Liu S, Li B, Liu Y (2021) A novel global feature-oriented relational triple extraction model based on table filling. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 2646–2656. https://doi.org/10.18653/v1/2021.emnlp-main.208
Li X, Yin F, Sun Z, Li X, Yuan A, Chai D, Zhou M, Li J (2019) Entity-relation extraction as multi-turn question answering. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 1340–1350. https://doi.org/10.18653/v1/p19-1129
Du Z, Qian Y, Liu X, Ding M, Qiu J, Yang Z, Tang J (2022) GLM: general language model pretraining with autoregressive blank infilling. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pp 320–335. https://doi.org/10.18653/v1/2022.acl-long.26
Chen D, Manning CD (2014) A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) pp 740–750. https://doi.org/10.3115/v1/d14-1082
Zeng D, Liu K, Chen Y, Zhao J (2015) Distant supervision for relation extraction via piecewise convolutional neural networks. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 1753–1762. https://doi.org/10.18653/v1/d15-1203
Zhang M, Zhang Y, Fu G (2017) End-to-end neural relation extraction with global optimization. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp 1730–1740. https://doi.org/10.18653/v1/d17-1182
Sun C, Gong Y, Wu Y, Gong M, Jiang D, Lan M, Sun S, Duan N (2019) Joint type inference on entities and relations via graph convolutional networks. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 1361–1370. https://doi.org/10.18653/v1/p19-1131
Yuan Y, Zhou X, Pan S, Zhu Q, Song Z, Guo L (2020) A relation-specific attention network for joint entity and relation extraction. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), pp 4054–4060. https://doi.org/10.24963/ijcai.2020/561
Wang Y, Yu B, Zhang Y, Liu T, Zhu H, Sun L (2020) Tplinker: single-stage joint extraction of entities and relations through token pair Linking. In: Proceedings of the 28th International Conference on Computational Linguistics, pp 1572–1582. https://doi.org/10.18653/v1/2020.coling-main.138
Wang Y, Sun C, Wu Y, Zhou H, Li L, Yan J (2021) UNIRE: a unified label space for entity relation extraction. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp 220–231. https://doi.org/10.18653/v1/2021.acl-long.19
Shang YM, Huang H, Mao XL (2022) OneRel: joint entity and relation extraction with one module in one step. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022, pp 11285–11293. https://doi.org/10.1609/aaai.v36i10.21379
Eberts M, Ulges A (2020) Span-based joint entity and relation extraction with transformer pre-training, vol 325. https://doi.org/10.3233/FAIA200321
Zeng D, Zhang H, Liu Q (2020) Copymtl: copy mechanism for joint extraction of entities and relations with multi-task learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 9507–9514. https://doi.org/10.1609/aaai.v34i05.6495
Nayak T, Ng HT (2020) Effective modeling of encoder–decoder architecture for joint entity and relation extraction. In: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, pp 8528–8535. https://doi.org/10.1609/aaai.v34i05.6374
Fu TJ, Li PH, Ma WY (2019) Graphrel: modeling text as relational graphs for joint entity and relation extraction. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 1409–1418. https://doi.org/10.18653/v1/p19-1136
Zhao K, Xu H, Cheng Y, Li X, Gao K (2021) Representation iterative fusion based on heterogeneous graph neural network for joint entity and relation extraction. Knowl-Based Syst 219:106888. https://doi.org/10.1016/j.knosys.2021.106888
Takanobu R, Zhang T, Liu J, Huang M (2019) A hierarchical framework for relation extraction with reinforcement learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 7072–7079. https://doi.org/10.1609/aaai.v33i01.33017072
Zeng X, He S, Zeng D, Liu K, Zhao J (2019) Learning the extraction order of multiple relational facts in a sentence with reinforcement learning, pp 367–377. https://doi.org/10.18653/v1/d19-1035
Levy O, Seo M, Choi E, Zettlemoyer L (2017) Zero-shot relation extraction via reading comprehension. In: Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pp 333–342. https://doi.org/10.18653/v1/k17-1034
Li X, Feng J, Meng Y, Han Q, Wu F, Li J (2020) A unified mrc framework for named entity recognition. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 5849–5859. https://doi.org/10.18653/v1/2020.acl-main.519
Du X, Cardie C (2020) Event extraction by answering (almost) natural questions. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 671–683. https://doi.org/10.18653/v1/2020.emnlp-main.49
Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2020) BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 7871–7880. https://doi.org/10.18653/v1/2020.acl-main.703
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21:1–67. https://doi.org/10.48550/arxiv.1910.10683
Paolini G, Athiwaratkun B, Krone J, Ma J, Achille A, Anubhai R, Nogueira C, Xiang B, Soatto S (2021) Structured prediction as translation between augmented natural languages. In: International Conference on Learning Representations (ICLR 2021). https://doi.org/10.48550/arXiv.2101.05779
Zhang N, Ye H, Deng S, Tan C, Chen M, Huang S, Huang F, Chen H (2021) Contrastive information extraction with generative transformer. IEEE/ACM Trans Audio Speech Lang Process 29:3077–3088. https://doi.org/10.1109/TASLP.2021.3110126
Cabot PLH, Navigli R (2021) REBEL: relation extraction by end-to-end language generation. In: Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, pp 2370–2381. https://doi.org/10.18653/v1/2021.findings-emnlp.204
Lu Y, Liu Q, Dai D, Xiao X, Lin H, Han X, Sun L, Wu H (2022) Unified structure generation for universal information extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pp 5755–5772. https://doi.org/10.18653/v1/2022.acl-long.395. https://arxiv.org/abs/2203.12277
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp 6000–6010. https://doi.org/10.48550/arXiv.1706.03762
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 4171–4186. https://doi.org/10.18653/V1/N19-1423
Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov R, Le QV (2019) Xlnet: generalized autoregressive pretraining for language understanding. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp 5753–5763. https://doi.org/10.48550/arXiv.1906.08237
Radford A, Narasimhan K, Salimans T, Sutskever I, et al (2018) Improving language understanding by generative pre-training. OpenAI. https://api.semanticscholar.org/CorpusID:49313245
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901. https://doi.org/10.48550/arXiv.2005.14165
Misra I, van der Maaten L (2020) Self-supervised learning of pretext-invariant representations. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 6707–6717. https://doi.org/10.1109/CVPR42600.2020.00674
Fang H, Wang S, Zhou M, Ding J, Xie P (2020) Cert: Contrastive self-supervised learning for language understanding. arXiv preprint arXiv:2005.12766. https://doi.org/10.48550/arXiv.2005.12766
Gao T, Yao X, Chen D (2021) SimCSE: simple contrastive learning of sentence embeddings. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 6894–6910. https://doi.org/10.18653/v1/2021.emnlp-main.552
Yang Z, Cheng Y, Liu Y, Sun M (2019) Reducing word omission errors in neural machine translation: a contrastive learning approach. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 6191–6196. https://doi.org/10.18653/v1/p19-1623
Lee S, Lee DB, Hwang SJ (2021) Contrastive learning with adversarial perturbations for conditional text generation. In: International Conference on Learning Representations (ICLR 2021). https://arxiv.org/abs/2012.07280
Riedel S, Yao L, McCallum A (2010) Modeling relations and their mentions without labeled text. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 148–163. https://doi.org/10.1007/978-3-642-15939-8_10
Gardent C, Shimorina A, Narayan S, Perez-Beltrachini L (2017) Creating training corpora for nlg micro-planning. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 179–188. https://doi.org/10.18653/v1/P17-1017
Huang H, Shang YM, Sun X, Wei W, Mao X (2022) Three birds, one stone: a novel translation based framework for joint entity and relation extraction. Knowl-Based Syst 236:107677. https://doi.org/10.1016/j.knosys.2021.107677
Acknowledgements
This work was supported by the Natural Science Foundation of Xinjiang Province [No. 2021D01C079], the National Natural Science Foundation of China [No. 62166044], and the Natural Science Foundation of Hebei Province [No. F2022203072]. The authors sincerely thank the editor and anonymous reviewers for their valuable comments and feedback.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, L., Yu, K., Wumaier, A. et al. Genre: generative multi-turn question answering with contrastive learning for entity–relation extraction. Complex Intell. Syst. 10, 3429–3443 (2024). https://doi.org/10.1007/s40747-023-01321-y
DOI: https://doi.org/10.1007/s40747-023-01321-y