1 Introduction

Event extraction [1] aims at extracting structured event records from unstructured text. For example, as shown in Figure 1, the goal of event extraction is to map the document “Two homemade pressure-cooker bombs are detonated remotely by the Tsarnaevs near the finish line of the Boston Marathon, killing three and injuring some 260 others. Seventeen people lost limbs.” to structured records of four predefined event types (highlighted in celeste), such as <event type: Attack, trigger word: detonated, role Attacker: Tsarnaevs, \(\dots \), role ExplosiveDevice: bombs, role Place: Boston Marathon>, as well as other events triggered by the words killing and injuring.
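To make the target structure concrete, the following is a minimal sketch of the first record above as a Python dictionary; the field names mirror the event schema in Figure 1, and the representation is illustrative rather than the model's exact internal format.

```python
# Illustrative target record for the "detonated" trigger in Figure 1.
# Field names mirror the event schema; the exact internal format may differ.
attack_event = {
    "event_type": "Attack",
    "trigger": "detonated",
    "arguments": {
        "Attacker": "Tsarnaevs",
        "ExplosiveDevice": "bombs",
        "Place": "Boston Marathon",
    },
}
```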

Event extraction is challenging due to the diversity of natural language expressions and the complexity of event structures. These challenges are amplified in document-level event extraction where the text is a full document and typically contains more events. Currently, most event extraction methods employ a decomposition-based approach [2], which involves breaking down the structured prediction problem of a complex event into classifications of substructures like trigger detection, entity recognition, and argument classification. Many of these methods tackle the subproblems separately, which necessitates additional annotations for each stage [3].

Fig. 1

The event extraction task. In each event schema, we delineate the event type along with its associated roles. For instance, the "Attack" event schema includes roles such as "Attacker," "ExplosiveDevice," and "Place"

Natural language generation techniques have been successfully applied to a number of NLP tasks [4,5,6]. These techniques have inspired the use of controlled event generation to tackle event extraction. Such approaches use manually designed templates to wrap input sentences and train a model for cloze-style filling. The study in [7] proposes generating linearised event records via a pretrained encoder-decoder architecture combined with a constrained decoding mechanism, which alleviates the complexity of template combination when extracting multiple events. The advantage of the extraction-as-generation approach is that it removes the need for fine-grained token-level annotations, which are typically required by previous event extraction approaches [8], making it more practical.

Although generation-based approaches generalize well on other tasks, we observe a significant drop in performance when they are applied to document-level event extraction or unseen event types. Structured prediction tasks, such as event extraction, rely on an external schema to format the output, whereas natural language generation tasks do not. To bridge this gap, we introduce a novel technique called knowledge-based conditioning, which injects event type information as prefixes into different layers of the underlying pretrained language model. By incorporating this information, we aim to improve event extraction performance. Additionally, to address the challenge of adapting to new scenarios, we consider event extraction from the perspective of zero-shot learning [9, 10]. Our model, KC-GEE, is capable of document-level event extraction and generalizes to the zero-shot setting.

Our main contributions are as follows.

  • We propose a novel knowledge-based conditioning technique that injects event type information into the model, enabling zero-shot learning capability.

  • We carefully design a prefix-based injection mechanism that incorporates cross-attention to improve document-level event extraction.

  • We conducted extensive experiments on two benchmark datasets, in both fully supervised and zero-shot settings. Our evaluation consistently shows strong performance across all settings. In particular, our model achieves substantial superiority in the challenging settings of document-level event extraction and zero-shot transfer, outperforming state-of-the-art models by up to 5.4 absolute F1 points.

2 Related work

Document-level event extraction

Event extraction is a task that extracts structured event records from unstructured text [5]. Many approaches have been proposed for sentence-level event extraction [11, 12], ranging from hand-designed features [13] to neurally learned features [14, 15]. Yet, many real-world applications require document-level event extraction [14,15,16,17,18], in which the information of an event may be spread over multiple sentences [19, 20]. Moreover, most existing work adopts decomposition strategies for event extraction [2], which employ trigger detection [13], entity recognition [21, 22], and argument classification [23]. These decomposition strategies achieve high performance but introduce more detailed annotation requirements for model training [5, 7].

Zero-shot event extraction

Several previous supervised event extraction methods have relied on features derived from manual annotations, limiting their applicability to new event types without additional annotation effort [9, 24, 25]. These methods often struggle to generalize to new label taxonomies and domains. In contrast, [26] proposes a zero-shot event extraction approach: they first utilize existing tools, such as Semantic Role Labeling (SRL), to identify events and subsequently map them to a predefined taxonomy of event types without task-specific training data. Lyu et al. [27] explore zero-shot event extraction by formulating it as a series of Textual Entailment (TE) and/or Question Answering (QA) queries; for instance, they utilize pretrained TE/QA models for direct knowledge transfer, e.g., the statement "A city was attacked" entails "There is an attack." In this paper, we propose a novel approach for zero-shot event extraction by jointly training a prefix generator for event schemas. Our method is parameter-efficient and lightweight, allowing for effective event extraction even in scenarios with limited or no training data.

Fig. 2

A comparison of KC-GEE with other prompt-based generative methods, e.g., BART-Gen [5] and Degree [6]

Generative event extraction

Generative event extraction has emerged as a promising approach for automatically extracting event information from text with generative models. Motivated by the achievements of pretrained language models and the associated natural language generation-based approaches in diverse NLP tasks [4, 28,29,30,31,32], some researchers have approached event extraction as controlled event generation. As shown in Figure 2, [5, 6] are end-to-end conditional generation methods with manually designed discrete prompts for each event type, requiring more human effort to find the optimal prompt. To remove the complexity of template combination in extracting multiple events, [7] proposed a method that generates the event records directly using a pretrained encoder-decoder architecture and a constrained decoding mechanism. This extraction-as-generation approach does not require the fine-grained token-level annotations that previous event extraction methods typically need. Liu et al. [33] propose a generative template-based event extraction method that uses dynamic prefixes, integrating context information with type-specific prefixes to learn a context-specific prefix for each context. However, this method considers neither zero-shot extraction nor document-level extraction, both of which we address in this paper.

3 Generation-based event extraction

Problem definition

We denote \(\mathcal {E}\) and \(\mathcal {R}\) as the set of predefined event types and role categories, respectively. An input sequence \(\varvec{x} := \{x_1,\ldots ,x_{\mid {\varvec{x}}\mid }\}\) comprises tokens \(x_i\), where \(\mid {\varvec{x}}\mid \) denotes the sequence length. Given an input document, an event extraction model aims to extract one or more structured events, where each event is specified by (i) the event type \(e \in \mathcal {E}\) along with the trigger word t from the document, and (ii) the roles \(\mathcal {R}_e \subseteq \mathcal {R}\) along with their corresponding arguments from the document.

Event extraction as generation

Given \(\mathcal {E}\) and \(\mathcal {R}\) in the predefined event schema, generation-based event extraction models generate a structured sequence based on an input document that is constrained by the schema [7].

The generated sequence is a linearised representation of the events mentioned in the document. Specifically, given a document with token sequence \({\varvec{x}}\) as input, a generation-based extraction model, such as KC-GEE, outputs the linearised event representation \(\varvec{y} = \langle y_1, y_2, \dots , y_{\mid \varvec{y} \mid }\rangle \), in which each event is encoded as a subsequence \(\langle e_i, t_i, \langle r_{i,1}, a_{i,1}\rangle , \dots , \langle r_{i,\mid r_i \mid }, a_{i,\mid r_i \mid }\rangle \rangle \). The angled brackets \(\langle \cdot \rangle \) are special tokens indicating the sequence structure. Here, \(e_i \in \mathcal {E}\) and \(t_i\) are the event type and the trigger word (a subspan of the document \({\varvec{x}}\)), while \(r_{i,j} \in \mathcal {R}\) and \(a_{i,j}\) denote roles and their arguments (subspans of the document \({\varvec{x}}\)).
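As a concrete illustration, the sketch below linearises the Attack record from Figure 1 into the bracketed format described above; using plain "<" and ">" characters as the structure tokens is an assumption for readability, and the real model may reserve dedicated special tokens in its vocabulary.

```python
# Hypothetical linearisation of the Attack event from Figure 1.
def linearise(event_type, trigger, role_args):
    # One bracketed segment per (role, argument) pair, nested inside the event segment.
    parts = [f"< {role} {arg} >" for role, arg in role_args]
    return f"< {event_type} {trigger} " + " ".join(parts) + " >"

y = linearise("Attack", "detonated",
              [("Attacker", "Tsarnaevs"),
               ("ExplosiveDevice", "bombs"),
               ("Place", "Boston Marathon")])
# -> "< Attack detonated < Attacker Tsarnaevs > < ExplosiveDevice bombs > < Place Boston Marathon > >"
```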

Architecture

Our KC-GEE model adopts a Transformer-based encoder-decoder architecture for event structure generation. KC-GEE outputs the sequentialized event representation \(\varvec{y}\) for an input document \({\varvec{x}}\). First, it computes the hidden representation \(\textbf{H}_{\varvec{x}} = ({\varvec{h}}_1, {\varvec{h}}_2, \dots , {\varvec{h}}_{\mid {\varvec{x}} \mid }) \in \mathbb {R}^{\mid {\varvec{x}} \mid \times d}\) for each token in the document via a multi-layer Transformer encoder:

$$\begin{aligned} \textbf{H}_{\varvec{x}} = \text {Encoder}({\varvec{x}}), \end{aligned}$$
(1)

where each layer of \(\text {Encoder}(\cdot )\) is a Transformer block [34] with the multi-head self-attention mechanism.

Given the encoding \(\textbf{H}_{\varvec{x}}\), the decoder generates each token sequentially to produce the sequence of events. At step t, the Transformer-based decoder generates the token \(y_t\) and hidden state \({{\varvec{h}}}_t\) as:

$$\begin{aligned} y_t, {\varvec{h}}_t = \text {Decoder}(y_{t-1}; \textbf{H}_{\varvec{y}_{<t}}, \textbf{H}_{{\varvec{x}}}), \end{aligned}$$
(2)

where each layer of \(\text {Decoder}(\cdot )\) is a Transformer block, with both the self-attention to past hidden states \(\textbf{H}_{\varvec{y}_{<t}}\in \mathbb {R}^{(t-1)\times d}\) during decoding and the cross-attention to the encoding \(\textbf{H}_{{\varvec{x}}}\). The conditional probability of the output sequence \(p(\varvec{y}\mid {\varvec{x}})\) is then,

$$\begin{aligned} p_{\theta }(\varvec{y}\mid {\varvec{x}}) = \prod _{t=1}^{\mid \varvec{y} \mid }p_{\theta }(y_{t}\mid \varvec{y}_{<t},{\varvec{x}}), \end{aligned}$$
(3)

where \(\theta \) denotes the parameters of the Transformer-based encoder-decoder model.
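For concreteness, the following is a minimal sketch of this teacher-forced likelihood computation using an off-the-shelf encoder-decoder PLM from HuggingFace Transformers (T5-base, the backbone used in our experiments); the document and the linearised target string are illustrative.

```python
import torch
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

document = ("Two homemade pressure-cooker bombs are detonated remotely by the "
            "Tsarnaevs near the finish line of the Boston Marathon.")
target = "< Attack detonated < Attacker Tsarnaevs > < Place Boston Marathon > >"

enc = tokenizer(document, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Teacher-forced forward pass: the decoder self-attends to y_<t and
# cross-attends to the encoder states H_x, as in (1)-(3).
with torch.no_grad():
    out = model(input_ids=enc.input_ids,
                attention_mask=enc.attention_mask,
                labels=labels)

# out.loss is the mean token-level negative log-likelihood,
# i.e. -(1/|y|) * sum_t log p_theta(y_t | y_<t, x).
print(out.loss.item())
```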

Fig. 3

A high-level illustration of three candidate knowledge-based conditioning injection paradigms for encoder-decoder models: fine-tuning, adapter-tuning, and prefix-tuning. For each tuning type, the gray blocks indicate the frozen parameters of a pretrained model, and the blue blocks indicate the trainable parameters

4 Knowledge-based conditioning in event generation

This paper investigates the best way to leverage pretrained language models (PLMs) as the backbone encoder-decoder model for event extraction. Using PLMs is now standard practice in NLP, as they lead to strong performance and generalisation.

Given a labeled training dataset \(\mathcal {D}\), we investigate the best way to specialise the PLM for the event extraction task via prefix-tuning [35]. In this section, we show how to effectively condition the generation process on the event extraction task as well as the given document.

One may specialise the underlying PLM to the event extraction task through other methods, such as fine-tuning the PLM parameters or injecting adapters to the encoder and/or decoder of the PLM (see Figure 3). Our experiments show that prefix-tuning is more effective than those methods.

Fig. 4

An illustration of our end-to-end framework KC-GEE, where the main architecture is a transformer-based encoder-decoder in the center. The lower blocks represent the conditioning construction modules for encoder and decoder, respectively. The upper blocks represent the conditioning injection modules for encoder and decoder, respectively

Our desiderata for prefix-conditioning of a PLM for event extraction are as follows: It should enable the model to be aware of (i) the candidate event schemas in the task, (ii) the specific input document, and (iii) flexible schema modifications that may occur after the model is trained in real-world settings. In what follows, we explain how we achieve these desiderata by producing prefixes for the encoder and the decoder based on the events of the task and the input document. Please refer to Figure 4 for an overview of the framework.

4.1 Encoder conditioning

We condition the encoder on the event types of the underlying event extraction task. Given the event types \(\varvec{e} = \{e_1, e_2, \dots , e_{\mid \varvec{e} \mid }\} \subseteq \mathcal {E}\) for a task, we use the encoder to obtain the encoding representation of the event types \(\textbf{H}_{\varvec{e}} \in \mathbb {R}^{\mid \varvec{e}\mid \times d}\). We then combine these event type representations through a function \(f_{enc}:\mathbb {R}^{\mid \varvec{e} \mid \times d}\mapsto \mathbb {R}^{d'}\) to create the event conditioning context, i.e.

$$\begin{aligned} \begin{aligned} \textbf{H}_{\varvec{e}} = \textrm{Encoder}(\varvec{e});\ \ {\varvec{h}}_{\varvec{e}, enc} = f_{enc}(\textbf{H}_{\varvec{e}}) \end{aligned} \end{aligned}$$
(4)

Since we assume each event type is equally probable a priori, we use the average pooling operator as \(f_{enc}\). The vector \({\varvec{h}}_{\varvec{e}, enc}\) is used by a prefix generation network \(g_{enc}\) to produce the prefix. As shown in Figure 4, the ± in \(f_{enc}(\cdot )\) indicates that event type representations can be flexibly added to or removed from the knowledge-based conditioning.
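A minimal sketch of this encoder conditioning step, assuming the event type encodings are stacked row-wise and that \(d' = d\) (i.e. no extra projection after pooling):

```python
import torch

def f_enc(H_e: torch.Tensor) -> torch.Tensor:
    """Average-pool event type encodings into one conditioning vector.

    H_e: (|e|, d) tensor of encoder representations, one row per event type.
    Because every event type is treated as equally probable a priori, adding
    or removing a type simply adds or removes a row before pooling (the ± in
    Figure 4).
    """
    return H_e.mean(dim=0)  # (d,)
```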

4.2 Decoder conditioning

We expect the representation of the input instance to help the downstream generation in the decoder. Hence, we use the representations of both the task and the input document to create a prefix for the decoder.

Specifically, let \(\textbf{H}_{{\varvec{x}}}\) denote the representation of the tokens of the input document \({\varvec{x}}\). We combine the document representation \(\textbf{H}_{\varvec{x}}\) and the task representation \(\textbf{H}_{\varvec{e}}\) through the function \(f_{dec}:\mathbb {R}^{\mid \varvec{e}\mid \times d} \times \mathbb {R}^{\mid {\varvec{x}}\mid \times d} \mapsto \mathbb {R}^{d'}\times \mathbb {R}^{d'}\) as follows,

$$\begin{aligned} {\varvec{h}}_{\varvec{e}, dec},{\varvec{h}}_{{\varvec{x}}, dec} = f_{dec}(\textbf{H}_{\varvec{e}}, \textbf{H}_{{\varvec{x}}}) \end{aligned}$$
(5)

where \(f_{dec}\) uses dot-product cross-attention, and \({\varvec{h}}_{\varvec{e}, dec} \in \mathbb {R}^{d'}\), \({\varvec{h}}_{{\varvec{x}}, dec} \in \mathbb {R}^{d'}\) are the resulting fixed-dimensional summary vectors for decoder conditioning.
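The sketch below gives one plausible instantiation of such a dot-product cross-attention pooling; it is an assumption for illustration (here \(d' = d\), and a linear projection to \(d'\) could be added), not necessarily the exact design of \(f_{dec}\).

```python
import torch

def f_dec(H_e: torch.Tensor, H_x: torch.Tensor):
    """Pool event type and document encodings into two summary vectors.

    H_e: (|e|, d) event type encodings; H_x: (|x|, d) document token encodings.
    """
    d = H_e.size(-1)
    scores = H_e @ H_x.transpose(0, 1) / d ** 0.5   # (|e|, |x|) token relevance per event type
    attn = scores.softmax(dim=-1)
    H_e_ctx = attn @ H_x                            # (|e|, d) document-aware event type reps
    h_e_dec = H_e_ctx.mean(dim=0)                   # (d,) event-side summary
    h_x_dec = attn.mean(dim=0) @ H_x                # (d,) document summary weighted by event relevance
    return h_e_dec, h_x_dec
```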

4.3 Prefix generation

We create the encoder prefix \(\textbf{Z}_{enc}\) and decoder prefix \(\textbf{Z}_{dec}\) as follows,

$$\begin{aligned} \begin{aligned} \textbf{Z}_{enc}&= g_{enc}({\varvec{h}}_{\varvec{e}, enc}) \\ \textbf{Z}_{dec}&= g_{dec}([{\varvec{h}}_{\varvec{e}, dec};{\varvec{h}}_{{\varvec{x}}, dec}]) \end{aligned} \end{aligned}$$
(6)

where \(g_{enc}:\mathbb {R}^{d'}\mapsto \mathbb {R}^{k\times \mid \textbf{H}_i\mid }\) and \(g_{dec}:\mathbb {R}^{2d'}\mapsto \mathbb {R}^{k\times \mid \textbf{H}_i\mid }\) are prefix generation networks, in which k is the length of the injected prefix and \(\mid \textbf{H}_i \mid \) is the number of parameters of the i-th injected prefix maintained in the Transformer architecture. With the injection of \(\textbf{Z}_{enc}\) and \(\textbf{Z}_{dec}\), the encoder and the decoder in (1) and (2) are modified as follows:

$$\begin{aligned} \textbf{H}_{{\varvec{x}}}&= \text {Encoder}({\varvec{x}};\textbf{Z}_{enc}) \end{aligned}$$
(7)
$$\begin{aligned} y_t, {\varvec{h}}_t&= \text {Decoder}(y_{t-1}; \textbf{H}_{\varvec{y}_{<t}},\textbf{Z}_{dec},\textbf{H}_{{\varvec{x}}}), \end{aligned}$$
(8)

where \(\textbf{Z}_{enc}\) and \(\textbf{Z}_{dec}\) can be thought of as pseudo-prefix tokens influencing the generation process [35].
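A minimal sketch of a prefix generation network g in the spirit of prefix-tuning [35]: a small MLP maps the conditioning vector to k pseudo-prefix key/value pairs for every Transformer layer. The MLP shape and the exact way the prefixes are consumed by the backbone's attention layers are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PrefixGenerator(nn.Module):
    """Map a conditioning vector to k key/value prefix slots per layer (a sketch)."""

    def __init__(self, cond_dim, k, num_layers, num_heads, head_dim):
        super().__init__()
        self.k, self.num_layers = k, num_layers
        self.num_heads, self.head_dim = num_heads, head_dim
        out_dim = num_layers * 2 * k * num_heads * head_dim  # keys and values
        self.mlp = nn.Sequential(nn.Linear(cond_dim, 512), nn.Tanh(),
                                 nn.Linear(512, out_dim))

    def forward(self, h):  # h: (cond_dim,) conditioning vector
        z = self.mlp(h)
        # (num_layers, 2, num_heads, k, head_dim): per-layer key/value prefixes
        return z.view(self.num_layers, 2, self.num_heads, self.k, self.head_dim)

# Example with T5-base-like dimensions (cond_dim = d = 768, prefix length k = 20).
gen = PrefixGenerator(cond_dim=768, k=20, num_layers=12, num_heads=12, head_dim=64)
Z_enc = gen(torch.randn(768))
```

In practice, prefix-tuning implementations feed these tensors to the attention layers as extra key/value states; the precise hook depends on the backbone implementation.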

4.4 Training and inference

We train the model by minimising the negative log-likelihood loss:

$$\begin{aligned} \theta ^* = \arg \min _{\theta } \ \sum _{({\varvec{x}},\varvec{y}) \in \mathcal {D}} -\log p_{\theta }(\varvec{y}\mid {\varvec{x}}, \varvec{e}) \end{aligned}$$
(9)

where \(\mathcal {D}\) is the training set, \(\theta ^*\) denotes the optimal parameters, \(\varvec{e} = \{e_1, e_2, \dots , e_{\mid \varvec{e} \mid }\} \subseteq \mathcal {E}\) denotes the event types of the task, \({\varvec{x}}\) is the input document, and \(\varvec{y}\) is the target event structure. As we formulate the event extraction problem as a sequence generation problem, the overall likelihood \(p_{\theta }(\varvec{y}\mid {\varvec{x}}, \varvec{e})\) factorises as follows:

$$\begin{aligned} p_{\theta }(\varvec{y}\mid {\varvec{x}}, \varvec{e}) = \prod _{t=1}^{\mid \varvec{y} \mid }p_{\theta }(y_{t}\mid \varvec{y}_{<t}, {\varvec{x}}, \varvec{e}). \end{aligned}$$
(10)

where \(y_t\) is the t-th token in the output sequence \(\varvec{y}\) and the product runs over all decoding steps.

For inference, we use constrained decoding [7].
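The sketch below shows one way to wire schema-constrained decoding into generation via HuggingFace's prefix_allowed_tokens_fn hook; the trie over valid linearised prefixes follows the spirit of [7], but its construction is dataset-specific and only stubbed out here, so this is a schematic rather than the exact mechanism.

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Token-id trie over all valid linearised prefixes (structure tokens, event type
# names, role names, and spans copied from the document). Building it is
# dataset-specific and omitted here.
trie = {}  # nested dict: token_id -> sub-trie

def allowed_tokens(batch_id, generated_ids):
    node = trie
    for tok in generated_ids.tolist():
        node = node.get(tok, {})
    # Fall back to the full vocabulary if the (empty) trie has no entry; a real
    # implementation would always return the schema-legal continuations.
    return list(node.keys()) or list(range(len(tokenizer)))

enc = tokenizer("Evidence at a makeshift morgue points to mass executions.",
                return_tensors="pt")
outputs = model.generate(enc.input_ids,
                         max_length=128,
                         prefix_allowed_tokens_fn=allowed_tokens)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```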

5 Experiments

We compare our KC-GEE model with several recent strong models, evaluating in both supervised learning and zero-shot learning settings, as well as for the document-level extraction task. Our aim is to demonstrate the greater generalizability and effectiveness of our model in these challenging scenarios.

5.1 Evaluation setup

5.1.1 Datasets

We carry out experiments on two event extraction datasets: the sentence-level dataset Automatic Content Extraction 2005 (ACE05-EN) [11] and the document-level dataset WikiEvents [20]. The statistics for both are provided in Table 1. Note that we use the official splits of the two datasets to ensure reproducibility. It is worth noting that WikiEvents presents significant challenges due to three factors. (1) Context length: each instance in ACE05-EN contains only one sentence, whereas instances in WikiEvents are documents. (2) Event density: almost every instance in ACE05-EN contains only one event, whereas multiple events can be present in one instance of WikiEvents. (3) Data scarcity: the amount of training data in ACE05-EN is more than 77 times greater than that in WikiEvents.

5.1.2 Evaluation metrics

We employ the same evaluation metrics used in previous work [7, 36] for both trigger extraction (Trig-C) and arguments extraction (Arg-C). These metrics include F1, precision, and recall.

As KC-GEE is a text generation model, we reconstruct the offset of each predicted trigger mention by scanning the input sequence token by token for the matching utterance. For argument mentions, we take the matched utterance nearest to the predicted trigger mention as the argument offset.
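A small sketch of this offset reconstruction, assuming whitespace tokenisation for readability; in practice the matching operates on the model tokeniser's output, and the helper name and signature are illustrative.

```python
def find_offset(doc_tokens, mention_tokens, anchor=None):
    """Return the start index of `mention_tokens` inside `doc_tokens`.

    If the mention occurs several times and `anchor` (e.g. the trigger offset)
    is given, pick the occurrence closest to the anchor.
    """
    n, m = len(doc_tokens), len(mention_tokens)
    hits = [i for i in range(n - m + 1) if doc_tokens[i:i + m] == mention_tokens]
    if not hits:
        return None
    if anchor is None:
        return hits[0]
    return min(hits, key=lambda i: abs(i - anchor))

doc = "Evidence at a makeshift morgue points to mass executions".split()
trigger_offset = find_offset(doc, ["executions"])                 # 8
arg_offset = find_offset(doc, ["morgue"], anchor=trigger_offset)  # 4
```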

Table 1 Statistics of the event extraction datasets used in the paper, including the numbers of event types, argument types, the type of instances, events per instance, and the number of instances in different splits

5.1.3 Baselines

We evaluate KC-GEE against three groups of baselines that use annotations of decreasing granularity: both token-level and entity-level annotation, token-level annotation only, and parallel text-record annotation.

Some methods utilise token annotations, in which each token in an instance is annotated with event labels, together with gold entity annotations, to facilitate event extraction. Joint3EE [36] is a multi-task model that jointly performs entity, trigger, and argument extraction via shared Bi-GRU hidden representations. DYGIE++ [8] is a BERT-based extraction framework that models text spans and captures within-sentence and cross-sentence context. GAIL [37] is an ELMo-based model that proposes a joint entity and event extraction framework based on generative adversarial imitation learning, an inverse reinforcement learning method. OneIE [36] introduces a classification-based information extraction system that employs global features and beam search to extract event structures.

Other methods use token-level annotation only. For instance, TANL [3] is a sequence generation-based method that tackles event extraction in a trigger-argument pipeline. Multi-task TANL is an extended version of TANL that transfers structural knowledge from other tasks. BERT-QA [38] and MQAEE [39] cast event extraction as a sequence of extractive question-answering problems.

Similar to Text2Event [7], we use parallel text-record annotation, which only requires (instance, event) pairs without expensive, fine-grained token-level or entity-level annotations. As shown in an instance of such an annotation, \(\langle \)“Evidence at a makeshift morgue points to mass executions by the Iraqi regime.”, {Type: Execute, Trigger: executions, ...}\(\rangle \), parallel text-record annotation is the least demanding and therefore the most practical annotation level. We compare our method with Text2Event [7], which introduces a sequence-to-structure generation model that addresses the missing event structure issue via constrained decoding. Since BART-Gen [5] and Degree [6] both use BART-large as the backbone model, whereas we use T5-base, and both methods require the detected event type as a prior, we list their results as a separate group distinguished from our method and Text2Event. Furthermore, we evaluate KC-GEE against zero-shot approaches on ACE05-EN [9, 10, 27].

5.1.4 Implementation details

We develop our KC-GEE method based on the T5-base pretrained language model, and train it for 50 epochs with a learning rate of 1e-4 and batch size of 8 for the supervised setting. For the zero-shot setting, we use a learning rate of 5e-5 and batch size of 16. To optimize KC-GEE, we employ label smoothing [41] and AdamW [42]. The prefix length is set to 20 for all experiments in Section 5.2.
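For reference, the optimiser setup can be expressed as follows; the weight-decay and label-smoothing values are illustrative defaults rather than values reported above, and the backbone is loaded as in the earlier sketch.

```python
import torch.nn as nn
from torch.optim import AdamW
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Supervised setting: lr 1e-4, batch size 8, 50 epochs; zero-shot: lr 5e-5, batch size 16.
# Weight decay and label-smoothing factor below are illustrative, not reported values.
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=-100)
```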

5.2 Main results

We compare our KC-GEE model in two evaluation settings: fully supervised and zero-shot. For each setting, we organise the model evaluation by the characteristics of the datasets including sentence-level (ACE05-EN) and document-level (WikiEvents).

5.2.1 Supervised setting

In this setting, each model is trained on the full training data of the respective dataset. Table 2 presents the sentence-level event extraction results on ACE05-EN. Note that except for the last block, performance numbers of all baselines are taken directly from Text2Event [7].

From the table, it can be observed that our KC-GEE model outperforms Text2Event in terms of F1 for both argument extraction and trigger extraction.

Sentence-level performance

As discussed above, among all compared models, our KC-GEE model, together with Text2Event [7], is trained on parallel text-record annotations, which represent the weakest form of supervision. In contrast, the other baseline models require token-level and entity annotations, which are more fine-grained and expensive to collect. It is expected that models trained with richer supervision would perform better. The last column of the table also shows that the better-performing models use larger pretrained language models (PLMs), such as BERT-large; the larger capacity of these PLMs also contributes to their performance.

Table 2 Experiment results for the fully supervised event extraction on ACE05-EN. PLM represents the pretrained language model used by each model. We use text-record annotation, which only provides (instance, event) pairs without expensive, fine-grained token-level or entity-level annotations
Table 3 Results for supervised learning on the document-level event extraction dataset WikiEvents

Document-level performance

Table 3 shows the performance of the baseline (Text2Event), our model KC-GEE, and its different variants for document-level event extraction on the WikiEvents dataset. Please note that BART-Gen [5] and Degree [6] rely on explicit annotations of the event type and assume event-specific templates are given for document-level argument and trigger extraction, whereas both Text2Event and our model implicitly perform event detection followed by extraction. Furthermore, the remaining models listed in Table 2 are designed for sentence-level tasks and do not support this task. To ensure a fair comparison with other methods, we report our results under identical settings, as indicated in Table 3.

The majority of document-level baselines focus only on event argument extraction on the WikiEvents dataset and do not handle event types and triggers [20, 43, 44]. In contrast, our model supports the joint extraction of both event triggers and arguments from WikiEvents.

We can observe from the table that our full model achieves the best F1 values for both argument extraction (Arg-C) and trigger detection (Trig-C) on WikiEvents. It is especially noteworthy that KC-GEE achieves significant performance advantages over Text2Event of +11.1 and +9.4 absolute F1 points for Arg-C and Trig-C, respectively.

The superiority of our model can be attributed to two design features. Firstly, our cross-attention mechanism filters event type tokens and argument tokens, allowing the model to handle long contexts better. Secondly, our knowledge-based conditioning mechanism injects event type information into the model, enabling it to learn more effectively with less data. A detailed analysis of the contribution of each model component is presented below.

Table 4 Experiment results for zero-shot learning on sentence-level (ACE05-EN) and document-level (WikiEvents) datasets

5.2.2 Zero-shot setting

We evaluate KC-GEE’s ability to generalize to unseen event types in the zero-shot setting for both sentence-level (ACE05-EN) and document-level (WikiEvents) event extraction. Specifically, for each dataset, we randomly split the instances into two subsets, Source and Target. Source contains the annotations of 23 event types, while Target retains only 10 instances for each of 10 unseen event types. In this experiment, we first pretrain each model on the Source subset and then evaluate it on the 10 new event types in the Target subset without fine-tuning.

The results for both datasets are shown in Table 4. Once again, our full model significantly outperforms the baselines. On ACE05-EN, it obtains F1 gains of 27.7 and 9.2 absolute points for Arg-C and Trig-C, respectively. On WikiEvents, the F1 gains over Text2Event are 4.4 and 5.4 absolute points for Arg-C and Trig-C, respectively. We attribute the strong zero-shot generalizability of our model to knowledge-based conditioning: by casting event extraction as a generation problem and injecting event type names, the model gains task-specific information that is especially valuable when no training instances of the target event types are available.

Table 5 The ablation study in the supervised learning setting on the ACE05-EN dataset based on T5-base

5.3 Ablation study

This section analyzes the effects of prefix encoder conditioning, prefix decoder conditioning, prefix cross-attention, and constrained decoding in KC-GEE. We designed five ablated variants based on T5-base:

  • w/o enc-cond indicates KC-GEE without prefix encoder conditioning.

  • w/o dec-cond indicates KC-GEE without prefix decoder conditioning.

  • w/o both-cond indicates KC-GEE without both prefix encoder and prefix decoder conditioning.

  • w/o const-dec discards the constrained decoding during inference and generates event structures as an unconstrained generation model.

  • w/o cross-att indicates KC-GEE without prefix cross-attention.

Table 5 shows the results of ACE05-EN on the test set for the supervised learning setting. We observe that:

  • constrained decoding helps, but only marginally;

  • prefix encoder and decoder conditioning are most effective when used together.

Furthermore, as constrained decoding restricts the generated arguments and trigger words to spans of the input, our method does not suffer from hallucination problems.

5.4 Analysis

In this section, we conduct comprehensive studies to analyze the design of our method from different perspectives.

5.4.1 Prefix length

Longer prefixes provide more knowledge-based conditioning information to the model. Table 6 summarizes model performance with different prefix lengths on the WikiEvents dataset. As shown in the table, longer prefixes improve model performance on Arg-C, while performance on Trig-C improves with increasing prefix length up to 20, after which the F1 value plateaus. However, longer prefixes require more model parameters. Therefore, we set the prefix length to 20 as a trade-off between model performance and computational efficiency.

Table 6 Zero-shot learning on WikiEvents with different prefix lengths
Table 7 Zero-shot learning on WikiEvents with different knowledge-based conditioning

5.4.2 Knowledge-based conditioning

A key contribution of our method is the introduction of knowledge-based conditioning information. We analyze this component from two perspectives: (1) conditioning information and (2) injection mechanism.

Conditioning information

In Table 7, we analyze in detail the effect of different types of knowledge-based conditioning information, fixing the prefix length at 20. As can be seen, having no knowledge-based conditioning (None) results in poor performance across the board. Injecting task-agnostic information (Pseudo token) provides noticeable gains on Trig-C. Furthermore, injecting event type information substantially improves performance on both Arg-C and Trig-C. Adding role information improves performance on Arg-C but decreases performance on Trig-C. Finally, having all three types of conditioning does not bring additional benefits.

This comparison highlights the effectiveness of knowledge-based conditioning on event type information. Additionally, incorporating role information enhances argument extraction performance, although it comes at the expense of trigger extraction.

Injection mechanism

The bottom four rows in Tables 3 and 4 display variants of our KC-GEE model, where knowledge-based conditioning information is injected in different ways, as depicted in Figure 3. Specifically, the “Adapter” variant injects knowledge-based conditioning information in an adapter layer over each Transformer layer while freezing the parameters of the underlying language model. The “Fine tuning+Adapter” variant employs adapter layers and updates the language model’s parameters. The “Prefix” variant prepends the knowledge-based conditioning vectors \({\varvec{h}}\) to each layer in the language model while keeping the language model’s parameters frozen. Finally, the “Fine tuning+Prefix” variant additionally updates the parameters of the language model. We can make the following observations from Tables 3 and 4.

As expected, updating the language model’s parameters (i.e. “Fine-tuning”) is much more effective than keeping the parameters frozen, regardless of whether the knowledge-based conditioning information is injected as adapters or prefixes.

The “Adapter” style of injection performs especially poorly on WikiEvents in both the supervised and zero-shot settings. In comparison, on WikiEvents, “Prefix” injection outperforms Text2Event in the zero-shot setting and achieves competitive performance in the supervised setting.

6 Conclusion

In this paper, we formulate the problem of event extraction as a natural-language generation task. We propose KC-GEE, a generation-based document-level event extraction technique that leverages large pretrained language models. A key component of KC-GEE is a novel knowledge-based conditioning technique that injects event type information into the model as prefixes to enable zero-shot learning capability. The cross-attention mechanism in the prefix generator also facilitates effective document handling. Extensive experiments on two benchmark datasets demonstrate the effectiveness of KC-GEE, which achieves state-of-the-art performance in document-level extraction in both fully supervised and zero-shot settings. In the challenging zero-shot setting, KC-GEE outperforms the current best model by up to 27.7 absolute F1 points. In future work, we will investigate incorporating attention mechanisms or graph-based techniques to integrate external knowledge, such as event descriptions, to further improve zero-shot event extraction performance.

7 Limitations

In this paper, we explore a new method for solving zero-shot and document-level event extraction through knowledge-based conditioning. The model's zero-shot transfer ability derives primarily from compositions of roles seen during training. This means that although the model has not seen any instances of the zero-shot event types during training, the schemas of these event types are available at the training stage. A gap therefore remains with respect to true zero-shot generalization.