Enriching news events with meta-knowledge information
Given the vast amounts of data available in digitised textual form, it is important to provide mechanisms that allow users to extract nuggets of relevant information from the ever-growing volumes of potentially important documents. Text mining techniques can help, through their ability to automatically extract relevant event descriptions, which link entities with situations described in the text. However, correct and complete interpretation of these event descriptions is not possible without considering additional contextual information often present within the surrounding text. This information, which we refer to as meta-knowledge, can include (but is not restricted to) the modality, subjectivity, source, polarity and specificity of the event. We have developed a meta-knowledge annotation scheme specifically tailored for news events, which includes six aspects of event interpretation. We have applied this annotation scheme to the ACE 2005 corpus, which contains 599 documents from various written and spoken news sources. We have also identified and annotated the words and phrases evoking the different types of meta-knowledge. Evaluation of the annotated corpus shows high levels of inter-annotator agreement for five meta-knowledge attributes, and a moderate level of agreement for the sixth attribute. Detailed analysis of the annotated corpus has revealed further insights into the expression mechanisms of different types of meta-knowledge, their relative frequencies and mutual correlations.
Keywords: Events, Annotation, Meta-knowledge, Subjectivity, Modality, Speculation
1 Introduction
The digital information era has made vast and continually growing amounts of data available in digital form. This potentially provides a very rich source of historical data for researchers. However, as the amount of data available grows, researchers face increasing difficulties in finding information that is of interest to their research questions. Simple keyword-based search systems are usually not adequate for this purpose, as researchers typically have to spend a lot of time trawling through volumes of mostly irrelevant data returned by their searches.
S1: Oscar Pistorious killed his girlfriend in Pretoria last night.
S2: Mr Pistorious told the court that he deeply regrets shooting his girlfriend.
S3: According to unconfirmed reports, Oscar Pistorious may have fatally shot his girlfriend, Reeva Steenkamp, at his residence in Pretoria.
S4: Mrs Steenkamp said that she holds Oscar responsible for the tragic events that led to her daughter’s death.
All three of the above sentences (S2–S4) are similar to S1 (and to each other), in that they all refer to the same event (i.e., the death of Reeva Steenkamp caused by Oscar Pistorious). However, the interpretation of the event is different in each sentence. S1 and S3 report the event as new or emerging information, while S2 and S4 mention it as already known or presupposed information. In S1, the information source of the event is the author herself; in S2 and S4, the source is someone involved in the event; and in S3 the information has been attributed to unknown third-party sources. The occurrence of the event is mentioned speculatively in S3, while S1, S2 and S4 report it with apparent certainty. Finally, S2 and S4 contain indications of negative sentiments towards the event, while S1 and S3 do not contain any sentiment or opinion about the event.
These examples demonstrate that merely detecting the event participants and their respective roles in the event is not sufficient; instead, additional contextual information is required for correct/complete interpretation of the event. We refer to this type of contextual information as meta-knowledge (Nawaz et al. 2010b) pertaining to the event. However, it is important to note that the term extra-propositional aspects of meaning (Morante and Sporleder 2012) can also be used to refer to similar types of information.
The ability to automatically recognise meta-knowledge information has been shown to be important for various types of Natural Language Processing (NLP) applications, including information extraction, question answering, summarisation, essay analysis and opinion mining (Wiebe et al. 2004; Riloff et al. 2005; Stoyanov et al. 2005; Webber et al. 2012). Such meta-knowledge has also been shown to improve the sophistication of event extraction systems (Miwa et al. 2012b; Chen et al. 2009; Nawaz et al. 2013a), and can provide additional filtering criteria in semantic search systems (Hirohata et al. 2008).
Building on previous work aimed at enriching biomedical events with meta-knowledge information (Nawaz et al. 2010b, 2012b), this paper describes our work on carrying out a similar type of enrichment of events within a different domain, i.e., news stories. The content of such texts, together with the types of events annotated within them, is very different from that of scientifically and academically oriented articles. Accordingly, we have made substantial changes to the annotation scheme employed, to make it more suitable for application to events concerning news. For this purpose, we took the ACE 2005 corpus (Walker et al. 2006) as our starting point, and modified and updated the annotations based on our new annotation scheme. We chose the ACE 2005 corpus because it is a well-known resource, which already contains some meta-knowledge annotations.
We have developed a new meta-knowledge annotation scheme tailored for news events, together with associated annotation guidelines. The annotation scheme comprises six meta-knowledge attributes. In relation to the original ACE 2005 annotation scheme, we have added two new annotation attributes (i.e., SUBJECTIVITY and SOURCE-TYPE) and have refined one attribute (i.e., MODALITY) by adding two new values (i.e., Speculated and Presupposed) and further specifying the definition of the existing values (i.e., Asserted and Other). We have not changed the existing values for the remaining three attributes (i.e., POLARITY, GENERICITY and TENSE). However, we have refined the annotation guidelines to further clarify the distinction between the values of these attributes.
We have annotated the entire ACE 2005 corpus according to the new annotation scheme.
We have annotated cue phrases that provide evidence for the assignment of specific attribute values.
The newly added attributes are intended to facilitate the development and/or enhancement of various NLP applications in which the ability to compare/contrast opinions or viewpoints can be important, e.g., systems that take multiple perspectives into account when carrying out summarisation (Teufel and Moens 2000) or question answering (Wiebe et al. 2003).
Evaluation of the annotated corpus has shown high inter-annotator agreement for the majority of the added/modified categories, whilst analysis of the annotated attributes has revealed various interesting patterns and correlations.
The meta-knowledge annotations and guidelines may be downloaded from http://www.nactem.ac.uk/ace-mk. The annotations are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence.
The remainder of this paper is organised as follows: Sect. 2 provides a brief introduction to event-based text mining, and further highlights the need for meta-knowledge annotation. Section 3 describes the proposed annotation scheme in detail. Section 4 describes the annotation process and evaluation. Section 5 provides a detailed discussion on the analysis of annotated attributes and values. Finally, Sect. 6 contains brief concluding remarks.
2 Background and motivation
Following on from the discussion above, this section provides a more detailed account of event-based text mining, describes the significance of meta-knowledge and its annotation at the event level, and concludes with a brief overview of the ACE 2005 corpus.
2.1 Event-based text mining
As briefly mentioned in Sect. 1, event representations aim to capture the information content of a given text by systematically linking together the entities (e.g., people, organisations, locations, etc.) with events (e.g., actions, relations, situations and states) mentioned in the text (Sauri and Pustejovsky 2009). The entities constitute the “players” (or participants) in the event and, according to the type of event being described, are linked together in different ways, with each participant playing a specific semantic role in the description of the event. For example, the event representation in Fig. 1 assigns the semantic roles of AGENT and VICTIM to the entities Oscar Pistorious and his girlfriend respectively. The event itself is also usually assigned a semantic type from a pre-defined list or ontology. For example, following the ACE 2005 event representation scheme, the event in Fig. 1 has been assigned the semantic type DIE, which is a sub-type of LIFE. Finally, central to the description of the event is a word or phrase (called the event trigger) around which the event participants are arranged. These triggers typically correspond to either verbs (e.g., S1, S2 and S3) or nouns (e.g., S4).
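The event representation described above can be sketched as a small data structure. This is only an illustrative sketch and not the ACE 2005 file format: the class name `Event` and its field names are hypothetical, and the choice of killed as the trigger, along with the PLACE and TIME fillers, is our reading of sentence S1, since the figure itself is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """A structured event: a trigger, a semantic type, and role-labelled participants."""
    trigger: str      # word or phrase around which the participants are arranged
    event_type: str   # top-level category, e.g. LIFE
    subtype: str      # specific type, e.g. DIE
    participants: Dict[str, str] = field(default_factory=dict)  # role -> entity mention


# Sentence S1: "Oscar Pistorious killed his girlfriend in Pretoria last night."
# AGENT and VICTIM follow the text; PLACE and TIME are assumed fillers.
s1_event = Event(
    trigger="killed",
    event_type="LIFE",
    subtype="DIE",
    participants={
        "AGENT": "Oscar Pistorious",
        "VICTIM": "his girlfriend",
        "PLACE": "Pretoria",
        "TIME": "last night",
    },
)

print(s1_event.subtype, s1_event.participants["VICTIM"])
```

Keeping participants as a role-to-mention mapping makes it straightforward to ask which entity fills a given role, which is the kind of lookup that semantic search over events relies on.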
The goal of event extraction systems is to automate the process of recognising events in unstructured text, and to create structured representations such as the above. These structures can be exploited by NLP systems in various ways, e.g., to assist in automatic summarisation (e.g., Liao et al. 2013) or to create semantically-based search systems (e.g., Miyao et al. 2006). Particularly in the biomedical domain, automatic event extraction has been shown to have a broad range of applications (Ananiadou et al. 2015).
Manually annotated corpora of event representations facilitate the development of automatic event extraction systems. Several such corpora have been developed, often in the context of challenges aimed at pushing forward the state of the art in event extraction. These include the MUC (Grishman and Sundheim 1996) and ACE (Strassel et al. 2008) series (primarily newswire) and the BioNLP shared tasks (e.g., Nédellec et al. 2013) (biomedical text). These challenges have stimulated the development of a wide range of event extraction systems in each domain, e.g., (Aone and Ramos-Santacruz 2000; Ji and Grishman 2008; Miwa et al. 2012a; Bjorne and Salakoski 2013).
2.2 Significance of meta-knowledge
As discussed in Sect. 1, the mere recognition of event triggers and their participants is not sufficient for correct and complete event representation. As seen in the example sentences S1–S4, contextual meta-knowledge information is often present within the text, and must be considered to interpret the event correctly. Various types of meta-knowledge information have been demonstrated to be highly relevant in news articles. The expression of different sentiments and opinions in news articles has already been widely studied, e.g., (Bautin et al. 2008; Balahur et al. 2010), because news stories are rarely reported in a neutral way (Godbole et al. 2007). The identification of information source is also very important, given that as many as 90 % of news articles can contain direct or indirect reported speech (Bergler 2006). Additionally, attribution of information to a particular source could either be done in a positive way, to bolster a claim already made in the text, or otherwise to distance the author from the attributed material, implicitly lowering its credibility (Anick and Bergler 1992).
In the past few years, several corpora annotated with certain aspects of meta-knowledge have been created. However, each effort generally has a main focus, such as the identification of information about speculation/certainty, e.g., (Rubin et al. 2006; Rubin 2010), degree of factuality, e.g., FactBank (Sauri and Pustejovsky 2009), opinions, e.g., MPQA (Wiebe et al. 2005) or temporal information, e.g., TimeBank (Pustejovsky et al. 2003). There is often some level of overlap in the types of annotations in these different corpora, since the focussed information is usually supplemented with other information that is considered relevant to correct interpretation, such as polarity (positive or negative) and information source. In addition to the types of information annotated, these corpora vary in a number of other ways, including whether or not they annotate cue expressions that provide evidence for the categories assigned, and the granularity of the textual units annotated, which may be sentences, (sub-sentence) expressions or events. Related efforts in the scientific domain (e.g., Wilbur et al. 2006; Nawaz et al. 2010a; Medlock and Briscoe 2007; Vincze et al. 2008; Light et al. 2004) identify some domain-specific features, although their annotation of features such as negation, speculation/certainty level and type of evidence/information source demonstrates the cross-domain importance of these types of information.
2.3 Meta-knowledge annotation of news events
S5: The Steenkamp family fears that Oscar Pistorious may not be found guilty of premeditated murder of Reeva Steenkamp.
S6: Mr Roux said that he was relieved that Oscar was not found guilty of premeditated murder.
The sentences S5 and S6 are similar, in that they both express the event E1 as presupposed (i.e., already known) information, and the event E2 is negated in both sentences. However, the interpretation of event E2 differs significantly between the two sentences. In S5, E2 is presented as a speculation by a source involved in the event (i.e., the Steenkamp family). Moreover, the source has expressed negative sentiment towards the possible non-occurrence of this event (as denoted by the verb fears). However, in S6, the event E2 is presented as something that has already happened. Moreover, the source (i.e., Mr Roux) has expressed positive sentiment towards the event (as denoted by the word relieved).
The above examples serve to illustrate the importance of identifying meta-knowledge at the event level. This importance has been demonstrated through the production of corpora containing one or more meta-knowledge features identified at the event level. Examples include Sauri and Pustejovsky (2009), Pustejovsky et al. (2003), Thompson et al. (2011b), and Walker et al. (2006). It has also been shown that meta-knowledge annotation at the event level can complement information annotated for coarser-grained units (Liakata et al. 2012). Such corpora could also form the basis for studying discourse structure at the event level, either by identifying discourse relations that hold between events, or by studying patterns of features that hold across sequences of events, in a similar way to the preliminary work carried out in (Nawaz et al. 2013c). Event-level discourse analysis could complement previous research into identifying discourse relations between coarser-grained units of text (e.g., Carlson et al. 2003; Marcu and Echihabi 2002; Prasad et al. 2008, 2011).
The utility of event-level meta-knowledge annotation has been demonstrated through the development of systems that have been trained to assign individual meta-knowledge attribute values to existing events (Nawaz et al. 2012a, 2013a, b) as well as fully integrated systems that are able to recognise events and multiple types of associated meta-knowledge (e.g., Ahn 2006; Miwa et al. 2012b). In terms of the performance of automatic meta-knowledge recognition, micro-averaged F-Scores generally range between around 70 and 98 %, according to the attribute being recognised.
Although, as mentioned above, there are already several corpora annotated with meta-knowledge features at the event level, these do not constitute ideal resources for training systems to assign fine-grained meta-knowledge attributes to complex event structures prevalent in news articles. For example, the GENIA-MK corpus (Thompson et al. 2011b) provides five types of meta-knowledge annotation for events occurring in biomedical abstracts. Whilst this annotation includes some domain-independent features, the large differences between the characteristics of scientific academic texts and news stories mean that even domain-independent information is usually expressed in very different ways in the two text types. In contrast, the FactBank corpus (Sauri and Pustejovsky 2009) contains news stories. However, the types of event annotated do not have the same type of complex structure that was introduced above, i.e., event participants are not identified and characterised.
2.4 ACE 2005 corpus
We chose the ACE 2005 corpus (Walker et al. 2006) as our starting point for creating and implementing a meta-knowledge annotation scheme for news events. This was motivated by the following main reasons:
Size The ACE 2005 corpus comprises 599 news articles and contains annotations for 15,382 different entities and 5,349 different events. The size of the corpus has already been shown to be sufficient to facilitate the training of a machine learning event extraction system with state-of-the-art performance (Miwa et al. 2014). A prototype, integrated system for extracting news events and associated meta-knowledge has been developed. Meta-knowledge in this system corresponds to the original attributes in the ACE 2005 corpus, as detailed below. The system has been used in the development of a semantic search system for the New York Times archive, which allows search results to be refined based upon the presence of specific event types and meta-knowledge values (Thompson et al. 2013).
Event Normalisation All events in the corpus are grounded to one of the 33 designated event types, which fall under 8 different top-level categories that are frequently reported in news stories. These top-level categories are LIFE, MOVEMENT, TRANSACTION, BUSINESS, CONFLICT, CONTACT, PERSONNEL and JUSTICE. For example, in the event representation of sentence S5 (shown in Fig. 2), event E1 has been assigned the event type DIE, which is a subtype of the event category LIFE, while event E2 has been assigned the CONVICT subtype of the category JUSTICE. For each event type, the ACE 2005 annotation scheme also specifies a potential set of semantic roles which can be instantiated by entities of specific types. For example, five semantic roles (AGENT, VICTIM, INSTRUMENT, TIME, and PLACE) are defined for the event type DIE, with type restrictions on each participant (e.g., the AGENT can only be an entity of type PERSON or ORGANISATION). The DIE event shown in Fig. 1 has four of these roles instantiated, while the DIE event in Fig. 2 only has two roles instantiated.
Owing to the fine-grained annotation, the normalisation of named entities and events, the specification of semantic roles for each event type, and the implicit restrictions on the types of entities participating in an event, the ACE 2005 corpus constitutes a highly suitable basis for developing semantically enhanced search and question answering systems. For example, such applications can potentially answer questions like, “Who was killed by Oscar Pistorious?”, and “How/when/where did Reeva Steenkamp die?”
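To make the link between role restrictions and question answering concrete, the following sketch checks a role filler against the restrictions for DIE events and answers a question of the form "Who was killed by X?" over a toy event store. It is a minimal illustration under stated assumptions: only the AGENT restriction (PERSON or ORGANISATION) comes from the text, the other roles are left unrestricted because their restrictions are not given here, and the store layout and function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Roles defined for DIE events. Only the AGENT restriction is taken from the
# text; None means "restriction not specified in this sketch".
DIE_ROLES: Dict[str, Optional[Set[str]]] = {
    "AGENT": {"PERSON", "ORGANISATION"},
    "VICTIM": None,
    "INSTRUMENT": None,
    "TIME": None,
    "PLACE": None,
}


def role_is_valid(role: str, entity_type: str) -> bool:
    """Return True if an entity of the given type may fill the given DIE role."""
    if role not in DIE_ROLES:
        return False
    allowed = DIE_ROLES[role]
    return allowed is None or entity_type in allowed


# Hypothetical in-memory store of extracted events.
events = [
    {
        "subtype": "DIE",
        "roles": {
            "AGENT": "Oscar Pistorious",
            "VICTIM": "Reeva Steenkamp",
            "PLACE": "Pretoria",
        },
    },
]


def who_was_killed_by(agent: str, store: List[dict]) -> List[str]:
    """Answer "Who was killed by <agent>?" by matching the AGENT role of DIE events."""
    return [
        event["roles"]["VICTIM"]
        for event in store
        if event["subtype"] == "DIE"
        and event["roles"].get("AGENT") == agent
        and "VICTIM" in event["roles"]
    ]


print(who_was_killed_by("Oscar Pistorious", events))
```

Because the query matches on normalised event types and roles rather than surface keywords, the same lookup would apply regardless of whether the text used killed, shot or death to describe the event.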
[Table: Distribution of annotated events across the six subparts of the ACE 2005 corpus (column: No. of events)]
Given such diversity of texts within the corpus, it provides a highly suitable test set for verification and validation of the proposed attributes and their respective categories in our annotation scheme.
Existing Meta-Knowledge Annotation The ACE 2005 corpus already includes some meta-knowledge attributes annotated at the level of events, in the form of attribute-value pairs. A brief description of each existing attribute is as follows:
POLARITY—This value is set to Negative if it is explicitly stated that the event did not take place. Otherwise the value is set to Positive. For example, referring back to sentence S5 and its event representation in Fig. 2, the polarity value for event E1 would be set to Positive, while the value for E2 would be Negative, as the word not explicitly negates the conviction event.
TENSE—The possible values for this attribute are: Past, Present, Future or Unspecified. These values are assigned according to the time that the event took place with respect to the textual anchor time (i.e., the time of broadcast or publication). Unspecified is assigned if it is not clear when the event took place or if it has taken place. For example, the value of E2 in S5 would be Future, while the value for E1 would be Past.
MODALITY—There are only two possible values for this attribute. The value is set to Asserted when the author or speaker makes reference to the event as though it were a real occurrence. In all other cases the value is set to Other. For example, the modality value for event E1 in S5 would be Asserted, while the value for E2 would be Other. This is because the death event (E1) is being described as something that has actually happened, but speculation is expressed towards the conviction event (E2).
It is hoped that these measures will reduce the number of civilian deaths.
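The attribute-value pairs described above for events E1 and E2 of sentence S5 can be written down as follows. This is a sketch only: the class name `AceAttributes` is hypothetical, and GENERICITY is omitted because its values are not listed in this section.

```python
from dataclasses import dataclass

# Value sets as described for the original ACE 2005 attributes.
POLARITY_VALUES = {"Positive", "Negative"}
TENSE_VALUES = {"Past", "Present", "Future", "Unspecified"}
MODALITY_VALUES = {"Asserted", "Other"}


@dataclass(frozen=True)
class AceAttributes:
    """Original ACE 2005 event attributes (GENERICITY left out of this sketch)."""
    polarity: str
    tense: str
    modality: str

    def __post_init__(self) -> None:
        # Reject values outside the sets defined by the annotation scheme.
        if self.polarity not in POLARITY_VALUES:
            raise ValueError(f"unknown POLARITY value: {self.polarity}")
        if self.tense not in TENSE_VALUES:
            raise ValueError(f"unknown TENSE value: {self.tense}")
        if self.modality not in MODALITY_VALUES:
            raise ValueError(f"unknown MODALITY value: {self.modality}")


# S5: "The Steenkamp family fears that Oscar Pistorious may not be found
# guilty of premeditated murder of Reeva Steenkamp."
e1_die = AceAttributes(polarity="Positive", tense="Past", modality="Asserted")
e2_convict = AceAttributes(polarity="Negative", tense="Future", modality="Other")

print(e1_die)
print(e2_convict)
```

Note that the two-valued MODALITY attribute forces the speculated conviction event into the catch-all Other value, with nothing to record that it is speculative.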
Although the above-mentioned attributes capture some aspects of event interpretation, they do not encode the subjective attitudes (pertaining to the event) that might have been expressed in the text. Similarly, the source of an event and its relative relationship to the event is not identified. Another limitation of the existing meta-knowledge annotation is that the MODALITY attribute has been designed only to identify events that have actually taken place, and there is no way to distinguish events that have speculation expressed towards them. Moreover, no distinction is made between events being reported as “new” information and those describing “old/known” information. We also noticed that there were some inconsistencies in the original annotation of the above attributes. This is further discussed in Sect. 4. Finally, the existing meta-knowledge annotations do not include the corresponding evidence for the assignment of specific values, i.e., the words/phrases often present in the text that indicate a particular aspect of meta-knowledge regarding a specific event. Accordingly, we have aimed to improve the current meta-knowledge annotation in the ACE 2005 corpus, with the ultimate goal of facilitating the training of event extraction systems that are able to recognise rich meta-knowledge to a high degree of accuracy.
3 Annotation scheme
In relation to the original ACE 2005 annotation scheme, we have made the following changes:
Added two new attributes (i.e., SUBJECTIVITY and SOURCE-TYPE).
Refined one attribute (i.e., MODALITY) by adding two new values (i.e., Speculated and Presupposed) and further specifying the definition of the existing two values (i.e., Asserted and Other).
Refined the annotation guidelines for the remaining three attributes (i.e., POLARITY, GENERICITY, and TENSE) to further clarify the distinction between the values of these attributes. We have re-annotated these three attributes, although we have not changed the original values.
We have annotated the cue words/phrases that provide evidence for the assignment of particular attribute values, and linked them to the appropriate events.
We have annotated named information sources and linked them to the appropriate events.
3.1 Source-type
This attribute aims to capture the source or origin of the information being expressed by the event. Our approach can be compared to various efforts to annotate information about attribution (e.g., Prasad et al. 2007; Pareti and Prodanof 2010; Pareti 2012a, b). All of these studies recognise the importance of identifying details about the information source, and the latter efforts specifically aim to annotate the respective text spans that correspond to the source of the information, and to the cue (i.e., the word or phrase linking the source and information). In all of the above efforts, an attribute is assigned to distinguish between different types of source, i.e., the writer, another specified agent, or an arbitrary, unspecified agent. In another study specifically targeted at news (Rubin 2010), a distinction is made between sources corresponding to direct participants and third-party experts. Taking inspiration from these previous studies, we distinguish between events that can be attributed to the correspondent/author, someone involved in the event, or some other third party. In case of third parties, we distinguish between named third party sources and unnamed third party sources (since unnamed sources are often considered less reliable than named sources). We annotate cues in all cases. Additionally, where the source is named, this is also annotated and linked to the event.
Brief descriptions of each value are as follows:
Author This value is assigned to events that are presented as information provided by the author, or as representing their own point of view. This is the default value, assigned to events unless there is any evidence for one of the other values. For example, the LIFE_DIE event reported in sentence S1 is being reported by the author (and there is no mention of any other source). Therefore, it would be assigned the Author value.
Involved This value indicates that the information expressed by the event is attributed to a specified source who is somehow involved or has close links to the actions described by the event. This may be an individual, group, government, political or terrorist organisation who is clearly involved in the event. This value is always determined through the presence of an explicit cue word or phrase, together with the name of the source. For example, consider sentences S2, S4, S5 and S6. In all four cases the source is named and is someone involved in the event.
Third-party This value indicates that the information expressed by the event can be attributed to a third party source that is not involved in the event. Third parties are always indicated by an explicit word or phrase. However, unlike involved sources, the description of third party sources can sometimes be vague, e.g., in sentence S3, the third party source is not named.
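The defaulting behaviour of the SOURCE-TYPE values can be summarised in a few lines. This is a hedged sketch rather than part of the annotation tooling: the function name and its inputs are hypothetical, the judgement of whether a source is involved is assumed to come from a human annotator or an upstream component, and the cue spans shown for S2 and S3 are illustrative guesses.

```python
from enum import Enum
from typing import Optional


class SourceType(Enum):
    AUTHOR = "Author"
    INVOLVED = "Involved"
    THIRD_PARTY = "Third-party"


def assign_source_type(cue: Optional[str], source_involved: bool = False) -> SourceType:
    """Assign a SOURCE-TYPE value from an attribution cue and an involvement judgement.

    Author is the default when no cue attributes the event to another source;
    Involved and Third-party both require an explicit cue.
    """
    if cue is None:
        return SourceType.AUTHOR
    return SourceType.INVOLVED if source_involved else SourceType.THIRD_PARTY


# S1: no attribution cue, so the event is attributed to the author.
print(assign_source_type(None).value)
# S2: attributed to Mr Pistorious, who is involved in the event.
print(assign_source_type("told the court", source_involved=True).value)
# S3: attributed to unnamed reports, i.e., a third party.
print(assign_source_type("According to unconfirmed reports").value)
```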
3.2 Subjectivity
Most news stories contain mentions of subjective opinions or attitudes towards the events being described. For example, an event that has already occurred can be praised, condoned or condemned. Similarly, a hypothetical or future event can be planned, proposed, wished for, or feared.
A broad range of different types of information can be grouped under the umbrella of “subjectivity”. For example, taking inspiration from (Banfield 1982) and linking subjectivity to “private states” (Quirk 1985), Wiebe (1994) defines subjectivity analysis as the study of linguistic expressions of opinions, sentiments, emotions, evaluations, beliefs and speculations. Whilst the implicit subjectivity of events can depend upon complex interactions between explicit subjective expressions, advantages/disadvantages for particular event participants (Wiebe and Deng 2014; Deng et al. 2013) or emotions felt by them (Russo and Caselli 2013), the nature of news texts means that it is often difficult to distinguish between finely grained sub-categories of subjectivity (Balahur et al. 2010). As such, we decided to take a relatively simple approach to subjectivity annotation, which is focussed on identifying positive and negative sentiments that are expressed towards the event by the information source. In this respect, the information encoded through this attribute is comparable to the “attitude-type” annotation in the MPQA corpus (Wiebe et al. 2005). However, we also identify cases in which multiple types of subjectivity, both positive and negative, are specified in the context of an event, by multiple information sources. Given the complexity of the complete annotation task, which involves considering various other aspects of meta-knowledge, annotation of subjectivity information has been kept intentionally simple, and is restricted to identifying explicit expressions of subjectivity towards the event as a whole by the identified information source. Such subjectivity may be expressed either through an explicit cue, or through an event trigger that expresses strong subjectivity, such as terrorism, genocide or massacre.
Brief descriptions of each possible value are as follows:
Positive This value is assigned if the information source evaluates the event as good for themselves, for social groups with whose interests they identify, or for the wider community, whether or not it could be considered harmful to others. Such events are often characterised by words indicating approval or anticipation, e.g., verbs like want and urge; adjectives like good, positive, happy and excited; and adverbs like hopefully, etc.
Negative This value applies when an event is evaluated as bad or harmful from the perspective of the source. Such events are often characterised by words indicating disapproval, apprehension, or fear, e.g., verbs like worry and fear; adjectives like bad, negative, sad and afraid; and adverbs like unfortunately, etc. Sometimes the event trigger itself also plays the role of a negative subjectivity cue, e.g., words like genocide, holocaust, massacre, ambush, etc.
Multi-valued Occasionally, two or more sources express opposite (i.e., positive and negative) sentiments about the same event. This value is used to identify such instances.
Neutral This is the default value for events with no explicit subjectivity information specified.
While President Obama was congratulating the nation, Al-Qaida issued a statement, vowing to avenge Osama’s death.
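The way the four SUBJECTIVITY values relate to the sentiments expressed by individual sources can be sketched as a small combination function. The function name is hypothetical, and the per-source labels in the example (positive for the congratulation, negative for the vow of revenge) are our reading of the sentence above, which appears to illustrate the Multi-valued case.

```python
from typing import Iterable


def combine_subjectivity(sentiments: Iterable[str]) -> str:
    """Combine per-source sentiments about one event into a SUBJECTIVITY value.

    Each input is "Positive" or "Negative"; an empty input means that no
    explicit subjectivity was expressed towards the event.
    """
    seen = set(sentiments)
    unknown = seen - {"Positive", "Negative"}
    if unknown:
        raise ValueError(f"unexpected sentiment labels: {sorted(unknown)}")
    if seen == {"Positive", "Negative"}:
        return "Multi-valued"   # opposite sentiments from different sources
    if seen:
        return next(iter(seen))  # all sources agree
    return "Neutral"             # default: no explicit subjectivity


# One source congratulates, another vows revenge for the same event.
print(combine_subjectivity(["Positive", "Negative"]))
```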
3.3 Modality
As discussed in Sect. 2.4, this attribute already existed in the ACE 2005 corpus. However, the original aim of this attribute was only to distinguish between events that have actually taken place (i.e., Asserted events) and those that are planned, anticipated or feared (i.e., Other events). We have refined the values of this attribute to further distinguish between speculated and certain events, and between events describing new and presumed information. This has resulted in the addition of two new values (i.e., Presupposed and Speculated), and the redefinition of the existing values (i.e., Asserted and Other). A brief description of each value is as follows:
Asserted This value is assigned to definite events, i.e., situations where something has actually happened or is happening. However, in contrast to the original ACE 2005 annotation scheme, we have added the additional constraint that this value is only to be assigned to events that assert new information into the discourse.
Presupposed This is a new value, assigned to definite events that describe situations that are assumed to be already known by the listener/reader, or have been previously mentioned within the discourse. This is a relatively broad definition. For example, in comparison to the classes of information status (Prince 1992), it covers both hearer-old and discourse-old events. Likewise, compared to the givenness hierarchy (Gundel et al. 1993), our definition of Presupposed includes four statuses (in focus, activated, familiar, and uniquely identifiable). We have introduced this value since, given the fast-moving nature of news events, it is important to be able to identify the “newest” part of an on-going news story.
Speculated This value is used to identify events for which there is some explicitly expressed uncertainty regarding their occurrence. Although related corpora make a greater number of distinctions with regard to certainty levels, e.g., Rubin (2007) distinguishes 5 different levels, it was found that annotators could only reach slight levels of agreement (κ = 0.15) on such a detailed scale (Rubin 2010), hence our decision to use a simpler distinction.
Other This is the default value for events that do not fit into any of the above categories.
Referring back to the sentences S1–S4, the MODALITY value assigned to the LIFE_DIE event in S1 would be Asserted, as it describes an event that has actually taken place and is being reported as new information. Even though the LIFE_DIE events in S2 and S4 describe definite occurrences, they are not being presented as new information. Therefore, they will be assigned the Presupposed value. Finally, the LIFE_DIE event in S3 is presented as a speculation; therefore it will be assigned the Speculated value.
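The assignments for S1–S4 follow a simple decision procedure, sketched below. The three boolean inputs are hypothetical stand-ins for judgements an annotator would make from the text and its cues, and checking for speculation before definiteness is an assumption of this sketch rather than a rule stated in the guidelines.

```python
def assign_modality(definite: bool, new_information: bool, speculated: bool) -> str:
    """Map three annotator judgements about an event to a MODALITY value."""
    if speculated:
        return "Speculated"   # explicit uncertainty about the event's occurrence
    if definite:
        # Definite events split on whether they add new information to the discourse.
        return "Asserted" if new_information else "Presupposed"
    return "Other"            # default for everything else


# LIFE_DIE events in the example sentences.
examples = {
    "S1": assign_modality(definite=True, new_information=True, speculated=False),
    "S2": assign_modality(definite=True, new_information=False, speculated=False),
    "S3": assign_modality(definite=False, new_information=True, speculated=True),
    "S4": assign_modality(definite=True, new_information=False, speculated=False),
}
print(examples)
```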
3.4 Polarity, genericity, and tense
Although we have not changed the existing values for these three attributes, we noticed some apparent annotation inconsistencies in the ACE 2005 corpus. Therefore, we decided to re-annotate these attributes and produced extended guidelines to facilitate this. This is further discussed in the following section.
4 Annotation process and evaluation
This section contains brief discussions on the annotation of existing attributes, the annotation of meta-knowledge cues, an overview of the annotation process, and the evaluation of the annotations produced.
4.1 Annotation of existing attributes
Whilst the original ACE annotation guidelines included only very brief information about how to annotate the existing attributes, we have produced a new set of guidelines, covering both existing and new attributes. These guidelines include more detailed explanations for each attribute and its possible values, along with examples. We have included expanded explanations for the existing attributes, as we found that the very brief original guidelines had sometimes led to inconsistent annotations in the original corpus. For example, for the TENSE attribute, the Unspecified value was sometimes assigned whenever the event trigger was not a tensed verb, e.g., words like death or war, even when the textual context of the event made clear the time of the event with respect to the textual anchor time.
In order to address the problem of existing inconsistent annotations, we decided that the task undertaken as part of the current work should include not only the annotation of the new or changed attributes, but also the review and possible update of the values of the unchanged attributes. By expanding the guidelines for these attributes, we aimed to foster a more common understanding amongst annotators of when to assign the most appropriate value, and hence to increase the consistency of the annotations. For example, we updated the guidelines to ensure that the value of the TENSE attribute reflects the time of the event according to the textual context. Additionally, by creating a full set of guidelines for all attributes, the same scheme can straightforwardly be applied to other corpora in the future.
4.2 Annotation of cue phrases
As previously mentioned, cue phrases can be helpful in identifying and characterising meta-knowledge features of text spans and/or events. Several previous studies have found that such cues can be important in the interpretation of various aspects of academic texts, e.g., 85 % of speculated statements in biology articles have been found to be conveyed through the presence of particular cue words and phrases (Hyland 1996). Other studies have found that further types of discourse-related information can also be expressed through specific cues (e.g., Rizomilioti 2006; Thompson et al. 2008). Based on these findings, we previously enriched a corpus of events in biomedical text with information about their interpretation, including the identification of cue words and phrases (Thompson et al. 2011b). Subsequent training of a system that could automatically recognise events and their interpretation found that the presence of such cues improves the accuracy of predictions made about meta-knowledge information (Miwa et al. 2012b).
Based on the above findings, we decided to identify cues in the ACE 2005 corpus as part of the annotation effort. The aim is both to improve the quality of results obtained from machine learning and to provide a means of analysing the type of language used to convey the various types of meta-knowledge information. Annotators were asked to identify any words or phrases in the same sentence as the event that provide evidence for the assignment of a specific value for one of the meta-knowledge attributes, to label them accordingly (e.g., Modality-Cue, Subjectivity-Cue, etc.) and to link these cues to the appropriate event. So, for example, in sentence S5, the word may would be annotated as a Modality-Cue, and linked to the event with the trigger guilty, as evidence for the assignment of the Speculated modality value. Similarly, in S6, said would be annotated as a SourceType-Cue and linked to the event with the trigger guilty.
Based on previous work (Thompson et al. 2011b; Vincze et al. 2008), we decided that, as a general rule, the span of the cue annotation should be the minimum unit of text which can be used to determine the correct value for the given annotation attribute. If the length of the cue is more than a single word, then the cue phrase must be a continuous span of text. This maintains consistency with the rest of the annotations in the ACE 2005 corpus, since all original annotations constitute continuous spans.
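As an illustration of the cue conventions described above, the following minimal Python sketch shows one possible way to represent an event, its meta-knowledge attributes and its linked cues. All class and function names (Span, Cue, Event, find_span) are hypothetical and are not taken from the corpus or from any tool; the example sentence is invented for illustration and is not sentence S5. Representing each cue by a single pair of character offsets means that the requirement for cues to be continuous spans holds by construction.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """A continuous stretch of text, given as character offsets."""
    start: int  # inclusive
    end: int    # exclusive

    def text(self, sentence: str) -> str:
        return sentence[self.start:self.end]


@dataclass
class Cue:
    label: str  # e.g. "Modality-Cue" or "SourceType-Cue"
    span: Span


@dataclass
class Event:
    trigger: Span
    attributes: Dict[str, str] = field(default_factory=dict)
    cues: List[Cue] = field(default_factory=list)  # cues linked to this event


def find_span(sentence: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase (hypothetical helper)."""
    start = sentence.index(phrase)
    return Span(start, start + len(phrase))


# Invented example sentence, used only to illustrate the data structure.
sentence = "He may be found guilty."
event = Event(
    trigger=find_span(sentence, "guilty"),
    attributes={"MODALITY": "Speculated"},
)
# The minimal unit of text that justifies the Speculated value is "may".
event.cues.append(Cue("Modality-Cue", find_span(sentence, "may")))
```

Under these assumptions, a multi-word cue would simply be a Span covering several adjacent tokens; a discontinuous cue cannot be expressed at all.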
4.3 Annotation process
Reviewing and possibly updating the values of existing meta-knowledge attributes (i.e., POLARITY, TENSE, MODALITY and GENERICITY).
Assigning values for the new SUBJECTIVITY and SOURCE-TYPE attributes, as well as identifying the named information source in the text, if present, and linking it to the appropriate event.
Identifying and annotating cue words/phrases that provide evidence for the assignment of particular values to each of the six attributes, if such cues are readily identifiable in the text, and linking them to the appropriate event.
The annotation was carried out with the aid of the brat annotation tool.2 This was chosen for a number of reasons. Firstly, it is very simple to use. Secondly, it provides support to display the complex event structures that are annotated in the ACE 2005 corpus. Finally, it is web-based and requires no installation, meaning that annotators can straightforwardly complete their tasks in any location where they have Internet access.
4.4 Corpus evaluation
During its development phase, the annotation scheme was tested and refined through an iterative process, in which two annotators with computational linguistics expertise annotated a common set of documents, and then compared and discussed the results. This process was particularly useful in highlighting the need to re-annotate the existing attributes in the ACE 2005 corpus.
Agreement rates for annotated discourse attributes
Table 2 shows that there are variations in agreement, according to the attribute being annotated. In terms of the interpretations of Kappa provided by Viera and Garrett (2005), the agreement achieved for the GENERICITY and POLARITY attributes is “almost perfect”; for TENSE, MODALITY and SUBJECTIVITY, agreement is “substantial”; and for SOURCE-TYPE, the agreement level is considered “moderate”. Therefore, the levels of agreement achieved can be considered acceptable in all cases.
It is perhaps unsurprising that the attributes that achieve the highest levels of agreement are the ones that were already present in the ACE 2005 corpus, since the task for these attributes was mainly to review the existing values according to the updated guidelines. However, it should also be noted that although two new values were added to the MODALITY attribute, and the definitions of existing values were changed, “substantial” agreement was still achieved. Although the agreement for SUBJECTIVITY is about 0.15 lower than for MODALITY, this is still considered to be “substantial” agreement. We consider this to be an encouraging result, given the complexity of the task, i.e., the potential subtlety of the ways in which positive or negative subjectivity can be expressed, and the variety of the types of cues that can be used. The wide range of vocabulary used in subjective expressions has been confirmed by other efforts that have annotated this type of information, e.g., (Wiebe et al. 2005; Kessler et al. 2010). The fact that these studies report similar levels of agreement to ours, in terms of the identification of subjective expressions and/or their linking to target expressions, serves to emphasise the complexity of tasks that involve subjectivity identification.
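For readers unfamiliar with the Kappa statistic, the following Python sketch shows how agreement of this kind can be computed and interpreted. It assumes Cohen's kappa for two annotators, which is the variant discussed by Viera and Garrett (2005); the function names and the toy label sequences are illustrative only and are not drawn from the corpus. The interpretation bands follow the scale given by Viera and Garrett.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of items given identical labels.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Map a kappa value to the bands of Viera and Garrett (2005)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa > 0.0:
        return "slight"
    return "less than chance"


# Toy MODALITY labels from two hypothetical annotators (not corpus data).
annotator_1 = ["Asserted", "Asserted", "Speculated",
               "Presupposed", "Asserted", "Speculated"]
annotator_2 = ["Asserted", "Presupposed", "Speculated",
               "Presupposed", "Asserted", "Asserted"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"{kappa:.3f} ({interpret(kappa)})")
```

In this toy case the annotators agree on four of six items, but roughly a third of that agreement would be expected by chance, which is why kappa is considerably lower than the raw agreement rate.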
Agreement rates for cue phrases
Agreement (positive specific agreement)
As shown in Table 3, there is a high degree of consensus between the annotators about which cue phrases to annotate. We found that disagreements may occur if there are multiple possible cues for a given dimension in a sentence. The relatively small difference in agreement rates between exact and relaxed spans illustrates that sufficient guidance was given to annotators regarding the extent of text to mark up as a cue.
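The distinction between exact and relaxed span matching can be sketched as follows. Positive specific agreement is computed as 2a / (2a + b + c), where a is the number of cues marked by both annotators, and b and c are the numbers marked by only one annotator or the other. The sketch assumes that a relaxed match means any overlap between two spans carrying the same cue label; this definition, the greedy one-to-one matching and the function names are our assumptions for illustration, and the toy cues are not corpus data.

```python
def overlaps(span_1, span_2):
    """True if two (start, end) character spans share at least one character."""
    return span_1[0] < span_2[1] and span_2[0] < span_1[1]


def positive_specific_agreement(cues_a, cues_b, relaxed=False):
    """Cues are (label, start, end) tuples; returns 2a / (2a + b + c)."""
    def is_match(x, y):
        if x[0] != y[0]:  # cue labels must agree in both modes
            return False
        if relaxed:
            return overlaps(x[1:], y[1:])
        return x[1:] == y[1:]

    unmatched_b = list(cues_b)
    both = 0
    for x in cues_a:
        for y in unmatched_b:
            if is_match(x, y):
                both += 1
                unmatched_b.remove(y)  # each cue is matched at most once
                break
    only_a = len(cues_a) - both
    only_b = len(unmatched_b)
    denominator = 2 * both + only_a + only_b
    return 2 * both / denominator if denominator else 1.0


# Toy cue annotations for one sentence from two hypothetical annotators.
cues_1 = [("Modality-Cue", 3, 6),
          ("Subjectivity-Cue", 20, 26),
          ("SourceType-Cue", 40, 44)]
cues_2 = [("Modality-Cue", 3, 6),
          ("Subjectivity-Cue", 18, 26)]

exact_psa = positive_specific_agreement(cues_1, cues_2)
relaxed_psa = positive_specific_agreement(cues_1, cues_2, relaxed=True)
```

Here the two annotators chose slightly different extents for the subjectivity cue, so it counts as a match only under relaxed matching. A small gap between the two scores over a whole corpus indicates that such boundary disagreements are rare.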
4.5 Annotation challenges and resolution
As the above results show, the main annotation challenges were encountered for the SUBJECTIVITY and SOURCE-TYPE attributes. The majority (71 %) of SUBJECTIVITY disagreements in the double-annotated part of the corpus involved discrepancies between the Negative and Neutral values. Further investigation and discussion of these revealed that in most of these cases, one or other of the annotators had failed to notice the negative subjectivity. In the consolidated corpus, most of these cases were thus agreed upon as instances of negative subjectivity. To give some idea of the complexity of identifying subjectivity cues, 324 unique negative subjectivity cues and 179 unique positive subjectivity cues were annotated in the whole corpus. On average, each negative subjectivity cue is associated with 1.84 events, and each positive subjectivity cue is associated with 1.78 events. This demonstrates that there are few “typical” ways of expressing positive or negative subjectivity, which makes the annotation task more difficult.
The most commonly occurring negative subjectivity cue, terrorism (which also functions as an event trigger), appears only 18 times in the entire corpus. In comparison, for the Speculated value of the MODALITY attribute, each unique cue is, on average, used almost three times more frequently than positive or negative subjectivity cues. Furthermore, the most commonly occurring cue for Speculated events (i.e., if) occurs 87 times in the corpus, i.e., around five times more frequently than terrorism.
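Statistics of this kind (number of unique cues, average number of events per unique cue, and the most frequent cue) can be derived from a list of cue-to-event links. The Python sketch below uses a hypothetical representation in which each link is a (cue text, event identifier) pair; the function name and the toy links are illustrative only. The final lines check the ratio between the two frequencies quoted in the text.

```python
from collections import Counter


def cue_statistics(links):
    """links: iterable of (cue_text, event_id) pairs for one attribute value."""
    counts = Counter(cue_text.lower() for cue_text, _ in links)
    unique_cues = len(counts)
    total_links = sum(counts.values())
    average = total_links / unique_cues if unique_cues else 0.0
    most_common = counts.most_common(1)[0] if counts else None
    return unique_cues, average, most_common


# Toy links between negative subjectivity cues and events (not corpus data).
toy_links = [
    ("terrorism", "E1"), ("terrorism", "E7"), ("Terrorism", "E9"),
    ("deadly", "E2"), ("genocide", "E3"), ("bloody", "E4"),
]
unique_cues, average, most_common = cue_statistics(toy_links)

# Frequencies reported in the text: "if" (87) versus "terrorism" (18).
ratio = 87 / 18
print(f"{ratio:.2f}")  # consistent with "around five times"
```

A low average number of events per unique cue, as in the toy data, corresponds to a long tail of cues that are each used only once or twice.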
For the SOURCE-TYPE attribute, which has the lowest levels of agreement, 158 of the 173 disagreements (91 %) were found to be cases where one of the annotators had assigned the Author value, while the other annotator had assigned either the Involved or Third Party value. An examination of these disagreements showed that they were mostly annotation errors, in which one of the annotators had missed the fact that the information was explicitly stated as having come from a source other than the author. Such information was frequently missed when a short phrase such as X said was placed at the end of the sentence and far removed from the actual event. The nature of this type of error meant that nearly all occurrences could be agreed upon and corrected in the consolidated version of the corpus. However, it is worth noting that there were very few instances (15 in total) where the two annotators disagreed on whether to assign Involved or Third Party to events with a Source other than Author.
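A breakdown of disagreements such as the one above can be obtained by tallying the pairs of conflicting values assigned by the two annotators. The following Python sketch assumes two parallel lists of SOURCE-TYPE labels, one per annotator; the function names and the toy labels are hypothetical. The last line reproduces the percentage quoted in the text from the two reported counts.

```python
from collections import Counter


def disagreement_breakdown(labels_a, labels_b):
    """Count disagreements as unordered pairs of conflicting values."""
    return Counter(
        frozenset((x, y)) for x, y in zip(labels_a, labels_b) if x != y
    )


def share_involving(breakdown, label):
    """Proportion of all disagreements in which one annotator chose label."""
    total = sum(breakdown.values())
    hits = sum(n for pair, n in breakdown.items() if label in pair)
    return hits / total if total else 0.0


# Toy SOURCE-TYPE labels from two hypothetical annotators (not corpus data).
annotator_1 = ["Author", "Author", "Involved",
               "Third Party", "Author", "Involved"]
annotator_2 = ["Involved", "Author", "Author",
               "Involved", "Third Party", "Involved"]

breakdown = disagreement_breakdown(annotator_1, annotator_2)
author_share = share_involving(breakdown, "Author")

# Counts reported in the text: 158 of 173 disagreements involve Author.
reported_percentage = round(100 * 158 / 173)
```

Treating each conflicting pair as unordered means that it does not matter which annotator assigned Author; only the type of confusion is counted.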
5 Annotation analysis
In this section, we present a discussion and analysis of the complete, updated ACE 2005 corpus annotation, considering each of the six annotated attributes separately. In each case, we consider statistics from the corpus as a whole, and also its subparts, i.e., BN (Broadcast News), BC (Broadcast Conversation), CTS (Conversational Telephone Speech), NW (Newswire), UN (Usenet Newsgroups/Discussion Forums) and WL (Weblogs).
Almost half of the events (around 47 %) correspond to the newly introduced values (i.e., Speculated or Presupposed). This provides strong evidence that our decision to include these categories was well-motivated, since these types of information occur frequently, but were not distinguished in the original version of the ACE 2005 corpus.
Statistics for the modality attribute (total counts and percentages)
In more informal interactions, the proportion of Speculated events becomes higher than the average over the complete corpus, since the focus is on discussing, interpreting and speculating about current affairs. Indeed, the percentage of Speculated events rises as high as 46.6 % in the UN texts, where there are around 10 % more Speculated events than Asserted events. This is in contrast to news reports (BN and NW), where speculation levels are very much lower, and are less than twice as numerous as Asserted events. It is interesting to note, however, that even these proportions of speculated events are still considerably higher than in scientific academic texts. In (Thompson et al. 2011b), it was found that only 8.1 % of events in abstracts of biomedical articles showed any degree of uncertainty. However, academic abstracts are a very different type of text, where authors mostly want to try to present their most certain results, in order to convince the reader of the validity of their work. News reports, on the other hand, aim to present the most relevant and up-to-date details about a particular story. This may include some less reliable, unverified information or rumours, possibly coming from multiple sources. It is important that such information is explicitly flagged as being uncertain, in order to retain credibility in the case that any of the information reported is later contradicted, when new details about the story are obtained.
In the corpus as a whole, just over one-sixth of the events are Presupposed, with proportions in the sub-parts ranging between about 14 and 23 %. In news reports, the reader/listener’s attention is held by ensuring that the majority of the report asserts new details. In a smaller number of cases, events that are already known about may be mentioned, in order to provide updates, or to provide context or background information about the news stories. In parts of the corpus concerned with discussions of news stories, the introduction of previously known information can also be important, as a stimulus for subsequent discussion, interpretations and evaluations of news stories.
Cues for speculated modality
The high occurrence of words such as if, would and whether provides evidence that many speculated events occur within hypothetical contexts. Other events may occur in the context of questions (indicated by what), while modal auxiliaries such as could, may and can, together with related adverbs such as likely, show that there are also instances where the speculation relates to a degree of uncertainty about the truth of the event. Verbs that denote personal opinions, such as believe and think, tend to be more prevalent in the more informal text types, with the more formal or impersonal modal auxiliaries occurring with higher frequencies in news reports.
Statistics for the Subjectivity attribute (total counts and percentages)
Cues for positive and negative subjectivity
So, for example, terrorism or terrorist attacks will be used instead of the more neutral attacks, and genocide will be used instead of killing. Another way of intensifying the negative sentiments invoked by the mention of an event is to use strongly negative adjectives and adverbs, such as deadly. Examining the most commonly occurring negative subjectivity cues specifically for news reports reveals more of these, such as fierce, bloody and horribly. A further method is to use verbs with negative connotations as a means of reporting what people have said, the most commonly occurring examples including threaten, condemn, warn and deny.
The multi-valued subjectivity category, i.e., cases where events are reported with conflicting subjectivity values ascribed to the event by two (or more) different sources, is used very rarely, constituting less than 0.5 per cent of all events in the corpus. Nevertheless, the recognition of such cases may still be worthwhile, since they would be of interest to researchers looking for contradictory and opposing opinions.
Distribution of positive subjectivity events amongst different modalities
Distribution of negative subjectivity events amongst different modalities
The percentages of events with Negative subjectivity are highest for Presupposed events in NW, where almost a quarter of such events have negative subjectivity expressed towards them, and WL, where the proportion rises to almost one-third. In NW, this could be due to the sensationalist nature of news stories, as explained above. In WL, writers are likely to express their own strong opinions. Interestingly, the proportion of Presupposed events with Negative subjectivity in the other type of news reports, i.e., BN, is less than half that in NW. Indeed, in general, there seems to be a lesser tendency to express negative subjectivity on Presupposed events in speech than in writing.
Statistics for the source-type attribute (total counts and percentages)
Looking at the individual parts of the corpus reveals that the explicit identification of information source is particularly prevalent in newswire text, where events attributed to a particular source other than the author account for about 35 % of all events. The ratio of Involved to Third Party events remains about the same as the average over the complete corpus (i.e., about 2:1). Whilst a similar ratio holds for the other part of the corpus that constitutes news reports (i.e., BN), the absolute proportions of events with a SOURCE-TYPE other than Author are much lower in BN than for newswire, constituting only about 13 % of all events in this portion of the corpus. That is to say, events attributed to non-author sources are only about one-third as numerous as in newswire texts. Thus, there seems to be a noticeable divergence in the norms of how news is reported in speech or in writing.
The proportions of events with a non-author source are much lower in the parts of the corpus that contain discussions. Whilst in BC (which is from the CNN channel), the proportion is not much lower than for broadcast news (around 10 %), this falls to about 7 % in discussion groups, and only 1.3 % in conversational telephone speech. In contrast, in WL, the proportion is quite high (about 18 % of all events), with roughly equal numbers of Involved and Third Party events. This may be due to weblogs covering a topic in detail, and from multiple points of view.
Statistics for the polarity attribute (total counts and percentages)
Distribution of negated events amongst different modalities
Statistics for the Genericity attribute (total counts and percentages)
We observed that, on the basis of our detailed annotation guidelines, we were able to identify almost 200 more Specific events than were annotated in the original ACE 2005 corpus. This finding supports our decision to re-annotate the GENERICITY attribute.
Statistics for the Tense attribute in the corpus (total counts and percentages)
For events with Unspecified tense, there is quite a large amount of variation in the different parts of the corpus, ranging from 12.7 % in the BN portion to 34.1 % in UN. The proportions of Unspecified events correlate closely with the proportions of Generic events, which seems reasonable: discussions about generally occurring or habitual events are much less likely to be associated with tense information.
It is important to note that the overall number of Unspecified events in the updated corpus is almost half of that in the original ACE 2005 corpus. Therefore, the re-annotation of the values of the TENSE attribute was worthwhile.
In this paper, we have discussed how meta-knowledge information has a significant impact on the interpretation of events. Therefore, the automatic recognition of such information is important to allow the development of sophisticated and accurate NLP systems. We took the ACE 2005 corpus as our starting point, whose annotation scheme identifies events and encodes some basic aspects of event interpretation. We subsequently extended this scheme to encode a number of other aspects of meta-knowledge, by considering both domain-independent and domain-relevant features of news-related text. We created new annotation guidelines and enriched all 5349 events in the ACE 2005 corpus according to this scheme.
Our annotation effort has not only added new meta-knowledge attributes to the events, but has also identified textual evidence for their assignment (i.e., cues), which has previously been shown to be important for the automated recognition of meta-knowledge information. We verified the soundness and robustness of the scheme through double-annotation of a portion of the corpus and subsequent calculation of inter-annotator agreement, which ranged from 0.530 to 0.871 κ, according to attribute. Subsequent discussion and investigation of the attributes with lower levels of agreement showed that the majority of discrepancies corresponded to systematic errors that were straightforward to correct.
We performed an analysis of the corpus, both as a whole and by considering the parts collected from different data sources separately. This analysis revealed a number of interesting differences in the meta-knowledge features of events, according both to the formality of the setting (e.g., formal news reports versus more informal discussions of news stories) and to whether the material is written or spoken.
As further work, we are developing a machine learning system that makes use of the enriched meta-knowledge information and associated cues to predict richer information relating to the interpretation of events. This will be used in the development of an enhanced version of our semantic search system over news archives.
The work described in this article was supported by the JISC-funded ISHER project and the AHRC-funded Mining the History of Medicine project.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.