1 Introduction

Journalism relies more and more on computers and the Internet [28]. Journalistic platforms such as NewsReader [51], SUMMA [13], and Reuters Tracer [25] are designed to continuously harvest potentially news-related information and make it useful for journalists. More general news platforms such as Event Registry [24] and the GDELT project (footnote 1) offer similar services to a wider audience.

Journalistic knowledge platforms must be able to cope with information that arrives from a wide variety of sources and in a wide variety of formats. Knowledge graphs [45] and related semantic technologies [3] appear well suited for this task because they have been developed specifically for integrating, enriching, and processing factual information. We envisage journalistic knowledge graphs that continuously integrate new factual information from both news and pre-news sources; enrich it with reference and other contextual information; and prepare it for journalistic use.

This paper explores how journalistic knowledge graphs can be augmented with support for news angles, a concept that refers to how journalists make news events interesting for an audience [9], for example, by emphasising and including certain facts about an event over others. Although finding good news angles on unfolding events is a central skill for journalists, news angles remain a journalistic practice more than a theoretical concept. To our knowledge, news angles have not been studied at a deeper, structural level, for example, from a knowledge representation and reasoning perspective. We therefore seek to formalise news angles in order to develop an ICT platform that can help journalists with tasks such as: detecting newsworthy events quickly and precisely; identifying appropriate angles on those events; and contextualising those angles with suitable background information. Examples of angles are local person, conflict, triumph over adversity, and fall from grace. Some of them are more detailed versions of others, such as David-versus-Goliath, a subtype of conflict. We will encounter more examples soon.

Specifically, the paper will propose OWL ontologies [3] that can be used to organise knowledge graphs that support journalistic news angles. Our central aim is to identify the concepts and relations that such knowledge graphs must capture. While there are many related ontologies available, we will argue that none of them satisfy our needs completely. We ask: how can ontologies be used to organise journalistic knowledge graphs and augment them to support news angles? We answer this research question by working through an archival example of a real news event. We present detailed knowledge graphs and ontologies that can be used to represent news items, events, and angles. We thereby also shed light on the development of knowledge graph-driven software systems, an increasingly important type of system that is driven by models on two levels: on the operational (or run-time) level, knowledge graphs represent individuals along with their types and their relations; on the development (or software engineering) level, ontologies represent relations between and constraints on resource types.

The rest of the paper is organised as follows: Sect. 2 reviews related work, and Sect. 3 outlines our research method. Section 4 discusses the concept of news angles, and Sect. 5 identifies central concepts and relations and presents ontologies to represent them. Section 6 discusses our proposal and compares our ontologies to existing ones, and Sect. 7 concludes the paper and outlines paths for further work.

2 Background

2.1 Computational journalism

Thurman [47] considers computational journalism to be “the advanced application of computing, algorithms, and automation to the gathering, evaluation, composition, presentation, and distribution of news”. Computational journalism can aim either to support journalists or to automate journalism. Examples of automation include robot writing [23]. In contrast, this paper aims to support journalists by relieving them from much of the low-level work of collecting, checking, and organising facts.

In either variant, computational journalism relies increasingly on machine learning, natural language processing (NLP), and other artificial intelligence (AI) techniques. Miroshnichenko [31] identifies four uses of AI for journalism: data mining, topic selection, commentary moderation, and news writing. This paper aims to support data mining and topic selection in particular, by applying knowledge graphs to represent the contents of news items and the events they describe. The knowledge graphs are extracted from news items using NLP techniques and ontologies.

2.2 News Hunter

In collaboration with Wolftech Broadcast Solutions, a developer of TV news production software for the international market, our research group is developing News Hunter, a knowledge graph-based journalistic knowledge platform [4, 33]. News Hunter has been designed to continuously: harvest potentially news-related information from a variety of sources; integrate the information; enrich it with additional information from encyclopaedic and other reference sources; organise it for journalistic use; and provide potentially relevant information to journalists or the general audience, whether passively on demand or proactively through event detection. Section 5.1 will review the evolving News Hunter architecture in more depth.

This paper builds on previous papers about News Hunter that: give an overview of the earlier News Hunter prototypes (which did not support angles) [4]; discuss the concept of news angles and outline a suitable big-data architecture [9]; investigate reasoning approaches for finding suitable news angles [46]; and discuss how angles can help formalise the concept of newsworthiness [2]. It is based on a paper presented at EMMSAD 2019 [34], which it extends in several ways by: reviewing related work more extensively; discussing annotation confidence, relevance, and strength in more detail; reviewing related ontologies more thoroughly; and showing how our proposal goes beyond existing ontologies.

2.3 Journalistic knowledge graphs

Knowledge graphs represent factual information as triples of subjects, predicates, and objects. Each subject is a material or conceptual resource, such as a person, an organisation, a place, a piece of information, or a concept. A predicate expresses a relation between the subject and the object, for example that an organisation employs a person, that a place has a name, or that a piece of information is about a concept. Hence, an object can be either a material or conceptual resource like a subject, or it can be a literal value, such as a string, number, time, or date. When a knowledge graph is represented in the Resource Description Framework (RDF) [3], its subjects and predicates are represented as standardised Internationalised Resource Identifiers (IRIs), and its objects are represented either as IRIs or literals. Because the same IRI can be the subject and object of many triples, the facts form a directed graph with subjects and objects as nodes and predicates as arrows.
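The triple structure described above can be illustrated with a minimal Python sketch. It is not an RDF implementation: the `ex:` names are hypothetical stand-ins for full IRIs, and the `Literal` wrapper is an assumption introduced here only to tell literal values apart from resources.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A literal object value (string, number, date, ...), as opposed to an IRI."""
    value: object


# Hypothetical example facts; the ex: prefixed names stand in for full IRIs.
triples = [
    ("ex:AcmeOrg", "ex:employs", "ex:Alice"),       # an organisation employs a person
    ("ex:Alice", "ex:livesIn", "ex:Bergen"),        # the same IRI as object and subject
    ("ex:Bergen", "ex:name", Literal("Bergen")),    # a place has a name (literal object)
    ("ex:Report42", "ex:isAbout", "ex:Alice"),      # a piece of information is about someone
]


def outgoing(facts):
    """Index the graph by subject: subject -> list of (predicate, object) arrows."""
    index = defaultdict(list)
    for s, p, o in facts:
        index[s].append((p, o))
    return index


def resources(facts):
    """All resource nodes: every subject plus every object that is not a literal."""
    found = set()
    for s, _, o in facts:
        found.add(s)
        if not isinstance(o, Literal):
            found.add(o)
    return found
```

Because `ex:Alice` occurs as the object of two triples and the subject of a third, following `outgoing(triples)` from `ex:AcmeOrg` leads on to `ex:Bergen`, which is what makes the set of facts a connected directed graph.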

Fig. 1 A small knowledge graph describing that Hassan Ali Khayre, who has previously lived and worked in Norway, has been appointed prime minister of Somalia


Example

Figure 1 depicts a small knowledge graph of 6 resources (nodes) and 9 triples (arrows). Listing 1 shows the same graph in the more detailed Turtle notation we will use in the rest of the paper. The graph describes that President Mohamed Farmajo has appointed Hassan Ali Khayre as prime minister of Somalia. Khayre is a dual citizen of both Somalia and Norway. He has a geographical relation to Norway because he has resided in Vestre Slidre and worked for the Norwegian Refugee Council. Each resource in the graph is represented using a standard IRI defined by DBpedia (the dbr: prefix), and each predicate is represented using either a DBpedia IRI (dbo:), the Friend-of-a-Friend vocabulary (foaf:based_near), or the RDF version of WordNet (where wn:02481345-v represents a specific sense of the verb “appoint”). The graph can easily be expanded with related facts from DBpedia, Wikidata, and other data sets available in the Linked Open Data (LOD) cloud [5], a vast distributed repository of knowledge available as RDF graphs that use standard IRIs. More advanced knowledge graphs can also represent details such as the sources of facts and the time intervals during which they are valid (footnote 2).

Whereas a prime minister appointment in Somalia might not warrant prominent mention in national news outside Eastern Africa, the connection to the Norwegian Refugee Council and to Vestre Slidre means that the core facts represented in Fig. 1 are newsworthy in Norway too. But this connection may not be easy to detect for journalists who are not knowledgeable about both countries. The rest of the paper will therefore investigate how the event in Fig. 1 could potentially have been discovered through a journalistic knowledge graph augmented with support for a news angle such as local person.

3 Research method

Because our research on News Hunter is exploratory and involves technology development, we have framed it as design science [17, 18], investigating journalistic knowledge platforms by developing a series of proof-of-concept prototypes [4, 33]. The gist of design science research is to advance theory and improve practice by incrementally developing and evaluating one or more research artefacts. We have focussed on two artefacts: an architecture (a high-level structure of system components) and a series of instantiations (a situated implementation in a specific environment) of that architecture in the form of prototypes [49]. Hence, the ontological constructs and models and the design principles behind our architecture form a “[n]ascent design theory—knowledge as operational principles/architecture” [15]—that can be explored in further research.

Fig. 2 An example tweet written in Somali, announcing that President Farmajo has appointed Hassan Ali Khayre as the new prime minister of Somalia

To investigate how journalistic knowledge graphs can be augmented with support for news angles, we build on our earlier discussions of the concept of news angles [9]. In this paper, we proceed to work through an archival example of a real news event, developing detailed ontologies and example knowledge graphs that can be used to represent a news item and a matching news angle. For this purpose, we must first identify the central concepts and relations to build our ontologies around. Having done this, we can proceed to link our central concepts to and extend our ontologies with concepts and relations from existing ontologies. We must also ensure that our ontologies support reasoning over knowledge graphs. We will validate our proposal through the running example, by showing that the ontologies we propose indeed provide the concepts necessary to represent item sub-graphs that can be collated into event graphs that can in turn be matched with represented news angles. When work on the platform has proceeded further, we plan to perform more extensive evaluations with larger data sets of automatically lifted news items, automatically detected news events, and automatically matched news angles.

4 What is a news angle?

In their daily work, journalists need to find a way of presenting a newsworthy event that attracts readers or viewers. A news angle is thus defined as how a journalist or other news worker makes an event (or situation) interesting for an audience [9], for example when selecting which core facts to emphasise in a news report and which contextual facts to include. When the event itself has high public interest, it may be obvious from what angle it should be reported, but in many circumstances only a creative, original angle warrants a report on the event.

News angles and values are common journalistic ideas mentioned in text books, e.g. [44, p. 115], and in the research literature [16]. When selecting a news angle, the journalist chooses features of the event to focus on, like a particular person being involved, relationships between persons and other entities in the event, or unusual qualities of some of the features of the event. We have compiled a list of angles from academic textbooks [44] and web sites (footnote 3). These include:

  • Conflict: the event accentuates a conflict among people or organisations.

  • Human interest: the event involves an individual who is personally affected in some way.

  • Impact: the event has an effect on society or nature.

  • Influence: the event changes somebody’s position or status in society.

  • Milestone: the event is significant in the lifetime of someone or something.

  • Proximity: the event has a particular relevance to a local place.

  • Recency: the event has a particular relevance to current issues.

In addition to gaining the audience’s attention, a news angle serves several additional purposes:

  • it provides a criterion for selecting events that are worth reporting;

  • it points towards additional facts to report;

  • it suggests which information sources to use; and

  • it serves as a template for how to present the event.

In practice, a journalist with sufficient background knowledge may observe a tweet like the one in Fig. 2 and start to think about news angles like the ones in Table 1.

Table 1 Alternative news angles on the tweet from Fig. 2

5 Central concepts and relations

To prepare for a knowledge graph-based ICT platform for journalists, this section will propose thematic (sub-)ontologies for representing: potentially news-relevant information in semantic form (Sect. 5.2); potentially newsworthy events detected and aggregated from that information (Sect. 5.3); and possible news angles on those events (Sect. 5.4). While many ontologies with related purposes have already been presented for the news domain (Sect. 6), we are not aware of existing proposals with the same aim as ours: to develop a knowledge graph-driven journalistic knowledge platform that can support news angles.

For each ontology, we will explain the role it plays in the News Hunter architecture; its central concepts and relations; its most closely related ontologies; the reasoning and other processing techniques used to populate and analyse it; and finally, an example graph in RDF, serialised using Turtle notation. Section 6 will review existing ontologies from the literature and show how our proposals go beyond them.

Fig. 3 An augmented News Hunter architecture that supports angles

5.1 News Hunter architecture

To prepare for explaining the ontologies, however, we need to review the News Hunter architecture briefly. Figure 3 shows a simplified version of the architecture from [4] and suggests how it can be augmented with support for news angles.

The Harvester continuously downloads potentially newsworthy text items such as RSS messages and tweets from the net and inserts them into a Source DB. The Lifter in turn represents each text item as a small knowledge graph by invoking NLP services such as named-entity extraction, topic identification, and sentiment analysis. It uploads the resulting item graphs into a Graph DB (or triple store). The Enricher extends these item sub-graphs with additional triples retrieved from the LOD cloud. The Front End provides a text editor the journalists can type their stories into. The Retriever supplies the journalists with relevant background facts from the Graph DB and related text items from the Source DB, either on demand or proactively (by analysing the text in the editor using the Lifter).

At the same time, the Event Detector monitors the incoming item sub-graphs. When a sufficient number of similar or overlapping sub-graphs have arrived from sufficiently trustworthy sources, the event detector collates them into an event graph that is uploaded back into the Graph DB. The Angle Matcher in turn monitors the new event graphs to find ones that fit angles represented in the Angle Catalogue. When a fit is found, the event is considered potentially newsworthy. It is extended with appropriate background facts from the Graph DB according to its angle and submitted by the Provider to the Front End for consideration by the journalist. This simplified architecture has left out several components that were presented and discussed in [4] but the Angle Matcher, Angle Catalogue, and Provider, shown in green in Fig. 3, were not considered there.
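As a rough illustration of the Event Detector's collation step, the following Python sketch gathers item sub-graphs into an event once enough overlapping items from sufficiently trusted sources have arrived. Everything in it is an assumption made for illustration: the `ItemGraph` structure, the Jaccard overlap measure, the three thresholds, and the simplification of comparing every item with the first trusted one are not taken from the News Hunter implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemGraph:
    item_id: str
    source_confidence: float   # trust in the item's source, in [0, 1]
    entities: frozenset        # IRIs of the entities annotating the item


def jaccard(a, b):
    """Overlap between two entity sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_event(items, min_items=2, min_confidence=0.5, min_overlap=0.5):
    """Return the merged entity set of a detected event, or None.

    Hypothetical rule: keep items from trusted sources, cluster those that
    overlap enough with the first trusted item, and require a minimum
    cluster size before collating them into one event.
    """
    trusted = [i for i in items if i.source_confidence >= min_confidence]
    if not trusted:
        return None
    seed = trusted[0]
    cluster = [i for i in trusted
               if jaccard(seed.entities, i.entities) >= min_overlap]
    if len(cluster) < min_items:
        return None
    # Merge the entities of all clustered items into one event description.
    return frozenset().union(*(i.entities for i in cluster))
```

A production detector would compare all pairs of items and keep provenance for each merged fact; the sketch only shows how source trust and sub-graph overlap can jointly gate event creation.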

Fig. 4 Ontology for representing news items semantically in a knowledge graph

5.2 News items

Hence, News Hunter will continuously harvest potentially news-relevant information items from a variety of sources and in different formats. So far, we have explored harvesting of: messages from social media like Facebook and Twitter; articles from newspapers on the web; and items from RSS. But potentially news-relevant texts are available from a much wider range of sources that include: commercial news services like AP and Reuters; the home pages of commercial companies and public authorities; non-profit and commercial news aggregators such as GDELT and WebHose (footnote 4); and the Internet of Things (IoT). In addition to these real-time sources, it is also possible to populate our knowledge graphs with historical news items, for example taken from news archives or from encyclopaedias. We have so far focussed on textual items, but strive to develop an architecture that is open to also including images, audio, and video in the future.

Role in the architecture Harvested items are first filtered. The ones that are deemed potentially news-relevant are then lifted into semantic form and represented as item (sub-)graphs of the central knowledge graph. Whereas standard text-based similarity searches are restricted to topics and named entities, it is a driving idea behind News Hunter to leverage the structure of this graph to facilitate more precise reasoning: the structural matching of events with news angles in this paper is one example. Nevertheless, we also store each filtered item closer to its original form as a JSON object in the source database, indexed from the knowledge graph.

Concepts and relations Figure 4 shows how a potentially news-relevant Item is represented semantically as an item graph (footnote 5). Each item has an originalTitle, an originalText, and a sourceIRL among its datatype properties. It has a Person as its contributor, perhaps contributing through or on behalf of a source Agent. The agent can be, for example, an organisation or web site, whereas the contributor can be a natural person or a social-media handle. Although not shown in the figure, an Agent has a confidence score normalised to the unit interval [0, 1], representing how much the agent’s items are trusted. Also not shown are the confidence scores of Items, which must be smaller than or equal to the confidence scores of their contributors.

The item’s semantics is represented by Annotations, each of which contains a single piece of semantic information about the item or a part of it (footnote 6). In Fig. 4, each Annotation is in turn related to an Entity in the knowledge graph, of which there are several subtypes:

  • A NamedEntity mentioned in the text, possibly a named geolocation.

  • A Concept, Topic, or Category reflected in the text, all of them subtypes of skos:Concept. The difference is that a concept must be a word or phrase used in the text, whereas topics and categories can be latent. Categories are taken from a restricted vocabulary, such as the IPTC Media Topics.

  • A Location (geo:SpatialThing) or a DateTime (xsd:dateTime) associated with the text.

  • A Sentiment reflected in the text.

Each instance of these subclasses (NamedEntity, Concept, Topic, Category, Location, DateTime, and Sentiment) has an IRI and can be extended with facts from the Linked Open Data (LOD) cloud and from proprietary data sets. The final subclass, RelationAnnotation, represents semantic relations between pairs of other Entities that annotate the same item. Each RelationAnnotation is related to an owl:ObjectProperty that describes the type of relationship.

An annotation can also have a foaf:Agent as its annotator, which will usually be a piece of software or a service, such as a named-entity linker or sentiment analyser. Linking annotations to their annotators in this way is needed whenever the semantic-lifting software is later improved or turns out to have been imprecise or faulty.

Furthermore, an annotation has a confidence, a strength and a relevance, each normalised to the unit interval [0, 1]. The confidence describes how much trust is placed in the annotation. It is typically returned by the annotator Agent. When assessing the overall confidence in an annotation, both annotation confidence and item confidence must be taken into account (footnote 7).
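The three annotation scores and their unit-interval constraint can be sketched in Python as follows. The class and field names are hypothetical, and the rule for combining annotation confidence with item confidence (a simple product) is an assumption chosen for illustration; the text only states that both must be taken into account.

```python
from dataclasses import dataclass


def check_unit(name, value):
    """Raise ValueError unless the score lies in the unit interval [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must lie in [0, 1], got {value}")


@dataclass(frozen=True)
class Annotation:
    entity: str            # IRI of the annotated entity or relation
    confidence: float      # trust placed in the annotation
    relevance: float       # how important a role the entity plays in the item
    strength: float = 1.0  # graded annotations only; non-graded ones keep 1

    def __post_init__(self):
        for name in ("confidence", "relevance", "strength"):
            check_unit(name, getattr(self, name))


def overall_confidence(annotation, item_confidence):
    """Assumed combination rule: multiply annotation and item confidence."""
    check_unit("item_confidence", item_confidence)
    return annotation.confidence * item_confidence
```

With a product, the overall confidence can never exceed either of its inputs, which matches the intuition that a precise annotation of an untrusted item should still be treated with caution. Other combination rules (for example, taking the minimum) would fit the same interface.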

The strength describes how strongly a graded annotation applies to a news item. For example, for a sentiment annotation like anger, it gauges the degree of anger expressed in the text whereas, for a relation such as likes, it represents how strongly one entity (a prospective informant) likes another (a person in the news). Hence, strength is important for the meaning of graded entities and relations. For non-graded entities and relations, such as the AngelaMerkel individual or a marriedTo relation, the strength is always one.

The relevance describes how important a role the entity or relation plays in the item. For example, in the sentence “Blast at Rally for Afghan President Kills at Least 24”, the entities “Blast”, “Kills”, and “24” should most likely be ranked as more relevant than “President”, for example to avoid misinterpretations such as “the president was killed” or “the president killed at least 24”. Hence, relevance captures an aspect of annotations that is important for downstream analysis. It is orthogonal to strength. For example, it may be important that one news item conveys a weak emotional reaction (high relevance, but low strength).

Related ontologies The item annotation ontology in Fig. 4 has already been linked to common terms defined in other vocabularies, such as foaf:Agent, foaf:Person, skos:Concept, and geo:SpatialThing. However, these are just examples. In further work, we want to align and enrich Fig. 4 with concepts from related ontologies, in particular from the IPTC’s NewsML G2 and the BBC Ontologies. Section 6 reviews and compares Fig. 4 to these and other existing ontologies for annotating news items and other texts.

Reasoning Lifting textual items into item (sub-)graphs, that is, small knowledge graphs shaped by Fig. 4, requires natural-language processing (NLP) techniques. Earlier prototypes [4] have explored RAKE (Rapid Automatic Keyword Extraction) for lifting shorter messages, Textacy (a wrapper library for Spacy) for RSS feeds, and the Python implementation of TextRank for longer texts. We are currently identifying named entities using DBpedia Spotlight [29] and Spacy-NEL (footnote 8). Integrated tools like FRED [12] and PIKES [8] are already able to automatically lift NL texts to small knowledge graphs such as ours, and lifting techniques that use word embeddings (e.g. [30]) and deep learning (e.g. [22]) keep improving. We have presented a survey of recent named-entity extraction techniques [1] that can be used in combination with techniques for topic identification and relation extraction to represent the contents of news texts increasingly precisely as item graphs.

Example

Figure 2 shows a tweet posted by Universal Somali TV early in the morning on 23 February 2017. Listing 2 shows an item graph that could result from lifting the text in this tweet, supported by the context provided by the news article it links to (footnote 9). The tweet proclaims that President Mohamed Abdullahi Farmajo appoints Hassan Ali Khayre as the new Prime Minister of Somalia. Importantly, we assume that the translation and lifting steps have resolved the Somali name Xasan Khayre Cali to its international counterpart: Hassan Ali Khayre. President Farmajo has been successfully resolved to a DBpedia IRI, whereas the new prime minister Khayre is not yet defined in DBpedia or Wikidata and is therefore given an internal News Hunter prefix unres:... for unresolved.


Although Hassan Ali Khayre might not be a well-known person outside of Somalian politics, a knowledge graph populated over time with social-media content might already contain the triples in Listing 3, which have been harvested and lifted from the caption of a YouTube video uploaded by The Royal House of Norway in 2010. Although the IRIs are not identical, the foaf:names in Listings 2 and 3 are sufficiently similar for a person name resolver to make the connection, perhaps supported by other triples not shown in the listings. The knowledge graph in a Norwegian newsroom might thereby contain the information necessary to detect the prime minister appointment as potentially newsworthy due to the local-person connection.

5.3 News events

To represent potentially newsworthy events with higher confidence and in more detail, the individual item graphs must be clustered, merged, and enriched to form event (sub-)graphs of the central knowledge graph. Because they are aggregated, event graphs provide more complete and precise information than individual item graphs, each of which may only describe a small part or aspect of an event. For the same reason, event graphs are corroborated by more sources, which is particularly important for social-media messages that originate from less known contributors and whose annotations may have low confidence.

Fig. 5 Ontology for representing news events semantically in a knowledge graph

Role in the architecture Items are clustered into event graphs according to their annotations, such as their named entities, concepts/topics/categories, locations, and date–times, most of which will be shared by many item sub-graphs. Annotation entities and relations from item graphs in the same cluster are then merged to form the event graph, whose entities can be enriched with further facts taken from the Linked Open Data (LOD) cloud and other sources, either by linking to external graphs or by downloading and inserting RDF facts into the local graph.

Concepts and relations Figure 5 shows how a potentially newsworthy Event is represented semantically as an event graph. Each Event is describedBy one or more Items that it has been derived from. It can come before or after and it can cause other events, and it can have subevents. The semantics of an Event is represented in further detail by Descriptors, each of which contains a single piece of semantic information about the event. Analogously to item annotations, each Descriptor is further related to an Entity with subtypes similar to those in Fig. 4. RelationDescriptors represent semantic relationships between pairs of entities in the same event graph.

Figure 5 also shows how event Descriptors have confidence, strength, and relevance values in the same way as item annotations. In addition, Descriptors can hold before, during, and/or after the Event. Pointing forward to the next section, an event can match one or more NewsAngles, of which two subtypes are shown: LocalPerson and Nepotism. They will be explained in Sect. 5.4.

Related ontologies Particularly relevant are again the IPTC’s EventsML G2 vocabulary and the BBC Ontologies. Section 6 will review and compare our contribution to these and other existing event-related ontologies.

Reasoning Simple clustering of item graphs by annotation similarity is straightforward. Clustering can take into account item annotations that are identical as well as related: either semantically, for example through taxonomical or mereological relations, or lexically, for example using Levenshtein distance or similar measures to detect different spellings of the same name. To the extent possible, cluster detection should also identify how larger events are composed of sub-events with temporal, causal, and other relations between them.
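The lexical matching mentioned above can be sketched with a plain dynamic-programming implementation of Levenshtein distance. The `same_name` helper, its length normalisation, and its threshold of 0.2 are assumptions made for illustration, not parameters taken from News Hunter.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def same_name(a, b, max_ratio=0.2):
    """Hypothetical rule: two names match if the edit distance, divided by
    the length of the longer name, does not exceed max_ratio."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return levenshtein(a, b) / longest <= max_ratio
```

For example, `same_name("Hassan Ali Khayre", "Hassan Ali Khaire")` holds because the two spellings differ by a single substitution. Character-level distance does not handle reordered name parts or transliterations, which need additional techniques.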

An earlier prototype clustered items using Scikit-learn’s DBSCAN algorithm, which offers scalability and focus on neighbourhood size at the expense of uneven cluster sizes [4]. Other researchers have investigated detection of events in knowledge graphs [24, 38], as well as relations between events [43].

Merging entities and relations from item graphs that belong to the same event is also straightforward, as long as standard identifiers (IRIs) are used during lifting. We have so far used DBpedia IRIs where available, and an earlier prototype enriched the knowledge graph with DBpedia facts [4]. But many other sources of standard IRIs and related facts are available in the LOD cloud, like Wikidata and GeoNames (footnote 10), a freely available geographical database of more than 25 million geographical names (toponyms) that refer to over 11 million unique features.


Example

In the example from Listing 2, Universal Somali TV might be treated as a trusted source whose news item is considered a new event without further corroboration. But if news items from other sources reported the same information independently, confidence in the new event would increase, perhaps along with completeness and precision. Listing 4 shows an event graph that could result from enriching the facts in Listing 2 with facts from external sources like DBpedia and Wikidata and from the related item graph shown in Listing 3, assuming that the similar-looking IRIs for Hassan Ali Khayre have been resolved.
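One simple way to model how independent corroboration raises confidence in an event is a noisy-OR combination, sketched below. This is an assumption for illustration only: the text does not prescribe a formula, and the rule is valid only to the extent that the sources really are independent.

```python
from math import prod


def corroborated_confidence(source_confidences):
    """Noisy-OR: the event is doubted only if every source is wrong.

    Each input is the confidence, in [0, 1], that one independent source
    reports the event correctly. An empty input yields 0.0.
    """
    for c in source_confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence must lie in [0, 1], got {c}")
    return 1.0 - prod(1.0 - c for c in source_confidences)
```

Under this rule, a single source with confidence 0.6 yields 0.6, while adding a second independent source with confidence 0.5 raises the combined value to 1 - 0.4 * 0.5 = 0.8.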

5.4 News angles

As noted in Sect. 4, some exceptional events are newsworthy in themselves, but most events have to be made newsworthy by reporting them from a news angle. We represent news angles as core patterns that can be matched with events to see if the angle fits, along with extended patterns that suggest additional types of information to include in the presentation of the event. Matching events with news angles is a bidirectional process, in which the core facts of the event suggest candidate news angles and the candidate news angles in turn encourage additional facts to be sought, whether manually or by automated means.

Role in the architecture News angles are important both for detecting newsworthy events and for presenting them in ways that may interest the intended audience. We can represent each news angle as a core pattern to which an event must be matched and one or more extended patterns according to which the event graph can be enriched in potentially interesting ways. The part of an event graph that matches a news angle becomes a fabula (sub-)graph. The term fabula is adopted from literary theory [14] to denote the facts that a story contains in contrast to the discourse, which denotes how those facts are told. Although our representations of news angles and fabulae might support automatic narration as well, our work on News Hunter is currently limited to proposing angled events as fabulae, leaving the writing to the journalist.

Fig. 6 Ontological representation of the local-person news angle

Concepts and relations Figure 6 shows how the core pattern of the LocalPerson news angle can be represented in OWL. It is a particularly simple angle, matched whenever a central Person in an event graph is relatedTo a particular Location that is of importance to the journalist’s intended audience, or to another Location basedNear that location. Figure 5 already showed how such an Event can be matched by a NewsAngle to form a fabula.

Fig. 7 Ontological representation of the nepotism news angle

Figure 7 illustrates the core pattern of a more complex news angle, that of Nepotism [46], in which a PowerfulPerson controls a Value which a GainingPerson achieves access to because of her/his privateRelation (typically a family relation) to the PowerfulPerson. Because nepotism proper also requires causality, the angle in Fig. 7 represents a weaker potential nepotism that mandates further investigation by journalists.

Related ontologies While there are many existing ontologies that capture central concepts and relations for describing events, and a few ontologies exist for annotating textual items too, we are not aware of previous work on representing news angles as ontologies—or indeed in any reasoning-ready or otherwise machine-processable form.

Reasoning Because they may involve identical or taxonomically related concepts and relations, the library of news angles will form a more or less connected news-angle ontology. The central concepts and relations in this slowly evolving ontology suggest which types of resources and relations need to be represented in event graphs and lifted from news items.

We are exploring different ways of matching news angles with event sub-graphs. For example, Listings 5 and 6 show SPARQL queries that realise the core patterns of the news angles in Figs. 6 and 7. Each query searches the knowledge graph and constructs a core fabula graph for each match of the angle to an event. In Listing 5, na:relatedToLocation/na:basedNear? is a property path stating that the person must be related to the location of interest or, optionally, to another location near it. It is an example of how OWL ontologies must sometimes be extended with rules or other additional restrictions to fully represent angles.
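To make the effect of the optional path step concrete, the following Python sketch mimics the local-person match over an in-memory list of triples. It is not the SPARQL query itself: the function name, the `ex:` resources, and the example facts are invented for illustration, and the constructed fabula is reduced to the matching relations only.

```python
# Predicate names follow the property path quoted above; the rest is hypothetical.
RELATED_TO_LOCATION = "na:relatedToLocation"
BASED_NEAR = "na:basedNear"


def match_local_person(triples, home_location):
    """Return one small fabula (a set of triples) per person matching the angle."""
    facts = set(triples)
    # Locations that reach home_location in zero or one basedNear step,
    # mirroring the optional step in relatedToLocation/basedNear?.
    accepted = {home_location} | {
        s for s, p, o in facts if p == BASED_NEAR and o == home_location
    }
    fabulae = []
    for person, predicate, location in sorted(facts):
        if predicate == RELATED_TO_LOCATION and location in accepted:
            fabula = {(person, RELATED_TO_LOCATION, location)}
            if location != home_location:
                # Keep the basedNear fact so the fabula explains the match.
                fabula.add((location, BASED_NEAR, home_location))
            fabulae.append(fabula)
    return fabulae


# Invented example facts, not taken from the paper's listings.
EVENT_GRAPH = [
    ("ex:alice", RELATED_TO_LOCATION, "ex:Smallville"),
    ("ex:Smallville", BASED_NEAR, "ex:Metropolis"),
    ("ex:bob", RELATED_TO_LOCATION, "ex:Metropolis"),
    ("ex:carol", RELATED_TO_LOCATION, "ex:Elsewhere"),
]

LOCAL_FABULAE = match_local_person(EVENT_GRAPH, "ex:Metropolis")
```

With `ex:Metropolis` as the location of interest, the sketch matches one person directly and one through a nearby location, and ignores the person related to an unconnected location.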

We envisage a News Hunter architecture in which many collaborating agents specialise in maintaining and leveraging specific concepts and relations in the connected news-angle ontology, continuously looking for changes that could enable or disable particular angles in response to unfolding events. For example, a local-person agent would specialise in deriving new Person relatedToLocation Location facts from the knowledge graph.

Example

Listing 7 shows the core fabula graph that results from matching the facts in Listing 4 with the news angle in Figure 6. This graph comprises only four facts, possibly derived by a local-person agent from facts stating that Khayre has worked for the Refugee Council located in Norway. Although the graph is simple, the facts it contains are important as they form the core fabula of the angled news report, to which potentially interesting related facts from the LOD cloud can be added. To guide identification of such related facts, the core pattern in Fig. 6 could be augmented into an extended pattern for the local-person angle, also represented as an OWL ontology.

6 Discussion

6.1 Novelty

We have proposed a family of OWL ontologies that can be used to organise journalistic knowledge graphs and augment them with support for news angles. To the best of our knowledge, this is the first attempt to analyse and represent news angles as OWL ontologies, and we suggest for the first time how ontologies for annotating items, events, and news angles can be combined in a journalistic knowledge platform. We also think that the idea of augmenting a journalistic knowledge graph with support for news angles is in itself new.

As such, the knowledge graph-driven News Hunter platform can be a useful example of an emerging type of model-driven information system that we think is becoming increasingly important. From a systems modelling perspective, development of knowledge graph-driven systems can involve reuse of existing ontologies and other vocabularies with large user communities outside the enterprise. Commitment to such ontologies makes a wide array of information sources, services, and software readily available for the system under development. At the same time, the fluid nature of the LOD cloud calls for model-driven designs that can leverage new information sources and services quickly and easily as they become available and also replace existing ones as they disappear. Commitment to common ontologies thereby also ties the system under development in to an evolving ecology of sources, services, and software bound together by an ontology that is defined and maintained collaboratively by stakeholders external to the enterprise. Hence, knowledge graph-driven software systems development widens systems modelling to involve new types of long-term strategic concerns about which ontologies and LOD communities the enterprise should align with, to what extent, and how. As Sect. 6.7 will mention, developing a system whose components all read from and update the same (set of) central knowledge graph(s) also calls for well-considered modularisation strategies, which can be ontology-driven too. While none of these concerns is new in itself, knowledge graph-driven information systems bring them to the fore and combine them in new ways that deserve attention.

6.2 Use cases

We hope the News Hunter platform can help journalists with central tasks such as: detecting newsworthy events quickly and precisely; identifying appropriate angles on those events; and contextualising those angles with relevant background and other related information. To make these and other uses of the platform clear, we have specified eleven use cases with extensions and variants to drive development of the platform. One particularly important example is What’s my angle? [46], which comprises the following steps:

  1. A journalist types a working news report into the front end.

  2. News Hunter lifts the working report and returns IRIs for named entities, concepts/topics/categories, relations, and sentiments in the report.

  3. News Hunter retrieves angles that fit the working report.

  4. News Hunter recommends the most suitable angles.

  5. The front end makes recommendations to the journalist.
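
The flow of these steps can be sketched in a few lines of Python. Everything in the sketch is a stand-in chosen for illustration: the `lift` function is a keyword lookup rather than a real natural-language lifter, angles are reduced to sets of required concept IRIs rather than OWL patterns, and the ranking heuristic (prefer angles with more matched concepts) is an assumption, not the recommendation method used in News Hunter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Angle:
    name: str
    required: frozenset  # concept IRIs the (simplified) core pattern needs


def lift(report, lexicon):
    """Step 2 stand-in: map known surface terms in the report to IRIs."""
    text = report.lower()
    return {iri for term, iri in lexicon.items() if term in text}


def retrieve_angles(iris, angles):
    """Step 3: keep the angles whose required concepts are all present."""
    return [angle for angle in angles if angle.required <= iris]


def recommend(candidates, top_n=3):
    """Step 4: placeholder ranking, most specific angle first."""
    ranked = sorted(candidates, key=lambda a: (-len(a.required), a.name))
    return ranked[:top_n]


# Hypothetical lexicon, angle library, and working report (step 1).
LEXICON = {
    "mayor": "ex:PowerfulPerson",
    "nephew": "ex:PrivateRelation",
    "contract": "ex:Value",
    "smallville": "ex:LocalLocation",
}
ANGLES = [
    Angle("LocalPerson", frozenset({"ex:LocalLocation"})),
    Angle("Nepotism", frozenset({"ex:PowerfulPerson", "ex:PrivateRelation", "ex:Value"})),
    Angle("TriumphOverAdversity", frozenset({"ex:Adversity"})),
]
REPORT = "The mayor of Smallville awarded a contract to her nephew."

IRIS = lift(REPORT, LEXICON)
RECOMMENDED = [a.name for a in recommend(retrieve_angles(IRIS, ANGLES))]

# Step 5 stand-in: the front end would present these to the journalist.
for name in RECOMMENDED:
    print("Suggested angle:", name)
```

In this toy run the nepotism and local-person angles are suggested, while the triumph-over-adversity angle is dropped because none of its required concepts was lifted from the report.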

6.3 Relation to existing annotation-related ontologies

We have systematically compared our proposed annotation ontology with related ontologies from the literature, such as:

  • The International Press Telecommunications Council (IPTC) has proposed NewsML G2 [48] (Footnote 11) as part of its news architecture. Although not based on RDF or OWL, it offers an XML vocabulary and data format for exchanging news-related information in an industrial environment, allowing news items to be annotated with concepts and named entities from controlled vocabularies.

  • The BBC’s Linked Open Data Platform [21] includes BBC Things (Footnote 12), which is an online reference catalogue of people, places, organisations, and events that matter to the BBC and its audience. It is used to annotate the BBC’s archival content.

  • Although not specific to news, the Tag Ontology [32] focusses on the relations between an agent, an arbitrary resource, and one or more tags. It is extended by the Meaning-of-a-Tag (MoaT) ontology [35], which defines relations to the concepts that the tags and resources are about.

  • The SIOC (Semantically Interlinked Online Communities) ontology [6] makes social media items and posts available as Linked Open Data [5] and lets them be tagged and categorised.

Overviews of annotation ontologies from before 2010 are presented in [19, 20]. More recent proposals include SCOT [19] and MUTO [26]. However, none of them accounts for the RelationAnnotation in Fig. 4, which is essential for annotating news items with actual item graphs, because it represents the relation (an OWL object property) between two Entities represented by other Annotations of the same Item. Other terms we have not found in the related ontologies are:

  • Confidence, relevance, and strength, which are essential for representing the uncertain and graded semantic annotations produced by NL lifters.

  • The explicit representation of the annotator Agent along with each Annotation, and the possibility of explicit assignment of confidence to annotator and source Agents as well.

Hence, existing annotation ontologies are not sufficient for News Hunter, although they offer many interesting paths for further linking and extension of our proposal.

6.4 Relation to existing event-related ontologies

We have also systematically compared our proposed event ontology with related ontologies from the literature, which include:

  • IPTC’s EventsML G2 [48] (Footnote 13) is an XML vocabulary and data format for “conveying event information in a news industry environment”, with focus on receiving, storing, exchanging, and publishing information about persistent (archival) and topical (ongoing) events and their coverage.

  • The BBC Ontologies (Footnote 14) include the News Storyline Ontology, which is a generic model for describing and organising the story lines that news organisations tell about events, but which offers no detailed description of the events themselves. The BBC Core Ontology defines event subtypes (for music, sports, politics, etc.) that are instantiated by BBC Things (Footnote 15). There are also specialised ontologies for business news, politics, and sports.

  • The Event and (Implied) Situation Ontology (ESO) [41] is an OWL2 ontology used in the NewsReader project [38]. It targets economic and financial news and organises different types of events in a taxonomy. It is extended by the Circumstantial Event Ontology (CEO) [40], which is intended to capture chains of newsworthy calamity events.

  • Although not specific to news, the Event Ontology [37] is an OWL ontology developed to support the Music Ontology (Footnote 16).

  • ACE (Automatic Content Extraction) (Footnote 17) is promoted by the Linguistic Data Consortium to drive research on natural-language processing through standardised annotation tasks and training and evaluation materials. It provides a detailed framework for describing different types of events along with their relations to other events and to the agents and other entities they involve.

  • Other common ontologies that deal with events and situations include: Linked Open Description of Events (LODE) [42], which is an intentionally minimal model of events aimed to facilitate interoperability; DOLCE+DnS UltraLite (DUL) [10], which is a simplification and extension of the DOLCE [11] and the Descriptions and Situations ontologies; the F Model of Events [39], which builds a modularised event model over DUL to cover participation in, composition of, causality and correlation of, documentation/representation of, and interpretation of events; the Simple Event Model (SEM) [50], which provides core classes for describing events in terms of their actors, places, and times; the Rich Event Ontology (REO) [7], which is an OWL ontology that unifies existing semantic role-labelling (SRL) schemas (like ESO and CEO) and augments them with causal and temporal relations between events; and the Comprehensive Event Ontology (CEVO) [43], which is an event ontology and lexicon designed to identify semantic relations between entities that appear in texts or knowledge graphs.

Linked Open Data resources such as Schema.org, DBpedia, and Wikidata also define terms for describing events and related phenomena. However, none of them accounts for how Events are describedBy Items and match Events, which are our most central concerns in Fig. 5. Other terms we have not found in the related ontologies are:

  • Confidence, relevance, and strength, which are derived from the corresponding datatype properties of the Items that describe the Event.

  • The RelationDescriptor, which represents a semantic relation between a pair of entities that describe the same event, using the hasRelation property to indicate the semantic relationship intended. Whereas existing ontologies such as ESO, CEO, and CEVO also represent relations (through their focus on verbs), they do not provide concepts for representing event graphs.

Hence, existing event ontologies are not sufficient for News Hunter, although they offer many interesting paths for future extensions. In particular, they can be used to define the Entity-subclasses and owl:ObjectProperty-subproperties in event graphs.

6.5 Relation to existing news-angle related ontologies

While there are many existing ontologies that capture central concepts and relations for describing annotations and events, we are not aware of any work on representing news angles as ontologies—or indeed in any reasoning-ready or otherwise machine-processable format. To the best of our knowledge, this is an original contribution of the News Hunter platform, of our ongoing News Angler project, and of this paper.

6.6 Further ontology development

We expect the proposed ontologies to evolve as we develop the News Hunter platform and proof-of-concept prototype further. It is possible that Fig. 6 and our other news-angle ontologies will need to be supplemented with additional rules and constraints in further work, perhaps using domain-specific modelling notations on top of our ontological approach.

Additional ontologies may also be needed, for example, to: organise different types of input items; represent available analysis techniques and tools; propagate information about provenance/confidence and terms-of-use; reason about privacy; describe editorial and journalistic preferences; etc. Although we have presented them as separate ontologies in this paper, we see them as alternative thematic windows into a single logically contiguous, but perhaps physically distributed, knowledge graph.

6.7 Architectural considerations

Our paper has focussed on ontology structure, and thus on knowledge graph structure, leaving architectural and most reasoning issues to parallel [1, 9, 46] and future work. The thematic ontology organisation can potentially also shape News Hunter’s processing structure, so that different sub-systems, perhaps implemented as collaborating agents, can take responsibility for different thematic sub-ontologies. For example, one group of agents would lift data into item graphs, another would collate item graphs into events, a third would match event graphs with news angles, etc. The different entities used to describe items and events could also be managed by different sub-systems according to types such as person, organisation, and location, each of which requires different forms of enrichment and analysis. Finally, metadata about privacy, provenance, and terms of use could also be maintained by different sub-systems. In such an architecture, the knowledge graph would be split by sub-ontology into contiguous and sometimes overlapping thematic sub-graphs that comprise ontology definitions (TBox), corresponding RDF facts represented as triples (ABox), and associated software agents with responsibility for maintaining and leveraging facts of a particular type.

6.8 Fake news

Fake news that originates from fraudulent social messaging accounts, deceptive web sites, and bogus news feeds has received increasing attention in recent years. We argue that a journalistic knowledge platform like News Hunter can make false information easier to detect because it is able to constantly triangulate multiple real-time sources of information about the same event. It is also able to corroborate parts of the information with background information taken from reference sources. This opens the way for triangulation- and fact-based approaches to identifying fake news, which complement the current focus on identifying fake news items by their sources and textual features. The representation of harvested information in a knowledge graph may also open the way for a graph-based approach to identifying fake news, using graph features instead of or in addition to the textual features used in current machine-learning approaches to fake news identification.

Care must be taken to ensure that malicious actors do not exploit journalistic knowledge platforms like News Hunter to generate fake news by infusing false news items into the knowledge graph. We have therefore designed the ontology to keep traces of all events back to the items they aggregate and further back to the persons or other agents who contributed them, making it possible to retract information derived from news sources or items that later turn out to be wholly or partially false.
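
The kind of trace-based retraction described above can be illustrated with a small Python sketch. The class and method names are hypothetical and the provenance model is reduced to two links (event to items, item to contributing agent); the actual ontology records more detail. The sketch assumes that an event should be withdrawn only when none of its supporting items remains.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance store: events aggregate items, items come from agents."""

    def __init__(self):
        self.item_source = {}                # item id -> contributing agent
        self.event_items = defaultdict(set)  # event id -> supporting item ids

    def add_item(self, item, agent):
        self.item_source[item] = agent

    def aggregate(self, event, item):
        self.event_items[event].add(item)

    def retract_agent(self, agent):
        """Drop every item from the agent; return events left without support."""
        bad_items = {i for i, a in self.item_source.items() if a == agent}
        for item in bad_items:
            del self.item_source[item]
        unsupported = []
        for event in sorted(self.event_items):
            self.event_items[event] -= bad_items
            if not self.event_items[event]:
                unsupported.append(event)
        for event in unsupported:
            del self.event_items[event]
        return unsupported


# Invented example: one event is corroborated, the other rests on a bogus feed.
GRAPH = ProvenanceGraph()
GRAPH.add_item("item1", "agent:wire_service")
GRAPH.add_item("item2", "agent:bogus_feed")
GRAPH.add_item("item3", "agent:bogus_feed")
GRAPH.aggregate("event1", "item1")
GRAPH.aggregate("event1", "item2")
GRAPH.aggregate("event2", "item3")

WITHDRAWN = GRAPH.retract_agent("agent:bogus_feed")
```

After the retraction, the first event survives on its remaining independent item, while the second event is withdrawn because its only support came from the discredited source.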

7 Conclusion

Our work on the News Hunter platform and prototype has opened many interesting paths for further work: developing the platform further and populating it with live and test data; collecting libraries of news angles, both manually and automatically; adapting and extending suitable analysis techniques for analysing news items, detecting and aggregating events, finding suitable news angles, and identifying causal and other relations between events; understanding what makes a news event or report interesting in a particular context; and selecting the most suitable and appropriate empirical research goals and evaluation approaches for our project.

In a world of ever-increasing information, journalists are not the only ones facing the challenge of detecting interesting events and situations in big data sets and presenting those events and situations in interesting ways. We therefore hope our results will be useful for—and inspire practice and research in—other information systems areas beyond journalism and the news.