Introduction

Maps, as symbolic representations of the spatial locations of, and relationships among, elements in space, are widely adopted as an effective instrument for disseminating and comprehending the spatial dimension (and, in many cases, the temporal dimension) of various geographies. Such geographies include those of the real world and of fictional worlds (e.g., in fiction, films, and games). Maps have thereby become a regular proxy for better understanding the narratives embodied in these geographies, primarily in terms of the placement and the evolution of the narratives. For example, maps are used to describe the progression of historical events such as migrations, wars, and accidents (Caquard and Cartwright 2014). In addition, maps are utilized to systematically study the geography of fictional narratives, as in the Atlas of Literature (Bradbury 1998), the Atlas of the European novel, 1800–1900 (Moretti 1998), and the research project A Literary Atlas of Europe.Footnote 1

Narrative cartography (Caquard 2013; Caquard and Cartwright 2014; Ryan 2020) is a discipline that studies the interwoven and intimate relationship between stories and maps. The materialization of this relationship has evolved over centuries alongside the development of technologies. In earlier times, mapping was mostly accomplished on paper and through labor-intensive creation (Varanka and Usery 2018). Today, with the rapid development of web mapping technology, online web maps (e.g., Google Maps, Apple Maps, OpenStreetMap, and Baidu Map) have become an indispensable component and one of the major sources of geographic information in our daily lives.

Thanks to the rise of Web 2.0 and the development of web mapping and GPS technologies, an increasing amount of user-generated content, such as social media posts, photos, and even audio recordings, is associated with geotags that depict users’ activities in geographic space (Amoroso 2010). Such volunteered geographic information (VGI) (Goodchild 2007) provides a geospatial view of users’ personal stories. Additionally, studies on the temporal aspects of mapping have considerably enhanced the capability of maps to serve as a vivid and perceptible channel for delineating stories that evolve over time; see, e.g., Andrienko et al. (2010). In view of the developments in media (particularly the web) and the advancements in mapping theories and technologies, today’s interactive maps are also referred to as geovisualization, and they have remarkable power to act as a proxy for various stories.

Nevertheless, geovisualization for narratives encounters several long-standing challenges, which also appear in geovisualization, and even geospatial studies in general. Some prominent challenges are, among others, the data acquisition & integration challenge and the semantic challenge.

The data acquisition & integration challenge pertains to the difficulty of retrieving, processing, and integrating geospatial data for narrative cartography. It is commonly acknowledged that data acquisition, cleaning, integration, and apportionment consume most of the resources (such as human labor, funding, and time) of a typical data science project (Janowicz 2021). Today, the volume of data that could underlie narrative mapping is exploding, including geospatial data from authoritative sources, VGI, and other types of thematic data linked to narratives, e.g., data from Wikipedia. In this context, it is nontrivial to search for, preprocess, and integrate relevant datasets into a common data format for a particular narrative mapping task. In fact, these steps are among the major roadblocks of a narrative cartography project and could easily consume the majority of its resources.

An example is the development of a narrative map that describes the stories pertaining to heritage buildings in a particular city. The required information includes building footprints, building attribute information (e.g., construction time and a description of the building’s history), records of related historical activities, and the annual tourist statistics of each building. However, such information often does not come from a single data repository, but instead from a number of data sources. Such dispersed data sources raise two major issues. First, data from multiple sources often require different preprocessing strategies. Second, despite the intrinsic connections between the data from different sources, they are, in most cases, stored in so-called data silos. In reality, leveraging these numerous data sources “in one go” is rather difficult, as the data sources are often locked away from each other with different data models, schemas, and semantics. Integrating this enormous amount of data for narrative mapping is both a unique opportunity of the contemporary web mapping era and an immense challenge.

Another critical challenge is the semantic challenge, which is deemed prominent in connecting maps with narratives. It can be divided into two sub-challenges: the semantic challenge of map content and the semantic challenge in geovisualization. The seminal editorial in narrative cartography by Caquard and Cartwright (2014) argued that the nexuses between maps and narratives should be addressed from two perspectives, i.e., (1) maps as representations of the spatiotemporal structures of stories and their relations with places, and (2) connecting maps with the mapping process through narratives. With regard to the former perspective, one core question is how to store and represent spatiotemporal data in a semantically explicit manner to facilitate narrative mapping, i.e., the semantic challenge of map content. For instance, for place names, especially ambiguous ones such as “Washington” or “San Jose” mentioned in a personal travel blog, place name recognition (Manning et al. 2014; Karimzadeh et al. 2019; Wang et al. 2020) and disambiguation (Overell and Rüger 2008; Hu et al. 2014; Ju et al. 2016) are indispensable preprocessing steps that link a place name to a specific geographic entity in a gazetteer (Goodchild 2004; Regalia et al. 2018). Storing these preprocessed data in a formalized and semantically explicit manner is essential for downstream mapping tasks, and for data reusability and map reproducibility.

The latter perspective has seldom been addressed, which could be partially and tacitly ascribed to the semantic challenge in geovisualization. The knowledge of geovisualization processes, i.e., how maps are produced and how the underlying data are transformed into graphics, is usually embedded implicitly in complex programs or in the minds of cartographers, which makes this knowledge difficult to transfer, interpret, expand, and reuse. Put differently, the semantics of the mapping process are difficult to represent and formalize. Janowicz et al. (2010) regarded visualization as a sink where the semantics transferred through all the components of spatial data infrastructures (SDIs) have to be aggregated, interpreted, and visualized in a meaningful way. For instance, during visualization, the symbols that transform underlying data into graphics bear abundant semantic information for the delivery of map content to users, and it is broadly recognized that such information should be formally represented to foster wide comprehension and reuse.

Over the past decades, another trend of the Web has emerged, namely Web 3.0, at whose core lies the prospect of the Semantic Web (Berners-Lee et al. 2001; Bizer et al. 2011), which calls for a “Web of data” in contrast to the traditional “Web of documents”; see the activities of the World Wide Web Consortium (W3C).Footnote 2 The Semantic Web is partially an endeavor to formalize and represent data on the web in a fashion readable by both humans and machines, so as to foster data interoperability, (re)usability, and applicability. With Semantic Web technologies, a knowledge graph (KG) (Noy et al. 2019; Ji et al. 2021) can be constructed as a data repository describing entities (e.g., places, events, and people) and their relations within or across domain(s) according to formalized ontologies. Such a KG can be seen as a directed labeled multigraph (Nickel et al. 2015).

Thus far, a number of large-scale KGs have been constructed, including open-sourced projects such as DBpedia (Mendes et al. 2011), YAGO (Rebele et al. 2016), and Wikidata (Vrandečić and Krötzsch 2014), as well as commercial projects (Noy et al. 2019), such as Microsoft’s Satori,Footnote 3 Google Knowledge Graph,Footnote 4 Facebook’s social graph,Footnote 5 and eBay’s Product Knowledge Graph. These open-sourced or commercial KGs provide structured data and factual knowledge that support many intelligent applications and services such as question answering (Saxena et al. 2020; Mai et al. 2019b), voice assistants (e.g., Apple Siri, Amazon Alexa, Google Assistant), search (e.g., Google Search, Bing Search, Amazon Product Search), and so on (Noy et al. 2019).

Semantic Web technologies have been increasingly adopted in the geospatial domain (Janowicz et al. 2012; Battle and Kolas 2011; Huang et al. 2019) to address some long-standing issues, such as data integration and reuse (e.g., Schade and Smits 2012; Huang et al. 2018), knowledge formalization (e.g., Kuhn 2005; Huang and Harrie 2020), and semantic interoperability (e.g., Janowicz et al. 2010). The increasing employment of Semantic Web technologies in the geospatial domain has fostered various geospatial KGs (denoted GeoKGs hereinafter) such as LinkedGeoData (Stadler et al. 2012), GeoNames,Footnote 6 and GNIS-LD (Regalia et al. 2018).

KGs fundamentally organize (geospatial) knowledge and data in an interlinked and formalized fashion, thereby revealing a promising avenue for (partially) resolving the data acquisition & integration challenge as well as the semantic challenges for geospatial studies in general (Janowicz et al. 2012), and for narrative cartography in particular. Specifically, using KGs for narrative cartography has several advantages:

  1. Semantic Web technologies and many existing large-scale KGs are potentially great solutions to the data acquisition & integration challenge. There are many ever-growing, large cross-domain KGs (e.g., Wikidata and DBpedia), which cover a wide range of topics (e.g., events, people, places, and organizations) by integrating various data sources. These pre-integrated cross-domain knowledge bases can potentially serve as a huge data repository for many narrative mapping tasks, which can quickly reduce the data acquisition workload. Moreover, they can save tremendous effort in data integration. Their rich context also allows map users to easily explore relations among geographic entities (e.g., places and events) and non-geographic entities (e.g., people and organizations).

  2. Modeling the underlying map data as a GeoKG is also a good practice for overcoming the semantic challenge of map content. Sometimes, the existing KGs may not have the rich information that a narrative map is meant to cover. So, instead of only using existing KGs as the underlying map data, we can formalize our own map content as a GeoKG. This practice leads to a semantically explicit map data representation, so that the KG statements about map data can be easily integrated with other KGs for various mapping or data analysis purposes.

  3. The semantic challenge in geovisualization can also be tackled by formalizing the geovisualization process in KGs. By encoding the geovisualization process as KG statements, we make the semantics of the mapping process explicit, which fosters better reproducibility and comprehension.

  4. The data integration challenge can be naturally resolved when we represent the map content in KGs, given the great power of the Semantic Web technologies in data integration, entity alignment (Trisedya et al. 2019; Zhu et al. 2020), and ontology alignment (Jain et al. 2010; Zhu et al. 2016, 2020).

Despite such advantages, Semantic Web technologies (and KGs) have barely been exploited for geovisualization, let alone for narrative cartography. In this work, we investigate how to use them to address the aforementioned challenges in narrative cartography. Our contributions can be summarized as follows:

  1. To showcase how to use existing cross-domain KGs to overcome the data acquisition challenge, we developed a set of KG-based geoenrichment Python toolboxes within ArcGIS Pro. We show how to use these toolboxes to quickly fetch information from a popular KG (i.e., Wikidata) and make narrative maps about historical events and trajectories directly. Two use cases are provided to demonstrate this idea: one for Ferdinand Magellan’s expedition and the other for World War II.

  2. To overcome the semantic challenge of map content and the semantic challenge in geovisualization, we design a modular ontology (also denoted KG schema) for narrative cartography, which includes a map content module and a cartography module, to formally represent the map content and the related geovisualization process, respectively. This modular ontology can guide the development of KGs for different narrative mapping projects.

  3. In addition, to demonstrate how the cartography module can model the whole geovisualization process in a semantically explicit manner, we list some example portrayal rules, which are implemented as SPARQL rules. Each of them shows how KGs explicitly connect the geovisualization process with the underlying map data.

  4. We show that our modular ontology is flexible enough to allow the resulting KG to link to other existing KGs, which substantially reduces the effort required for data acquisition and data integration and fosters data reusability and map reproducibility.

The rest of this paper is structured as follows. We first discuss background and related work in “Background and Related Work”. Subsequently, in “Knowledge Graph-Based GeoEnrichment for Narrative Mapping”, we discuss our KG-based geoenrichment ArcGIS Pro Python toolboxes and how they can help to mitigate, or even eliminate, the data acquisition & integration challenge for narrative cartography. Next, in “A Modular Ontology for Narrative Cartography”, we design a KG schema (i.e., ontologies) to formally model the map content as well as the related geovisualization process. This KG schema can help to guide the development of KGs for different narrative mapping projects. Moreover, the KG schema is flexible enough to allow the resulting KG to link to other existing KGs, which substantially reduces the effort required for data integration and fosters data reusability and map reproducibility. In “Conclusions and Outlook”, we discuss the advantages and potential limitations of our proposed KG-based narrative mapping practice and point out several future research directions.

Background and Related Work

Knowledge Graphs

A knowledge graph (KG) provides a graph-structured way to encode facts and statements with a certain world view. From a graph view, a KG can be regarded as a directed labeled multigraph, in which a statement is composed of two entities (nodes) and a relation (a labeled, directed edge) between them. Accordingly, a statement in the context of the Semantic Web can be expressed in the form of a triple (h, r, t), where h is the head entity (i.e., subject), r the relation (i.e., predicate), and t the tail entity (i.e., object), respectively. For instance, a statement Santa Barbara is part of California can be represented as (Santa Barbara, partOf, California). Such a data model is called Resource Description Framework (RDF), a W3C standard that facilitates data integration and knowledge management among different data sources on the web.
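The triple view described above can be sketched in a few lines of plain Python. This is only a minimal illustration of a directed labeled multigraph, not an RDF implementation; the class name `TinyKG`, the `locatedIn` relation, and the second and third statements are hypothetical additions used to show that two nodes may be joined by several differently labeled edges.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # subject
    relation: str  # predicate (the edge label)
    tail: str      # object


class TinyKG:
    """A directed labeled multigraph stored as a set of (h, r, t) triples."""

    def __init__(self):
        self.triples = set()
        self.outgoing = defaultdict(set)  # head -> {(relation, tail), ...}

    def add(self, head, relation, tail):
        self.triples.add(Triple(head, relation, tail))
        self.outgoing[head].add((relation, tail))

    def objects(self, head, relation):
        """Return every tail t such that (head, relation, t) is stated."""
        return sorted(t for r, t in self.outgoing[head] if r == relation)


kg = TinyKG()
kg.add("Santa Barbara", "partOf", "California")     # statement from the text
kg.add("Santa Barbara", "locatedIn", "California")  # hypothetical parallel edge
kg.add("California", "partOf", "United States")     # hypothetical extra statement

print(kg.objects("Santa Barbara", "partOf"))
```

In a real RDF graph the node and edge names would be IRIs rather than plain strings, which is what allows statements from different sources to refer to the same resource.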

So far, different KGs have been constructed for different purposes and different topics. For example, DBpedia, Wikidata, and YAGO are general-purpose cross-domain KGs partially built on Wikipedia. LinkedGeoData (Stadler et al. 2012), GeoNames,Footnote 7 GNIS-LD (Regalia et al. 2018), and GeoLink (Mai et al. 2016; Cheatham et al. 2018) are GeoKGs consisting mainly of geospatial entities. Bio2RDF (Belleau et al. 2008) is a bioinformatics KG, and KnowLife (Ernst et al. 2014) is a health and life science KG. Thanks to the RDF data model, it is straightforward to integrate data among these KGs. More specifically, all these open-sourced KGs are linked to each other, e.g., by owl:sameAs links, which indicate that two nodes from two different KGs correspond to the same real-world entity. These KGs thus jointly form an even larger KG called the Linked Open Data Cloud,Footnote 8 which currently comprises 1301 data repositories (KGs) in total. These KGs, especially cross-domain KGs and geospatial KGs, provide a rich information resource for spatial analysis and geovisualization.
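To illustrate how owl:sameAs links let statements from separate KGs be combined, the following sketch groups identifiers into co-reference clusters. It relies on owl:sameAs being symmetric and transitive, and it uses a simple union-find structure, which is one possible technique rather than the method of any particular KG. All identifiers (`kgA:...`, `kgB:...`, `kgC:...`) are made-up placeholders, not real IRIs.

```python
def same_as_clusters(links):
    """Group identifiers connected by sameAs links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    # Sorted lists make the result deterministic and easy to compare.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sameAs links between three imaginary KGs.
links = [
    ("kgA:SantaBarbara", "kgB:santa_barbara"),
    ("kgB:santa_barbara", "kgC:place/42"),
    ("kgA:California", "kgC:place/7"),
]

for cluster in same_as_clusters(links):
    print(cluster)
```

Once identifiers are clustered, the statements attached to any member of a cluster can be read as statements about the same real-world entity.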

Space and Time in a KG

Space and time are the nexuses of knowledge representation and organization in KGs (Janowicz 2010). Despite the noticeable inseparability of space and time, two terms, geographic knowledge graphs (GeoKGs) (Wang et al. 2019; Yan 2019; Sun et al. 2021) and temporal knowledge graphs (TKGs) (García-Durán et al. 2018; Gottschalk and Demidova 2018), have become widely used in the literature. They emphasize the geographic and temporal perspectives in KGs respectively. In the past decade, they have contributed to a variety of geospatial studies, such as toponym resolution (Grover et al. 2010; Middleton et al. 2018), geographic question answering (Mai et al. 2019c; Mai et al. 2020), place summarization (Yan et al. 2019), and travel attraction recommendation (Lu et al. 2016).

Table 1 summarizes the ways in which KGs represent different types of spatial and temporal information. We categorize common spatial information into four groups: location information, spatial relations between or among geographic entities, the spatial scope of statements, and non-spatial attributes of geographic entities. Likewise, four types of temporal information are identified, which describe the time at which events occur, temporal/non-temporal relations between events/entities, the temporal scope of entities (e.g., the lifespan of a person), and the temporal scope of statements, respectively. We also demonstrate how a KG jointly represents spatial and temporal information such as a space-time point or a trajectory. For each of these sub-types, we provide triples from Wikidata/DBpedia as illustrations. As the table shows, abundant spatial and temporal information is stored in KGs.

Table 1 Different types of spatial and temporal information in KGs

The geographic entities that carry spatiotemporal information are also linked to other geographic and non-geographic entities. For example, an expedition is linked to its participants, travel origin, stopover points, and destination. These entities are in turn linked to further entities (i.e., through 2-degree relations). This forms an interesting graph structure that a user of narrative maps can explore. Sometimes, to make a narrative map of historical events or stories, one can directly retrieve such preprocessed information from an existing KG rather than collecting map data from scratch. Existing KGs can be used for narrative mapping directly or can be easily integrated with additional data sources to serve as the underlying map data representation. This significantly reduces the data acquisition effort. “Knowledge Graph-Based GeoEnrichment for Narrative Mapping” shows two examples of making narrative maps based on existing KGs.
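Exploring 1-degree and 2-degree relations around an entity amounts to a bounded breadth-first traversal of the graph. The sketch below shows the idea on a handful of hand-written statements; the predicate names (`participant`, `startPoint`, and so on) are hypothetical, and the triples are illustrative only, not retrieved from Wikidata or any other KG.

```python
from collections import deque


def neighbors_within(triples, start, max_hops=2):
    """Return {entity: hop_count} for entities reachable within max_hops."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the requested radius
        for _, nxt in adjacency.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    del hops[start]  # report only the neighbors, not the start entity
    return hops


# Hand-written illustrative statements with hypothetical predicate names.
triples = [
    ("Magellan expedition", "participant", "Ferdinand Magellan"),
    ("Magellan expedition", "startPoint", "Seville"),
    ("Ferdinand Magellan", "placeOfDeath", "Mactan"),
    ("Seville", "country", "Spain"),
    ("Spain", "continent", "Europe"),  # three hops away, so excluded below
]

print(neighbors_within(triples, "Magellan expedition", max_hops=2))
```

The traversal follows edges only in their stated direction; a map exploration tool might also want to follow incoming edges, which would require indexing triples by tail as well.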

Event

The objective of narrative cartography can often be linked to the concept of event in spatial information (Scheider et al. 2014). To this end, previous theoretical studies on formalizing the concept of event can provide a foundation for the ontology design for narrative cartography.

In one of the earliest works on event modeling, Allen and Ferguson (1994) advocated the idea that events are the way by which agents classify certain relevant patterns of change. An event must involve at least one object over some stretch of time, i.e., a time interval, or involve at least one change of state. Moreover, events are defined to occur over intervals of time and cannot be reduced to some set of properties holding at instantaneous points in time. In contrast, actions are things an agent (e.g., a person or robot) might do that might cause an event to occur. By associating time periods with events, Allen and Ferguson (1994) introduced a general representation of events and actions based on the well-known interval temporal logic (Allen 1983), which supports a wide range of reasoning tasks including planning, explanation, and prediction.
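The interval temporal logic of Allen (1983) distinguishes thirteen qualitative relations between two time intervals (seven basic relations, six of which have an inverse). As a minimal sketch of what interval-based reasoning about events looks like, the function below classifies the relation between two intervals given as `(start, end)` pairs. The function name and the tuple encoding are assumptions made for illustration; the year values in the examples are used only as convenient numbers.

```python
def allen_relation(a, b):
    """Name the Allen (1983) relation that holds from interval a to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if not (a_start < a_end and b_start < b_end):
        raise ValueError("each interval must satisfy start < end")

    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # All endpoints differ and the intervals partially overlap.
    return "overlaps" if a_start < b_start else "overlapped-by"


expedition = (1519, 1522)   # Magellan's expedition, in years
world_war_2 = (1939, 1945)  # World War II, in years

print(allen_relation(expedition, world_war_2))
print(allen_relation((1941, 1945), world_war_2))
```

Because every pair of proper intervals falls into exactly one of the thirteen relations, such a classifier can serve as a building block for ordering or grouping events on a timeline.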

Galton and Augusto (2002) compared the event definitions of two communities: active databases and knowledge representation. The active database approach defines events based on their detection conditions and regards events as instantaneous. In contrast, the knowledge representation approach defines events based on their occurrence over an interval and regards them as durative. Galton and Augusto (2002) showed that treating events as instantaneous is inadequate and leads to problems during temporal reasoning.

Galton and Mizoguchi (2009) examined the similarities and differences between events and objects. They pointed out that both are discrete individuals, which cannot be dissected into parts of the same type and which have well-defined extensions. Galton and Mizoguchi (2009) further showed that the relation between event and process can be seen as analogous to the relation between object and matter. This neat analogy has been widely accepted for event modeling.

Event is also regarded as one of the core concepts of spatial information (Kuhn 2011, 2012; Kuhn and Ballatore 2015). Similar to previous studies, Kuhn and Ballatore (2015) treated events as individual portions of processes that are temporally bounded. In many cases, events are also spatially bounded, such as wildfires, hurricanes, and floods. Additionally, they emphasized that events, like objects, have an identity and are described by their temporal and thematic properties and relations.

These previous theoretical studies on events provide a solid ground for formalizing the conceptual model of events and the way events connect with other spatial information in a narrative cartography context. In “Knowledge Graph-Based GeoEnrichment for Narrative Mapping”, we show how to use existing KGs to dynamically visualize two well-known historic events, Ferdinand Magellan’s expedition and World War II, as simple narrative maps with the help of a collection of KG-based GeoEnrichment toolboxes. Moreover, in “The Map Content Module”, we explicitly consider the spatiotemporal scoping of different geographic objects and events when designing the map content ontology.

Semantic Web and KG Applications in Digital Humanities

The advancements in narrative cartography are also closely related to the “spatial turn” in many humanities disciplines such as history, classical studies, literary studies, philology, and religion (Adams et al. 2017). Beyond human geography, other disciplines have also begun to regard space as an important dimension of their own areas of inquiry (Warf and Arias 2008), and many humanities researchers have started to explicitly record the spatial and temporal attributes of their data and to use visual analytics as part of their analysis. Under this “spatial turn”, narrative maps have become increasingly popular in digital humanities research. One good example is Esri’s Story Map collection about history, culture, literature, and art.Footnote 9

In order to add the spatial (and temporal) dimension to current digital humanities data repositories, a common practice is content annotation, i.e., associating the spatiotemporal references (historic or contemporary toponyms) in unstructured resources with the corresponding geographic features in a gazetteer (Goodchild 2004; Barker et al. 2016; Grossner et al. 2016). A gazetteer is a geographic dictionary or directory that links place names to their geographic locations as well as other information such as place descriptions, alternative names, feature types, and spatial relations to other places. Some popular gazetteers are the Alexandria Digital Library Gazetteer (Goodchild 2004), the Pleiades Gazetteer (Elliott and Gillies 2008), and the Geographic Names Information System (GNIS).Footnote 10

Knowledge Graphs and Semantic Web technologies offer several key advantages: they improve interoperability across heterogeneous datasets, ease dataset publishing and retrieval, and support co-reference resolution without enforcing global consistency (Regalia et al. 2018). Consequently, an increasing number of gazetteers have been published in a Linked Data format, i.e., as knowledge graphs, to facilitate dataset discovery and integration. Examples include the Getty Thesaurus of Geographic Names (TGN),Footnote 11 GeoNames (Ahlers 2013), GNIS-LD (Regalia et al. 2018), the Pelagios Project (Barker et al. 2016), and the World Historical Gazetteer (Mostern 2017; Grossner 2020).

One key challenge that many gazetteers encounter is how to meaningfully scope different places spatially and temporally. Kauppinen et al. (2008) proposed a geospatial ontology time series to represent the meaning of changing geographic features. They proposed to represent each region with a different URI after a regional change such as a merge, a split, or a name change. By using this practice to encode geographic features in Finland, the part-of place hierarchy at a specific time can be constructed automatically. Grossner et al. (2016) tackled the same problem in a different way. They proposed an ontology design pattern (ODP) for setting. This ODP associates each period and place with a setting which has a SpatialScope and a TemporalScope. SpatialScope and TemporalScope are the superclasses of SpatialExtent and TemporalExtent, respectively. Here, SpatialExtent and TemporalExtent are modeled by the GeoSPARQL ontology and the OWL Time ontology, respectively.
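The idea of giving each version of a region its own identifier and deriving the part-of hierarchy valid at a given time can be sketched as follows. This is a loose illustration inspired by the description above, not the ontology of Kauppinen et al. (2008) or the ODP of Grossner et al. (2016); the region identifiers, the year granularity, and the `ScopedPartOf` structure are all hypothetical.

```python
from typing import NamedTuple, Optional


class ScopedPartOf(NamedTuple):
    child: str           # identifier of one version of a region
    parent: str          # identifier of the containing region
    start: int           # first year in which the statement holds
    end: Optional[int]   # last year in which it holds; None = still valid


def hierarchy_at(statements, year):
    """Return the child -> parent part-of mapping that is valid in `year`."""
    return {
        s.child: s.parent
        for s in statements
        if s.start <= year and (s.end is None or year <= s.end)
    }


# Hypothetical example: TownA and TownB merge in 1950; the merged region
# receives a new identifier instead of reusing either of the old ones.
statements = [
    ScopedPartOf("ex:TownA_v1", "ex:CountyX", 1900, 1949),
    ScopedPartOf("ex:TownB_v1", "ex:CountyY", 1900, 1949),
    ScopedPartOf("ex:TownAB_v2", "ex:CountyX", 1950, None),
]

print(hierarchy_at(statements, 1930))
print(hierarchy_at(statements, 1980))
```

Keeping separate identifiers per version means that a statement about the pre-merge town is never silently applied to the merged region, at the cost of having to link the versions explicitly.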

Formalizing Geospatial and Cartographic Knowledge with Ontologies

Ontology is a major paradigm for knowledge representation and reasoning in the Semantic Web. Specifically, ontologies are controlled vocabularies that describe concepts and relations between concepts using well-understood formal constructs; such constructs formalize the intended meaning of the vocabularies and capture background knowledge about the domain (Horrocks 2008). In the geospatial domain, knowledge representation using ontologies has been a long-standing research topic. Different ontologies, such as NeoGeo (Norton et al. 2012), GeoSPARQL (Battle and Kolas 2011), stRDF/stSPARQL (Koubarakis and Kyzirakos 2010), and AGO (Regalia et al. 2017), have been proposed to formally represent geospatial information (e.g., vector geometries) in knowledge graphs. Beyond modeling basic geometric information, many efforts in the geospatial semantics community have been made to model more advanced geographic concepts such as trajectories (Hu et al. 2013) and sensor networks (Janowicz 2012).

The idea of formalizing knowledge embedded in maps is intuitive in view of the implicit concepts and rules inherent in maps (Kavouras and Kokla 2007). In this direction, Scheider et al. (2014) proposed ontologies to formally represent the content of historic maps in order to support searching for map resources. Varanka and Usery (2018) proposed the notion of “the map as knowledge base” (namely developing maps with GeoKGs), which is, from a technical perspective, akin to the KG-based GeoEnrichment approach for narrative mapping in this paper (cf. “Knowledge Graph-Based GeoEnrichment for Narrative Mapping”). Subsequently, Huang and Harrie (2020), Huang et al. (2020), and Viry and Villanova-Oliver (2021) took further steps to this end. They not only semantically encoded the underlying map content data, but also formalized the knowledge of visualization, i.e., how the data are transformed into graphics, so as to forge GeoKGs at both the data level (map content) and the meta-knowledge level (geovisualization theories). In addition, Gao et al. (2017) proposed a map legend ontology to semantically annotate and query map contents via their legends in a machine-readable manner. Degbelo (2021) formalized an ontology design pattern for map content to facilitate map interpretation and the sharing of insights.

These studies form a solid ground for this paper, i.e., formalizing the knowledge involved in narrative mapping. However, these previous studies predominantly focused on general-purpose static visualization of geospatial data, and some of the key components of narrative mapping are lacking, e.g., the modeling of the temporal scope of events. In this context, we design a modular ontology tailored for narrative cartography, including the map content module and the cartography module that the narratives entail (see “A Modular Ontology for Narrative Cartography”). Moreover, in contrast to many previous studies (Scheider et al. 2014; Gao et al. 2017; Degbelo 2021), which mainly focused on modeling map topics and content for map sharing and searching, our narrative cartography ontology formalizes the whole mapping process from the map content to the geovisualization.

Knowledge Graph-Based GeoEnrichment for Narrative Mapping

As discussed in “Introduction”, one promising way to overcome the data acquisition & integration challenge of narrative cartography is to use pre-integrated, large cross-domain knowledge graphs as the data repository for different mapping projects. However, one question is how to make use of the data from those KGs directly from within a Geographic Information System (GI System) to make narrative maps. Currently, there are no GI Systems that can directly consume Linked Data and Knowledge Graphs as one of their data formats. Knowledge graph plugins therefore need to be developed to make this possible in the first place.

In the following, we show how our KG-based GeoEnrichment Python toolboxes can be used to make narrative maps for different types of historical events. “Limitations of Existing Knowledge Graph Plugins of GI Systems” discusses pioneering work on integrating Linked Data into GI Systems and the limitations of these KG plugins. Then, “Overview of our KG-Based GeoEnrichment Services” briefly introduces our KG-based GeoEnrichment Python Toolboxes for ArcGIS Pro, which aim to overcome the shortcomings of previous work. Next, “A Map of Ferdinand Magellan’s Expedition” and “A Map of All Events During World War II” show two use cases of our KG-based GeoEnrichment toolboxes for narrative mapping. Finally, “Limitations of the KG-Based GeoEnrichment Approach for Narrative Cartography” concludes this section and discusses the limitations of our tools.

Limitations of Existing Knowledge Graph Plugins of GI Systems

The main objective of a knowledge graph plugin for a GI System is to allow the GI System to directly consume data from the many ever-growing knowledge graphs, so that we can do spatial analysis or make maps on top of these KGs. Despite the advantages of KGs and Semantic Web technologies in modeling (spatiotemporal) data, few efforts have been devoted to directly consuming geospatial data within KGs for spatial analysis, geoprocessing, or mapping purposes. From a GISystem perspective, Linked Data and KG research seem like a one-way street (Mai et al. 2019a). On the one hand, numerous efforts have been made to triplify existing geospatial data into RDF triples and to get various types of geo-data out of data silos. On the other hand, the question of how to actually make use of this plethora of data (i.e., GeoKGs) for spatial analysis or cartography purposes remains largely unanswered. The reason is that current state-of-the-art GISystems and cartography software such as ArcGIS, QGIS, and SuperMap cannot consume (geospatial) KGs directly without data conversion.

It is possible to flatten the graph structure of a whole KG into a tabular format such as Shapefiles in order to make it manipulable for GISystems. However, the converted tabular data will become another data silo and may quickly become out of date. Moreover, this data flattening practice erases all the rich link structure provided by a KG. The resulting tabular data is very similar to what we get from conventional web feature services (WFS) and GIS plugins such as the QGIS OpenStreetMap Plugin.Footnote 12 These traditional services primarily focus on fetching the geometric information as well as some basic properties (e.g., labels, types) of geographic features/entities. Yet, from a narrative cartography perspective, we are interested not only in fetching such basic information, but also in exploring the relationships between places and other entities such as events, people, and activities, for which the link structure provided by KGs becomes important.

Instead of taking the graph flattening approach, two recent works, the ESRI ArcMap Linked Data Connector (Mai et al. 2019a) and the QGIS SPARQL Unicorn plugin,Footnote 13 build toolboxes or plugins for existing GISystems that enable a GIS user to explore the KG structure from within a GISystem. These toolboxes or plugins automatically construct SPARQL queries based on the user input and send them to existing KGs such as DBpedia and Wikidata. The results of these SPARQL queries are automatically materialized into a GIS-processable format such as Shapefiles or feature classes in a file geodatabase, which can be utilized to explore the KG further or to do conventional spatial analysis. Unlike the whole-graph flattening approach, these two works do not flatten the whole KG into one big table. Instead, they take only a small subgraph from the KG based on the user's input and convert it into a tabular format, while still allowing users to explore the strongly interlinked graph structure further.

Both tools provide area-based entity retrieval, which enables a GIS user to define a study area and retrieve geographic entities of certain types that fall into this area. They can be used to answer questions such as "show me all county seats within the picked study area". The fetched geographic entities are stored in a Shapefile which contains information such as their Uniform Resource Identifiers (URIs), their feature types, their geometries, and their labels. Because of this URI information, the file can be further utilized by the same toolbox set or plugin to explore the KG further.
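To make the idea of area-based entity retrieval concrete, the following minimal sketch assembles such a query as a string. It is not the query constructor of either tool. The function name `build_area_query` is hypothetical, and we assume the Wikidata Query Service conventions (the `wikibase:box` service, `wdt:P625` for coordinate location, and `wdt:P31`/`wdt:P279` for typing); other endpoints would need GeoSPARQL filters instead.

```python
def build_area_query(type_qid, south, west, north, east, limit=100):
    """Build a SPARQL query for entities of one type inside a bounding box.

    Assumption: the target endpoint is the Wikidata Query Service, which
    offers the wikibase:box search service. type_qid is a Wikidata class
    identifier chosen by the user (e.g. "Q515" for city).
    """
    if south >= north:
        raise ValueError("south must be smaller than north")
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT ?place ?location WHERE {{
  ?place wdt:P31/wdt:P279* wd:{type_qid} .
  SERVICE wikibase:box {{
    ?place wdt:P625 ?location .
    bd:serviceParam wikibase:cornerSouthWest "Point({west} {south})"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerNorthEast "Point({east} {north})"^^geo:wktLiteral .
  }}
}}
LIMIT {limit}"""


# Illustrative study area; the coordinates are arbitrary example values.
query = build_area_query("Q515", south=34.0, west=-120.5, north=35.0, east=-119.5)
```

The returned string could then be sent to a SPARQL endpoint, and each result row (URI plus point geometry) written as one feature of the output Shapefile.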

Despite the above neat properties, several limitations exist for these existing KG tools:

  1. The QGIS SPARQL Unicorn plugin only supports a few basic spatial query functionalities such as area-based entity retrieval. It does not support more complex KG exploration and analysis (e.g., N-degree relationship exploration as shown in Fig. 2a), which can be useful for narrative cartography.

  2. Both the ESRI ArcMap Linked Data Connector and the QGIS SPARQL Unicorn plugin are still geographic entity-centred. They only allow for exploring a KG starting from a geographic node. However, a user might want to start exploring the KG from a non-geographic entity/node, such as Ferdinand Magellan.

  3. These tools are based on outdated GIS platforms or are poorly maintained. For instance, comprehensive testing of the QGIS SPARQL Unicorn plugin shows that many of its functionalities do not work. By inspecting the constructed SPARQL queries, we find that some failures are due to systematic syntax errors in its SPARQL constructor, while the causes of others are hard to determine. As for the ESRI ArcMap Linked Data Connector, although most of its toolboxes work well on ArcMap 10.4, an old version of the ArcGIS platform that is no longer maintained by Esri Inc., they are not compatible with the newest ArcGIS Pro platform.

Overview of our KG-Based GeoEnrichment Services

To overcome the above limitations, we develop a KG-oriented GeoEnrichment tool based on the ESRI ArcMap Linked Data Connector. It is a collection of ArcGIS Python toolboxes that support narrative mapping and provide general access to KG information from within GISystems. Figure 1 illustrates how our KG-based GeoEnrichment toolset can be used for narrative mapping. It serves as a middle layer between a GI System, i.e., ArcGIS Pro, and a knowledge graph (e.g., Wikidata). A GIS user can directly explore and retrieve the necessary data from the KG within ArcGIS Pro. The retrieved data is materialized in a GIS format (i.e., Shapefiles and ArcGIS Attribute Tables) so that normal spatial analysis and cartography operations can be applied to it. We can directly make narrative maps based on these data, while the time and effort for data retrieval, preprocessing, and integration are largely reduced. Compared with the two previous tools, our new KG-based GeoEnrichment toolset has the following advantages: (1) it supports more flexible KG exploration and data retrieval functionalities such as N-degree relation exploration and non-geographic entity property enrichment; (2) it allows users to explore KGs from a non-geographic node; (3) it is developed for the newest ArcGIS Pro, which makes it rather easy to maintain.

Fig. 1. An illustration of our KG-based GeoEnrichment Python toolbox workflow

Compared with the ESRI ArcMap Linked Data Connector, our new KG-based GeoEnrichment toolset differs mainly in two toolboxes:

  1. Linked Data Relationship Finder: This toolbox enables users to explore N-degree relationship paths within a KG such as Wikidata from within a GI System. The idea of N-degree relation exploration is shown in Fig. 1. This toolbox requires several input parameters: the SPARQL endpoint, the start node(s), the relationship degree, and the property direction and property URL of the k-th degree property along the property path. The SPARQL endpoint is the endpoint of the KG a user would like to connect to. Currently, our tool supports Wikidata as well as any other KG that supports GeoSPARQL. The start nodes are the user-selected nodes where the property path begins; they are denoted as blue nodes in Fig. 1. The ESRI ArcMap Linked Data Connector (Mai et al. 2019a) only allows geographic entities as start nodes, while we relax this restriction and allow non-geographic nodes as start nodes, e.g., Ferdinand Magellan or World War II. In addition, a user needs to let the tool know which property path (s)he wants to explore. This includes the relationship degree (the length of the property path), and each property’s URL as well as its direction along the path. As shown in Fig. 1, when a user specifies Ferdinand Magellan as the start node and ?people -[participant in]-> ?expedition -[via]-> ?place as a 2-degree property path,Footnote 14 all the expeditions taken by Ferdinand Magellan and all the places these expeditions passed through will be retrieved. Figures 2a, b and 3a show what this toolbox looks like in different use cases. Note that this toolbox lets users specify the property path interactively, as in Mai et al. (2019a). For example, when a user selects Ferdinand Magellan as the start node and ORIGIN as the first relation direction, a SPARQL query is constructed to get all properties that have Ferdinand Magellan as their subject node. The remaining steps work in a similar manner.

  2. Linked Data Property Enrichment: This toolbox allows a user to enrich the retrieved data with more information from the KG. For example, when we retrieve all the events which are transitively part of World War II with the Linked Data Relationship Finder toolbox as shown in Fig. 1, we can enrich these events with more attributes such as start time, end time, and point in time, which are indicated by red arrows. The ESRI ArcMap Linked Data Connector (Mai et al. 2019a) also provides a similar property enrichment toolbox. However, that toolbox only allows geographic entities as the input entities for property enrichment, which imposes many limitations on the kind of information we can retrieve from a KG. In contrast, our toolbox allows property enrichment for non-geographic entities. Figure 3b is a screenshot of this tool.
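The way a user-specified property path can be turned into a SPARQL query may be sketched as follows. This is a simplified illustration under our own assumptions, not the toolbox's actual query constructor: the function name `build_path_query` is hypothetical, `DESTINATION` is an assumed name for the direction opposite to `ORIGIN`, the `wd:`/`wdt:` prefixes are assumed to be predefined by the endpoint, and the Wikidata property identifiers in the example (P1344 for participant in, P2825 for via) should be verified against Wikidata before use.

```python
def build_path_query(start_qid, steps):
    """Build a SELECT query that follows an N-degree property path.

    steps is a list of (property_id, direction) pairs, one per degree.
    "ORIGIN" means the current node is the subject of the property;
    "DESTINATION" (an assumed label) means it is the object.
    """
    if not steps:
        raise ValueError("at least one step is required")
    patterns = []
    current = f"wd:{start_qid}"
    for k, (prop, direction) in enumerate(steps, start=1):
        nxt = f"?node{k}"
        if direction == "ORIGIN":
            patterns.append(f"{current} wdt:{prop} {nxt} .")
        elif direction == "DESTINATION":
            patterns.append(f"{nxt} wdt:{prop} {current} .")
        else:
            raise ValueError(f"unknown direction: {direction}")
        current = nxt
    variables = " ".join(f"?node{k}" for k in range(1, len(steps) + 1))
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT {variables} WHERE {{\n  {body}\n}}"


# 2-degree path: Ferdinand Magellan -> expeditions -> stopover places.
magellan_query = build_path_query(
    "Q1496", [("P1344", "ORIGIN"), ("P2825", "ORIGIN")]
)
```

Each additional degree only appends one triple pattern, which is why the interactive, step-by-step specification of the path maps naturally onto query construction.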

Fig. 2. A use case of the KG-based GeoEnrichment tool: mapping Ferdinand Magellan’s expedition in the sixteenth century. We use the Relationship Finder toolbox to find all expeditions in which Ferdinand Magellan participated and all these expeditions’ (a) start points and (b) stopover points (via). (c) The resulting map shows Ferdinand Magellan’s expedition in the sixteenth century. Here, http://www.wikidata.org/entity/Q1496 indicates Ferdinand Magellan in Wikidata

Fig. 3. A use case of the KG-based GeoEnrichment tool: mapping all events that happened during World War II. (a) We use the Linked Data Relationship Finder toolbox to find all events that are transitively part of World War II (at most 4 degrees away) through the has part relation and get all their names and geographic locations. The result is materialized as a Shapefile. (b) We use the Linked Data Property Enrichment toolbox to query for more information about these events, such as their start time, end time, and point in time, which serve as their temporal information. (c) The resulting map shows all events during World War II, ordered by the timeline shown at the top of the map. Here, http://www.wikidata.org/entity/Q362 indicates World War II in Wikidata

In the following, we use two use cases to demonstrate how we can use the KG-based GeoEnrichment toolboxes in narrative mapping.

A Map of Ferdinand Magellan’s Expedition

A typical narrative mapping task is to map people’s activities across space and over a certain time period, such as tracing the movements of characters within James Joyce’s Dubliners (Joyce 2008). Here, we showcase a narrative map of Ferdinand Magellan’s expedition in the sixteenth century.

To make such a map, in addition to the trajectory of this expedition, we are also interested in exploring the people-expedition-place relationships, which cannot be done through traditional web feature services. A typical narrative mapping practice is to read the historical record of Ferdinand Magellan, such as his Wikipedia page, and then map out the expedition’s trajectory. This requires recognizing the place names that appear in the article, linking them to the corresponding geographic entities in an existing gazetteer, finding the geographic locations of these places as well as when Magellan visited them, and finally preparing a Shapefile based on this information for narrative mapping. With our KG-based GeoEnrichment tool, however, we can make such a map within a few minutes directly from within ArcGIS.

Figure 2 shows the whole process as well as the resultant map. Firstly, Fig. 2a shows how we can use the Linked Data Relationship Finder toolbox to explore the property path ?people -[participant in]-> ?expedition -[start point]-> ?place. In this case, we start from Ferdinand MagellanFootnote 15 (the entity wd:Q1496 in Wikidata) and explore its 2-degree relationship paths. The particular example path shown here goes from the person (Ferdinand Magellan) to the expeditions he took (e.g., Magellan-Elcano expeditionFootnote 16) through the participant in relation, and then to the start points of these expeditions (e.g., SevilleFootnote 17) through the start point relation. Similarly, Fig. 2b explores the people-expedition-via point (stopover points) relationship paths. The resulting start points and stopover points are automatically materialized into a Shapefile so that we can directly map them as a trajectory.

Figure 2c shows the resultant map of Ferdinand Magellan’s expedition. Note that this map is based on the information available on Wikidata and may not reflect the whole trajectory of this expedition. This section focuses on the question of how to utilize KGs to overcome the data acquisition and integration bottleneck for narrative cartography, rather than on producing a ready-to-go narrative map product. The map shown in Fig. 2c is simply a use case and should not be judged from an aesthetic perspective. Nevertheless, we believe that the promotion of open-source KGs and cartography with KGs are mutually beneficial: the more people find ways to utilize data in open-source KGs (e.g., Wikidata in this example), the more incentive they have to contribute to these KGs.

A Map of All Events During World War II

Another example is mapping the major events happening in World War II (WW2) which might be interesting for a student or a historian. Since the events in Wikidata are organized in a hierarchical way, we explore the event-subevent-subsubevent relationships to obtain all events’ locations as well as their temporal sequence during WW2. Here, we show how to use our KG-based GeoEnrichment tool to easily do this.

Figure 3 shows how to use the Linked Data Relationship Finder tool to explore 4-degree relationship paths involving ?event -[has part]-> ?subevent relationsFootnote 18 with World War IIFootnote 19 as the start node. Here, each EVENT node is connected to its direct SUBEVENT node through the has part relation. Figure 1 shows the subgraph of Wikidata in which World War II is connected to the first and second degree of its subevents. All events which are transitively part of World War II (at most 4 degrees away) are retrieved from Wikidata and materialized as a Shapefile for geovisualization.
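One way to express "at most 4 degrees away" in a single query is a SPARQL 1.1 property path that enumerates the path lengths 1 to 4 as alternatives. The sketch below illustrates this idea only; whether the toolbox issues one combined query or one query per degree is not stated here, and the helper names are hypothetical. We assume `wdt:P527` (has part) and `wdt:P625` (coordinate location) as the Wikidata properties, with prefixes predefined by the endpoint.

```python
def bounded_path(prop, max_degree):
    """Return a property path matching 1..max_degree repetitions of prop.

    In SPARQL 1.1 property paths, "/" (sequence) binds tighter than
    "|" (alternative), so "p|p/p" reads as "p" or "p followed by p".
    """
    if max_degree < 1:
        raise ValueError("max_degree must be at least 1")
    alternatives = ["/".join([prop] * k) for k in range(1, max_degree + 1)]
    return "(" + "|".join(alternatives) + ")"


def build_subevent_query(root, prop="wdt:P527", max_degree=4):
    """Query for located events reachable from root within max_degree hops."""
    return (
        "SELECT DISTINCT ?event ?coord WHERE {\n"
        f"  {root} {bounded_path(prop, max_degree)} ?event .\n"
        "  ?event wdt:P625 ?coord .\n"
        "}"
    )


ww2_query = build_subevent_query("wd:Q362")
```

Requiring `wdt:P625` keeps only events that can be placed on the map; events without coordinates would need their location resolved through another property.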

Since we are also interested in the temporal order of these events, we can enrich this GIS data by querying the temporal information of these events from Wikidata. Figure 3b shows how we can do this with another toolbox, the Linked Data Property Enrichment toolbox. After the materialized GIS data is loaded into the toolbox, the toolbox automatically queries for the properties of these events that are available for data enrichment. Here, we pick start time, end time, and point in time, as they carry the temporal information of these events. Figure 3c shows the final map of these events during World War II. The timeline at the top of the map can also control which events to show based on the retrieved timestamps.
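The two steps of property enrichment, first discovering which properties are available for the retrieved entities and then fetching the selected ones, can be sketched as two query builders. This is an illustration under our own assumptions rather than the toolbox's code: the function names are hypothetical, and the property identifiers P580 (start time), P582 (end time), and P585 (point in time) are assumed Wikidata identifiers with the `wdt:` prefix predefined by the endpoint.

```python
def build_property_discovery_query(entity_uris):
    """List the distinct properties used by any of the given entities."""
    values = " ".join(f"<{uri}>" for uri in entity_uris)
    return (
        "SELECT DISTINCT ?property WHERE {\n"
        f"  VALUES ?entity {{ {values} }}\n"
        "  ?entity ?property ?value .\n"
        "}"
    )


def build_enrichment_query(entity_uris, properties):
    """Fetch the selected properties for the given entities.

    properties maps an output column name to a property identifier.
    OPTIONAL keeps entities that lack some of the selected properties.
    """
    values = " ".join(f"<{uri}>" for uri in entity_uris)
    variables = " ".join(f"?{name}" for name in properties)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?entity wdt:{pid} ?{name} . }}"
        for name, pid in properties.items()
    )
    return (
        f"SELECT ?entity {variables} WHERE {{\n"
        f"  VALUES ?entity {{ {values} }}\n"
        f"{optionals}\n"
        "}"
    )


events = ["http://www.wikidata.org/entity/Q362"]
temporal = {"startTime": "P580", "endTime": "P582", "pointInTime": "P585"}
enrichment_query = build_enrichment_query(events, temporal)
```

The result rows can be joined back to the Shapefile's attribute table on the entity URI, which is why the URI column is kept when the data is first materialized.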

Another important advantage of using KGs for narrative cartography is that KGs contain massive information about each geographic and non-geographic entity. When we visualize some geographic entities (e.g., events, objects) on the narrative map, KGs can provide rich contextual information for these entities and allow users to explore further. Since many existing KGs such as Wikidata are built from various data sources, they contain a massive amount of information for each entity, much more than what we normally get from a single dataset (e.g., a historic battle dataset). For instance, by using the Linked Data Property Enrichment toolbox, we can access not only the temporal information but also numerous other types of information about these events, such as the event category, the number of deaths, the number of injured, significant events during these events, participants, the cause of the event, and following events. More specifically, as shown in Fig. 3b, there are 76 different properties shown in the multi-selection box that are available to enrich the retrieved WW2 event dataset. The exact SPARQL query result (i.e., all the available event properties) can be accessed with this link.Footnote 20 These additional properties provide rich contextual information for each event during World War II. In contrast, a traditional interactive map for World War II such as the World War II Interactive MapFootnote 21 only contains a short description for each event.

Note that all the steps discussed above can be accomplished in a few minutes through interactions with the GIS platform. In contrast, following the traditional narrative cartography practice would require a lot of effort to make such a map, since the event information is scattered across different parts of a narrative and substantial effort is needed to extract this information, preprocess it, and make it ready for a cartography program. The event information is, however, readily available through a KG such as Wikidata. This substantially reduces the data acquisition effort and accelerates the mapping process. Moreover, if one would like to add information that is missing from Wikidata, it is also easy to integrate other repositories (KGs) with it, given the power of the underlying Semantic Web technologies.

Limitations of the KG-Based GeoEnrichment Approach for Narrative Cartography

Despite these advantages of the KG-based GeoEnrichment toolboxes, we also observe several limitations if we only use existing KGs for narrative mapping with the help of our toolboxes. More specifically, there are three challenges: data incompleteness of the existing KGs, semantic challenges in modeling map content, and semantic challenges in modeling the geovisualization process.

The first limitation is the data incompleteness of the existing KGs for the selected map topic. For example, for the World War II use case in “A Map of All Events During World War II”, we retrieved 48 unique significant events that are transitively part of World War II in Wikidata. However, the World War II Interactive MapFootnote 22 shows 334 different events. One of the reasons for this discrepancy is that in Fig. 3 we only consider 1-, 2-, 3-, and 4-degree has part relationship/property paths from the World War II node (wd:Q362), so subevents that are more than 4 degrees away from the World War II node through the has part relation are not retrieved. For example, the Battle of Mount SongFootnote 23 (wd:Q13403439) is a famous battle during World War II which is 5 degrees away from the WW2 node. However, even if we account for all sub-events that are any number of degrees away from the WW2 node by using this property path query,Footnote 24 we can only retrieve 122 subevents.

Another important reason is the data incompleteness of Wikidata itself. Wikidata, although one of the world's largest collaborative open-source KGs, still suffers from data incompleteness. It is possible that some events during World War II are missing from Wikidata (entity missing) and that some links among these events are missing (link missing). As for entity missing, one example in the World War II use case is that one specific assault of the Japanese Army during the Defense of Sihang WarehouseFootnote 25 is not instantiated as an event node in Wikidata. Compared with entity missing, link incompleteness is more common in existing KGs. For example, the Second Sino-Japanese WarFootnote 26 is not linked to the First Battle of ChangshanFootnote 27 with a has part relation, while the two are linked in the reverse direction with a part of relation, despite the fact that these two relations are inverse to each other. This indicates that there are missing links among these event entities. So in order to find all subevents of World War II, we should use the SPARQL query shown in Listing 1, which combines two property path query patterns (1) and (2) with the has part and part of relations. This queryFootnote 28 returns 2087 events, which is far more than those listed in the World War II Interactive Map. This also shows the limitation of our KG-based GeoEnrichment tool in supporting various property path queries. We leave this as future work and discuss it in “Conclusions and Outlook”.

Listing 1. Query for all sub-events of World War II in Wikidata
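The effect of combining the has part relation with the inverse of the part of relation can be illustrated on a toy graph. The sketch below is not a reproduction of Listing 1: the traversal function `find_subevents` is hypothetical, the two edges are a toy stand-in for the link structure discussed in the text, and the query string is only one possible way to write such a combined property path, assuming `wdt:P527` (has part) and `wdt:P361` (part of).

```python
from collections import deque

# One possible combined pattern (an assumption, not the paper's Listing 1):
# follow "has part" forwards or "part of" backwards, one or more times.
COMBINED_PATTERN = "wd:Q362 (wdt:P527|^wdt:P361)+ ?event ."


def find_subevents(root, has_part, part_of):
    """Collect all transitive subevents of root.

    has_part holds (parent, child) pairs; part_of holds (child, parent)
    pairs. Both are treated as evidence of the same parent-child link.
    """
    children = {}
    for parent, child in has_part:
        children.setdefault(parent, set()).add(child)
    for child, parent in part_of:
        children.setdefault(parent, set()).add(child)
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy data mirroring the missing-link example discussed in the text.
HAS_PART = [("World War II", "Second Sino-Japanese War")]
PART_OF = [("First Battle of Changshan", "Second Sino-Japanese War")]

only_has_part = find_subevents("World War II", HAS_PART, [])
combined = find_subevents("World War II", HAS_PART, PART_OF)
```

With has part alone, the traversal stops at the Second Sino-Japanese War; adding the reversed part of links also reaches the battle, which is the behaviour the combined query is meant to achieve.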

Another limitation is semantic incompatibility. It can occur within an existing KG or between the existing KGs and the intended map topic. The former means that there are semantic conflicts within a single KG. We take the World War II use case again to demonstrate this. Based on the description in Wikidata, World War II started on September 1st, 1939 (start time), and ended on September 2, 1945 (end time). So strictly speaking, all events outside of this time interval should not be considered as its subevents. However, as shown in Fig. 3c, the timeline starts on January 1st, 1937 because Wikidata also asserts the Second Sino-Japanese War, which happened on January 7th, 1937, as a subevent of World War II. This is a common data disagreement issue in open-source KGs because of the open world assumption. The latter means that the semantics of a concept such as World War II in the existing KGs might differ from the one that cartographers have in mind. For example, the Jinan Incident,Footnote 29 which happened on May 11th, 1928, is considered an event of WW2 in the World War II Interactive Map (see this linkFootnote 30). However, Wikidata does not declare such a subevent relation (i.e., a part of relation). According to the open world assumption (Drummond and Shearer 2006) adopted by the Semantic Web technologies, we cannot assert anything about the relationship between this event and World War II. However, given that the Jinan Incident happened 11 years before the defined start time of World War II, instead of assuming that the link between them is missing because of data incompleteness, we are inclined to believe that the Jinan Incident is treated as an individual event, rather than a subevent of WW2, in Wikidata. This indicates a semantic incompatibility between Wikidata and the World War II Interactive Map. To solve the semantic incompatibility issue within one KG, we can define data quality constraints (e.g., with SHACL shapesFootnote 31). With regard to the semantic incompatibility issue among different existing KGs and the intended map topic, we need to formally define an ontology for map content to enable semantic interoperability among them. We will demonstrate an ontology design pattern for map content in “The Map Content Module”.
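The kind of data quality constraint mentioned above can be prototyped outside the KG before it is formalized as a SHACL shape. The following minimal sketch flags asserted subevents whose start time falls outside the parent event's interval. It is an assumed, simplified check of our own (the function name is hypothetical); the dates are the ones quoted in this section.

```python
from datetime import date


def starts_outside(parent_interval, subevent_starts):
    """Return the names of subevents that start outside the parent interval.

    parent_interval is a (start, end) pair of dates; subevent_starts maps
    a subevent name to its start date. Insertion order is preserved.
    """
    start, end = parent_interval
    return [
        name
        for name, sub_start in subevent_starts.items()
        if sub_start < start or sub_start > end
    ]


ww2 = (date(1939, 9, 1), date(1945, 9, 2))
asserted_subevents = {
    # Start date as reported in this section for the Wikidata entry.
    "Second Sino-Japanese War": date(1937, 1, 7),
    "Battle of Sedjenane": date(1943, 2, 26),
}
conflicts = starts_outside(ww2, asserted_subevents)
```

Flagged entries are candidates for review rather than errors, since a cartographer may deliberately adopt a broader notion of the parent event.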

The third and last limitation comes from the semantic challenges in geovisualization discussed in “Introduction”. Although our KG-based GeoEnrichment tool allows a GIS user to explore existing KGs within a GISystem, it is concerned only with the map content and not with the geovisualization process. How to interpret the data retrieved from an existing KG and how to visualize them on the map depend on cartographers. The decision of whether to visualize a battle as a circle, a drop pin, or a cross on the map remains implicit in the mind of the cartographer. To explicitly express the semantics of the geovisualization process, we need an ontology design pattern for it, which will be discussed in “The Cartography Module”.

A Modular Ontology for Narrative Cartography

To tackle the semantic challenges in modeling map content as well as the geovisualization process discussed in “Limitations of the KG-Based GeoEnrichment Approach for Narrative Cartography”, in this section we take a step forward in the direction of underpinning narrative cartography with KGs, i.e., we formalize the knowledge involved in producing narrative maps in several ontologies. This formalization work is divided into two parts, namely the narrative map content ontology and the data visualization (cartography) ontology. These ontologies are available through our GitHub repository.Footnote 32

The Map Content Module

In principle, the types of map content used in narrative cartography are numerous; apart from its main body (the map and its legend), there can also be other types of media such as image, audio, and video to serve as attachments of the maps. This implies that the representation forms in narrative cartography are both various and uncertain — it is uncertain which forms are employed in a particular context. Therefore, in this paper we concentrate on the formalization of the main body of the map content, i.e., the map itself but not its attachments.

The main objective of the Map Content module is to overcome the semantic challenge in modeling map content and to resolve the semantic incompatibility among different data sources. The Map Content module of the narrative cartography ontology is shown in Fig. 4 (note that the concepts shaded in orange are in this module, and the others are reused from other ontologies). In this ontology, the most general concept is MapContent, representing the entire content of a narrative map to be rendered. An instance of MapContent is associated with one or several instances of MapContentType through the relation hasMapContentType. The concept MapContentType generalizes the phenomenon that the narrative entails and that is to be exhibited with cartography. An instance of MapContentType can be interpreted as a map layer used in a narrative map, e.g., the collection of events that occurred during World War II can be viewed as an instance of this concept. One MapContentType consists of several instances of MapContentItem through the hasMapItem relation. An instance of MapContentItem is one individual item (e.g., a particular battle during World War II) that needs to be displayed on the map.

Fig. 4. The Map Content module of the Narrative Cartography Ontology

With regard to narrative cartography, there are two major types of geographic entities that need to be visualized: Object and Event. They can be further classified into different subclasses. Object can be classified into Mountain, City, Park, and so on, while Event can be classified into Natural Disaster, Expedition, War, and so on. These object and event classifications depend on the actual map content, and several existing geographic feature type classification schemas can be used here, such as the Geographic Names Information System (GNIS) Feature Classification schema (Regalia et al. 2018). In other words, these object and event classifications can be borrowed from other ontology design patterns, and we indicate them with blue boxes with dashed boundaries.

Each instance of MapContentItem should be associated with an instance of SpatiotemporalExtent, which is used to describe the spatiotemporal scope of an object or an event. SpatiotemporalExtent is further divided into SpatialExtent and TemporalExtent to separately delineate the spatial scope and temporal scope of the associated instance of MapContentItem. For instance, for the event Battle of Sedjenane, the temporal scope is from February 26 to March 4 of 1943, and the spatial scope is the Tunisian town “Sejenane” (according to the relevant entity in WikidataFootnote 33). An object can also have a temporal scope. For example, the Soviet Union has a temporal scope from December 30th, 1922, to December 26th, 1991. Note that SpatialExtent is used to visualize the spatial footprints of events and objects. It corresponds to the location information (the 2nd row) discussed in Table 1. TemporalExtent facilitates the formalization of the temporal sequence of different MapContentItem instances for the geovisualization of narratives. It corresponds to the occurring time and temporal scope of entities in Table 1 under the class of temporal information. TemporalExtent is particularly important for narrative cartography compared to other geovisualization tasks, since the temporal ordering of events and stories is its focus.
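The core concepts of the Map Content module can be mirrored in a few plain data classes to show how they fit together. This is only an illustrative sketch under our own simplifications, not the ontology itself: the intermediate SpatiotemporalExtent is flattened into two fields, geometries are reduced to a place label, and the attribute and method names are hypothetical. The example values are the ones given in the text above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class TemporalExtent:
    start: date
    end: Optional[date] = None


@dataclass
class SpatialExtent:
    place_name: str  # label only; a real extent would carry a geometry


@dataclass
class MapContentItem:
    label: str
    spatial: SpatialExtent
    temporal: TemporalExtent


@dataclass
class MapContentType:
    """A map layer, linked to its items (cf. the hasMapItem relation)."""
    name: str
    items: List[MapContentItem] = field(default_factory=list)

    def chronological(self) -> List[MapContentItem]:
        # Temporal ordering is what drives the narrative timeline.
        return sorted(self.items, key=lambda item: item.temporal.start)


@dataclass
class MapContent:
    """The whole map, linked to its layers (cf. hasMapContentType)."""
    title: str
    content_types: List[MapContentType] = field(default_factory=list)


layer = MapContentType("Illustrative items", [
    MapContentItem("Battle of Sedjenane", SpatialExtent("Sejenane"),
                   TemporalExtent(date(1943, 2, 26), date(1943, 3, 4))),
    MapContentItem("Soviet Union", SpatialExtent("Soviet Union"),
                   TemporalExtent(date(1922, 12, 30), date(1991, 12, 26))),
])
narrative_map = MapContent("Example narrative map", [layer])
ordered_labels = [item.label for item in layer.chronological()]
```

Sorting by the start of the TemporalExtent is the simplest ordering; items that only have a point in time could store it as a start with no end.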

It is also valuable to model the “observations” of each object and event. Here, “observations” means the observed properties that are used to describe objects and events. We use sosa:Observation from the SOSA ontology (Janowicz et al. 2019) as well as sosa:ObservationCollection from the SOSA extensionFootnote 34 to model them. The attributes of geographic entities in Table 1 can be modeled as “observations”. For example, the elevation of Rio de Janeiro, one stopover of Ferdinand Magellan’s expedition, can be modeled as an instance of sosa:Observation with elevation above sea level as its sosa:ObservableProperty and the elevation value as its observation result (sosa:Result). If there are multiple observations for the same event or object (e.g., population, elevation, and precipitation), we can model them as different Observation instances which are all linked to one ObservationCollection instance. This ObservationCollection instance is then linked to the corresponding MapContentItem instance through the sosa:isFeatureOfInterestOf relation.
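The observation pattern described above can be sketched by generating the corresponding triples for one map content item. This is a hedged illustration, not part of the published ontology files: the function name, the `ex:` identifiers, and the use of `sosa:hasMember`, `sosa:observedProperty`, and `sosa:hasSimpleResult` as connecting predicates are our assumptions and should be checked against the SOSA specification and its extension. The result values are placeholders, not measured data.

```python
def observation_triples(item, observations):
    """Return (subject, predicate, object) triples for an item's observations.

    observations maps an observable property to a result value. All
    observations are grouped under one ObservationCollection, which is
    linked to the item via sosa:isFeatureOfInterestOf.
    """
    collection = f"{item}_observations"
    triples = [
        (collection, "rdf:type", "sosa:ObservationCollection"),
        (item, "sosa:isFeatureOfInterestOf", collection),
    ]
    for index, (prop, result) in enumerate(observations.items()):
        obs = f"{item}_obs{index}"
        triples.extend([
            (collection, "sosa:hasMember", obs),
            (obs, "rdf:type", "sosa:Observation"),
            (obs, "sosa:observedProperty", prop),
            (obs, "sosa:hasSimpleResult", result),
        ])
    return triples


# Placeholder results only; real values would come from the KG.
rio_triples = observation_triples(
    "ex:RioDeJaneiro",
    {"ex:elevationAboveSeaLevel": "PLACEHOLDER", "ex:population": "PLACEHOLDER"},
)
```

Grouping the observations in a collection keeps the MapContentItem linked to a single node, however many attributes are later added.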

In order to enrich the metadata of the target instances of MapContentItem, MapContentType, and MapContent, they can be associated with provenance information (e.g., by using the PROV OntologyFootnote 35).

Note that the Map Content module shown in Fig. 4 is a general ontology design pattern (ODP) for modeling the content of a narrative map. A cartographer can build a map content KG based on this ODP. Since this ODP is very general and flexible, the resulting KG can be easily linked to existing KGs such as Wikidata and DBpedia (i.e., data integration) based on entity alignment between MapContentItem instances in this KG and geographic entities in the other KGs. After the entity alignment step, the map content KG will be significantly enriched with data from other repositories. This leads to a more powerful geovisualization which allows users to explore all this rich contextual information. Based on this formally defined ODP and entity alignment, we can also address the semantic incompatibility issue (see “Limitations of the KG-Based GeoEnrichment Approach for Narrative Cartography”) and achieve semantic interoperability among different KGs.

The Cartography Module

The portrayal of the instances of MapContentType is also formalized into an ontology module, namely the Cartography module (Fig. 5; the concepts shaded in green are in this module). At the most general level, the concept FeatureTypeStyle represents the style that converts an instance of MapContentType into visualizations (graphics). The two are associated through the relation hasStyle. Following the design pattern of the knowledge base for geovisualization from Huang and Harrie (2020), we design the Cartography module by coupling an ontology with semantic rules.

Fig. 5. The Cartography module of the Narrative Cartography Ontology

For the ontological part, the concept FeatureTypeStyle is associated with Symbol, and thereafter with the Symbolizer concept, which is then linked to specific geometry portrayal concepts, e.g., Stroke for linestrings and Fill for polygons. Symbol is also associated with Legend and LegendItem, which represent the information of the map legend. An instance of Symbol can have a number of instances of Symbolizer, which, at the implementation level, link particular portrayal rules with specific geometry portrayal concepts. For instance, we can render the battles lasting longer than a year (rule condition) with a particular size and color of dots (rule conclusion, i.e., a particular symbolizer).

The portrayal rules in this paper are organized in a rule base and implemented as SPARQL rules. Here we provide four concrete examples of such rules in Listings 2, 3, 4, and 5. The first two rules are based on the relations among entities. The other two rules are based on temporal constraints. Listing 2 is a SPARQL rule that states that a particular symbolizer is used for the battles during World War II in which the US participated. The condition of the rule comes after the keyword WHERE, saying that (1) the entity is a battle (the first clause in the condition); (2) the battle is a part of World War II (the second and third clauses); and (3) the US participated in the battle (the fourth clause). Note that all the predicates (e.g., wdt:P32) and entities (e.g., wd:Q362) are from Wikidata. The rule conclusion comes before the condition, after the keyword CONSTRUCT, saying that this rule uses a particular symbolizer (symbolizer_0 in this case) for the entities that meet the condition. Likewise, Listings 3, 4, and 5 formalize the rules that use a particular symbolizer for the battles with more than 5 participating countries, for the battles lasting more than 30 days, and for the battles whose start time is in 1939, respectively. Listings 4 and 5 show SPARQL rules with temporal constraints, which are particularly interesting from the narrative cartography perspective.
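The general shape of such a rule, a CONSTRUCT conclusion that assigns a symbolizer and a WHERE condition that selects the entities, can be sketched with a small rule builder. This is a simplified illustration and not a reproduction of Listing 2: the function name, the `ex:hasSymbolizer` predicate, and the `ex:` prefix are hypothetical, the condition uses three clauses rather than the four described above, and the Wikidata identifiers (P31 instance of, Q178561 battle, P361 part of, P710 participant, Q30 United States) are our assumptions to be verified.

```python
def build_portrayal_rule(symbolizer, conditions):
    """Assemble a SPARQL CONSTRUCT rule from a symbolizer and conditions.

    Each condition is one triple pattern (or FILTER) over the variable
    ?item; the conclusion attaches the symbolizer to every match.
    """
    if not conditions:
        raise ValueError("a rule needs at least one condition")
    where = "\n".join(f"  {clause}" for clause in conditions)
    return (
        "CONSTRUCT {\n"
        f"  ?item ex:hasSymbolizer ex:{symbolizer} .\n"
        "}\n"
        "WHERE {\n"
        f"{where}\n"
        "}"
    )


us_battle_rule = build_portrayal_rule(
    "symbolizer_0",
    [
        "?item wdt:P31 wd:Q178561 .",   # the entity is a battle
        "?item wdt:P361+ wd:Q362 .",    # transitively part of World War II
        "?item wdt:P710 wd:Q30 .",      # the US participated
    ],
)
```

Keeping the conclusion fixed and varying only the condition list makes it straightforward to maintain a rule base in which each rule maps one selection of map items to one symbolizer.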

Listing 2
figure b

Use a particular symbolizer for the battles during World War II in which the US participated. See https://api.triplydb.com/s/NDF05YYBl for the corresponding SPARQL query in Wikidata.

Listing 3
figure c

Use a particular symbolizer for the battles during World War II with more than 5 participating countries. See https://api.triplydb.com/s/tY6amm4AZ for the corresponding SPARQL query in Wikidata.

Listing 4
figure d

Use a particular symbolizer for the battles during World War II which lasted more than 30 days. See https://api.triplydb.com/s/sUOFyqZNx for the corresponding SPARQL query in Wikidata.

Listing 5
figure e

Use a particular symbolizer for the battles during World War II whose start time is in 1939. See https://api.triplydb.com/s/ETk6mbUHx for the corresponding SPARQL query in Wikidata.
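
The structure shared by the relation-based rules in Listings 2 and 3, a condition over relations followed by a symbolizer assignment, can also be illustrated outside a triple store. The following Python sketch is only an illustration under stated assumptions and is not the SPARQL rule base of this paper: the Battle class, the function apply_rules, the records battle_a, battle_b, and battle_c, and the identifier symbolizer_1 are hypothetical; only symbolizer_0 is named in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Battle:
    """Hypothetical stand-in for a battle entity in the map content KG."""
    name: str
    part_of: str
    participants: FrozenSet[str]


# A portrayal rule pairs a condition (the WHERE part of the SPARQL rule)
# with a symbolizer identifier (the CONSTRUCT part).
Rule = Tuple[Callable[[Battle], bool], str]

RULES: List[Rule] = [
    # Analogue of Listing 2: WWII battles in which the US participated.
    (lambda b: b.part_of == "World War II"
     and "United States" in b.participants,
     "symbolizer_0"),
    # Analogue of Listing 3: WWII battles with more than 5 participants.
    (lambda b: b.part_of == "World War II" and len(b.participants) > 5,
     "symbolizer_1"),  # identifier assumed for illustration
]


def apply_rules(battles: List[Battle],
                rules: List[Rule]) -> Dict[str, List[str]]:
    """Return, for each battle name, the symbolizers whose condition holds."""
    return {b.name: [sym for cond, sym in rules if cond(b)] for b in battles}


# Made-up records; they do not describe real battles.
BATTLES = [
    Battle("battle_a", "World War II",
           frozenset({"United States", "Country B"})),
    Battle("battle_b", "World War II",
           frozenset({"Country A", "Country B", "Country C",
                      "Country D", "Country E", "Country F"})),
    Battle("battle_c", "Some other war", frozenset({"United States"})),
]

ASSIGNED = apply_rules(BATTLES, RULES)
```

In this sketch a battle may receive several symbolizers or none, which mirrors the fact that each rule is evaluated independently of the others.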

Note that here we show how to visualize a specific set of battles that satisfy certain conditions with Wikidata as the underlying map content KG. For a map content KG built on the Map Content ontology design pattern, we need to change the selection clauses to select MapContentItem instances that satisfy the said conditions. For the rules depicted in Listings 4 and 5, the start time of battles can be accessed through the TemporalExtent of each MapContentItem instance. We use Wikidata as the map content KG because it makes it easier for the reader to test these rules through the real SPARQL queries linked in each listing.
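
To make the role of TemporalExtent in the temporal rules more concrete, the following Python sketch expresses the conditions of Listings 4 and 5 over MapContentItem-like objects. It is a simplified illustration, not the ontology itself: the field names start, end, label, and temporal_extent, the two condition functions, and the items item_x, item_y, and item_z with their dates are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalExtent:
    """Hypothetical simplification: a start date and an optional end date."""
    start: date
    end: Optional[date] = None


@dataclass(frozen=True)
class MapContentItem:
    """Hypothetical simplification of a MapContentItem instance."""
    label: str
    temporal_extent: TemporalExtent


def lasted_more_than_30_days(item: MapContentItem) -> bool:
    """Condition analogous to Listing 4; items without an end date never match."""
    extent = item.temporal_extent
    return extent.end is not None and (extent.end - extent.start).days > 30


def started_in_1939(item: MapContentItem) -> bool:
    """Condition analogous to Listing 5."""
    return item.temporal_extent.start.year == 1939


# Made-up items with made-up dates; they do not describe real events.
ITEM_X = MapContentItem(
    "item_x", TemporalExtent(date(1939, 3, 1), date(1939, 4, 15)))
ITEM_Y = MapContentItem(
    "item_y", TemporalExtent(date(1942, 5, 10), date(1942, 5, 20)))
ITEM_Z = MapContentItem("item_z", TemporalExtent(date(1939, 7, 1)))
```

Treating a missing end date as a non-match is one possible design choice; a rule base could equally flag such items for review, since incomplete temporal data is common in existing KGs.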

In terms of the implementation of the portrayal rule base, such rules can be encapsulated in a named graph and represented with the SHACL rule vocabulary.Footnote 36 In some RDF (KG) stores/frameworks (e.g., RDF4JFootnote 37), deductions can be derived automatically, and in such settings the rule base can assign the corresponding symbolizers to individual map objects to realize a knowledge-based narrative cartography scenario. For technical details, please see Huang and Harrie (2020) and its GitHub repository.Footnote 38

By modeling the whole geovisualization process with our Cartography ontology design pattern, we can explicitly express the semantics behind each map symbol and legend item. For example, when we see a specific map symbol portrayal:symbolizer_2 (see Listing 4) on a narrative map, it indicates a battle lasting more than 30 days during World War II. This practice can help us overcome the semantic challenges in geovisualization described in “Introduction” and “Limitations of the KG-Based GeoEnrichment Approach for Narrative Cartography”, and achieve map reproducibility.
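
As a rough illustration of how explicit rule semantics can feed a legend, the Python sketch below looks up the meaning of each symbolizer used on a map. It is not the Legend and LegendItem model of the ontology: the dictionary RULE_DESCRIPTIONS, the function legend_items, and the portrayal: prefix on symbolizer_0 are assumptions for this example; only portrayal:symbolizer_2 and its meaning are taken from the text.

```python
from typing import Dict, List, Set

# Hypothetical rule base: each symbolizer identifier is stored together with
# a human-readable statement of the condition that triggers it.
RULE_DESCRIPTIONS: Dict[str, str] = {
    "portrayal:symbolizer_0":
        "Battle during World War II in which the US participated",
    "portrayal:symbolizer_2":
        "Battle during World War II lasting more than 30 days",
}


def legend_items(used: Set[str], descriptions: Dict[str, str]) -> List[str]:
    """Build one legend line per symbolizer that appears on the map."""
    lines = []
    for symbolizer in sorted(used):
        # A symbolizer without a recorded rule has no explicit semantics.
        meaning = descriptions.get(symbolizer, "(no portrayal rule recorded)")
        lines.append(f"{symbolizer}: {meaning}")
    return lines


LEGEND = legend_items({"portrayal:symbolizer_2"}, RULE_DESCRIPTIONS)
```

Because the legend text is derived from the same rule base that assigns the symbolizers, the legend and the map cannot drift apart, which is the reproducibility argument made above.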

Conclusions and Outlook

In this work, we introduce the idea of doing narrative cartography with knowledge graphs. The main motivation is to overcome the data acquisition and integration challenge and the semantic challenge of conventional narrative cartography techniques, and to foster underlying data integration, data reusability, and visualization reproducibility. We first discuss a way to utilize our KG-based GeoEnrichment tool developed for ArcGIS Pro to make narrative maps directly from an existing KG, namely Wikidata. Two use cases are provided to illustrate the effectiveness of this idea: a map of Ferdinand Magellan’s expedition and a map of all events during World War II. We show that this KG-based GeoEnrichment tool can effectively help map a narrative with substantially less effort in data acquisition, preprocessing, and integration. Moreover, our approach requires nearly no prior knowledge of Semantic Web technologies from the users.

We also identify several limitations of this GeoEnrichment approach: data incompleteness of the existing KGs, semantic incompatibility in map data among different data sources, and the semantic challenges of the geovisualization process. To overcome the last two challenges, we develop a modular ontology for narrative cartography which consists of two ontology design patterns: the Map Content module and the Cartography module. The Map Content module formalizes the concepts and relations entailed in the map content. It can be utilized to formally define the semantics of each map content concept and to achieve semantic interoperability between the narrative map KG and other existing KGs, which addresses the semantic incompatibility issue. The Cartography module explicitly models the semantics behind the geovisualization process so that a narrative map can be easily reproduced through the deductions made by portrayal rules.

This paper can be treated as the latest endeavor to use KGs and Semantic Web technologies for narrative cartography. Compared with the previous work discussed in “Formalizing Geospatial and Cartographic Knowledge with Ontologies”, which mostly focused on modeling map content for map sharing and search purposes, our narrative cartography ontology formalizes both the map content and the geovisualization process. We have discussed several advantages of this approach, including a semantically explicit map data representation that facilitates map data reusability, a more expressive representation of the geovisualization process that facilitates map reproducibility, and an easier way to do data acquisition and integration.

However, in order to establish a mature KG-based narrative cartography framework, there are still several technical challenges to be resolved. First, as discussed in “Limitations of the KG-Based GeoEnrichment Approach for Narrative Cartography”, the data incompleteness issue of the existing KGs can affect the quality of the map data, which in turn affects the reliability of the resulting maps. Recently we have witnessed many efforts to solve this data incompleteness issue by using relational machine learning techniques (Dong et al. 2014; Nickel et al. 2015). Various KG embedding (KGE) techniques have been proposed for link prediction and KG completion, such as TransE (Bordes et al. 2013), TransH (Wang et al. 2014), TransR (Lin et al. 2015), ComplEx (Trouillon et al. 2016), R-GCN (Schlichtkrull et al. 2018), TransGCN (Cai et al. 2019), and RotatE (Sun et al. 2019). They perform very well on some experimental datasets such as FB15K and WN18. However, how well they perform in a real-world setting and how reliable the predicted links are have not been systematically studied. Moreover, most KGE models ignore literal nodes and focus on predicting relations among entities. These literal nodes, such as geographic coordinates, timestamps, and other text information, are particularly important for geovisualization purposes. Some recent works have shown how to encode spatial information (Mai et al. 2020; Mai et al. 2022) and temporal information (Dasgupta et al. 2018; Kazemi et al. 2019; Cai et al. 2021) into the embedding space so that link prediction among entities and certain literal nodes (geographic coordinates, timestamps) is possible. Yet, they also need to be validated for their reliability.

Second, although our proposed Linked Data Relationship Finder tool (in Figs. 2a, b, and 3a) is very useful for exploring N-degree relation paths from some start nodes, it also has some limitations. For example, it cannot fully support a logical disjunction among several relationship paths, such as the one in Listing 1, which is sometimes important for a narrative cartography use case. This is due to the user interface design of this tool and the restrictions of the ArcGIS Python toolbox. To design a more intelligent user interface for GIS users to interact with KGs from within GISystems, we need to study user needs further and compile a more comprehensive list of competency questions.

Third, the proposed modular ontology for narrative cartography only focuses on the main body of the map, i.e., the map itself and the legend. We have not discussed how to represent multimedia data such as images, audio, and video in the Map Content module. We also have not discussed how to display this information during the geovisualization process through the Cartography module. We leave this as future work, although a preliminary prototype of developing geovisualizations entirely backed by KGs can be found in Huang and Harrie (2020).

Fourth, through the KG-based GeoEnrichment tool, we have shown how to use an existing KG as the map content KG for narrative mapping. However, we have not discussed how to use the proposed Cartography module to guide the mapping process within an existing GISystem.

Last but not least, in this work, we focus on representing the real-world historical events and objects mentioned in narratives in a KG and visualizing them on a map. This work does not cover how to model and represent the content of fictional narratives. Recently, Branch et al. (2017) have made progress in formalizing transmedia fictional worlds into an ontology which describes the relations among different fictional concepts such as characters, elements of power, items, places, and events. How to develop geovisualization on top of them is an interesting yet challenging task. For instance, since a fictional world might describe a totally different geographic space and layout, the basemaps commonly used for geovisualization may not be applicable here. Making a fictional map is a nontrivial task. We leave this as future work.