WorldKG: World-Scale Completion of Geographic Information

Dsouza, Alishiba; Tempelmeier, Nicolas; Gottschalk, Simon; Yu, Ran; Demidova, Elena

doi:10.1007/978-3-031-35374-1_1

Abstract

Knowledge graphs provide standardized machine-readable representations of real-world entities and their relations. However, the coverage of geographic entities in popular general-purpose knowledge graphs, such as Wikidata and DBpedia, is limited. An essential source of the openly available information regarding geographic entities is OpenStreetMap (OSM). In contrast to knowledge graphs, OSM lacks a clear semantic representation of the rich geographic information it contains. The generation of semantic representations of OSM entities and their interlinking with knowledge graphs are inherently challenging due to OSM’s large, heterogeneous, ambiguous, and flat schema and annotation sparsity. This chapter discusses recent knowledge graph completion methods for geographic data, comprising entity linking and schema inference for geographic entities, to provide semantic geographic information in knowledge graphs. Furthermore, we present the WorldKG knowledge graph, lifting OSM entities into a semantic representation.

1 Introduction

Geographic information is of crucial importance for a variety of real-world applications, including accident prediction (Dadwal et al. 2021), detection of topological dependencies in road networks (Tempelmeier et al. 2021a), and positioning charging stations (von Wahl et al. 2022). Such applications can substantially profit from standardized machine-readable representations of geographic entities, including monuments, roads, and charging stations. Particularly, such semantic representations should comprise detailed descriptions of geographic entities, including their types, properties, context, relations, and interlinking across sources.

OpenStreetMap (OSM)^{Footnote 1} is a critical source of volunteered and openly available geographic information. OSM provides rich but highly heterogeneous data regarding geographic entities, including fine-grained coordinates of real-world locations and user-defined tags comprising entity types, properties, and relations. At the time of writing, OSM contains over 6.8 billion entities from 188 countries.^{Footnote 2} However, the adoption of OSM data in real-world applications is limited, mainly due to the large, heterogeneous, ambiguous, and flat schema adopted for the OSM tags.

Knowledge graphs (Hogan et al. 2021)—graph-based representations of real-world entities and their relations—provide detailed machine-readable descriptions of real-world entities through ontologies and facilitate interlinking across sources. Information representation in knowledge graphs is based on W3C standards such as the Resource Description Framework (RDF)^{Footnote 3} and established ontologies. This representation facilitates structured semantic access via standardized query languages, such as SPARQL.^{Footnote 4} Although popular general-purpose knowledge graphs such as Wikidata and DBpedia (Auer et al. 2007) contain a number of geographic entities, only a tiny fraction of them include precise location information. Furthermore, whereas some community-defined links between OSM entities and knowledge graphs exist at the instance level, these links are sparse and cover only selected entity types. For example, as of September 2022, only \(0.52\%\) of OSM nodes provided links to the Wikidata knowledge graph. In this setting, knowledge graph completion, such as interlinking knowledge graphs and geographic information sources at the entity and schema levels, is inherently challenging due to the representation heterogeneity in OSM and the sparsity of geographic information in popular knowledge graphs.

Table 1.1 illustrates a geographic entity, Cairo, the capital of Egypt, and its representations in OSM and the Wikidata knowledge graph.^{Footnote 5} OSM provides information as heterogeneous key-value pairs called “tags.” In this example, OSM encodes the entity type information as \(\langle \mathrm {place},\mathrm {city} \rangle \), whereas the precise semantics of the tags often remain unclear. In contrast, entities in Wikidata are represented via well-defined statements, also known as RDF triples. A triple has the form \(\langle \)Subject, Predicate, Object\(\rangle \) and enables the representation of entity types, properties, and relations. In Wikidata, an entity type is expressed using the instance of property, denoted as wdt:P31.^{Footnote 6} In this example, this property connects a unique entity identifier wd:Q85, representing Cairo, to the entity type city, denoted as wd:Q515.^{Footnote 7}
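The contrast between the two representations can be made concrete with a minimal sketch. It uses only the tag and the identifiers named in the text and the namespace IRIs given in Footnote 5; the helper name `to_ntriples` is hypothetical and introduced here for illustration only.

```python
# Namespace IRIs for the wd and wdt prefixes, as given in the chapter's footnote.
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

# OSM side: a flat set of key-value tags (only the tag named in the text).
osm_cairo_tags = {"place": "city"}

# Wikidata side: one RDF triple <subject, predicate, object>,
# stating that Cairo (Q85) is an instance of (P31) city (Q515).
wikidata_cairo_type = (WD + "Q85", WDT + "P31", WD + "Q515")


def to_ntriples(triple):
    """Serialize a triple of IRIs as a single N-Triples line."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


print(to_ntriples(wikidata_cairo_type))
```

The OSM tag carries no explicit link to a class definition, whereas each element of the triple is an IRI that can be dereferenced.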

Table 1.1 Representation of Cairo in OpenStreetMap and Wikidata

In this chapter, we discuss recent methods aiming to bridge the gap between OSM and knowledge graphs through semantically enriching geospatial information in OSM and making this information available in WorldKG—a novel geographic knowledge graph. In particular, we develop methods for knowledge graph completion, to establish links between OSM and knowledge graphs at the entity and schema levels. Geographic entity linking discussed in this chapter aims at interlinking the representations of entities in OSM and Wikidata (Cairo in this example). Geographic class alignment aims to link the OSM tags that provide entity type information to the corresponding knowledge graph classes. In this example, the OSM tag \(\langle \mathrm {place},\mathrm {city} \rangle \) should be aligned to the Wikidata class wd:Q515 (city).

Existing schema alignment and entity linking methods are not directly applicable to geographic data sources such as OSM due to structural differences between OSM and knowledge graphs (Otero-Cerdeira et al. 2015). Generic schema matching methods typically rely on name and schema structure similarities (Madhavan et al. 2001). Other approaches, such as LIMES (Ngomo and Auer 2011), have strict heuristics and consider fixed schemas. Geographic entity linking approaches such as LinkedGeoData (Auer et al. 2009) rely on manually aligned schemas and create links using type information, spatial distance, and name similarity. These approaches often fail due to representation differences, toponym ambiguities, OSM schema flatness, and geographic coordinate variations across sources. Thus, new approaches are required to lift OSM’s flat and heterogeneous geographic information into a precise, machine-readable semantic representation.

In the remainder of this chapter, we first formally define the problem of geographic entity linking and class alignment in Sect. 1.2. Then, we discuss approaches we recently proposed to interlink OSM and knowledge graphs at the entity and schema levels. These approaches are illustrated in Fig. 1.1. In Sect. 1.3, we present OSM2KG—a geographic entity linking approach (Tempelmeier and Demidova 2021) depicted in the upper part of Fig. 1.1. Following that, in Sect. 1.4, we discuss NCA—a neural approach for geographic class alignment between OSM and knowledge graphs that utilizes entity links (Dsouza et al. 2021a). NCA is illustrated in the lower part of Fig. 1.1. Then, in Sect. 1.5, we describe WorldKG—a novel geographic knowledge graph (Dsouza et al. 2021b) that adopts OSM2KG and NCA to provide semantic representations of OSM entities. Finally, in Sect. 1.6, we discuss open research directions and provide a conclusion.

A flow diagram of creating WorldKG from OSM. The geographic entity linking with linked classes and the geographic class alignment with aligned classes flow to the WorldKG creation and the geographic knowledge graph. — **Fig. 1.1**

2 Problem Definition

Linking geographic data sources and knowledge graphs at the entity and the schema level can help create a comprehensive source of geographic information, i.e., a geographic knowledge graph. First, we define geographic data sources and knowledge graphs based on the definitions by Tempelmeier and Demidova (2021).

A geographic data source represents geographic entities. Each geographic entity is annotated with an identifier, a location, and a set of key-value pairs called tags. More formally:

Definition 1.1

A geographic data source \(\mathcal {G}= (N, T)\) consists of a set of geographic entities N and a set of tags T. Each tag \(t \in T\) is represented as a key-value pair \(t = \langle k,v \rangle \). Each node \(n \in N\) represents a real-world geographic entity with a geolocation and a set of tags \(T_n \subset T\).

A typical example of a geographic data source is OpenStreetMap. Examples of the tags assigned to the node representing Cairo are illustrated in Table 1.1. A geolocation can be represented as a coordinate pair (i.e., latitude and longitude) or a sequence of coordinate pairs (e.g., forming a polygon).
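Definition 1.1 can be mirrored in a small data structure. The following is a minimal sketch, not part of the original work: the class and field names are hypothetical, the node identifier is a placeholder, and the coordinates of Cairo are rounded approximations used only for illustration.

```python
from dataclasses import dataclass, field

Tag = tuple[str, str]             # t = <k, v>
Coordinate = tuple[float, float]  # (latitude, longitude)


@dataclass(frozen=True)
class Node:
    """A node n with a geolocation and its tag set T_n."""
    node_id: int
    geolocation: tuple[Coordinate, ...]  # one pair for a point, several for a polygon
    tags: frozenset[Tag]


@dataclass
class GeographicDataSource:
    """G = (N, T): the nodes N; the tag set T is derived from the nodes."""
    nodes: list[Node] = field(default_factory=list)

    @property
    def tags(self) -> set[Tag]:
        # T is taken here as the union of all node tag sets.
        return set().union(*(n.tags for n in self.nodes))


# Placeholder identifier and approximate coordinates (assumptions).
cairo = Node(node_id=1,
             geolocation=((30.04, 31.24),),
             tags=frozenset({("place", "city")}))
source = GeographicDataSource(nodes=[cairo])
print(source.tags)
```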

A knowledge graph is a semantic information source containing data regarding real-world entities, their classes, and their properties. Typical examples of popular general-purpose knowledge graphs are Wikidata and DBpedia, which cover an extensive set of real-world entities and their relations in various application domains. Table 1.1 illustrates selected properties of the Wikidata entity representing Cairo.

Definition 1.2

A knowledge graph \(\mathcal {K}\mathcal {G} = (E, C, P, L, L_{geo}, F)\) consists of a set of entities E, a set of classes \(C \subset E\), a set of properties P, a set of literals L, a set of geolocations \(L_{geo}\), and a set of relations \(F \subseteq E \times P \times (E \cup L \cup L_{geo})\).

An entity in a knowledge graph can represent a real-world entity or a class of entities. Literals represent values including strings, numbers, and dates. An entity \(e \in E\) can be related to one or multiple geolocations representing different geometries, such as a point or a polygon.

Geographic entity linking aims to align entities from a geographic data source and a knowledge graph representing the same real-world entity.

Definition 1.3

Given a geographic data source \(\mathcal {G}= (N, T)\) and a knowledge graph \(\mathcal {K}\mathcal {G} = (E, C, P, L, L_{geo}, F)\), the problem of geographic entity linking is to identify a node \(n \in N\) and an entity \(e \in E\) representing the same real-world object.

In the example illustrated in Table 1.1, Cairo from OSM and Cairo from Wikidata represent the same real-world entity. In RDF, this link is typically denoted using the owl:sameAs property.

Schema alignment refers to the interlinking of equivalent schema elements across sources. In the context of this work, we focus on geographic class alignment, which refers to the interlinking of tags of a geographic data source and classes of a knowledge graph representing the same semantic concept. The key or the value of a tag alone is not sufficient to describe the class. For example, the tag \(\langle \mathrm {natural},\mathrm {peak} \rangle \) describes the concept Mountain. Considering only the key (natural) or only the value (peak) may fail to identify the correct knowledge graph class Mountain. Hence, we align the complete tag, i.e., key=value, to the knowledge graph classes.

Definition 1.4

Given a geographic data source \(\mathcal {G}= (N, T)\) and a knowledge graph \(\mathcal {K}\mathcal {G} = (E, C, P, L, L_{geo}, F)\), the problem of geographic class alignment is to identify a tag \(t \in T\) and a class \(c \in C\) representing the same semantic concept.

For example, the OSM tag \(\langle \mathrm {place},\mathrm {city} \rangle \) and the Wikidata class wd:Q515 illustrated in Table 1.1 represent the same semantic concept.

In the context of this work, a geographic knowledge graph refers to a knowledge graph that provides geolocation information for a substantial fraction of the entities it contains. This work aims to create a geographic knowledge graph through geographic entity linking and class alignment.

3 Geographic Entity Linking with OSM2KG

Geographic entity linking refers to the task of interlinking geographic entities representing the same real-world entity across data sources (Definition 1.3). Typically, entity linking approaches utilize semantic and syntactic similarity of the different entity representations. In OSM, geographic entity linking is particularly challenging due to the large scale and the ambiguities of location names. For example, “Berlin” can denote both the capital of Germany and a restaurant. Also, the names of geographic entities, such as “Church Road,” are often non-distinctive.

3.1 Related Work

State-of-the-art entity linking approaches such as LIMES (Ngomo and Auer 2011) and WOMBAT (Sherif et al. 2017) assume that entities are represented through the same number of properties with a 1:1 property mapping. In the case of linking OSM nodes with the knowledge graph entities, these assumptions do not hold. Entity linking methods such as DBpedia Spotlight (Daiber et al. 2013) detect links between textual data and knowledge graphs. LinkedGeoData (Stadler et al. 2012) performs geographic entity linking by creating links between OSM and knowledge graphs such as DBpedia and GeoNames. However, the linking with LinkedGeoData relies on manually aligned schema, syntactic similarity, and spatial distance. To overcome the shortcomings of current approaches, we proposed OSM2KG (Tempelmeier and Demidova 2021), a machine learning algorithm for geographic entity linking based on representation learning of OSM tags.

3.2 The OSM2KG Approach

The overall geographic entity linking process of OSM2KG is illustrated in the upper part of Fig. 1.1. First, OSM2KG adopts geographic blocking to reduce the number of potential candidate entities given an OSM node (candidate set generation). Then, OSM2KG creates latent representations of OSM tags (tag embeddings) and extracts features of the candidate entities (feature extraction). Finally, OSM2KG predicts if a node-entity pair represents the same real-world entity (link classification). In the following, we present these steps in more detail.

Candidate Set Generation

Given a node \(n \in N\) of the geographic data source \(\mathcal {G}=(N, T)\), the goal of the candidate set generation step is to identify potentially matching entities in the knowledge graph \(\mathcal {K}\mathcal {G}\). The geographic coordinates in OSM and knowledge graphs represent the points of community consensus, rather than an objective metric (Auer et al. 2009). Consequently, the coordinates of geographic entities represented in these sources can deviate. The candidate generation step is based on the intuition that entities and nodes representing the same real-world entities should be located in geographic proximity. Thus, for a given input node n, OSM2KG creates the candidate set by considering all entities within an experimentally determined distance threshold. A spatial index such as R-Tree (Guttman 1984) can be utilized to enable efficient geographic blocking.
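The blocking step described above can be sketched as a distance filter. This is a simplified illustration under stated assumptions: it uses the haversine great-circle distance and a linear scan instead of a spatial index, and the entity identifiers, coordinates, and the threshold value are made up for the example. The actual threshold in OSM2KG is determined experimentally.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_set(node_coord, entities, threshold_m):
    """Return the ids of all entities within threshold_m meters of the node."""
    lat, lon = node_coord
    return [entity_id
            for entity_id, (elat, elon) in entities.items()
            if haversine_m(lat, lon, elat, elon) <= threshold_m]


# Hypothetical knowledge graph entities with point geolocations.
entities = {"near": (0.0, 0.001), "far": (1.0, 1.0)}
print(candidate_set((0.0, 0.0), entities, threshold_m=1000.0))
```

A linear scan is enough for a toy example. At OSM scale, a spatial index such as the R-Tree mentioned above avoids comparing each node against every entity.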

Tag Embeddings

The set of tags \(T_n\) assigned to an OSM node n plays an essential role in detecting the correct matching candidate. As OSM tags are highly heterogeneous, OSM2KG learns latent representations of them in an unsupervised manner, using a skip-gram-based neural network model (Mikolov et al. 2013). The model learns from the co-occurrences of OSM tags, so that geographic entities with similar tags are placed close to each other in the embedding space. The resulting embeddings can be used to estimate the semantic similarity of the OSM nodes. Geographic coordinates are not considered in this step, such that the embedding reflects semantic similarity independent of the geolocation.
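To illustrate the co-occurrence signal that such an embedding model learns from, the following sketch replaces the skip-gram network with raw co-occurrence count vectors and cosine similarity. It is a deliberately simplified stand-in, not the OSM2KG model, and the example tag sets are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def cooccurrence_vectors(tag_sets):
    """Map each tag to a Counter of the tags it co-occurs with on the same node."""
    vectors = {}
    for tags in tag_sets:
        for a, b in permutations(tags, 2):
            vectors.setdefault(a, Counter())[b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical tag sets of three OSM nodes, written as key=value strings.
tag_sets = [
    {"amenity=restaurant", "cuisine=italian", "wheelchair=yes"},
    {"amenity=cafe", "cuisine=coffee_shop", "wheelchair=yes"},
    {"natural=peak", "tourism=viewpoint"},
]
vectors = cooccurrence_vectors(tag_sets)
print(cosine(vectors["amenity=restaurant"], vectors["amenity=cafe"]))
print(cosine(vectors["amenity=restaurant"], vectors["natural=peak"]))
```

Tags that appear in similar contexts (here, restaurant and cafe) obtain a higher similarity than tags that share no context. A trained skip-gram model captures the same kind of regularity in dense, low-dimensional vectors.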

Feature Extraction

For each candidate entity from \(\mathcal {K}\mathcal {G}\) in the candidate set, we extract additional features, namely, the entity type and its popularity, as reflected by the number of incoming edges in the knowledge graph.

In addition to the tag embeddings and the entity features, the Jaro-Winkler distance (Winkler 1999) is calculated between the names of the OSM node and the candidate to measure their name similarity. Furthermore, the logistic distance proposed by Stadler et al. (2012) is used to compute the geographic distance between the OSM node and the candidate entity.
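The name similarity feature can be sketched with a plain implementation of the Jaro-Winkler similarity (the corresponding distance is one minus this value). The prefix scaling factor of 0.1 and the maximum prefix length of 4 are the commonly used defaults and are assumptions here, as the chapter does not state the parameters used in OSM2KG. The logistic distance is omitted because its formula is not given in this chapter.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match if they are equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro-Winkler similarity: Jaro similarity boosted for a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```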

Link Classification

Finally, a random forest classification model is utilized to classify whether the input node in \(\mathcal {G}\) represents the same real-world entity as the candidate entity in \(\mathcal {K}\mathcal {G}\). To train the model, as positive examples, OSM2KG takes node-entity pairs from the existing links between \(\mathcal {G}\) and \(\mathcal {K}\mathcal {G}\).

3.3 Evaluation Results of the OSM2KG Approach

We evaluated OSM2KG (Tempelmeier and Demidova 2021) regarding its interlinking performance. In particular, we considered the interlinking of OSM entities with Wikidata and DBpedia knowledge graphs in Germany, France, and Italy. This evaluation demonstrated a substantial F1-score improvement achieved by OSM2KG compared to eight different baseline approaches. OSM2KG performed best on all Wikidata datasets, achieving an F1-score of \(92.05\%\) on average and outperforming the best-performing baseline by \(21.82\) percentage points. OSM2KG also achieved the best recall performance and high precision on all datasets.

As a result of OSM2KG, we can infer new links between the OSM nodes and geographic entities in knowledge graphs. Such links can be beneficial for creating and enriching semantic sources, as they can provide complementary information regarding the linked geographic entities. These linked entities can also serve as additional training data to develop supervised methods for geographic schema alignment.

4 Geographic Class Alignment with NCA

Geographic class alignment between a geographic data source \(\mathcal {G}=(N, T)\) and a knowledge graph \(\mathcal {K}\mathcal {G} = (E, C, P, L, L_{geo}, F)\) aims to align the tags and the classes representing the same real-world concepts (Definition 1.4).

The heterogeneous tag-based OSM structure created by volunteers makes it challenging to identify the tags that can be linked to knowledge graph ontologies. For example, the OSM tag \(\langle \mathrm {natural},\mathrm {peak} \rangle \) corresponds to the “mountain” class in the Wikidata knowledge graph. This match cannot be easily identified using the existing approaches based on syntactic and structural similarity.

4.1 Related Work

Ontology alignment methods typically rely on structural and element-level similarity to align schema elements (Otero-Cerdeira et al. 2015). As the OSM schema is flat, approaches that depend on the structural hierarchy (Melnik et al. 2002) do not perform well. Schema alignment methods that depend on the element-level syntactic similarity (Madhavan et al. 2001) do not work well either, due to the essential differences in the syntactic representation of OSM tags and knowledge graph classes. Instance-based alignment approaches (Ngo et al. 2013) rely on the structural similarity of neighboring instances to align schema elements. Machine learning (Doan et al. 2004) and deep learning-based approaches (Bento et al. 2020; Xiang et al. 2015) also rely on the structure. Furthermore, tabular data alignment methods (Cappuzzo et al. 2020) cannot appropriately handle sparse OSM tag annotations. Overall, the lack of a well-defined OSM ontology and the essential differences in the structural as well as syntactic representation of OSM tags and knowledge graph ontologies, along with the sparsity of OSM annotations, hinder the application of state-of-the-art ontology and schema alignment approaches. To overcome these limitations, we proposed NCA, a neural class alignment approach that utilizes existing entity links between geographic data sources and knowledge graphs in a novel shared latent space.

4.2 The NCA Approach

At the bottom of Fig. 1.1, we briefly illustrate the building blocks of the NCA class alignment approach. In the first step, NCA aims to create a shared latent space that aligns the feature spaces of \(\mathcal {G}\) and \(\mathcal {K}\mathcal {G}\). To this end, NCA creates an auxiliary neural classification model. This model captures the semantic relations between the OSM tags and the semantic classes in the shared latent space. In the second step, NCA probes the auxiliary model to obtain the tag-to-class alignments between the OSM tags and the knowledge graph classes.

Auxiliary Classification

The goal of the first NCA step is to create a shared latent space containing similar latent representations of geographic entities in \(\mathcal {G}\) and \(\mathcal {K}\mathcal {G}\). To achieve this aim, NCA creates an auxiliary classification model. This model is trained to classify linked entities from \(\mathcal {G}\) and \(\mathcal {K}\mathcal {G}\) into the corresponding semantic classes \(c \in C\) of \(\mathcal {K}\mathcal {G}\). During supervised training, the auxiliary classification model adopts known pairs of linked geographic entities from \(\mathcal {G}\) and \(\mathcal {K}\mathcal {G}\) as a training set. For \(\mathcal {G}\), tags and keys having more than 50 occurrences^{Footnote 8} in OSM are selected as features. For \(\mathcal {K}\mathcal {G}\), top-25 properties of each class are used as features. These features are passed through the fully connected layers to form the shared latent space of the model that aligns the representations of OSM and \(\mathcal {K}\mathcal {G}\) entities. The intuition behind the shared latent space is that linked entities from OSM and \(\mathcal {K}\mathcal {G}\) that belong to the same semantic class will be represented similarly. To create the shared latent space, NCA adopts an adversarial classifier that exploits linked entities of \(\mathcal {G}\) and \(\mathcal {K}\mathcal {G}\). This classifier aims to distinguish between \(\mathcal {G}\) and \(\mathcal {K}\mathcal {G}\) entities in the latent space. NCA aims to make their representations similar by inverting the gradient of the adversarial loss. In this way, as a result of the training, the feature spaces of \(\mathcal {G}\) and \(\mathcal {K}\mathcal {G}\) are aligned.
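The feature selection on the OSM side can be sketched as follows. The sketch counts keys and complete tags over a collection of nodes and keeps those with more than a minimum number of occurrences (50 in the text). The binary encoding of a node over this vocabulary is an assumption made for illustration, as are the function names and the toy data; the toy example lowers the threshold so that the effect is visible.

```python
from collections import Counter


def build_vocabulary(nodes_tags, min_occurrences=50):
    """Select keys and key=value tags that occur more than min_occurrences times."""
    counts = Counter()
    for tags in nodes_tags:
        counts.update(tags.keys())                           # keys
        counts.update(f"{k}={v}" for k, v in tags.items())   # complete tags
    return sorted(f for f, c in counts.items() if c > min_occurrences)


def encode(tags, vocabulary):
    """Binary feature vector: 1 if the key or tag is present on the node."""
    present = set(tags.keys()) | {f"{k}={v}" for k, v in tags.items()}
    return [1 if feature in present else 0 for feature in vocabulary]


# Toy data (hypothetical); the threshold is lowered to 1 for this example only.
nodes_tags = [
    {"place": "city", "name": "A"},
    {"place": "city", "name": "B"},
    {"place": "town", "name": "C"},
]
vocabulary = build_vocabulary(nodes_tags, min_occurrences=1)
print(vocabulary)
print(encode({"place": "town"}, vocabulary))
```

Rare tags such as individual names fall below the threshold and are dropped, while frequent keys remain available as features even when their values vary.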

Tag-to-Class Alignment

The training of the auxiliary classification model results in a shared latent space. NCA then probes the model with one OSM tag at a time and computes the complete forward pass. NCA selects the results of the classification layer to obtain the tag-to-class alignment. As one OSM tag can be matched to multiple classes in the knowledge graph, NCA selects all classes whose confidences exceed an experimentally determined threshold value.
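The final selection step reduces to thresholding the classifier output for the probed tag. In the sketch below, the confidence values and the threshold are made up for illustration, and the function name is hypothetical; in NCA the confidences come from the classification layer of the auxiliary model and the threshold is determined experimentally.

```python
def align_tag(class_confidences, threshold):
    """Return all classes whose confidence exceeds the threshold, best first."""
    selected = [(conf, cls) for cls, conf in class_confidences.items() if conf > threshold]
    return [cls for conf, cls in sorted(selected, reverse=True)]


# Hypothetical classifier output when probing the model with the tag place=city.
confidences = {
    "wd:Q515": 0.91,     # city
    "wd:Q486972": 0.62,  # human settlement
    "wd:Q8502": 0.03,    # mountain
}
print(align_tag(confidences, threshold=0.5))
```

Because every class above the threshold is kept, one tag can be aligned to several classes, as described above.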

4.3 Evaluation Results of the NCA Approach

We evaluated the NCA approach on OSM as the geographic data source and Wikidata and DBpedia as knowledge graphs (Dsouza et al. 2021a). The NCA performance was compared to six state-of-the-art ontology and tabular data alignment methods. The evaluation was conducted on a dataset with seven countries having the most data available in OSM, namely, Germany, France, Great Britain, Spain, Russia, the USA, and Australia. In terms of tag-to-class alignment, NCA improved the F1-score by up to 13 percentage points on Wikidata and up to 37 percentage points on DBpedia. On average, we observed an F1-score improvement of 10 percentage points on Wikidata and 21 percentage points on DBpedia. As a result, the NCA approach increased the number of OSM entities with semantic class annotations from Wikidata and DBpedia knowledge graphs by over \(400\%\). The resulting tag-to-class annotations are available as part of the WorldKG knowledge graph presented in the next section.

5 The WorldKG Knowledge Graph

WorldKG is a geographic knowledge graph that provides semantic information on geographic entities extracted from OSM. While OSM contains rich data regarding such geographic entities, this data is not directly accessible to semantic applications. With WorldKG, we tackle this problem and provide a geographic knowledge graph. WorldKG follows the ontology illustrated in Fig. 1.2 and is available online.^{Footnote 9}

A relationship graph. It includes rdfs as class, wkgs of WKGObject and WKGProperty, osm as wiki, rdf property, geo spatial object, sf point, sf line string, and sf polygon with relationships of type, source, domain, spatial object, and subclass. — **Fig. 1.2**

5.1 Related Work

Geographic knowledge graphs such as LinkedGeoData (Auer et al. 2009) and YAGO2geo (Karalis et al. 2019) either contain only a few geographic classes or represent data of a restricted geographic area. Specialized geographic knowledge graphs such as the KnowWhereGraph (Janowicz et al. 2022) and EventKG (Gottschalk and Demidova 2019) concentrate on past events and have a limited location coverage. In contrast, WorldKG is based on OSM and contains over 100 million geographic entities on a world scale typed with over 1,000 semantic classes.

5.2 WorldKG Creation Approach

WorldKG captures geographic entities in OSM and contains links to Wikidata and DBpedia at the entity and class levels. The creation procedure of WorldKG includes two main tasks depicted in Fig. 1.1, namely, ontology creation and triple creation.

Ontology Creation

To infer a class hierarchy from OSM tags, we utilize OSM map features^{Footnote 10}—a list of established key-value pairs. We extract classes (keys) and their subclasses (values) from the map features. For example, from the map feature \(\langle \mathrm {place},\mathrm {city} \rangle \), we infer the class “Place” and its subclass “City.” All remaining keys not covered by the map features are considered properties.
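The derivation of classes, subclasses, and properties from the map features can be sketched as follows. The casing rules (upper camel case for classes, lower camel case for properties), the function names, and the example keys other than place=city are assumptions made for illustration.

```python
def to_camel_case(name, capitalize_first):
    """Convert an OSM-style name such as 'fast_food' to 'FastFood' or 'fastFood'."""
    words = [w for w in name.replace(":", "_").split("_") if w]
    if not words:
        return ""
    head = words[0].capitalize() if capitalize_first else words[0].lower()
    return head + "".join(w.capitalize() for w in words[1:])


def build_ontology(map_features, all_keys):
    """Derive a subclass -> class mapping and a list of property names."""
    subclass_of = {}
    for key, value in map_features:
        subclass_of[to_camel_case(value, True)] = to_camel_case(key, True)
    feature_keys = {key for key, _ in map_features}
    # Keys not covered by the map features become properties.
    properties = sorted(to_camel_case(k, False)
                        for k in all_keys if k not in feature_keys)
    return subclass_of, properties


map_features = [("place", "city"), ("amenity", "fast_food")]
all_keys = {"place", "amenity", "opening_hours", "name"}
subclass_of, properties = build_ontology(map_features, all_keys)
print(subclass_of)
print(properties)
```

This simplified mapping keys subclasses by name only, so the same value appearing under two different keys would collide; a complete implementation would need to disambiguate such cases.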

We convert the names of the extracted properties and classes according to the OWL naming conventions.^{Footnote 11} We also incorporate the tag-to-class alignment inferred using the NCA approach (Sect. 1.4) into the WorldKG ontology.

The WorldKG ontology is depicted in Fig. 1.2. Any object in WorldKG can be connected to a geolocation (geo:SpatialObject—either a point, a line string, or a polygon) via wkgs:spatialObject. Other relations are represented using properties typed as wkgs:WKGProperty. Information regarding the original OSM tags is provided by dcterms:source.

An example of a WorldKG entity of the class “City” is illustrated in Fig. 1.3. Via the property wkgs:spatialObject, the entity is connected to its geolocation, which provides a coordinate pair. The “City” class is connected to its equivalents in DBpedia and Wikidata.

A relationship graph. It includes wkgs as place, city, 240109189, WKGObject, and geo 240109189, osm wiki or tag, city, Germany, dbo as place and city, wd as Q515, sf point, osmn 240109189, with relations of equivalent class, subclass, source, and osm link. — **Fig. 1.3**

Triple Creation

We add all OSM nodes that have at least one tag and belong to at least one class of the WorldKG ontology to the WorldKG knowledge graph. To this end, we create triples that represent the nodes and their properties and adhere to the WorldKG ontology.
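The inclusion rule and the triple generation can be sketched as follows. The prefixed names (wkg: for instances, wkgs: for schema terms), the placeholder node identifiers, and the two lookup tables are assumptions made for illustration; the actual identifier scheme of WorldKG is not specified in this section.

```python
def node_to_triples(node_id, tags, tag_to_class, key_to_property):
    """Create triples for a node, or an empty list if the node is excluded."""
    classes = [tag_to_class[tag] for tag in tags.items() if tag in tag_to_class]
    # A node is included only if it has at least one tag and at least one class.
    if not tags or not classes:
        return []
    subject = f"wkg:{node_id}"
    triples = [(subject, "rdf:type", f"wkgs:{cls}") for cls in classes]
    for key, value in tags.items():
        if key in key_to_property:
            triples.append((subject, f"wkgs:{key_to_property[key]}", f'"{value}"'))
    return triples


# Hypothetical ontology lookups and nodes.
tag_to_class = {("amenity", "restaurant"): "Restaurant"}
key_to_property = {"cuisine": "cuisine"}

typed = node_to_triples(1, {"amenity": "restaurant", "cuisine": "italian"},
                        tag_to_class, key_to_property)
untyped = node_to_triples(2, {"cuisine": "italian"}, tag_to_class, key_to_property)
print(typed)
print(untyped)
```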

5.3 WorldKG Access, Statistics, Evaluation, and Examples

Access

WorldKG offers a GeoSPARQL endpoint.^{Footnote 12} This endpoint supports queries in GeoSPARQL^{Footnote 13}—a geographic query language for RDF data—and visualizes geolocations of the query results on a map.

Statistics

As of September 2022, WorldKG contains over 800 million triples describing approximately 100 million entities that belong to over \(1,000\) distinct classes. The number of unique properties (wkgs:WKGProperty) in WorldKG is over \(1,800\). As a result of the NCA approach presented in Sect. 1.4, WorldKG provides links to 40 Wikidata and 21 DBpedia classes.

Evaluation

We evaluated the quality of WorldKG by assessing the type assertions of the geographic entities (Dsouza et al. 2021b). From Wikidata and DBpedia, we randomly selected five classes each, aligned with the WorldKG ontology. For each of these classes, we randomly selected 100 example geographic entities in WorldKG and manually checked if they belong to the assigned knowledge graph class. We observed that WorldKG achieved over 97% accuracy on average.

Examples

Listing 1.1 illustrates the representation of a WorldKG entity of type wkgs:Restaurant in the Turtle format. Listing 1.2 is an example query that makes use of the GeoSPARQL function bif:st_distance to extract the three restaurants closest to the Dresden Central Station.^{Footnote 14} The query results are shown in Table 1.2 and Fig. 1.4. This example illustrates the potential of using WorldKG in downstream applications such as POI recommendation.

A screenshot of a Google map highlights the location of the restaurants nearby Dresden Central. The restaurants with distances are Marche 0.02, Dean and David 0.08, and Dschingis Khan 0.13. — **Fig. 1.4**

Table 1.2 Result of the example GeoSPARQL query in Listing 1.2

Listing 1.1 RDF triples in the Turtle format for an example geographic entity of type wkgs:Restaurant in WorldKG

Listing 1.2 Example GeoSPARQL query to retrieve three restaurants closest to the Dresden Central Station (Dresden Hbf)

6 Discussion and Open Research Directions

In this chapter, we presented WorldKG—a geographic knowledge graph that we developed to provide a semantic representation of geographic entities in OSM. Furthermore, we described OSM2KG and NCA, novel methods for geographic entity linking and class alignment. These methods enable interlinking geographic entities in OpenStreetMap with other semantic sources of geographic information at the entity and schema levels. Our proposed approaches outperformed state-of-the-art methods when applied to OSM and popular general-purpose knowledge graphs, Wikidata and DBpedia. We made WorldKG publicly available.

WorldKG is a comprehensive source of semantic geographic information in its current form; it also opens many directions for future research.

A critical aspect of the knowledge graph creation from volunteered geographic information is data quality. As OSM data builds a basis for the knowledge graph creation, data quality issues of OSM can be propagated into WorldKG. In WorldKG, we rely on the existing links between OSM nodes and knowledge graphs as a quality signal. Moreover, to enhance the quality of OSM data, we developed OVID (Tempelmeier and Demidova 2022)—a novel method to detect vandalism in OpenStreetMap automatically. Quality aspects of OSM are also considered in Chap. 2. In future work, we would like to investigate further methods to enhance data quality in OSM and WorldKG. WorldKG can also potentially be used for visual reporting solutions discussed in Chap. 7.

To make OSM data more easily accessible to machine learning algorithms, we developed GeoVectors, a reusable, openly available dataset of OSM embeddings (Tempelmeier et al. 2021b). The GeoVectors approach extends the OSM node embedding algorithms presented in Sect. 1.3 and encodes the semantic and geographic similarity of OSM nodes. In future work, we would like to leverage WorldKG and GeoVectors to provide semantic geographic information for machine learning applications.

Notes

1.
OpenStreetMap, OSM, and the OpenStreetMap magnifying glass logo are trademarks of the OpenStreetMap Foundation and are used with their permission. We are not endorsed by or affiliated with the OpenStreetMap Foundation.
2.
OSMstats: https://osmstats.neis-one.org.
3.
Resource Description Framework: https://www.w3.org/RDF/.
4.
SPARQL 1.1 Query Language: https://www.w3.org/TR/sparql11-query/.
5.
wd and wdt are the prefixes of http://www.wikidata.org/entity/ and http://www.wikidata.org/prop/direct/, respectively.
6.
Definition of the Wikidata property instance of (P31): https://www.wikidata.org/wiki/Property:P31.
7.
Definition of the city (Q515) in Wikidata: https://www.wikidata.org/wiki/Q515.
8.
https://taginfo.openstreetmap.org/tags.
9.
WorldKG: https://www.worldkg.org/.
10.
https://wiki.openstreetmap.org/wiki/Map_features.
11.
https://www.w3.org/TR/owl-ref/.
12.
WorldKG GeoSPARQL endpoint: https://www.worldkg.org/sparql.
13.
GeoSPARQL: https://www.ogc.org/standards/geosparql.
14.
The geographic location of the Dresden Central Station is taken from OSM.

References

Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z (2007) DBpedia: a nucleus for a web of open data. In: Proceedings of the 6th International Semantic Web Conference, Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, volume 4825 of Lecture Notes in Computer Science. Springer, Berlin, pp 722–735. https://doi.org/10.1007/978-3-540-76298-0_52
Auer S, Lehmann J, Hellmann S (2009) LinkedGeoData: adding a spatial dimension to the web of data. In: Proceedings of the 8th International Semantic Web Conference, ISWC 2009, volume 5823 of Lecture Notes in Computer Science. Springer, Berlin, pp 731–746. https://doi.org/10.1007/978-3-642-04930-9_46
Bento A, Zouaq A, Gagnon M (2020) Ontology matching using convolutional neural networks. In: Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020. European Language Resources Association, pp 5648–5653. https://aclanthology.org/2020.lrec-1.693/
Cappuzzo R, Papotti P, Thirumuruganathan S (2020) Creating embeddings of heterogeneous relational datasets for data integration tasks. In: Proceedings of the 2020 International Conference on Management of Data, SIGMOD 2020. ACM, New York, pp 1335–1349. https://doi.org/10.1145/3318464.3389742
Dadwal R, Funke T, Demidova E (2021) An adaptive clustering approach for accident prediction. In: Proceedings of the 24th IEEE International Intelligent Transportation Systems Conference, ITSC 2021. IEEE, pp 1405–1411. https://doi.org/10.1109/ITSC48978.2021.9564564
Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the International Conference on Semantic Systems, ISEM ’13. ACM, New York, pp 121–124. https://doi.org/10.1145/2506182.2506198
Doan A, Madhavan J, Domingos PM, Halevy AY (2004) Ontology matching: a machine learning approach. In: Handbook on Ontologies, International Handbooks on Information Systems. Springer, Berlin, pp 385–404. https://doi.org/10.1007/978-3-540-24750-0_19
Dsouza A, Tempelmeier N, Demidova E (2021a) Towards neural schema alignment for OpenStreetMap and knowledge graphs. In: Proceedings of the 20th International Semantic Web Conference, ISWC 2021, volume 12922 of Lecture Notes in Computer Science. Springer, Berlin, pp 56–73. https://doi.org/10.1007/978-3-030-88361-4_4
Dsouza A, Tempelmeier N, Yu R, Gottschalk S, Demidova E (2021b) WorldKG: a world-scale geographic knowledge graph. In: Proceedings of the 30th ACM International Conference on Information and Knowledge Management, CIKM ’21. ACM, New York, pp 4475–4484. https://doi.org/10.1145/3459637.3482023
Gottschalk S, Demidova E (2019) EventKG—the Hub of event knowledge on the web- and biographical timeline generation. Semantic Web 10(6):1039–1070. https://doi.org/10.3233/SW-190355
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of the Annual Meeting, SIGMOD 1984. ACM Press, New York, pp 47–57. https://doi.org/10.1145/602259.602266
Hogan A, Blomqvist E, Cochez M, d’Amato C, de Melo G, Gutiérrez C, Kirrane S, Gayo JEL, Navigli R, Neumaier S, Ngomo AN, Polleres A, Rashid SM, Rula A, Schmelzeisen L, Sequeda JF, Staab S, Zimmermann A (2021) Knowledge graphs. ACM Comput Surv 54(4):71:1–71:37. https://doi.org/10.1145/3447772
Janowicz K, Hitzler P, Li W, Rehberger D, Schildhauer M, Zhu R, Shimizu C, Fisher CK, Cai L, Mai G, Zalewski J, Zhou L, Stephen S, Estrecha SG, Mecum BD, Lopez-Carr A, Schroeder A, Smith D, Wright DJ, Wang S, Tian Y, Liu Z, Shi M, D’Onofrio A, Gu Z, Currier K (2022) Know, Know Where, KnowWhereGraph: a densely connected, cross-domain knowledge graph and geo-enrichment service stack for applications in environmental intelligence. AI Mag 43(1):30–39. https://doi.org/10.1609/aimag.v43i1.19120
Karalis N, Mandilaras GM, Koubarakis M (2019) Extending the YAGO2 knowledge graph with precise geospatial knowledge. In: Proceedings of the 18th International Semantic Web Conference, ISWC 2019, volume 11779 of Lecture Notes in Computer Science. Springer, Berlin, pp 181–197. https://doi.org/10.1007/978-3-030-30796-7_12
Madhavan J, Bernstein PA, Rahm E (2001) Generic schema matching with cupid. In: Proceedings of the 27th International Conference on Very Large Data Bases, VLDB 2001. Morgan Kaufmann, pp 49–58. https://doi.org/10.5555/645927.672191
Melnik S, Garcia-Molina H, Rahm E (2002) Similarity flooding: a versatile graph matching algorithm and its application to schema matching. In: Proceedings of the 18th International Conference on Data Engineering, 2002. IEEE Computer Society, pp 117–128. https://doi.org/10.1109/ICDE.2002.994702
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 27th Annual Conference on Neural Information Processing Systems 2013, pp 3111–3119. https://doi.org/10.5555/2999792.2999959
Ngo D, Bellahsene Z, Todorov K (2013) Opening the black box of ontology matching. In: Proceedings of the ESWC 2013, volume 7882 of Lecture Notes in Computer Science. Springer, Berlin, pp 16–30. https://doi.org/10.1007/978-3-642-38288-8_2
Ngomo AN, Auer S (2011) LIMES—a time-efficient approach for large-scale link discovery on the web of data. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011. IJCAI/AAAI, pp 2312–2317. https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-385
Otero-Cerdeira L, Rodríguez-Martínez FJ, Gómez-Rodríguez A (2015) Ontology matching: a literature review. Expert Syst Appl 42(2):949–971. https://doi.org/10.1016/j.eswa.2014.08.032
Sherif MA, Ngomo AN, Lehmann J (2017) Wombat – a generalization approach for automatic link discovery. In: Proceedings of the Semantic Web – 14th International Conference, ESWC 2017, volume 10249 of Lecture Notes in Computer Science, pp 103–119. https://doi.org/10.1007/978-3-319-58068-5_7
Stadler C, Lehmann J, Höffner K, Auer S (2012) LinkedGeoData: a core for a web of spatial open data. Semantic Web 3(4):333–354. https://doi.org/10.3233/SW-2011-0052
Tempelmeier N, Demidova E (2021) Linking OpenStreetMap with knowledge graphs—link discovery for schema-agnostic volunteered geographic information. Fut Gener Comput Syst 116:349–364. https://doi.org/10.1016/j.future.2020.11.003
Tempelmeier N, Demidova E (2022) Attention-based vandalism detection in OpenStreetMap. In: Proceedings of the ACM Web Conference 2022, WWW 2022. ACM, New York, pp 643–651. https://doi.org/10.1145/3485447.3512224
Tempelmeier N, Feuerhake U, Wage O, Demidova E (2021a) Mining topological dependencies of recurrent congestion in road networks. ISPRS Int J Geo-Inform 10(4):248. https://doi.org/10.3390/ijgi10040248
Tempelmeier N, Gottschalk S, Demidova E (2021b) GeoVectors: a linked open corpus of OpenStreetMap Embeddings on world scale. In: Proceedings of the 30th ACM International Conference on Information And Knowledge Management, CIKM 2021. ACM, pp 4604–4612. https://doi.org/10.1145/3459637.3482004
von Wahl L, Tempelmeier N, Sao A, Demidova E (2022) Reinforcement learning-based placement of charging stations in urban road networks. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022. ACM, New York, pp 3992–4000. https://doi.org/10.1145/3534678.3539154
Winkler WE (1999) The state of record linkage and current research problems. In: Statistical Research Division, US Census Bureau
Xiang C, Jiang T, Chang B, Sui Z (2015) ERSOM: a structural ontology matching approach using automatically learned entity representation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015. The Association for Computational Linguistics, pp 2419–2429. https://doi.org/10.18653/v1/d15-1289

Acknowledgements

This research was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—WorldKG, 424985896.

Author information

Authors and Affiliations

Data Science & Intelligent Systems Group (DSIS), University of Bonn, Bonn, Germany
Alishiba Dsouza, Ran Yu & Elena Demidova
L3S Research Center, University of Hannover, Hannover, Germany
Nicolas Tempelmeier & Simon Gottschalk
Lamarr Institute for Machine Learning and Artificial Intelligence, Bonn, Germany
Elena Demidova

Authors

Alishiba Dsouza
Nicolas Tempelmeier
Simon Gottschalk
Ran Yu
Elena Demidova

Corresponding author

Correspondence to Alishiba Dsouza.

Editor information

Editors and Affiliations

Cartographic Communication, TU Dresden, Dresden, Germany
Dirk Burghardt
Data Science and Intelligent Systems, Computer Science Institute, University of Bonn, Bonn, Germany
Elena Demidova
Data Analysis and Visualization, University of Konstanz, Konstanz, Germany
Daniel A. Keim

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Dsouza, A., Tempelmeier, N., Gottschalk, S., Yu, R., Demidova, E. (2024). WorldKG: World-Scale Completion of Geographic Information. In: Burghardt, D., Demidova, E., Keim, D.A. (eds) Volunteered Geographic Information. Springer, Cham. https://doi.org/10.1007/978-3-031-35374-1_1

DOI: https://doi.org/10.1007/978-3-031-35374-1_1
Published: 09 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35373-4
Online ISBN: 978-3-031-35374-1
eBook Packages: Computer Science (R0)
