1 Introduction

CRMgeo [1] defines a formal ontology intended as a global schema for integrating spatiotemporal properties of temporal entities and persistent items. Its primary purpose is to give an adequate account of the relationship of physical things and processes to spacetime that is compatible with physics. Among other things, it explicitly differentiates places in the real world (phenomenal) from places in the world described by information (declarative), and thus integrates geoinformation available in GIS formats in a CIDOC CRM [2] compatible form without loss of information. More generally, it aims at integrating topological information with other types of factual knowledge in a common knowledge representation formalism suited for Semantic Web technologies. To do so it links the CIDOC CRM to the OGC standard GeoSPARQL [3], making use of the conceptualisations and formal definitions that have been developed in the geoinformation community. CRMgeo uses and extends the CIDOC CRM (ISO 21127), a general ontology of human activity, things and events taking place in spacetime. It uses the same encoding-neutral formalism of knowledge representation (“data model” in the sense of computer science) as the CIDOC CRM. It can thus be implemented in RDFS, in OWL, on an RDBMS, or in other types of encoding. The background for the development of this model lies in a growing interest in enriching cultural heritage data with precise and well identifiable descriptions of the location and geometry of sites of historical events or remains, objects and natural features. On the one hand, there is already a tradition of more than two decades of using GIS systems for representing cultural-historical and archaeological data and for reasoning on properties such as spatial distribution, vicinity and accessibility. These systems tended to be closed and to focus more on representing feature categories by visual symbols at different scales than on integrating rich contextual object descriptions. 
Such systems have been extremely successful in all kinds of “geosciences”, resource management and public administration, whereas cultural heritage is a rather marginal application area. On the other hand, archives, libraries and museums keep detailed historical records with very poor spatial determination. Often the language of the source or the local context is used. At the time of creation the meaning of such expressions may have been quite well determined, but they frequently refer to wider geopolitical units only, such as “Parthenon in Athens”. The records often focus on typologies, individual objects, parts and wholes, provenance, kinds of events, participating people and influential factors, rather than precise dates and periods. This practice creates problems when current users want to integrate city plans, tourism guides, or detailed excavation or restoration records. The fact that “people know quite well where the Parthenon lies” or “you’ll see it when you go to Athens” is not helpful for today’s IT systems. The two traditions, the “GIS community” and the “cultural heritage community”, have developed standards which precisely reflect the two different foci: the OGC/ISO Standards for Geographic Information, which are the building blocks of the GeoSPARQL ontology, and the ontology of the CIDOC CRM, which is the ISO standard for representing cultural heritage information. In an attempt to combine these two standards, we experienced a surprise: there is no match at any intermediate concept between the standards, even though the CRM was explicitly intended to interface with Open Geospatial Consortium (OGC) Standards, and neither standard allows for expressing objectively the location of something in a way that is robust against changes of spatial scale and time. 
For instance, the CRM allows for specifying a property “P… has former or current location”, without declaring whether the location is or was the extent of the object, was within the extent of the object or included its extent, and at which time the object was at that location. Before GeoSPARQL, OGC Standards and traditional Geoinformation Systems, on the other hand, allowed for assigning one (or in rare cases more) precise “geometries” to a “feature”, but did not say how the real matter of the thing, with its smaller irregularities, relates to them. The geometry could be a point in the feature, a circle around it, or a centimetre-precise smoothed surface. For any “feature” there is a spatial scale at which a “geometry” of a detail cannot be compared to the geometry of the whole, and the temporal validity range is not explicitly stated even though OGC Standards provide mechanisms for doing so. What is needed is an “articulation” (linkage) of the two ontologies, i.e., a more detailed model of the overlap between the two that covers the underdetermined concepts and properties of both ontologies. This should be done by shared specialisations rather than by generalisations. So we took a step back and developed a model based on an analysis of the epistemological processes of defining, using and determining places. This includes an analysis of how a question such as “Is this the place of the Varus Battle?” or “Is this the place where Lord Nelson died?” can be verified or falsified in practice, also based on geometric specifications. This required identifying the various sources of factual errors and incorrect data that appear in such verification processes, and also questioning the truth of the historical record itself. 
Consequently, we arrived at a surprisingly detailed model which seems to give a complete account of all practical components necessary to verify such questions, in agreement with the laws of physics, the practice of geometric measurement, and archaeological reasoning. This model appears to have the capability not only to link both ontologies but also to show the way towards correct reconciliation of data at any scale and time, not by inventing precision or truth that cannot be acquired, but by quantifying or delimiting the immanent indeterminacies, which is good practice in the natural sciences.

2 Model history

The integration of detailed geoinformation with CIDOC CRM has been addressed in various research projects. The AnnoMAD System [4] used OGC standards to represent geoinformation within CRM structures while utilizing the Geography Markup Language (GML) in information objects that refer to places of cultural objects. The CLAROS project [5] used the ’Basic Geo Vocabulary’ RDF representation [6]. English Heritage created already in 2004 the CRM-EH [7], an extension to the CIDOC CRM for archaeological excavations with single context recording. The extension used the concept of Spatial Coordinates within the CRM to relate to spatial X, Y and Z coordinates through datatype properties. Recent work of Paul Cripps extended CRM-EH and related it to GeoSPARQL in his GSTAR project [8]. At the German National Museum the WissKI project [9] started an initiative to investigate the possibilities of integrating coordinate information within the CIDOC CRM [10]. A research project at ICS-FORTH led to the first definition of CRMgeo in a technical report [1] and the extension was presented to the archaeological community at the CAA 2013 conference [11]. One main concept of CRMgeo, the Spacetime volume, was subsequently integrated in the CIDOC CRM version 6, because it was regarded as fundamental to basic cultural historical reasoning. In this paper we present the current version of CRMgeo together with the spatiotemporal concepts introduced from CRMgeo into the “core” CIDOC CRM itself.

Fig. 1

CIDOC CRM 6.2 view of Spacetime volume (E92) and spatial and temporal projections to Place (E53) and Time Span (E52)

3 Core concepts of CIDOC CRM 6.2 to represent spatiotemporal properties of Periods (E4) and Physical Things (E18)

Introducing the Spacetime volume first in CRMgeo and then “lifting it up” to the CIDOC CRM allows for an integrated view of space and time in the context of CIDOC CRM domains. In the current version of CIDOC CRM (6.2), the continued analysis of the relationship between existing CRM classes and the new Spacetime volume concept established Period (E4) as a subclass of both Temporal Entity (E2) and Spacetime volume (E92). The latter is intended as a Phenomenal Spacetime volume as defined in CRMgeo, which is discussed in Section 4. By virtue of this multiple inheritance we can discuss the physical extent of a Period (E4) without representing each instance of it together with an instance of its associated Spacetime volume (E92). This model combines two quite different kinds of substance: an instance of Period (E4) is a set of coherent phenomena, while a Spacetime volume (E92) is an aggregation of points in spacetime. The real spatiotemporal extent of an instance of Period (E4) is the spreading out and sphere of influence of the constituent phenomena, such as the actions of the citizens of the Roman Empire and the areas they controlled and effectively claimed. The identity and existence of this extent depend uniquely on the identity of the instance of Period (E4). This is why we call the respective extents in space, time or spacetime “phenomenal”. They are regarded as unique with all their details and fuzziness, but ultimately distinct from all geometric determinations or approximations that take their identity from a human declaration, which we therefore call “declarative”. This multiple inheritance is thus unambiguous and effective and, furthermore, corresponds to the intuitions of natural language. 
The same multiple inheritance is applied to Physical Thing (E18), making it a subclass of Legal Object (E72) and of Spacetime volume (E92), the latter again being a phenomenal one in the sense of CRMgeo, because it is determined by the phenomenon of the actual presence of the matter of the Physical Thing (E18). This construct allows for a more condensed information representation because properties of Spacetime volumes can directly be attached to Periods (E4) and Physical Thing (E18) without the need to introduce a separate Spacetime volume instance. The established CRM concepts of Place (E53) and Time-Span (E52) are now defined as spatial and temporal projections of a Spacetime volume (E92) which is the unique extent of a period or thing. Space Primitives (E94) have been introduced to express geometries on or relative to earth, or any other stable constellations of matter, relevant to cultural and scientific documentation. They should be implemented with appropriate validation, precision and references to spatial coordinate systems. Within a historic discourse or research question it is often of relevance where a Physical Thing (E18) was at a specific time or what extent in space was covered by a specific Period (E4) during a specific Time Span (E52). For this purpose Presence (E93) as subclass of Spacetime volume (E92) was created to define “snapshots” of a Spacetime volume (E92), i.e. intersections of a Spacetime volume (E92) with all space restricted to a particular time-span, such as the extent of the Roman Empire during 33 B.C., or the extent occupied by a museum object at rest in an exhibit. Since determining the spatial extent of such things can in general not be done for infinitesimally small time-spans, we define Presence (E93) as a possibly “thin” spacetime volume itself, which we can then project on the space axes to obtain “where the thing was” during that time. 
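The relation between a Spacetime volume, a Presence and its projections can be illustrated with a minimal sketch. It assumes, purely for illustration, that a spacetime volume is discretised into a finite set of (x, y, t) samples; the names `Sample`, `presence`, `spatial_projection` and `temporal_projection` are hypothetical and not part of the CIDOC CRM.

```python
from typing import NamedTuple, Set, Tuple


class Sample(NamedTuple):
    """One point of a discretised spacetime volume (assumed representation)."""
    x: float
    y: float
    t: float


def presence(volume: Set[Sample], t_start: float, t_end: float) -> Set[Sample]:
    """Restrict a volume to a time-span: a rough analogue of Presence (E93)."""
    return {s for s in volume if t_start <= s.t <= t_end}


def spatial_projection(volume: Set[Sample]) -> Set[Tuple[float, float]]:
    """Project onto the space axes: a rough analogue of Place (E53)."""
    return {(s.x, s.y) for s in volume}


def temporal_projection(volume: Set[Sample]) -> Tuple[float, float]:
    """Project onto the time axis: a rough analogue of Time-Span (E52)."""
    times = [s.t for s in volume]
    return min(times), max(times)


# Hypothetical thing that moves one unit east per time step.
thing = {Sample(float(t), 0.0, float(t)) for t in range(5)}

# "Where was the thing between t=1 and t=2?"
snapshot = presence(thing, 1.0, 2.0)
where = spatial_projection(snapshot)
```

The snapshot is itself a (thin) spacetime volume, so the same projection functions apply to it as to the full volume, mirroring the subclass relation between Presence (E93) and Spacetime volume (E92).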
Figure 1 illustrates the new classes Spacetime volume (E92), Presence (E93) and Space Primitive (E94) and their relations to previously existing CRM classes. We want to use the fight of the English HMS Victory and the French ship Redoubtable in the Battle of Trafalgar as an example to illustrate Spacetime volume (E92), Presence (E93) and how one Spacetime volume may be projected to two Places (E53) that are at rest relative to (P157) different Physical Things (E18). The fight of the two ships in which Lord Nelson was shot and subsequently died is illustrated in Fig. 2. For a better illustration of CIDOC CRM and CRMgeo spatiotemporal classes and relations we let the fight end with the sinking of the Redoubtable, which was actually not the case.

Fig. 2

The fight of the HMS Victory and the Redoubtable in the Battle of Trafalgar illustrating Spacetime volume (E92) and Presence (E93) and their projection to different Places (E53)

The starting point of the modelling is the unique Spacetime volume (E92) of the fight, starting at the first shot fired between the two ships and ending with the sinking of the Redoubtable. We model two different instances of Presence (E93) at crucial points in time during the battle. One of them is the shooting of Lord Nelson by a French sharpshooter. For an historian who wants to reconstruct the situation of Lord Nelson’s shooting, the positions and movements of Lord Nelson, other crew members and the sharpshooter are of importance in relation to the ship. Therefore, the unique Spacetime volume (E92) of the fight and the Presence (E93) at the time of the shooting may be projected to a place that is at rest in relation to the HMS Victory. For an archaeologist interested in the remains of the Redoubtable on the seafloor it is important to formulate hypotheses in relation to the seafloor, where we would expect to find debris of the fight and the wreck of the Redoubtable. The unique Spacetime volume (E92) of the fight and the Presence (E93) at the time of the sinking of the Redoubtable may be projected to a place that is at rest in relation to the seafloor. Therefore, depending on the research question, the same event, i.e., the unique spacetime volume, may be projected either to a place that is at rest in relation to the HMS Victory or projected to a place that is at rest in relation to the seafloor. Each projection creates a different Place (E53). The Place on the ship ceases to exist when the HMS Victory ceases to exist. The Place on the seafloor ceases to exist when the seafloor disappears under the continental plate (which can be relevant in other scholarly settings, such as palaeontology). The Spacetime volume does not cease to exist as long as the temporal and spatial dimensions exist (a black hole would end these dimensions).
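The dependence of the projected Place on the reference frame can be sketched as follows. The sketch assumes planar coordinates and a ship that translates without rotating (a real conversion would also need the ship's heading); the track, the offsets and the function name `to_ship_frame` are hypothetical.

```python
def to_ship_frame(event_xy, ship_xy):
    """Translate seafloor-fixed coordinates into a ship-fixed frame.

    Assumption: the ship only translates; rotation is ignored.
    """
    return (event_xy[0] - ship_xy[0], event_xy[1] - ship_xy[1])


# Hypothetical positions of the ship's origin in a seafloor-fixed frame,
# keyed by time step.
ship_track = {0: (100.0, 50.0), 1: (110.0, 50.0), 2: (120.0, 50.0)}

# A person standing still on deck, expressed in the seafloor-fixed frame.
person_seafloor = {t: (x + 5.0, y + 2.0) for t, (x, y) in ship_track.items()}

# Projection at rest relative to the seafloor: a trail of three positions.
place_on_seafloor = set(person_seafloor.values())

# Projection at rest relative to the ship: a single spot on deck.
place_on_ship = {
    to_ship_frame(person_seafloor[t], ship_track[t]) for t in ship_track
}
```

The same set of spacetime points thus yields two different Places, one extended along the ship's course and one confined to a spot on deck, which is the distinction the historian and the archaeologist each rely on.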

Fig. 3

Phenomenal Spacetime volume (SP1) of the sinking of the Redoubtable, its spatial and temporal projections and Declarative Places (SP6) and Declarative Time Spans (SP10) approximating the phenomenal ones

4 Spatiotemporal refinement in CRMgeo: differentiating between phenomenal and declarative spacetime volume, place and time span

The CRMgeo ontology explicitly differentiates between “phenomenal” extents in space, time or spacetime, which belong to the real world, and “declarative” extents, which belong to the world described by information. In the real world, the exact spatiotemporal properties of phenomena [Periods (E4) or Physical Things (E18)] cannot be known, owing to factors such as fuzzy boundaries of the phenomena and errors in measurement. Nevertheless, the spatiotemporal properties exist, and CRMgeo introduces them as Phenomenal Spacetime volume (SP1), Phenomenal Place (SP2) and Phenomenal Time Span (SP13), subclasses of Spacetime volume (E92), Place (E53) and Time Span (E52) respectively. They derive their identity from a phenomenon that has occupied or still occupies a unique Spacetime volume (E92).

Originating in the world described by information, Declarative Spacetime volume (SP7), Declarative Place (SP6) and Declarative Time Span (SP10) derive their identity from a human declaration. These may be coordinates derived from a measurement or a map for a Place (E53) or dates from a historic source for a Time Span (E52). Figure 3 shows how phenomenal and declarative subclasses are introduced for Spacetime volume, Place and Time Span and their symmetry in each superclass. To illustrate the concepts of phenomenal classes we want to create instances for the sinking of the Redoubtable in the real world. The Phenomenal Spacetime volume of the sinking is unique and derives its identity from the sinking event. The exact spatial and temporal extent of the Phenomenal Spacetime volume is not knowable but it exists. Hence we can create a Phenomenal Spacetime volume (SP1) for the sinking event, a Phenomenal Place (SP2) for its spatial projection and a Phenomenal Time Span (SP13) for its temporal projection. To illustrate the declarative classes, we create instances for the approximation of the wreck location and sinking time. The sinking of the Redoubtable may be approximated in human knowledge through the use of available information from log books and sea charts used in the battle. This information will be modelled as Declarative Places (SP6) and Declarative Time Spans (SP10). A sinking time of 21st of October 1805, 3 pm assumed based on log book records is a Time Expression (SP14) that defines a Declarative Time Span (SP10). This Declarative Time Span (SP10) has the purpose to approximate the Phenomenal Time Span (SP13) of the sinking of the Redoubtable. The same method can be applied to the Place of the sinking of the Redoubtable; coordinates taken from a battle sea chart define a Declarative Place (SP6) that approximates the Phenomenal Place (SP2) of the sinking. 
Now we will show how this model can help to find the wreck of the Redoubtable through the creation of Declarative Places (SP6). We assume that the sinking location of the Redoubtable was marked on the sea chart of the HMS Victory with an ’x’, creating our first Declarative Place (SP6). Depending on the scale of the sea chart, the pen size of the ’x’ and the methodology used to estimate the ship’s position, a maximum error of the location can be estimated [12]. This maximum error defines a polygon, typically a circle (our second Declarative Place), around the coordinates derived from the HMS Victory sea chart. To infer the wreck location on the sea floor from the sinking location we need an estimate of the maximum possible drift of a sinking ship, based on factors such as the depth of the sea and prevailing currents. We can add the drift as a buffer to our second Declarative Place and create a third, outer bounding Declarative Place that contains the Phenomenal Place of the wreck, provided the estimates of drift and of the maximum error of the chart location are correct. Let us assume in addition that a French ship observed the sinking of the Redoubtable and that its crew marked an ’o’ on their sea chart. The same process of creating an outer bound Declarative Place is applied, this time based on the properties of the French sea chart, such as the size of the ’o’, the scale and the coordinate reference system. The result of our modelling is one Phenomenal Place for the wreck location and two outer bound Declarative Places approximating the wreck location. To represent the two outer bound Declarative Places with coordinate information and to calculate the overlap between them, in which to look for the shipwreck, we use GeoSPARQL concepts to represent and serialise geometries, and GeoSPARQL topological relations and queries to calculate the overlap. To see how this works we will give a short overview of GeoSPARQL and then show how CRMgeo classes relate to GeoSPARQL classes.
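The construction of the two outer bounds can be illustrated with a minimal sketch. It assumes planar coordinates in metres and circular bounds; all numeric values, the class `Circle` and the helper names are hypothetical and serve only to make the reasoning concrete.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """A circular Declarative Place in planar coordinates (assumption)."""
    x: float
    y: float
    r: float


def outer_bound(mark_x, mark_y, chart_error, max_drift):
    """Buffer the chart mark by its maximum error plus the maximum drift."""
    return Circle(mark_x, mark_y, chart_error + max_drift)


def overlaps(a, b):
    """True if the two circular bounds share at least one point."""
    return math.hypot(a.x - b.x, a.y - b.y) <= a.r + b.r


def in_both(px, py, a, b):
    """True if a candidate position lies inside both outer bounds."""
    return all(math.hypot(px - c.x, py - c.y) <= c.r for c in (a, b))


# Hypothetical values: the 'x' on the Victory chart and the 'o' on the
# French chart, each with its own chart error and the same drift estimate.
victory_bound = outer_bound(0.0, 0.0, chart_error=300.0, max_drift=200.0)
french_bound = outer_bound(600.0, 0.0, chart_error=250.0, max_drift=200.0)
```

If both sets of estimates are correct, the wreck must lie in the region where `in_both` holds, so the search area shrinks from either single bound to their intersection.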

Fig. 4

GeoSPARQL overview (components on the left and classes with properties to the right)

5 GeoSPARQL overview

SPARQL is a protocol and query language for the Semantic Web, defined in terms of the W3C’s RDF data model in much the same way as SQL is a query language for relational databases. GeoSPARQL defines a spatial extension to the W3C’s SPARQL protocol and RDF query language and provides a framework for implementing OGC Standards with semantic technologies through RDF/OWL encoding. Its introduction allows the integration of RDF-specified information models with the OGC/ISO standards developed in the geoinformation community. It provides the foundational geospatial vocabulary for linked data and defines extensions to SPARQL for processing geospatial data. In this context we concentrate on four GeoSPARQL components.

  1. Core component: defines top-level RDFS/OWL classes for spatial objects

  2. Geometry component: defines RDFS data types for serialising geometry data, RDFS/OWL classes for geometry object types, geometry-related RDF properties, and non-topological spatial query functions for geometry objects

  3. Geometry topology component: defines topological query functions

  4. Topological vocabulary component: defines RDF properties for asserting topological relations between spatial objects

The core component contains two main classes. The root class within the hierarchy of the GeoSPARQL ontology is SpatialObject representing everything that can have a spatial representation. Its subclass Feature represents a real-world object whose properties are under observation. The geometry component defines a vocabulary for asserting information about geometry data. A single root class Geometry is defined as a subclass of the SpatialObject class defined in the core component. To represent the actual coordinates of a Geometry, a so called Serialisation is used. That means that the coordinates are stored in a format which defines the sequence of the characters. The two OGC formats Well Known Text (WKT) and Geography Markup Language (GML) are defined as Serialisations and they build the base for subclasses of the geometry class. Figure 4 illustrates the four GeoSPARQL components on the left and the introduced classes on the right.
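A minimal sketch of how a Feature, its Geometry and a WKT Serialisation fit together is given below. It uses only string formatting to produce N-Triples; the GeoSPARQL terms `geo:hasGeometry`, `geo:asWKT` and `geo:wktLiteral` come from the standard, while the example.org IRIs, the coordinates and the helper names are hypothetical.

```python
GEO = "http://www.opengis.net/ont/geosparql#"


def wkt_point(lon, lat):
    """Serialise a point as Well Known Text (longitude first)."""
    return f"POINT({lon} {lat})"


def wkt_literal(wkt):
    """Format WKT as an RDF literal typed with geo:wktLiteral."""
    return f'"{wkt}"^^<{GEO}wktLiteral>'


def feature_triples(feature_iri, geometry_iri, wkt):
    """N-Triples linking a Feature to a Geometry and its serialisation."""
    return [
        f"<{feature_iri}> <{GEO}hasGeometry> <{geometry_iri}> .",
        f"<{geometry_iri}> <{GEO}asWKT> {wkt_literal(wkt)} .",
    ]


# Illustrative coordinates only; not taken from any historical source.
triples = feature_triples(
    "http://example.org/feature/sinking",
    "http://example.org/geometry/sinking",
    wkt_point(-6.25, 36.29),
)
```

Keeping the Feature and the Geometry as separate resources is what allows several alternative geometries, for example from different charts, to be attached to the same real-world object.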

Fig. 5

CIDOC CRM and CRMgeo 1.2 classes and their relation to GeoSPARQL classes

6 Linking CIDOC CRM and CRMgeo concepts to GeoSPARQL

The model of CRMgeo 1.2 incorporates the changes realised in CIDOC CRM 6.2. Making the Spacetime volume (E92), which is intended as a Phenomenal Spacetime volume (SP1), a superclass of Period (E4) and Physical Thing (E18) enables us to define geosparql:Feature as a superclass of Phenomenal Spacetime volume (SP1). Period (E4) and Physical Thing (E18) then inherit the properties of geosparql:Feature, in particular the elaborated topological relations that can be applied between instances of geosparql:SpatialObject, the superclass of geosparql:Feature and geosparql:Geometry. The introduction of a new class Geometry (SP15) in CRMgeo 1.2 allows for an easier implementation, as it comprises the union of geometric definitions and the declarative places that these geometries define. The new relationships at class level between CRMgeo 1.2 and GeoSPARQL are illustrated in Fig. 5.
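The inheritance argument can be traced with a small sketch that computes the transitive superclasses of a class. The subclass table below is a simplification and an assumption: following the reading that the E92 extent of Period (E4) and Physical Thing (E18) is a phenomenal one, it places both directly under SP1. It is not a copy of the normative CRMgeo class hierarchy.

```python
# Simplified, assumed subclass table (child -> direct parents).
SUBCLASS_OF = {
    "E4 Period": {"SP1 Phenomenal Spacetime Volume"},
    "E18 Physical Thing": {"SP1 Phenomenal Spacetime Volume"},
    "SP1 Phenomenal Spacetime Volume": {"E92 Spacetime Volume", "geo:Feature"},
    "geo:Feature": {"geo:SpatialObject"},
    "geo:Geometry": {"geo:SpatialObject"},
}


def superclasses(cls):
    """Return all direct and indirect superclasses of cls."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


period_supers = superclasses("E4 Period")
```

Under these assumptions `period_supers` includes `geo:SpatialObject`, which is why topological relations defined between spatial objects become applicable to periods and physical things.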

7 Example of approximating the Redoubtable wreck

Figure 6 shows a graph modelled in CIDOC CRM, CRMgeo and GeoSPARQL that represents the approximation of the Redoubtable wreck by the two outer bound Declarative Places created from the HMS Victory sea chart and the sea chart of the French ship. The two outer bounds are instantiated as CRMgeo Geometry (SP15) objects that were created from the coordinates of the sinking location derived from the two sea charts, the maximum error of the charts, and the maximum drift. These outer bound Declarative Places use the GeoSPARQL topological relation geo:sfContains to state the hypothesis that the wreck of the Redoubtable will be found within them.
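The hypothesis and the overlap calculation could be expressed roughly as in the following sketch, which only assembles N-Triples and a query string. `geo:sfContains`, `geo:asWKT` and the function `geof:intersection` are GeoSPARQL terms; the example.org IRIs and the helper names are hypothetical, and the query is untested against any particular triple store.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
GEOF = "http://www.opengis.net/def/function/geosparql/"


def containment_hypothesis(bound_iris, wreck_iri):
    """One geo:sfContains assertion per outer bound (as N-Triples)."""
    return [f"<{b}> <{GEO}sfContains> <{wreck_iri}> ." for b in bound_iris]


def overlap_query(geom_a, geom_b):
    """SPARQL query asking for the intersection of two geometries."""
    return (
        f"PREFIX geo: <{GEO}>\n"
        f"PREFIX geof: <{GEOF}>\n"
        "SELECT ?overlap WHERE {\n"
        f"  <{geom_a}> geo:asWKT ?a .\n"
        f"  <{geom_b}> geo:asWKT ?b .\n"
        "  BIND(geof:intersection(?a, ?b) AS ?overlap)\n"
        "}"
    )


bounds = [
    "http://example.org/bound/victory",
    "http://example.org/bound/french",
]
triples = containment_hypothesis(bounds, "http://example.org/wreck/redoubtable")
query = overlap_query(*bounds)
```

The containment triples record the hypothesis itself, while the query derives the reduced search area from the two serialised geometries.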

Fig. 6

Approximating the location of the Redoubtable wreck modelled with CIDOC CRM, CRMgeo and GeoSPARQL

8 Conclusion

Making the Spacetime volume (E92) in CIDOC CRM a superclass of Period (E4) and Physical Thing (E18) introduces an integrated view of space and time into the CRM that allows for spatiotemporal modelling and reasoning based on semantic relations between CRM instances. Before the introduction of the Spacetime volume there was either temporal or spatial reasoning but the inherent relation between the two could not be modelled. The differentiation between phenomenal and declarative Spacetime volume, Place and Time Span in CRMgeo defines identity criteria for real world spatiotemporal properties of Periods and Physical Things and spatiotemporal properties created from information sources like historical documents, maps, observations or measurements. This differentiation allows for a modelling and reasoning of the relations between real world locations and temporal extents of things and events and the available information about their locations and temporal extents. This is of particular interest when trying to determine the actual location of a thing, based on several information sources like nautical charts and logbooks from different nations that use different reference systems, scales and units. The linking of CRMgeo to GeoSPARQL allows for a representation of coordinate information compliant with OGC standards and the application of the elaborated topological relations GeoSPARQL defines. As a result the concepts of CRMgeo enable information integration on a spatiotemporal level based on the semantics defined in CIDOC CRM and making use of the technologies and definitions based on OGC standards.