5.1 Representing Scenarios

The division of a knowledge base \(\mathcal {K}= (\mathcal {T}, \mathcal {A})\) into an ontology \(\mathcal {T}\) and a scenario \(\mathcal {A}\), as introduced in Sect. 3.1, is not only formal, but also motivated by practice. Fulfilling the role of a schema, an ontology needs to be ingested into a data infrastructure, or implemented by it, only once; frequent updates are undesirable, since they require a reannotation of data. For the scenarios handled by digital platforms, obversely, data retrieval and ingest are routine operations, and so are updates, since they need to occur whenever the represented reality changes, e.g. a new service is offered or a new user is registered. Challenges related to I/O (or ingest and retrieval) mainly concern the scenarios, not the ontologies, and their standardized representation by files, streams or protocols is the main vehicle for syntactic interoperability.

Since the IRIs of resources on the semantic web can point to each other as freely as the URLs of sites on the World Wide Web, i.e. in a graph-like way, it is natural to visualize scenarios by graphs. These representations are referred to as knowledge graphs. In Sect. 3.1, a scenario was defined as a tuple \(\mathcal {A}= (\mathbf {I}, A _\mathrm {c}, A _\mathrm {r}, H )\) with individual names \(\mathbf {I}\), conceptual assertions \( A _\mathrm {c}\), relational assertions \( A _\mathrm {r}\) and elementary datatype property assertions \( H \). The corresponding knowledge graph is a labelled graph \( G = (\mathbf {I}, E , \varLambda _\mathrm {v}, \varLambda _\mathrm {e})\) where the vertices are given by \(\mathbf {I}\) and the edges by

$$\begin{aligned} E ~ = ~ \{( I , J ) \,\mid \, \exists R \in \mathbf {R}: ~ ( I , R , J ) \in A _\mathrm {r}\} ~ \subseteq ~ \mathbf {I}^2. \end{aligned}$$
(5.1)

Vertices are labelled according to the function \( \varLambda _\mathrm {v}: \mathbf {I}\rightarrow 2^{\mathbf {C}\cup \mathbb {R}\cup \Sigma ^ \star }\) that maps each individual name \( I \in \mathbf {I}\) to a set of labels

$$\begin{aligned} \varLambda _\mathrm {v}( I ) ~ = ~ \left( A _\mathrm {c}( I ) \,\cap \, \mathbf {C}\right) \,\cup \, \{ v \in \Sigma ^ \star \,\mid \, \exists k \in \Sigma ^ \star : ~ ( k , v ) \in H ( I )\}, \end{aligned}$$
(5.2)

while the edge labelling function \( \varLambda _\mathrm {e}: E \rightarrow 2^\mathbf {R}\) assigns the corresponding relation names

$$\begin{aligned} \varLambda _\mathrm {e}\left( ( I , J )\right) ~ = ~ \{ R \in \mathbf {R}\,\mid \, ( I , R , J ) \in A _\mathrm {r}\} \end{aligned}$$
(5.3)

to an edge \(( I , J ) \in E \).
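The construction in Eqs. (5.1) to (5.3) can be illustrated by a minimal Python sketch. The scenario below is a toy example loosely modelled on Fig. 5.1; all individual names and the `ex:` concept, relation and datatype property names are hypothetical placeholders, not IRIs from OTRAS or the CCSO, and the function name `knowledge_graph` is introduced here only for illustration.

```python
def knowledge_graph(I, A_c, A_r, H, concept_names):
    """Return (E, label_v, label_e) for a scenario, following Eqs. (5.1)-(5.3)."""
    # Eq. (5.1): one edge per ordered pair connected by at least one relation
    E = {(i, j) for (i, _r, j) in A_r}
    # Eq. (5.2): instantiated concept names plus datatype property values;
    # the keys (datatype property names) are dropped
    label_v = {
        i: (A_c.get(i, set()) & concept_names)
        | {v for (_k, v) in H.get(i, set())}
        for i in I
    }
    # Eq. (5.3): all relation names asserted for the same pair of individuals
    label_e = {e: set() for e in E}
    for (i, r, j) in A_r:
        label_e[(i, j)].add(r)
    return E, label_v, label_e


# Hypothetical toy scenario (assumed names, for illustration only)
I = {"c", "s", "p"}
C = {"ex:Course", "ex:Syllabus", "ex:Instructor"}
A_c = {"c": {"ex:Course"}, "s": {"ex:Syllabus"}, "p": {"ex:Instructor"}}
A_r = {("c", "ex:has_syllabus", "s"), ("s", "ex:has_instructor", "p")}
H = {
    "c": {("ex:name", "CECAM SWiMM 2021")},
    "p": {("ex:first_name", "Jean-Pierre"), ("ex:last_name", "Minier")},
}

E, label_v, label_e = knowledge_graph(I, A_c, A_r, H, C)
```

Since the vertex labelling discards the datatype property names, the instructor vertex carries only the concept name and the two string values, which is the form of representation used in Fig. 5.1.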

An example is given in Fig. 5.1; this knowledge graph might be read as follows: “There is a course labelled ‘CECAM SWiMM 2021’. This course has a syllabus, in which information is given on an instructor who is labelled ‘Jean-Pierre’ and ‘Minier’. The course has a training unit labelled ‘Salome/YACS’ for which event information is given,” etc. While this particular representation does not contain IRIs of datatype properties (to match the definition of the knowledge graph given above), it could easily be modified to incorporate this information as well, e.g. by using property graphs following Abad Navarro et al. [1]. The individual name IRIs are not shown in the figure to simplify the visualization; however, they are included in the definition of the knowledge graph.

Fig. 5.1 Part of the knowledge graph corresponding to a scenario describing a training event, using the ontology OTRAS in combination with the Course Curriculum and Syllabus Ontology (CCSO) [2], cf. Sect. 3.5. The elliptical vertices represent individuals, labels inside the ellipses denote the concept instantiated by the respective individuals and labels in italics represent values associated with the individuals by means of elementary datatype properties, while the arrows (edges) represent relations between individuals and are labelled with the respective relation names

Fig. 5.2 Role of semantic technology within interoperable data infrastructures, illustrated for the case of a typical multi-tier architecture; a multitude of such platforms, which may be substantially more complex than outlined here, has been emerging in recent years. JSON is often used as a convenient format for communicating object data through HTTP-based APIs. Ontologies support reasoning in the logical application layer as well as interoperability between multiple platforms

The technical implementation of semantic interoperability requires a syntactic representation by which information can be extracted from (or ingested into) a digital platform including a knowledge base; cf. Fig. 5.2 for a typical multi-tier design approach. For this purpose, subject-predicate-object triples can be employed, e.g. in TTL format (cf. Sect. 3.1), by which the scenario from Fig. 5.1 is rendered as follows:

figure a

Above, @prefix statements introduce the abbreviations employed for IRI prefixes, e.g. the datatype property https://w3id.org/ccso/ccso#csName is abbreviated by ccso:csName. The elementary datatypes follow the conventions for XML schemas, cf. Chapter 2.

TTL notation has the advantage that it can be employed consistently for the whole knowledge base, including both the ontology and the scenario. For many applications, however, this is more problematic than beneficial, because the expressive power of OWL and its various serializations (including TTL) goes far beyond what is needed to represent objects and their properties; consequently, TTL is harder to parse and to process. Moreover, it cannot be ensured at the syntax level that only information on the scenario is included. Instead, JavaScript Object Notation (JSON) is often preferred, particularly in its JSON Linked Data (JSON-LD) variety which was specifically designed for the purpose of exchanging semantically characterized information on objects and their relations. In JSON-LD format, the example scenario becomes

figure b

There, every pair of curly braces encloses the description of an object (except the value of @context, which includes the IRI prefix definitions), given as a sequence of key-value pairs. The individual names are provided as values corresponding to the key @id, while the instantiated concept names are indicated by the key @type. The other keys are relation names, and the associated values are the third elements of the respective triples, as can be seen from the direct correspondence between the TTL and JSON-LD examples given above.
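The correspondence between triples and JSON-LD objects described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the serialization used by any particular platform: the `ex:` prefix, the individual names, the concept and relation names and the helper names `Lit` and `to_jsonld` are hypothetical, and the use of ccso:csName for the course label is an assumption. The sketch emits a flattened `@graph` form in which related objects are referenced through `@id` rather than nested.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """Marks an elementary datatype value, as opposed to an IRI."""
    value: object


CONTEXT = {
    "ccso": "https://w3id.org/ccso/ccso#",
    "ex": "https://example.org/scenario#",  # hypothetical prefix
}

# Subject-predicate-object triples (toy scenario, assumed names)
TRIPLES = [
    ("ex:course1", "rdf:type", "ex:Course"),
    ("ex:course1", "ccso:csName", Lit("CECAM SWiMM 2021")),
    ("ex:course1", "ex:has_syllabus", "ex:syllabus1"),
    ("ex:syllabus1", "rdf:type", "ex:Syllabus"),
]


def to_jsonld(triples, context):
    """Group triples by subject into JSON-LD node objects."""
    nodes = {}
    for s, p, o in triples:
        node = nodes.setdefault(s, {"@id": s})  # individual name -> @id
        if p == "rdf:type":
            node.setdefault("@type", []).append(o)  # concept name -> @type
        elif isinstance(o, Lit):
            node.setdefault(p, []).append(o.value)  # datatype property value
        else:
            node.setdefault(p, []).append({"@id": o})  # relation to individual
    return {"@context": context, "@graph": list(nodes.values())}


doc = to_jsonld(TRIPLES, CONTEXT)
print(json.dumps(doc, indent=2))
```

Each triple contributes exactly one key-value entry to the object identified by its subject, so that the third element of the triple reappears as the value associated with the relation name.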

Additionally, domain-specific solutions on the basis of the hierarchical data format HDF5 facilitate combining a greater volume of data, including binary data, with the corresponding semantic annotation [3], e.g. the H5MD format [4] for semantically enriched data in molecular modelling and simulation. The VIMMP marketplace platform API and its Zontal Space back end permit handling annotated digital objects through the HDF5-based Allotrope Data Format (ADF) [5,6,7].

5.2 Top-Level Ontology

For a fundamental philosophical underpinning, the European Materials and Modelling Ontology [8, 9] relies on a combination of physicalist mereotopology following Varzi [10] and a nominalist reinterpretation of Peirce’s semiotics [11]. Therein, physicalist mereotopology primarily addresses the description of materials, which is extended by nominalist semiotics to describe modelling, simulation and experiments. For a discussion of nominalism, cf. Lewis [12]; more specific implications of the approach of the EMMO on representing modelling and simulation of physical systems have been discussed elsewhere [13].

To facilitate the top-level ontology alignment of the VIMMP ontologies, a module with a scaled-down EMMO in TTL format is included, EMMO version 1 simplified (EMMO1s), which at the present stage (version 1.0.4) is based on EMMO version 1.0.0 alpha 2 (April 2020). EMMO1s provides user-friendly IRIs for EMMO concepts, retaining the labels, e.g. the IRI of the EMMO concept with rdfs:label “Semiosis” is given in the original EMMO as emmo-semiotics:EMMO_008fd3b2_4013_451f_8827_52bceab11841. For these entities, EMMO1s specifies aliases that can be accessed directly through the label, such as emmo1s:Semiosis. In the interest of notational clarity, to indicate the origin of the concept definitions and the respective EMMO modules, these entities will here be denoted by the EMMO prefix followed by the EMMO1s suffix, e.g. by emmo-semiotics:Semiosis, even though internally, for VIMMP, it is actually emmo1s:Semiosis.

The VIMMP Primitives (VIPRS) module amplifies the ways in which the EMMO-based top-level semantic interoperability architecture can be applied to the relations characterizing metadata from the VIMMP marketplace-level domain ontologies. With this aim, VIPRS extends the EMMO system of top-level relations by three features:

  1. modal logic (e.g. Kripke semantics) and modal squares of opposition;

  2. concatenation of mereotopological and semiotic relations, yielding mereosemiotic relations;

  3. top-level datatype properties.

While the EMMO can be used to describe materials and models as such, statements on necessity and possibility anchored in modal logic are metaontological, i.e. beyond the ontology, from the point of view of the EMMO [9]; e.g. within the framework of the EMMO, an event can be described as a physical process, but the statement that “this process can possibly occur, but it will not necessarily occur” cannot be expressed. The present domain ontologies, however, make ample use of relations that are ultimately modal to specify capabilities (it is possible that :X will be used to do :Y) or requirements (it is necessary that if :X occurs, :Y also occurs).

Fig. 5.3 Selected traditional (top) and generalized (bottom) modal squares of opposition from VIPRS. Here, stands for “\( I \) occurs”, and \( K _ I ~\mathsf {C}~ I \) stands for “\( K _ I \) conceptualizes \( I \)”, arrows denote subsumption and solid lines denote complementarity (top) and logical negation with respect to the modal formula (bottom)

To provide a top-level structure for modal relations, VIPRS includes modal squares of opposition, cf. Fig. 5.3, by which the presence of individuals in a knowledge base can be associated with statements on whether their occurrence is possible, necessary, factual or fictional [16]. The modal operators can be given a variety of interpretations, depending on the precise use that is made of the ideas of necessity (\( \Box \)) and possibility (\( \Diamond \)), respectively [17]; similarly, the definition of “occurrence” depends on the use that is made of the ontology and may depend on context; VIPRS accepts this ambiguity in order to be applicable to diverse types of knowledge bases and infrastructures. The term “to occur” in , “:X may occur,” and similar, is employed to refer to the (possible or necessary) appearance of an individual :X in a certain type of environment, e.g. as an element of a valid simulation workflow. On this basis, relations concerning the possible or necessary co-occurrence of multiple individuals are defined, e.g. viprs:n_loc_or_rnoc (and others following the same pattern, cf. Fig. 5.3), where the IRI is to be read as “necessarily, the left occurs or the right does not occur”

(5.4)

cf. Fig. 5.3. Thereby, “occurrence” (by appearing in a certain type of environment) is not the same as “existence,” i.e. presence in a knowledge base. It is in this sense that VIPRS can be employed as an implementation of possible-world semantics, Kripke semantics and/or ontological Meinongianism [16], even though it does not necessarily presuppose the use of any of these paradigms. The conceptualization relation

(5.5)

with \( K _ I \,\mathsf {C}\, I \) to be read as “\( K _ I \) conceptualizes \( I \),” relates a more (or equally) generic individual to a more (or equally) specific one; it is used to introduce a step of abstraction into the modal co-occurrence relations, e.g. “necessarily, the left occurs conceptual-or the right does not occur”

(5.6)

Relations from the EMMO are mereological (or, more properly, mereotopological [10, 18, 19]), represented here at the highest level by proper parthood

$$\begin{aligned} \mathsf {P}~ \equiv ~ \textsf {{viprs:is\_proper\_part\_of}} ~ \equiv ~ {\textsf {{emmo-mereotopology:hasProperPart}}}^ - , \end{aligned}$$
(5.7)

and semiotic, represented at the highest level by the sign-to-object reference relation

(5.8)

cf. Expressions (3.6) and (3.7). To facilitate ontology alignment, which is discussed in Sects. 5.3 and 5.4, VIPRS also contains mereosemiotic chain products of these fundamental relations, i.e. elements of the free semigroup \(\mathbf {R}_\mathrm {ms}^+\) over \(\mathbf {R}_\mathrm {ms}= \{\mathsf {P}, \mathsf {S}, {\mathsf {P}}^ - , {\mathsf {S}}^ - \}\), with the product defined by concatenation. The mereosemiotic relations for which there is an explicit definition in VIPRS are limited to \(\mathbf {R}_\mathrm {ms}\, \cup \, \mathbf {R}_\mathrm {ms}^2 \, \cup \, \mathbf {R}_\mathrm {ms}^3\), i.e. relations generated by a sequence of up to three fundamental relations. Excluded are chain relations that are redundant (\(\mathsf {P}\circ \,\mathsf {P}\) and its inverse), that are complete (or almost complete), i.e. relating everything to everything, except possibly for a single “universe” entity, as is the case for \(\mathsf {P}\circ {\mathsf {P}}^ - \), or that consist of three elements from the same category; e.g. \({\mathsf {S}}^ - \circ \,\mathsf {S}\,\circ \,{\mathsf {S}}^ - \) is excluded, because all three constituent elements are semiotic. In the nomenclature employed by VIPRS, the IRI elements ip, hp, is and hs stand for “is proper part,” “has proper part,” “is sign” and “has sign,” respectively. Accordingly, the binary chain relations include

(5.9)

while the ternary chain relations include

(5.10)

With minor exceptions, datatype properties (owl:DatatypeProperty) are absent from the EMMO [9]; in the domain ontologies, however, datatype properties are amply employed to associate objects with textual (xs:string) and numerical (xs:decimal) attributes as well as xs:boolean flags. Figure 5.4 visualizes the hierarchy of top-level datatype properties introduced in VIPRS. At the highest level, VIPRS categorizes datatype properties according to their role:

  • Datatype properties that serve the identification of an object are positioned below viprs:has_identifier; examples include otras:has_topic_code, which maps a materials modelling topic (otras:mm_topic) from OTRAS to a four-digit code. Each topic code uniquely corresponds to one topic, and its purpose is identification.

  • Where an elementary-datatype entry is the content (or part of the content) of an object, datatype properties below viprs:has_content are used; e.g. this applies to textual or numerical content of MODA form entries (in OSMO, aspects), corresponding to osmo:has_aspect_text_content and osmo:has_aspect_text_content [20, 21], cf. Sect. 3.3.

  • For elementary descriptors, specifiers and similar metadata that provide additional, contingent information on objects, viprs:has_specifier is used; e.g. otras:has_cited_video_duration_seconds points to a metadata item on the length of a video. This contributes to our knowledge about the video by specification, while it does not permit its identification; moreover, the video duration is information about the video content, but it is not itself the content. Therefore, otras:has_cited_video_duration_seconds \( \sqsubseteq \) viprs:has_specifier.

At the second level, the datatypes are distinguished (string, decimal or Boolean). Further below, at the third level, the textual datatype properties are further split into subproperties according to their function (cf. Fig. 5.4).
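The role-based categorization can be illustrated by a minimal Python sketch that resolves the top-level role of a datatype property by following subsumption upward. The three subsumptions are the examples named above; the intermediate levels (by datatype and by function, cf. Fig. 5.4) are omitted, each property is assumed to have a single direct superproperty, and the function names are hypothetical.

```python
# Direct superproperty of each datatype property (examples from the text;
# intermediate levels of the hierarchy are deliberately left out)
SUBPROPERTY_OF = {
    "otras:has_topic_code": "viprs:has_identifier",
    "osmo:has_aspect_text_content": "viprs:has_content",
    "otras:has_cited_video_duration_seconds": "viprs:has_specifier",
}


def top_level_role(prop):
    """Follow the subsumption chain upward until no superproperty is known."""
    seen = set()
    while prop in SUBPROPERTY_OF and prop not in seen:
        seen.add(prop)  # guards against accidental cycles in the input
        prop = SUBPROPERTY_OF[prop]
    return prop


def is_subsumed(prop, ancestor):
    """True if prop equals ancestor or is (transitively) subsumed under it."""
    while True:
        if prop == ancestor:
            return True
        if prop not in SUBPROPERTY_OF:
            return False
        prop = SUBPROPERTY_OF[prop]


role = top_level_role("otras:has_cited_video_duration_seconds")
```

In this sketch, the video duration property resolves to the specifier role and is not subsumed under viprs:has_identifier, in line with the reasoning given for the third category.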

Fig. 5.4 Hierarchy of datatype properties from VIPRS, version 1.0.1; arrows denote subsumption (\( \sqsubseteq \))

5.3 Ontology Matching

A major design goal for a top-level ontology consists in achieving the desired level of expressivity with a minimal repertoire of basic terms and relations. Obversely, to ensure interoperability for services and tools operating at the level of a specific digital platform, the employed ontologies need to capture detailed characteristics of data pertaining to a particular domain of knowledge. Accordingly, the structure of the corresponding semantic space at the lower level is comparably complex, e.g. the ontologies from VIMMP contain about 1000 concepts, 550 relations (object properties) and 180 elementary datatype properties. Therefore, by design, the EMMO needs to have a structure that is substantially different from that of the marketplace-level ontologies [7]. To ensure that the EMMO is consistently employed at all levels, so that it can contribute to platform and service interoperability as far as possible, the marketplace-level ontologies need to be aligned with the EMMO. Before returning to this specific problem, the present section summarizes some of the related theoretical concepts.

In principle, semantic assets are designed to allow data integration and overcome the data heterogeneity problem; in reality, semantic heterogeneity does arise, and it grows over time as resources are added to the semantic web. This is known as the Tower of Babel problem [22, 23]. While some authors regard any presence of semantic heterogeneity as a failure of semantic interoperability and hope for universal agreements, others think that it is unavoidable and look for strategies to deal with it. This may involve a standardized way of documenting semantic assets; basic agreements on the approach to ontology design; and the formalizations of roles, procedures and good practices (or best practices), aiming at pragmatic interoperability [24,25,26,27]. For this approach, the challenge consists in agreeing and specifying how the semantic space is structured, documented and employed in practice; by raising the domain for which universal agreements are pursued from the ontological level to the metaontological level, “the Tower of Babel becomes a Meta-Tower of Babel” [28].

As a consequence, semantic heterogeneity is seen as a necessary property of the semantic web, and ontology matching and integration become basic features of its successful mode of operation, rather than an expression of incompleteness. Options for implementing such a mode of operation have been extensively discussed in the literature, first for schemas and then for ontologies, cf. Noy [29] as well as Euzenat and Shvaiko [30]. The common challenge is how to make use of the knowledge represented in two ontologies, which can differ at various levels (language used, expressivity, modelling paradigm, etc.). Typically, such challenges arise if there is an overlap in the domains of knowledge addressed by multiple ontologies, such that data annotated in diverse ways need to be combined and processed together, or if a platform employs multiple domain ontologies that are based on different top-level ontologies. Typical applications include, e.g. simultaneous querying of multiple knowledge bases [31,32,33,34] or, as addressed here, the mapping of semantic content from a source ontology \(\mathcal {S}\) to a target ontology \(\mathcal {T}\).

Such a mapping \( \alpha \), by which a scenario \(\mathcal {A}_\mathcal {S}\) expressed in the source ontology is mapped to a scenario \(\mathcal {A}_\mathcal {T}\) expressed in the target ontology, is an ontology alignment. Equivalently, this can be applied to the corresponding knowledge graphs, \( \alpha : G _\mathcal {S}\mapsto G _\mathcal {T}\). The process by which an alignment is constructed is known as ontology matching [35]. Alignments can be probabilistic or deterministic; e.g. in a probabilistic formalism, it might be stated that “an osmo:condition that osmo:contains_variable an evmpo:material_property has a 40% probability of being an emmo-models:PhysicsBasedModel”, cf. Suchanek et al. [36]. For the present purpose, we restrict ourselves to deterministic alignments, based on rules that are asserted to be valid in general. If such an alignment is formulated coherently and correctly, the source and target scenarios need to be semantically consistent, i.e. the assertions from the target scenario may not contradict the assertions from the source scenario, which can be checked in multiple ways:

  1. Immanently (ontologically), on the basis of a series of alignments \( \alpha \circ \alpha ' \circ \dots \), at the end of which another version of the scenario expressed in the source ontology is obtained. Then the consistency of the original and final scenarios can be determined on the basis of the rules from the source ontology \(\mathcal {S}\).

  2. Transcendentally (metaontologically), either by creating a new ontology that encompasses both \(\mathcal {S}\) and \(\mathcal {T}\), containing rules in which concepts or relations from both ontologies occur jointly, or alternatively by a different system of (possibly human) arbitration that can detect contradictions between \(\mathcal {A}_\mathcal {S}\) and \(\mathcal {A}_\mathcal {T}\).

Under the constraint of consistency, it is the main challenge to preserve as much of the originally given information as possible. Test scenarios, for which the desired target representation is known, can be used to validate the alignment [34]. Moreover, alignment rules, whether probabilistic or deterministic, can be obtained by evaluating corpora of data that are annotated in both the source and target ontologies [35, 37]; in the probabilistic case, however, the outcome can be assumed to apply only as long as the population or corpus underlying the statistical analysis from which the probabilities were determined is representative of a class of scenarios to which \(\mathcal {A}_\mathcal {S}\) belongs. Simple alignment correspondences [38] can be specified by categorically subsuming concepts and relations from \(\mathcal {S}\) under those from \(\mathcal {T}\), yielding relabelling rules [39] that do not affect the graph structure (only the labels) and that are context free, i.e. independent of adjacent vertices and edges, such as

$$\begin{aligned} \textsf {{vivo:evaluates}} ~ \sqsubseteq ~ \mathsf {S}, \end{aligned}$$
(5.11)

stating that whatever evaluates an object, by implication, always also is a sign for that object. Besides, qualified subsumptions can be formulated, such as

$$\begin{aligned} \exists ({\textsf {{vivo:evaluates}}}^ - ).\textsf {{evmpo:assertion}} ~ \sqsubseteq ~ \textsf {{emmo-semiotics:Object}}, \end{aligned}$$
(5.12)

i.e. that which is evaluated by an assertion is an “object” in the sense of Peircean semiotics; this is a context-sensitive rule, since the relabelling of the vertex (individual) is contingent on one of the edges, namely, an incoming edge with the label vivo:evaluates. Beyond this, more complex graph transformation rules [40] can be applied in the case that the transformation goes beyond relabelling, i.e. if vertices or edges in the knowledge graph need to be eliminated or created by applying m : n property chain correspondences [38].
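The difference between the context-free rule (5.11) and the context-sensitive rule (5.12) can be made concrete by a minimal Python sketch operating on a labelled graph. The individual names and the `ex:` concept name are hypothetical, the string `"S"` is used as a stand-in for the sign-to-object reference relation, and the function name `apply_alignment` is introduced here for illustration only. Labels are added rather than replaced, reflecting that both rules are subsumptions.

```python
def apply_alignment(vertex_labels, edge_labels):
    """Apply relabelling rules (5.11) and (5.12); the graph structure is unchanged."""
    v = {k: set(ls) for k, ls in vertex_labels.items()}
    e = {k: set(ls) for k, ls in edge_labels.items()}
    for (src, dst), labels in e.items():
        if "vivo:evaluates" not in labels:
            continue
        # Rule (5.11), context free: depends only on the edge label itself
        labels.add("S")
        # Rule (5.12), context sensitive: the target vertex is relabelled only
        # if the incoming vivo:evaluates edge starts at an evmpo:assertion
        if "evmpo:assertion" in vertex_labels.get(src, set()):
            v.setdefault(dst, set()).add("emmo-semiotics:Object")
    return v, e


# Hypothetical source knowledge graph (assumed individual names)
vertices = {
    "a1": {"evmpo:assertion"},
    "x1": {"ex:thing"},
    "u1": {"ex:thing"},
    "y1": {"ex:thing"},
}
edges = {
    ("a1", "x1"): {"vivo:evaluates"},
    ("u1", "y1"): {"vivo:evaluates"},
}

new_vertices, new_edges = apply_alignment(vertices, edges)
```

Both edges acquire the label for the reference relation, whereas only `x1`, which is evaluated by an assertion, is additionally classified as a semiotic object; `y1` is not, because the individual evaluating it is not an assertion.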

Fig. 5.5 Fundamental categories, superclasses and selected subclasses from the EVMPO (ellipses), version 1.3.1, together with related concepts from EMMO version 1.0.0 alpha 2 (rectangles); arrows between concepts denote subsumption, and double lines between concepts denote equivalence

5.4 VIMMP-EMMO Alignment

To permit the transformation of a knowledge graph from the way in which it appears to the VIMMP marketplace platform to the more abstract representation required for interoperability within a heterogeneous ecosystem of platforms mediated through the EMMO, both concepts and relations need to be aligned between the (VIMMP marketplace) domain level and the (EMMO) top level. This is realized by an ontology module for EMMO-VIMMP Integration (EVI). For the present purpose, accordingly, \(\mathcal {S}\) is the VIMMP system of ontologies, including the EVMPO (but excluding VIPRS), and \(\mathcal {T}\) is the EMMO, in the case of concepts, and the EMMO in combination with VIPRS, in the case of relations. In the absence of co-annotated corpora that can be analysed automatically, the correspondences were all specified explicitly, by evaluating the concept and relation definitions from the EMMO alpha version in comparison with the respective definitions from the VIMMP ontologies.

Concerning the conceptual alignment, Fig. 5.5 shows how the categories from the EVMPO, cf. Sect. 3.2, are mapped to EMMO concepts. The red arrows and double lines in Fig. 5.5 represent this alignment, which is itself expressed as an ontology and implemented in the EVI module. This part of the alignment guarantees that all VIMMP domain-ontology concepts are subsumed under EMMO concepts (where they are all situated below emmo-physical:Physical taxonomically), since all of these concepts are either subclasses of one of the fundamental paradigmatic categories from the EVMPO or of the fundamental non-paradigmatic category evmpo:annotation.

Beyond this, the concepts from the domain ontologies are aligned with the EMMO down to a comparably fine-grained level; this is also implemented in EVI. Table 5.1 contains the EVI statements corresponding to the concepts that were listed as examples in Chap. 3.

Table 5.1 Alignment between selected concepts from the VIMMP marketplace-level ontologies (source ontology \(\mathcal {S}\)) introduced in Sects. (top), (middle) and (bottom) and the EMMO top-level ontology (target ontology \(\mathcal {T}\))

Table 5.2 Alignment between selected relations from the VIMMP marketplace-level ontologies (source ontology \(\mathcal {S}\)) introduced in Sects. 3.3 (top), 3.4 (middle) and 3.5 (bottom) and VIPRS in combination with the EMMO (target ontology \(\mathcal {T}\))

Fig. 5.6 MMTO relation hierarchy, version 1.3.1, showing the subsumption (arrows) of relations from the MMTO (rectangles) under relations from OSMO (hexagons), the EVMPO (rounded boxes) and VIPRS (ellipses)

Fig. 5.7 Correspondences between the domain and top levels; example: Description of a materials modelling use case following MODA [20] and OSMO [21]. Ellipses denote individuals, labelled by the concept names from the respective ontologies (EVMPO, OSMO and multiple EMMO modules), and arrows denote relations. At the intermediate stage, mereosemiotic chain relations from VIPRS are used to support the alignment [42]

The relational alignment, which is shown for the MMTO in Fig. 5.6 and for the examples from Chap. 3 in Table 5.2, is implemented directly in the domain ontology TTL files, which contain statements by which the domain ontology relations are subsumed under VIPRS relations. Property chain correspondences are applied when the mereosemiotic chain relations from VIPRS are unfolded, cf. Fig. 5.7, yielding series of elementary parthood and reference relations from the EMMO, so that the graph grows both in terms of vertices and edges; in TTL notation, this corresponds to the introduction of blank nodes (individuals without an IRI [41]) by which, e.g.

figure f

which encodes becomes

figure g
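The unfolding of a chain relation into elementary relations connected through blank nodes can be sketched in Python as follows. This is an illustration under assumptions rather than the implementation used by VIMMP: the strings `"P"` and `"S"` stand for proper parthood and the reference relation, a trailing `"-"` marks an inverse, the chain is assumed to be read from left to right, and the individual names and the function name `unfold` are hypothetical.

```python
from itertools import count


def unfold(subject, chain, obj, fresh):
    """Replace one chain-relation edge by elementary edges via blank nodes.

    chain is a list over {"P", "S", "P-", "S-"}; a chain of n elements
    introduces n - 1 blank nodes, labelled _:b0, _:b1, ... from `fresh`.
    """
    blanks = [f"_:b{next(fresh)}" for _ in chain[1:]]
    nodes = [subject] + blanks + [obj]
    triples = []
    for a, rel, b in zip(nodes, chain, nodes[1:]):
        if rel.endswith("-"):
            # an inverse relation is stored as the elementary relation,
            # with subject and object exchanged
            triples.append((b, rel[:-1], a))
        else:
            triples.append((a, rel, b))
    return triples


# "x is a proper part of something that y is a sign for" (assumed reading)
result = unfold("x", ["P", "S-"], "y", count())
```

A binary chain adds one vertex and replaces one edge by two, and a ternary chain adds two vertices and replaces one edge by three, which is the growth in vertices and edges referred to above.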

5.5 Documentation of Molecular Models

For documenting molecular models and exchanging them between platforms, a semantic interoperability standard on the basis of the VIMMP system of ontologies as well as MODA [20] was agreed between VIMMP and the Molecular Model Database (MolMod DB) of the Boltzmann-Zuse Society [43]; the associated environment of interoperable platforms will prospectively also include Bottled SAFT [44].

Fig. 5.8 Knowledge graph representing a 2CLJQ model for acetylene by Stöbener et al. [45], i.e. model ID 97 (C\(_2\)H\(_2\) III) from the MolMod DB [43]. Ellipses denote individuals, labelled by the concept names from the respective ontologies (EVMPO, OSMO, OTRAS, VISO and VOV) and arrows denote relations. The graph from Fig. 5.7 is included in the bottom left corner [42]

The structure of the knowledge graph representing a molecular model is illustrated by Fig. 5.8, which corresponds to a two-centre Lennard-Jones plus point-quadrupole model (2CLJQ) where a molecule (in this case, acetylene) is represented by a rigid unit (viso-am:rigid_object), consisting of two Lennard-Jones interaction sites (viso-am:lj_site), a point-quadrupole site (viso-am:charge_quadrupole_site) as well as a viso-am:structureless_object, representing the molecular centre of mass, which is used as the initial point of vov:relative_position vectors that indicate the coordinates of the interaction sites. The relation vov:has_attached_variable and subproperties of it are used to connect the interaction sites with the non-geometrical model parameters, i.e. the mass associated with each of the two LJ sites (half the molecular mass), the size and energy parameters \(\sigma \) and \(\epsilon \) of the LJ potential, and a second-order tensor characterizing the quadrupole moment. Other rigid molecular models are described analogously.

The platform interoperability implementation developed on this basis employs JSON-LD to exchange information on molecular models. Therefore, the knowledge graph needs to be connected (i.e. it may not consist of multiple connected components), and its topology needs to be simplified to a tree structure such that each object is subordinate to exactly one object, except for a single root node at the top. For the present example, an osmo:workflow_graph with two sections, a use case (MODA Sect. 5.1) and a model (MODA Sect. 5.2), is selected as the root of the tree. In this way, e.g. one of the site coordinates is included in the rigid unit description as follows:

figure h

The hierarchy by which objects are embedded in other objects in JSON is obtained from a subset of the relations from the knowledge graph, shown in Fig. 5.8 as solid arrows in blue colour, while references to IRIs are used to represent the other relations (dashed black arrows) in JSON-LD. The relations vov:involves_object and vov:involves_variable are part of the JSON-LD tree structure (solid blue arrows), so that COM, SITE_LJ_A, and LJ_A_POS are all hierarchically subordinate to RIGID_UNIT. The relations vov:has_initial_point and vov:has_final_point, however, are sideways connections between nodes from multiple branches of the tree (dashed black arrows); therefore, their JSON-LD representation only points to the IRI of the referenced object, using the "@id" keyword.
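The distinction between embedding and referencing can be illustrated by a minimal Python sketch that serializes a fragment of the graph from Fig. 5.8 as a tree. The object names, concept names and relation names are those mentioned in the text; which of the two tree relations connects which pair of objects is an assumption, local names are used in place of full IRIs, and the function name `embed` is hypothetical.

```python
import json

# Relations that define the tree structure (solid blue arrows in Fig. 5.8)
TREE_RELATIONS = {"vov:involves_object", "vov:involves_variable"}

TYPES = {
    "RIGID_UNIT": "viso-am:rigid_object",
    "COM": "viso-am:structureless_object",
    "SITE_LJ_A": "viso-am:lj_site",
    "LJ_A_POS": "vov:relative_position",
}

TRIPLES = [
    ("RIGID_UNIT", "vov:involves_object", "COM"),        # assumed assignment
    ("RIGID_UNIT", "vov:involves_object", "SITE_LJ_A"),  # assumed assignment
    ("RIGID_UNIT", "vov:involves_variable", "LJ_A_POS"),
    ("LJ_A_POS", "vov:has_initial_point", "COM"),
    ("LJ_A_POS", "vov:has_final_point", "SITE_LJ_A"),
]


def embed(node, triples, types):
    """Serialize node; tree relations nest the object, others only point to it."""
    obj = {"@id": node, "@type": types[node]}
    for s, p, o in triples:
        if s != node:
            continue
        if p in TREE_RELATIONS:
            value = embed(o, triples, types)  # hierarchical embedding
        else:
            value = {"@id": o}  # sideways connection: reference only
        obj.setdefault(p, []).append(value)
    return obj


doc = embed("RIGID_UNIT", TRIPLES, TYPES)
print(json.dumps(doc, indent=2))
```

The recursion terminates because the tree relations are assumed to be acyclic, with each object subordinate to exactly one other object; the position vector is embedded once below the rigid unit, while its initial and final points appear only as `"@id"` references.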