Semantic Technology for Simulations and Molecular Particle-Based Methods

In this chapter we discuss the role of ontologies for simulations, in the context of materials modelling in general and of molecular particle-based methods in particular. After a brief overview of the literature and possible applications, we present the VIMMP ontologies that make it possible to describe software capabilities and to further specify the various algorithms via the variables involved: the VImmp Ontology of Software (VISO) and the Vimmp Ontology of Variables (VOV).


Examples of Applications
As already explained in Chap. 1, ontologies are an explicit and formal way to represent knowledge in a certain domain. But how are they actually used in the context of simulations and modelling?
This question is tied to the purpose the ontology is designed for and to technical aspects, such as the availability and choice of tools (for example, to connect ontologies to programming languages). It also raises the question of whether we expect the end users to be (mainly) humans or machines.
Also, the use can be more or less direct: in the case of a database, for example, a triplestore makes immediate use of the ontology, whereas a less direct approach is to take aspects of the ontology into account when designing the database.
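To make the idea of a direct use more tangible, we sketch a toy in-memory triple store. It is not a real triplestore product: the class TripleStore, the ex: vocabulary and the individuals are hypothetical and only illustrate that data stored as subject-predicate-object statements can be queried directly in the terms of the ontology.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """A toy store of (subject, predicate, object) statements."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Hypothetical vocabulary (ex:Software, ex:hasLicense) and individuals.
store = TripleStore()
store.add("ex:A_MD_TOOL", "rdf:type", "ex:Software")
store.add("ex:A_MD_TOOL", "ex:hasLicense", "ex:SOME_LICENSE")
store.add("ex:ANOTHER_TOOL", "rdf:type", "ex:Software")

# Query phrased in ontology terms: which individuals are software?
tools = [s for s, _, _ in store.match(p="rdf:type", o="ex:Software")]
```

In the less direct approach, the same class and property names would instead inspire the tables and columns of a conventional database.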
We can get some insight into the possibilities by looking at the examples given above: OntoSoft [11] was used to design a platform to find and compare software [20]; the Simulation Intent Ontology [21] is used in connection with the CoGui tool [22] to automate some steps in the simulation setup; finally, one of the prospective uses of SVO and the connected tools [13,14] is to generate suitably formed variables starting from free-form text.
From the point of view of the source code, a prospective use of OOC-O is to support polyglot programming [18], i.e. the simultaneous use of multiple object-oriented programming languages.

Other Relevant Assets and Approaches
In the previous section, we limited the scope to ontologies; however, as discussed in Chap. 1, the semantic spectrum is wide, and alongside ontologies there are other relevant assets, which are technically different but similar in spirit, such as data schemas (cf. Chap. 2). We should also recall different branches of a field that is sometimes referred to as conceptual modelling. Historically, in the 1960s and 1970s, novel ideas setting the basis of this field appeared in different areas of computer science, namely, artificial intelligence, programming languages, databases and software engineering [23]: these ideas led, among others, to the development of knowledge-representation languages, object-oriented programming and entity-relationship (ER) models (see [23] for a discussion of the pioneering ideas in each area and a brief history of the topic).
Even if the connections between these approaches are not always direct, the thinking behind their development is similar: for example, when building an ontology for a domain, it is definitely instructive to look also into object-oriented programs and schemas for that domain, and vice versa.
As an illustration of schemas for our area, we recall the Chemical Markup Language (CML) [24] and the ThermoML schema [25,26], an IUPAC standard primarily developed at NIST. In the direction of object-oriented programs, a popular tool is the Atomic Simulation Environment (ASE) [27], which allows users to set up, control, visualize and analyse simulations at the atomic and electronic level.
Another relevant topic is that of visual programming: a visual scheme is used to represent a model (in the general sense), but also to generate the source code (model-driven simulation). This operation can sometimes work also in the opposite direction, extracting the model from the source code, a form of reverse engineering.
Finally, in the area of software design and business modelling, it is important to recall the role of the Object Management Group (OMG) [28], which was formed 30 years ago; in particular, its activities led to the development of the Unified Modeling Language (UML) [29] and an ecosystem of specifications based on it.
In a nutshell, UML allows one to describe the structure and behaviour of a system (note 2) via different types of diagrams. It was motivated as a unifying object-oriented language, and one of its main aims is to "advance the state of the industry by enabling object visual modeling tool interoperability" [29]. The visual aspect of the diagrams can be used to share ideas; however, these diagrams can also be given "life", in the sense of visual programming.
The OMG standards are widely adopted, and there are many tools based on UML, both commercial and non-commercial, that make it possible, for example, to generate executable code, check the model and generate test suites.
Of course, "modeling" in the case of UML has a more general meaning (as abstraction) than the one we intend in the EMMC sense (as applied to materials and based on physics and mathematics). From the literature, we note that UML does not appear to be strongly connected to science applications, but it is probably used internally by professional software tools, including those commonly used in engineering.
Finally, an important contribution bridging UML and ontologies is OntoUML, which is an ontologically well-founded version of UML (more specifically, of the UML 2.0 fragment of class diagrams) [16,17].

Software Capabilities
The aim of the VImmp Ontology of Software (VISO) (note 3) is to characterize software tools in the area of materials modelling, especially their features (i.e. capabilities), intended both at the model and solver level, but also their technical requirements, compatibility with other tools and licensing aspects. The concepts defined within this ontology will, first, guide the ingest of information on the VIMMP platform, and, later, allow the users to retrieve and compare tools. Below an upper level (viso-general, cf. Fig. 4.1) that addresses aspects common to all software, we split VISO into three branches focusing on classes of models: electronic (EL, viso-el), atomistic and mesoscopic (AM, viso-am), and continuum (CO, viso-co) models (note 4). These branches depend on viso-general, but can be loaded independently of the other two siblings. We underline that both VISO and VOV (presented in the next section) are designed to address models from the four granularity levels of RoMM [30]. However, in this book, we especially focus on the AM branch of VISO, which deals with molecular models and particle-based methods; for more details on the other branches, we refer the reader to [31] and to a recent VIMMP ontology release [32,33].

[Figure caption: … connection to EVMPO and external assets; the diagram was generated using the OWLViz Protégé plugin; grey arrows labelled "is-a" denote subsumption (⊑), i.e. rdfs:subClassOf.]

Note 2: A "system" is intended here in a very general sense, as something made of components.
Note 3: VISO: https://purl.vimmp.eu/semantics/viso/viso-general.ttl, https://purl.vimmp.eu/semantics/viso/viso-electronic.ttl, https://purl.vimmp.eu/semantics/viso/viso-atomistic-mesoscopic.ttl, https://purl.vimmp.eu/semantics/viso/viso-continuum.ttl (all of which are non-resolvable IRIs); the concatenation of the four files is mirrored at http://www.molmod.info/semantics/viso-all-branches.ttl (resolvable URL).
Note 4: To avoid name clashes between the branches, prefixes are used as indicated. In the Protégé editor, one can choose different options for the rendering (view tab), including rendering by short name and rendering by prefixed name.
The main concepts from viso-general include:
• viso:programming_language: a language that can be used to write software.
• viso:software_tool_feature ≡ (viso:model_feature ⊔ viso:solver_feature): a capability of a software tool, intended either as a model aspect that can be addressed (viso:model_feature) or as a numerical algorithm that is implemented (viso:solver_feature). Following the approach from RoMM [30], these two classes are disjoint.
• viso:model_type: a classification of the model, intended as in RoMM [30].
• viso:model_object: the type of object entering the model and carrying degrees of freedom. Its subclasses in the AM branch (cf. Fig. 4.2) include viso-am:interaction_site, viso-am:interaction_surface and viso-am:connected_object.
• viso:software_update: describes (as text) the changes between versions of a piece of software. In particular, its subclass viso:software_tool_update makes it possible to describe the addition or removal of features of a tool.
• viso:software_interface: an interface between a piece of software and a user or a client (i.e. a program or device). Some subclasses of this class are taken from the SWO software interface class (swo:SWO_9000050) [10].
• viso:license: a regulation of the right to use, modify and distribute something, in this case software. It is declared to be equivalent to the Software Licence class from SWO (swo:SWO_0000002), cf. Malone et al. [10].
• viso:license_clause: equivalent to the Licence clause class from SWO (swo:SWO_9000005), cf. Malone et al. [10].
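The role of the disjointness statement in the list above can be illustrated with a small sketch. The class names are those given in the text, whereas the dictionaries and the checking logic are our own simplification and do not reproduce how an OWL reasoner works internally.

```python
# Subclass statements taken from the text; the data layout is an assumption.
SUBCLASS_OF = {
    "viso:model_feature": "viso:software_tool_feature",
    "viso:solver_feature": "viso:software_tool_feature",
}
# The two feature classes are declared disjoint.
DISJOINT = {frozenset({"viso:model_feature", "viso:solver_feature"})}


def superclasses(cls):
    """Return cls together with all of its ancestors."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def violated_disjointness(asserted_types):
    """Return the disjoint pairs that a single individual would belong to."""
    expanded = set()
    for cls in asserted_types:
        expanded.update(superclasses(cls))
    return [tuple(sorted(pair)) for pair in DISJOINT if pair <= expanded]
```

An individual typed only as viso:solver_feature is recognized as a viso:software_tool_feature, while typing the same individual with both feature classes is flagged as an inconsistency.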
Below viso-general, the EL, AM and CO branches of VISO expand on the categories viso:model_feature, viso:solver_feature and viso:model_type (cf. Fig. 4.2 for the AM one). These classes are the richest ones of VISO, and they contain most of the concepts that are peculiar to our domain. The three branches have a common structure, in that the subclasses of viso:model_feature are further classified into the (non-disjoint) classes viso:materials_relation_trait, viso:physical_equation_trait and viso:external_condition_trait. For clarity, we systematically use trait here, and not aspect, since the latter keyword has a different and well-defined role within OSMO and MODA.
In the last part of this section, we look in more detail at the viso-am branch, which was designed considering Molecular Dynamics, Molecular Mechanics, Dissipative Particle Dynamics and Monte Carlo methods. First of all, our choice to treat atomistic and mesoscopic models together is motivated by the fact that in many cases they rely on the same numerical methods and a given software tool can address both. Also, the meaning of "mesoscopic" within RoMM is different from the usual one: as soon as two or more atoms are grouped into an entity, the model is considered mesoscopic; since united-atom models already fall into this class, treating these two granularity levels jointly seems well justified.
It is important to underline that the RoMM [30] classification principle is based on what a modelling entity represents, a criterion that is indeed well suited to multiscale modelling. A complementary and quite natural classification could be based on the mathematical nature of the modelling entity: for example, the classical models could be divided into particle-based and field-based ones. While as a rule of thumb AM models are particle-based and CO models are field-based, typically there are also fields in AM models, discrete particles in CO models and classical particles in EL ones. Above all, it is important to realize that the two classifications are fundamentally different: to give an extreme example, we could have a particle-based model of the solar system, where each particle represents a planet!
In this direction, an important concept in VISO is that of viso:model_object (cf. Fig. 4.2), which is the type of object entering the model and carrying degrees of freedom. To be able to encompass different chemical objects, we need to adopt a neutral vocabulary; in the AM branch, we choose to use viso-am:interaction_site to indicate a point which is involved in (experiences) some interaction; it can represent the centre of a physical particle (an atom, a coarse-grained bead), but it can also be a fictitious particle. Similarly, a viso-am:connected_object, where connectedness is via bonds of some type, could be a molecule (in the chemical sense) or an aggregate. Finally, we have viso-am:interaction_surface, which is a surface affecting the interactions and is treated as a continuum, not as a collection of sites; for example, it could be a wall. It is clear that our approach focuses on the mathematical nature of the objects, not on what they represent: this is a convenient point of view from the mathematical and numerical sides. Of course, to choose the appropriate Materials Relation we still need to know what the object represents.
Moving on to interactions, we highlight the classes viso-am:potential, viso-am:composite_potential and viso-am:non_conservative_force (cf. Fig. 4.2): the first one refers to the mathematical expression (functional form) of a potential energy and its elements are used as building blocks for elements of the second class; the second refers to a potential that is defined by more than just one single functional form acting between a pair of species; the last one refers to forces, typically appearing in coarse-grained models, that cannot be written in terms of a potential. Special cases of composite potentials are what in computational chemistry are known as Force Fields (called Interatomic Potentials in Physics): viso-am contains classes for some of the most popular ones [32].
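The distinction between a functional form and a composite potential built from several of them can be sketched as follows. The Lennard-Jones and harmonic forms are standard textbook expressions; the Python classes, the labels "non_bonded" and "bond", the species names and all parameter values are hypothetical and are not part of viso-am.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Potential:
    """A functional form of a potential energy, U(r; parameters)."""
    name: str
    form: Callable[..., float]


def lennard_jones(r, epsilon, sigma):
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def harmonic_bond(r, k, r0):
    return 0.5 * k * (r - r0) ** 2


LJ = Potential("lennard_jones", lennard_jones)
HARMONIC = Potential("harmonic_bond", harmonic_bond)


class CompositePotential:
    """Combines several functional forms, each assigned to a pair of species."""

    def __init__(self):
        self.terms = {}

    def add_term(self, label, species_a, species_b, potential, **params):
        # The pair is unordered, so (A, B) and (B, A) share the same entry.
        self.terms[(label, frozenset((species_a, species_b)))] = (potential, params)

    def energy(self, label, species_a, species_b, r):
        potential, params = self.terms[(label, frozenset((species_a, species_b)))]
        return potential.form(r, **params)


# A toy composite potential in reduced units (illustrative values only).
toy = CompositePotential()
toy.add_term("non_bonded", "A", "A", LJ, epsilon=1.0, sigma=1.0)
toy.add_term("bond", "A", "B", HARMONIC, k=100.0, r0=1.5)
```

Here LJ and HARMONIC play the role of building blocks, and toy plays the role of a (very small) force field assembled from them.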
So far, we have given examples that pertain to the materials relation, i.e. subclasses of viso-am:materials_relation_trait. Below viso-am:external_condition_trait, we find concepts such as boundary conditions, external fields and potentials, and thermodynamic ensembles (cf. Fig. 4.2).
The class hierarchy for the solver features is much simpler than that for the model ones, being just a list of classes (including viso-am:integrator, viso-am:minimization, …), each populated by various individual algorithms. We underline at this point that the splitting into solver and model features is not always straightforward, since it depends on how much relevance is given to an ingredient of the method: a prototypical example is that of thermostats, which are typically considered as purely numerical aspects, but have a central role for models such as Dissipative Particle Dynamics (cf. the discussion in [31]). To circumvent this problem and allow for different views while keeping solver and model features separated, we define in VISO the relation viso:is_modelling_twin_of.
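A possible way to picture the twin relation is sketched below for the thermostat example. Only viso:is_modelling_twin_of, viso:solver_feature and viso:model_feature come from the text; the two ex: individuals are hypothetical, and both the direction of the statement and the symmetric reading of the relation are assumptions made for the sake of the illustration.

```python
# Hypothetical individuals: the same thermostat seen from the two sides.
FEATURE_CLASS = {
    "ex:THERMOSTAT_AS_ALGORITHM": "viso:solver_feature",
    "ex:THERMOSTAT_AS_MODEL_INGREDIENT": "viso:model_feature",
}
TWIN_STATEMENTS = {
    ("ex:THERMOSTAT_AS_ALGORITHM",
     "viso:is_modelling_twin_of",
     "ex:THERMOSTAT_AS_MODEL_INGREDIENT"),
}


def twins_of(feature):
    """Follow the twin relation in both directions (symmetry is assumed)."""
    found = set()
    for s, p, o in TWIN_STATEMENTS:
        if p != "viso:is_modelling_twin_of":
            continue
        if s == feature:
            found.add(o)
        if o == feature:
            found.add(s)
    return found


def features_for_view(feature, wanted_class):
    """Return the feature or its twins that belong to the wanted class."""
    candidates = {feature} | twins_of(feature)
    return sorted(f for f in candidates if FEATURE_CLASS[f] == wanted_class)
```

Each individual stays in exactly one of the two disjoint classes, while a user interested in the model view can still reach the counterpart of a solver feature, and vice versa.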
Within VISO, we intentionally do not go beyond a certain level of detail in the description of software; in particular, the variables entering the models and algorithms are dealt with by the VOV ontology, presented in the next section.

Variables and Functions
The purpose of the Vimmp Ontology of Variables (VOV) is to organize the variables (in a broad sense, including constants) that appear in modelling and simulations, and to connect them to the models and algorithms in which they are involved and to the model objects (e.g. entities entering a simulation, such as sites and rigid bodies) to which they are attached. VOV can be used in connection with VISO and OSMO to further specify models, algorithms and workflows. The main concepts from VOV are:

Variables in VOV can be classified according to three main criteria: by their scope (vov:object_variable, vov:pair_variable, vov:system_variable, vov:universal_variable), their rank (vov:scalar_variable, vov:vector_variable, vov:tensor_variable) or their basic kind (vov:mass, vov:energy, …), for which qudt:QuantityKind is used [12]. In Fig. 4.3, we show the splitting of vov:variable according to scope and the subclasses of vov:function.

[Figure caption: … the diagram was generated using the OWLViz Protégé plugin; grey arrows labelled "is-a" denote subsumption (⊑), i.e. rdfs:subClassOf.]

A different classification, also present in VOV and shown in Fig. 4.4, distinguishes variables based on the nature of the features they are involved in, as stated in VISO. Since a variable can be involved in multiple features of different nature, the two main classes (vov:model_variable and vov:solver_variable) are clearly not disjoint. Also, the vov:model_variable class is further split, mirroring the hierarchy of model features in VISO.
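The three classification criteria can be read as independent axes, as in the following sketch. The class names for scope and rank, as well as vov:mass and vov:energy, are those quoted above; the Python types, the example variables and the kind ex:velocity are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    OBJECT = "vov:object_variable"
    PAIR = "vov:pair_variable"
    SYSTEM = "vov:system_variable"
    UNIVERSAL = "vov:universal_variable"


class Rank(Enum):
    SCALAR = "vov:scalar_variable"
    VECTOR = "vov:vector_variable"
    TENSOR = "vov:tensor_variable"


@dataclass(frozen=True)
class Variable:
    name: str
    scope: Scope
    rank: Rank
    kind: str  # e.g. "vov:mass"; any other kind used here is hypothetical

    def classes(self):
        """The classes this variable belongs to, one per criterion."""
        return {self.scope.value, self.rank.value, self.kind}


# Hypothetical example variables.
SITE_MASS = Variable("site mass", Scope.OBJECT, Rank.SCALAR, "vov:mass")
SITE_VELOCITY = Variable("site velocity", Scope.OBJECT, Rank.VECTOR, "ex:velocity")
TOTAL_ENERGY = Variable("total energy", Scope.SYSTEM, Rank.SCALAR, "vov:energy")
```

Since the criteria are orthogonal, a variable is a member of one class per axis, which is what the classes() method returns.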
Selected relations (object properties) from VOV are:

For properties such as the particle species (in a broad sense: chemical species in atomistic models, particle or object type in general), label and index, we use datatype properties (vov:has_species, vov:has_label, vov:has_index).
While for some typical variables it makes sense to define in the ontology named individuals, in other cases it is necessary to allow the user the freedom to define new ones. Accordingly, VOV provides different mechanisms to define the needed variables: one can directly use variables that are present in VOV as individuals (e.g. vov:TARGET_TEMPERATURE) or introduce customized ones populating VOV classes (e.g. defining elements of vov:object_mass). A third approach is to characterize the value and role of new variables using the relations vov:shares_value_with and vov:shares_role_with. The last method, while very convenient, has the drawback that it will not automatically transfer to the new variable all the properties of the prototype one, for example, its physical dimensions; however, this transfer can be taken care of when creating the data storage.
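The drawback of the third mechanism, and the way it can be compensated when the data storage is created, can be sketched as follows. The individual vov:TARGET_TEMPERATURE and the relations vov:shares_value_with and vov:shares_role_with are taken from the text; the variable ex:BATH_TEMPERATURE, the dictionary layout and the function storage_record are hypothetical.

```python
# Properties known for the prototype variable (layout is an assumption).
KNOWN_VARIABLES = {
    "vov:TARGET_TEMPERATURE": {"dimensions": {"temperature": 1}},
}

# A user-defined variable characterized only through a sharing relation.
STATEMENTS = [
    ("ex:BATH_TEMPERATURE", "vov:shares_role_with", "vov:TARGET_TEMPERATURE"),
]

SHARING_RELATIONS = ("vov:shares_role_with", "vov:shares_value_with")
TRANSFERABLE = ("dimensions",)


def storage_record(variable, statements, known):
    """Build the record for a variable, copying selected properties of its prototype.

    The ontology itself does not perform this transfer; it is done explicitly
    here, at the moment the storage record is created.
    """
    record = dict(known.get(variable, {}))
    for s, p, o in statements:
        if s == variable and p in SHARING_RELATIONS:
            for key in TRANSFERABLE:
                if key in known.get(o, {}) and key not in record:
                    record[key] = known[o][key]
    return record


bath_record = storage_record("ex:BATH_TEMPERATURE", STATEMENTS, KNOWN_VARIABLES)
```

The record of the new variable then carries the physical dimensions of the prototype, even though no such statement was made about it in the ontology.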
Coming to the class vov:function, which concerns relations between variables: in the case of classical particle-based models, such a concept is mostly needed in the processing of data; in Fig. 4.3, one can see classes such as vov:trajectory, vov:pair_distribution_function and vov:autocorrelation_function. The class vov:field, which has a more central role in continuum models, is nevertheless relevant for particle-based ones too: think, for example, of external spatially varying fields or of the density and velocity fields that are obtained by processing the raw results.
To clarify how the two VIMMP ontologies we are discussing in this chapter are linked to each other, in Fig. 4.5 we highlight some classes from VISO and VOV, together with the main relations between them. In Fig. 4.6, we illustrate the same concepts with a more concrete example including individuals: the example is about a Molecular Dynamics (MD) software tool (we imagine it to be called "A_MD_TOOL") which has certain features (e.g. a velocity-Verlet integrator and a potential energy called "A_POTENTIAL") that in turn relate to variables (e.g. the simulation time step or the mass and velocity of an interaction site). These variables can extend to the whole system or be limited in scope to a model object. Considering the variable usage, the time step is a vov:solver_variable, whereas the site mass is a vov:model_variable. The idea behind the classification shown in Fig. 4.4 is to help distinguish the variables that affect the physics of the system from those that do not, or should not, and to recognize which part of the governing equations they enter. So, as soon as a variable is involved in some feature, we can infer which class it belongs to; however, since the classes are not disjoint, we cannot exclude that it belongs to the sibling class too.
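The inference pattern just described can be mimicked with a few lines of code. The classes vov:solver_variable, vov:model_variable, viso:solver_feature and viso:model_feature are those of the text; the ex: individuals, the feature ex:EQUATION_OF_MOTION and the involvement statements are hypothetical stand-ins for the content of Fig. 4.6, and the lookup logic is only a caricature of what a reasoner does.

```python
# Hypothetical typing of two features.
FEATURE_TYPE = {
    "ex:VELOCITY_VERLET": "viso:solver_feature",
    "ex:EQUATION_OF_MOTION": "viso:model_feature",
}
# The variable class implied by involvement in a feature of a given type.
VARIABLE_CLASS_FOR = {
    "viso:solver_feature": "vov:solver_variable",
    "viso:model_feature": "vov:model_variable",
}
# Hypothetical statements: (variable, feature it is involved in).
INVOLVED_IN = [
    ("ex:TIME_STEP", "ex:VELOCITY_VERLET"),
    ("ex:SITE_MASS", "ex:EQUATION_OF_MOTION"),
]


def inferred_classes(variable):
    """Classes that follow from the features the variable is involved in."""
    return {
        VARIABLE_CLASS_FOR[FEATURE_TYPE[feature]]
        for var, feature in INVOLVED_IN
        if var == variable
    }


def membership(variable, cls):
    """Return 'yes' if membership is inferred, otherwise 'unknown'.

    The answer is never 'no': the two classes are not disjoint, and a
    missing statement is not treated as a negative statement.
    """
    return "yes" if cls in inferred_classes(variable) else "unknown"
```

For the time step, membership in vov:solver_variable is inferred, while membership in vov:model_variable remains undetermined rather than being excluded.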
We note at this point the general and somewhat obvious, but practically relevant, fact that there is a delicate trade-off between the looseness of concept definitions and the ability to make informative inferences; this is even more so given the assumptions under which ontologies work by construction [34]. That is, to be able to make stringent inferences, we need to make explicit statements about class disjointness, individuals being different and so on. Otherwise, we can still extract information, but

Simulation Variables vs Physical Properties
We open here a parenthesis to discuss the relation between the variables entering a simulation and the physical properties of real-world objects; this naturally leads to a comparison of simulations and experiments.
Let us consider, for example, a certain material (say, liquid water) and focus on its electric permittivity ε. The latter is a quantity that appears in electrostatic and electrodynamic laws (e.g. Coulomb's law), and can be measured experimentally based on them.
When designing a model for such a material that, in particular, captures its electric permittivity, different approaches are possible: (a) we might take ε as fixed (a model parameter, a constant needed as an input), or (b) we could design a model containing dynamic degrees of freedom that carry electric dipoles, so that ε is an emergent property, and the input of the model is instead given by the properties of the degrees of freedom, possibly fictitious particles. In both cases, ε will be matched to the experimental value: in one case directly, and in the other by tuning the parameters associated with the simulated entities.
In case (b), the value of ε can be estimated using liquid-state theory and computed from simulations using linear response theory: both procedures are quite far from what is done experimentally. However, a way to test that the model actually behaves as it should is to compute the reduction in the force between two fixed ions due to the presence of the medium; the same test can be done in case (a), as a basic check of the numerical implementation of electrostatics, for example. Now, is this so different from an experiment done on the material? Simulations, especially those involving some stochastic element, are in many ways similar to experiments: for example, they also require several repetitions and their results are affected by statistical errors.
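The force-reduction test can be written down explicitly for case (a), where the medium enters Coulomb's law only through a fixed relative permittivity. In the sketch below, the function names are ours, the value 78.4 is an illustrative figure of the order of the relative permittivity of liquid water at room temperature, and in case (b) the force in the medium would come from averaging simulation data rather than from the formula.

```python
import math
import statistics

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19    # elementary charge, C


def coulomb_force(q1, q2, r, eps_r=1.0):
    """Magnitude-with-sign of the Coulomb force between two point charges."""
    return q1 * q2 / (4.0 * math.pi * EPSILON_0 * eps_r * r ** 2)


def effective_permittivity(force_vacuum, force_medium):
    """Relative permittivity implied by the reduction of the force."""
    return force_vacuum / force_medium


def mean_and_standard_error(samples):
    """Mean and standard error of the mean over repeated runs."""
    mean = statistics.mean(samples)
    error = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, error


# Two monovalent ions held 1 nm apart (illustrative set-up).
EPS_R_ASSUMED = 78.4
r = 1.0e-9
f_vac = coulomb_force(E_CHARGE, E_CHARGE, r)
f_med = coulomb_force(E_CHARGE, E_CHARGE, r, eps_r=EPS_R_ASSUMED)
eps_check = effective_permittivity(f_vac, f_med)
```

With stochastic simulations, f_med would be replaced by the mean over several repetitions, and mean_and_standard_error gives the associated statistical uncertainty, much as one would do for repeated measurements.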
In our view, the concept of observation (which includes measurement) in the EMMO could be generalized to accommodate also calculations (analytical and numerical ones), and the variables entering the models (both as input and output) that are numerical counterparts of physical properties could be recognized as such.

EngMeta and VIMMP Ontologies
The EngMeta scheme described in Chap. 2 and the VIMMP ontologies presented in this and in the previous chapter have a relevant overlap in scope, with similar keywords appearing in both assets. From Fig. 2.2, one can see that EngMeta includes concepts that on the VIMMP side are addressed by different ontologies, in particular, OTRAS (e.g. author, publication, citation), VISO (e.g. software, force-field), VOV (e.g. variable), MMTO (e.g. project), OSMO (e.g. system component/material) and VICO (e.g. persons and organizations). The technical metadata (e.g. file size, checksum), instead, are mostly tackled by the Zontal storage itself [35,36].
At the syntactic level, we are comparing an XSD schema and OWL DL ontologies: entities (say, "Software") within EngMeta typically correspond to classes within the VIMMP ontologies (in this case, viso:software_tool), while attributes (say, "license" and "softwareVersion") correspond to the objects or data that a relation points to (in this case, a viso:license, pointed to by viso:has_license, and an xs:string, pointed to by viso:has_version_identifier).
In general, to match or integrate two assets that, like here, differ syntactically and (even if slightly) semantically, one can think of performing the operation in two steps: a syntactic conversion first, then a semantic matching or integration (cf. Sect. 5.3).
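The two-step procedure can be sketched for the "Software" example discussed above. The XML fragment is a simplified, hypothetical stand-in and not a valid EngMeta document; the engmeta: and ex: prefixes, the element "name" and the mapping tables are likewise assumptions, while the VISO terms are those quoted in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata fragment.
XML_TEXT = (
    "<software>"
    "<name>A_MD_TOOL</name>"
    "<license>SOME_LICENSE</license>"
    "<softwareVersion>1.0</softwareVersion>"
    "</software>"
)


def syntactic_conversion(xml_text):
    """Step 1: turn the XML tree into triples, keeping the source keywords."""
    root = ET.fromstring(xml_text)
    subject = "ex:" + root.findtext("name")
    triples = [(subject, "rdf:type", "engmeta:" + root.tag)]
    for child in root:
        if child.tag != "name":
            triples.append((subject, "engmeta:" + child.tag, child.text))
    return triples


# Step 2 relies on mapping tables between the two vocabularies.
CLASS_MAP = {"engmeta:software": "viso:software_tool"}
OBJECT_PROPERTY_MAP = {"engmeta:license": ("viso:has_license", "viso:license")}
DATA_PROPERTY_MAP = {"engmeta:softwareVersion": "viso:has_version_identifier"}


def semantic_matching(triples):
    """Step 2: re-express the triples in the target vocabulary."""
    matched = []
    for s, p, o in triples:
        if p == "rdf:type":
            matched.append((s, p, CLASS_MAP[o]))
        elif p in OBJECT_PROPERTY_MAP:
            prop, range_class = OBJECT_PROPERTY_MAP[p]
            target = "ex:" + o
            matched.append((s, prop, target))
            matched.append((target, "rdf:type", range_class))
        elif p in DATA_PROPERTY_MAP:
            matched.append((s, DATA_PROPERTY_MAP[p], '"%s"^^xs:string' % o))
    return matched


result = semantic_matching(syntactic_conversion(XML_TEXT))
```

Keeping the two steps separate means that the first one can be purely mechanical, while the mapping tables of the second one concentrate the decisions that require domain knowledge.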

Closing Thoughts
Clearly, there are many concepts involved here: formalization, standardization and automation. One could argue that for the domain we are interested in, i.e. simulations of materials, physics and mathematics are already universal languages: why, where and what kind of further formalization and standardization are needed?
For example, imagine we are given a set of equations that models the mixing of two fluids in an industrial device. Even if the mathematical formulation is accessible to everybody with a scientific background, it does not capture the context the model is embedded in (in fact, the simulation intent and the assumptions and approximations made are normally expressed in natural language in an accompanying paper), and understanding it will require delving into a jungle of details. Also, importantly, the tacit assumptions and the technical jargon can vary a lot across communities (with the same algorithm having a different name and so on). Classification and standardization can therefore help inter-community communication and collaboration and facilitate intra-community reuse of models. Coming to automation, of course, the possibility to generate source code from pseudocode is very appealing, both for non-experts and for experts.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.