Formalizing the four-layer metamodeling stack with MetaMorph: potential and benefits

Enterprise modeling deals with the increasing complexity of processes and systems by operationalizing model content and by linking complementary models and languages, thus amplifying the model value beyond mere comprehensible pictures. To enable this amplification and turn models into computer-processable structures, a comprehensive formalization is needed. This paper presents the formalism MetaMorph based on typed first-order logic and provides a perspective on the potential and benefits of formalization that arise for a variety of research issues in conceptual modeling. MetaMorph defines modeling languages as formal languages with a signature Σ (comprising object types, relation types, and attributes through types and function symbols) and a set of constraints. Four case studies are included to show the effectiveness of this approach. Applying the MetaMorph formalism to the next level in the hierarchy of models, we create M2FOL, a formal modeling language for metamodels. We show that M2FOL is self-describing and therefore complete the formalization of the full four-layer metamodeling stack. On the basis of our generic formalism applicable to arbitrary modeling languages, we examine four current research topics (modeling with power types, language interleaving & consistency, operations on models, and automatic translation of formalizations to platform-specific code) and how to approach them with the MetaMorph formalism. This shows that the rich knowledge stack on formal languages in logic offers new tools for old problems.


Introduction
Enterprise modeling has proven instrumental in facing the challenges of increasing complexity and interdependences of processes and systems in the modern world. Research on enterprise modeling has enhanced modeling languages from mere instruments for pictures supporting human understanding to highly specialized tools with value-adding mechanisms like information querying, simulation, and transformation [5,24]. The nature of models has evolved from a visual representation of information to an exploitable knowledge structure [11]. Nevertheless, the European enterprise modeling community experiences that the potential of enterprise modeling is currently not fully utilized in practice and that modeling is employed only by a limited group of experts. Therefore, in [56,57] a research agenda is formulated to establish "modeling for the masses" (MftM) and broadcast its benefits also to non-experts.

A Need for Formalization
Although the initiators of the MftM movement mention that the formality of model representation possibly hampers understandability, we argue that the idea behind MftM nevertheless requires an investigation of the formal foundations of models and languages. This is for three reasons: 1) According to the stakeholder dimension of challenges in enterprise modeling research [56, p.234], computers also have to be seen as stakeholders producing and consuming models. To make models computer-processable they have to be formalized, because computers do not understand semi-formal or unstructured models and language specifications [6].
2) The vision of models being not autotelic but a means to the operationalization of information [56, p.229] calls for value-adding functionality beyond mere graphics, like reasoning, verification & validation or simulation, which is ideally formulated in a computer-understandable and implementation-independent way, i.e. formalized. 3) The vision of local modeling practices which are globally integrative [56, p.229] calls for a common foundation of what models and modeling languages are, to enable the linking and merging of models in different domains with different semantics [29].
Formalization is also essential in the light of the emergent importance of domain-specific modeling languages (DSMLs) [23] as well as an increasing agility in the advancement and extension of established languages and methods [34]. The lack of a common way of formalizing DSMLs leads to divergent formal foundations, limiting the opportunities to compare or link models. Frequently the big standards are extended for a specific domain, e.g. the extension of i* with security concepts constituting the modeling language Secure Tropos [48,55]. Therefore a common way of specifying the base languages as well as the extensions or modules is required. A silo-like formalization of the big standards is not sufficient, as divergent base concepts of models and different underlying formal structures can impede a mutual interconnection and integration. Another important building block for advancing the science of conceptual modeling is an exact and commonly applied method for specifying modeling languages. A survey conducted by Bork et al. showed that the specification documents of standardized languages like UML and ArchiMate diverge in the concepts they consider as well as in the techniques they use to specify their visual metamodels [8]. Examples from recent scientific publications indicate that in research on domain-specific languages, too, no common practice of metamodel specification is in use. Several contributions specify metamodels with UML class diagrams, declaring object types as classes and relation types as classes or named association arrows, e.g. [33,44,53,58,60]. Others simply define the object and relation types with box-and-line models devoid of an underlying language and rely on the intuitive understanding of the reader, e.g. [43,51]. This shows that although metamodels are models themselves and therefore a subject of interest for enterprise modeling research, no language for metamodels has been established yet. Nevertheless, when a language has to be implemented or executed, a precise and unambiguous definition of the metamodel is crucial [6].

Goal and Requirements
According to the AMME framework of agile modeling method engineering shown in Figure 1, the phase of formalization is pivotal in the lifecycle of a modeling language. Yet there is no common procedure for formalizing arbitrary modeling languages. Existing formalization approaches are often restricted to a concrete application, domain or language, which limits their reusability in other domains and languages. As the AMME lifecycle is meant as a generic procedure model for generating arbitrary modeling languages and methods, we need a formalism applicable to any domain and language an engineer might come up with. The work at hand intends to close this gap and aims at building a bridge between the Design phase of collecting the relevant concepts and the Develop phase of transferring the final design to a metamodeling platform in the AMME lifecycle. Such a formalism must be consistent with the semantics and structure of a modeling language. For this reason we have to consider the formal foundations of modeling languages. We summarize the concrete requirements for the formalism as follows: 1) it has to be complete regarding the general building blocks of a language, 2) it has to be faithful to the character of modeling languages as such, and 3) it must be generic in a way that it admits the formalization of any language.
In the work at hand we will demonstrate that, in accordance with these requirements, modeling languages can be defined as formal languages in the sense of logic. This means they comprise a signature Σ for the syntax and a set of constraints, for which we use first-order predicate logic. This paper extends our prior work [19] published at the PoEM 2020 conference about the definition of modeling languages, where we concretely stated how the core concepts of modeling languages can be expressed in logical terms. Predicate logic provides the construct of a Σ-structure, i.e. an interpretation of the signature, which is the canonical correspondent to the model being an instantiation of a metamodel. Applying the definition to the meta language level results in M2FOL, a formal modeling language for metamodels. With M2FOL we are capable of modeling the syntax of a language to be specified, to be more precise the signature of the language according to the definition. The paper at hand extends the presented definition of formal modeling languages as well as the language M2FOL with the concept of multi-value attributes. We furthermore exemplify the potential and benefits inherent to the proposed formalism on a diverse range of research topics and demonstrate the opportunities established methods from logic can provide for conceptual modeling.
Fig. 1: The AMME lifecycle for agile modeling method engineering adapted from [34]
The rest of this paper is structured as follows: In Section 2 we give an overview of related work on formalization of metamodels and modeling languages. In Section 3 we introduce the definition of formal modeling languages and models and concretize how the basic concepts of a language (object and relation types, attributes, inheritance, and constraints) can be expressed in logical terms. We then use this definition in Section 4 to create M2FOL, a formal modeling language for metamodels, and outline its self-describing character. Given a metamodel specified with M2FOL we show how to algorithmically deduce the signature of the corresponding modeling language. After that we give in Section 5 an outlook on the potential and benefits of formal modeling languages. We present some ongoing research and approaches on how to interleave modeling languages, formally include operations on models into the language specification, and use the formalization as a single point of specification processable by machines. In Section 6 we evaluate the formalism with respect to the formulated goal and requirements and explain the agenda for the empirical evaluation that is currently being conducted.
2 Background and Related Work

Formalism vs. Formal Syntax
We begin by discussing the distinction between a formalism and a formal way of specifying a language. A formalism always gives rise to a formal specification. The converse, however, is not true. This can be compared to the concept of a graph and the unique and precise way of specifying a graph with an adjacency matrix. Each graph can be represented as a matrix, but a matrix per se does not provide the semantics of graph theory. The same applies to specification techniques merely offering a formal syntax to describe modeling languages. These techniques provide a unique way of specification but lack the structure and semantics of the underlying components of conceptual modeling. A formal syntax may offer a notation for specifying inheritance of object types. Nevertheless, the notation does not capture the inherent semantics, i.e. the transfer of features of the supertype to the subtype. This behaviour must be added to the syntax system by hand, although a suitable underlying structure would entail concepts for inheritance. For this simple case basic set theory would suffice to capture inheritance via sets and containment, automatically extending functions defined on the superset to all subsets.
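The set-theoretic reading of inheritance described above can be illustrated with a minimal Python sketch. The universes `node` and `place`, the element names, and the feature `label` are hypothetical and chosen only for illustration; the point is merely that a function defined on the superset is available on every subset without any extra rule.

```python
# Hypothetical universes: every place is also a node (subset = subtype).
node = {"p1", "p2", "t1"}
place = {"p1", "p2"}
assert place <= node  # containment models "Place inherits from Node"

# A feature defined once on the supertype, as a function on the superset.
label = {element: element.upper() for element in node}


def feature_on(subset, feature):
    """Restrict a function defined on a superset to one of its subsets."""
    return {element: feature[element] for element in subset}


# The subtype obtains the feature by restriction, not by a hand-written rule.
place_labels = feature_on(place, label)
```

A purely syntactic notation for inheritance would have to state this transfer of features separately, which is the gap the paragraph above points to.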
In current research there is a consensus about a proper underlying structure of modeling languages, that is, formal languages as defined in mathematical logic [16,25,50,52,62]. Of course not every formal language is a modeling language, but modeling languages form a subclass of all formal languages. In this paper we want to concretize and work out this class of formal modeling languages.
Our principal distinction between a formalism and a formal syntax distinguishes our goal from existing approaches like the Meta-Object Facility (MOF) standard, which offers a concrete syntax or notation system for specifying modeling languages but no inherent theory and methods for the concepts to be described. Furthermore, MOF was developed with metamodels meant as software structure specifications [59], which restricts metamodeling to only one domain. In contrast to that, our approach regards metamodeling as a generic instrument applicable to a broad range of domains rather than only software engineering. The language Z and the lambda calculus also differ from our endeavour in a fundamental way, as they are methods developed to support the specification of computation, programming and software development. Of course they also make use of concepts from logic and may overlap with our approach in some areas. Nevertheless, they do not inherit the semantics of modeling languages and provide no explicit way of describing modeling languages.

Formalisms for Concrete Modeling Languages
According to the Characterizing Conceptual Model Research (CCMR) framework we are interested in contributions located in the dimension Formalize working on the level of Conceptual Modeling Languages and Metamodeling Languages [17]. In this respect, we want to delineate our approach from the various attempts addressing the formalization of a specific modeling language. These attempts mostly aim at supporting a specific purpose or functionality and do not provide means to define arbitrary metamodels and languages. An example is the OSM-logic using typed predicate logic for object-oriented systems modeling with a focus on time-dependent system behaviour [15]. Another example is the SAVE method for simulating IoT systems using the δ-calculus [14]. These specific formalizations may offer ideas suitable for being generalized to a generic approach but will not be comprehensively discussed here. However, as soon as there is a common practice of formally defining the ubiquitous concepts of modeling languages, these specific approaches can be constructed as reusable extensions and modules and be of value in a broader field of application.

Formalisms for Ontologies and Concept Spaces
For a systematic positioning of our approach we use the triptych allegory proposed by Mayr and Thalheim [45]. They define conceptual modeling as tripartite, consisting of an encyclopedic dimension for grounding the semantics in an ontology or concept space, a language dimension for the definition of language terms and valid expressions, and the conceptual modeling dimension in between as a link between term and concept space. We are mainly interested in a formalization of the language dimension and acknowledge that in the encyclopedic dimension there also exist various attempts at formalization, like the KL-ONE family [9] and Description Logic [2]. The formal system of a conceptualization of domains as a basis for truthful modeling languages proposed by Guizzardi et al. [26,27] also has to be located in the encyclopedic dimension and therefore has to be distinguished from our goal. In this theory of ontologically-driven conceptual modeling, which is fruitful for the objective of a domain-faithful grounding for modeling languages, the language dimension is an a-posteriori concept implicitly obtained from ontological considerations.

Formalisms for Languages
When focusing on formalizations in the language dimension, the existing approaches can be categorized according to the underlying theory they use, which is mostly graph theory, set theory or logic. All three of them offer concepts for the concrete structural behaviour of the elements to be described. In the following we present examples illustrating the shortcomings of the former two and argue why logic provides the most canonical approach.
In the domain-specific language KM3 presented by Jouault and Bezivin, models are defined as directed multi-graphs conforming to another model, the metamodel, itself a graph [32]. Using this formalism the authors define a self-descriptive metametamodel and deduce a domain-specific language to specify metamodels. This approach puts an emphasis on the graph-like box-and-line structure of models rather than on the linguistic aspects and, similar to MOF, has a narrow focus on software structure specification.
A system based on set theory is the formalization of Ecore and EMOF proposed by Burger [12,2.3.2], which uses the formal description of concepts from the OCL specification [49, A.1]. Set theory comprises very basic concepts describing structures, only admitting the subsumption of elements in sets and set hierarchies. It holds no further information about the semantics of the elements.
The FDMM formalism introduced by Fill et al., which addresses conceptual modeling domains in a broader variety, also uses set theory to specify metamodels and models [22]. The authors explicitly aim at a formalization of metamodels realized with the metamodeling platform ADOxx [4] and do not claim to be applicable for platform-independent specifications.
Neither graph theory, the basis of KM3, nor set theory, the basis of FDMM and the MOF formalization by Burger, does justice to the linguistic character of modeling languages or provides canonical concepts for the definition of a set of terms and for instantiation, an essential characteristic of modeling languages. Therefore the technique and semantics of this instantiation relation between model and metamodel have to be constructed ad hoc and lack the beneficial knowledge stack of established theories dealing with linguistic structures.

Formalisms Based on Logic
Formal languages as defined in mathematical logic inherently comprise the concept of instantiation as interpretation of the signature in logical terms, and they provide a rich knowledge base about their properties. Therefore, in current research the notion of modeling languages as formal languages in the sense of mathematical logic is receiving increasing attention [16,25,50,52,62].
In their investigation of formal foundations of domain-specific languages, Jackson and Sztipanovits introduce typed predicate logic to handle object types in models [30,31]. They indeed treat modeling languages as formal languages, but they do not adopt the concept of a language interpretation, i.e. instantiation for model instances, but rather consider a model to be a set of valid statements about the model. This is also true for Telos [38], which builds on the premise that the concepts of entities and relations are omitted and replaced by propositions constituting the knowledge base. The choice of typed first-order logic for the formalization of these propositions is natural and explained in great detail in [39]. Similar to Jackson and Sztipanovits, knowledge is represented solely as a set of sentences in the formal language. In our approach, on the other hand, we do not adopt the transformation of models into propositions but rather directly deal with the ubiquitous concepts of objects and relations and an instantiation hierarchy between models and metamodels. This leads to a different view on models. In the attempts above a model is constituted by statements, whereas in our approach these statements are used as constraints restricting valid expressions using the proposed signature.
In his work on the theory of conceptual models, Thalheim describes modeling languages as being based on a signature Σ comprising a set of postulates, i.e. sentences expressed with elements of Σ [62]. Models are defined as language structures satisfying the postulates, which canonically corresponds to the concept of instantiation of a metamodel. We go one step further and concretely point out how to capture the core concepts of a modeling language in a signature Σ to unify the method of formalizing a language. This then enables us to investigate the class of formal modeling languages, compare formalized languages, reuse components and develop generic methods for language fusion, model transformation etc. independent of a concrete language.
In summary, the literature review suggests that the structure of modeling languages, including their linguistic character, can be grounded in the concepts of formal languages. Therefore, in the work at hand we propose a formal definition of modeling languages in which we concretely specify the modeling concepts and their formal equivalent in logical terms with the prospect of successive elaboration.

Definition of Formal Modeling Languages
The intended definition shall serve as a cornerstone for a common way of formalizing modeling languages, which thereby become comparable, reusable and modularizable. A formal definition for modeling languages in general enables an investigation of common features of the resulting subclass of formal languages as well as a sound mathematical foundation for their functionality. We build on a survey conducted by Kern et al. [36] on common concepts in the meta²models of six established metamodeling platforms. The definition below incorporates all concepts identified in at least half of the investigated platforms. These are object types, relation types (binary), attributes (multi-value), inheritance (for object types, single), and a constraint language. Other concepts identified in [36] which are not yet included are roles, ports, inheritance of relations, n-ary relations, and models in the sense of model types.
These concepts mainly coincide with the core concepts introduced for conceptual modeling of information systems by Olivé [50]. Additional concepts mentioned in Olivé's work which are of high interest but not yet included in our approach are derived types, generic relation types, and powertypes.

A Definition based on Predicate Logic
We use typed (also called sorted) predicate logic in this approach. The mathematical basics can be found in textbooks on logic or mathematics for computer science, e.g. [21,46]. Some remarks on notation: To ease the differentiation between language and model level, we use capital letters for the symbols of the language and lowercase letters for the elements of the model. This definition explicates the formalization of the essential modeling concepts of a language, i.e. object types and inheritance, binary directed relation types and single- or multi-value attributes. Note that the definition does not prohibit the existence of additional symbols in the signature, so broader concepts like n-ary relations can optionally be included and are a topic of further investigation. Structures beyond the visual elements of a model can also be included, e.g. paths as transitive relations or substructures comprising several elements.
We want to point out that relation types are defined on the same level as object types, not subordinate to them. This highlights their significance for a model beyond mere arrows and allows for defining attributes of relations, multiple relations of the same type between the same two objects, as well as for inheritance of relation types.
With the data types and constants we can define attribute domains like integers via specifying a type called N and constant symbols 0, 1, 2, 3, ... in C of type N for the numbers, or enumeration lists like a person's gender via specifying a type called gender and constant symbols male, female, and else in C. The elements of the simple or product types of $S_D$ are typically not visible in graphical models. They are exclusively used for specifying attribute domains.
Fig. 2: Notation excerpt of the CoChaCo Method [35]
Note that if we assume the set of constants for attribute domains to be finite, models are always finite, because by construction they contain only finitely many objects and relations.
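As a minimal sketch of how data types and constant symbols can carry attribute domains, the following Python fragment stores constants together with their type. The class name `Signature`, its fields, and the finite cut 0..10 of the naturals are assumptions made for illustration only and are not part of the formalism as published.

```python
from dataclasses import dataclass, field


@dataclass
class Signature:
    """Hypothetical container for the data-type part of a typed signature."""
    data_types: set = field(default_factory=set)
    constants: dict = field(default_factory=dict)  # constant symbol -> data type

    def add_domain(self, data_type, symbols):
        """Declare a data type together with its constant symbols."""
        self.data_types.add(data_type)
        for symbol in symbols:
            self.constants[symbol] = data_type

    def domain(self, data_type):
        """Return all constant symbols of the given data type."""
        return {s for s, t in self.constants.items() if t == data_type}


sigma = Signature()
# Enumeration list domain, as in the gender example above.
sigma.add_domain("gender", ["male", "female", "else"])
# Assumption: a finite cut of the naturals, so the set of constants stays finite.
sigma.add_domain("N", range(0, 11))
```

Keeping the set of constants finite, as assumed here, is exactly the condition under which the models of a language remain finite.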
Definition 2 A model M of a language L with typed signature Σ = {S, F, R, C} is an L-structure conforming to the language constraints C, i.e. M consists of a universe U of typed elements respecting the type hierarchy, that is, for each $T \in S$ there exists a set $U_T \subset U$ and $U = \bigcup_{T \in S} U_T$; all sets $U_T$ for $T \in S_O \cup S_R$ have to be pairwise disjoint except for sets $U_{O_1}$ and …
This definition of models as language structures goes beyond a visualisation and considers models as knowledge structures as described in [11]. Thereby we overcome several shortcomings of graphical representations, like the missing depiction of attributes and their domains in models or the visual mixing of the metarelation inheritance with the definition of relation types in metamodels.

Running Example Petri Nets
We will now illustrate the definition on the example of the Petri Net modeling language. For the visualization of the metamodel we use the notation of CoChaCo, a method to support the creative process of modeling method design [35]. This method comprises concrete syntax for most of the concepts contained in Definition 1 with a slightly different naming, see Figure 2.
Fig. 3: A metamodel of Petri Nets
… both with domain Arc and codomain Node. For the attribute Tokens we introduce a function symbol $F_{Tokens}$ with domain Place and codomain N assigning each place instance a number of tokens. Finally we have to define the constraints of the language. These rules are not contained in a graphical metamodel. In existing specifications they are mainly specified with natural language or OCL. In the predicative formalization at hand constraints are an integral part of the language. The following four sentences written in the alphabet of PN ensure Node to be abstract, i.e. any element in Node lies either in Place or in Transition (1), as well as the alternation of types of the elements connected by an arc (2, 3) and the prohibition of multiple arcs between the same two elements (4). For ease of readability we abuse the notation $\forall x \in T$ for x being of type T instead of using the type-specific quantifier $\forall_T x$.
Notice that the formalized model in Example 2 and the graphical model in Figure 4 represent the same thing. They are merely alternative ways of describing a system but with different merits. Whereas the graphical model is easy and fast to comprehend, the formal model is precise and complete, as attribute values are often not legible from a pictorial model. This can be compared to the different representation forms of a graph: once as a graphical depiction and once as an adjacency matrix.
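To make the four constraints tangible, the following Python sketch checks them on a small model given as plain sets and a dictionary. It is an illustration under assumptions, not the formal sentences themselves: the function name `petri_net_violations`, the model encoding, and the split of the alternation rule into (2) for arcs leaving a place and (3) for arcs leaving a transition are hypothetical choices.

```python
def petri_net_violations(nodes, places, transitions, arcs):
    """Return violated constraints; `arcs` maps arc name -> (source, target)."""
    violations = []
    # (1) Node is abstract: every node is a place or a transition.
    for x in sorted(nodes - (places | transitions)):
        violations.append(f"(1) {x} is neither place nor transition")
    first_arc = {}
    for name, (source, target) in sorted(arcs.items()):
        # (2) Assumed reading: an arc starting at a place ends at a transition.
        if source in places and target not in transitions:
            violations.append(f"(2) {name} connects a place to a non-transition")
        # (3) Assumed reading: an arc starting at a transition ends at a place.
        if source in transitions and target not in places:
            violations.append(f"(3) {name} connects a transition to a non-place")
        # (4) No two arcs between the same source and target.
        if (source, target) in first_arc:
            violations.append(f"(4) {name} duplicates {first_arc[(source, target)]}")
        else:
            first_arc[(source, target)] = name
    return violations


places = {"p1", "p2"}
transitions = {"t1"}
nodes = places | transitions
valid = {"a1": ("p1", "t1"), "a2": ("t1", "p2")}
# Two deliberate errors: a place-to-place arc and a duplicated arc.
broken = dict(valid, a3=("p1", "p2"), a4=("p1", "t1"))
```

Calling `petri_net_violations(nodes, places, transitions, valid)` yields an empty list, whereas the `broken` arc set reports one violation of the alternation rule and one duplicated arc.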

M2FOL - MetaModel 2 First Order Logic: A Formal Modeling Language for Metamodels
Metamodels are models themselves expressed in a metamodeling language. We propose a formal modeling language in the sense of Definition 1 for metamodels called M2FOL, i.e. a metamodeling language to be exact. This language is capable of describing precisely the concepts explicated in Definition 1. In general, meta²models of metamodeling languages are supposed to be self-describing, which results in a four-layer metamodeling stack as depicted in Figure 5. We will show that the metamodel of M2FOL, a meta²model by nature, also partakes of this property.
Fig. 5: The four-layer metamodeling stack based on [41]

Definition of M2FOL
We stick to the notational convention of capital letters for elements on the language level and lowercase letters for elements on the model level. To indicate the metalevel of M2FOL and metamodels we use the typewriter font for meta symbols and elements. For ease of readability we write F : X → Y when F is a function with domain type X and codomain type Y. Nevertheless, the instantiation is then a function $f : U_X \to U_Y$ defined on universes of typed elements. To be consistent in the naming of the symbols in M2FOL we distinguish between an attribute type on meta level and an attribute as the concrete assignment of a value to an element on model level.
With M2FOL we want to model object types and inheritance relations between them, relation types connected to their from and to object types, attribute types and their data types, and possible data. According to Definition 1 all the bold concepts constitute a type in $S_O$ in M2FOL, whereas all italic concepts make up a type in $S_R$ in M2FOL. The types inheritance, from, and to furthermore require assignment functions for source and target specification. Data types and data are necessary for defining attribute domains and their values, e.g. the domain $N_{0..10}$ and values {0, 1, 2, ..., 9, 10} or an enumeration list domain gender with values male, female, else. Attribute types need the assignment of owning type and value domain. The postulates of the language (for brevity we use the abbreviation xry for relation r of type T, x being $F^T_s(r)$ and y being …
We furthermore require $<_{OT}$ to be a strict partial order, i.e. $<_{OT}$ is transitive, irreflexive and antisymmetric.
The constraints ensure single inheritance (5), $<_{OT}$ being the transitive closure of Inh under the assumption that all universes are finite (6-7), the existence and uniqueness of to and from objects of a relation (8-10), and the abstractness of the types ORT and DORT (11-12). The absence of cyclic inheritance and self-inheritance follows from the properties of $<_{OT}$.
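The interplay of single inheritance, transitive closure, and irreflexivity can be sketched in a few lines of Python. The encoding is a simplification and an assumption: inheritance elements are collapsed to (subtype, supertype) pairs, and the function names `transitive_closure` and `check_inheritance` are hypothetical.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs` (finite input assumed)."""
    closure = set(pairs)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def check_inheritance(inh):
    """Check a set of (subtype, supertype) pairs.

    Returns (single, irreflexive, order): whether each type has at most one
    direct supertype, whether the closure is irreflexive (no cyclic or
    self-inheritance), and the closure itself.
    """
    subtypes = [sub for sub, _ in inh]
    single = len(subtypes) == len(set(subtypes))
    order = transitive_closure(inh)
    irreflexive = all(a != b for a, b in order)
    return single, irreflexive, order


ok = check_inheritance({("p", "n"), ("tr", "n")})
cyclic = check_inheritance({("a", "b"), ("b", "a")})
multiple = check_inheritance({("a", "b"), ("a", "c")})
```

A cycle in the direct inheritance pairs shows up as a reflexive pair in the closure, which is why irreflexivity of the derived order rules out both cyclic inheritance and self-inheritance.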

Running Example Petri Nets
With this language we now can transfer the graphical metamodel of Figure 3 to a formal M2FOL-model.

Example 3 The Petri Net Metamodel $M_{PN}$
The universe of object types $U_{OT}$ comprises three elements n(ode), p(lace), and tr(ansition). The universe of relation types $U_{RT}$ contains one element a(rc). One element tok(ens) is contained in the universe of attribute types $U_{AT}$. The universe $U_{Inh}$ contains the inheritance relations $p_n$ between p and n as well as $tr_n$ between tr and n. $U_{Fr}$ contains the relation $a_{from}$ of the source element assignment to the relation type a. $U_{To}$ contains the relation $a_{to}$ of the target element assignment to the relation type a. For these four elements the corresponding source and target elements have to be assigned: …
From Inh the transitive order relation $<_{OT}$ is deduced: $<_{OT} = \{(p, n), (tr, n)\}$. Furthermore there are data values {0, 1, 2, ...} in $U_D$, all of type $N \in U_{DT}$: $f_{DT}(i) = N$ for all $i \in U_D$. These are needed for the value domain of the attribute type tok, an attribute assigned to p: $f_{type}(tok) = p$, $f_{val}(tok) = N$. In short this can be written as follows: … $U_{ORT} = \{n, p, tr, a\}$, $U_{DORT} = \{n, p, tr, a, N\}$
This formal metamodel $M_{PN}$ conforms to all constraints 5-12 and describes the formal language PN introduced in Example 1. Their subordination prompts a generic procedure on how to deduce the latter from the former.
In Table 1 we present this procedure as an algorithm.
In the right column the algorithm is exemplified on the metamodel of Petri Nets.Compare the resulting language to Example 1.

Meta-perspective on M2FOL
Finally we formalize the metamodel of M2FOL as an M2FOL model. The graphical metamodel is depicted in Figure 6.
… $f_{type}(\text{ass to}) = \text{at}$, $f_{val}(\text{ass to}) = \text{ort}$, …
On the one hand the construct above is itself a model expressed in the language M2FOL. On the other hand this metamodel defines M2FOL as a meta²model. With the algorithm presented above we deduce Definition 3 from Example 4. So we conclude that the proposed modeling language M2FOL for metamodels is self-describing.
Each metamodel element o in the set $U_{OT}$ defines an object type O of the language. The inheritance relation $<_{OT} \subset U_{OT} \times U_{OT}$ must be carried over to the types.
Each metamodel element r in the set $U_{RT}$ defines a relation type R of the language.
For each relation type $r \in U_{RT}$ there exist an element s of type From and an element t of type To, and both relation elements have r as source element, $f_{Fr}$ …
Each metamodel element dt in $U_{DT}$ defines a data type DT of the language.
Each metamodel element …
Each metamodel element a in the set $U_{AT}$ defines a function symbol $F_a$ of the language. The object or relation type a belongs to, i.e. the domain of $F_a$, is given by the assignment $f_{type}(a) = t_{ty} \in U_{OT} \cup U_{RT}$, its value range, i.e. codomain, by …
The constraints of the language have to be added manually, because this information is not determined by the metamodel.
Table 1: Algorithm to deduce a formal modeling language signature from its M2FOL metamodel specification
In Figure 7 we summarize all the presented definitions and examples in the language definition hierarchy proposed by Thalheim and Mayr [45]. The authors make an explicit distinction between this hierarchy and the model hierarchy. On the grammar definition level they allocate the means of defining the language grammars. Examples for elements on this level are EBNF or, in our case, Definition 1 and Definition 2 concerning the formal definitions of a modeling language and a model. The grammars residing on this level are used in the next lower level of language definition. Here we can see not only model representation grammars but also metamodel representation grammars. An example for a model representation grammar is the formalized Petri Net language from Example 1; an example for a metamodel representation grammar is Definition 3 of M2FOL, both using the proposed formalism in Definition 1 as grammar definition tool. On the lowest level of language usage we find instances of these languages: the Petri Net model from Example 2 as a modeling language representation and the metamodel of Petri Nets from Example 3 as a metamodeling language representation defined by means of M2FOL. An example of a meta²modeling language representation on this level is the metamodel of M2FOL, also defined by means of M2FOL. This shows that M2FOL is a metamodel representation grammar as well as a meta²model representation grammar. The algorithm presented in Table 1 allows for the automatic derivation of the language syntax of a model representation grammar on the language definition level from a metamodeling language representation on the language usage level.
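The derivation steps of Table 1 can be sketched as a small Python function. This is an illustration under assumptions rather than the authors' implementation: the dictionary keys (`OT`, `RT`, `AT`, `DT`, `order`, `from`, `to`, `type`, `val`), the function name `derive_signature`, and the naming scheme of the generated function symbols are hypothetical, and type names are written in their language-level form for readability.

```python
def derive_signature(metamodel):
    """Derive a signature from a metamodel given as a plain dictionary."""
    functions = {}
    # Each relation type yields a source and a target function symbol.
    for r in sorted(metamodel["RT"]):
        functions[f"F_s^{r}"] = (r, metamodel["from"][r])
        functions[f"F_t^{r}"] = (r, metamodel["to"][r])
    # Each attribute type yields a function symbol: owning type -> value domain.
    for a in sorted(metamodel["AT"]):
        functions[f"F_{a}"] = (metamodel["type"][a], metamodel["val"][a])
    return {
        "S_O": set(metamodel["OT"]),             # object types
        "subtype_of": set(metamodel["order"]),   # inheritance order on types
        "S_R": set(metamodel["RT"]),             # relation types
        "S_D": set(metamodel["DT"]),             # data types
        "F": functions,
        "constraints": [],                       # must be added manually
    }


petri_net_metamodel = {
    "OT": {"Node", "Place", "Transition"},
    "RT": {"Arc"},
    "AT": {"Tokens"},
    "DT": {"N"},
    "order": {("Place", "Node"), ("Transition", "Node")},
    "from": {"Arc": "Node"},
    "to": {"Arc": "Node"},
    "type": {"Tokens": "Place"},
    "val": {"Tokens": "N"},
}
signature = derive_signature(petri_net_metamodel)
```

For the Petri Net metamodel the sketch produces two function symbols with domain Arc and codomain Node and one attribute function from Place to N, and it leaves the constraint list empty because the metamodel does not determine the constraints.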

Potential and Benefits of Formalized Conceptual Modeling Languages
In this section we give an outlook on several research topics that potentially benefit from formalizing conceptual modeling languages with the proposed formalism: language interleaving and consistency, operations on models, and translators from platform-independent formalizations to platform-specific code. For this purpose, we make use of established concepts from formal language theory. For reasons of brevity we do not discuss all topics in detail and restrict ourselves to an extensive elaboration of the first topic only.

Language Interleaving and Consistency
Models are a means to manage information in highly complex systems in business modeling, in software engineering, and in many other fields. The solution for coping with complexity is often seen in the distribution and fragmentation of information between different models or views, possibly in different modeling languages, which raises the issue of keeping the models consistent [1,13,37].
In the following we demonstrate that language interleaving and the definition of consistency constraints can be readily realized in formalized conceptual modeling languages. Depending on the initial situation we can distinguish top-down approaches, where a newly defined or existing language is segmented into several sublanguages or views, and bottom-up approaches, where existing languages are interleaved and their metamodels are amalgamated and equipped with additional constraints [47]. We discuss the first approach briefly and exemplify the second one in a case study.

Top-Down Approach
Expressed in our formalism, the top-down approach means restricting the signature Σ of a given language L to subsets Σ_1 and Σ_2 of the signature. When working in one view, i.e. with a sublanguage L|_Σi, we are restricted to the types appearing in Σ_i. Note that we also have to remove relation types if their source or target object type was excluded from the signature, and likewise attributes if their source type or value type was excluded. All constraints referring to unavailable types have to be removed. Note also that the signatures Σ_i of the sublanguages do not have to be disjoint. While restricting to a sublanguage L|_Σi, and thereby to a concrete view on a system under study, we are still interested in the model as a whole. So we assume that for each view of L|_Σi there exist correlated models in the other views L|_Σj that depend on each other. Therefore we need pairwise constraints between the possible views, always considering the signatures of the two relevant languages. If the signatures are not disjoint, these constraints contain isomorphisms between elements of a common type to keep the shared structure consistent.
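The restriction rules of this paragraph can be sketched in a few lines of Python. The dictionary encoding of a signature, the function name `restrict_signature`, and the constraint labels `c_structure` and `c_marking` are hypothetical; constraints are represented only by the set of types they mention, and inheritance is left out for brevity.

```python
def restrict_signature(sig, keep):
    """Restrict a signature to the types in `keep`, i.e. build one view."""
    object_types = sig["object_types"] & keep
    data_types = sig["data_types"] & keep
    # A relation type survives only if it is kept and both end types are kept.
    relation_types = {
        rel: (src, tgt)
        for rel, (src, tgt) in sig["relation_types"].items()
        if rel in keep and src in object_types and tgt in object_types
    }
    available = object_types | data_types | set(relation_types)
    # An attribute survives only if its owner type and value type are available.
    attributes = {
        attr: (owner, value)
        for attr, (owner, value) in sig["attributes"].items()
        if owner in available and value in available
    }
    # A constraint survives only if every type it mentions is available.
    constraints = {
        name: types
        for name, types in sig["constraints"].items()
        if types <= available
    }
    return {
        "object_types": object_types,
        "relation_types": relation_types,
        "data_types": data_types,
        "attributes": attributes,
        "constraints": constraints,
    }


# Petri Net signature in the assumed encoding; constraint labels are made up.
pn = {
    "object_types": {"Node", "Place", "Transition"},
    "relation_types": {"Arc": ("Node", "Node")},
    "data_types": {"N"},
    "attributes": {"Tokens": ("Place", "N")},
    "constraints": {
        "c_structure": {"Arc", "Place", "Transition"},
        "c_marking": {"Place", "N"},
    },
}
structure_view = restrict_signature(pn, {"Node", "Place", "Transition", "Arc"})
marking_view = restrict_signature(pn, {"Place", "N"})
```

The two views overlap in the type Place, which is the situation where the text calls for isomorphisms between the shared elements.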

Bottom-Up Approach
Expressed in our formalism, the bottom-up approach means fusing the signatures of two given languages L_1 and L_2. The interleaving of models and their languages ranges from simple references to elements in other models to highly interdependent content and structure of models in both directions. There exist different techniques to link conceptual modeling languages [1]. There are also different techniques in the field of logic for combining formal languages, e.g. [3,42]. The presented approach is mainly inspired by the former reference.
Uniting two given languages L_1 and L_2 requires uniting their signatures Σ_1 and Σ_2. When doing so, we have to take care of types T occurring in both languages. To stay compatible with existing models we keep both types and rename them to T_1 and T_2. Furthermore we introduce new function symbols i : T_1 → T_2. These functions are required to be bijective, as we assume that we want to depict the same situation with both views, i.e. sublanguages.
With this union the sets of object types and relation types of the new language L are fixed. The inheritance relations do not change either and stay separated for both initial languages. To create new references and consistency constraints we may introduce new attributes A with F_A : T_dom → T_val, where the attributed type T_dom and the value domain T_val might stem from different initial languages. To define the attributes and constraints as required we might also have to introduce new product types in S_D and additional function and relation symbols in F and R respectively.
Fig. 7: Language Definition and Model Hierarchy adapted from [45]
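In summary, the newly obtained signature Σ for the language L unites the types, function symbols, relation symbols, and constants of Σ_1 and Σ_2, with coinciding types renamed, and in addition contains the new product types, the mappings i, the function symbols of the new attributes, and the newly created relation symbols. The constraints C are extended with the requirement that the mappings of the coinciding types i : T_1 → T_2 are bijective, as well as further postulates necessary for information consistency.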
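A minimal sketch of this union, assuming a simplified dictionary encoding of signatures: the function `interleave`, the two toy languages, and the textual form of the bijectivity postulate are hypothetical and only illustrate the renaming of shared types and the added mappings i. Constraints are treated as opaque strings, so renaming types inside existing constraints is not covered.

```python
def interleave(sig1, sig2):
    """Unite two signatures, renaming shared types and adding bijections."""
    shared = sig1["types"] & sig2["types"]
    merged = {"types": set(), "functions": {}, "constraints": []}
    for index, sig, other in ((1, sig1, sig2), (2, sig2, sig1)):
        def rename(type_name):
            # Types occurring in both languages are kept twice, as T_1 and T_2.
            return f"{type_name}_{index}" if type_name in shared else type_name

        merged["types"] |= {rename(t) for t in sig["types"]}
        for symbol, (domain, codomain) in sig["functions"].items():
            # Function symbols with clashing names are disambiguated as well.
            name = f"{symbol}_{index}" if symbol in other["functions"] else symbol
            merged["functions"][name] = (rename(domain), rename(codomain))
        merged["constraints"] += sig["constraints"]
    for type_name in sorted(shared):
        mapping = f"i_{type_name}"
        merged["functions"][mapping] = (f"{type_name}_1", f"{type_name}_2")
        merged["constraints"].append(f"{mapping} is bijective")
    return merged


# Two toy languages sharing the type Actor (both invented for this sketch).
process_language = {
    "types": {"Actor", "Task"},
    "functions": {"F_performer": ("Task", "Actor")},
    "constraints": [],
}
goal_language = {
    "types": {"Actor", "Goal"},
    "functions": {"F_owner": ("Goal", "Actor")},
    "constraints": [],
}
combined = interleave(process_language, goal_language)
```

New cross-language attributes and their consistency postulates would then be added to `combined` by hand, as in the case study that follows.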

Case Study
We demonstrate the procedure of interleaving in a case study of UML Class Diagrams and Sequence Diagrams. First we have to formalize the initial languages CD of Class Diagrams and SD of Sequence Diagrams. For easier comprehension the considered signatures are restricted to the subset of the original UML concepts relevant for connecting the two languages, see Figure 8.

Example 5
The UML Class Diagram Language CD
For our purpose it suffices to consider in the signature Σ_CD only one object type Class (Cl) and one relation type Association (As) connecting classes. Besides the plain data types Visibility (V) and SimpleType (ST), class diagrams also provide attributes and operations of classes, which exhibit a complex structure themselves. We define a data type Attribute (At), which is a tuple of an element of type Visibility and the value of the attribute, i.e. an object of ComplexType (CT) = SimpleType ∪ Class. Furthermore we need a data type Operation (Op), which is a tuple of an element of type Visibility, a return type element in CT, and arbitrarily many parameters of type CT. For these parameters we use the union of all product types (CT)^i. For reasons of brevity we skip the explicit definition of all intermediate product types in the signature and just refer to ⋃_i (CT)^i. As classes can have several (distinct) attributes and operations, the model attributes C(lass)At(tributes) and C(lass)Op(erations) point to the powersets ℘(At) and ℘(Op). In the signature of CD, the constants +, −, ∼ are of type Visibility and String, Integer, Real, and Boolean are of type SimpleType.
We do not need any postulates on the language CD.

Example 6
The UML Sequence Diagram Language SD
In this example, too, we restrict ourselves to the simplified case of only one object type Lifeline (Ll) and two relation types Message (Msg) and Replymessage (Rmsg), both connecting lifelines. The temporal sequence of messages, usually captured by the graphical order of arrows, is defined in the attribute Sendtime (MSt and RSt) with value domain N, assigning a point in time to the messages and replymessages. To be able to compare send times we need the usual order relation <_time ⊂ N × N and the usual addition function +_time in the signature. To ensure a reasonable temporal flow of messages we need two language constraints (Equations 48 and 49). Equation 48 restricts diagrams to be sequential, so no two messages are sent at the same time. Equation 49 forces the message flow to be synchronous.
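The two constraints could be written, for instance, as follows. This is a hedged reading of the prose description in the notation of Definition 1, not the paper's Equations 48 and 49: the symbols F_MSt and F_RSt for the two Sendtime attributes, and the choice that a reply is sent exactly one time step after its message, are assumptions.

```latex
% Sequentiality (one possible reading): distinct messages have distinct send times.
\forall m_1, m_2 : \mathrm{Msg} \;\;
  \bigl( m_1 \neq m_2 \rightarrow F_{MSt}(m_1) \neq F_{MSt}(m_2) \bigr)

% Synchrony (one possible reading): every message is answered by a replymessage
% running in the opposite direction at the next point in time.
\forall m : \mathrm{Msg} \;\, \exists r : \mathrm{Rmsg} \;\; \bigl(
    F^{Rmsg}_{s}(r) = F^{Msg}_{t}(m) \,\wedge\,
    F^{Rmsg}_{t}(r) = F^{Msg}_{s}(m) \,\wedge\,
    F_{RSt}(r) = F_{MSt}(m) +_{time} 1 \bigr)
```

Both sentences use only symbols available in the signature of SD: the source and target functions of the two relation types, the two Sendtime attributes, and the addition on N.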

Example 7
The Interleaved Modeling Language CD SD
In the case of UML class diagrams and sequence diagrams we do not have to take care of identical types. We define several new attributes to bind lifelines in a sequence diagram to the classes in the corresponding class diagram: a reference L(ife)l(ine)Cl(ass), a reference Cal(led)Op(eration) of a message, and a reference Re(turn)Ty(pe) of a replymessage.
These references of course require some new constraints.
To formulate these we need a new relation symbol ∈_Op and a new function symbol F_pr projecting an element of type Operation to its return type, i.e. the last value of the tuple.
With these symbols we can define the additional constraints (Equations 56 and 57). Equation 56 ensures that a message can only call operations of the addressed class. Equation 57 guarantees that each replymessage follows a message and that its return type is exactly the return type of the called operation.
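To make the intent of these two constraints concrete, the following Python sketch checks them on a small interleaved model. The data layout, the function names, and the sample classes Customer and Account are hypothetical; operations are encoded as tuples whose last entry is the return type, matching the description of F_pr.

```python
def return_type(operation):
    """Counterpart of F_pr: the return type is the last value of the tuple."""
    return operation[-1]


def check_interleaved(model):
    """Return the ids of messages and replymessages violating the constraints."""
    violations = []
    for msg in model["messages"]:
        addressed_class = model["lifeline_class"][msg["target"]]
        # A message may only call operations of the addressed class.
        if msg["called_op"] not in model["class_ops"][addressed_class]:
            violations.append(msg["id"])
    for reply in model["replies"]:
        # A replymessage must answer an earlier message in the opposite
        # direction and carry the return type of the called operation.
        answered = any(
            msg["source"] == reply["target"]
            and msg["target"] == reply["source"]
            and msg["time"] < reply["time"]
            and return_type(msg["called_op"]) == reply["return_type"]
            for msg in model["messages"]
        )
        if not answered:
            violations.append(reply["id"])
    return violations


deposit = ("+", "Integer", "Boolean")  # visibility, parameter type, return type
model = {
    "class_ops": {"Customer": set(), "Account": {deposit}},
    "lifeline_class": {"c": "Customer", "a": "Account"},
    "messages": [
        {"id": "m1", "source": "c", "target": "a", "time": 1, "called_op": deposit}
    ],
    "replies": [
        {"id": "r1", "source": "a", "target": "c", "time": 2, "return_type": "Boolean"}
    ],
}
violations = check_interleaved(model)
```

Binding lifeline `a` to the class Customer instead would make `m1` a violation, because Customer offers no such operation.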
The complete language CD SD then contains, among others, the relation types S_R = {Association, Message, Replymessage} (59) and the constants C = {+, −, ∼, String, Integer, Real, Boolean, …}. With the newly generated language each model contains all information of both views, the structural view of class diagrams as well as the procedural view of sequence diagrams. Of course, when viewing the model we only consider the model restricted to a sublanguage, CD or SD, but in the background all elements of both reside in the "supermodel". This means all information is captured in the model at all points in time and at the same time kept consistent by the newly introduced postulates. This conforms to the idea of the single underlying model as proposed by Burger et al. [37,47].

Operations on Models
Model functionality is crucial for amplifying the value of models beyond mere pictures [5]. One of the most prominent examples is the firing mechanism of Petri Nets [54]. Many domain-specific languages also gain value from the model operations they offer. For example, model operations in the sense of model-to-model transformations play a crucial role in Model Driven Software Engineering [10, chapter 8]. Nevertheless, operations on models are often out of scope or simply ignored in formalizations. An exception is the theory of graph grammars and graph transformations [28]. Formalisms based on logic are often criticized for not being able to capture the operational syntax of modeling languages. We argue that this is not an inherent limitation of these approaches and sketch how operations on models can also be supported by concepts from logic.

Structural Events and Domain Events
We adopt the notion of Olivé [50, chapter 11], who defines domain events, i.e. semantically and syntactically admissible operations on models, by decomposing them into the smallest possible changes in a model, the so-called structural events. While Olivé only names deletion and insertion of objects and relations as structural events, in the formalism at hand we also have to consider the change of attribute values as a third variant.
Given a language L we define M as the set of valid models, i.e. those language structures fulfilling the constraints, and M⁻ as the set of all possible language constructs, which do not necessarily comply with the postulates. Domain events are therefore mappings de : M → M, whereas structural events are functions on arbitrary constructs, se : M⁻ → M⁻. Consider again the model M of Example 2. A valid domain event is the firing of the transition Serve. This event is the concatenation of three structural events, each changing an attribute value. None of these structural events alone is semantically valid, but together they form a semantically and syntactically admissible operation. This also shows that there are many structural events (we can set the attribute Tokens of each place to any number we like) but far fewer domain events. Another point to be considered are pre- and postconditions of domain events. To capture these in a generic way we can use concepts from temporal logic [40]. With the operators of temporal logic we are able to formulate postulates that consider both states of a model, before and after the application of a domain event, and define dependencies between them.
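The decomposition of the firing of Serve into structural events can be sketched as follows. The token values are those of Example 2; the arcs around Serve (input places w and i, output place b) are an assumption made for this sketch, as are the names `set_tokens` and `fire`.

```python
def set_tokens(place, value):
    """Structural event: set the attribute Tokens of one place."""
    def event(marking):
        changed = dict(marking)
        changed[place] = value
        return changed
    return event


# Assumed arcs around the transition Serve (not taken from the paper's figure).
INPUT_PLACES = {"Serve": ["w", "i"]}
OUTPUT_PLACES = {"Serve": ["b"]}


def fire(transition, marking):
    """Domain event: fire a transition as a chain of structural events."""
    # Precondition: every input place holds at least one token.
    if any(marking[p] < 1 for p in INPUT_PLACES[transition]):
        raise ValueError(f"{transition} is not enabled")
    for place in INPUT_PLACES[transition]:
        marking = set_tokens(place, marking[place] - 1)(marking)
    for place in OUTPUT_PLACES[transition]:
        marking = set_tokens(place, marking[place] + 1)(marking)
    return marking


before = {"w": 2, "b": 0, "i": 1}   # token values of Example 2
after = fire("Serve", before)
```

Any single call such as `set_tokens("w", 7)` is a perfectly well-defined structural event, but only the guarded combination inside `fire` corresponds to a domain event.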
Concatenations of domain events form sequences of valid models. In Petri Nets, for example, the firing of a transition is a domain event. These sequences are therefore of special interest, as they reveal inaccessible states and final markings of a net when starting from a concrete model. This is closely related to the concept of marking graphs in Petri Nets [54, Sec. 2.8].
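Such sequences can be explored mechanically. The sketch below builds the marking graph of a small net by breadth-first search; the net with the transitions Serve and Finish is a hypothetical simplification of the barber shop scenario, and the search only terminates for bounded nets like this one.

```python
from collections import deque

# Hypothetical net: transition -> (tokens consumed, tokens produced).
NET = {
    "Serve": ({"w": 1, "i": 1}, {"b": 1}),
    "Finish": ({"b": 1}, {"i": 1}),
}


def enabled(marking, transition):
    consumed, _ = NET[transition]
    return all(marking[p] >= n for p, n in consumed.items())


def fire(marking, transition):
    consumed, produced = NET[transition]
    result = dict(marking)
    for place, n in consumed.items():
        result[place] -= n
    for place, n in produced.items():
        result[place] += n
    return result


def marking_graph(initial):
    """Collect all reachable markings and the firing steps between them."""
    def key(marking):
        return tuple(sorted(marking.items()))

    seen = {key(initial)}
    edges = []
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for transition in NET:
            if enabled(marking, transition):
                successor = fire(marking, transition)
                edges.append((key(marking), transition, key(successor)))
                if key(successor) not in seen:
                    seen.add(key(successor))
                    queue.append(successor)
    return seen, edges


markings, steps = marking_graph({"w": 2, "b": 0, "i": 1})
# Final markings are reachable markings in which no transition is enabled.
final_markings = markings - {source for source, _, _ in steps}
```

Every marking outside `markings` is inaccessible from the chosen start model, and `final_markings` lists the states in which no further domain event applies.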

Translators
Another salient benefit of an unambiguous and complete formalization of a modeling language is that it can serve as a single point of platform-independent specification that is precise enough to be processed automatically by a machine. Of course, a modeling language without a technical tool supporting the creation and execution of models is of little use to its target audience. When implementing a language, many engineers have found that the available metamodeling platforms differ heavily in concepts and functionality and thereby impose more or less severe restrictions on the final product [36]. So the implementation forces the engineer to think within the frame of the chosen platform and to modify the language to fit the given meta²model and the available model processing algorithms. A further drawback of this current practice is that the implementation effort is lost whenever the language has to be transferred to another platform, be it because of missing functionality for new language features or because platform support has ceased.
With the formalization of a language as stipulated by the AMME lifecycle of modeling methods we derive a sort of platform-independent code and close the gap between the specification document and the final implementation. By using the proposed formalism the specification of the main concepts is unified and can therefore be translated to any metamodeling platform. Thus, the language specification stays on a platform-agnostic level and the complexity of platform specificity can be outsourced to a platform-specific translator. The feasibility of this endeavour has been shown by Visic et al. [63,64]. When platforms change, only the translator has to be adapted, not the platform-independent conceptualization of the language.
While Visic et al. stay at the level of translators of language syntax, our approach to the formalization of model operations shown in Section 5.2 holds promise for also integrating an automatic translation of the functionality of modeling languages. The decomposition of domain events into the three types of structural events allows the translation of the modeling language specification to a concrete tool to be automated, as most platforms offer methods for creating or deleting elements and for changing attributes.
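The idea of translating structural events can be sketched with a small class hierarchy. Both target platforms below are invented for illustration and do not correspond to any real metamodeling tool; the encoding of a domain event as a list of tagged tuples is likewise an assumption.

```python
from abc import ABC, abstractmethod


class PlatformTranslator(ABC):
    """Maps the three kinds of structural events to platform-specific calls."""

    @abstractmethod
    def insert(self, type_name, element): ...

    @abstractmethod
    def delete(self, element): ...

    @abstractmethod
    def set_attribute(self, element, attribute, value): ...

    def translate(self, events):
        """Translate a domain event given as a list of structural events."""
        handlers = {
            "insert": self.insert,
            "delete": self.delete,
            "set": self.set_attribute,
        }
        return [handlers[kind](*args) for kind, *args in events]


class ScriptPlatform(PlatformTranslator):
    """Hypothetical platform driven by a command script."""

    def insert(self, type_name, element):
        return f"CREATE {type_name} {element}"

    def delete(self, element):
        return f"DELETE {element}"

    def set_attribute(self, element, attribute, value):
        return f"SET {element}.{attribute} = {value}"


class ApiPlatform(PlatformTranslator):
    """Hypothetical platform exposing an object-oriented API."""

    def insert(self, type_name, element):
        return f"model.create({type_name!r}, {element!r})"

    def delete(self, element):
        return f"model.remove({element!r})"

    def set_attribute(self, element, attribute, value):
        return f"model.get({element!r}).set({attribute!r}, {value!r})"


# The firing of Serve as three attribute changes (values assumed as before).
serve = [("set", "w", "Tokens", 1), ("set", "i", "Tokens", 0), ("set", "b", "Tokens", 1)]
script_code = ScriptPlatform().translate(serve)
api_code = ApiPlatform().translate(serve)
```

The platform-independent description `serve` is written once; supporting a further platform only requires another subclass, which mirrors the argument that only the translator has to change.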

Evaluation
To evaluate the proposed formalism we recap the requirements mentioned in Section 1.2: 1) the formalism has to be complete regarding the general building blocks of a language, 2) it has to be faithful to the character of modeling languages as such, and 3) it must be generic in the sense that it admits the formalization of any language.
The proposed formalism comprises the core concepts constituting a modeling language. These were chosen based on a survey by Kern et al. [36] and the concept discussion by Olivé [50]. We restricted ourselves to the most common concepts, i.e. those appearing in at least half of the metamodeling platforms surveyed in [36]. In Section 3 we also listed the concepts for future integration. Regarding the first requirement we conclude that the proposed definition of a modeling language is not yet complete, but covers the most relevant core. This is also shown by the realizability of the several case studies presented in this paper.
In the current scientific literature there is a consensus that modeling languages are formal languages by nature [16,25,50,52,62]. This supports our choice of logic as the basis for the formalism and underpins its adherence to the linguistic character of languages, including the alphabet and the instantiation relation. Additional affirmation is given by the multitude of practical constructs and methods of formal language theory and their straightforward applicability to current research issues, as shown by example in Section 5.
The generic realizability of arbitrary modeling languages is provided by construction, as the concepts integrated in the formalism stem from literature on conceptual modeling in general. The realization of the three divergent use cases in this paper and of several more unpublished use cases conducted by the authors further backs this claim.
The empirical evaluation of feasibility and usability has so far mainly been conducted through prototypical case studies from various domains (not all published). Three of them were shown in this paper. Other cases guiding the advancement of the formalism are, for example, ER diagrams, starting in [18]. In the light of language interleaving we formalized, besides the UML case study, a language for modeling smart cities [7]. To investigate the formalization of model operations we formalized Petri Nets and ProVis, a tool for math education providing sophisticated methods to process statistical diagrams [20].
A proof of concept for the significance of the presented formalism can be given by implementing translators to at least two different metamodeling platforms, especially if we are able to integrate a specification of model operations. Such a tool is currently under design.
In parallel, a more far-reaching empirical evaluation is currently being designed. We will conduct a study with around sixty business informatics students in a university course on metamodeling. The students will be asked to apply the proposed formalism to the modeling language they develop during the course and to evaluate its complexity and limitations.

Conclusion
In this paper we presented a definition of modeling languages as formal languages L with a signature Σ in the sense of logic. The concept of an L-structure canonically corresponds to the instantiation relation between model and language and led us to the definition of models as L-structures. To illustrate the specification of formal modeling languages we demonstrated the definition on the Petri Net modeling language. We also applied the definition on the meta level and developed M2FOL, a formal modeling language for metamodels. M2FOL models are precise and complete, and therefore we were able to show how to algorithmically derive a formal modeling language signature from its metamodel. M2FOL is self-describing, which can be seen by applying the algorithm to its own metamodel.
After introducing the formalism we gave an outlook on the potential and benefits of formalized modeling languages using the approach at hand. We addressed the topic of language interleaving and consistency. Established methods from formal language theory allow an interleaved formal language to be created from existing ones. We illustrated the process in a case study using UML Class Diagrams and Sequence Diagrams.
Another topic with high potential for the automation of language implementation is the formalization of model operations. We outlined how to break down algorithms on models into the smallest possible building blocks that can be formalized. This allows model operations to become an integral part of the formal language specification.
This formal specification of syntax as well as operations, precise enough to be processed by a machine yet platform-independent, additionally allows us to develop platform-specific translators that transfer the single source of language specification to realizations on different platforms.
With this common practice of defining metamodels and modeling languages, these languages become comparable, reusable, and open to modularization. To broaden the conceptual capabilities of our approach we will investigate more subtle concepts to be integrated into the definition. These are, for example, powertypes [50, chapter 17.2], the concepts of mixins and extenders for modular metamodels as proposed in [65], or the structural types of relations identified in [61]. For a practical application of the language M2FOL, a suitable tool for transforming graphical metamodels into formal ones will be developed.
Finally, by using a sophisticated mathematical theory as grounding for the definition of modeling languages we can use this knowledge stack as a resource to further establish a formal foundation for modeling languages. We can investigate the subclass of conceptual modeling languages within the class of formal languages and approach old problems with new tools.

Definition 1
A (formal) modeling language L consists of a typed signature Σ = {S, F, R, C} and a set C of sentences in L for the constraints, where:
- S is a set of types, which can be further divided into three disjoint subsets S_O, S_R, and S_D for object types, relation types, and data types; the type set S_O is strictly partially ordered with order relation <_O ⊂ S_O × S_O to indicate the inheritance relation between the corresponding object types; the type set S_D can contain simple types T for value domains of single-value attributes, or product types T = T_1 × T_2 × ⋯ × T_n for value domains of n-ary multi-value attributes (n > 1), where the i-th value is of type T_i ∈ S_D ∪ S_O ∪ S_R;
- F is a set of typed function symbols such that: for each relation type R in S_R there exist two function symbols F^R_s and F^R_t with domain type R ∈ S_R and codomain types O_s, O_t ∈ S_O assigning the source and target object types to a relation; for each single-value attribute A of an object or relation type T there exists a function symbol F_A with domain type T and a codomain type in S_D ∪ S_O ∪ S_R assigning the simple data type or the referenced object type or relation type to the attribute; for each multi-value attribute A of an object or relation type T there exists a function symbol F_A with domain type T and a product type in S_D as codomain type;
- R is a set of typed relation symbols containing <_O;
- C is a set of typed constants to specify the possible values c_i of a simple type T ∈ S_D of the attributes;
- the set C of constraints is a set of sentences in L constraining the possible models, also called the postulates of the language.
- an interpretation of the function symbols in L, i.e. for each function symbol F ∈ F with domain type T_1 × … × T_n and codomain type T a function f : U_T1 × … × U_Tn → U_T;
- an interpretation of the relation symbols in L, i.e. for each relation symbol R ∈ R with domain type T_1 × … × T_m a relation r ⊂ U_T1 × … × U_Tm;
- for each simple type T ∈ S_D and constant C ∈ C of type T an interpretation c ∈ U_T;
- for each constraint φ in C the model M satisfies φ, i.e. M ⊨ φ.

Example 1
The Petri Net Modeling Language PN
The Petri Net metamodel depicted in Figure 3 comprises three object types Node (No), Place (Pl), and Transition (Tr) constituting S_O. Place and Transition inherit from Node, i.e. Place <_O Node and Transition <_O Node. Furthermore, the language comprises only one relation type, Arc, the single element of S_R. For the attribute Tokens of object type Place we need a type N with the usual addition + and order relation <_N as well as constants in C = {0, 1, 2, ...}, all of type N. The set S of types is then the union S = S_O ∪ S_R ∪ S_D = {Node, Place, Transition, Arc, N}. For the relation Arc we have to specify the source and target object types by introducing two function symbols F^Arc_s and F^Arc_t.

Example 2
Fig. 4: A Petri Net model depicting a simple barber shop scenario
… and f^Arc_t(a_6) = s. For the attribute type and values the natural numbers N are included in the model, U_N = {0, 1, 2, ...}. The instantiation of the attribute Tokens looks as follows: f_Tokens(w) = 2, f_Tokens(b) = 0 and f_Tokens(i) = 1. We can easily check that the formalized model satisfies all postulates 1-4 of the language PN.

Definition 3
The metalanguage M2FOL is a modeling language with signature Σ = {S, F, R, C} with the set of types split into S = S_O ∪ S_R ∪ S_D, where:
- S_O consists of the types O(bject)T(ype), R(elation)T(ype), A(ttribute)T(ype), D(ata)T(ype), and D(ata), and furthermore two supertypes ORT(ype) and DORT(ype): S_O = {OT, RT, AT, DT, D, ORT, DORT};
- the types OT and RT inherit from ORT, the types ORT and DT inherit from DORT: OT <_O ORT, RT <_O ORT, ORT <_O DORT, DT <_O DORT;
- S_R consists of the types Inh(eritance), Fr(om), and To: S_R = {Inh, Fr, To};
- S_D contains product types DORT^n for all n > 1 as well as a type T_DORT for the union of all DORT^n: T_DORT = ⋃_i DORT^i;
- F, the set of function symbols, consists of the following elements: two symbols F^Inh_s and F^Inh_t assigning source and target to Inh-typed relations: F^Inh_s : Inh → OT, F^Inh_t : Inh → OT; two symbols F^Fr_s and F^Fr_t assigning source and target to Fr-typed relations: F^Fr_s : Fr → RT, F^Fr_t : Fr → OT; two symbols F^To_s and F^To_t assigning source and target to To-typed relations: F^To_s : To → RT, F^To_t : To → OT; two symbols F_val and F_type assigning to an attribute type its value domain and the object or relation type it belongs to, where the value assignment can be a reference or an n-valued type in DORT^n: F_val : AT → ⋃_i (DORT)^i, F_type : AT → ORT; a symbol F_DT to assign a data type to a data element: F_DT : D → DT;
- R consists of a symbol <_OT transitively extending the inheritance relation given by Inh to a strict partial order on the set of object types: R = {<_OT ⊂ OT × OT}.

Fig. 8: Simplified metamodels of the UML Class Diagram (a) and Sequence Diagrams (b)