1 Introduction

Motivation. The generation of sample instance models of Domain-Specific Language (DSL) specifications has become an active research line due to its increasing industrial relevance for engineering complex modeling tools by using large metamodels (MM) and complex well-formedness (WF) constraints [25]. Such instance models derived as representative examples [2] and counterexamples [18, 32] may serve as test cases or performance benchmarks for DSL modeling tools, model transformations or code generators [4]. Existing approaches dominantly use either a logic solver or a rule-based instance generator in the background.

Problem Statement. Model finding using logic solvers [16] (like SMT or SAT-solvers) is an effective technique (1) to identify inconsistencies of a DSL specification or (2) to generate well-formed sample instances of a DSL. This approach handles complex global WF constraints, whose evaluation requires accessing and querying several model elements. Model generation for graph structures needs to satisfy complex global structural constraints (a typical characteristic of DSLs), which restricts the direct use of logic, numerical and constraint solvers despite the existence of various encodings of graph structures into logic formulae. As the metamodel of an industrial DSL may contain hundreds of model elements, any realistic instance model should be of similar size. Unfortunately, this cannot currently be achieved by a single direct call to the underlying solver [17, 32], thus existing logic-based model generators fail to scale. Furthermore, logic solvers tend to retrieve simple, unrealistic models consisting of unconnected islands of model fragments and many isolated nodes, which is problematic in an industrial setting.

Rule-based instance generators [4, 13, 33] are effective in generating larger model instances by randomly applying mutation rules as independent modifications to the model. Such a rule-based approach offers better scalability for complex DSLs. These approaches may incorporate local WF constraints which can be evaluated in the context of a single model element (or within its 1-context). However, they fail to handle global WF constraints, which require accessing and navigating along a complex network of model elements. Since constraint evaluation is typically the final step of the generation process, the synthesized models may violate several WF constraints of the DSL in an industrial setting.

Contribution. The long term objective of our research is to synthesize large, well-formed and realistic models. In this paper, we propose an iterative process for incrementally generating valid instance models by calling existing logic solvers as black-box components using various abstractions and approximations to improve overall scalability. (1) First, we apply enhanced metamodel pruning [33] and partial instance models [32] to reduce the complexity of model generation subtasks and the retrieved partial solutions initiated in each step. (2) Then we propose an (over-)approximation technique for well-formedness constraints in order to interpret and evaluate them on partial (pruned) metamodels. (3) Finally, we define a workflow that incrementally generates a sequence of instance models by refining and extending partial models in multiple steps, where each step is an independent call to the underlying solver. We carried out experiments using the state-of-the-art Alloy Analyzer [16] to assess the scalability of our approach.

Added Value. Our approach increases the size of generated models by carefully controlling the information fed into and retrieved back from logic solvers in each step via abstractions. Each generated model (1) increases in size by only a handful of elements, (2) satisfies all WF constraints (on a certain level of abstraction), and (3) is realistic in the sense that each model is a single component (and not disconnected islands). The incremental derivation of the result set provides graceful degradation, i.e. if the back-end solver fails to synthesize models of size N (due to timeout), all previous model instances are still available. From a practical viewpoint, the DSL engineer can influence or assist the instance generation process by selecting the important fragment of the analyzed metamodel (the so-called effective metamodel [4]). This is also common practice for testing model transformations or code generators.

Structure of the Report. Next, Sect. 2 introduces some preliminaries for formalizing metamodels, constraints and partial snapshots. Our approach is presented in Sect. 3 followed by an initial experimental evaluation in Sect. 4. Related work is assessed in Sect. 5 while Sect. 6 concludes our paper.

2 Preliminaries

In this section we present an overview of model generation with logic solvers with a running case study of Yakindu statecharts. Yakindu Statecharts Tools [37] is an industrial integrated modeling environment developed by Itemis AG for the specification and development of reactive, event-driven systems based on the concept of statecharts captured in combined graphical and textual syntax. Yakindu simultaneously supports static validation of well-formedness constraints as well as simulation of (and code generation from) statechart models. A sample statechart is illustrated in Fig. 1. Yakindu provides two types of synchronization mechanisms: explicit synchronization nodes (marked as black rectangles) and event-based synchronization (i.e. raising and consuming events).

Fig. 1. Example Yakindu statechart with synchronisations.

Validation is crucial for domain-specific modelling tools to detect conceptual design flaws early and ensure that malformed models are not processed by tooling. Therefore, missing validation rules are considered bugs of the editor. While Yakindu is a stable modeling tool, it is still surprisingly easy to develop model instances as corner cases which satisfy all (implemented) well-formedness constraints of the language but crash the simulator or code generator due to synchronization issues. One such problem is depicted in Fig. 1 where (1) after 5 s a (2) timeout event is raised in region timer, but (3) it cannot be accepted in state wait in the simulator and in the generated code.

Our goal is to systematically synthesize such model instances by using logic solvers in the background by mapping DSL specifications to a logic problem [17, 32]. Such a model generation approach usually takes three inputs: (1) a metamodel of the domain (Sect. 2.1), (2) a set of well-formedness constraints of the language (Sect. 2.2), and optionally (3) a partial snapshot (Sect. 2.3) serving as an initial seed which generated models need to contain.

2.1 Domain Metamodel

Metamodels define the main concepts, relations and attributes of the target domain to specify the basic structure of the models. In this paper, the Eclipse Modeling Framework (EMF) is used for domain modeling, which is dominantly used in many industrial DSL tools and modeling environments. The main concepts are illustrated using Yakindu state graph metamodel [37] in Fig. 2.

Fig. 2. Metamodel extract from Yakindu state machines

A state machine consists of Regions, which in turn contain states (called Vertexes) and Transitions. An abstract state Vertex is further refined into RegularStates (like State) and PseudoStates like Entry and Synchronization states. Note that we intentionally kept the generalization hierarchy unchanged and simplified the original metamodel only by removing some elements. Metamodel elements are mapped to a set of logic relations as defined in [17, 32]:

  • Classes (CLS): In EMF, EClasses can be instantiated to EObjects, where the set of objects of a model is denoted by \( objects \). Additionally, the metamodel can specify finite types with a predefined set of \( enum =\{l_1,\ldots ,l_n\}\) literals by EEnums. For both classes and enums, if an object o is an instance of a type C it is denoted as \(\textsf {C}(o)\).

  • References (REF): EReferences between classes S and T capture a binary relation \(R(S,T)\) of the metamodel. When two objects o and t are in a relation R, an EReference is instantiated leading from o to t denoted as \(\textsf {R}(o,t)\).

  • Attributes (ATT): EAttributes enrich a class C with values of predefined primitive types like integers, strings, etc. by binary relations \(A(C,V)\). If an object o stores a value v as attribute A it is denoted as \(\textsf {A}(o,v)\).

Further structural restrictions implied by a metamodel (and formalized in [32]) include (1) Generalization (GEN) which expresses that a more specific (child) class has every structural feature of the more general (parent) class, (2) Type compliance (TC) that requires that for any relation \(\textsf {R}(o,t)\), its source and target objects o and t need to have compliant types, (3) Abstract (ABS): If a class is defined as abstract, it is not allowed to have direct instances, (4) Multiplicity (MUL) of structural features can be limited with upper and lower bound in the form of “lower..upper” and (5) Inverse (INV), which states that two parallel references of opposite direction always occur in pairs. EMF instance models are arranged into a strict containment hierarchy, which is a directed tree along relations marked in the metamodel as containment (e.g. regions or vertices).

An instance model M is an instance of a metamodel \( Meta \) (denoted with \(M \models Meta \)) if all the corresponding constraints above are satisfied, i.e. \( Meta = CLS \wedge REF \wedge \dots \wedge MUL \wedge INV \) [32]. Therefore a model generation task for a given size s and a metamodel \( Meta \) can be solved as a logic problem, where the solver creates an interpretation for all class predicates and all reference and attribute relations over the set of \( objects = \{o_1,\ldots ,o_s\}\) and the sets of enum literals, such that all structural constraints are satisfied.
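To make the structural constraints more tangible, the following minimal Python sketch checks a candidate instance model against a few of them (CLS, REF, GEN, TC, ABS and MUL). It is only an illustration under stated assumptions: the dictionary encoding, the names `is_subtype`, `conforms` and `META`, and the three-class metamodel fragment are hypothetical and are not taken from [32]; inverse references (INV) and the containment hierarchy are not checked. A solver-based generator searches for an interpretation that passes such checks, whereas this sketch only validates a given one.

```python
def is_subtype(parents, cls, ancestor):
    """GEN: walk up the generalization hierarchy starting from cls."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = parents.get(cls)
    return False


def conforms(meta, types, links):
    """Check CLS, ABS, REF, TC and MUL for a candidate instance model.

    types: object id -> class name
    links: set of (reference name, source object, target object)
    """
    parents, abstract, refs = meta["parents"], meta["abstract"], meta["refs"]
    for cls in types.values():
        # CLS: the class must exist; ABS: abstract classes have no direct instances
        if cls not in parents or cls in abstract:
            return False
    for ref, src, tgt in links:
        if ref not in refs:  # REF: only references declared in the metamodel
            return False
        s_cls, t_cls, _, _ = refs[ref]
        # TC: source and target objects need compliant types
        if not (is_subtype(parents, types[src], s_cls)
                and is_subtype(parents, types[tgt], t_cls)):
            return False
    for ref, (s_cls, _, lower, upper) in refs.items():
        for obj, cls in types.items():
            if is_subtype(parents, cls, s_cls):
                count = sum(1 for r, s, _ in links if r == ref and s == obj)
                # MUL: "lower..upper", where upper=None means unbounded
                if count < lower or (upper is not None and count > upper):
                    return False
    return True


# Hypothetical, heavily simplified fragment of the statechart metamodel.
META = {
    "parents": {"Region": None, "Vertex": None, "State": "Vertex"},
    "abstract": {"Vertex"},
    "refs": {"vertices": ("Region", "Vertex", 0, None)},
}

print(conforms(META, {"r": "Region", "s": "State"}, {("vertices", "r", "s")}))
```

In this encoding, a model that instantiates the abstract class `Vertex` directly, or that points a `vertices` link from a state to a region, is rejected.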

2.2 Well-Formedness Constraints

Structural well-formedness (WF) constraints (aka design rules or consistency rules) complement metamodels with additional restrictions that have to be satisfied by a valid instance model (in our case, statechart model). Such constraints are frequently defined by graph patterns [36] or OCL invariants [27]. To abstract from the actual constraint language, we assume in the paper that WF constraints are defined in first order logic. Given a set \( WF \) of well-formedness constraints, a model M is called valid if \(M \models Meta \wedge WF \).

Example. The Yakindu documentation states several constraints for statecharts including the following ones regulating the use of synchronization states. (Abbreviated names of classes and references are used as predicates).

  • \(\varPhi _1\) Source states of a synchronization have to be contained in different regions! \(\forall syn,s_1,s_2,t_1,t_2,r_1,r_2: \) \(( \textsf {Synchron}(syn) \wedge \textsf {outgoing}(s_1,t_1) \wedge \textsf {outgoing}(s_2,t_2) \wedge \textsf {target}(t_1,syn) \wedge \) \( \textsf {target}(t_2,syn) \wedge \textsf {vertices}(r_1,s_1) \wedge \textsf {vertices}(r_2,s_2) \wedge s_1 \ne s_2) \Rightarrow r_1 \ne r_2\)

  • \(\varPhi _2\) Source states of a synchronization are contained in the same parent state! \(\forall syn,s_1,s_2,t_1,t_2,r_1,r_2 \exists p: \) \(( \textsf {Synchron}(syn) \wedge \textsf {outgoing}(s_1,t_1) \wedge \textsf {outgoing}(s_2,t_2) \wedge \textsf {target}(t_1,syn) \wedge \) \( \textsf {target}(t_2,syn) \wedge \textsf {vertices}(r_1,s_1) \wedge \textsf {vertices}(r_2,s_2) \wedge s_1 \ne s_2) \) \( \Rightarrow (\textsf {regions}(p,r_1) \wedge \textsf {regions}(p,r_2))\)

  • \(\varPhi _3\) Target states of a synchronization have to be contained in different regions! \(\forall syn,s_1,s_2,t_1,t_2,r_1,r_2: \) \(( \textsf {Synchron}(syn) \wedge \textsf {incoming}(s_1,t_1) \wedge \textsf {incoming}(s_2,t_2) \wedge \textsf {source}(t_1,syn) \wedge \) \( \textsf {source}(t_2,syn) \wedge \textsf {vertices}(r_1,s_1) \wedge \textsf {vertices}(r_2,s_2) \wedge s_1 \ne s_2) \Rightarrow r_1 \ne r_2\)

  • \(\varPhi _4\) Target states of a synchronization are contained in the same parent state! \(\forall syn,s_1,s_2,t_1,t_2,r_1,r_2 \exists p: \) \(( \textsf {Synchron}(syn) \wedge \textsf {incoming}(s_1,t_1) \wedge \textsf {incoming}(s_2,t_2) \wedge \textsf {source}(t_1,syn) \wedge \) \( \textsf {source}(t_2,syn) \wedge \textsf {vertices}(r_1,s_1) \wedge \textsf {vertices}(r_2,s_2) \wedge s_1 \ne s_2) \) \( \Rightarrow (\textsf {regions}(p,r_1) \wedge \textsf {regions}(p,r_2))\)

  • \(\varPhi _5\) A synchronization shall have at least two incoming or outgoing transitions! \(\forall syn: \textsf {Synchron}(syn) \Rightarrow \exists t_1,t_2 :t_1 \ne t_2 \wedge ( (\textsf {incoming}(syn,t_1)\wedge \textsf {incoming}(syn,t_2)) \vee (\textsf {outgoing}(syn,t_1) \wedge \textsf {outgoing}(syn,t_2)))\)
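As an illustration of how such a global constraint can be evaluated on a concrete model, the sketch below checks \(\varPhi _1\) by brute force. It assumes (hypothetically) that each relation is given as a Python set of tuples in the same argument order as in the formula; the function name `phi1_violations` and the example data are not part of the Yakindu tooling.

```python
def phi1_violations(synchron, outgoing, target, vertices):
    """Return the witnesses (syn, s1, s2, r) that violate Phi_1.

    synchron: set of synchronization objects
    outgoing: set of (vertex, transition) pairs
    target:   set of (transition, vertex) pairs
    vertices: set of (region, vertex) pairs
    """
    violations = set()
    for syn in synchron:
        # Source states of syn: vertices with an outgoing transition that targets syn.
        sources = {s for (s, t) in outgoing if (t, syn) in target}
        for s1 in sources:
            for s2 in sources:
                if s1 == s2:
                    continue
                for (r1, v1) in vertices:
                    for (r2, v2) in vertices:
                        # Premise holds; the conclusion r1 != r2 must hold as well.
                        if v1 == s1 and v2 == s2 and r1 == r2:
                            violations.add((syn, s1, s2, r1))
    return violations


# Two source states in different regions: Phi_1 is satisfied.
vertices = {("r1", "a"), ("r2", "b"), ("r1", "c")}
outgoing = {("a", "ta"), ("b", "tb")}
target = {("ta", "syn"), ("tb", "syn")}
print(phi1_violations({"syn"}, outgoing, target, vertices))
```

Adding a third transition from state `c` (which shares region `r1` with `a`) to the same synchronization produces violations. The nested loops also show why such constraints are called global: evaluating one formula touches synchronizations, transitions, states and regions at once.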

2.3 Partial Snapshots

Partial Snapshots (PS) specify required instance model fragments of a metamodel [32]. A partial snapshot is a model constructed from the same classes and relations as a valid instance model. Formally, a PS satisfies the constraints \( CLS \), \( GEN \), \( REF \) and \( TC \), but it possibly violates \( ABS \), \( ATT \), \( MUL \) and \( INV \), which means that even abstract classes can be instantiated, and multiplicity constraints, the inverse relation of references and containment hierarchy rules might be violated. If a PS is a partial snapshot of a metamodel it is denoted by \(PS \models _{P} Meta \). A model M contains a partial snapshot PS (denoted with \(M \models PS\)) if there is a morphism \(m: PS \rightarrow M\) (composed of a pair of morphisms \( objects _{PS} \rightarrow objects _{M}\) and \( references _{PS} \rightarrow references _{M}\) for mapping objects and references) which satisfies the following constraints for each \(o_1 ,o_2 \in objects _{PS}\):

  1. m is injective: \(o_1 \ne o_2 \Rightarrow m(o_1)\ne m(o_2) \)

  2. For each class C the mapping preserves the type: \(\textsf {C}(o_1) \Rightarrow \textsf {C}(m(o_1))\)

  3. For each reference R the mapping preserves the source and the target of the reference: \(\textsf {R}(o_1,o_2) \Rightarrow \textsf {R}(m(o_1),m(o_2))\)

  4. For each attribute A the mapping preserves the attribute value v and the location: \(\textsf {A}(o_1,v) \Rightarrow \textsf {A}(m(o_1),v)\)
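The four conditions above can be checked directly on small models. The following sketch searches for a suitable morphism by enumerating injective object mappings; it is a naive illustration rather than the encoding used by a solver. The dictionary layout (`types`, `refs`, `attrs`) and the function name `contains_snapshot` are assumptions made for this example, and each object lists every class predicate that holds for it (including supertypes).

```python
from itertools import permutations


def contains_snapshot(ps, model):
    """Return an injective mapping m: PS -> M satisfying conditions 1-4, or None."""
    ps_objs = sorted(ps["types"])
    m_objs = sorted(model["types"])
    # permutations() yields tuples without repetition, so each mapping is injective (1).
    for image in permutations(m_objs, len(ps_objs)):
        m = dict(zip(ps_objs, image))
        # (2) every class predicate of o also holds for m(o)
        if not all(ps["types"][o] <= model["types"][m[o]] for o in ps_objs):
            continue
        # (3) references are preserved
        if not all((r, m[s], m[t]) in model["refs"] for (r, s, t) in ps["refs"]):
            continue
        # (4) attribute values are preserved
        if not all((a, m[o], v) in model["attrs"] for (a, o, v) in ps["attrs"]):
            continue
        return m
    return None


model = {
    "types": {"r": {"Region"}, "s1": {"Vertex", "State"}, "s2": {"Vertex", "State"}},
    "refs": {("vertices", "r", "s1"), ("vertices", "r", "s2")},
    "attrs": {("name", "s1", "wait")},
}
snapshot = {
    "types": {"x": {"Region"}, "y": {"Vertex"}},
    "refs": {("vertices", "x", "y")},
    "attrs": {("name", "y", "wait")},
}
print(contains_snapshot(snapshot, model))
```

Note that the snapshot object `y` is typed only by the abstract class `Vertex`, which is allowed for partial snapshots, while its image `s1` is a concrete `State`.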

A partial snapshot can be generalized from a regular (fully specified) instance model by relaxing specific properties identified by the DSL developer [32] to guide testing in practical cases. In the current paper, we create partial snapshots by iteratively reusing the instance models generated in a previous run to achieve incremental model generation (see Sect. 3.3).

3 Incremental Model Generation by Approximations

Despite the precise definition of logic formulae for our statechart language using existing mappings [32], a major practical drawback is that a direct (single step) model generation using Z3 or Alloy as back-end solver only terminates for very small model sizes. If we aim to improve scalability by omitting certain constraints, the synthesized models are no longer well-formed thus they cannot be fed into Yakindu as sample models.

To increase the size of synthesized models while still keeping them well-formed, we propose an incremental model generation approach (Sect. 3.3) by iterative calls to backend solvers exploiting two enabling techniques of metamodel pruning (Sect. 3.1) and constraint approximation (Sect. 3.2).

3.1 Metamodel Pruning

Metamodel pruning [13, 33] takes a metamodel Meta as input and derives a simplified (pruned) metamodel \(Meta_P\) as output by removing some EClasses, EReferences and EAttributes. When removing a class from a metamodel, we need to remove all subclasses, all attributes and incoming or outgoing references to obtain a consistent pruned metamodel. Formally, we may iteratively remove certain predicates from Meta by pruning as follows:

  • EReference: if \(R(S,T) \in Meta\) then \(R(S,T) \not \in Meta_P\);

  • EAttributes: if \(A(C,V) \in Meta\) then \(A(C,V) \not \in Meta_P\);

  • EClasses: if \(C \in Meta\) and \(sub(C,Sub) \not \in Meta_P\) and \(A(C,V) \not \in Meta_P\) and \(R(C,T) \not \in Meta_P\) and \(R(S,C) \not \in Meta_P\) then \(C \not \in Meta_P\);
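A minimal sketch of class removal, assuming a metamodel stored as plain Python sets, is given below. The function `prune_class` and the dictionary layout are hypothetical; the example metamodel loosely follows the class and reference names used in this paper, while the reference end types and the `name` attribute are illustrative guesses. Removing a class also drops its subclasses, its attributes and every reference that starts or ends at a removed class, so the result stays consistent.

```python
def prune_class(meta, cls):
    """Return a pruned copy of meta without cls and everything depending on it."""
    doomed = {cls}
    changed = True
    while changed:  # collect all direct and indirect subclasses
        changed = False
        for parent, child in meta["sub"]:
            if parent in doomed and child not in doomed:
                doomed.add(child)
                changed = True
    return {
        "classes": meta["classes"] - doomed,
        "sub": {(p, c) for (p, c) in meta["sub"] if p not in doomed and c not in doomed},
        "attrs": {(a, c) for (a, c) in meta["attrs"] if c not in doomed},
        "refs": {(r, s, t) for (r, s, t) in meta["refs"]
                 if s not in doomed and t not in doomed},
    }


STATECHART = {
    "classes": {"Region", "Vertex", "RegularState", "State", "Pseudostate",
                "Entry", "Synchronization", "Transition"},
    "sub": {("Vertex", "RegularState"), ("RegularState", "State"),
            ("Vertex", "Pseudostate"), ("Pseudostate", "Entry"),
            ("Pseudostate", "Synchronization")},
    "attrs": {("name", "Vertex")},
    "refs": {("vertices", "Region", "Vertex"), ("regions", "State", "Region"),
             ("outgoing", "Vertex", "Transition"), ("incoming", "Vertex", "Transition"),
             ("source", "Transition", "Vertex"), ("target", "Transition", "Vertex")},
}

pruned = prune_class(prune_class(STATECHART, "Transition"), "Pseudostate")
print(sorted(pruned["classes"]))
print(sorted(r for (r, _, _) in pruned["refs"]))
```

After both removals only the state hierarchy remains, with `vertices` and `regions` as the surviving references.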

Fig. 3. Metamodel pruning with overapproximation

Example. We prune our statechart metamodel in two phases (see the slices in Fig. 2): classes Trigger, Guard and Action are omitted together with incoming references (Stage II), and then classes Transition, Pseudostate, Entry and Synchronization are removed (Stage I).

By using metamodel pruning, we first aim to generate valid instance models for the pruned metamodel and then extend them to valid instance models of the original larger metamodel. For that purpose, we exploit a property we call the overapproximation property of metamodel pruning (see Fig. 3), which ensures that if there exists a valid instance model M for a metamodel \( Meta \) (formally, \(M \models Meta \)) then there exists a valid instance model \(M_P\) for the pruned metamodel \( Meta _P\) (formally, \(M_P \models Meta_P \)) such that \(M_P\) is a partial snapshot of M (\(M_P \subseteq M\)). Consequently, if a model generation problem is unsatisfiable for the pruned metamodel, then it remains unsatisfiable for the larger metamodel. However, we may derive a pruned instance model \(M_P\) which cannot be completed in the full metamodel \( Meta \), which is called a false positive.

Example. The statechart model in the middle of Fig. 3 corresponds to the pruned metamodel after Stage II. In our example, it can be extended by adding transitions and entry states to the model illustrated in the right side of Fig. 3, which now corresponds to the pruned metamodel of Stage I.

3.2 Constraint Pruning and Approximation

When removing certain metamodel elements by pruning, related structural constraints (such as multiplicity, inverse, etc.) can be automatically removed, which trivially fulfills the overapproximation property. However, the treatment of additional well-formedness constraints needs special care, since simple automated removal would significantly increase the rate of false positives in a later phase of model generation to such an extent that no intermediate models can be extended to a valid final model.

Based on some first-order logic representation of the constraints (derived e.g. in accordance with [32]), we propose to maintain approximated versions of constraint sets during metamodel pruning. In order to investigate the interrelations of constraints, we assume that logical consequences of a constraint set can be derived manually by experts or automatically by theorem provers [21]. The actual derivation approach falls outside the scope of the current paper. Given a DSL specification with a metamodel \( Meta \) and a set of WF constraints \( WF = \{\varPhi _1, \dots , \varPhi _n\}\), let \(\varPhi \) be a formula derived as a theorem \( WF \vdash \varPhi \).

Now an overapproximation of formula \(\varPhi \) over metamodel \( Meta \) for a pruned metamodel \( Meta _P\) is a formula \(\varPhi _P\) such that (1) \(\varPhi \Rightarrow \varPhi _P\), and (2) \(\varPhi _P\) contains symbols only from \( Meta _P\). The details of approximation are illustrated in Fig. 4 where R denotes a relation symbol derived for class or reference predicates in accordance with the metamodel. While more precise approximations can possibly be defined in the future, the current approximation is logically correct: if a model generation problem is unsatisfiable for an approximated set of constraints (over the pruned metamodel), then it remains unsatisfiable for the original set of constraints.

Fig. 4. Constraint pruning and approximation

Example. Based on the set of WF constraints \(\{\varPhi _1, \varPhi _2, \varPhi _3, \varPhi _4, \varPhi _5 \}\) defined in Sect. 2.2, a prover can derive the following formula as a theorem over the metamodel of Stage II: \(\varPhi _{syncout} \vee \varPhi _{syncin}\), where \(\varPhi _1,\varPhi _5\models \varPhi _{syncout} \vee \varPhi _{syncin}\). The generated theorem \(\varPhi _{syncout}\) (and \(\varPhi _{syncin}\)) restricts the number of outgoing (ingoing) transitions from (to) a synchronization as follows:

\(\varPhi _{syncout} = \forall syn \exists \underline{t_1, t_2}, s_1, s_2, r_1, r_2, p: \textsf {Synchron}(syn) \Rightarrow \)

\( (\underline{\textsf {outgoing}(syn,t_1)} \wedge \underline{\textsf {target}(t_1,s_1)} \wedge \underline{\textsf {outgoing}(syn,t_2)} \wedge \underline{\textsf {target}(t_2,s_2)} \wedge s_1 \ne s_2 \wedge \)

\( \textsf {vertices}(r_1,s_1) \wedge \textsf {vertices}(r_2,s_2) \wedge r_1 \ne r_2 \wedge \textsf {regions}(p,r_1) \wedge \textsf {regions}(p,r_2))\)

The variables and relations approximated in this phase are underlined: in Stage I, model generation is restricted by omitting transitions. The result of overapproximation states that if a model contains a synchronization, then it needs to contain at least two regions:

\(\varPhi _{syncout}^{O} \vee \varPhi _{syncin}^{O} = \forall syn \exists s_1, s_2, r_1, r_2, p: \textsf {Synchron}(syn) \Rightarrow \)

\( (s_1 \ne s_2 \wedge \textsf {vertices}(r_1,s_1) \wedge \textsf {vertices}(r_2,s_2) \wedge r_1 \ne r_2 \wedge \textsf {regions}(p,r_1) \wedge \textsf {regions}(p,r_2))\)

Applying the approximation rules of Fig. 4 directly on \(\{\varPhi _1, \varPhi _5\}\) would lead to \(\varPhi _1^{O}: true \) and \(\varPhi _5^{O}: true \). These constraints are too coarse overapproximations providing no useful information to the model generator at this phase.
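One simple way to obtain a formula \(\varPhi _P\) with \(\varPhi \Rightarrow \varPhi _P\) is sketched below. This is not a transcription of the rules in Fig. 4; it is an assumed, generic technique: for a formula in negation normal form (negation only in front of atoms, implications rewritten as disjunctions), replacing every literal over a pruned symbol by true can only weaken the formula. The tuple-based formula encoding and the name `overapprox` are hypothetical, and quantifiers are simplified under the assumption of a non-empty domain.

```python
def overapprox(f, pruned):
    """Weaken an NNF formula by replacing literals over pruned symbols with True.

    Formulas are True/False or tuples:
      ("atom", name, args), ("not", atom),
      ("and", f, g), ("or", f, g), ("forall", var, f), ("exists", var, f)
    """
    if f is True or f is False:
        return f
    tag = f[0]
    if tag == "atom":
        return True if f[1] in pruned else f
    if tag == "not":  # NNF: the operand is always an atom
        return True if f[1][1] in pruned else f
    if tag in ("and", "or"):
        a, b = overapprox(f[1], pruned), overapprox(f[2], pruned)
        if tag == "and":
            if a is False or b is False:
                return False
            if a is True:
                return b
            if b is True:
                return a
        else:
            if a is True or b is True:
                return True
            if a is False:
                return b
            if b is False:
                return a
        return (tag, a, b)
    if tag in ("forall", "exists"):
        body = overapprox(f[2], pruned)
        # Over a non-empty domain, quantifying a constant leaves it unchanged.
        return body if body is True or body is False else (tag, f[1], body)
    raise ValueError(f"unknown formula node: {tag!r}")


not_sync = ("not", ("atom", "Synchron", ("syn",)))
# Shortened shape of Phi_1: the premise becomes a disjunction of negated atoms.
phi1_like = ("forall", "syn", ("or", not_sync,
             ("or", ("not", ("atom", "outgoing", ("s1", "t1"))),
                    ("atom", "neq", ("r1", "r2")))))
# Shortened shape of Phi_syncout: the pruned atoms occur positively in the conclusion.
syncout_like = ("forall", "syn", ("or", not_sync,
                ("and", ("atom", "outgoing", ("syn", "t1")),
                        ("atom", "vertices", ("r1", "s1")))))

print(overapprox(phi1_like, {"outgoing", "target"}))
print(overapprox(syncout_like, {"outgoing", "target"}))
```

The first formula collapses to true because a pruned atom occurs in its premise, whereas the second keeps its constraint on `vertices`. This mirrors the observation above that approximating the derived theorem is more informative than approximating \(\varPhi _1\) and \(\varPhi _5\) directly.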

3.3 Incremental Model Generation by Iterative Solver Calls

By using metamodel pruning, we first aim to generate valid instance models for the pruned metamodel, which is a simplified problem for the underlying logic solver. Instance models of increasing size will be gradually generated by using valid models of the pruned metamodel as partial snapshots (i.e. initial seeds) for generating instances for a larger metamodel. Therefore, an incremental model generation task is also given with a target size s and a target metamodel \( Meta \), but with an additional partial snapshot \(M_P\). \(M_P\) is a valid instance of pruned metamodel \( Meta _P\). \(M_P\) has \(s_P\) number of objects (\(s_P \le s\)).

From a logic perspective, the partial snapshot defines a partial interpretation of relations for model generation, which may simplify the task of the solver compared to using fully uninterpreted relations. In order to exploit this additional information, the relations in the logic problem are partitioned into two sets of interpreted and uninterpreted symbols. \( objects _{P} = \{o_1,\ldots ,o_{s_P}\}\) are the objects in the partial snapshot. The extra objects to be generated in this step are denoted by \( objects _{N} = \{o_{s_P+1},\ldots ,o_{s}\}\). The relations are partitioned according to the following rules:

  • Classes (CLS): Each class predicate \(\textsf {C}(o)\) in Meta is separated into two: a fully interpreted \(C_{O}(o)\) predicate for the objects in the partial snapshot \( objects _{P}\), and an uninterpreted \(C_{N}(o)\) for the newly generated objects \( objects _{N}\). Therefore an object o is instance of a class C in the generated model if \(C_{O}(o) \vee C_{N}(o)\) is satisfied. If the class is not in the pruned metamodel (\(C \not \in Meta_P\)) then \(C_{O}(o)\) is to be omitted, and if no new elements are created from a class then \(C_{N}(o)\) can be omitted.

  • References (REF): Each reference predicate \(\textsf {R}(o,t)\) is separated into four categories: a fully interpreted \(R_{OO}(o,t)\) between the objects of the partial snapshot (\( objects _{P}\)), an uninterpreted \(R_{NN}(o,t)\) between the newly created objects (\( objects _{N}\)), and two additional uninterpreted relations \(R_{ON}(o,t)\) and \(R_{NO}(o,t)\) connecting the elements of the partial snapshot with the newly created elements (relations over \( objects _{P}\times objects _{N}\) and \( objects _{N}\times objects _{P}\) respectively). Therefore a reference \(\textsf {R}(o,t)\) exists in the generated model if \(R_{OO}(o,t) \vee R_{NN}(o,t) \vee R_{NO}(o,t) \vee R_{ON}(o,t)\). If the relation is not in the pruned metamodel (\(R \not \in Meta_P\)) then \(R_{OO}(o,t)\) can be omitted, and if no new elements are created from a class then \(R_{NN}(o,t)\), \(R_{NO}(o,t)\) and \(R_{ON}(o,t)\) can also be omitted.

  • Attributes (ATT): Attribute predicates are separated into a fully interpreted \(A_{O}(o,v)\) for the objects in the partial snapshots \( objects _{P}\), and an uninterpreted relation \(A_{N}(o,v)\) for the newly created elements \( objects _{N}\). An object o has an attribute value v (\(\textsf {A}(o,v)\)) if \(A_{O}(o,v) \vee A_{N}(o,v)\). Attribute predicates are treated as reference predicates for omission.
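The partitioning of a reference relation can be sketched as follows. The function names `partition_reference` and `holds` are hypothetical, and plain Python sets stand in for the solver's relations: the interpreted part \(R_{OO}\) is fixed by the partial snapshot, while the other three parts are only given as the domains over which the solver may still choose tuples.

```python
def partition_reference(r_snapshot, objects_p, objects_n):
    """Split reference R into the fixed part R_OO and the domains of the open parts."""
    r_oo = {(o, t) for (o, t) in r_snapshot if o in objects_p and t in objects_p}
    dom_nn = {(o, t) for o in objects_n for t in objects_n}  # new -> new
    dom_on = {(o, t) for o in objects_p for t in objects_n}  # old -> new
    dom_no = {(o, t) for o in objects_n for t in objects_p}  # new -> old
    return r_oo, dom_nn, dom_on, dom_no


def holds(pair, r_oo, r_nn, r_on, r_no):
    """R(o,t) holds iff R_OO(o,t) or R_NN(o,t) or R_NO(o,t) or R_ON(o,t)."""
    return pair in r_oo or pair in r_nn or pair in r_no or pair in r_on


objects_p = {"o1", "o2", "o3"}  # objects of the partial snapshot
objects_n = {"o4", "o5"}        # objects to be created in this step
r_oo, dom_nn, dom_on, dom_no = partition_reference({("o1", "o2")}, objects_p, objects_n)

open_pairs = len(dom_nn) + len(dom_on) + len(dom_no)
all_pairs = (len(objects_p) + len(objects_n)) ** 2
print(open_pairs, "of", all_pairs, "pairs remain undecided")
```

In this toy setting, 16 of the 25 possible pairs are left for the solver to decide, because the pairs between snapshot objects are already fixed. How much this helps a real solver in practice is not implied by the count alone.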

Unfortunately, the level of incrementality is still limited in one important aspect. The background solvers typically provide no direct control over the simultaneous creation of new elements, i.e. we cannot provide domain-specific hints to the solver when the creation of an object always depends on the creation or existence of another object. This can still cause issues when a multitude of WF constraints are defined.

Example. In our running example, the instance models are generated in four steps, which is illustrated in Fig. 5. First, initial seeds are generated for the state hierarchy (\(M_1\) over \( Meta _1\)), which are extended in the second step to model \(M_2\) with the same metamodel elements. Then the metamodel is extended to \( Meta _2\), and the transitions and the initial states are added to model \(M_3\). Finally, triggers, guards and actions can be added to the model to obtain \(M_4\).

Fig. 5. Model generation iterations
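The overall workflow can be summarized by the following sketch. The solver is passed in as a callable so that any back-end can be plugged in; the function names `generate_incrementally` and `stub_solver`, the step tuples and the size limit of the stub are assumptions made for this illustration and do not reflect the Alloy Analyzer API. The loop keeps every model derived so far, which corresponds to the graceful degradation described in the introduction.

```python
def generate_incrementally(solve, steps, seed=None):
    """Run one independent solver call per step, reusing the last model as seed.

    solve: callable (metamodel, constraints, size, snapshot) -> model or None
    steps: list of (metamodel, constraints, size) with increasing size
    """
    results = []
    snapshot = seed
    for metamodel, constraints, size in steps:
        model = solve(metamodel, constraints, size, snapshot)
        if model is None:  # timeout or unsatisfiable: stop, keep earlier models
            break
        results.append(model)
        snapshot = model   # the new model becomes the next partial snapshot
    return results


def stub_solver(metamodel, constraints, size, snapshot):
    """Stand-in for a real back-end: pretends to time out above 40 objects."""
    if size > 40:
        return None
    return {"metamodel": metamodel, "size": size,
            "seed_size": snapshot["size"] if snapshot else 0}


steps = [("Meta1", ["approximated WF"], 20), ("Meta1", ["approximated WF"], 40),
         ("Meta2", ["WF"], 60), ("Meta2", ["WF"], 80)]
results = generate_incrementally(stub_solver, steps)
print([m["size"] for m in results])
```

With the stub above, the third call fails and the two models derived before it remain available.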

4 Measurements

In order to assess the effectiveness of incremental model generation using constraint approximation for synthesizing well-formed instance models for domain-specific languages, we conducted some initial experiments using the Alloy Analyzer as background solver. We were interested in the following questions:

  • Is incremental model generation with metamodel pruning and constraint approximation effective in increasing the size of models, the success rate or decreasing the runtime of the solver?

  • Is incremental model generation still effective if metamodel pruning or constraint approximation is excluded?

Configurations. We conducted measurements on two versions of the Yakindu statechart metamodel: Phase 1 and Phase 2 (see Fig. 2). The pruned metamodel of Phase 1 (\( MM1 \)) contains 8 classes and 2 references, and no well-formedness constraints by default. The metamodel of Phase 2 (\( MM2 \)) contains 10 classes, 4 references and 8 constraints (including the 5 WF constraints listed in the paper and 3 more for restricting entry states).

  • As a base configuration, the Alloy Analyzer is executed separately for the two problems with 1 min timeout. We record two cases: the largest model derived and a slightly larger model size where timeout was observed.

  • Next, we run the solver incrementally with an initial model of size N and an increment of size K denoted as \(N+K\) in Fig. 6 without constraint approximation but with metamodel pruning. Moreover, instance models derived for Phase 1 are used as partial snapshots for Phase 2.

  • Then we run the solver incrementally with constraint approximation but without metamodel pruning. For that purpose, the constraint set for Phase 1 contains two approximated constraints: (1) Each region has a state to which the entry state will point, and (2) There are orthogonal states in the model. Again, instance models derived for Phase 1 are used as partial snapshots for Phase 2, but the full metamodel is considered in Phase 2.

  • Finally, we configure the solver for full incrementality with constraint approximation and metamodel pruning by reusing instances of Phase 1 as partial snapshots in Phase 2.

Measurement Setup. Each model generation task was executed on the DSL presented in this paper 5 times using the Alloy Analyzer (with the SAT4j solver), then the median of the execution times was calculated. The measurements were executed with a one-minute timeout on an average personal computer (see Footnote 1). We measure the runtime of model generation, the model size denoting the maximal number of elements the derived model may contain, and the success rate denoting the percentage of cases when a well-formed model was derived, i.e. one which satisfies all WF constraints within the given search scope.

Measurement Results. Results of our measurements are summarized in Fig. 6. We summarize our observations below.

Fig. 6. Measurement results

  • Base. For \( MM1 \), Alloy was able to generate models with up to 60 objects. As there are no constraints at this level, many synchronizations are created (about half of the objects were synchronizations, with only 5–10 states). Above 60 objects, the runtime grows rapidly as the SAT solver runs out of the maximal 4 GB memory. For \( MM2 \), Alloy was unable to create any models that satisfy all of the constraints as the search scope turned out to be too small to create valid models with synchronizations.

  • W/o Approx. Alloy was able to generate models with 100 elements in two steps where each iterative step had comparable runtime. However, since no constraints are considered for \( MM1 \), Alloy failed to extend partial snapshots of \( MM1 \) to well-formed models for \( MM2 \) (success rate: 0 %, although for this specific case, we executed over 100 runs of the solver due to the unexpectedly low success rate). Furthermore, we had to reduce the scope of search to 20 and 30 new elements with types taken from \( MM2 \setminus MM1 \) due to timeouts.

  • W/o Prune. When metamodel pruning was excluded but approximated constraints were included for \( MM1 \), model generation succeeded for 100 elements, but extending them to models of \( MM2 \) failed (as in this case, new elements could be instances of any type from \( MM2 \)).

  • Full. With incremental model generation by combining metamodel pruning and constraint approximation, we were able to generate well-formed models for both \( MM1 \) and \( MM2 \), which was the only successful case for the latter.

Analysis of Results. While we used a reasonably sized statechart metamodel extracted from a real modeling tool (including everything to model state machines, but excluding imports and namespacing), we avoid drawing generic conclusions for the exact scalability of our results. Instead, we summarize some negative results which are hardly specific to the chosen example:

  • Mapping a model generation problem to Alloy and running the Alloy Analyzer by itself will likely fail to derive useful results for practical metamodels, especially in the presence of complex well-formedness constraints. Our observation is that many objects need to be created at the same time in a consistent way, which cannot be efficiently handled by the underlying solver (either the scope is too small or it runs out of memory). Altogether, the Alloy Analyzer was more effective in finding a consistent model instance than in proving that a problem is inconsistent and thus has no solutions.

  • An incremental approach with metamodel pruning but without constraint approximation will increase the overall size of the derived models, but the false positive rate would quickly increase.

  • An incremental approach without metamodel pruning but with constraint approximation will likely have the same pitfalls as the original Alloy case: either the scope of search will become insufficient, or we run out of memory.

  • Combining incremental model generation with metamodel pruning and constraint approximation is promising as a concept, as it significantly improved on the baseline case. However, the underlying solver was still not sufficiently powerful to guarantee scalability for complex industrial cases.

5 Related Work

We compared our solution with existing model generation techniques with respect to the characteristics of their inputs and outputs in Table 1. As for inputs, model generation can be (1) initiated from a partial snapshot and (2) focused on an effective metamodel. Additionally, an approach may support (3) local and (4) global well-formedness constraints: a local constraint accesses only the attributes and the outgoing references of an object, while a global constraint specifies a complex structural pattern. Local constraints are frequently attached to objects (e.g. in UML class diagrams), while global constraints are widely used in domain-specific modeling languages. As for outputs, the generated models may (i) be metamodel-compliant and (ii) satisfy all well-formedness constraints of the language. When generated models are intended to be used as test cases, some approaches may guarantee a certain level of coverage or (iii) diversity. We consider a technique (iv) scalable if there is no hard limit on the model size (as demonstrated in the respective papers). Finally, a model generation approach may be (v) decidable, i.e. it always terminates with a result. Our comparison excludes approaches which do not guarantee metamodel-compliance of generated instance models.
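The distinction between local and global constraints can be made concrete with a small sketch. The `State` class and both constraints below are hypothetical illustrations, not taken from the statechart metamodel used in the evaluation. The local check reads only one object's attributes and outgoing references, whereas the global check has to navigate the whole model.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    name: str
    is_initial: bool = False
    targets: list["State"] = field(default_factory=list)  # outgoing transitions


def local_ok(state: State) -> bool:
    """Local constraint (hypothetical): an initial state has exactly one
    outgoing transition. Only the object itself is inspected."""
    return (not state.is_initial) or len(state.targets) == 1


def global_ok(states: list[State]) -> bool:
    """Global constraint (hypothetical): every state is reachable from an
    initial state. Evaluation navigates along a network of objects."""
    reached: set[str] = set()
    stack = [s for s in states if s.is_initial]
    while stack:
        s = stack.pop()
        if s.name not in reached:
            reached.add(s.name)
            stack.extend(s.targets)
    return all(s.name in reached for s in states)


# A model in which every object passes the local check,
# yet the model as a whole violates the global one.
init, a, b = State("init", is_initial=True), State("a"), State("b")
init.targets = [a]
model = [init, a, b]
locally_valid = all(local_ok(s) for s in model)
globally_valid = global_ok(model)
```

Here state `b` is unreachable, so a generator that checks only local constraints would accept a model that the global constraint rejects.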

Table 1. Comparison of related approaches

Logic Solver Approaches. Several approaches map a model generation problem (captured by a metamodel, partial snapshots, and a set of WF constraints) into a logic problem, which is solved by an underlying SAT/SMT solver. Complete frameworks with standalone specification languages include Formula [17] (which uses the Z3 SMT solver [26]), Alloy [16] (which relies on SAT solvers like Sat4j [23]) and Clafer [2] (using backend reasoners like Alloy).

There are several approaches aiming to validate standardized engineering models enriched with OCL constraints [14] by relying upon different back-end logic-based approaches such as constraint logic programming [6, 8, 9], SAT-based model finders (like Alloy) [1, 7, 22, 34, 35], first-order logic [3], constructive query containment [28], higher-order logic [5, 15], or rewriting logics [10].

Partial snapshots and WF constraints can be uniformly represented as constraints [32], but metamodel pruning is not typical. Growing models are supported in [19] for a limited set of constraints. The scalability of all these approaches is limited to small models or counterexamples. Furthermore, these approaches are either a priori bounded (where the search space needs to be restricted explicitly) or they have decidability issues.

The main difference of our current approach is its iterative derivation of models and the approximative handling of metamodels and constraints. However, our approach is independent from the actual mapping of constraints to logic formulae, thus it could potentially be integrated with most of the above techniques.

Uncertain Models. Partial models are also similar to uncertain models, which offer a rich specification language [12, 29] amenable to analysis. Uncertain models provide a more expressive language compared to partial snapshots but without handling additional WF constraints. Such models document semantic variation points generically by annotations on a regular instance model, which are gradually resolved during the generation of concrete models. An uncertain model is more complex (or informative) than a concrete one, thus an a priori upper bound exists for the derivation, which is not an assumption in our case.

Potential concrete models compliant with an uncertain model can be synthesized by the Alloy Analyzer [31], or refined by graph transformation rules [30]. Each concrete model is derived in a single step, thus these approaches are not iterative like ours. Scalability analysis is omitted from the respective papers, but refinement of uncertain models is always decidable.

Rule-based Instance Generators. A different class of model generators relies on rule-based synthesis driven by randomized, statistical or metamodel coverage information for testing purposes [4, 13]. Some approaches support the calculation of effective metamodels [33], but partial snapshots are excluded from input specifications. Moreover, WF constraints are restricted to local constraints evaluated on individual objects while global constraints of a DSL are not supported. On the positive side, these approaches guarantee the diversity of models and scale well in practice.

Iterative Approaches. An iterative approach is proposed specifically for allocation problems in [20] based on Formula. Models are generated in two steps to increase the diversity of results. First, non-isomorphic submodels are created only from an effective metamodel fragment. Diversity between submodels is achieved by a problem-specific symmetry-breaking predicate [11] which ensures that no isomorphic model is generated twice. In the second step, the algorithm completes the different submodels according to the full model, but constraints are only checked at the very final stage. This is a key difference from our approach, where an approximation of constraints is checked at each step, which reduces the number of inconsistent intermediate models. An iterative, counter-example guided synthesis is proposed for higher-order logic formulae in [24], but the size of derived models is fixed.

6 Conclusion and Future Work

The validation of DSL tools frequently necessitates the synthesis of well-formed and realistic instance models, which satisfy the language specification. In this paper, we proposed an incremental model generation approach which (1) iteratively calls black-box logic solvers to guarantee well-formedness by (2) feeding instance models obtained in a previous step as partial snapshots (compulsory model fragments) to a subsequent phase to limit the number of new elements, and using (3) various approximations of metamodels and constraints. Our initial experiments show that significantly larger model instances can be generated with the same solvers using such an incremental approach, especially in the presence of complex well-formedness constraints.

However, some of our experimental results are negative in the sense that the proposed iterative approach is still not scalable enough to derive large model instances of complex industrial languages due to restrictions of the underlying Alloy Analyzer and the SAT solver libraries. We believe that dedicated decision procedures and heuristics for graph models would be beneficial in the long run to improve the performance of model generation.

As future work, we aim to generate a structurally diverse set of test cases by enumerating different possible extensions of a partial snapshot in each iteration step. Additionally, we plan to evaluate other underlying solvers as well as further approximations and strategies for deriving relevant formulae as logical consequences of constraints. Finally, we will investigate whether the metamodel partitions and the iteration steps can be created automatically, thus yielding a (semi-)automated process with improved DSL-specific heuristics.