Implementing QVT-R via semantic interpretation in UML-RSDS

The QVT-Relations (QVT-R) model transformation language is an OMG standard notation for model transformation specification. It is highly declarative and supports (in principle) bidirectional (bx) transformation specification. However, there are many unclear or unsatisfactory aspects to its semantics, which is not precisely defined in the standard. UML-RSDS is an executable subset of UML and OCL. It has a precise mathematical semantics and criteria for ensuring correctness of applications (including model transformations) by construction. There is extensive tool support for verification and for production of 3GL code in multiple languages (Java, C#, C++, C, Swift and Python). In this paper, we define a translation from QVT-R into UML-RSDS, which provides a logically oriented semantics for QVT-R, aligned with the RelToCore mapping semantics in the QVT standard. The translation includes variation points to enable specialised semantics to be selected in particular transformation cases. The translation provides a basis for verification and static analysis of QVT-R specifications and also enables the production of efficient code implementations of QVT-R specifications. We evaluate the approach by applying it to solve benchmark examples of bx.


Introduction
Model transformations (MT) are used in model-driven engineering (MDE) to map data of a source model src to a target model trg, where the models conform to particular source and target metamodels/languages SL and TL. Transformation specifications (e.g. in QVT-R, ATL or UML-RSDS) typically consist of a collection of transformation rules, each of which is concerned with mapping source elements of one or more SL classes to target elements of one or more TL classes.
A unidirectional transformation τ can only be executed in one direction (from SL models to TL models, or from TL models to SL models) whilst a bidirectional transformation (or bx) has both forward τ→ and reverse τ← mappings, derived from the same transformation specification τ. The reverse mapping operates on models trg of TL to produce models src of SL. Bidirectional transformations are not necessarily bijective as functions.
Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s10270-020-00824-3) contains supplementary material, which is available to authorized users. K. Lano, kevin.lano@kcl.ac.uk. 1 King's College London, London, UK. 2 University of Isfahan, Isfahan, Iran.
Two execution modes can be distinguished for either forward or reverse transformation directions:
• Batch-mode execution, where an empty trg (or src) model is populated from a src (or trg) model in one complete execution.
• Incremental-mode execution, where incremental changes to a src (or trg) model are propagated to changes to an already populated trg (or src) model.
Changes can be: creation/deletion of elements; reassignment of 1-multiplicity features; addition/removal of elements from other features. In this paper, we address source-to-target incremental change propagation in the sense of [10].
Apart from models, incremental-mode execution could also make use of persistent traces, which record src-trg correspondences established by previous executions of the transformation. The QVT-Relations (QVT-R) language is an OMG standard for model transformation specification. The latest cur-
Although QVT-R has been widely used in research, there remain several limitations with the language which prevent wider industrial adoption:
• Incompleteness in the semantics makes it difficult to verify transformations, or to systematically design bx [36].
• Unclear semantics for update-in-place transformations makes it difficult to define and use such transformations. The combination of bx and update-in-place execution has not been developed [41].
• Tool support is incomplete, with the only mature tool, Medini QVT [11], no longer actively maintained. The tool uses a restricted version of QVT-R, with a variant semantics which has not been formalised.
In this paper, we aim to address these defects by providing a translation from QVT-R into the UML-RSDS formalism [18], which is a subset of UML with a formal semantics and extensive tool support. UML-RSDS directly supports transformation analysis and update-in-place execution of transformations, and efficient execution via code generation in 3GLs. The translation has itself been formalised in UML-RSDS. UML-RSDS is implemented in the Eclipse Agile UML tools (https://projects.eclipse.org/projects/modeling.agileuml). A guide to using the QVT2UMLRSDS translator is at [26].
The overall process which we use to analyse and implement QVT-R transformations is illustrated in Fig. 1. All of the steps are automated, although there may be user choices to be made in the synthesis of designs. The present paper concerns the semantic derivation and analysis steps; the design synthesis and code generation steps have been previously described [18,19,22]. Section 2 gives an overview of QVT-R. Section 3 highlights some of the issues which remain unresolved regarding the language semantics and implementation. Section 4 gives an overview of UML-RSDS. Section 5 defines the translation from QVT-R to UML-RSDS for separate-models transformations. Section 6 describes semantic analysis techniques for the translated transformations. Section 7 considers the use of design patterns in QVT-R and UML-RSDS. Section 8 gives an evaluation of the approach by applying it to a number of bx benchmarks from [36]. Section 9 compares our approach to other related work.
In the appendix, "Appendix A" defines the mapping from QVT-R to UML-RSDS for update-in-place transformations. "Appendix B" defines the logical interpretation of QVT-R domains. "Appendix C" gives the definitions of read and write frames of predicates, and of their procedural interpretations. "Appendix D" gives the interpretation of relation overriding and transformation extension. "Appendix E" (in supplementary material) gives the detailed evaluation results on 10 case studies.

QVT-R
QVT-R is one of the three MT languages defined in the QVT standard [30], the others being QVT Core (QVT-C) and QVT Operational (QVT-O). QVT-R is intended to enable transformation developers to write high-level and declarative transformation specifications, including bidirectional (bx) and multidirectional transformations, supporting both batch and incremental-mode execution. QVT-O is a unidirectional language oriented towards an imperative style, whilst QVT-C is mainly used as a low-level target language into which QVT-R or other transformation languages can be translated. Figure 2 shows the subset of the QVT-R metamodel which we address in this paper. Black-box operations are not included, nor are collection templates in target domains or the 'opposite' navigation mechanism. The computational part of a QVT-R specification (a relational transformation) consists of a set of rule definitions (termed relations) and a set of query operation definitions. The rules and queries have distinct names:
rule→isUnique(name)
helpers→isUnique(name)
rule.name→intersection(helpers.name) = Set{}
Rules may be top level, in which case they are executed on all matching source elements for which they are enabled, or non-top level, when they can only execute if explicitly invoked from a rule.
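The three name-distinctness constraints above can be checked mechanically. The following is a minimal sketch in Python, assuming that rule and helper names are supplied as plain lists of strings; the function name well_formed_names is hypothetical and is not part of any QVT-R tool.

```python
def well_formed_names(rule_names, helper_names):
    """Check the three naming constraints of a relational transformation.

    rule_names, helper_names: lists of strings (an assumed representation).
    """
    rules_unique = len(set(rule_names)) == len(rule_names)        # rule->isUnique(name)
    helpers_unique = len(set(helper_names)) == len(helper_names)  # helpers->isUnique(name)
    disjoint = not (set(rule_names) & set(helper_names))          # intersection = Set{}
    return rules_unique and helpers_unique and disjoint
```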

QVT-R transformation structure
For example, considering the UML2C specification of a UML to ANSI C code generator [3] (Fig. 3 shows simplified extracts of the metamodels of this system), a simple rule that
A QVT-R rule has a sequence of domains (Rule::domain in Fig. 2). Each domain of a relation represents a source or target element of a specific type, e.g. u:UMLModel, from the typedModel of the domain, in this case the design model. This element is represented by the root variable of the domain (RelationDomain::rootVariable in Fig. 2). The remainder of the domain is a template pattern (RelationDomain::pattern.templateExpression) which matches and constrains the data of the root element. Templates consist of specialised forms of expression that specify individual source or target elements, ObjectTemplateExps, or collections of source elements, CollectionTemplateExps. Non-top relations may also have primitive domains, which only have a root variable, and an empty template expression. Primitive domains are used to pass in non-object parameter values. The sequence domain.rootVariable of domain root variables forms the (input) parameters of the non-top relation.
In Model2Program, the domain template expression for u introduces an auxiliary local variable n of the relation and assigns this the value of u.name. Such local variables (included in Pattern::bindsTo) can also be explicitly declared in the relation: In this paper, we will only use implicit declarations for such local variables.

QVT-R relation semantics
The data of an enforce domain (a domain with isEnforceable = true) can be modified by application of the relation, whilst data of a checkonly or primitive domain can only be queried. Transformations have a directionality: they may be executed in the direction of any one of their parameters (typed models), in which case all their rules are also executed in this direction. This means that relation domains with this (target) model can have their data modified (in the case of enforce domains) using data read from domains with other
An important mechanism in QVT-R is check-before-enforce semantics: if the constraints of a relation already hold for some target elements, relative to specific source elements, then the relation execution binds some such target elements to the target domain variables, instead of creating new elements. Thus, executing Model2Program in the C direction, with a non-empty C model, the creation of a new CProgram only occurs if there is not already a CProgram instance satisfying the required property name = u.name, likewise for execution in the design (UML) direction.
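The check-before-enforce behaviour described for Model2Program in the C direction can be illustrated by a small sketch. The Python classes UMLModel and CProgram below are simplified stand-ins for the metamodel classes, and the function name is hypothetical; this is an illustration of the idea under those assumptions, not the semantics defined by the standard.

```python
from dataclasses import dataclass

@dataclass
class UMLModel:
    name: str

@dataclass
class CProgram:
    name: str

def model2program_c_direction(uml_models, c_programs):
    """Apply Model2Program towards the C model; returns the newly created programs."""
    created = []
    for u in uml_models:
        # Check: is there already a CProgram with name = u.name?
        match = next((p for p in c_programs if p.name == u.name), None)
        if match is None:
            # Enforce: only now is a new target element created.
            match = CProgram(name=u.name)
            c_programs.append(match)
            created.append(match)
    return created
```

Running the function a second time on the same models creates nothing, since every source element then has a matching target element.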
A more complex relation defines the mapping of UML class types to C pointer types together with a C struct type: The typeId and ctypeId features are unique identifiers for the UML and C type occurrences (i.e. keys in terms of QVT-R, Sect. 2.4). Executed in the C direction, for each e : Entity, both a CPointerType instance and a linked CStruct instance are potentially created. In general, any number of linked elements may be defined in a domain template. We describe this situation as (vertical) entity splitting [20], and it is common in refinement transformations.
The QVT-R template expressions describe the data of model elements. They can also be interpreted as standard OCL expressions. For example, the above domains have the OCL interpretations In the C direction, this additional condition restricts the mapping to concrete UML classes. In the design direction, the condition assigns e.isAbstract = false for the target e : Entity.

QVT-R relation dependencies
Top relations may have dependencies on other relations, in cases where a relation R can only be applied to an element x if another relation P has been previously applied to a linked element y. Such dependencies are specified in a when {P(y)} clause of R (Relation::when in Fig. 2). For example, consider the mapping of UML properties to C struct members, omitting the mapping of property types: we provide a semantics in Sect. 5.2 and "Appendix D". We add the restriction that concrete rules should have concrete domain types S1 or T1 in their direction of execution (because elements of these classes may need to be instantiated by the rule execution).

Specialised control of transformation behaviour
So far we have only used 1-multiplicity attribute features (such as name : String) or 1-multiplicity reference features (such as owner : Entity) in domain templates. Semantic problems begin to appear when optional features are used in domains, i.e. features f with multiplicity 0..n or *. For example, the following top relation could seem to be a valid alternative way of mapping classes to structs and attributes to members in a single rule:
1. If e.ownedAttribute is empty, then no p : Property can match the e template, and the relation will not be applied to e. In other words, classes without owned properties will not be mapped to C at all.
2. Even for classes with properties, the logic of the rule is that for every pair (e, p) of a class e and an owned property p, there should exist a pair (c, m) of a struct and a member. In the absence of key declarations, there is no obligation that the same c is chosen for different p within one class e.
Notice that check-before-enforce does not avoid problem 2: the C domain will be executed to create c and m in any case where there is not already both a CStruct and a CMember with the required logical properties relative to e and p. One technique that QVT-R provides to address the second problem is keys: a transformation can specify that certain features or feature combinations uniquely identify elements of certain classes. Multiple elements with the same key feature values/combinations are not permitted, and target elements are looked up by key and updated when they already exist, instead of being created. In the above example, key specifications The effect of NonTopProperty2CMember is that for each pair (e, c) of related class e and C struct c, for all properties p of e, a corresponding C member m is added to c. This resolves problems (1) and (2) above.
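The effect of keys on problem 2 can be sketched as a resolve-or-create lookup. In the Python sketch below, the struct name is assumed to be the key feature, and KeyedExtent and map_pairs are hypothetical names introduced only for illustration.

```python
class KeyedExtent:
    """Instances of one class, indexed by the value of an assumed key feature."""

    def __init__(self):
        self.by_key = {}

    def resolve(self, key, factory):
        # Look up by key; create only when no instance has this key value.
        if key not in self.by_key:
            self.by_key[key] = factory(key)
        return self.by_key[key]

def map_pairs(pairs):
    """pairs: (class name, property name) tuples, one per (e, p) match.

    Returns {struct name: [member names]}.
    """
    structs = KeyedExtent()
    for class_name, prop_name in pairs:
        # The same struct is reused for every property of one class.
        c = structs.resolve(class_name, lambda k: {"name": k, "members": []})
        if prop_name not in c["members"]:
            c["members"].append(prop_name)
    return {k: v["members"] for k, v in structs.by_key.items()}
```

Without the lookup in resolve, each (e, p) pair would be free to create its own struct, which is the situation described in problem 2.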
Problems (1) and (2) arise because matching of domain patterns in QVT-R is performed in an element-by-element manner, i.e. there is an implicit quantifier e.ownedAttribute→forAll(p | ...) over the Class2CStructV2 relation. This is in contrast to languages such as ATL or UML-RSDS, where assignments can be used to set the values of collection-valued features of target elements in a single step, based on the collection values of source element features.
where clauses can also contain assignments to features of elements, e.g. a non-standard way of writing Model2Program could be: In the C direction, only the first assignment is effective as an update, because u and its features are not writable ("object creation, modification, and deletion can only take place in the target model for the current execution", Page 15 of [30]). Instead, the second assignment is treated as a condition which should be established by the relation. In the design direction, only the second assignment is effective as an update.
Finally, it is possible to define auxiliary query operations (RelationalTransformation::helpers in Fig. 2

Overall QVT-R transformation semantics
The logical semantics of a QVT-R transformation is that at termination, all the concrete top relations will be established between the source and target models, i.e. the target model will have been modified with respect to the source model in order that all these relations hold. The execution semantics of a QVT-R transformation is that each concrete top relation is applied to all source elements for which it is enabled, until the relation is established for all such elements. In addition, target elements which are not "required to exist" by the concrete top relations should be removed. This is a somewhat ambiguous requirement [2], which we make precise in Sect. 5.
The QVT-R semantics can be characterised as being state based: only the states of the source and target models are relevant for relation application. In particular, no execution trace is persisted from one execution to another; however, an internal trace is available, whereby one relation can test if another has been established in the same execution of the transformation, via the when clause, as described above.

Issues in QVT-R semantics
Although QVT-R was devised with the intent of being a declarative language with a clear semantic interpretation [15], there has been a continuing series of problems with its semantics.
These problems can be grouped into the following main categories:
• Incompleteness and inconsistencies in the standard [31]. For example, issue QVT14-55 identifies incompleteness in the check-before-enforce mechanism, and issue QVT14-57 identifies gaps and problems in the RelToCore mapping in the standard, which (partially) defines the semantics of QVT-R via a translation to QVT-C.
• The state-based semantics and check-before-enforce mechanism are insufficient in some cases to support efficient or precise incremental updates of a target model in response to source model changes [37]. The mechanism can also have unintended effects in batch mode, e.g. performing n to 1 merging of elements which should not be identified.
• Different transformation problems require different criteria for rule application conditions and for target element matching/creation/update. The standard does not provide any capability for such flexibility, leading to contrived and complex specifications when a variant semantics is needed [7,36,37].
At the heart of these problems is the dichotomy between the aim that QVT-R should be a purely declarative language, defining the effect of a transformation independently of any algorithm/design, and the practical needs of specifiers to define modular, efficient and comprehensible specifications. Thus, the resolved issue QVT13-48 points out that intermediate states during a transformation execution must actually be considered.
The ATL language addresses the declarative/imperative dilemma by prohibiting read access to the target model. QVT-R uses when clauses with relation tests to provide read access to target elements, but there is no guarantee in QVT-R that the data of these accessed elements may not subsequently change in value and hence invalidate the relation that accessed the elements.
In UML-RSDS, we resolve the dilemma by expressing rules as logical predicates. The rules also have a procedural interpretation, and the sequential composition of these procedures establishes the logical conjunction of the rules interpreted as predicates (Sect. 4). Target data can be read in rules, but only at points where it has reached its final state. We will apply this same idea in our QVT-R semantics and impose conditions ((a) to (e) of Sect. 5.4) to enable QVT-R specifications to be interpreted in a declarative manner.
In contrast to the QVT standard, the implementation of QVT-R in the Medini QVT tool [11] is oriented towards the efficient execution of QVT-R for practical use. Medini QVT has become a de facto standard for QVT-R developers, as it has been the most widely used QVT-R tool for several years. The tool adopts a variant semantics (based on persistent traces), but also has limitations for incremental updates and a lack of flexibility in its semantics [38].

Incompleteness and inconsistencies in the QVT standard
The semantics of QVT-R is defined in different ways in [30]: Section 7.10 gives a semiformal description, supported by a more formal definition in Section B.2. Section 10 defines a mapping, RelToCore, from QVT-R to QVT-C. Each of these descriptions is incomplete, and RelToCore is inconsistent with Section B.2 [2]. It is not clear how the semantics operates for update-in-place execution (Issue QVT14-47). Detailed criteria for transformation correctness and verification are omitted from [30]. For example, an update by a relation R may be inconsistent because a different or the same application of R (instead of an application of a different relation) has already assigned a conflicting value to a feature of a selected target element. As discussed in Sect. 2, relations with source domain constraints r = rx : R{} on optional r references can fail to be applied to elements whose r value is empty/undefined (Issue QVT14-67). Multiple additions of elements to *-multiplicity features are not necessarily inconsistent (Issue QVT14-46), nor are multiple removals of elements from optional features, but mixtures of additions and removals, and other combinations of updates, can produce inconsistent behaviours.
As shown above, apparently correct rules can lead to subtle flaws, dependent upon the execution semantics of the rules. Three different mechanisms (check-before-enforce, keys and non-top relations) are available in standard QVT-R to support target element resolution and reduce non-determinism in relation execution, but these mechanisms also lead to semantic problems. Check-before-enforce is a coarse-grain mechanism, applying at the level of entire target domains. In situations where there are multiple target object template expressions, it would be preferable to have an alternative mechanism which tests individual target templates and only executes their actions if their predicate does not already hold for some target element (in which case such an element could be looked up and bound to the template root variable for use in subsequent target templates of the relation, or in the relation where clause).
The check-before-enforce mechanism is intended to support change propagation based only on the states of source and target models, but the lack of precise information about source-target correspondences may result in nondeterministic and imprecise change propagation. A change in the value of a source feature may result in deletion and re-creation of a target element, instead of a feature change of the target [37]. In some cases, check-before-enforce is the incorrect semantics for a transformation, for example, in cases where there are no keys, but nonetheless a 1-1 mapping of source elements to target elements is required, as in the Families to Persons case [37]. A more subtle case is where intermediate objects in a chain of linked objects should not be overwritten (Sect. 6).
Implicit deletion of elements (Section 7.10.2 of [30]) is unclear and ambiguous. It is expressed with respect to a single relation: that target elements t are deleted if they are not "required to exist" by a valid binding of source elements s in an application of the relation. However, t could also be "required to exist" by other relations, which would mean it should not be deleted. This implicit delete mechanism does not seem adequate to propagate removal of elements from collection-valued features. For example, if source class A and target class A1 have *-multiplicity features r and rr: Removal of some bx from a.r does not necessarily lead to removal of any corresponding b1x (i.e. where B2B1(bx, b1x) holds) from a1.rr, because b1x is still "required to exist" by B2B1. Only deletion of bx is propagated to deletion of b1x and its removal from a1.rr.
The order in which different parts of a QVT-R relation are executed is not fully defined in [30]. The source domains and when clause are considered together: the source domains declare variables representing input model elements and features of these elements, and the source domains and when clause (possibly also involving target variables in relation tests) define constraints on these variables. For Property2CMember, p is bound to some Property instance, e is bound to p.owner, and n is bound to the value of p.name and k to the value of p.isUnique. The when clause further binds or constrains variables which occur in it, e.g. in Property2CMember, the call Class2CStruct(e, s) restricts target variable s to be any CStruct already matched to e by Class2CStruct. On the target side of the relation, a binding of the remaining target variables is constructed (if possible) which satisfies the target domains and the where clause. In Property2CMember, this means finding/creating an instance m of CMember which satisfies m.memberOf = s, m.name = n and m.isKey = k. However, the standard does not specify the relative order of execution of the where clause and target domains, nor of different target domains. There is a requirement that all arguments of a where-call of a relation must be fully bound at the point of call; however, this does not ensure that all data needed by the call are available [36].
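The binding steps described for Property2CMember can be traced in a short sketch. The Python below fixes one particular order (source bindings, then the when test, then find-or-create on the target side), which is an assumption made for illustration since the standard leaves parts of the order open. Elements are represented as dictionaries, and the internal trace of Class2CStruct is assumed to be a dictionary from entity identifiers to struct identifiers; the function name is hypothetical.

```python
def property2cmember(properties, class2cstruct_trace, members):
    """Apply Property2CMember in the C direction; returns the number of members created.

    properties: list of dicts with keys owner, name, isUnique (assumed representation).
    class2cstruct_trace: dict mapping an entity id to the struct id matched to it.
    members: list of dicts with keys memberOf, name, isKey; updated in place.
    """
    created = 0
    for p in properties:
        # Source domain: bind e, n and k from p.
        e, n, k = p["owner"], p["name"], p["isUnique"]
        # when clause: Class2CStruct(e, s) must already hold for some s.
        s = class2cstruct_trace.get(e)
        if s is None:
            continue  # relation not enabled for this p
        # Target side: find an m with memberOf = s, name = n, isKey = k, else create one.
        found = any(m["memberOf"] == s and m["name"] == n and m["isKey"] == k
                    for m in members)
        if not found:
            members.append({"memberOf": s, "name": n, "isKey": k})
            created += 1
    return created
```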
This lack of a definite execution order appears to have no benefit for efficiency or abstraction and complicates the definition of QVT-R specifications, particularly bx specifications [36]. Finally, the definition of rule inheritance (relation overriding) and of transformation extension is left unspecified in [30].

Issues with the RelToCore semantics
The RelToCore translation (Sect. 10 in [30]) defines an explicit semantics for QVT-R by translating QVT-R specifications into QVT-C. This translation is defined as a large and complex QVT-R transformation, which has many quality flaws [23]. The translation is also incomplete: various functions (getVarsOfExp) and rules (RExpToMExp, TopLevelRelationToMappingForEnforcement) are not given full definitions, and there is no treatment of collection templates (QVT-R issue QVT14-28) or relation overriding (issue QVT14-57). The translation of where-invoked relations requires a separate Core mapping for each pair of an invoker and invoked relation [7].

Issues with Medini QVT
Medini QVT [11] has been the most successful QVT-R tool to date; however, it differs from the QVT-R standard in that it does not support check-before-enforce and instead uses persistent traces to support change propagation. Apart from these differences, Medini QVT also has several omissions and errors: domain conditions are not supported, leading to additional complexity in bx specifications (cf. the dag2ast/ast2dag case of [36]); there is no transformation extension or relation overriding, leading to code duplication; sets cannot be passed as parameters to non-top relations (cf. the bag22bag1 case of [36]). Container references are not writable, so that relations must be defined relative to container objects, resulting in multiple (> 2) levels of element structure in domains (cf. the ecore2sql3 case of [36]). This also complicates traceability, requiring the use of marker relations [6]. A number of OCL operators are not supported or are incorrectly supported. As in the standard, the execution order of where clauses is not determined by their textual order. However, the tool does enforce that top relations cannot be invoked in where clauses, which is still an open issue in the standard (issue 14-59). The Medini QVT change propagation strategy is fixed and cannot be varied. The strategy propagates attribute value changes, but movement of an element from one association end to another may result in target element deletion and creation.
The approach is inherently operational, and verification facilities and correctness criteria are missing, although Medini QVT provides debugging facilities.

UML-RSDS
UML-RSDS is a specification language based on a subset of UML class diagrams, use cases, activities and OCL 2.4. Applications, including transformations and subtransformations, are specified as use cases τ which have preconditions Pre_τ and a sequence of postcondition constraints Post_τ expressed in an OCL subset. Use cases can also have activities (behaviours) in a subset of UML activity language, expressed as pseudocode statements ("Appendix C").
The UML-RSDS OCL subset excludes OclAny, and the invalid value, and uses classical logic instead of the 3-valued logic of the OCL standard. It uses the notation "&" for and, ⇒ for implies, and x : s as an alternative for s→includes(x). The null value cannot be explicitly referred to but can be tested using the oclIsUndefined() operator.

UML-RSDS specification structure
A UML-RSDS specification consists of a class diagram together with one or more use cases, which may be linked by extend or include dependencies. For transformations, classes may be distinguished as belonging to the source or target metamodels. A use case is itself a UML classifier and may have local attributes and operations. It is also a behaviour (specification) and may have parameters, preconditions, postconditions, invariants and an activity. Postconditions provide a declarative specification of the use case behaviour, and are expressed as rules or constraints with the schematic form where E is a source class and Pre and Post are boolean-valued expressions with context E, with Post possessing a procedural interpretation stat(Post) or stat_LC(Post) as a UML activity. The context class E is optional. As in UML operation postconditions, prestate forms v@pre of variables and expressions v can be used, to refer to the read-only state of v at initiation of the execution of the constraint on an E instance.
For example, the Model2Program rule would be expressed as an OCL constraint:
UMLModel::
  CProgram->exists( p | p.name = name )
In general, UML-RSDS rules are more concise and simpler in form than corresponding QVT-R rules.
Query operations may be defined locally to a use case, as in QVT-R transformations, and in addition, update operations can be defined, which may or may not return values. Query and update operations may also be defined for class diagram classes. All these types of operation can be called from UML-RSDS rules, but update operations should only be invoked in rule succedents.
Specifications can be internally composed sequentially by defining an activity of a composite use case uc which includes use cases uc_1, ..., uc_n. The activity of uc is defined as the sequential composition of the activities of the uc_i.

UML-RSDS rule semantics
Constraints are logically interpreted as universally quantified first-order logic formulae, e.g.: The mathematical interpretation of UML-RSDS OCL is given in [21].
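As an illustration of this reading, the Model2Program constraint (UMLModel:: CProgram->exists( p | p.name = name )) can plausibly be written as the following formula, where the bound variable u stands for the implicit context instance. This rendering is our own sketch of the universally quantified form and is not quoted from [21].

```latex
\forall u \in \mathit{UMLModel} \cdot \exists p \in \mathit{CProgram} \cdot p.\mathit{name} = u.\mathit{name}
```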
OCL predicates such as p.name = name and E→exists (e|P) are also given an operational interpretation as UML activities, such as assignments and object lookup/creation actions (object creation of e : E implicitly updates the class extent collection E.all I nstances() of E instances). For each predicate P, an operational interpretation stat(P) defines an activity (written as a procedural statement) which attempts to establish P by reading a set rd(P) of features and class extents, and changing a set wr (P) of writable features and class extents [18,21]. This is the conventional behaviour semantics of P. An alternative "least-change" semantics is given by an activity stat LC (P) [25]. This tries to minimise changes to wr (P) data and to maximise reuse of existing elements. Definitions of rd, wr , stat and stat LC are given in "Appendix C". For example, the U M L Model rule has read frame {U M L Model, U M L Model::name} and write frame {C Program, C Program::name}. The "design synthesis" step of Fig. 1 involves the derivation of the stat(Post τ ) or stat LC (Post τ ) activities for each use case τ (a bx stereotype on a use case uc indicates that stat LC should be used for uc).
The QVT-R check-before-enforce mechanism for such a rule could be expressed as In other words, execution of the succedent is only attempted if it is not already true in the initial state of the rule execution. However, instead of using this global check over the entire postcondition, we can use the stat LC (Post) activity, which uses local checks to avoid un-necessary updates. For example, stat LC (x : y.r ) for *-multiplicity r first tests if y.r →includes(x) and only adds x to y.r if the test is false (for 0..1 multiplicity r , only if y.r is empty).
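The local check performed by stat_LC(x : y.r) can be sketched as follows. In this Python sketch, the feature value y.r is assumed to be held in a list, and the upper parameter (None for *-multiplicity, 1 for 0..1 multiplicity) is a hypothetical device for distinguishing the two cases; the function name is also an assumption.

```python
def stat_lc_add(y_r, x, upper=None):
    """Least-change establishment of `x : y.r`; returns True if y_r was updated.

    y_r: list holding the current value of the feature (assumed representation).
    upper: None models *-multiplicity, 1 models 0..1 multiplicity.
    """
    if x in y_r:
        return False  # predicate already holds: no update needed
    if upper == 1 and y_r:
        return False  # 0..1 feature is non-empty: x is only added when y.r is empty
    y_r.append(x)
    return True
```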
In UML-RSDS, Class2C Pointer T ype could be written as: where it is assumed that the C Pointer T ype instance with ct ypeI d = owner.t ypeI d already exists. In UML-RSDS, setting one end of a bidirectional association such as members/member O f implicitly also updates the other end (here, adding m to the members of the selected C Struct). Likewise, removing an element from one association end implicitly updates the other end. Deletion of an element implicitly removes it from all association ends and class extents in which it resides. Deletion of an aggregation owner element deletes all the aggregation part elements (cascaded deletion). These implicit updates will also apply for the QVT-R semantics representation in UML-RSDS.
The lookup mechanism E[val] also applies to collections of key values: CType[vals] for collection vals is the collection of instances of CType with ctypeId value in vals.
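As an illustration of lookup by identity attribute, the following Python sketch indexes instances by their key value. The index _by_id and the method names lookup and lookup_all are assumptions made for this sketch, not the form of the code generated by the UML-RSDS tools.

```python
class CType:
    """Hypothetical class with identity attribute ctypeId."""

    _by_id = {}  # assumed index from ctypeId value to instance

    def __init__(self, ctype_id):
        self.ctypeId = ctype_id
        CType._by_id[ctype_id] = self

    @classmethod
    def lookup(cls, val):
        """CType[val]: the instance with ctypeId = val, or None."""
        return cls._by_id.get(val)

    @classmethod
    def lookup_all(cls, vals):
        """CType[vals]: the instances whose ctypeId value is in vals."""
        return [cls._by_id[v] for v in vals if v in cls._by_id]


t1 = CType("int")
t2 = CType("char")
single = CType.lookup("int")
several = CType.lookup_all(["char", "float", "int"])
```

Key values with no corresponding instance ("float" here) simply contribute nothing to the result collection.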
An alternative way of writing the above Property to CMember constraint is: Here, p is an auxiliary variable, similar to an OCL let variable or QVT-R local variable. It is implicitly ∀-quantified over the entire constraint. There are two versions of the general existential quantifier. →existsLC is used to provide local "least-change" semantics in a conventional execution semantics context.

1. E→exists(x|P): in the case that E has a key : String identity attribute, the stat or stat_LC design of this quantifier expression uses any key specification x.key = value present in P to lookup x (or create x and set its key if there is no existing x with the key value) and then attempts to establish the remainder Q of P for x using stat(Q) or stat_LC(Q). If E has no identity attribute key, then with stat semantics a new x is always created and P is established for this x using stat(P).

2. E→existsLC(x|P): in the case that E has a key : String identity attribute, the stat or stat_LC design of this quantifier expression uses any key specification x.key = value present in P to lookup x (or create x and set its key if there is no existing x with the key value) and then attempts to establish the remainder Q of P for x using stat_LC(Q). Otherwise, successive conjuncts of P are considered, with the aim to find an instance x : E which satisfies as many of these conjuncts as possible. An attempt is then made to establish the remaining conjuncts for such an x using stat_LC. A new instance x : E is created if the first (or only) conjunct of P cannot be satisfied by any E instance.
Used as queries, exists and existsLC are equivalent, and test that E→select(x|P) is non-empty.
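The keyless case of the least-change quantifier can be sketched in Python as follows. This is only an approximation under stated assumptions: P is restricted to an ordered list of feature = value equalities, candidates are narrowed greedily in conjunct order, and the function name exists_lc is hypothetical.

```python
from types import SimpleNamespace


def exists_lc(extent, conjuncts):
    """Find or create x in extent satisfying all (feature, value) conjuncts.

    A new instance is created only if no existing instance satisfies the
    first conjunct; otherwise an instance matching a maximal prefix of the
    conjuncts is reused and the remaining conjuncts are established on it.
    """
    first_feature, first_value = conjuncts[0]
    candidates = [x for x in extent
                  if getattr(x, first_feature, None) == first_value]
    if not candidates:
        x = SimpleNamespace()
        extent.append(x)
    else:
        for feature, value in conjuncts[1:]:
            narrowed = [c for c in candidates
                        if getattr(c, feature, None) == value]
            if not narrowed:
                break  # keep the best matches found so far
            candidates = narrowed
        x = candidates[0]
    for feature, value in conjuncts:
        if getattr(x, feature, None) != value:
            setattr(x, feature, value)  # update only where needed
    return x


people = [SimpleNamespace(name="a", age=1), SimpleNamespace(name="a", age=2)]
second_person = people[1]
reused = exists_lc(people, [("name", "a"), ("age", 2)])
created = exists_lc(people, [("name", "b"), ("age", 3)])
```

In the first call an existing instance already satisfies both conjuncts and is returned unchanged; in the second call no instance satisfies the first conjunct, so a new one is created.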

UML-RSDS transformation semantics
UML-RSDS transformation constraints are unidirectional as rules, e.g. reading UMLModel instances and creating/updating CProgram instances. However, in some cases the rules of a transformation τ can be syntactically inverted to operate in the reverse direction, e.g. the inverse of the UMLModel rule is: This syntactic inversion is based on the semantic concept of a rule invariant. The conjunction of the inverted rules forms a transformation invariant Inv τ of τ [25]. Unlike QVT-R, there is no rule inheritance mechanism in UML-RSDS. Rules are explicitly ordered in the transformation postcondition and are executed in this order on all applicable source elements. There is also no concept of designated read-only source models in UML-RSDS rules: individual rules may read and write any source or target data. However, to ensure logical consistency within a use case τ, the postconditions are explicitly ordered as r_1, ..., r_n in such a way that a rule r_i which writes data read by a rule r_j must precede r_j: i < j. For example, the rule for Entity above must precede the rule for Property, because the Entity rule writes CPointerType, which is read by the Property rule. This condition is termed syntactic noninterference. The activity synthesised for such a τ by design synthesis is stat(r_1); ...; stat(r_n).
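The ordering condition can be checked mechanically from the read and write frames of the rules. The following Python sketch assumes each rule is given as a (name, rd, wr) triple of sets of names; the frames shown for the Entity and Property rules are illustrative simplifications, and the function name is hypothetical.

```python
def is_noninterfering(rules):
    """Check that no rule reads data written by a later rule.

    rules: ordered list of (name, rd, wr), with rd and wr given as sets.
    """
    for j, (_, rd_j, _) in enumerate(rules):
        for _, _, wr_i in rules[j + 1:]:
            if rd_j & wr_i:
                return False  # a later rule writes data read by rule j
    return True


# Illustrative frames only: the Entity rule writes CPointerType,
# which the Property rule reads.
entity_rule = ("Entity", {"Entity"}, {"CPointerType"})
property_rule = ("Property", {"Property", "CPointerType"}, {"CMember"})

good_order = is_noninterfering([entity_rule, property_rule])
bad_order = is_noninterfering([property_rule, entity_rule])
```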
The suffix @pre can be added to entity type names or feature names at points where they are read, to distinguish the read access from updates to these elements. v@pre is termed a pre-state expression. This enables a bounded-loop implementation of a constraint E :: P to be used, rather than a fixed-point implementation (which would repeatedly apply stat(P) to all instances of E until P becomes true for all instances).
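The difference between the two implementation styles can be sketched in Python. The names bounded_loop, fixed_point, holds and establish are hypothetical, and the max_rounds cut-off is an assumption added so that the sketch always terminates.

```python
def bounded_loop(extent, holds, establish):
    """Apply establish once per instance of a snapshot of the extent."""
    for x in list(extent):  # the snapshot plays the role of E@pre
        if not holds(x):
            establish(x)


def fixed_point(extent, holds, establish, max_rounds=1000):
    """Re-apply establish until the constraint holds for all instances."""
    for _ in range(max_rounds):
        pending = [x for x in extent if not holds(x)]
        if not pending:
            return True
        for x in pending:
            establish(x)
    return False


cells = [{"v": 0}, {"v": 3}]
holds = lambda c: c["v"] >= 2
bump = lambda c: c.update(v=c["v"] + 1)

bounded_loop(cells, holds, bump)
after_bounded = [c["v"] for c in cells]

converged = fixed_point(cells, holds, bump)
after_fixed = [c["v"] for c in cells]
```

In this example a single application of the update does not establish the property for the first element, so only the fixed-point version reaches a state in which the constraint holds for every instance. A bounded loop is appropriate when one application per instance is known to suffice.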
Unlike QVT-R, UML-RSDS has an operator to explicitly delete model elements: x→isDeleted() removes x from the model. Cascaded deletions and other implicit effects also take place, as described above. The operator can also be applied to sets of instances. This mechanism is often more convenient for specifying transformations involving element deletion, compared to the QVT-R technique of "deletion by selective copying" (copying the elements that are not to be deleted) [12].
For each form of statement S, there is a definition of its weakest precondition with respect to some predicate P: [S]P. This is the most general condition under which every execution of S establishes P [17,21]. If the Post τ constraints of a use case τ are ordered so that no constraint r_i can be invalidated by later rules r_j:  [18,21]. This is a direct semantics compared to the complexities of QVT-R semantics and facilitates verification using classical logic theorem provers. Transformation invariants Inv τ are particularly useful in reasoning about transformation correctness, using proof by induction over transformation steps [21].
Identity attributes provide a persistent trace mechanism, which enables the definition of incremental-mode execution of UML-RSDS transformations, as an extension of the standard batch-mode execution [25]. Table 1 summarises the differences between UML-RSDS and QVT-R. The facilities missing from QVT-R and present in UML-RSDS could potentially be added to QVT-R and supported by semantic translation. Facilities present in QVT-R and missing in UML-RSDS need to be translated into suitable representations in UML-RSDS.

Translation from QVT-R to UML-RSDS
In this section, we present the rationale for our approach to the semantics of QVT-R (Sect. 5.1) and define a detailed translational semantics for QVT-R separate-models transformations using UML-RSDS (Sect. 5.2). Section 5.3 illustrates the semantics on an example. In Sect. 5.4, we discuss properties of the semantics, in Sect. 5.5 consider how it supports incremental model changes, and in Sect. 5.6 consider variations and extensions of the semantics.
The general principle of the translation from QVT-R to UML-RSDS is to represent QVT-R top relations as UML-RSDS rules (OCL constraints), non-top relations as update operations, and queries as query operations (Table 2).

Rationale for the semantics
As identified in Sect. 3, there are significant problems with the semantic basis of the QVT-R standard, particularly due to the check-before-enforce mechanism, state-based semantics, and the lack of variability and verifiability. We carried out an analysis of 27 published QVT-R specifications, to identify how the main language features are used in practice. We took tutorial examples from the Medini QVT site projects.ikv.de/qvt and from the QVT-D project repository of examples originating mainly from ModelMorf: git.eclipse.org/c/mmt/org.eclipse.qvtd.git. We also took cases from Github repositories and from published papers [12,28,36,37,38]. Table 3 lists cases with their size in LOC, together with the kind of element mapping which is performed by the transformation, and any design patterns [20,25] or specification approach adopted. Table 4 gives a summary of the case approaches.
It can be seen that the most common strategy for mapping source to target elements is to enforce a 1-1 correspondence using QVT-R keys. Another common approach is to map individual source elements to a group of target elements in different classes: this vertical entity splitting is typical of refinement cases such as the ecore2sql versions [36]. Otherwise, update-in-place cases usually involve modifications to elements in-place, sometimes with element creations and deletions. Keys can also be used to merge multiple source elements with the same key value into a single target element-this is used in abstraction cases such as ast2dag [36]. The check-before-enforce mechanism is less frequently used to achieve such merging.
Overall, we conclude that key-based element matching and merging must be supported by any proposed QVT-R semantics and implementation. In the absence of key values, different approaches for target element resolution may be needed, such as 1-1 mapping in the Families2Persons case [37], or n − 1 merging of duplicate source objects (the HSM to NHSM cases). Thus, whilst check-before-enforce should be supported by a QVT-R semantics/implementation, it should not be mandatory. The use of persistent traces should be available as an alternative change propagation mechanism, because this enables more precise change propagation than check-before-enforce, but improved capabilities for using persistent traces should be provided, compared to the support in Medini QVT.
For separate-models transformations, we will consider four possibilities for a target element resolution strategy, to identify target elements t which establish required constraints P(sv, tv) on source and target variables sv, tv, with respect to a binding sv → s of source elements s to sv:
• Key based: Use a key property k to locate target elements
• Mandatory creation: In the absence of target key properties, always create new target instances t and update these to establish P with the binding tv → t.
• Check-before-enforce: In the absence of target key properties, search for elements t which already satisfy all constraints of P wrt the binding sv → s, and bind t to tv. If such t cannot be found, create new elements as for the previous case.
• Least-change check-before-enforce: In the absence of target key properties, find target elements t which are maximal partial or total matches for the constraints of P (and t satisfies at least one constraint of P), and update the t if necessary so that they satisfy P. Again, if no such t can be found, create new target elements and update as required.
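The first three strategies can be contrasted in a small Python sketch. Target elements are modelled as dictionaries, and the function resolve_target and its parameters establish and satisfies are hypothetical names; key-based resolution is assumed to create an element with the given key when lookup fails.

```python
def resolve_target(extent, strategy, establish, key=None, satisfies=None):
    """Return a target element for which P holds after the call."""
    if strategy == "key":
        # Lookup by key, creating the element if no match exists.
        t = next((e for e in extent if e.get("key") == key), None)
        if t is None:
            t = {"key": key}
            extent.append(t)
    elif strategy == "check-before-enforce":
        # Reuse an element that already satisfies all of P, unchanged.
        t = next((e for e in extent if satisfies(e)), None)
        if t is not None:
            return t
        t = {}
        extent.append(t)
    elif strategy == "mandatory":
        t = {}
        extent.append(t)
    else:
        raise ValueError(strategy)
    establish(t)  # update the element so that P holds
    return t


persons = []
set_name = lambda t: t.update(name="Ann")
has_name = lambda t: t.get("name") == "Ann"

a = resolve_target(persons, "mandatory", set_name)
b = resolve_target(persons, "mandatory", set_name)
count_after_mandatory = len(persons)
c = resolve_target(persons, "check-before-enforce", set_name,
                   satisfies=has_name)
k1 = resolve_target(persons, "key", set_name, key="k")
k2 = resolve_target(persons, "key", set_name, key="k")
```

Mandatory creation produces a new element on every application, whereas check-before-enforce and key-based resolution reuse an existing element at the cost of searching the target extent.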
As regards efficiency, key-based and mandatory creation are the least costly strategies, in principle, whilst the other strategies require inspection of all target elements of particular classes. As regards correctness, both key based and least-change can result in direct conflicts between different rules: t.f = v could be set for reused element t and feature f in order to satisfy P, conflicting with a previously set value t.f = w established to satisfy another constraint Q. However, both mandatory creation and check-before-enforce can also result in conflicts, in cases where a 1-multiplicity reference r has been set so that t.r = rx and w : rx.f to satisfy a requirement w : The creation of a new rx1 and setting t.r = rx1, v : rx1.f will invalidate Q. In these cases, rx should be reused using least-change.
Our default semantics for target resolution is presented in Sect. 5.2, using key-based and mandatory creation. These are also the default mechanisms in Medini QVT and UML-RSDS (the stat interpretation of constraints, and →exists quantifier in UML-RSDS). Least-change can be defined using the stat_LC interpretation and the →existsLC operator of UML-RSDS, and check-before-enforce using a generalised let operator. These are presented in Sect. 5.6.
In order to ensure transformation correctness, we restrict transformations τ by five conditions (a) to (e):
• (a) "No secretly created objects": Target data are write-only, and relations R can only refer to model elements e of source or target models which are explicitly declared in R as object variables (as the rootVariable of a RelationDomain, or bindsTo of a TemplateExp).
• (b) "No inter-relation conflicts": Different top relations do not have conflicting effects.
• (c) "No intra-relation conflicts": No relation has internal conflicts in its effects (i.e. conflicts between different applications or within one application of the relation).
• (d) "No secretly deleted objects": If τ refers to a target element t, then any target element x from which deletion could propagate to t must also be referenced by τ.
• (e) "Call graph is surjective and non-cyclic": There are no unused non-top relations and no cycles in relation calling dependencies.
(a) and (d) ensure that target elements created or updated by τ are always recorded in some trace tuple and cannot be deleted except via the trace-based semantics. (b) and (c) prevent internal semantic conflicts within τ . (e) simplifies the semantic analysis. Although this prevents recursion between relations, recursion is permitted between query operations. Formal definitions of these conditions are given in Sect. 5.4.

Translation for separate-models QVT-R transformations
A separate-models QVT-R transformation τ is semantically represented as a UML-RSDS use case τ. τ has rules which each have at least two domains, and these domains cannot all have the same typed model. The root variable names of distinct domains should be distinct. There should be at least one target domain and at least one source domain per relation. We assume that τ satisfies the semantic correctness properties of (a) to (e) of Sect. 5.4. Key declarations key E {p} of τ are interpreted as asserting that p is an identity attribute of E: E→isUnique(p). In UML-RSDS, a single key can be used to look up elements, using the →exists quantifier as described in Sect. 4. In the case of two or more features forming a compound key, the translation is from key E {p_1, ..., p_n} to the constraint The query functions of τ.helpers are represented as query operations of the use case τ.
In our separate-models semantics, we give formal interpretations Pres τ (m), Con τ (m) and Cleanup τ (m) as UML-RSDS transformation use cases for the update, creation and deletion phases of a relational transformation τ executed in the direction of typed model m. These transformations are invoked in the order Pres τ (m); Con τ (m); Cleanup τ (m) from the UML-RSDS transformation τ that represents the complete transformation τ .
As in the QVT-R to QVT-Core mapping of [30], we use trace classes R$trace to record that relations R on source domains s (domains with model m′, m′ ≠ m) and target domains t (with model m) have been successfully applied to particular model elements (Fig. 4). Traces enable relations to inspect the target model indirectly, without direct reference to target language classes or features. However, the information in traces therefore needs to be kept up-to-date with the actual source-target relationships: incremental changes to source or target models may invalidate existing source-target relationships, or result in the establishment of new relationships.
For each relation R, we define a trace class R$trace, which has properties x : E for each domain root variable x : E of each domain d of R and for each object template root variable (TemplateExp :: bindsTo in Fig. 2) x : E occurring in a domain d of R (rule 1 of the RelToCore mapping). The source object variables svars R of R are the object variables ⋃ d:sdom ovars d of its source (non-target) domains sdom, and the target object variables tvars R are the object variables ⋃ d:tdom ovars d of its target domains tdom. For separate-models transformations, svars R and tvars R are disjoint. The object variables of relation R are denoted by ovars R; these are svars R ∪ tvars R. Thus, R$trace has properties corresponding to ovars R. The set of all (free) variables declared in the domains of R is denoted avars R. This consists of the object variables and all other variables which occur as the referredProperty of PropertyTemplateItems in R, but does not include bound variables of quantifier or iterator expressions within R.
whenvars R is the set of variables occurring free in the when clause of R (Relation :: when.bindsTo). These can include elements of tvars R used as parameters of relation calls (tests). sourcevars R is the collection of all avars R variables occurring in the source domains, and svars R is a subset of sourcevars R (the other variables of sourcevars R typically represent features of the svars R elements). whenvars R and sourcevars R are subsets of avars R. The input variables invars R of a top relation R are the object variables occurring in the source domains or the when clause. These are read and not written by R. For a non-top relation, all the domain root variables are also included in invars R. The output variables outvars R of R are the other object variables ovars R − invars R of R; these are potentially written by R, i.e. they are the object variables possibly instantiated by a successful application of R. Table 5 summarises the notations used in the subsequent semantic definitions. Table 6 summarises some of the key terminology used in the semantic definition.
Elements elems of the source and target models are linked to one trace element tr : R$trace if R has been successfully applied to the source elements of elems to update or create the target elements of elems. These traces are tested when R(a_1, ..., a_n) occurs as a rule call in a when clause (for either top or non-top relations R). The a_i correspond in order to the domain root variables v_i (R.domain.rootVariable) of R, which are a subcollection of ovars R.
A positive call R(a_1, ..., a_n) in a when clause is logically interpreted as tr : R$trace & tr.v_1 = a_1 & ... & tr.v_n = a_n. If a_i is a variable, the call binds the value of tr.v_i to a_i. By using traces in this way we simulate logically this aspect of the Relations-to-Core semantics in [30]. A negative call not(R(a_1, ..., a_n)) is interpreted as not(R$trace→exists(tr | tr.v_1 = a_1 & ... & tr.v_n = a_n)). In this case, the a_i must already be bound prior to the call. However, negative calls cause semantic problems (Sect. 5.4). The trace instances also provide a means to prevent reapplication of a relation to the same source elements if it has already succeeded on them: relation R is only applied to source elements a_1, ..., a_k if there is not already a trace tr ∈ R$trace linked to these elements (the same approach is used in the translation of [8] from QVT-R to CPN). However, membership of R$trace is not necessarily equivalent to the validity of R on the linked elements, because of changes to source or target elements subsequent to the trace being created. Our semantics is designed to ensure that R$trace-linked elements do satisfy the logical interpretation of R at any points where they may be tested.
[Table 6 entry: An expression e that has a defined stat(e), and with wr(e) a nonempty subset of the variables (targetvars R) from enforce domains with model m.]
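The treatment of relation calls in when clauses as tests on the trace set can be sketched in Python. Traces are modelled as dictionaries from root variable names to elements; the marker UNBOUND and the function names are hypothetical, introduced only for this illustration.

```python
UNBOUND = object()  # marker for an argument that is still a free variable


def when_call(traces, root_vars, args):
    """Positive call: yield one binding of the root variables per match."""
    for tr in traces:
        if all(a is UNBOUND or tr[v] == a
               for v, a in zip(root_vars, args)):
            yield {v: tr[v] for v in root_vars}


def negative_when_call(traces, root_vars, args):
    """Negative call: all arguments must already be bound."""
    if any(a is UNBOUND for a in args):
        raise ValueError("negative calls need bound arguments")
    return next(when_call(traces, root_vars, args), None) is None


traces = [{"u": "m1", "c": "p1"}, {"u": "m2", "c": "p2"}]
bindings = list(when_call(traces, ["u", "c"], ["m1", UNBOUND]))
absent = negative_when_call(traces, ["u", "c"], ["m3", "p3"])
present = negative_when_call(traces, ["u", "c"], ["m1", "p1"])
```

A positive call with an unbound argument acts as a lookup that binds the argument from the matching trace, whereas a negative call can only be evaluated as a test.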
For the enforce semantics of a concrete top-level relation R, we define three logical constraints in UML-RSDS: • The preservation constraint Pres R (m) expresses the effect of R when applied to tuples elems of source and target elements which are linked to one R$trace element, but which nonetheless may not satisfy the logical properties of R's target domains, due to incremental changes of source or target model data. The LHS of θ R (m, vars) is the basis of determining the application conditions of R, whilst the RHS can be adapted to give different semantics for target element resolution.
whenp is the logical interpretation of the when clause: the conjunction of the clause predicates, with relation calls treated as tests on the relation trace.
wherep is the logical interpretation of the where clause, the ordered conjunction of its predicate expressions, with non-top relation calls r (vars) treated as calls of the operation corresponding to r , defined below. Only where clause predicates effective for update in the m direction are included in wherep.
cpreds(sdom, bound) is the logical interpretation of the non-target domains sdom, given a set of currently bound variables bound, and epreds(tdom, bound) is the interpretation of the target domains tdom, i.e. domains d with model m. The scope of any exists or existsLC quantifiers introduced for target object variables in the epreds formulae is extended over the remainder of the succedent, including wherep (as in [30], Annex B). We define cpreds and epreds in "Appendix B".
We write θ R (m) for θ R (m, ovars R ). A predicate guard R (m) is formed as the antecedent ϕ R (m) of θ R (m), with all variables apart from the ovars R variables ∃-quantified. That is, as an OCL formula it is where the v i are the variables in avars R − ovars R . However, we eliminate ∃ quantifiers where possible by with target elements tvars R quantified out, so that it is a predicate on svars R only.
The checkonly semantics of R in the direction of m asserts that at termination of τ , for every tuple of svars R elements that satisfy sguard R (m), there is an extended tuple of elements for ovars R which satisfies θ R (m) (Section 7.10.1 of [30]).
Regarding the overrides clause in relations, we remove this by considering that all overridden relations are abstract and not executable-they are present in order to express commonalities between more specialised relations. Before applying the semantics, we merge the definition of an overridden relation into its concrete overriding relations ("Appendix D"). The overridden relation does not need a trace set. A when test on an overridden abstract relation R( pars) is replaced by a disjunction R 1 ( pars) or ... or R n ( pars) testing the concrete relations R i that override R. This then becomes a test on their trace relations in the semantics. A where call of a non-top abstract R( pars) is semantically expressed as a conditional The Op R i are the operations representing the merged concrete relations R i overriding R. In a similar way, we can expand out a transformation composed using the extends mechanism ("Appendix C"). Thus, in the following we will only consider concrete relations and transformations without extends.

Preservation constraints
The first execution phase Pres τ (m) of τ applies for incremental-mode execution only, using a persistent trace. For each relation R, Pres R (m) is an OCL constraint that defines R's feature change propagation actions for elements that have already been matched (i.e. by a previous execution of τ); it is defined schematically as R$trace@pre :: If the guard holds, then the effect of R is re-applied on the target elements linked to the trace element, to re-establish the logical property θ R (m). Otherwise, the trace-linked elements cannot be updated to re-establish the property (because only target data can be updated), and the trace element is deleted.
Provided that target classes and features are not referred to in the when clause or source domains, and that target data are only written and not read in R, and that R itself is not invoked directly or indirectly in the when or where clause (the conditions (a) and (e) of Sect. 5.4), then Pres R (m) has disjoint read and write frames. In addition, elements of R$trace are only deleted, not created. This means that a polynomial-time complexity bounded-loop implementation of Pres R is sufficient, and we enforce this design choice by using the @pre annotation on R$trace.
This constraint propagates name changes from UMLModel instances to CPrograms. In addition, if the u element has been deleted (i.e. the trace reference u is null), then the if condition is false and the trace is deleted.
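The behaviour of the preservation phase for this example can be sketched in Python. Traces are modelled as dictionaries with a source reference u and a target reference c; the name c and the function name pres_model2program are assumptions made for the sketch, and the guard is simplified to the existence of the source element.

```python
def pres_model2program(traces):
    """One bounded pass over a snapshot of the trace set."""
    kept = []
    for tr in list(traces):  # snapshot, in the role of R$trace@pre
        u, c = tr["u"], tr["c"]
        if u is not None:
            # Guard holds: re-establish the target property.
            c["name"] = u["name"]
            kept.append(tr)
        # Otherwise the source was deleted and the trace is dropped.
    traces[:] = kept


traces = [
    {"u": {"name": "m2"}, "c": {"name": "m1"}},  # source renamed
    {"u": None, "c": {"name": "x"}},             # source deleted
]
pres_model2program(traces)
```

Because traces are only deleted and never created in this phase, a single pass over the snapshot is sufficient.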

Construction constraints
The second phase Con τ (m) of τ deals with the creation of new target elements and traces. It applies both for batch and incremental-mode execution. It consists of constraints We use R$trace@pre in the antecedent because otherwise R$trace would be read and written by the constraint. However, a bounded-loop implementation is sufficient because the succedent can only reduce and not increase the collection of element tuples which satisfy the antecedent. Thus, as with Pres R, we use the pre-state expression to enforce a bounded-loop implementation. Any exists or existsLC quantifiers introduced by epreds(...) in the succedent apply over all of the succedent following their introduction, by definition of epreds. Assuming the conditions (a), (e) as for Pres R (m), the write and read frames of Con R (m) are disjoint.
The Con R (m) constraints for concrete top relations R are placed in Con τ (m) in < order as defined in Sect. 5.4. These constraints use the default target element resolution approach (key based with mandatory creation in the absence of keys) to satisfy R in cases where the target elements do not already exist, and propagate element creation from source models to m.
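A Python sketch of the construction phase for a single relation mapping UMLModel elements to CProgram elements is given below, under the default resolution strategy with no target key (mandatory creation). Model elements and traces are modelled as dictionaries, and all names in the sketch are hypothetical.

```python
def con_model2program(uml_models, c_programs, traces):
    """Create targets and traces for sources not yet linked to a trace."""
    traced_pre = [tr["u"] for tr in traces]  # snapshot of the trace set
    for u in uml_models:
        if any(u is s for s in traced_pre):
            continue  # already mapped: do not re-apply the relation
        c = {"name": u["name"]}  # mandatory creation of the target
        c_programs.append(c)
        traces.append({"u": u, "c": c})


models = [{"name": "m1"}, {"name": "m2"}]
programs, traces = [], []
con_model2program(models, programs, traces)
con_model2program(models, programs, traces)  # second run creates nothing
```

The test against the trace snapshot is what makes re-execution idempotent here: without it, mandatory creation would duplicate the target elements on each run.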

Cleanup constraints
The third phase Cleanup τ (m) of τ consists of constraints Cleanup E (m) for each target entity E of m. Cleanup E (m) is defined as: where the Ri, i = 1 to k, are all the relations (top or non-top) in which the entity E occurs as the type of some target object variable ei : E in outvars Ri . All Cleanup E (m) constraints for entity types of m are placed in a final τ phase, Cleanup τ (m). The constraint Cleanup E (m) reads and writes E, so it needs a fixed-point implementation: the constraint is re-applied until there are no E elements satisfying its antecedent.
For example, if the UML2C transformation contains only one relation which can create CProgram instances, Model2Program, then Cleanup CProgram (C) is: The Cleanup τ (m) constraints propagate element deletion from the source models to m, because if any element linked to a trace instance is deleted, so is the trace (the links from the trace to the elements are mandatory, as shown in Fig. 4).
A generally stronger version of the Cleanup E (m) constraint is: guard Rk (m))) ⇒ self→isDeleted(). Here rx.P is P with variables u of ovars R in P replaced by rx.u. This version of the cleanup constraint deletes any target element which does not occur in any Ri trace that satisfies guard Ri (m). However, assuming the correctness conditions (a) to (e), all trace elements will satisfy their relation guard at termination of Con τ (m), so the simpler version of Cleanup τ (m) is sufficient in this case.
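The simpler cleanup constraint can be sketched in Python for the CProgram example. The sketch assumes that traces whose linked elements were deleted have already been removed, so that an element is unreferenced exactly when no remaining trace has it as target; the names used are hypothetical.

```python
def cleanup_cprogram(c_programs, traces):
    """Delete CProgram elements not linked to any trace (fixed point)."""
    while True:
        orphans = [c for c in c_programs
                   if not any(tr["c"] is c for tr in traces)]
        if not orphans:
            return  # no element satisfies the antecedent any more
        # Remove by identity, so that equal-valued elements are kept.
        c_programs[:] = [c for c in c_programs
                         if not any(c is o for o in orphans)]


kept_program = {"name": "p"}
orphan_program = {"name": "p"}  # same value, but not linked to a trace
programs = [kept_program, orphan_program]
traces = [{"u": {"name": "p"}, "c": kept_program}]
cleanup_cprogram(programs, traces)
```

In this simplified setting the loop stabilises after one deletion round; the repeated test mirrors the fixed-point implementation required when deletions can make further elements eligible for deletion.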

Non-top relations
Non-top relations R are interpreted as operations with postconditions defined in a similar manner to Con R (m). For a non-top-level relation R, a constraint ConOp R (m) is used to form the postcondition of an update operation R m which has as parameters the root variables of the (relation and primitive) domains of R. ConOp R (m) is defined as for Con R (m), These operations are added as static owned operations of the use case Con τ (m) representing the transformation phase.
For Pres τ (m), a different version of the operation R m is defined, with postcondition Pres Op R (m): The v 1 , . . . , v n are ovars R . The operation R m is added as a static owned operation of Pres τ (m).

Overall transformation semantics
For each execution direction of a separate-models transformation τ, the three phases described above are executed in the order Pres τ (m); Con τ (m); Cleanup τ (m). The semantics generalises in a natural manner to consider execution directed at multiple target models, extending [30]. Figure 5 shows the data dependencies of Con R and Pres R wrt trace entities, for the default semantics. Here where* denotes the recursive closure of relation calling through where clauses, and likewise for when*. An arrow d → R means that R can read data d, and an arrow R → d means that R can write d.
This shows that in order to avoid cycles in data dependencies between the constraints within transformations Con τ and Pres τ , quite strict constraints must be placed on how relations depend on each other via when and where clauses. These conditions are encoded in condition (e) of the following section.
Cleanup E reads and writes E, and reads R$trace for any relation R with an outvars R variable of type E. It may also write target classes F which are affected by deletion propagation from E. To ensure that trace classes linked to F are not also written, we assert the condition (d) that target elements referenced by τ (occurring in some trace tuple) cannot be affected by cascaded deletion from target elements that are not referenced by τ (not occurring in any trace tuple).

Example of the semantics
As an illustration of the semantics, we show the translation of a simple version of the Families to Persons case of [1] to UML-RSDS. Figure 6 shows the metamodels; the transformation consists of a single rule:  The Pres constraints concern incremental execution, i.e. re-application of τ to a modified source model. The Con constraints apply to initially map a families model to a persons model and to propagate the introduction of new source elements to new target elements. The antecedent of these constraints includes a check that fm is not already mapped to some m.
Finally, the Cleanup constraints remove target elements that are not related to some source element. Notice that the inverse direction of the trace-to-class association in Fig. 4 is used to optimise the Con and Cleanup constraints.
The translation can be used to show semantic equivalence of different versions of the transformation, to support bidirectionalisation of the transformation. Further details of this example are given in https://nms.kcl.ac.uk/kevin.lano/qvt2umlrsds.pdf.

Properties of the semantics
The three phases Pres τ (m), Con τ (m), Cleanup τ (m) of τ should establish the following consistency relations Rel τ of τ directed at m: 1. That any existing target instance t x : E of any entity type E of m must appear as a target property of some Conditions (2) and (3) imply that for any source element tuple x for svars R which satisfies sguard R (m), for top relation R, there is an extended tuple y of elements for ovars R such that y is in R$trace and satisfies guard R (m) & θ R (m). This means that the checkonly semantics of R is satisfied at termination of τ . However, the following equivalence is not generally true (either at termination or during transformation execution) for a top relation R, for any tuple vals of instantiating elements for the object variables ovars R = {v i } i of R: That is, membership of vals in the trace is equivalent to successful execution of R on vals. This fails in the ⇐ direction because it is possible for a tuple vals to satisfy guard R (m)(vals) & θ R (m)(vals) but not be in R$trace, because vals has not been processed by R. can conflict, with some b : B created by R1X and hence with b = tr.b for some tr : R1X $trace, but also matched by key to some c : C by R2X and hence potentially modified so that b.value no longer satisfies θ R1X (trg). Another situation of conflict between relations is if a relation R2 has a negative when test not(R1(v)) on another relation. Subsequent execution of R1 may invalidate the guard of R2 and hence invalidate the ⇒ direction of the equivalence. Such tests should be replaced by alternative guards when using the "Guard against duplicate applications" pattern.
To address these issues, we formulate five correctness conditions for QVT-R specifications τ directed at a model m.  so is any aggregation owner element b : B linked to t by an aggregation association (Fig. 7). Another way to express this is if t appears in any trace tuple of some τ relation, b must also appear in a trace tuple, of the same or a different relation;
• (e) "Call graph is surjective and non-cyclic": Relation calls in when clauses must be positive (including disjunctions or conjunctions of positive calls). Any recursion in queries should be shown to be always terminating. There should be no recursion between relations via when or where clauses. All non-top relations are called from at least one top relation directly or indirectly. No relation can be called via both when and where clauses starting from one relation. More precisely, it should be possible to order the top relations as R_1, ..., R_n such that:
1. R_i called in R_j.when ⇒ i < j
2. S called in R_j.where* and R_i called in S.when* ⇒ i < j
3. S called in R_i.where* and S called in R_j.when* ⇒ i < j.
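The first ordering requirement of condition (e) amounts to finding a topological order of the top relations with respect to their when dependencies. A minimal Python sketch using the standard graphlib module is given below; it covers only requirement 1, and the relation names R1 to R3, A and B are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def order_top_relations(when_calls):
    """Return an order in which every relation follows those it tests.

    when_calls maps each top relation to the set of relations called in
    its when clause. Returns None if the dependencies are cyclic.
    """
    try:
        return list(TopologicalSorter(when_calls).static_order())
    except CycleError:
        return None


# Hypothetical dependencies: R2 tests R1 in its when clause, R3 tests R2.
acyclic = order_top_relations({"R2": {"R1"}, "R3": {"R2"}, "R1": set()})
cyclic = order_top_relations({"A": {"B"}, "B": {"A"}})
```

Requirements 2 and 3 could be handled in the same way by adding the corresponding edges derived from the where* and when* closures before sorting.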
Condition (a) prevents relations from referring to arbitrary target elements or to the extents T.allInstances() of target classes T. Thus, predicates such as v = T.allInstances()→size() or T→exists(t|P) are prohibited. The condition ensures that all target elements whose data are read by or written to in a relation must also be linked to some trace of the relation. (a) and (e) ensure that the constraints in the Pres τ and Con τ phases can be ordered to avoid cycles in data-dependency. These conditions ensure that cycles cannot arise in the trace data dependencies of Fig. 5. This means that bounded loop execution can be used for the Con τ and Pres τ phases, resulting in polynomial time complexity for both batch and incremental execution modes. (b) and (c) make precise the conditions for specification consistency based on clashes of bindings discussed in [30]. Condition (d) is a further consistency requirement which avoids conflict between the creation and deletion actions of relations. Condition (e) is the 'no cycles' condition also assumed by [8]. Recursion in specifications can hinder bidirectional execution and can usually be replaced by the Map Objects before Links pattern, closure expressions [28] or forAll quantifiers.
• If P and R are both top relations, then P < R and hence ¬(R < P).
• If P is top, R non-top, then any top S with R ∈ S.where* has P < S. But R P would imply that S < P.
• Likewise for non-top P, top R.
• If both P, R are non-top, then any top S, T with P ∈ S.where * , R ∈ T .where * has S < T . But R P would imply T < S.
Thus, provided that every non-top relation P has some top R with P ∈ R.where * , the result follows.
This suggests a way to organise QVT-R transformations: list the top relations in < order, and group the non-top relations of R.where * for top relation R together with R.
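The ordering requirement of condition (e) can be checked mechanically. The following is a minimal sketch in Python, not the implementation used in our tools. It assumes one particular reading of constraints 1 to 3: a top relation R_i must precede R_j whenever some relation in R_i.where* is tested in a when clause of some relation in R_j.where*. The relation names and the dictionaries `when_calls` and `where_calls` are hypothetical inputs introduced for illustration.

```python
def where_closure(rel, where_calls):
    """Relations reachable from rel via zero or more where calls (rel included)."""
    seen, stack = {rel}, [rel]
    while stack:
        for callee in where_calls.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def order_top_relations(top, when_calls, where_calls):
    """Return the top relations in an order compatible with the (assumed
    reading of the) constraints, or raise ValueError if they are cyclic."""
    closure = {r: where_closure(r, where_calls) for r in top}
    before = {}  # before[rj] = top relations that must precede rj
    for rj in top:
        tested = set()
        for s in closure[rj]:
            tested |= set(when_calls.get(s, ()))
        before[rj] = {ri for ri in top if closure[ri] & tested}
    ordered, remaining = [], dict(before)
    while remaining:
        # A relation is ready once none of its predecessors is still pending.
        ready = sorted(r for r, preds in remaining.items()
                       if not preds & set(remaining))
        if not ready:
            raise ValueError(f"cyclic dependencies among {sorted(remaining)}")
        ordered.extend(ready)
        for r in ready:
            del remaining[r]
    return ordered


# Hypothetical call graph: Attr2Column is a non-top relation.
top = {"Package2Schema", "Class2Table", "Assoc2FKey"}
when_calls = {"Class2Table": {"Package2Schema"},
              "Assoc2FKey": {"Class2Table", "Attr2Column"}}
where_calls = {"Class2Table": {"Attr2Column"}}
print(order_top_relations(top, when_calls, where_calls))
```

A relation that tests, in a when clause, something it also reaches through its own where calls becomes its own predecessor in this sketch and is therefore reported as a cycle.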
Note that it is possible for τ to satisfy the correctness conditions (a) to (e) in one execution direction but not in others. For a bx, we require that the conditions hold in all execution directions. This means that the transformation can be executed in multiple directions to synchronise models which may be modified independently. This capability fails for the standard specification of the UML to RDBMS QVT-R transformation [28]. A consequence of (a) is that when clauses of bx can only contain relation tests and equality/inequality tests on variables, and no occurrences of source or target features or class names.

Theorem 2 For both batch-mode and incremental-mode execution of a separate-models transformation τ , the correctness conditions (a) to (e) ensure that τ establishes or preserves the properties (1), (2) and (3).
Proof For batch-mode execution, assuming (a), (b), (c), (d), (e), the Con τ (m) phase establishes invariant (3), since if tuple v satisfies sguard R (m) for concrete top relation R at termination of τ , i.e. sguard R (m)[v/svars R ], it also did so at the start of the Con R (m) execution step of Con τ (m), because sguard R (m) depends only on source model and trace data which are read-only subsequent to the start of Con R (m) (if R reads S$trace via a relation test of S in a when clause, then S < R in the execution ordering). By definition of sguard R , there is an extended tuple w of v which includes tvars R elements in whenvars R , which satisfies guard R (m), i.e. guard R (m)[w/invars R ]. If w is not already linked to an R$trace element, then Con R (m) is applied to w, and w extended to u with created/matched target elements of outvars R then satisfies guard R (m) and θ R (m), and is linked to a new R$trace element. Trace elements can only arise in this manner, and are not removed by Cleanup τ (m) (because of condition (d)); hence, (2) and (3) hold. The validity of guard R (m) and θ R (m) for u in R$trace is not affected by subsequent Con P applications in Con τ (m), either of R or another rule, by conditions (b), (c), (e). Condition (e) also ensures termination of the Con τ phase.
The Cleanup τ (m) actions do not invalidate guard R (m) or θ R (m) for u in R$trace because they cannot modify elements that are in a trace. In particular, (a) means that R only created/modified target elements that (after its execution) occur in R$trace, and hence cannot be affected by Cleanup τ . For separate-models transformations, the Cleanup τ (m) constraints establish (1) and do not invalidate (2) because they only update the target model, not the source or traces. Condition (d) ensures that Cleanup-initiated cascaded delete does not delete elements that are in traces. In situations where a target a : A element would be deleted if a target b : B element is deleted, if a is in some trace tuple, so must b be.
For incremental-mode execution, if conditions (1), (2), (3) already hold, then none of the phases Pres τ (m), Con τ (m) or Cleanup τ (m) have any effect. (2) means that the Pres τ (m) constraints have no effect. (3) means that no Con R (m) constraint is enabled. (1) means that no elements satisfy the criteria to be deleted by Cleanup τ (m).
Assuming the above conditions (a) to (e) for transformation τ , Pres τ (m) establishes (2) for any incremental change in the source model, since for each top relation R, if a trace element no longer satisfies guard R (m), then it is deleted; otherwise, its linked elements are updated if necessary to reestablish guard R (m) & θ R (m). Deletion of source elements will delete any trace that they belong to. Con τ (m) does not invalidate any existing traces, because of the non-interference and ordering properties (b), (c) and (e), but may create new traces for new source elements, or for existing source elements which now satisfy some relation guard. Likewise, Cleanup τ (m) does not invalidate existing traces, but deletes target elements which no longer appear in traces. Thus, (1) and (3) are established.
Changing the definition of target element resolution does not affect this proof; however, the definition affects conditions (b) and (c) of rule consistency. In some execution scenarios, only one subtransformation needs to execute in order to enforce (1), (2) and (3). In batch mode with an initially empty target model and empty trace, only the Con τ (m) use case is necessary.
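The interplay of the three phases in the proof can be illustrated by a toy simulation. This is a sketch under strong simplifying assumptions, not the UML-RSDS semantics itself: there is a single top relation R, models are Python dictionaries, the guard is "the source name is non-empty", and θ R is "the target name equals the source name". The class `Model` and the functions `pres`, `con`, `cleanup` and `run` are hypothetical names, and the phase order Pres, Con, Cleanup is assumed.

```python
import itertools


class Model:
    def __init__(self):
        self.source = {}   # source element id -> name
        self.target = {}   # target element id -> name
        self.trace = {}    # stands in for R$trace: source id -> target id
        self._ids = itertools.count(1)


def guard(name):
    return bool(name)


def pres(m):
    """Delete traces whose guard no longer holds; otherwise re-establish theta."""
    for s, t in list(m.trace.items()):
        if s not in m.source or not guard(m.source[s]):
            del m.trace[s]
        else:
            m.target[t] = m.source[s]


def con(m):
    """Create target and trace elements for guarded source elements without a trace."""
    for s, name in m.source.items():
        if guard(name) and s not in m.trace:
            t = f"t{next(m._ids)}"
            m.target[t] = name
            m.trace[s] = t


def cleanup(m):
    """Delete target elements that occur in no trace."""
    traced = set(m.trace.values())
    for t in list(m.target):
        if t not in traced:
            del m.target[t]


def run(m):
    pres(m)
    con(m)
    cleanup(m)


m = Model()
m.source = {"s1": "Person", "s2": ""}
run(m)                      # batch mode: only con has any effect here
snapshot = (dict(m.target), dict(m.trace))
run(m)                      # nothing changed in the source, so no phase acts
print(snapshot == (m.target, m.trace))
```

Renaming s1 and re-running updates the existing target element through `pres` rather than creating a new one, and deleting s1 removes its trace and then, via `cleanup`, its target element.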

Corollary If conditions (a) to (e) hold for τ in direction m, then τ satisfies the checkonly semantics of [30] in the direction m.
Proof Theorem 2 shows that the conditions (1), (2), (3) are established by τ at its termination. For each concrete top relation R of τ , conditions (2) and (3) imply that for any source element tuple x for svars R which satisfies sguard R (m), there is an extended tuple y of elements for ovars R such that y is in R$trace and satisfies guard R (m) & θ R (m).

Theorem 3 The presented semantics is consistent with the Relations-to-Core semantics of [30].
Proof We consider the 6 principal rules of Sect. 10

Incremental execution and change propagation
Source-to-target change propagation of the following source model changes is supported for separate-models transformations in incremental mode by our semantics: • Addition of a new source element sx : ST -if this element satisfies some sguard R (m) of a relation R with source domain s : SST for SST equal to ST or a supertype of ST , then sx will be processed by Con R (m) and appropriate target and trace elements selected or created. Further top relations may consequently also become enabled and will be applied. In general, we use conditions on traces to selectively enable specific constraints for particular incremental changes, instead of re-applying the entire transformation. This is based on a static scheme, rather than a dynamic mechanism as in [10], but the performance was found to be satisfactory on all the test cases of Sect. 8. A more elaborate static scheme for managing incremental changes by additional operations is defined by [33]; however, this considerably enlarges and complicates the semantic representation of a transformation, so making it harder to understand and analyse.

Variant semantics and extensions for QVT-R
The above semantics can be adapted to the cases of check-before-enforce and least-change semantics for target element resolution. We also consider the case of non-persistent traces and propagation of element removal. In addition, a semantics can be given to internal transformation composition.

Check-before-enforce semantics
Modifying the semantics of a relation R to use check-before-enforce target resolution means changing the definition of the succedent of Con R (or ConOp R for non-top R). Two cases need to be distinguished: In order to achieve this logic, we need an extended OCL let operator, which generalises the version defined in [29]. This has the form let vars be P in Q where vars is a list of variables, P is a collection of constraints on the vars, and Q is a constraint which has a stat(Q) interpretation and does not write to the variables of P. This generalised let has a procedural interpretation as an any statement [17] and hence has a well-defined predicate transformer semantics.
The check-before-enforce version of Con R (m) is therefore formalised as: To require that check-before-enforce semantics is used for a relation, a stereotype/annotation @checkBeforeEnforce could be written before the relation header. Alternatively, the stereotype could be written before a transformation header to express that it applies to all relations in the transformation. This definition of target element resolution affects the circumstances under which consistency conditions (b) and (c) are valid, as discussed in Sect. 6.
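As an informal illustration of check-before-enforce target resolution, the following Python sketch looks for an existing target element that already satisfies every required feature value and creates a new element only if none exists. It is a simplified analogue, not the OCL formalisation: target elements are modelled as dictionaries, and the function name `resolve_check_before_enforce` is hypothetical.

```python
def resolve_check_before_enforce(extent, required):
    """Return (element, created). Reuse an element of `extent` that matches
    all `required` feature values; otherwise create and add a new one."""
    for element in extent:
        if all(element.get(f) == v for f, v in required.items()):
            return element, False
    element = dict(required)
    extent.append(element)
    return element, True


structs = [{"name": "Person", "kind": "struct"}]
_, created = resolve_check_before_enforce(structs, {"name": "Person", "kind": "struct"})
print(created, len(structs))
```

Only a complete match is reused in this sketch; an element that agrees on some but not all required features is ignored and a new element is created.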

Least-change semantics
In some cases, it is useful to be able to specify that existing target elements should be reused and updated if they partially match the required conditions of a target pattern. The 'check-before-enforce' semantics only deals with situations where target elements completely match the required conditions, whilst the 'always create new elements' semantics ignores existing target elements. Key-based partial matching and update only operates for elements with keys.
We use the notation e <:= E { pattern } for target object templates where least-change partial matching is required. Such templates have the semantic interpretation E→existsLC(e|Pred) in the epreds predicate, instead of E→exists(e|Pred), where Pred semantically interprets pattern. Examples of the use of this mechanism are in the online dataset directories confluence and simplefamilies2persons of [27]. The application of the semantics to the families to persons case is explained in [26].
As for check-before-enforce, this definition of target element resolution alters the circumstances under which consistency conditions (b) and (c) are valid.
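The contrast with check-before-enforce can be sketched in Python as follows. This section does not fix how a "partial match" is selected, so the sketch assumes one simple criterion: reuse the existing element that agrees with the most required feature values (at least one), and update only the features that differ. The function name `resolve_least_change` and the dictionary model of elements are hypothetical.

```python
def resolve_least_change(extent, required):
    """Return (element, changed_features). Reuse the best partial match in
    `extent` and update it; create a new element if nothing matches at all."""
    def score(element):
        return sum(1 for f, v in required.items() if element.get(f) == v)

    best = max(extent, key=score, default=None)
    if best is None or score(best) == 0:
        element = dict(required)
        extent.append(element)
        return element, set(required)
    changed = {f for f, v in required.items() if best.get(f) != v}
    best.update(required)
    return best, changed


structs = [{"name": "Person", "members": 2}, {"name": "Order", "members": 1}]
element, changed = resolve_least_change(structs, {"name": "Person", "members": 3})
print(element, changed, len(structs))
```

Here the element named Person is reused and only its members value is rewritten, whereas the check-before-enforce strategy would find no complete match and create a second Person element.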

Non-persistent traces
We have defined the semantics τ of τ on the basis that trace instances of the R$trace classes are persisted to support incremental execution. However, phases Con τ (m) and Cleanup τ (m) are also applicable in the case of non-persistent traces. The properties (1), (2), (3) remain valid as postcondition properties established at termination of τ , and the proof of Theorem 2 remains valid for batch-mode execution.
However, incremental execution mode is no longer supported. Execution of τ on a non-empty target model may fail to propagate changes to the correct target objects, because the information of precise source-target correspondences in persistent traces used by Pres τ (m) is not available. Each Con R (m) will be re-applied to every relevant source element, irrespective of previous mappings established for these elements via R.

Propagation of element removal
The semantics we have presented does not necessarily propagate element removal changes. To address this, an explicit undo R (m) predicate can be derived, which is a conjunction of predicates y.r→excludes(x) for each predicate y.r→includes(x) or x : y.r in epreds(tdom, ovars R ∪ whenvars R ∪ sourcevars R ) & wherep, where x and y are in ovars R . Pres R (m) is then modified to perform undo R (m) if the guard of R is false for an existing trace: R$trace@pre:: Similarly for PresOp R in the case of non-top R.
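The derivation of undo R (m) from the membership predicates of a relation can be sketched operationally. In this Python illustration, which is an assumption-laden analogue rather than the generated UML-RSDS code, each predicate y.r→includes(x) is recorded as a triple (y, r, x) of variable and role names, and undoing a trace removes the bound x from the bound y.r. The names `undo`, `links` and `binding` are hypothetical.

```python
def undo(links, includes_preds, binding):
    """For every recorded predicate y.r->includes(x), enforce
    y.r->excludes(x) on the elements bound in the given trace."""
    for y_var, role, x_var in includes_preds:
        owner, member = binding[y_var], binding[x_var]
        links.get((owner, role), set()).discard(member)


# links maps (owner element, role name) to the set of linked elements.
links = {("struct1", "members"): {"m1", "m2"}}
includes_preds = [("c", "members", "p")]      # from c.members->includes(p)
undo(links, includes_preds, {"c": "struct1", "p": "m1"})
print(links)
```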

Internal composition of transformations
QVT-R transformations are typically written in a monolithic style, with all operations and rules contained in a single semantic unit. For large transformations (such as the RelToCore transformation, or the Ecore to SQL case of [36]), this can result in very large numbers of rules and operations (e.g. 51 operations, 12 rules and 118 calling dependencies in the first version of ecore2sql). To improve the organisation of large transformations, we suggest using subtransformations which are internally sequentially composed. The structure of such a transformation would be:

Each of the subtransformation parameter lists pars i is a subsequence of the main transformation parameters pars.
A sequential composition τ = τ 1 ; τ 2 of two QVT-R transformations can be given a semantics as a sequential composition of the corresponding UML-RSDS transformations τ 1 ; τ 2 . This chains the three phases of τ 2 after the three phases of τ 1 . Provided that the write frames of τ 1 and τ 2 are disjoint, and that τ 2 does not write to any data read by τ 1 , then the composed effect of τ is well defined: τ 2 does not invalidate the postconditions established by τ 1 on its models.
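The side conditions for sequential composition amount to two set-disjointness checks on read and write frames. The following Python sketch shows one way to state them; the representation of a frame as a set of "Class::feature" strings, and the names `composition_problems`, `tau1` and `tau2`, are assumptions made for illustration.

```python
def composition_problems(t1, t2):
    """Return the reasons why t1 ; t2 might not be well defined
    (an empty list means both side conditions hold)."""
    problems = []
    shared = t1["writes"] & t2["writes"]
    if shared:
        problems.append(f"write frames overlap on {sorted(shared)}")
    clobbered = t2["writes"] & t1["reads"]
    if clobbered:
        problems.append(f"second transformation writes data read by the first: {sorted(clobbered)}")
    return problems


tau1 = {"reads": {"Class::name"}, "writes": {"Entity::name"}}
tau2 = {"reads": {"Entity::name"}, "writes": {"Table::name"}}
print(composition_problems(tau1, tau2))
```

Note that the second transformation may freely read what the first one writes, as `tau2` does here; only writes by the second transformation are restricted.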

Semantic analysis of QVT-R
The translation of a QVT-R relation R to an OCL constraint θ R (m) immediately exposes the semantic form of the relation in terms of quantified source and target elements.
In general, each relation should only concern a single one-to-many association within each model [16]. Hence, each relation should refer to elements at two composition levels or fewer, per model (e.g. to programs and classes, or to classes and attributes, but not to programs, classes and attributes). Multiple levels of quantification within one relation are both difficult to comprehend (cf. the AbstractToConcrete case from Modelmorf) and inefficient to execute.
The problems identified with the Class2CStructV2 relation in Sect. 2 can now be understood by considering the form of the semantics θ Class2CStructV2 (C) of the relation in UML-RSDS notation: For problem (1), if e.ownedAttribute is empty, the antecedent is not satisfiable for any p, and so no c : CStruct will be created for e.
For problem (2), if CStruct does not have a key attribute, then a different c : CStruct could be created for each different p : e.ownedAttribute. Check-before-enforce semantics does not avoid this problem because there may not be a CMember with all the required properties for p, and hence, the c domain will need to be executed for p. However, the least-change semantics for target element resolution for the c : CStruct instantiation would resolve problem (2): new CStruct instances would not be created for different p; instead, the CStruct with name = e.name would be reused and only new CMember instances would be created and added as members of this CStruct.
To avoid such semantic problems, specifiers should separate matching on many-valued features into separate rules, in which the owner elements are already bound by a when clause. For example, Class2CStructV2 should be restructured as two rules, Class2CStruct and: In general, the semantics can be used to identify anomalies which can indicate specification errors. In our tools, we include static checks for the correctness conditions (a) to (c) and (e), including flaws such as the use of calls to top relations in where clauses, unused non-top relations, and updates to the same data in the target domains and the where clause of a relation. Figure 8 shows an example of analysis of condition (b) and Fig. 9 an example of analysis of condition (e). then assignments of a constant to bId, such as bId = "1", are detected as potential confluence errors. These are also cases of potential semantic conflicts (c) within a single relation (the QVT-R semantics of [30] only considers semantic conflicts between different relations).
Assignments of (values of) non-keys A::att to key attributes B::bId are also a potential confluence error because different A instances a1, a2 can have the same att values and hence match to the same B instance. If other features of a1, a2 have different values and are used to update B features, then a conflict of type (c) arises. Least-change semantics is also prone to this kind of error. Potential conflicts of type (b) between relations can also be detected by our analysis: a warning is issued if two different top relations update the same features of the same target class (Fig. 8).
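The type (b) warning described above is essentially a grouping of written target features by relation. The following Python sketch illustrates the idea; it is not the analysis implemented in our tools, and the function name `type_b_warnings`, the input format and the relation names are hypothetical.

```python
from collections import defaultdict


def type_b_warnings(updates):
    """updates maps each top relation to the set of (target class, feature)
    pairs it writes. Return (class, feature, relations) for every feature
    written by two or more different relations."""
    writers = defaultdict(set)
    for relation, features in updates.items():
        for class_feature in features:
            writers[class_feature].add(relation)
    return sorted((cls, feat, tuple(sorted(rels)))
                  for (cls, feat), rels in writers.items() if len(rels) > 1)


updates = {
    "A2B": {("B", "bId"), ("B", "name")},
    "C2B": {("B", "name")},
    "D2E": {("E", "eId")},
}
print(type_b_warnings(updates))
```

Such a warning only flags a potential conflict: the two relations may still be consistent if they always assign equal values, which a syntactic check of this kind cannot establish.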
A more subtle confluence problem arises with specifications of the form Assuming that there are no key features, there are three different options for target resolution of b1x: (i) always create new B1 instances b1x for each different bx : a.br; (ii) check-before-enforce; (iii) least-change check-before-enforce. Option (i) is incorrect, because the assignment a1.b1r = b1x will overwrite any previous assignment to a1.b1r and invalidate previously established b2x : a1.b1r.b2r for bx = bx and (bx , b2x ) : B2B2$trace. Check-before-enforce semantics only avoids the creation of b1x if b2x is already in a1.b1r.b2r. Hence overwriting can also occur in this situation. However, the least-change semantics (iii) does give correct behaviour. Using B1→existsLC(b1x, an instance b1x : B1 with a1.b1r = b1x is only created once, then subsequently this instance is looked up and b2x : b1x.b2r is established for b1x. This avoids the above confluence problem. Thus, using our extended QVT-R notation, the relation should be written as: Syntactic confluence checks for the default exists semantics are defined in [24] and are implemented in the tools.

Design patterns
Several model transformation design patterns from [20,25] are particularly useful in structuring QVT-R transformations in order to reduce semantic flaws and increase capabilities for bidirectional execution:
• Map objects before links: used to avoid mutual/cyclic dependencies between relations and to separate out the mapping of collection-valued references. The pattern relies on key-based or mandatory-create target resolution, in order to impose a 1-1 mapping of elements in the 'map' phase (e.g. Model2Program in Sect. 2).
• Lens: used to provide incremental bidirectional execution in cases where the forward map ignores existing target data. This is used in cases where a target data feature g of instances t : T of entity type T can be computed in the forward direction as a function of source features f_1, ..., f_n of instances s_1, ..., s_n of source entity types S_1, ..., S_n: In the reverse direction, put_i functions update the f_i based on the initial values f_i@pre of these features and on the target data g: In QVT-R, the equations can be placed in a relation where clause; the assignment to g will be effective for update only in the source-target execution direction, whilst the assignments to the f_i will be effective only in the target-source execution direction. The @pre suffix is used for the f_i so that correctness condition (a) is not violated in this direction. To avoid conflicting updates in the put assignments, each s : S_i for any i should be related to only one t : T.
• Entity splitting/merging: where the data of one class in SL are distributed to data of two or more classes in TL, or vice versa. Horizontal merging/splitting is the situation where two or more exclusive classes are merged into/split out from one class. Vertical merging/splitting is the situation where two or more non-exclusive classes are merged into/split out from one class.
• Flattening/unflattening: various situations in which source model data structure corresponds to simplified or elaborated structure in the target model. For example, introduce intermediate class adds a new class C in TL between two classes derived from SL linked by an association: A →r B in SL is elaborated to A1 →r1 C →r2 B1 in TL, so that r is represented by the composition r1.r2. The reverse of this transformation is a flattening which discards the intermediate elements. A recursive *-accumulation is a flattening where the closure of a source association corresponds to a target association. Transformations involving flattening can be problematic for bidirectional execution because of the loss of information (e.g. the two cases where a bx is not definable in [36] both involve flattening).
• Auxiliary metamodel: Introduce additional model structure in order to support bidirectionalisation, e.g. to introduce flag attributes in target classes to record the specific source class from which a target instance is derived, in the case of horizontal Entity merging.
• Auxiliary models: in cases where there is substantial disparity between the structures of SL and TL, introduce an intermediate model and split τ into a sequential composition of two transformations which use this model. Auxiliary models can also contain configuration information to restrict nondeterminism, as in the Families to Persons case [37].
• Object indexing: Introduce identity attributes/keys for elements in order to enforce reuse of target elements (instead of creation of new elements) and to enforce 1-1 or n-1 relations between source and target elements of corresponding classes.
• Restrict input ranges: Use additional guards to prevent duplicated application of relations with source object
In addition to design patterns, some specific idioms are useful:
• Marker relation [6]: simple non-top relations with domains s : S{}; t : T{}; with no explicit functionality, which are called from the where clause of more complex relations in order to store specific pairs or tuples of elements into a trace that can be queried in other relations. For example, the dag2ast/ast2dag and ecore2sql transformation versions of [36] use this idiom.
• Test and update bidirectional 1-* or 1-0..n associations at the 1 end: as noted in Sect. 2, matching on 0..n multiplicity or * multiplicity association ends has semantic complications. Therefore, it is preferable to manipulate such associations via their opposite ends if these are 1-multiplicity.
• Replace recursion by iteration: recursive relations can be a cause of semantic problems and can be avoided in many cases by using the Map Objects before Links pattern, by using the →closure(r) operator on a self-association r [28] or by using a ∀ quantifier (e.g. the bag migration case in Sect. 8, defined in "Appendix E").
Deletion by selective copying is useful for update-in-place transformations ("Appendix A").

Evaluation
In this section, we consider the cases of [36] and other bx examples, and use the above patterns and the implementation of QVT-R via UML-RSDS to provide systematic specifications of these. We consider both batch-mode and incremental-mode execution.
The code of all examples, together with their semantic interpretations in UML-RSDS and example execution scenarios, can be found at [27]. Due to space restrictions, we provide the detailed evaluation results in supplementary material ("Appendix E"). The Families to persons case is also presented as an example in [26]. We used Version 1.9 of the Agile UML tools for UML-RSDS, available at https://projects.eclipse.org/projects/modeling.agileuml. We used the current version of the Medini QVT tools [11]. All tests were carried out on a Windows 10 i5-6300 dual core laptop with 2.4GHz clock frequency, 8GB RAM and 3MB cache.

Comparison
The cases can be evaluated in terms of the quality metrics of [13,23] and in terms of the bx properties they support. Table 7 compares previous versions of the cases in QVT-R or ETL (for the tree to graph case) with the QVT-R and UML-RSDS versions defined in this paper, with regard to the number of quality flaws per LOC. Quality flaws are counts ERS and EHS of excessively large (over 50 LOC) rules/helpers, counts EFO and EFI of rules with excessive (over 5) fan-out and fan-in, counts EPL of excessive (over 10) numbers of variables, and counts CBR of excessive numbers of calling dependencies and DC of code clones. We also give the performance gain ratios of our version relative to the original version executed via Medini QVT (for the cases of [36]). In each case, our solutions have the same or fewer flaws than previous solutions and are more efficient for batch-mode execution. Table 8 summarises the bx properties of our solutions. Reduced flaws are due to our use of patterns, especially the Map objects before links pattern, which leads to a logical specification style with top relations dependent via when, rather than a functional style using non-top relations with where calls. Improved efficiency is due to optimisation in the UML-RSDS design synthesis process (e.g. reducing the range of element quantifications and searches where possible), and because code in 3GLs is produced, whereas Medini uses interpreted execution.
We have therefore improved on the properties of the solutions of [36] in two cases: the mapping of bags has been specified by a bidirectional transformation instead of by two separate forward and reverse transformations, and for trees and dags we specified forward and reverse transformations which are closely related and mutually inverse. We also provided incremental solutions for 4 of the 6 cases of [36] and provided a deterministic solution for the sets/ordered sets case. We improved the Hsm2nhsm transformation of [28] by eliminating the circular calling dependencies of the previous solution.

Related work
The closest related work to our approach is [2] and [8]. In [8] QVT-R is translated into coloured Petri nets (CPNs). This enables the use of CPN tools to simulate and analyse QVT-R specifications. The translation covers batch-mode execution but not incremental or in-place modes. They follow the semantic approach of RelToCore, using non-persistent traces. Separate CPN representations of source-to-target and target-to-source execution directions are generated. Both checkonly and enforce semantics are treated. Only check-before-enforce using key-based target resolution seems to be considered in [8]. As in our approach, read access to target models is considered an error in [8]. Criteria for termination are provided, as are consistency checks. However, general OCL expressions are not supported in specifications, and verification requires a developer to relate CPN-based analysis results to the original QVT-R text, which is not trivial, due to the complex and low-level encoding. Execution via CPN is possible; however, this is not efficient and is only useful for debugging/simulation. In [2], QVT-R is translated into a transition system formalism with modal mu-calculus constraints expressing the semantics. Both batch and incremental mode are supported, but non-top relations are not permitted to create elements. The formal representation varies depending on the transformation direction. Key-based and check-before-enforce target resolution is adopted, without traces. As with [8], the formalism is quite distant from the UML/OCL basis of QVT-R, and it is non-trivial to relate results from the semantics back to the original specification. Execution and verification via model checking are supported in principle, but not implemented by [2]. Table 9 summarises the approaches of [2,8] and compares them with our approach. Our approach is the only one which formalises both least-change and in-place execution modes and provides 3GL code implementation. The conceptual distance of our formalisation (in UML and OCL) from the standard is lower than for the other approaches.
The problems with QVT-R semantics seem to go back to the original RFP for the language, which emphasised technology alignment instead of language semantics [15]. Specific problems in QVT-R semantics were documented by Stevens in [34]. The limitations of a purely declarative interpretation of QVT-R were identified. This work led to the semantics of QVT-R defined using mu-calculus [2]. An alternative QVT-R semantics is defined in [28], using constraint solving in Alloy to implement a least-change interpretation of relations in enforce mode. We have instead formulated a least-change semantics within standard UML/OCL.
Other approaches concern implementation of QVT-R by translation to another language or formalism, but do not provide semantic analysis. For example, [32] translates QVT-R to QVT-O, and [7] translates QVT-R to TGG. The translation to TGG, however, helps to expose semantic ambiguities in the QVT-R semantics. In [39], a fine-grain model of QVT-R computations is defined and used to guide automated execution optimisation. Here we have described how QVT-R specifications can be restricted and organised to avoid self-dependencies of relations and mutual dependencies between relations, and other forms of circular data-dependence. This enables a simpler implementation approach to be adopted whilst retaining high efficiency.
In [36], the problems in defining bx in QVT-R are investigated via seven case studies. Specific limitations of QVT-R are identified: the lack of definite ordering in where clause/target domain execution, which complicates the definition of bx execution strategies, and the lack of facilities for defining variability options, such as rules specific to one execution direction. We have addressed the first problem in our semantics. The second would be a useful facility, especially in cases where most relations can operate in both forward and reverse directions, but some are only valid in one direction. The transformation could be divided into three subtransformations: one (τ ) for the fully bx relations, one (σ ) for the specific forward relations and another (ρ) for the specific reverse relations. Transformation extension could be used to combine τ with σ and τ with ρ.
Additionally, [36] identifies tooling problems for QVT-R, in that the only practical QVT-R tool available, Medini QVT [11], does not support the full language. We have defined tooling support which overcomes this problem via translation to UML-RSDS (effectively a subset of UML) and its tools. This supports domain conditions, relation overriding and transformation extension (unlike Medini) and provides an option to use least-change semantics.
Similar work has been carried out for other MT languages. For example, [4] translate ATL into an intermediate representation with a formal semantics to support verification. Similarly, [35] translate ATL to the Maude formalism. Operationalisation approaches for TGG [5,9] have some similarities to our approach, with the correspondence models in TGG being used in a similar way to QVT-R traces. However, TGG is a simpler language than QVT-R because TGG rules cannot explicitly refer to other rules, and the expression language used for TGG is simpler than OCL. As in our approach, [9] also uses multiple phases to define an incremental operational form of a TGG specification. However, this form is a graph transformation, not a logical representation, and requires the introduction of additional flag attributes. In [25], we define an approach for bidirectionalising UML-RSDS using syntactic inversion of predicates. Although this produces efficient transformations, it requires the specifier to ensure that they remain within the subset of the language which is syntactically invertible. Using QVT-R enables bx to be defined in many cases using a single specification for forward and reverse directions, avoiding the need for syntactic bidirectionalisation. We have extended the QVT-R semantics of [25] to provide variations in target element resolution semantics, including the Medini QVT semantics and update-in-place semantics.

Conclusions
In this paper, we have shown that a coherent mathematical semantics can be given to QVT-R, based on a translation to UML-RSDS. This semantics is compatible with the de facto QVT-R semantics given by the Medini QVT tool, and it is consistent with the (partial) QVT-R to QVT-C translation of [30]. The semantics provides a basis for the static analysis of QVT-R specifications and for the identification of semantic problems in them. We provide variation points to permit alternative target resolution approaches and trace models. In addition, by translating to UML-RSDS, specifiers gain the ability to perform other semantic checks, such as confluence analysis, and to generate efficient implementations of QVT-R specifications in multiple programming languages. By adopting MT design patterns from UML-RSDS, it also becomes possible to systematically construct QVT-R bx of particular kinds. We consider that overall our approach can contribute to increasing the precision of QVT-R semantics and enhancing the usability of the language for practitioners.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Cleanup_E constraints are defined as in the separate-models case, except that the R_i range over all relations with some e : E variable in their tvars_{R_i} set (instead of outvars_{R_i}). The reason for this change is that target elements can exist because they were copied from the source model, and do not need to be explicitly created. Unlike the separate-models case, the Cleanup_E constraints can delete source elements (since the source and target models are the same) and delete traces.
The phases Pres_τ(m), Con_τ(m) and Cleanup_τ(m) are then defined from the Pres, Con and Cleanup constraints as in the separate-models case. Correctness conditions (b) to (e) also apply to update-in-place transformations, but condition (a) is weakened to:
• (a') Relations can only refer to target features/class names via explicitly declared object variables t : T and their features.
Two cases of update-in-place transformation can be distinguished: (i) where a single execution is sufficient to establish the required target model; (ii) where repeated iteration is necessary. We suggest using transformation stereotypes/annotations to distinguish these cases. For iterated transformations, an annotation @iterated could be written before the transformation header to indicate that the transformation should have this semantics.
A simple example of an update-in-place rule for the UML metamodel of Fig. 3  Such transformations may require repeated execution. In this example, Con_R(graph) will not be enabled on nodes n which have n.neighbours@pre→size() ≠ 1; hence, such nodes will not be recorded in R$trace and will be deleted by Cleanup_Node(graph). This implicit deletion may then reduce the set of neighbours of other nodes and result in R being enabled on these in the next iteration.
Con_R(graph) is:

  ::
    n : Node@pre & n.neighbours@pre->size() = 1 &
    not(R$trace@pre->exists( tr | tr.n = n )) =>
        R$trace->exists( tr | tr.n = n )

This places any node with 1 neighbour into the trace. Pres_R(graph) is:

  R$trace@pre::
    if n : Node@pre & n.neighbours@pre->size() = 1
    then n.neighbours@pre->size() = 1 => true
    else self->isDeleted() endif

A node n may be in the trace because on the previous iteration it had one neighbour in the Pres_τ phase, but after Cleanup_τ it does not. Pres_R(graph) will consequently delete the trace containing n, and n will be removed by the following Cleanup_Node(graph). The same actions are taken if n.neighbours has been changed by an incremental modification of the model to falsify n.neighbours.size = 1.
The consistency properties (1), (2) and (3) of Sect. 5.4 also apply to update-in-place transformations. (1) is established by the Cleanup_τ(m) constraints, and (3) is redundant because the model has been restricted to elements which are in traces. (2) need not be established by a single iteration. For example, in the Node case, some remaining node may have neighbours.size ≠ 1 due to deletions carried out by Cleanup_τ(m). Thus, the phases Pres_τ(m); Con_τ(m); Cleanup_τ(m) need to be iterated until both (1) and (2) hold. In such cases, the annotation @iterated should be attached to the transformation.
The iteration of the transformation is defined by the activity of τ:

  while not(2) do (Pres_τ(m); Con_τ(m); Cleanup_τ(m))

Proof of termination will generally require the use of some variant expression ν : ℕ, whose value is strictly reduced by each iteration, as in [21]. In the above example, the size of Node.allInstances() could be used as a variant. Another possibility would be to allow the number of iterations to be manually controlled. In this case, the condition (2) could be tested to determine if the execution is complete.
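The iterated activity can be illustrated with a small simulation of the Node example. The following Python sketch is not the code generated by UML-RSDS: the function name run_iterated, the representation of the model as a dictionary of neighbour sets, and the test used for condition (2) are all assumptions made for illustration. It follows Con_R as given above (nodes with exactly one neighbour are traced) and records the node count after each iteration as the candidate variant.

```python
def run_iterated(neighbours):
    """Iterate Pres; Con; Cleanup on a symmetric neighbour map until stable.

    neighbours: dict mapping each node to the set of its neighbours.
    Returns the final graph and the node count recorded after each iteration.
    """
    graph = {n: set(ns) for n, ns in neighbours.items()}
    trace = set()            # stand-in for R$trace: the set of traced nodes
    variant = [len(graph)]   # values of Node.allInstances()->size()

    def condition_2():
        # Simplified stand-in for condition (2): every remaining node is
        # traced and still has exactly one neighbour.
        return all(n in trace and len(ns) == 1 for n, ns in graph.items())

    while not condition_2():
        pre = {n: set(ns) for n, ns in graph.items()}   # the @pre snapshot
        # Pres: delete traces whose node no longer has exactly one neighbour.
        trace = {n for n in trace if n in pre and len(pre[n]) == 1}
        # Con: trace every node that has exactly one neighbour.
        trace |= {n for n in pre if len(pre[n]) == 1}
        # Cleanup: delete untraced nodes and remove them from neighbour sets.
        graph = {n: pre[n] & trace for n in pre if n in trace}
        variant.append(len(graph))
    return graph, variant
```

For a star with centre c and leaves d, e, f, together with a separate pair a, b, the first iteration deletes c, the second deletes the now isolated d, e and f, and the pair a, b remains. The recorded node counts never increase, which is the behaviour required of a variant in this example.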
Examples of update-in-place execution are given in the qvt2umlrsds dataset [27], including an update-in-place solution to the tree to dag case of [36].

Table 10 (cf. [24]) gives some cases of the definitions of read and write frames of OCL constraints. var(P) is the set of all features and entity type names used in P, likewise for var*(P). V↓v is V with pairs (x, f) removed, where x ∈ v.

C.1 Read and write frames
In computing wr(P), we also take account of the features and entity types which implicitly depend upon the explicitly updated features and entity types of P, such as inverse association ends. In particular, if an association end role2 has a named opposite end role1, then role1 depends on role2 and vice versa. Creating an instance x of a concrete entity type E also adds x to each supertype extent F of E, and so the extents of these supertypes are also included in the write frames of E→exists(x|Q) and E→existsLC(x|Q) in Table 10.
Deleting an instance x of entity type E by x→isDeleted() may affect any supertype of E and any association end owned by E or its supertypes, and any association end incident with E or with any supertype of E. Additionally, if entity types E and F are related by an association which is a composition at the E end, or by an association with a mandatory multiplicity at the E end, i.e. a multiplicity with lower bound 1 or more, then deletion of E instances will affect F and its features and supertypes and incident associations, recursively. del(x) and del*(x) are the corresponding sets of features and class names potentially written by deletion propagation from x. wr(G) of a set G of constraints is the union of the constraint write frames, likewise for rd(G), wr*(G), rd*(G).
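The recursive propagation of deletion effects described above can be sketched as a worklist computation over the metamodel. The following Python fragment is an illustrative approximation only: the Assoc record, the flag strong_at_a (standing for a composition or mandatory multiplicity at the a end) and the function deletion_frame are hypothetical names, and attributes of the affected entity types are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assoc:
    role: str                   # name of the association end
    a: str                      # entity type at one end
    b: str                      # entity type at the other end
    strong_at_a: bool = False   # composition or lower bound >= 1 at the a end


def deletion_frame(entity, supertypes, assocs):
    """Entity types and association ends potentially written by deleting
    an instance of `entity`.

    supertypes: dict mapping an entity type to its direct supertypes.
    assocs: iterable of Assoc records describing the metamodel.
    """
    types, ends = set(), set()
    pending = [entity]
    while pending:
        e = pending.pop()
        if e in types:
            continue
        types.add(e)
        # Deleting an E instance also removes it from each supertype extent.
        pending.extend(supertypes.get(e, ()))
        for assoc in assocs:
            # Any association end incident with e may lose the deleted object.
            if e in (assoc.a, assoc.b):
                ends.add(assoc.role)
            # Deletion cascades to the b end if the a end is strong.
            if assoc.a == e and assoc.strong_at_a:
                pending.append(assoc.b)
    return types, ends
```

With a composition from Graph to Node and a plain neighbours association on Node, deleting a Graph instance places Graph, Node and the supertypes of Node in the frame, whereas deleting a Node instance does not place Graph in it.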

C.2 Design synthesis
The design-level activity stat(P) synthesised from an effective-for-update OCL predicate P is defined systematically based on the structure of P. stat(P) can be read as 'Make P true'. Predicates P involving negations not(Q) are normalised where possible so that the not is removed, e.g. not(s→includes(x)) is rewritten to s→excludes(x). Table 11 shows the main cases of the stat definition (extended and refined from [24]). In the context of QVT-R semantics, an expression is assignable if it is a local domain variable, or is of the form v.f for a feature f of a target domain object variable v. In UML-RSDS, construction of objects of a concrete class E possessing a key is performed by the createByPKE(keyvalue) operation, whilst creation of objects of concrete classes E without keys is performed by createE(). createByPKE(v) only creates a new E instance if there is not already an instance E[v] of E with key value v.
Updates to association ends may require further updates to opposite association ends, and updates to class extents or to features may require further updates to derived and other data-dependent features, and so forth. These updates are all included in the stat activity. In particular, for x→isDeleted(), x is removed from every association end and entity type extent in which it resides, and further cascaded deletions may occur if the association ends are mandatory/composition ends. The or and xor constructs are typically used in cases such as P or Q where Q is an alternative to be established if P fails to be established by stat(P) or stat_LC(P). For P xor Q, a normalisation should exist for not(Q).
The clauses for X→exists(x | x.id = v & P1) and X→existsLC(x | x.id = v & P1), with id an identity attribute of X, test for existence of an x with x.id = v before creating such an object. This has implications for efficiency but is necessary for correctness, since two distinct X elements with the same key value should not exist.
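The lookup-before-create behaviour of keyed object construction can be sketched as follows. This is a minimal Python illustration rather than the code produced by the UML-RSDS generators: the class Extent, its methods create and create_by_pk, and the representation of objects as dictionaries with an "id" entry are assumptions introduced here.

```python
class Extent:
    """Instances of one concrete class, with an index on its key attribute."""

    def __init__(self):
        self.instances = []   # all instances, keyed or not
        self.by_key = {}      # key value -> instance

    def create(self):
        # Counterpart of createE(): always creates a new instance.
        obj = {}
        self.instances.append(obj)
        return obj

    def create_by_pk(self, key):
        # Counterpart of createByPKE(key): look up the key first, and
        # create a new instance only if none exists with this key value.
        obj = self.by_key.get(key)
        if obj is None:
            obj = self.create()
            obj["id"] = key
            self.by_key[key] = obj
        return obj
```

The dictionary lookup is the existence test that the clauses above require. It adds a cost to each creation, but it guarantees that repeated requests for the same key value return the same object.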
stat_LC(P) gives a "least-change" procedural interpretation of expressions P: an update is only performed by this interpretation to establish P if P does not already hold, or if the update would make no change to data in the case P holds. For existential quantifiers E→existsLC(e | P_1 & ... & P_n), their stat or stat_LC effect only creates a new e in cases where there is no existing e : E that satisfies P partially or completely. In the case of partial satisfaction, only the updates for the unsatisfied conjuncts are carried out.
If E has an identity attribute pk and a conjunct P_i is of the form e.pk = value, then stat(E→existsLC(e | P_1  
The general case for k ≥ 2, k < n is

  e := eset->any(true); eset := eset->select(Pk);

By reusing e : E instances where possible, the redundant creation of instances is avoided; however, this also introduces the possibility of conflicts where one target instance is required to have conflicting attribute values to satisfy a constraint wrt two source instances.
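One possible reading of the least-change interpretation of E→existsLC(e | P_1 & ... & P_n) is sketched below in Python. It assumes that candidates are narrowed conjunct by conjunct, as suggested by the eset fragment above, and that each conjunct comes with an update that establishes it; the names exists_lc and attr_is, and the representation of objects as dictionaries, are hypothetical and not part of UML-RSDS.

```python
def attr_is(attr, value):
    """A conjunct e.attr = value as a (test, update) pair."""
    return (lambda e: e.get(attr) == value,
            lambda e: e.__setitem__(attr, value))


def exists_lc(extent, conjuncts, create):
    """Reuse an existing element that satisfies a prefix of the conjuncts,
    or create one if no element satisfies even the first conjunct."""
    candidates = list(extent)
    satisfied = 0
    for holds, _ in conjuncts:
        narrowed = [e for e in candidates if holds(e)]
        if not narrowed:
            break                 # no candidate satisfies this conjunct
        candidates = narrowed
        satisfied += 1
    if not candidates or (conjuncts and satisfied == 0):
        e = create()              # nothing satisfies P even partially
        extent.append(e)
    else:
        e = candidates[0]         # counterpart of eset->any(true)
    # Establish only those remaining conjuncts that do not already hold.
    for holds, establish in conjuncts[satisfied:]:
        if not holds(e):
            establish(e)
    return e
```

For example, with existing elements named a and b, a request for an element with name b and kind z reuses b and updates only its kind, whereas a request for name c creates a new element. The conflict risk noted above is visible here: two requests that share a satisfied prefix but disagree on a later conjunct would overwrite each other's updates on the reused element.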

Appendix D: Relation overriding and transformation extension
If relation R is declared as overriding relation S, then both must either be top level, or both non-top level. R should have corresponding relation and primitive domains d_R for each relation and primitive domain d_S of S, with the same name, model and modality. The type d_R.rootVariable.type of d_R should be the same as that of d_S or a subclass/descendant of the d_S type. R may also have additional domains to those of S. The common-named domains of S and R should occur in the same order in both relations. For update-in-place R, S, domains of S with the same root variable name are overridden by the corresponding domains of R on the basis of their modality.
The combination of S and R is expressed as a composed relation P = S ⊕ R. This has domains d_S ⊕ d_R for common-named domains d of S and R, together with any additional domains of R. The when clause of P is the conjunction of S.when and R.when, i.e. the pattern formed by concatenating S.when.predicate and R.when.predicate and removing duplicated conjuncts. The same applies for the where clauses. P is abstract if R is abstract, and concrete if R is concrete. d_S ⊕ d_R is defined by recursion on the structure of the domain templates. For object template expressions t_S = d_S.pattern.templateExpression, t_R = d_R.pattern.templateExpression, we can consider t_S.part and t_R.part to be ordered so that the common-named variables of the parts are listed together in the same order at the start of each part sequence. For two parts on a common property of non-object type, ⊕ is defined on PropertyTemplateItem to discard the first part: where p.referredProperty.name = v.
Otherwise, if both parts are object definitions of same-named properties, the definitions are merged: where v : E{p1} ⊕ w : F{p2} = u : G{p3}. Object templates can be combined in this manner if w.name = v.name and either F = E or F is a subclass/descendant of E. The result has u = w and G = F.
Parts that belong to either t S or t R and have no corresponding part (with the same property name) in the other template are retained in t S ⊕ t R . The where conditions of the two templates are conjoined to form the where condition of the result.
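The recursive structure of ⊕ on object templates can be illustrated by the following Python sketch. It is a simplified model rather than the UML-RSDS implementation: the ObjectTemplate record, the conforms and merge functions, the single-inheritance parents map, and the choice to raise an error for templates that cannot be combined are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectTemplate:
    var: str                                    # object variable name
    type: str                                   # entity type of the variable
    parts: dict = field(default_factory=dict)   # property -> value or template
    where: list = field(default_factory=list)   # where conjuncts, as text


def conforms(sub, sup, parents):
    """True if sub equals sup or is a descendant of it (single inheritance)."""
    while sub is not None:
        if sub == sup:
            return True
        sub = parents.get(sub)
    return False


def merge(ts, tr, parents):
    """t_S (+) t_R for an overridden template ts and an overriding template tr."""
    if ts.var != tr.var or not conforms(tr.type, ts.type, parents):
        raise ValueError(f"cannot combine {ts.var} : {ts.type} "
                         f"with {tr.var} : {tr.type}")
    parts = {}
    for name in list(ts.parts) + [n for n in tr.parts if n not in ts.parts]:
        if name not in tr.parts:
            parts[name] = ts.parts[name]        # part only in t_S is retained
        elif name not in ts.parts:
            parts[name] = tr.parts[name]        # part only in t_R is retained
        else:
            s, r = ts.parts[name], tr.parts[name]
            if isinstance(s, ObjectTemplate) and isinstance(r, ObjectTemplate):
                parts[name] = merge(s, r, parents)   # merge object definitions
            else:
                parts[name] = r                 # discard the first part
    # Conjoin the where conditions, dropping duplicated conjuncts.
    where = ts.where + [c for c in tr.where if c not in ts.where]
    return ObjectTemplate(tr.var, tr.type, parts, where)
```

The result takes its variable and type from the overriding template, matching u = w and G = F above. Combining templates whose types are unrelated by inheritance raises an error in this sketch.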
The combination of two set-typed collection template expressions cte_S :  
Errors may arise if R and S contain same-named domains or same-named variables with conflicting types, for example, object variables x : E and x : F where E and F are not related by inheritance. Error messages are produced in such cases. If S is called in a when or where clause, then the domain.rootVariable.name sequences of the two relations should be the same.
Transformation extension is syntactically represented as τ extends σ in [30], but no semantics is provided. We can infer that τ and σ should have the same set of typed models:  
The relations of ρ are the union of those of τ and σ, but with common-named rules of the two transformations combined using ⊕.
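The construction of the relations of the combined transformation ρ can be sketched as a union of name-indexed rule sets. In the Python fragment below, the names extend and combine_conjuncts are hypothetical, relations are reduced to lists of conjuncts for illustration, and the argument order (the rule of σ as the overridden relation, the rule of τ as the overriding one) is an assumption, since the source does not fix it.

```python
def combine_conjuncts(s, r):
    """Illustrative stand-in for (+): concatenate, dropping duplicates."""
    return s + [c for c in r if c not in s]


def extend(sigma, tau, combine=combine_conjuncts):
    """Relations of the combined transformation for tau extends sigma.

    sigma, tau: dicts mapping relation names to relation definitions.
    """
    rules = dict(sigma)                              # rules of sigma
    for name, rel in tau.items():
        if name in sigma:
            rules[name] = combine(sigma[name], rel)  # common-named rules
        else:
            rules[name] = rel                        # rules only in tau
    return rules
```

A rule defined only in σ or only in τ is carried over unchanged, so the result is the plain union of the two rule sets whenever their relation names are disjoint.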