Using model-driven engineering to automate software language translation

Lano, Kevin; Siala, Hanan

doi:10.1007/s10515-024-00419-y

Open access
Published: 28 February 2024

Volume 31, article number 20, (2024)

Automated Software Engineering

Kevin Lano¹ &
Hanan Siala¹

Abstract

The porting or translation of software applications from one programming language to another is a common requirement of organisations that utilise software, and the increasing number and diversity of programming languages makes this capability as relevant today as in previous decades. Several approaches have been used to address this challenge, including machine learning and the manual definition of direct language-to-language translation rules; however, the accuracy of these approaches remains unsatisfactory. In this paper we describe a new approach to program translation using model-driven engineering techniques: reverse-engineering source programs into specifications in the UML and OCL formalisms, and then forward-engineering the specifications to the required target language. This approach can provide assurance of semantic preservation, and additionally has the advantage of extracting precise specifications of software from code. We provide an evaluation based on a comprehensive dataset of examples, including industrial cases, and compare our results to those of other approaches and tools. Our specific contributions are: (1) Reverse-engineering source programs to detailed semantic models of software behaviour, to enable semantically-correct translations and reduce re-testing costs; (2) Program abstraction processes defined by precise and explicit rules, which can be edited and configured by users; (3) A set of reusable OCL library components appropriate for representing program semantics, which can also be used for OCL specification of new applications; (4) A systematic procedure for building program abstractors based on language grammars and semantics.

1 Introduction

The management and maintenance of software, especially of legacy software, has become a significant business and social problem (Agarwal et al. 2021; Khadka et al. 2014; Ogheneovo 2014) which costs increasing human and financial resources to tackle. Critical software systems exist in old languages and platforms, and need to be modernised in order that they can be effectively used and maintained. The costs and time required for manual software modernisation can be extremely high. For example, porting of banking applications from COBOL to Java cost approximately $750 million in the case of one commercial bank (Lachaux et al. 2020).

Automated program translation is therefore an attractive alternative to manual translation or redevelopment of software assets.

Two main approaches have been used for program translation: (i) heuristic approaches using explicit language-to-language mapping rules (De Marco et al. 2018; Sneed 2011; Tangible Software 2023); (ii) machine learning approaches which learn implicit translation rules from sets of examples (Ahmad et al. 2023; Aggarwal et al. 2015; Chen et al. 2018; Lachaux et al. 2020; Nguyen et al. 2013; Roziere et al. 2022). Approach (i) involves substantial manual effort to create and maintain the mapping rules. Approach (ii) aims to avoid this cost, but has the limitations that it requires large datasets of examples in the source and target languages, that the translation operates in a probabilistic manner, and that the learned rules are not available for inspection or review. Multiple candidate translations may be produced, from which one must be selected (e.g., the best results of Lachaux et al. 2020 are obtained by generating 25 candidate target programs). It is difficult to modify the behaviour of such ML systems; instead, specific pre- and post-processing actions need to be added (Malyaya et al. 2023). More recently, ML approaches using large language models (LLMs) (Abukhalaf et al. 2023; Hou et al. 2023; Zhao et al. 2023) pre-trained on large code datasets have been applied to code translation (Chen et al. 2021; Wang et al. 2021; Guo et al. 2021; Jana et al. 2023; Lu et al. 2021). These have the best current results for ML program translation approaches; however, LLMs also have limitations with regard to accuracy, modifiability, explainability and reliability when applied to code (Liu et al. 2023; Ouyang et al. 2023).

Given the large and increasing number of programming languages which could be the source and targets of program translation, it is clearly impractical in the long term to develop dedicated translations for each separate language pair. Instead, we propose an approach based on dividing the translation process into two stages: (i) abstraction of the source program into a specification, expressed in the UML and OCL international standard software modelling languages (OMG 2014); (ii) forward engineering of the specification into the target language. This approach takes advantage of the substantial research and tool development in MDE which has taken place over the last 25 years. There are many MDE tools which provide forward engineering from UML/OCL into multiple target languages, for example (Eclipse AgileUML project 2024; Papyrus toolset 2023; Simulink toolset 2023). Hence we can focus effort on the reverse-engineering step and the representation of programming concepts and constructs using UML and OCL.

The advantages of this approach include:

1. Semantic preservation can be ensured from source to specification by using explicit abstraction rules which accurately express source code semantics.
2. Specifications are produced in a general-purpose standard modelling language (UML/OCL), and can be used for forward engineering, quality analysis and quality improvement via refactoring or rearchitecting (Lano et al. 2023).
3. Abstraction and forward engineering are performed via concise user-configurable scripts which require only grammar-based knowledge to create and edit, not MDE expertise.
4. An extensive set of library specifications and language-specific library implementations are provided to facilitate code migration.

We address the following research questions:

RQ1: Can an MDE-based program translation approach be defined to support practical cases of program translation, and to assure the generation of semantically-correct code?
RQ2: Is the approach able to process a substantial subset of the source language constructs?
RQ3: Can the approach be effectively customised by end users in order to meet their specific language translation requirements?

Section 2 gives background on MDE and UML/OCL and on the supporting technologies used in our process. We summarise the overall process of language translation in Sect. 3. Section 4 describes the semantic model used to represent program abstractions. Section 5 gives details of the abstraction process steps. Section 6 addresses the issue of semantic preservation by the abstraction process. A systematic process for defining a new abstraction mapping is given in Sect. 7. A detailed evaluation of our approach with respect to the above research questions RQ1, RQ2, RQ3 is given in Sect. 8, and the relation with previous work is described in Sect. 9. Threats to validity are discussed in Sect. 10. Future work and conclusions are given in Sects. 11 and 12.

An initial investigation into the approach was published as a short paper at ICSE 2022 (Lano 2022).

2 Background

In this section we give background information on UML and OCL, and on the text-to-text transformation language, ${\mathcal {CSTL}}$, which we use to define abstraction transformations.

2.1 Model-driven engineering, UML and OCL

Model-driven engineering (MDE) aims to raise the level of abstraction of software development from the code level to the model level, and to make software models the focus of development, ranging from requirements to code production (Brambilla et al. 2012; Lano 2016). The use of precise software representations in well-defined modelling languages supports increased automation and rigour throughout the development process, and MDE has been particularly adopted in domains such as avionics and automotive systems where such properties are essential. General-purpose modelling languages such as the Unified Modelling Language (UML) and Systems Modelling Language (SysML) emphasise the use of graphical notations such as class diagrams and state machines, however textual domain-specific modelling languages (DSLs) may also be used to define software models.

UML resulted from the combination of three leading object-oriented modelling approaches in the 1990s: OMT, the Booch method, and Jacobson’s Objectory. It has been managed by the OMG since 1997 and is currently at Version 2.5.1.^{Footnote 1} UML provides a wide range of modelling notations, including dynamic behaviour models such as state machines and interactions, but here we will use only a subset of the class diagram and use case models, corresponding to the AgileUML subset of UML (Lano 2016). Class diagrams define the data and associated operations of a system. The AgileUML class diagram subset includes the definition of classes with single and multiple inheritance, attributes and associations to other classes, together with operations which may be defined declaratively using pre and post conditions written in OCL, or imperatively in a procedural extension of OCL. Generic classes and operations can be defined, together with interfaces and abstract classes and operations. This subset however excludes specialised class constructs such as nested classes, so such features of programming languages must be translated into more basic constructs during the program abstraction process.

OCL originated in the work of Cook and others on the Syntropy language (Cook and Daniels 1994). OCL was incorporated into version 1.0 of the UML standard as the notation for expressing precise constraints over UML models. Subsequently the language has been revised and expanded, and variant dialects such as EOL (2023) and Eclipse OCL (2022) have been created. OCL provides mathematical datatypes of sets, sequences, bags and tuples, and operations upon these (Cook et al. 2002; OMG 2014). Sets, sequences and bags are each subtypes of a general Collection type, for which operators ${{\rightarrow }union}$, ${{\rightarrow }intersection}$, ${{\rightarrow }select}$ of union, intersection, selection and many others are available. AgileUML and Eclipse OCL add a Map datatype to OCL. This is not a subtype of Collection, but has a similar set of operators. The high-level collection operators of OCL such as ${{\rightarrow }select}$ and ${{\rightarrow }collect}$ are now commonly available in modern programming languages (e.g., as .filter and .map in Swift 5).

2.2 ${\mathcal {CSTL}}$

${\mathcal {CSTL}}$ is a scripting language which defines text-to-text (T2T) transformations in terms of the concrete syntax of the source and target languages. ${\mathcal {CSTL}}$ is used in AgileUML to write code generators from UML and OCL to particular implementation 3GLs, such as Swift 5 or Java 8 (Eclipse AgileUML project 2024). However, it can also be applied to any textual language which has a grammar, and in this paper we employ it to map program language source texts to textual UML/OCL specifications, i.e., to define program abstraction mappings.

A ${\mathcal {CSTL}}$ script operates upon parse trees or abstract syntax trees (ASTs) of the source language, also referred to as AST terms. The metamodel of AST terms which we use to represent program parse trees is shown in Fig. 1. The features map of a term records information about the language element represented by the term, such as its type. For a term t, $ASTTerm.features[t + ""]$ is a sequence of stereotypes or tagged values: $f=v$ where f is a feature name and v the feature value. This information can be set by the actions of ${\mathcal {CSTL}}$ rules and read by ${\mathcal {CSTL}}$ rule conditions. Composite terms with tag tg and n subterms are written as LISP-style lists $(tg~t_1~\ldots~t_n)$. Basic terms with tag tg and value v are represented as (tg v). Symbol terms represent individual terminal tokens such as ‘*’ and ‘while’.
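The three kinds of AST term and the features map can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class names `SymbolTerm`, `BasicTerm` and `CompositeTerm`, the module-level `FEATURES` dictionary and the helpers `set_feature` and `get_feature` are hypothetical names chosen here, not the AgileUML implementation. The map is keyed by the textual form of a term, mirroring the $t + ""$ key used above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stand-in for ASTTerm.features: term text -> list of "f=v" entries.
FEATURES: Dict[str, List[str]] = {}


@dataclass
class SymbolTerm:
    """A terminal token such as '*' or 'while'."""
    symbol: str

    def __str__(self) -> str:
        return self.symbol


@dataclass
class BasicTerm:
    """A term (tg v) with a tag and a single value."""
    tag: str
    value: str

    def __str__(self) -> str:
        return f"({self.tag} {self.value})"


@dataclass
class CompositeTerm:
    """A term (tg t_1 ... t_n) with a tag and n subterms."""
    tag: str
    terms: List[object] = field(default_factory=list)

    def __str__(self) -> str:
        return "(" + " ".join([self.tag] + [str(t) for t in self.terms]) + ")"


def set_feature(term, name: str, value: str) -> None:
    """Record the tagged value name=value for a term."""
    FEATURES.setdefault(str(term), []).append(f"{name}={value}")


def get_feature(term, name: str) -> Optional[str]:
    """Return the first value recorded for name, or None if absent."""
    for entry in FEATURES.get(str(term), []):
        key, _, value = entry.partition("=")
        if key == name:
            return value
    return None


# Illustrative tree for the source text "a * b".
expr = CompositeTerm("expression",
                     [BasicTerm("primary", "a"), SymbolTerm("*"), BasicTerm("primary", "b")])
set_feature(expr.terms[0], "type", "int")
```

Printing `expr` gives the LISP-style form `(expression (primary a) * (primary b))`, and `get_feature(expr.terms[0], "type")` reads back the recorded type.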

A ${\mathcal {CSTL}}$ script to process software language $\mathcal{L}_1$ consists of a collection of rulesets corresponding to the syntax categories of $\mathcal{L}_1$. These categories are the non-terminals of the $\mathcal{L}_1$ grammar, i.e., for the ANTLR Java grammar the rulesets are named statement, expression, classDeclaration, etc. When executed, the ${\mathcal {CSTL}}$ script operates upon syntax trees (ASTTerm instances) of $\mathcal{L}_1$ produced by a parser for $\mathcal{L}_1$. Syntax trees with tag tg are processed by the ruleset with name tg. Each ruleset contains a sequence of ${\mathcal {CSTL}}$ rules. Declarative rules in ${\mathcal {CSTL}}$ notation have the form:

The ${<}when{>}$ clause and conditions are optional. ${\mathcal {CSTL}}$ conditions can test the syntactic category (tag name) of elements bound to ${\mathcal {CSTL}}$ variables, and other properties of these elements.

The left hand side (LHS) of a ${\mathcal {CSTL}}$ rule is a schematic representation of textual concrete syntax in the source language ${\mathcal {L}}_1$, e.g., in Java 7, and the right hand side (RHS) is the corresponding concrete textual syntax in the target language ${\mathcal {L}}_2$, e.g., in OCL or the Kernel Metamodel (KM3) (Jouault and Bezivin 2006) textual notation for UML class models, which the LHS should translate to. The LHS represents a node in a parse tree of ${\mathcal {L}}_1$ elements, i.e., an element of ASTTerm. Apart from literal text concrete syntax items, the LHS may contain variable terms $\_1$, $\_2$, etc, representing direct child terms of the node being processed (the properties of these child terms can be constrained by the optional rule conditions), and the RHS refers to the translation of these child terms also by $\_1$, $\_2$, etc. This enables ${\mathcal {CSTL}}$ mappings to be applied recursively. The special variables $\_*$, $\_+$ denote lists of subterms, e.g., parameter lists of method calls.

Specialised rules are listed before more general rules. Functions f can be applied to (the elements bound to) a variable $\_x$ by the notation _x‘f. f could be a ruleset name in the same script or a built-in function such as first, which returns the first subterm of its argument. Other ${\mathcal {CSTL}}$ scripts f can also be invoked using the same notation. f may also be a user-defined metafeature used to associate specific properties with a term via the features map of Fig. 1. If no rule LHS matches a source node then the node text is copied to the output, there is therefore no need to include rules whose LHS and RHS are the same literal text. General ${\mathcal {CSTL}}$ rules may also have an ${<}action{>}$ clause defining updates to ASTTerm.features.

2.2.1 Using ${\mathcal {CSTL}}$ for program abstraction

As an example of program abstraction, we can use ${\mathcal {CSTL}}$ rules to perform an abstraction mapping from Java 6/7 types to UML/OCL types. The relevant syntax categories in the ANTLR grammar for Java^{Footnote 2} are primitiveType, classOrInterfaceType, typeType, typeTypeOrVoid, typeArguments, and typeArgument. Examples of the ${\mathcal {CSTL}}$ rules for mapping Java types to UML/OCL types are as follows:

The effect of these rules is to map array types and Java collection types to collection and map types in OCL, with corresponding type parameters. Primitive types are mapped to corresponding OCL primitive types. Java language classes such as Object map to OCL language classes. Class types of an application map to corresponding UML class model classes.

For example, the Java type

is parsed by the ANTLR Java parser to the syntax tree

and this is mapped to the OCL type Map(String, int) by the above rules of the ${\mathcal {CSTL}}$ script: the tree has the form (typeType tt), which matches the second rule of the typeType ruleset, with tt bound to $\_1$. tt matches the rule LHS $HashMap~\_1$ of the classOrInterfaceType ruleset, with $\_1$ bound to the $(typeArguments < t1 ~, ~t2 >)$ term. This in turn matches the first rule of the typeArguments ruleset.
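The recursive descent through the typeType, classOrInterfaceType and typeArguments rulesets can be mimicked in Python. The sketch below is an assumption-laden illustration, not the paper's ${\mathcal {CSTL}}$ script: the input tree is a hypothetical reconstruction for a Java type such as `HashMap<String, Integer>`, terms are encoded as nested tuples, and the lookup tables (including the choice of Sequence for arrays) are guesses that cover only a few cases.

```python
# Assumed lookup tables; the real abstraction rules cover many more types.
PRIMITIVES = {"int": "int", "long": "long", "double": "double", "boolean": "boolean"}
LANGUAGE_CLASSES = {"Integer": "int", "Long": "long", "Double": "double",
                    "String": "String", "Object": "OclAny"}
COLLECTIONS = {"HashMap": "Map", "ArrayList": "Sequence", "HashSet": "Set"}


def abstract_type(term):
    """Map a tuple-encoded Java type tree (tag, child, ...) to OCL type text."""
    tag, *kids = term
    if tag == "typeType":
        inner = abstract_type(kids[0])
        # Assumption: each trailing '[' ']' pair wraps the type in Sequence(...).
        for kid in kids[1:]:
            if kid == "[":
                inner = f"Sequence({inner})"
        return inner
    if tag == "typeArgument":
        return abstract_type(kids[0])
    if tag == "primitiveType":
        return PRIMITIVES[kids[0]]
    if tag == "classOrInterfaceType":
        name = kids[0]
        if len(kids) == 2:  # generic type with a typeArguments subterm
            return f"{COLLECTIONS.get(name, name)}({abstract_type(kids[1])})"
        # Language classes map to OCL types; application classes keep their name.
        return LANGUAGE_CLASSES.get(name, name)
    if tag == "typeArguments":
        # Skip the '<', ',' and '>' symbol terms and translate the arguments.
        return ", ".join(abstract_type(k) for k in kids if isinstance(k, tuple))
    raise ValueError(f"no rule for tag {tag}")


# Hypothetical parse tree for HashMap<String, Integer>.
tree = ("typeType",
        ("classOrInterfaceType", "HashMap",
         ("typeArguments", "<",
          ("typeArgument", ("typeType", ("classOrInterfaceType", "String"))), ",",
          ("typeArgument", ("typeType", ("classOrInterfaceType", "Integer"))), ">")))
```

Under these assumptions `abstract_type(tree)` yields `Map(String, int)`, matching the result described in the text.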

The same approach can be used to write ${\mathcal {CSTL}}$ scripts for any source language ${\mathcal {L}}$ which has a grammar and parser. In this paper we use ANTLR grammars and parsers, but other grammar technologies such as SableCC or JavaCC could alternatively be used.

2.2.2 ${\mathcal {CSTL}}$ semantics

${\mathcal {CSTL}}$ semantics is based on AST term matching and text substitution. The semantic representation of each ${\mathcal {CSTL}}$ script s is a function

$$\begin{aligned} {cstl(s): ASTTerm \rightarrow String} \end{aligned}$$

together with auxiliary functions

$$\begin{aligned} {cstl_{tg}(s): ASTTerm \rightarrow String} \end{aligned}$$

for each ruleset tg of s.

The semantics defines the result cstl(s)(t) of applying script s to a composite or basic AST term t.^{Footnote 3}

If t has tag tg, then $cstl(s)(t) = cstl_{tg}(s)(t)$ when s contains a ruleset tg:: which has a rule matching t, otherwise $cstl(s)(t) = t$.

For a ruleset name tg of s,

$$\begin{aligned} cstl_{tg}(s)(t) = rhs[t_1',...,t_n'] \end{aligned}$$

where $t = (tg~t_1~...~t_n)$, each $t_i$ is mapped by s to $t_i'$, and the rule r

of tg:: is the first rule in this ruleset whose lhs matches t and whose Conditions are true. If there is no such matching rule then $cstl_{tg}(s)(t) = t$.

A basic term t matches any rule LHS that consists of a single variable $\_j$. A composite term t matches lhs if t’s subterms t.terms match successive tokens of lhs: symbol terms of t must equal corresponding tokens of lhs, and non-symbol terms $t_i$ are bound to corresponding variables $\_i$ in the token list of lhs. A variable $\_*$ or $\_+$ binds to a list of successive terms which occur between/before/after specific symbol terms. r is applicable to a matched t if $Conditions[t_1,...,t_n]$ also hold. In this case, the script s is applied to each of the $t_i$ to produce $t_i' ~=~ cstl(s)(t_i)$ if $\_i$ occurs as a simple variable expression on the RHS. However, if it occurs as _i‘f for ruleset name f, then $t_i' ~=~ cstl_f(s)(t_i)$, and in the case of a script application _i‘g: $t_i' ~=~ cstl(g)(t_i)$.

A variable $\_*$ or $\_+$ is replaced in rhs by the string concatenation of the $t_i'$ of the terms bound to it. Various built-in functions such as recurse, sum and first have specific denotations. Metafeature applications _i‘f are evaluated as the tagged value of f in $ASTTerm.features[t_i + ""]$, in both the rhs and conditions.
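The core of this semantics (first matching rule wins, bound subterms are translated recursively, unmatched terms are copied as text) can be sketched as a few lines of Python. This is a simplified illustration under assumptions: terms are nested tuples `(tag, child, ...)` with plain strings as symbols, rules are `(lhs_tokens, rhs_text)` pairs, and conditions, actions, `_*`/`_+` list variables and function applications are omitted. The example ruleset is hypothetical and not taken from any AgileUML script.

```python
import re

VAR = re.compile(r"_\d+")


def text(t):
    """Concatenate the leaf tokens of a term: the 'copy as-is' result."""
    if isinstance(t, str):
        return t
    return " ".join(text(k) for k in t[1:])


def match(lhs, kids):
    """Return a variable binding if lhs matches the subterms, else None."""
    if len(lhs) != len(kids):
        return None
    binding = {}
    for tok, kid in zip(lhs, kids):
        if VAR.fullmatch(tok):
            binding[tok] = kid      # variable: bind the subterm
        elif tok != kid:
            return None             # literal token must equal the symbol term
    return binding


def cstl(script, t):
    """Apply script (dict: tag -> ordered rule list) to term t."""
    if isinstance(t, str):
        return t
    tag, kids = t[0], list(t[1:])
    for lhs, rhs in script.get(tag, []):
        binding = match(lhs, kids)
        if binding is not None:
            # Replace each variable on the RHS by the translation of its subterm.
            return VAR.sub(lambda m: cstl(script, binding[m.group()]), rhs)
    return text(t)                  # no ruleset or no matching rule


# Hypothetical ruleset mapping Java-like boolean expressions to OCL text.
script = {
    "expression": [
        (["_1", "&&", "_2"], "_1 and _2"),
        (["!", "_1"], "not _1"),
        (["_1"], "_1"),
    ],
}

tree = ("expression",
        ("expression", ("primary", "a")),
        "&&",
        ("expression", "!", ("expression", ("primary", "b"))))
```

Here `cstl(script, tree)` produces `a and not b`; the `primary` terms have no ruleset, so their text is copied unchanged.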

In addition to declarative ${\mathcal {CSTL}}$ rules, there are rules with actions:

These allow information about the terms processed by the rule to be asserted. Typically this would be a typing constraint, e.g., for VB6:

This asserts that the identifier bound to $\_1$ in a declaration of form DIM X() is of sequence type. It adds the stereotype "Sequence" to the features list of the term bound to $\_1$. Subsequent processing within the scope of this declaration can make use of this knowledge to correctly translate an expression such as X(1) to ${X{\rightarrow }at(1)}$.
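The effect of such an action rule can be imitated in Python as follows. The sketch assumes a features dictionary keyed by identifier name and two hypothetical helper functions, `abstract_dim` and `abstract_call`; it shows only how a stereotype recorded at the declaration changes the later translation of `X(1)`, and is not the VB6 abstraction script itself.

```python
# Hypothetical features map: identifier text -> list of stereotypes.
features = {}


def abstract_dim(name, is_array):
    """Process 'DIM name()' (is_array=True) or 'DIM name' (is_array=False).

    The action part records the stereotype "Sequence" for array declarations.
    """
    if is_array:
        features.setdefault(name, []).append("Sequence")


def abstract_call(name, arg):
    """Translate the VB6 form name(arg) using the recorded stereotypes."""
    if "Sequence" in features.get(name, []):
        return f"{name}->at({arg})"   # element access on a sequence
    return f"{name}({arg})"           # otherwise leave it as an operation call


abstract_dim("X", is_array=True)
```

After the declaration is processed, `abstract_call("X", "1")` gives `X->at(1)`, while an identifier with no recorded stereotype, such as `F`, is still rendered as the call `F(1)`.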

3 A MDE process for language translation

Our program translation process using MDE consists of successive steps of parsing, abstraction and forward engineering from a source language L1 to a target language L2 (Fig. 2):

1. A parser for L1 is used to produce syntax trees (ASTs) according to the metamodel of Fig. 1.
2. These are then input to an abstraction transformation for L1, written using ${\mathcal {CSTL}}$ (Sect. 2.2). The output is a UML/OCL specification in textual form, consisting of class specifications with data features and operations, and use cases defining global processing, such as application initialisation. The result specification may utilise the operations of OCL libraries. We have provided additional libraries to represent program semantics for aspects such as files, dates, exceptions and iterators, which are not present in standard OCL (Sect. 4).
3. Forward engineering using established MDE techniques is then employed to map the abstracted specification to the target language L2 (Sect. 5.8).

As a parsing technology for the first step, we utilise ANTLR (2023). This is a lightweight tool facilitating the rapid construction and adaptation of parsers. There are ANTLR grammars for over 230 source languages, including all the main 3GLs.^{Footnote 4} For forward engineering we adopt the AgileUML MDE toolset (Eclipse AgileUML project 2024), which is a lightweight MDE platform providing the text-to-text transformation language, ${\mathcal {CSTL}}$ (Lano et al. 2020), and text-based UML/OCL specification. As targets of forward engineering it supports C, Python, Java, Swift, Go, C# and C++. However, our approach is not specific to these technologies, and alternative MDE tool chains, such as Xtext 2021 and Papyrus, could be used. Likewise, in this paper we focus upon ANSI C, VB6, COBOL ‘85, Python, ISO Pascal, JavaScript and Java versions 6 and 7 as source languages, but the same approach can be used for any source 3GL with a suitable parser and grammar.

With regard to the appropriate source and target languages L1, L2 to consider, translations from Java to Python and from Java to C-based languages appear to be the most-requested program language translations (for instance, over 4200 questions on stackoverflow.com concern translating Java code to C/C#/C++, and over 2600 concern Java to Python conversion). The demand for such translations is also evident from the fact that several tools are available in this area, for example at kalkicode.com and the Java to Python converter of Tangible Software (2023). From our consultancy work in the financial services sector, we are also aware of a significant demand for conversion of VB6/VBA and Matlab code to production languages such as C# and C++. Translation from COBOL to modernised languages and platforms has been the subject of research and substantial development effort for decades (Bowen et al. 1993; Sneed 2011; De Marco et al. 2018; Khadka et al. 2014). To analyse and improve machine learning (ML) and other AI applications, analysis of Python programs is necessary. Safety-critical systems (SCS) often use C or C++.

Thus we provide abstraction mappings from Java, Python, C, VB6 and COBOL programs to UML/OCL, making use of existing MDE code generators to perform the second step from UML/OCL to a target programming language. To demonstrate the versatility of our approach, we also define abstractions of Pascal and JavaScript to UML/OCL.

As an example of the process steps, we show in Figs. 3, 4 parts of a legacy VBA/Excel option pricing application, in Fig. 5 the abstracted UML specification, and in Fig. 6 the generated Python version of the application. This illustrates the use of additional OCL libraries for Excel functions (Excel.py) and mathematical functions (ocl.py). The result code can be shown to have the same semantics as the source. The UML model can be used to support quality improvement of the application via refactoring, model-based test case generation, or to support reuse of the application within a new system.

4 The semantic model

The semantic model aims to provide a platform-independent semantic representation of programming language semantics, in a way that is consistent with standard UML/OCL semantics and also amenable to reasoning/verification.

4.1 Data types

As described in Sect. 2.1, OCL has some appropriate collection data structures for program abstraction, however it lacks datatypes for several common programming aspects such as files, dates, iterators and exceptions. Thus these types need to be added in order to utilise OCL as an intermediate representation for program translation. We achieve this by defining additional library components to provide the necessary types and operations (Sects. 4.4 to 4.7). We also use library components to add facilities such as random number generation and numeric format conversions which are missing from OCL.

The type OclAny in OCL is the supertype of all other types, and corresponds approximately to Object in Java. The type OclType of OCL represents OCL types, the closest equivalent in Java is Class. The String datatype of OCL, together with the extended string operations of Eclipse OCL and AgileUML, provides a sufficient basis to represent the semantics of programming language string types and operators. The OCL standard provides mathematical numeric datatypes Integer and Real with unbounded range/precision. Computational datatypes int, long, double can be derived from these and are provided as basic types in AgileUML. int, long and double are respectively the subsets of Integer of 32-bit and 64-bit integers, and the IEEE 754 64-bit floating point subrange of Real. The positive/negative infinity and NaN values of the double subrange are denoted $Math\_PINFINITY$, $Math\_NINFINITY$, $Math\_NAN$. We also define a wide range of numeric operators ${{\rightarrow }sin()}$, ${{\rightarrow }cos()}$, ${{\rightarrow }tan()}$, ${{\rightarrow }sqrt()}$, ${{\rightarrow }cbrt()}$, ${{\rightarrow }asin()}$, ${{\rightarrow }acos()}$, ${{\rightarrow }atan()}$, ${{\rightarrow }exp()}$, ${{\rightarrow }log()}$, ${{\rightarrow }log10()}$, ${{\rightarrow }sinh()}$, ${{\rightarrow }cosh()}$, ${{\rightarrow }tanh()}$, etc, in order to provide convenient representations of program numeric operators. Further mathematical operations are defined in the MathLib and Excel OCL libraries. For strings, a set of regex operators are provided (Lano 2021).
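The relation between the unbounded Integer type and the computational subsets int and long can be illustrated in Python, whose `int` is unbounded and whose `float` is an IEEE 754 64-bit value on typical platforms. The function names below are hypothetical, and the wrap-around conversion `wrap_int` is an added assumption showing Java-style 32-bit overflow behaviour; the paper does not prescribe it.

```python
import math

INT_MIN, INT_MAX = -2**31, 2**31 - 1      # 32-bit subset of Integer
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1    # 64-bit subset of Integer


def is_int(n: int) -> bool:
    """True if the mathematical integer n lies in the 32-bit int subset."""
    return INT_MIN <= n <= INT_MAX


def is_long(n: int) -> bool:
    """True if the mathematical integer n lies in the 64-bit long subset."""
    return LONG_MIN <= n <= LONG_MAX


def wrap_int(n: int) -> int:
    """Assumed two's-complement wrap of n into the int subset."""
    return (n - INT_MIN) % 2**32 + INT_MIN


# Python counterparts of the special double values.
P_INFINITY, N_INFINITY, NAN = math.inf, -math.inf, math.nan
```

For example, `INT_MAX + 1` is a valid Integer and a valid long but not a valid int, and `wrap_int(INT_MAX + 1)` yields `INT_MIN`.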

An important aspect of the OCL semantics (Appendix A of OMG 2014) is that the collection type operators Sequence(X), Set(X), etc are monotonic in their argument types: if T is a subtype of S, then Set(T) is considered to be a subtype of Set(S). This property is not necessarily true for program collection types because the Set(T) operations such as add(x) will be more restricted than the corresponding Set(S) operations: they will only be valid for x in T, not for x in S. Thus the monotonicity properties of the OCL collection and map type construction operators are not assumed in our semantic model.
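A small Python sketch shows why monotonicity fails once collections have update operations. The classes `Animal`, `Cat`, `Dog` and `TypedSet` are hypothetical illustrations: `TypedSet(T)` models a program set type whose `add(x)` is valid only for x in T, so code written against the supertype collection can break when given the subtype collection.

```python
class Animal:
    pass


class Cat(Animal):
    pass


class Dog(Animal):
    pass


class TypedSet:
    """A mutable set whose add(x) is only valid for x of the element type."""

    def __init__(self, elem_type):
        self.elem_type = elem_type
        self.items = set()

    def add(self, x):
        if not isinstance(x, self.elem_type):
            raise TypeError(f"{type(x).__name__} is not a {self.elem_type.__name__}")
        self.items.add(x)


def add_dog(animals):
    """Client code written against Set(Animal): adding any Animal is expected to work."""
    animals.add(Dog())


def accepts_dog(animals) -> bool:
    """Return True if add_dog succeeds on the given set, False if it is rejected."""
    try:
        add_dog(animals)
        return True
    except TypeError:
        return False
```

`accepts_dog(TypedSet(Animal))` succeeds, but `accepts_dog(TypedSet(Cat))` fails even though Cat is a subtype of Animal, so Set(Cat) cannot safely stand in for Set(Animal).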

The collection, map, numeric and string types of OCL are value types: an assignment $x~ {:=}~ v$ of a value v of such a type to a variable x stores a copy of v in x, and changes to x do not affect the value of any other variable which holds a copy of v. On the other hand, class types are reference types in OMG (2014), Annex A: objects are represented as object identifiers in the OCL semantics, and an update performed on an object x via one variable that holds the object identifier of x is also visible via any other variable that holds the identifier of x, i.e., aliasing can occur. In programming languages, collections are usually reference types, as are arrays.^{Footnote 5} Strings may also be implemented as reference types. For these reasons it is also necessary to distinguish reference and value equality, in order to correctly express the semantics of program language variables. An operator ${<}{>}{=}$ is added to the OCL expression language to denote reference equality, together with a reference type constructor Ref(T) and address/dereference operators ?, !. As usual, reference equality implies value equality $=$. ?var is a constant with value in Ref(T) if var is a variable of type T. If $x \in Ref(T)$ then !x is a value of type T. Using Ref we can accurately model program reference types, for example, Java $ArrayList{<}String{>}$ can be modelled as Ref(Sequence(String)) in OCL. First-class function types Function(S, T) are also added to OCL, together with $\lambda $-abstraction expressions lambda x : T in expr and function application operator ${f{\rightarrow }apply(x)}$ (Lano and Kolahdouz-Rahimi 2021).
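The distinction between value and reference semantics can be made concrete in Python. In this sketch, `assign_value` models OCL value-type assignment by copying, and the hypothetical class `Ref` models a value of type Ref(T) as a cell, with `deref()` standing in for the ! operator. Python's `is` plays the role of reference equality and `==` that of value equality; none of these names come from the paper.

```python
import copy


def assign_value(v):
    """Model x := v for an OCL value type: x receives its own copy of v."""
    return copy.deepcopy(v)


class Ref:
    """Hypothetical model of Ref(T): a cell holding a value of type T."""

    def __init__(self, value):
        self.value = value

    def deref(self):
        """Model the dereference operator !x."""
        return self.value


# Value semantics: updating the copy leaves the original unchanged.
a = [1, 2]
b = assign_value(a)
b.append(3)

# Reference semantics: r2 is an alias of r1, so updates are visible through both.
r1 = Ref([1, 2])
r2 = r1
r2.deref().append(3)

# r3 is a distinct reference whose contents happen to equal those of r1.
r3 = Ref([1, 2, 3])
```

After these statements `a` is still `[1, 2]`, `r1.deref()` is `[1, 2, 3]`, `r1 is r2` holds (reference equality), and `r1 is r3` is false although `r1.deref() == r3.deref()` holds (value equality only).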

4.2 Expressions

We added the following operators to OCL to represent the semantics of program collections:

${m{\rightarrow }restrict(s)}$—a copy of map m retaining only elements with keys in set s
${m{\rightarrow }antirestrict(s)}$—a copy of map m retaining only elements with keys not in set s
sq.insertAt(i, x)—a copy of sequence sq with x inserted as an element at index i
sq.insertInto(i, sq1)—a copy of sq with sq1 inserted as a subsequence starting at index i
sq.setAt(i, x)—a copy of sq with the i’th element set to x
${sq{\rightarrow }excludingAt(i)}$—a copy of sq with the i’th element removed
${sq{\rightarrow }excludingFirst(x)}$—a copy of sq with the first occurrence of element x removed.
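The operators listed above all return copies and leave their arguments unchanged. A minimal Python sketch of their intended behaviour follows, assuming 1-based sequence indices as in standard OCL; the snake_case function names are hypothetical renderings of the OCL operator names.

```python
def restrict(m, s):
    """m->restrict(s): copy of map m keeping only keys in s."""
    return {k: v for k, v in m.items() if k in s}


def antirestrict(m, s):
    """m->antirestrict(s): copy of map m keeping only keys not in s."""
    return {k: v for k, v in m.items() if k not in s}


def insert_at(sq, i, x):
    """sq.insertAt(i, x): copy of sq with x inserted at 1-based index i."""
    return sq[:i - 1] + [x] + sq[i - 1:]


def insert_into(sq, i, sq1):
    """sq.insertInto(i, sq1): copy of sq with sq1 inserted starting at index i."""
    return sq[:i - 1] + list(sq1) + sq[i - 1:]


def set_at(sq, i, x):
    """sq.setAt(i, x): copy of sq with the i'th element replaced by x."""
    return sq[:i - 1] + [x] + sq[i:]


def excluding_at(sq, i):
    """sq->excludingAt(i): copy of sq with the i'th element removed."""
    return sq[:i - 1] + sq[i:]


def excluding_first(sq, x):
    """sq->excludingFirst(x): copy of sq without the first occurrence of x."""
    if x in sq:
        k = sq.index(x)
        return sq[:k] + sq[k + 1:]
    return list(sq)
```

Because every function builds a new list or dictionary, a call such as `excluding_first(sq, x)` can be used inside an expression without affecting `sq`, which is the property the OCL versions rely on.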

Additional operators are added for instances x of OclAny:

${x{\rightarrow }copy()}$—make a shallow clone of x
${x{\rightarrow }compareTo(y)}$—return −1, 0 or 1 depending on whether $x < y$, $x = y$ or $x > y$.

One significant semantic difference between programming languages and OCL is that OCL expression evaluations do not have side-effects, whilst in programming languages side-effecting expressions are quite frequently used. For example, the “remove element” operation col.remove(x) of Java collections potentially both updates col by removing (the first occurrence of) element x from col, and returns a boolean result indicating if the collection changed. In contrast, the corresponding OCL operator ${col{\rightarrow }excludingFirst(x)}$ returns a copy of col with the first occurrence of x removed, but does not update col. We address this issue by separately representing the query form of a side-effecting program expression as an OCL expression without side-effects, and its side effects (its update form) as an OCL statement (Sect. 5.3).
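The separation into query form and update form can be sketched in Python for the col.remove(x) example. The decomposition shown here is an assumption made for illustration: the query form is taken to be a membership test on the pre-state and the update form an assignment of the excludingFirst result; the exact forms used by the abstraction rules are defined in Sect. 5.3 and may differ. The function names are hypothetical.

```python
def excluding_first(col, x):
    """Side-effect-free copy of col without the first occurrence of x."""
    if x in col:
        k = col.index(x)
        return col[:k] + col[k + 1:]
    return list(col)


def remove_query(col, x):
    """Assumed query form: would removing x change col? No side effects."""
    return x in col


def remove_update(col, x):
    """Assumed update form: the new value to assign to col."""
    return excluding_first(col, x)


# Abstraction of the Java statement: changed = col.remove("a");
col = ["a", "b", "a"]
original = col
changed = remove_query(col, "a")   # query evaluated on the pre-state
col = remove_update(col, "a")      # update performed as a separate statement
```

After these two steps `changed` is `True` and `col` is `["b", "a"]`, while the list originally bound to `col` is untouched, reflecting the side-effect-free nature of the OCL operator.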

4.3 Statements

To define program executable behaviour, we adopt the AgileUML textual notation for UML structured activities (Table 1). This notation is a procedural OCL extension similar to the extended executable OCL of Buttner and Gogolla (2014) or Motogna (2008), and observes a strict hierarchical relation between expressions and statements, that is, statements cannot occur as subparts of expressions.

Table 1 AgileUML structured activities: procedural OCL statements
