1 Introduction

The management and maintenance of software, especially of legacy software, has become a significant business and social problem (Agarwal et al. 2021; Khadka et al. 2014; Ogheneovo 2014) which costs increasing human and financial resources to tackle. Critical software systems exist in old languages and platforms, and need to be modernised in order that they can be effectively used and maintained. The costs and time required for manual software modernisation can be extremely high. For example, porting of banking applications from COBOL to Java cost approximately $750 million in the case of one commercial bank (Lachaux et al. 2020).

Automated program translation is therefore an attractive alternative to manual translation or redevelopment of software assets.

Two main approaches have been used for program translation: (i) heuristic approaches using explicit language-to-language mapping rules (De Marco et al. 2018; Sneed 2011; Tangible Software 2023); (ii) machine learning approaches which learn implicit translation rules from sets of examples (Ahmad et al. 2023; Aggarwal et al. 2015; Chen et al. 2018; Lachaux et al. 2020; Nguyen et al. 2013; Roziere et al. 2022). Approach (i) involves substantial manual effort to create and maintain the mapping rules. Approach (ii) aims to avoid this cost, but has the limitations that it requires large datasets of examples in the source and target languages, that the translation operates in a probabilistic manner, and that the learned rules are not available for inspection or review. Multiple candidate translations may be produced, from which a selection must be made (e.g., the best results of Lachaux et al. 2020 are obtained by generating 25 candidate target programs). It is difficult to modify the behaviour of such ML systems; instead, specific pre- and post-processing actions need to be added (Malyaya et al. 2023). More recently, ML approaches using large language models (LLMs) (Abukhalaf et al. 2023; Hou et al. 2023; Zhao et al. 2023) pre-trained on large code datasets have been applied to code translation (Chen et al. 2021; Wang et al. 2021; Guo et al. 2021; Jana et al. 2023; Lu et al. 2021). These have the best current results for ML program translation approaches; however, LLMs also have limitations with regard to accuracy, modifiability, explainability and reliability when applied to code (Liu et al. 2023; Ouyang et al. 2023).

Given the large and increasing number of programming languages which could be the source and targets of program translation, it is clearly impractical in the long term to develop dedicated translations for each separate language pair. Instead, we propose an approach based on dividing the translation process into two stages: (i) abstraction of the source program into a specification, expressed in the UML and OCL international standard software modelling languages (OMG 2014); (ii) forward engineering of the specification into the target language. This approach takes advantage of the substantial research and tool development in model-driven engineering (MDE) which has taken place over the last 25 years. There are many MDE tools which provide forward engineering from UML/OCL into multiple target languages, for example (Eclipse AgileUML project 2024; Papyrus toolset 2023; Simulink toolset 2023). Hence we can focus effort on the reverse-engineering step and the representation of programming concepts and constructs using UML and OCL.

The advantages of this approach include:

  1. Semantic preservation can be ensured from source to specification by using explicit abstraction rules which accurately express source code semantics.

  2. Specifications are produced in a general-purpose standard modelling language (UML/OCL), and can be used for forward engineering, quality analysis and quality improvement via refactoring or rearchitecting (Lano et al. 2023).

  3. Abstraction and forward engineering are performed via concise user-configurable scripts which require only grammar-based knowledge to create and edit, not MDE expertise.

  4. An extensive set of library specifications and language-specific library implementations are provided to facilitate code migration.

We address the following research questions:

RQ1: Can an MDE-based program translation approach be defined to support practical cases of program translation, and to assure the generation of semantically-correct code?

RQ2: Is the approach able to process a substantial subset of the source language constructs?

RQ3: Can the approach be effectively customised by end users in order to meet their specific language translation requirements?

Section 2 gives background on MDE and UML/OCL and on the supporting technologies used in our process. We summarise the overall process of language translation in Sect. 3. Section 4 describes the semantic model used to represent program abstractions. Section 5 gives details of the abstraction process steps. Section 6 addresses the issue of semantic preservation by the abstraction process. A systematic process for defining a new abstraction mapping is given in Sect. 7. A detailed evaluation of our approach with respect to the above research questions RQ1, RQ2, RQ3 is given in Sect. 8, and the relation with previous work is described in Sect. 9. Threats to validity are discussed in Sect. 10. Future work and conclusions are given in Sects. 11 and 12.

An initial investigation into the approach was published as a short paper at ICSE 2022 (Lano 2022).

2 Background

In this section we give background information on UML and OCL, and on the text-to-text transformation language, \({\mathcal {CSTL}}\), which we use to define abstraction transformations.

2.1 Model-driven engineering, UML and OCL

Model-driven engineering (MDE) aims to raise the level of abstraction of software development from the code level to the model level, and to make software models the focus of development, ranging from requirements to code production (Brambilla et al. 2012; Lano 2016). The use of precise software representations in well-defined modelling languages supports increased automation and rigour throughout the development process, and MDE has been particularly adopted in domains such as avionics and automotive systems where such properties are essential. General-purpose modelling languages such as the Unified Modelling Language (UML) and Systems Modelling Language (SysML) emphasise the use of graphical notations such as class diagrams and state machines, however textual domain-specific modelling languages (DSLs) may also be used to define software models.

UML resulted from the combination of three leading object-oriented modelling approaches in the 1990s: OMT, the Booch method, and Jacobson’s Objectory. It has been managed by the OMG since 1997 and is currently at Version 2.5.1. UML provides a wide range of modelling notations, including dynamic behaviour models such as state machines and interactions, but here we will use only a subset of the class diagram and use case models, corresponding to the AgileUML subset of UML (Lano 2016). Class diagrams define the data and associated operations of a system. The AgileUML class diagram subset includes the definition of classes with single and multiple inheritance, attributes and associations to other classes, together with operations which may be defined declaratively using pre- and postconditions written in OCL, or imperatively in a procedural extension of OCL. Generic classes and operations can be defined, together with interfaces and abstract classes and operations. However, this subset excludes specialised class constructs such as nested classes, so such features of programming languages must be translated into more basic constructs during the program abstraction process.

OCL originated in the work of Cook and others on the Syntropy language (Cook and Daniels 1994). OCL was incorporated into version 1.0 of the UML standard as the notation for expressing precise constraints over UML models. Subsequently the language has been revised and expanded, and variant dialects such as EOL (2023) and Eclipse OCL (2022) have been created. OCL provides mathematical datatypes of sets, sequences, bags and tuples, and operations upon these (Cook et al. 2002; OMG 2014). Sets, sequences and bags are each subtypes of a general Collection type, for which operators \({{\rightarrow }union}\), \({{\rightarrow }intersection}\), \({{\rightarrow }select}\) of union, intersection, selection and many others are available. AgileUML and Eclipse OCL add a Map datatype to OCL. This is not a subtype of Collection, but has a similar set of operators. The high-level collection operators of OCL such as \({{\rightarrow }select}\) and \({{\rightarrow }collect}\) are now commonly available in modern programming languages (e.g., as .filter and .map in Swift 5).

2.2 \({\mathcal {CSTL}}\)

\({\mathcal {CSTL}}\) is a scripting language which defines text-to-text (T2T) transformations in terms of the concrete syntax of the source and target languages. \({\mathcal {CSTL}}\) is used in AgileUML to write code generators from UML and OCL to particular implementation 3GLs, such as Swift 5 or Java 8 (Eclipse AgileUML project 2024). However, it can also be applied to any textual language which has a grammar, and in this paper we employ it to map program language source texts to textual UML/OCL specifications, i.e., to define program abstraction mappings.

A \({\mathcal {CSTL}}\) script operates upon parse trees or abstract syntax trees (ASTs) of the source language, also referred to as AST terms. The metamodel of AST terms which we use to represent program parse trees is shown in Fig. 1. The features map of a term records information about the language element represented by the term, such as its type. For a term t, \(ASTTerm.features[t + ""]\) is a sequence of stereotypes or tagged values: \(f=v\) where f is a feature name and v the feature value. This information can be set by the actions of \({\mathcal {CSTL}}\) rules and read by \({\mathcal {CSTL}}\) rule conditions. Composite terms with tag tg and n subterms are written as LISP-style lists \((tg~t_1~...~t_n)\). Basic terms with tag tg and value v are represented as (tg v). Symbol terms represent individual terminal tokens such as ‘*’ and ‘while’.

Fig. 1 AST metamodel

A \({\mathcal {CSTL}}\) script to process software language \(\mathcal{L}_1\) consists of a collection of rulesets corresponding to the syntax categories of \(\mathcal{L}_1\). These categories are the non-terminals of the \(\mathcal{L}_1\) grammar, i.e., for the ANTLR Java grammar the rulesets are named statement, expression, classDeclaration, etc. When executed, the \({\mathcal {CSTL}}\) script operates upon syntax trees (ASTTerm instances) of \(\mathcal{L}_1\) produced by a parser for \(\mathcal{L}_1\). Syntax trees with tag tg are processed by the ruleset with name tg. Each ruleset contains a sequence of \({\mathcal {CSTL}}\) rules. Declarative rules in \({\mathcal {CSTL}}\) notation have the form:

figure a

The \({<}when{>}\) clause and conditions are optional. \({\mathcal {CSTL}}\) conditions can test the syntactic category (tag name) of elements bound to \({\mathcal {CSTL}}\) variables, and other properties of these elements.

The left hand side (LHS) of a \({\mathcal {CSTL}}\) rule is a schematic representation of textual concrete syntax in the source language \({\mathcal {L}}_1\), e.g., in Java 7, and the right hand side (RHS) is the corresponding concrete textual syntax in the target language \({\mathcal {L}}_2\), e.g., in OCL or the Kernel Metamodel (KM3) (Jouault and Bezivin 2006) textual notation for UML class models, which the LHS should translate to. The LHS represents a node in a parse tree of \({\mathcal {L}}_1\) elements, i.e., an element of ASTTerm. Apart from literal text concrete syntax items, the LHS may contain variable terms \(\_1\), \(\_2\), etc, representing direct child terms of the node being processed (the properties of these child terms can be constrained by the optional rule conditions), and the RHS refers to the translation of these child terms also by \(\_1\), \(\_2\), etc. This enables \({\mathcal {CSTL}}\) mappings to be applied recursively. The special variables \(\_*\), \(\_+\) denote lists of subterms, e.g., parameter lists of method calls.

Specialised rules are listed before more general rules. Functions f can be applied to (the elements bound to) a variable \(\_x\) by the notation _x‘f. f could be a ruleset name in the same script or a built-in function such as first, which returns the first subterm of its argument. Other \({\mathcal {CSTL}}\) scripts f can also be invoked using the same notation. f may also be a user-defined metafeature used to associate specific properties with a term via the features map of Fig. 1. If no rule LHS matches a source node then the node text is copied to the output, so there is no need to include rules whose LHS and RHS are the same literal text. General \({\mathcal {CSTL}}\) rules may also have an \({<}action{>}\) clause defining updates to ASTTerm.features.

2.2.1 Using \({\mathcal {CSTL}}\) for program abstraction

As an example of program abstraction, we can use \({\mathcal {CSTL}}\) rules to perform an abstraction mapping from Java 6/7 types to UML/OCL types. The relevant syntax categories in the ANTLR grammar for Java are primitiveType, classOrInterfaceType, typeType, typeTypeOrVoid, typeArguments, and typeArgument. Examples of the \({\mathcal {CSTL}}\) rules for mapping Java types to UML/OCL types are as follows:

figure b

The effect of these rules is to map array types and Java collection types to collection and map types in OCL, with corresponding type parameters. Primitive types are mapped to corresponding OCL primitive types. Java language classes such as Object map to OCL language classes. Class types of an application map to corresponding UML class model classes.

For example, the Java type

figure c

is parsed by the ANTLR Java parser to the syntax tree

figure d

and this is mapped to the OCL type Map(String, int) by the above rules of the \({\mathcal {CSTL}}\) script: the tree has the form (typeType tt), which matches the second rule of the typeType ruleset, with tt bound to \(\_1\). tt matches the rule LHS \(HashMap~\_1\) of the classOrInterfaceType ruleset, with \(\_1\) bound to the \((typeArguments < t1 ~, ~t2 >)\) term. This in turn matches the first rule of the typeArguments ruleset.

The same approach can be used to write \({\mathcal {CSTL}}\) scripts for any source language \({\mathcal {L}}\) which has a grammar and parser. In this paper we use ANTLR grammars and parsers, but other grammar technologies such as SableCC or JavaCC could alternatively be used.

2.2.2 \({\mathcal {CSTL}}\) semantics

\({\mathcal {CSTL}}\) semantics is based on AST term matching and text substitution. The semantic representation of each \({\mathcal {CSTL}}\) script s is a function

$$\begin{aligned} {cstl(s): ASTTerm \rightarrow String} \end{aligned}$$

together with auxiliary functions

$$\begin{aligned} {cstl_{tg}(s): ASTTerm \rightarrow String} \end{aligned}$$

for each ruleset tg of s.

The semantics defines the result cstl(s)(t) of applying script s to a composite or basic AST term t.

If t has tag tg, then \(cstl(s)(t) = cstl_{tg}(s)(t)\) when s contains a ruleset \(tg{::}\) which has a rule matching t, otherwise \(cstl(s)(t) = t\).

For a ruleset name tg of s,

$$\begin{aligned} cstl_{tg}(s)(t) = rhs[t_1',...,t_n'] \end{aligned}$$

where \(t = (tg~t_1~...~t_n)\), each \(t_i\) is mapped by s to \(t_i'\), and the rule r

figure e

of \(tg{::}\) is the first rule in this ruleset whose lhs matches t and whose Conditions are true. If there is no such matching rule then \(cstl_{tg}(s)(t) = t\).

A basic term t matches any rule LHS that consists of a single variable \(\_j\). A composite term t matches lhs if t’s subterms t.terms match successive tokens of lhs: symbol terms of t must equal corresponding tokens of lhs, and non-symbol terms \(t_i\) are bound to corresponding variables \(\_i\) in the token list of lhs. A variable \(\_*\) or \(\_+\) binds to a list of successive terms which occur between/before/after specific symbol terms. r is applicable to a matched t if \(Conditions[t_1,...,t_n]\) also hold. In this case, the script s is applied to each of the \(t_i\) to produce \(t_i' ~=~ cstl(s)(t_i)\) if \(\_i\) occurs as a simple variable expression on the RHS. However, if it occurs as _i‘f for ruleset name f, then \(t_i' ~=~ cstl_f(s)(t_i)\), and in the case of a script application _i‘g: \(t_i' ~=~ cstl(g)(t_i)\).

A variable \(\_*\) or \(\_+\) is replaced in rhs by the string concatenation of the \(t_i'\) of the terms bound to it. Various built-in functions such as recurse, sum and first have specific denotations. Metafeature applications _i‘f are evaluated as the tagged value of f in \(ASTTerm.features[t_i + ""]\), in both the rhs and conditions.
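The matching and substitution semantics described above can be illustrated with a small Python sketch. This is not the AgileUML implementation: terms are modelled as nested tuples `(tag, child, ...)` with symbol terms as plain strings, the rule table `RULES` and the sample tree are hypothetical stand-ins for a \({\mathcal {CSTL}}\) script and an ANTLR parse tree, and rule conditions, actions, the list variables \(\_*\), \(\_+\) and function applications are all omitted.

```python
import re

def text_of(term):
    """Concatenate the leaf text of a term (used when no rule matches)."""
    if isinstance(term, str):
        return term
    return "".join(text_of(child) for child in term[1:])

def match(lhs, children):
    """Return variable bindings if the children match the LHS tokens, else None."""
    if len(lhs) != len(children):
        return None
    bindings = {}
    for token, child in zip(lhs, children):
        if re.fullmatch(r"_\d+", token):
            bindings[token] = child      # variable: bind to the child term
        elif token != child:
            return None                  # literal token must equal the symbol
    return bindings

def cstl(rules, term):
    """Apply the first matching rule of the ruleset named by the term's tag."""
    if isinstance(term, str):
        return term
    tag, children = term[0], list(term[1:])
    for lhs, rhs in rules.get(tag, []):
        bindings = match(lhs, children)
        if bindings is not None:
            # Each RHS variable is replaced by the translation of its term.
            return re.sub(r"_\d+",
                          lambda m: cstl(rules, bindings[m.group(0)]), rhs)
    return text_of(term)

# Hypothetical rules, loosely modelled on the Java type example of Sect. 2.2.1.
RULES = {
    "typeType": [(["_1"], "_1")],
    "typeArgument": [(["_1"], "_1")],
    "typeArguments": [(["<", "_1", ",", "_2", ">"], "(_1, _2)")],
    "classOrInterfaceType": [
        (["HashMap", "_1"], "Map_1"),    # specialised rule listed first
        (["String"], "String"),
        (["Integer"], "int"),
    ],
}

def type_term(name):
    return ("typeArgument", ("typeType", ("classOrInterfaceType", name)))

TREE = ("typeType",
        ("classOrInterfaceType", "HashMap",
         ("typeArguments", "<", type_term("String"), ",",
          type_term("Integer"), ">")))

result = cstl(RULES, TREE)
```

In this sketch a term whose ruleset has no matching rule is emitted as its own source text, mirroring the default case \(cstl_{tg}(s)(t) = t\).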

In addition to declarative \({\mathcal {CSTL}}\) rules, there are rules with actions:

figure f

These allow information about the terms processed by the rule to be asserted. Typically this would be a typing constraint, e.g., for VB6:

figure g

This asserts that the identifier bound to \(\_1\) in a declaration of form DIM X() is of sequence type. It adds the stereotype "Sequence" to the features list of the term bound to \(\_1\). Subsequent processing within the scope of this declaration can make use of this knowledge to correctly translate an expression such as X(1) to \({X{\rightarrow }at(1)}\).
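The interplay between an action and a later rule condition can be sketched in Python as follows. The dictionary `features` plays the role of ASTTerm.features; the function names `assert_sequence` and `translate_indexing` are hypothetical, and `->` is used as an ASCII rendering of the OCL arrow. This is only an illustration of the idea, not of how \({\mathcal {CSTL}}\) is implemented.

```python
def assert_sequence(features, name):
    """Action of the declaration rule: record that `name` is a sequence."""
    features.setdefault(name, []).append("Sequence")

def translate_indexing(features, name, arg):
    """Translate the source expression name(arg) using recorded stereotypes."""
    if "Sequence" in features.get(name, []):
        return f"{name}->at({arg})"   # array access on a known sequence
    return f"{name}({arg})"           # otherwise treated as an ordinary call

features = {}
before = translate_indexing(features, "X", "1")   # no declaration seen yet
assert_sequence(features, "X")                    # effect of processing DIM X()
after = translate_indexing(features, "X", "1")
```

The same source text X(1) is therefore translated differently depending on the information asserted by earlier rules.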

3 A MDE process for language translation

Our program translation process using MDE consists of successive steps of parsing, abstraction and forward engineering from a source language L1 to a target language L2 (Fig. 2):

  1. A parser for L1 is used to produce syntax trees (ASTs) according to the metamodel of Fig. 1.

  2. These are then input to an abstraction transformation for L1, written using \({\mathcal {CSTL}}\) (Sect. 2.2). The output is a UML/OCL specification in textual form, consisting of class specifications with data features and operations, and use cases defining global processing, such as application initialisation. The result specification may utilise the operations of OCL libraries. We have provided additional libraries to represent program semantics for aspects such as files, dates, exceptions and iterators, which are not present in standard OCL (Sect. 4).

  3. Forward engineering using established MDE techniques is then employed to map the abstracted specification to the target language L2 (Sect. 5.8).

Fig. 2 MDE program translation process

As a parsing technology for the first step, we utilise ANTLR (2023). This is a lightweight tool facilitating the rapid construction and adaptation of parsers. There are ANTLR grammars for over 230 source languages, including all the main 3GLs. For forward engineering we adopt the AgileUML MDE toolset (Eclipse AgileUML project 2024), which is a lightweight MDE platform providing the text-to-text transformation language, \({\mathcal {CSTL}}\) (Lano et al. 2020), and text-based UML/OCL specification. As targets of forward engineering it supports C, Python, Java, Swift, Go, C# and C++. However, our approach is not specific to these technologies, and alternative MDE tool chains, such as Xtext (2021) and Papyrus, could be used. Likewise, in this paper we focus upon ANSI C, VB6, COBOL ‘85, Python, ISO Pascal, JavaScript and Java versions 6 and 7 as source languages, but the same approach can be used for any source 3GL with a suitable parser and grammar.

With regard to the appropriate source and target languages L1, L2 to consider, translations from Java to Python and from Java to C-based languages appear to be the most-requested program language translations (for instance, over 4200 questions on stackoverflow.com concern translating Java code to C/C#/C++, and over 2600 concern Java to Python conversion). The demand for such translations is also evident from the fact that several tools are available in this area, for example at kalkicode.com and the Java to Python converter of Tangible Software (2023). From our consultancy work in the financial services sector, we are also aware of a significant demand for conversion of VB6/VBA and Matlab code to production languages such as C# and C++. Translation from COBOL to modernised languages and platforms has been the subject of research and substantial development effort for decades (Bowen et al. 1993; Sneed 2011; De Marco et al. 2018; Khadka et al. 2014). To analyse and improve machine learning (ML) and other AI applications, analysis of Python programs is necessary. Safety-critical systems (SCS) often use C or C++.

Thus we provide abstraction mappings from Java, Python, C, VB6 and COBOL programs to UML/OCL, making use of existing MDE code generators to perform the second step from UML/OCL to a target programming language. To demonstrate the versatility of our approach, we also define abstractions of Pascal and JavaScript to UML/OCL.

As an example of the process steps, we show in Figs. 3, 4 parts of a legacy VBA/Excel option pricing application, in Fig. 5 the abstracted UML specification, and in Fig. 6 the generated Python version of the application. This illustrates the use of additional OCL libraries for Excel functions (Excel.py) and mathematical functions (ocl.py). The result code can be shown to have the same semantics as the source. The UML model can be used to support quality improvement of the application via refactoring, model-based test case generation, or to support reuse of the application within a new system.

Fig. 3 Example VBA case: spreadsheet

Fig. 4 Example VBA case: source code

Fig. 5 Example VBA case: UML model

Fig. 6 Example VBA case: generated Python

4 The semantic model

The semantic model aims to provide a platform-independent semantic representation of programming language semantics, in a way that is consistent with standard UML/OCL semantics and also amenable to reasoning/verification.

4.1 Data types

As described in Sect. 2.1, OCL has some appropriate collection data structures for program abstraction, however it lacks datatypes for several common programming aspects such as files, dates, iterators and exceptions. Thus these types need to be added in order to utilise OCL as an intermediate representation for program translation. We achieve this by defining additional library components to provide the necessary types and operations (Sects. 4.4 to 4.7). We also use library components to add facilities such as random number generation and numeric format conversions which are missing from OCL.

The type OclAny in OCL is the supertype of all other types, and corresponds approximately to Object in Java. The type OclType of OCL represents OCL types, the closest equivalent in Java is Class. The String datatype of OCL, together with the extended string operations of Eclipse OCL and AgileUML, provides a sufficient basis to represent the semantics of programming language string types and operators. The OCL standard provides mathematical numeric datatypes Integer and Real with unbounded range/precision. Computational datatypes int, long, double can be derived from these and are provided as basic types in AgileUML. int, long and double are respectively the subsets of Integer of 32-bit and 64-bit integers, and the IEEE 754 64-bit floating point subrange of Real. The positive/negative infinity and NaN values of the double subrange are denoted \(Math\_PINFINITY\), \(Math\_NINFINITY\), \(Math\_NAN\). We also define a wide range of numeric operators \({{\rightarrow }sin()}\), \({{\rightarrow }cos()}\), \({{\rightarrow }tan()}\), \({{\rightarrow }sqrt()}\), \({{\rightarrow }cbrt()}\), \({{\rightarrow }asin()}\), \({{\rightarrow }acos()}\), \({{\rightarrow }atan()}\), \({{\rightarrow }exp()}\), \({{\rightarrow }log()}\), \({{\rightarrow }log10()}\), \({{\rightarrow }sinh()}\), \({{\rightarrow }cosh()}\), \({{\rightarrow }tanh()}\), etc, in order to provide convenient representations of program numeric operators. Further mathematical operations are defined in the MathLib and Excel OCL libraries. For strings, a set of regex operators are provided (Lano 2021).

An important aspect of the OCL semantics (Appendix A of OMG 2014) is that the collection type operators Sequence(X), Set(X), etc are monotonic in their argument types: if T is a subtype of S, then Set(T) is considered to be a subtype of Set(S). This property is not necessarily true for program collection types because the Set(T) operations such as add(x) will be more restricted than the corresponding Set(S) operations: they will only be valid for x in T, not for x in S. Thus the monotonicity properties of the OCL collection and map type construction operators are not assumed in our semantic model.
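The reason why monotonicity fails for mutable program collections can be seen in a short Python sketch. The class `TypedSet` and the types `Shape` and `Circle` are hypothetical and serve only to illustrate the argument; `TypedSet` checks element types at run time in place of a static type system.

```python
class TypedSet:
    """A mutable set whose add() only accepts instances of elem_type."""

    def __init__(self, elem_type, items=()):
        self.elem_type = elem_type
        self.items = set()
        for x in items:
            self.add(x)

    def add(self, x):
        if not isinstance(x, self.elem_type):
            raise TypeError(f"expected {self.elem_type.__name__}")
        self.items.add(x)


class Shape:            # plays the role of the supertype S
    pass


class Circle(Shape):    # plays the role of the subtype T
    pass


def add_via_supertype_view():
    """Try to use a Set(Circle) as if it were a Set(Shape)."""
    circles = TypedSet(Circle, [Circle()])
    shapes = circles            # the same object, viewed as Set(Shape)
    try:
        shapes.add(Shape())     # valid for Set(Shape), invalid for Set(Circle)
    except TypeError:
        return False
    return True

substitutable = add_via_supertype_view()
```

Here `substitutable` is false: an operation that every Set(Shape) must support is rejected by the Set(Circle), so the latter cannot be substituted for the former once update operations are taken into account.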

The collection, map, numeric and string types of OCL are value types: an assignment \(x~ {:=}~ v\) of a value v of such a type to a variable x stores a copy of v in x, and changes to x do not affect the value of any other variable which holds a copy of v. On the other hand, class types are reference types in OMG (2014), Annex A: objects are represented as object identifiers in the OCL semantics, and an update performed on an object x via one variable that holds the object identifier of x is also visible via any other variable that holds the identifier of x, i.e., aliasing can occur. In programming languages, collections are usually reference types, as are arrays. Strings may also be implemented as reference types. For these reasons it is also necessary to distinguish reference and value equality, in order to correctly express the semantics of program language variables. An operator \({<}{>}{=}\) is added to the OCL expression language to denote reference equality, together with a reference type constructor Ref(T) and address/dereference operators ?, !. As usual, reference equality implies value equality \(=\). ?var is a constant with value in Ref(T) if var is a variable of type T. If \(x \in Ref(T)\) then !x is a value of type T. Using Ref we can accurately model program reference types, for example, Java \(ArrayList{<}String{>}\) can be modelled as Ref(Sequence(String)) in OCL. First-class function types Function(S, T) are also added to OCL, together with \(\lambda \)-abstraction expressions lambda x : T in expr and function application operator \({f{\rightarrow }apply(x)}\) (Lano and Kolahdouz-Rahimi 2021).
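The distinction between value and reference semantics can be made concrete with a Python sketch. The class `Ref` below is a hypothetical model of Ref(T) as a mutable cell, with the attribute `value` standing for the dereference !x; Python's `is` and `==` are used as analogues of reference equality \({<}{>}{=}\) and value equality \(=\). Value assignment is modelled by an explicit copy of a flat list.

```python
class Ref:
    """A mutable cell modelling Ref(T): it holds one value of type T."""

    def __init__(self, value):
        self.value = value


# Value semantics: assignment stores a copy, so no aliasing arises.
s = ["a", "b"]
t = list(s)            # models t := s for a Sequence(String)
t.append("c")          # s is unaffected

# Reference semantics: Java ArrayList<String> as Ref(Sequence(String)).
r1 = Ref(["a", "b"])
r2 = r1                               # both variables hold the same reference
r2.value = r2.value + ["c"]           # update via r2 is visible via r1

r3 = Ref(["a", "b", "c"])             # distinct cell with equal contents
same_reference = r1 is r2             # reference equality holds
equal_but_distinct = (r1 is not r3) and (r1.value == r3.value)
```

The last line shows a case of value equality without reference equality, which is why the two notions need separate operators.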

4.2 Expressions

We added the following operators to OCL to represent the semantics of program collections:

  • \({m{\rightarrow }restrict(s)}\): a copy of map m retaining only elements with keys in set s

  • \({m{\rightarrow }antirestrict(s)}\): a copy of map m retaining only elements with keys not in set s

  • sq.insertAt(i, x): a copy of sequence sq with x inserted as an element at index i

  • sq.insertInto(i, sq1): a copy of sq with sq1 inserted as a subsequence starting at index i

  • sq.setAt(i, x): a copy of sq with the i’th element set to x

  • \({sq{\rightarrow }excludingAt(i)}\): a copy of sq with the i’th element removed

  • \({sq{\rightarrow }excludingFirst(x)}\): a copy of sq with the first occurrence of element x removed.
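The operators listed above all return modified copies and leave their arguments unchanged. A minimal Python sketch of this behaviour is given below, under the assumption that indexes are 1-based as in standard OCL sequences; the snake_case function names are hypothetical counterparts of the OCL operators, with maps modelled as `dict` and sequences as `list`.

```python
def restrict(m, s):
    """Copy of map m keeping only entries whose key is in s."""
    return {k: v for k, v in m.items() if k in s}

def antirestrict(m, s):
    """Copy of map m keeping only entries whose key is not in s."""
    return {k: v for k, v in m.items() if k not in s}

def insert_at(sq, i, x):
    """Copy of sq with x inserted so that it becomes the i'th element."""
    return sq[:i - 1] + [x] + sq[i - 1:]

def insert_into(sq, i, sq1):
    """Copy of sq with sq1 inserted as a subsequence starting at index i."""
    return sq[:i - 1] + list(sq1) + sq[i - 1:]

def set_at(sq, i, x):
    """Copy of sq with the i'th element replaced by x."""
    return sq[:i - 1] + [x] + sq[i:]

def excluding_at(sq, i):
    """Copy of sq with the i'th element removed."""
    return sq[:i - 1] + sq[i:]

def excluding_first(sq, x):
    """Copy of sq with the first occurrence of x removed, if there is one."""
    if x not in sq:
        return list(sq)
    k = sq.index(x)
    return sq[:k] + sq[k + 1:]
```

For instance, `insert_at([1, 2, 3], 2, 9)` evaluates to `[1, 9, 2, 3]` while the argument list is left untouched.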

Additional operators are added for instances x of OclAny:

  • \({x{\rightarrow }copy()}\): returns a shallow copy of x

  • \({x{\rightarrow }compareTo(y)}\): returns −1, 0 or 1 depending on whether \(x < y\), \(x = y\) or \(x > y\).

One significant semantic difference between programming languages and OCL is that OCL expression evaluations do not have side-effects, whilst in programming languages side-effecting expressions are quite frequently used. For example, the “remove element” operation col.remove(x) of Java collections potentially both updates col by removing (the first occurrence of) element x from col, and returns a boolean result indicating if the collection changed. In contrast, the corresponding OCL operator \({col{\rightarrow }excludingFirst(x)}\) returns a copy of col with the first occurrence of x removed, but does not update col. We address this issue by separately representing the query form of a side-effecting program expression as an OCL expression without side-effects, and its side effects (its update form) as an OCL statement (Sect. 5.3).
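The separation into a query form and an update form can be sketched in Python for the col.remove(x) example. The particular choice of query form here (membership of x in col) is our assumption for illustration, since the actual encoding is defined in Sect. 5.3; the function names are hypothetical.

```python
def excluding_first(col, x):
    """Side-effect-free: a copy of col without the first occurrence of x."""
    if x not in col:
        return list(col)
    k = col.index(x)
    return col[:k] + col[k + 1:]

def remove_query(col, x):
    """Query form: the boolean result of col.remove(x), with no update."""
    return x in col

def remove_update(col, x):
    """Update form: the new value to be assigned to col by a statement."""
    return excluding_first(col, x)

def simulate_remove(col, x):
    """Evaluate the query form in the pre-state, then apply the update."""
    changed = remove_query(col, x)
    new_col = remove_update(col, x)
    return changed, new_col

changed, new_col = simulate_remove(["a", "b", "a"], "a")
```

The query form must be evaluated against the state before the update, otherwise the returned boolean could differ from that of the original side-effecting expression.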

4.3 Statements

To define program executable behaviour, we adopt the AgileUML textual notation for UML structured activities (Table 1). This notation is a procedural OCL extension similar to the extended executable OCL of Buttner and Gogolla (2014) or Motogna (2008), and observes a strict hierarchical relation between expressions and statements, that is, statements cannot occur as subparts of expressions.

Table 1 AgileUML structured activities: procedural OCL statements

In order to represent a wide range of source programs, this basic activity language needs to be extended to include statements for exception handling (Table 2).

Table 2 Extended OCL statements for exception handling

4.4 Data structure libraries

While most programming languages provide the core aggregate data types of OCL (sequences, sets and maps) either as built-in types or from libraries, variations and extensions of these types are also used: C++ provides sorted sets and maps, and multisets and multimaps. VB Collections and COBOL indexed files combine sequence and map aspects, and can be regarded as ordered maps. Java has sorted and unsorted versions of sequences, sets and maps. Multisets can be represented by the OCL Bag type, and multimaps can be represented as maps to sets. However, sorted sets, maps and sequences are missing from OCL and we will simulate these using the unsorted versions of the collections/maps. Ordered maps are represented as Sequence(Map(OclAny, OclAny)).
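Two of the representations mentioned above, multisets as bags and multimaps as maps to sets, can be illustrated in Python. This is a sketch only: `collections.Counter` stands in for the OCL Bag type, a `dict` of `set` values stands in for a map to sets, and the function names are hypothetical. Both functions return copies, in keeping with OCL value semantics.

```python
from collections import Counter

def bag_including(bag, x):
    """Copy of the bag with one more occurrence of x."""
    out = Counter(bag)
    out[x] += 1
    return out

def multimap_put(mm, key, value):
    """Copy of the multimap with value added to the set stored under key."""
    out = {k: set(vs) for k, vs in mm.items()}
    out.setdefault(key, set()).add(value)
    return out

bag = bag_including(bag_including(Counter(), "a"), "a")
mm0 = {}
mm1 = multimap_put(mm0, "k", 1)
mm2 = multimap_put(mm1, "k", 2)
```

After these steps the bag records two occurrences of "a", and the key "k" is associated with the set of both values.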

Dates and times are widely used in programs, but there is no OCL representation for these, thus we add a library component OclDate to represent types such as Java Date and Calendar, C# DateTime, VB Date, C++ and C tm/\(time\_t\) and Python datetime.

Conversion between bytes and higher-level data types is also a fundamental process in programming. We add facilities for byte processing and number format conversions in the library component MathLib. MathLib also supports random number generation and bitwise operations. New unary operators \({{\rightarrow }char2byte()}\) and \({{\rightarrow }byte2char()}\) are introduced.

Reflection facilities are provided in some languages (e.g., in Java and Python) but are not available in others. OCL supports a limited form of reflection via the OclType type (corresponding to Class in Java), and the operation \({e{\rightarrow }oclType()}\) to obtain the type of an expression. We extended OclType to enable inspection of the data and behaviour features of each OclType instance, representing the features of the classes of a given application. In other words, the structure of the application data is encoded in OclType instances in the generated code. This enables a reflection facility to be provided for translated applications, even in target languages without intrinsic reflection support.

4.5 Exception handling

Most modern programming languages have adopted the try-throw-catch paradigm for exception handling, however OCL uses a more abstract exceptions concept whereby a distinguished expression value, invalid, represents a non-terminating or exception-producing computation. This value is propagated through expression evaluations. In this respect the abstraction gap between programming languages and OCL is too wide, and we instead represent program exception handling via a library type OclException and the procedural OCL statements of Table 2. These statements represent the common forms of exception handling as found in Java, C++, C#, Python, Swift, etc. Subclasses of OclException are used to represent the main categories of exception which are used in different programming languages, such as system errors, IO exceptions, assertion exceptions, arithmetic exceptions, indexing exceptions, casting exceptions, invalid type/format of data, and data access violations. Source programming language exception types are abstracted to these OCL types, and the OCL types are in turn mapped to target language-specific exceptions.

The theoretical model of exceptions appropriate for OCL semantics defines sets E.raisedInstances() of E instances for each exception class E, which represent the instances of E which have been raised but not yet caught by a catch statement. For each E the inclusion

$$\begin{aligned} E.raisedInstances()~\subseteq ~ E.allInstances() \end{aligned}$$

holds.

An error ex statement for E instance ex adds ex to E.raisedInstances(), and likewise an assert Cond do message statement with a failed Cond creates and raises an AssertionException instance with the message. Expression evaluations which can produce an exception (e.g., because of an attempt to apply a map to an element not in its domain) also create and raise instances of appropriate exception classes (such as IndexingException). A catch e : E do stat statement tests for the existence of an instance ex in E.raisedInstances(), and then assigns one such ex to e if one exists, removes ex from E.raisedInstances(), and executes stat.
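The raised-instance model above can be illustrated with a minimal Python sketch. It is not the AgileUML library implementation: the helper names (`error`, `assert_stat`, `catch`, `raised_instances`, `all_instances`) are hypothetical, subclass instances are assumed to count as instances of their superclasses, and `catch` is assumed to pick the earliest raised instance.

```python
_all = []     # every exception object created so far
_raised = []  # objects raised but not yet caught


class OclException:
    def __init__(self, message=""):
        self.message = message
        _all.append(self)


class AssertionException(OclException):
    pass


class IndexingException(OclException):
    pass


def all_instances(exc_class):
    """Stands in for E.allInstances()."""
    return [ex for ex in _all if isinstance(ex, exc_class)]


def raised_instances(exc_class):
    """Stands in for E.raisedInstances()."""
    return [ex for ex in _raised if isinstance(ex, exc_class)]


def error(ex):
    """'error ex': record ex as raised but not yet caught."""
    _raised.append(ex)


def assert_stat(cond, message):
    """'assert Cond do message': a failed Cond raises an AssertionException."""
    if not cond:
        error(AssertionException(message))


def catch(exc_class, stat):
    """'catch e : E do stat': if a raised instance of E exists,
    remove it from the raised set and run stat on it."""
    pending = raised_instances(exc_class)
    if not pending:
        return False
    ex = pending[0]       # assumption: take the earliest raised instance
    _raised.remove(ex)
    stat(ex)
    return True
```

Because every exception object is registered on creation, the list returned by `raised_instances(E)` is always contained in `all_instances(E)`, mirroring the inclusion stated above.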

4.6 Files and iterators

Iterators are widely-used in programming languages, but are not supported as first-class elements in standard OCL. Thus we define a library class OclIterator which supports the creation of iterators for collections, with random-access bidirectional iterator operations hasNext(), moveForward(), hasPrevious(), moveBackward(), moveTo(index), etc. A unary operator \({{\rightarrow }iterator()}\) is introduced, which can be applied to any OCL collection, and returns a new OclIterator for the collection. If the underlying collection is already in sequential form, then the iterator provides a navigation interface for manipulating the collection, and updates via the iterator also directly update the collection. In the abstraction of Java programs, the \({{\rightarrow }iterator()}\) operator is used to express the construction of iterators, enumerations and string tokenizers from Java collections, maps and strings. OclIterator is also used to represent generator objects in JavaScript and Python, and to represent C pointer arithmetic, with x.moveForward() expressing \(x{++}\), etc.
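As a rough illustration of the iterator concept described above, the following Python sketch models a bidirectional, random-access iterator over a list. The operation names hasNext, moveForward, hasPrevious, moveBackward and moveTo come from the text; `getCurrent`, `set`, the 1-based position convention and the absence of bounds checks are assumptions made for this sketch, not a description of the actual OclIterator library.

```python
class OclIterator:
    """Hypothetical sketch of a bidirectional iterator over a sequence."""

    def __init__(self, elements):
        self.elements = elements  # shared, so updates reach the collection
        self.position = 0         # 0 means "before the first element"

    def hasNext(self):
        return self.position < len(self.elements)

    def hasPrevious(self):
        return self.position > 1

    def moveForward(self):
        self.position += 1

    def moveBackward(self):
        self.position -= 1

    def moveTo(self, index):
        self.position = index

    def getCurrent(self):
        # Assumed accessor: element at the current 1-based position.
        return self.elements[self.position - 1]

    def set(self, value):
        # Assumed updater: writes through to the underlying collection.
        self.elements[self.position - 1] = value
```

Under this reading, a C pointer increment `x++` over an array corresponds to `x.moveForward()`, and a dereference corresponds to `x.getCurrent()`.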

OclIterator instances are used to represent SQL result sets as iterable sequences of maps of type Map(String, OclAny), representing the rows returned from a relational database query.

Files can also be regarded as a navigation and manipulation wrapper for an underlying sequence of data (bytes, text lines or objects). The model of files in most programming languages is consistent with this concept. We define a library class OclFile to represent all kinds of local file and stream. This class also provides operations to query the existence and status of files.

OclDatasource represents SQL databases, HTTP Internet connections and TCP sockets. It depends upon OclIterator, OclDate and OclFile.

4.7 Threads and processes

Thread processing is supported via a library class OclProcess. UML classes with the \({\ll }active{\gg }\) stereotype correspond to Java Runnable and Python callable classes. OclProcess also supports the creation of OS processes and querying the environment. Currently, timed processes and process groups are not supported, but these are intended extensions.

4.8 Pointers

Pointer types and pointer arithmetic are not represented in standard OCL, however these are an essential part of C and C++ programming. Pointers are used in three distinct ways: (i) to pass a reference of a data item to a function/operation instead of the item itself; (ii) to support navigation/iteration over a data structure by means of pointer increment/decrement; (iii) to use a function pointer.

To provide a semantic representation for case (i), a parameterised reference type Ref(T) is introduced into OCL, with associated operators ?v to obtain a reference to a variable, and !p to dereference a reference. To represent case (ii), the navigation pointer p to data structure d is expressed as an OclIterator for d, with navigation and dereference translated into iterator operations. Case (iii) is covered by the definition of function types Function(S, T), enabling the use of functions as values.

4.9 Generic classes and operations

Generics are a central facility in some languages (C++, Swift, modern versions of Java and C#). OCL provides generic collection types Set(T), Sequence(T), etc., but there is no means to declare generic operations or generic classes. We add a facility to define type parameters of operations, using the notation

figure h

where the typePars are a comma-separated list of identifiers.

Likewise, generic classes can be written as:

figure i

The generation of code for such generic elements in target languages without generics can be achieved either by type erasure (as for Python) or by generating concrete instantiations \(op\_T1\), \(op\_T2\), etc., for each actual usage \(op{<}T1{>}\), \(op{<}T2{>}\) of the element that occurs in a particular application (as in C++). As discussed in Sect. 4.1, it cannot be assumed that a programming language generic class construction \(C{<}T{>}\) is monotonic in T with respect to subtyping.

5 Program abstraction

This section gives the detailed steps of the program abstraction process, and highlights the significant issues which need to be addressed in this process.

5.1 Abstraction of types

Table 3 shows the general abstraction strategy used to map program data types to the extended UML/OCL types of the semantic model.

Table 3 Mapping of program types to UML/OCL types

In general, the abstraction process conflates different programming language types into single OCL types. For example, OCL treats characters as length-1 strings. We abstract all Java, C, VB6, Python, COBOL ‘85 and JavaScript sequential data structures such as arrays, linked lists and array lists into OCL sequences, and all map-like structures such as Java Hashtable and TreeMap into OCL maps. Java Enumeration, Iterator, ResultSet, StringTokenizer and ListIterator are all abstracted to OclIterator, and all varieties of files and IO streams are abstracted to OclFile (Sect. 4.6). Sorted sets, sequences and maps are simulated by the unsorted versions provided by OCL.

5.2 Abstraction of expressions

Programming language expressions are abstracted to OCL expressions. Numeric, boolean and string literal values map to corresponding values in OCL, with minor syntactic differences. Numeric operators also map to corresponding numeric operators in OCL, with integer division (\(\backslash \) in VB6, // in Python 3) being mapped to the div operator of OCL, and modulus operators, such as \(\%\) in C-based languages, being mapped to OCL mod. For boolean operators, these are usually defined in a ‘short-circuit’ manner in programming languages, with left-to-right evaluation order, in contrast to the symmetric evaluation semantics of the OCL and, or operators (OMG 2014). Thus for Java we have the abstraction mappings:

figure j

For the main collection types of lists, sets and maps, program operators can be abstracted to corresponding OCL operators (e.g., Java’s list add operation abstracts to OCL \({{\rightarrow }including}\)), but with the proviso that side-effecting operators need to be translated both to a query form (as an expression) and update form (as a statement).

In many cases there are immediate translations from program expression constructs to corresponding OCL expressions, for example, the mapping from Java conditional expressions and type tests to OCL can be directly defined by \({\mathcal {CSTL}}\) rules:

figure k

(Note that Java’s instanceof can only be used where the second argument is a non-generic class or interface type).

However in some cases it is necessary to interpret the source language elements using more elaborate semantic representations, if there is no exact equivalent of the source element in OCL. For example, the headSet Java operation on sorted sets has the following semantic representation as a query on the OCL unsorted set representing the sorted set:

figure l

Similarly, the Collections.replaceAll(col, x, y) operation of Java has the following semantic representation as an update:

figure m

5.3 Abstraction of statements

The main forms of programming language statements translate directly into the procedural OCL statement notation used in AgileUML (Tables 1, 2). For example, for Java we have:

figure n

Recall that the notation _i`f means that the ruleset with name f should be applied to \(\_i\) (Sect. 2.2). In the final case above, program expressions used as statements should be interpreted by the updateForm of a program expression, which is a procedural OCL statement, rather than by the usual expression interpretation (the query form) as a declarative OCL expression. Because declarative OCL expressions are side-effect free, program expressions with side-effects are translated into a combination of the query and update forms of the expression. For example, the Java statement:

figure o

is translated to

figure p

There may be both pre side-effects and post side-effects, respectively before and after the evaluation of the query form of the expression. In general, the abstraction stat`updateForm of an expression statement stat will be a composition

figure q

where \(stat'\) is a translation using query forms of subexpressions of stat.

For example, ++i has pre update form \(i ~{:=}~ i+1\) and query form i, whereas i++ has post update form \(i ~{:=}~ i+1\) and query form i. The query form of Collections.replaceAll is given by:

figure r
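The split of ++i and i++ into pre update, query and post update forms can be sketched in Python as follows. This is a toy illustration restricted to increment expressions on a single variable; the functions `forms` and `abstract_assignment` are hypothetical and do not reproduce the \({\mathcal {CSTL}}\) rules of the abstraction scripts. The generated statements are returned as a list, in execution order, using the `:=` assignment notation of the text.

```python
def forms(expr):
    """Return (pre updates, query form, post updates) for '++v', 'v++' or 'v'."""
    if expr.startswith("++"):
        v = expr[2:]
        return [f"{v} := {v} + 1"], v, []
    if expr.endswith("++"):
        v = expr[:-2]
        return [], v, [f"{v} := {v} + 1"]
    return [], expr, []


def abstract_assignment(target, expr):
    """Abstract 'target = expr;' as pre updates, then the assignment
    using the query form, then post updates."""
    pre, query, post = forms(expr)
    return pre + [f"{target} := {query}"] + post
```

For instance, `abstract_assignment("y", "++i")` yields the increment followed by the assignment, whereas `abstract_assignment("y", "i++")` yields the assignment followed by the increment.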

Abstraction of switch and for loops is complex due to the syntactic complexity and variant forms possible for these kinds of program statements. In particular, we abstract switch statements to conditionals, and map Java extended for statements

figure s

to corresponding procedural OCL statements

figure t

5.4 Abstraction of features and classes

Application-specific classes defined in a program are mapped to corresponding classes in the UML/OCL representation, with their data features and methods mapped to attributes and operations in UML. C structs, Pascal records and JavaScript constructor functions are also represented as classes. C structs and Pascal records are value types; this is semantically represented by a stereotype \({\ll }struct{\gg }\) of the abstracted classes. Nested classes in Java are represented as classes associated to their container class. VB, COBOL, Python, Pascal, C and JavaScript also define top-level module/program data and code. The translations of these elements are placed in a class that represents the complete program.

As an example of application abstraction, Figs. 7 and 8 show the code and abstraction for part of the options pricing application of Fig. 3.

Fig. 7 Example VBA case: option pricing routines source code

Fig. 8 Example VBA case: code abstraction

The abstraction rules preserve the structure of the source program in the abstracted representation, facilitating traceability. Recursive and generic operations in the code remain as recursively-defined and generic operations in the UML/OCL abstraction. Tail recursion may be replaced by iteration via a refactoring transformation at the semantic model level (Lano et al. 2023).

Overloaded operations are translated to overloaded operations in UML/OCL, however it should be taken into account that some target languages, such as C and Python, do not support operation overloading, and hence forward engineering to these languages will not be fully supported. Optional operation parameters are translated as parameters which may have null values. Default initial parameter values p = val are interpreted as conditional assignments at the operation start:

figure u

Vararg parameters/arguments are represented as sequence-valued parameters/arguments.
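The treatment of optional, default-valued and vararg parameters described above can be sketched in Python. The operation `scale`, its parameter names and the default value 2 are invented for illustration; `None` plays the role of the OCL null value.

```python
def scale(x, factor, offsets):
    """Sketch of an abstracted operation.

    factor  : optional parameter, so callers may pass None (null)
    offsets : a vararg parameter, represented as one sequence-valued parameter
    """
    # The default 'factor = 2' becomes a conditional assignment
    # at the start of the operation.
    if factor is None:
        factor = 2
    return x * factor + sum(offsets)
```

A source call that omits the optional argument and passes no varargs then corresponds to `scale(3, None, [])`, while a call with all arguments supplied corresponds to, for example, `scale(3, 4, [1, 2])`.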

5.5 Library components and facilities

Programming languages may have extensive language libraries for data structures, algorithms and file processing, and in addition they may provide auxiliary language features such as multithreading facilities. Because programs in the languages can make essential use of the libraries or extended features, the semantics of such libraries and features must also be abstracted. Table 4 summarises the translation of Java library classes to the semantic model.

Table 4 Java library representations

5.6 Unstructured control flow

Unstructured control flow elements, such as goto statements in C, Pascal and VB, are abstracted to function calls, using the concept of a continuation (Reynolds 1993) and functional abstraction (Lano 1994). A new function is introduced for each code label; the semantics of the function represents all code from the label point onwards to the end of the enclosing operation (or to the next RETURN statement in the case of VB). continue and break to a label in Java and JavaScript are treated in the same manner.
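To make the continuation idea concrete, the sketch below shows one possible Python rendering of a small goto-based routine. The routine, the label names `loop` and `done`, and the choice to pass local state as function parameters are all assumptions for illustration; the source routine is shown in the comments.

```python
# Hypothetical C-like source:
#
#       i = 0; total = 0;
#   loop:
#       if (i >= n) goto done;
#       total += i; i++;
#       goto loop;
#   done:
#       return total;

def sum_below(n):
    # Code before the first label, then fall through into 'loop'.
    return loop(0, 0, n)


def loop(i, total, n):
    # Represents all code from label 'loop' to the end of the operation.
    if i >= n:
        return done(total)        # goto done
    total += i
    i += 1
    return loop(i, total, n)      # goto loop


def done(total):
    # Represents all code from label 'done' to the end of the operation.
    return total
```

Each goto becomes a call in tail position to the function of the target label. In Python this form is limited by the recursion depth, which is one reason why replacing tail recursion by iteration at the semantic model level (Sect. 5.4) is useful.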

5.7 Untranslated program elements

GUI facilities are not currently translated by our process, however these could (in principle) also be abstracted into a specification in terms of generic GUI concepts such as frames, forms, buttons, fields, etc. Forward engineering tools could then produce the required type of GUI from the GUI specification, such as a web-based or mobile app GUI.

Asynchronous operations and asynchronous operation invocation are not modelled. Currently, non-SQL databases such as HBase or Firebase are not modelled. These are intended as future extensions. Likewise, specialised financial analysis libraries are future extensions.

5.8 Forward engineering from the semantic model

MDE tools typically provide code generators to map UML/OCL specifications into executable code in a range of target languages. Common targets are Java, C++, C, C# and Python. AgileUML incorporates code generators for Java, C# and C++. These are designed to produce efficient and semantically-correct code from UML/OCL specifications. They have been extensively tested over 20 years of use. Additional AgileUML code generators targeted at C and Python are defined by OCL specifications (Lano et al. 2017). However, the recommended approach for developing new code generators is to use \({\mathcal {CSTL}}\), as described in Sect. 2.2. This enables a faster and simpler definition of code generators, compared to other approaches. Three generators of this form, for Swift 5, Java 8 and Go 15, are provided with AgileUML, and have been extended to process the OCL extensions for program translation.

For example, the following additional statement rules for the Swift 5 code generator deal with the translation of assert and error statements to Swift:

figure v

In the same manner, the Java 8 and Go code generators have been extended to process the enlarged OCL used to represent program semantics.

Additional OCL libraries are usually defined as external UML components, which means that they are used only for type-checking at the specification level, and code is not generated for them by AgileUML. Either language-specific implementations of the component need to be provided for each target language, or the types and features of the component should be entirely translated into target language types and features.

6 Semantic preservation

Semantic preservation means that the semantics of program types, and evaluation semantics of source program expressions are correctly simulated by their abstractions in OCL, and likewise that the execution semantics of program statements is preserved by their abstractions.

The semantics of OCL types is given by their extension as a set of values (which may be specific to a given model and may change over time), and by the denotation of the operators on the type (OMG 2014; Richters and Gogolla 1998):

$$\begin{aligned} {[[~]]_{OCL}: OclTypes \times State \rightarrow \mathbb {P}(Value)} \end{aligned}$$

\([[T]]_{OCL}(s)\) gives the extension of the type \(T \in OclTypes\) in a given state/environment \(s \in State\), where \(State = Identifier \rightarrow Value\), and Value is a domain of values including numeric, string, boolean, collection and map values (OMG 2014; Richters and Gogolla 1998). For example, the Integer type has the constant extension \(\mathbb {Z}\), whilst a model-specific class type has state-dependent extension given by the set of object instances of the class which exist in s: s(Tname) where Tname is the name of T.

The evaluation semantics of OCL expressions can be defined by an interpretation function

$$\begin{aligned} {[[~]]_{OCL}: OclExpr \times State \rightarrow Value} \end{aligned}$$

giving the value of the expression in a given environment (OMG 2014; Richters and Gogolla 1998). This interpretation can be extended to a statement interpretation, whereby statements are semantically represented by pre/post relations between states:

$$\begin{aligned} {[[ ~]]_{OCL}: OclStat \rightarrow (State \leftrightarrow State)} \end{aligned}$$

Similar definitions can be given for programming languages, however in contrast to OCL, program behaviour may be defined by side-effecting expressions, and this aspect needs to be correctly represented via abstraction mappings.

A formal abstraction mapping \(\alpha \) consists of type, expression and statement mappings:

$$\begin{aligned} \alpha _T&: ProgTypes \rightarrow OclTypes \\ \alpha _E&: ProgExpr \rightarrow OclExpr \\ \alpha _S&: ProgStat \rightarrow OclStat \end{aligned}$$

\(\alpha _E\) includes the mapping of literal values from the source programming language.

Such mappings can be derived from \({\mathcal {CSTL}}\) specifications, for example

$$\begin{aligned} \alpha _T(t) ~=~ cstl(abst)(t^{*}) \end{aligned}$$

for each source language type t, where abst is a type-abstraction script for the source language, and \(t^{*}\) is the AST of t according to the grammar used by abst.

Semantic preservation of program types \(Tp \in ProgTypes\) means that the OCL extension of the abstraction \(\alpha _T(Tp)\) of Tp satisfies the same membership properties as the program semantics extension of Tp, relative to the translation \(\alpha _E\) of values:

$$\begin{aligned} x \in [[Tp]]_{Prog}(s) ~\implies ~ \alpha _E(x) \in [[\alpha _T(Tp)]]_{OCL}(\alpha (s)) \end{aligned}$$

The abstraction \(\alpha (s)\) of a state \(s: Identifier \rightarrow Value\) is defined as that \(s'\) with \(\textrm{dom}(s') = \textrm{dom}(s)\) and \(s'(id) = \alpha _E(s(id))\) for each \(id \in \textrm{dom}(s)\).

In addition, the abstracted interpretation of operators on elements of the type should preserve the type operator semantics. Semantic preservation of expression evaluation means that if \([[ e ]]_{Prog}(s) = v\) at the program level, then \([[ \alpha _E(e) ]]_{OCL}(\alpha (s)) = \alpha _E(v)\) at the abstraction level. Semantic preservation of statement semantics means that if \((s,s') \in [[ stat ]]_{Prog}\) at the program level, then

$$\begin{aligned} (\alpha (s),\alpha (s')) \in [[ \alpha _S(stat) ]]_{OCL} \end{aligned}$$

at the abstraction level.

6.1 Semantic preservation of types

The int type in our semantic model represents 32-bit integers, and long represents 64-bit integers. The Integer type of OCL is used to represent unbounded integer types such as Java’s BigInteger. For real-number computations the IEEE 754 64-bit floating-point standard is assumed to be satisfied by the source platform/language. This is mandated by the language specifications in the case of VB6 Double, JavaScript Number and Java double types, and is a common representation of the C double type (and hence also of the Python float type). The OCL double type represents this range in our semantic model. We do not currently represent other floating-point sizes. Unbounded arbitrary-precision real numbers are represented by the OCL Real type.

With the above choices, it is straightforward to show semantic preservation of numeric types. Boolean types of programming languages are mapped to the OCL boolean type, and program string types to OCL String. These mappings are semantics-preserving. One area of complexity is that programming languages may use a char type to represent individual characters, and this can be equivalent to a small subtype of int. In contrast, OCL represents individual characters as strings of length 1. To address this issue we use the \({{\rightarrow }char2byte()}\) and \({{\rightarrow }byte2char()}\) operators to convert between String and int representations of a character, depending upon the context of use of the character in the source program. The char numeric type is mapped to int.

The abstraction mappings represent program arrays arr (which usually have a fixed length) by OCL sequences (with variable length). This is semantically valid if the source program only uses valid array operations on arr: assignment and query of individual arr elements, array comparison by (reference) equality, and initialisation of arr. Each of these can be simulated by sequence operations which preserve the source semantics (Table 5). Note that program arrays may be 0-indexed (in Java, C and Python), whilst OCL sequences are 1-indexed. Hence a conversion of indexing operations is necessary.

Table 5 Semantic preservation of Java and C array operations

As discussed in Sect. 4, program array, collection and map types are usually reference types, whereas OCL collections and maps are value types. To ensure semantic preservation of collections/maps, variables of these types should not be aliased, and operation parameters of these types should be read-only within the operations. Alternatively, program collection types T could be mapped to \(Ref(T')\) where \(T'\) is the simple value-based representation of T.

6.2 Semantic preservation of expressions

Semantic preservation of numeric, boolean and String expressions can only be asserted subject to a number of conditions and restrictions. In particular numeric computations should remain within their defined types in the source code, so that there are no numeric overflows, underflows or exceptions due to division by zero. The restrictions are similar to those required for program semantic analysis (Barnes 1997).

For example, in a C program environment s resulting from the declaration

figure w

the denotation of x in s in C semantics is a pointer ptr0 to the first character ‘T’ of the C string (a sequence of 5 characters terminated by ‘\(\backslash {0}\)’). The program evaluation semantics \([[ x+1 ]]_C(s)\) of expression \(x+1\) is a pointer ptr1 to the ‘e’ character of the string. The \(\alpha _S\) abstraction defined by the \({\mathcal {CSTL}}\) abstraction scripts for C interprets the declaration as

figure x

and hence the abstracted state \(\alpha (s)\) maps identifier x to the OCL string “Text”. \(\alpha _E(x+1)\) is \({x.subrange(2,x{\rightarrow }size())}\), i.e., the tail of the string, and this has the same evaluation semantics in OCL as the abstraction of ptr1. Thus \([[ \alpha _E(e) ]]_{OCL}(\alpha (s)) = \alpha _E(v)\) in this case. Semantic preservation fails in this situation if an attempt is made to navigate beyond the end of the string x.

Likewise, for program arrays arr and sequential collection values col, accesses arr[i], col.get(i), etc., should be valid in the source program. These accesses are then correctly interpreted by sequence accesses \({arr{\rightarrow }at(i'+1)}\), \({col{\rightarrow }at(i'+1)}\), etc.
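The index shift between 0-indexed program arrays and 1-indexed OCL sequences, and the string pointer example above, can be checked with a small Python sketch. The helpers `at` and `subrange` are hypothetical stand-ins for the OCL operations \({{\rightarrow }at}\) and subrange, assumed here to be 1-indexed with subrange inclusive at both ends.

```python
def at(seq, i):
    """OCL-style 1-indexed access, standing in for seq->at(i)."""
    if i < 1 or i > len(seq):
        raise IndexError("index outside 1..size")
    return seq[i - 1]


def subrange(s, i, j):
    """OCL-style s.subrange(i, j): 1-indexed, inclusive at both ends."""
    return s[i - 1:j]


# A 0-indexed source access arr[i] is abstracted as arr->at(i + 1).
arr = [10, 20, 30]
shifted = [at(arr, i + 1) for i in range(len(arr))]

# The C pointer expression x + 1 on "Text" is abstracted as
# x.subrange(2, x->size()), i.e. the tail of the string.
x = "Text"
tail = subrange(x, 2, len(x))
```

Here `shifted` equals `arr` and `tail` is the string starting at the ‘e’ character, in line with the semantics described for ptr1. An access outside the valid range raises an error in the sketch, which corresponds to the case where semantic preservation fails.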

6.3 Semantic preservation of statements

The mapping of assignments, conditionals and simple while loops is direct and correct by construction. In the case of statements which execute expression side-effects, the additional statement representing the side-effect needs to be recursively extracted from the expression. There may be multiple side-effects from a given expression, and these need to be combined in the correct order.

For example, incrementing a string pointer x by the C expression statement y = x++; corresponds to

figure y

at the abstraction level.

To ensure semantic preservation by the general statement abstraction mapping, only structured code should be used, without branching to labelled statements. There should be no aliasing except for variables of user-defined class types (or of other types which are represented by classes in UML/OCL). Transformation of do and for loops to while loops does not preserve source semantics if there are continue statements in the loop bodies, likewise for the transformation of switch statements to conditional statements. Table 6 summarises the semantic preservation conditions for the Java abstraction mapping.

Table 6 Semantic preservation conditions for abstraction of Java 6/7 to UML/OCL

7 Defining a language abstraction

A systematic process is carried out to define an abstraction mapping for a new source language \({\mathcal {L}_1}\):

  1. Obtain a grammar definition for \({\mathcal {L}_1}\), e.g., from the ANTLR grammar repository.

  2. Identify appropriate sources for the semantic definition of the language, such as the official language reference documents.

  3. Define a conceptual mapping of the language elements to UML/OCL, identifying the appropriate representations of types, expressions, statements, operations and modules.

  4. Evaluate if this mapping is consistent and (in principle) preserves source semantics.

  5. Systematically go through the language grammar: for each grammar non-terminal NT, define a corresponding ruleset NT:: in the \(\mathcal {CSTL}\) abstraction script, whose rules \(r_c\) handle each production/case c of the grammar definition of NT. The LHS of \(r_c\) expresses the format of case c in \(\mathcal {CSTL}\) notation. The RHS expresses the intended UML/OCL abstraction of the LHS.

  6. Abstraction processing may involve multiple iterations through the ASTs, for example (i) to obtain global data and operations for a class representing the main application; (ii) to obtain the detailed operation definitions. These separate phases of analysis can be defined in separate \(\mathcal {CSTL}\) scripts.

For example, for VB6, a grammar VisualBasic6.g4 exists in the ANTLR grammar repository, and Microsoft Com (2022) provides a semantics definition for the VBA dialect of VB6 (our principal interest in VB is to re-engineer financial system codes from Excel-based applications that use VBA). We can identify from the documentation that the VB6 Long type corresponds to OCL int, and VB6 Double to OCL double. We can identify that numeric comparator expressions should translate directly to corresponding expressions in OCL. The operator & of VB6 corresponds to string concatenation \(+\) in OCL. The specialised operator LIKE of VB6 corresponds to the exact regular-expression matching operator \({{\rightarrow }isMatch}\) of OCL (Lano 2021), whilst IS corresponds to reference equality \({<>=}\).
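The conceptual mapping identified in step 3 for these VB6 elements can be written down as a small lookup, sketched here in Python rather than \(\mathcal {CSTL}\). The function `abstract_binary` and the table name are hypothetical; only the type and operator correspondences stated above are encoded, the comparators listed are assumed to pass through unchanged, and `->` is used as the plain-text form of the OCL arrow.

```python
VB6_TYPE_TO_OCL = {"Long": "int", "Double": "double"}

# Assumption: these comparators are written identically in VB6 and OCL.
DIRECT_COMPARATORS = {"<", "<=", ">", ">=", "="}


def abstract_binary(left, op, right):
    """Map a VB6 binary expression (already-abstracted operands) to OCL text."""
    op = op.upper()
    if op == "&":                       # string concatenation
        return f"{left} + {right}"
    if op == "LIKE":                    # regular-expression matching
        return f"{left}->isMatch({right})"
    if op == "IS":                      # reference equality
        return f"{left} <>= {right}"
    if op in DIRECT_COMPARATORS:
        return f"{left} {op} {right}"
    raise ValueError(f"no abstraction defined for operator {op!r}")
```

Raising an error for unmapped operators reflects the point that every grammar case needs an explicit decision, either a rule or an exclusion from the scope of the abstraction.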

The VB6 grammar definition for the valueStmt non-terminal includes the cases:

figure z

Thus the corresponding abstraction ruleset valueStmt:: must have rules for each of the 9 binary operators of these cases, which express the identified semantic mapping:

figure aa

In some cases the representation may require calls to one or more of the OCL extension libraries, such as:

figure ab

which has the abstraction rules:

figure ac

In some cases there may not be any OCL language element or existing library component/operation which serves to provide the required semantics. In this case the reverse-engineer must decide either to explicitly exclude the \({\mathcal {L}}_1\) language element from the scope of their abstraction (e.g., because the element is relatively obscure and rarely-used) or identify a suitable OCL language/library extension to represent it. Library extensions are preferred to language extensions, in order to reduce the impact of the extension. Where possible, extensions should be justified by reference to a need identified for multiple source languages, not only a specific \({\mathcal {L}}_1\). The impact of the extension on semantic preservation should be identified, together with any restrictions necessary for semantic preservation.

For example, the VB6 LOAD Lib statement loads an executable object at runtime (Microsoft Com 2022). There is no existing OCL library facility for this, however similar mechanisms exist in other languages, such as Class.forName(Lib) in Java. Thus there is some justification for adding a new library operation

figure ad

Appropriate library components for this operation could be OclType or OclProcess. Implementations of the new operation then need to be added to all target language implementations of OclType or OclProcess.

The above process can be partly automated: from an ANTLR grammar L1.g an outline \(\mathcal {CSTL}\) script L1.cstl can be derived using a \(\mathcal {CSTL}\) script antlr2cstl based on the ANTLR grammar for ANTLR4. However, definition of the detailed abstraction rules requires expertise in the source language and in UML/OCL.

8 Evaluation and comparison

In this section we evaluate our approach with respect to the research questions RQ1, RQ2, RQ3 of Sect. 1. We evaluate example abstraction translations from Java (versions 6 and 7), JavaScript, VB6, Python versions 2 and 3, ISO Pascal, COBOL ‘85 and ANSI C to UML/OCL. All artefacts used in this evaluation are provided in the dataset zenodo.org/records/10540845.

The README file describes how to replicate our results using these artefacts.

Table 6 summarises the abstraction mapping from Java 6/7 language elements to UML/OCL. Table 7 summarises the abstraction mapping from ANSI C language elements to UML/OCL. The actual representation of numeric types will depend upon the C environment being considered: C long may correspond to OCL int on 32-bit platforms, for example.

Table 7 Abstraction mapping from C to UML/OCL

Table 8 shows the abstraction of JavaScript to UML/OCL, Table 9 the abstraction of VB6, and Table 10 the abstraction of COBOL ‘85.

Table 8 Abstraction mapping from JavaScript to UML/OCL
Table 9 Abstraction mapping from VB6 to UML/OCL
Table 10 Abstraction mapping from COBOL ‘85 to UML/OCL

Table 11 shows the abstraction mapping for Python versions 2 and 3, and Table 12 the abstraction mapping for ISO Pascal.

Table 11 Abstraction mapping from Python 2/3 to UML/OCL
Table 12 Abstraction mapping from Pascal to UML/OCL

8.1 RQ1: Assurance of semantic correctness

We answer this question by evaluating the accuracy of abstractions and translations for a variety of source and target languages, on large datasets of cases, including external examples from established translation benchmarks.

To evaluate the correctness of translations, we use the computational accuracy measure of Lachaux et al. (2020). This measure is the percentage of tests from a test set for each case which give the same results for both the source and target versions of a program. We also use the runtime equivalence measure of Jana et al. (2023): this is the percentage of cases for which all tests give the same results on source and target versions of a program. If all cases have the same number of tests, then runtime equivalence is a stronger requirement than computational accuracy:

$$\begin{aligned} runtime ~equivalence ~\le ~ computational~ accuracy \end{aligned}$$
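The two measures can be computed as in the following Python sketch. The input format (one list of pass/fail booleans per case) and the pooling of tests across cases for computational accuracy are assumptions of this sketch; both functions assume at least one case and at least one test.

```python
def computational_accuracy(results):
    """Percentage of all tests that give the same result on source and target.

    results: list of cases, each a list of booleans (True = test agrees).
    """
    total = sum(len(case) for case in results)
    passed = sum(sum(case) for case in results)
    return 100.0 * passed / total


def runtime_equivalence(results):
    """Percentage of cases for which every test agrees."""
    fully_correct = sum(1 for case in results if all(case))
    return 100.0 * fully_correct / len(results)


# Four hypothetical cases with two tests each.
example = [[True, True], [True, False], [True, True], [False, False]]
ca = computational_accuracy(example)   # 5 of 8 tests agree
re = runtime_equivalence(example)      # 2 of 4 cases fully agree
```

With the same number of tests per case, each fully correct case contributes all of its tests to the numerator of the computational accuracy, which is why the inequality above holds.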

8.1.1 Java abstraction and translation

We apply our translations from Java 6/7 to Python, Swift, C#, Go, C, C++ and Java 8, via UML/OCL, to 100 Java cases, consisting of 61 examples of Java library facilities (from java.io, java.lang, java.math and java.util), 34 examples of Java language features, and 5 cases of complete Java applications, including three cases taken from a package of financial software (bond valuation: Bondapp; yield curve computation: NSapp; CDO risk evaluation: CDOapp). In addition, we apply the Java to UML/OCL abstraction mapping to extracts of large application cases from external organisations: the ANTLR AllInOne7.java test file for the ANTLR Java parser, the first 50 Java validation examples from Lachaux et al. (2020), and the ANTLR 4 source code. Finally, we also apply the Java to C# translation to 50 cases (10% of the Java validation cases) from the widely-used CodeXGLUE Java/C# translation dataset (Lu et al. 2021). Table 13 summarises the main Java translation cases and results.

The computational accuracy for the cases in each category is shown in the Translation accuracy columns of Table 13.

Table 13 Java 6/7 evaluation cases: computational accuracy

Our results can be compared with those of java2python and Transcoder for the Java to Python cases reported in Lachaux et al. (2020). Java2python only achieves an accuracy of 38.3% on the Java to Python examples of Lachaux et al. (2020), whilst Transcoder achieves 68.7% accuracy using generation of 25 candidate solutions for each input. Our results (on a similar number of tests: 463 tests are used in Lachaux et al. (2020)) are significantly better than these scores. Whilst Lachaux et al. (2020) only translate individual functions, our approach translates complete programs.

In terms of runtime equivalence, 80% of cases passed all tests. The original and translated versions of the financial applications returned identical numeric results for all tests. These results contrast with the 22% of completely correct Java-to-Python translations reported for Transcoder in Malyaya et al. (2023), and the 49.66% runtime equivalence accuracy achieved by CoTran (Jana et al. 2023).

Table 14 compares our results on 50 cases (10%) from the CodeXGLUE validation dataset (Lu et al. 2021) with the results of Liu et al. (2023) and Lu et al. (2021), obtained using large language models. The LLM results are evaluated using syntactic similarity to a ‘gold standard’ correct translation g. Note that when a semantically correct translation t is produced by our approach, this translation could be used as a valid gold standard for syntactic correctness, i.e., \(g = t\), and hence our semantic correctness value could also be regarded as a syntactic correctness figure. Our Java to C# semantic/syntactic accuracy of 86.3% on the dataset can be contrasted with the best result of 63.9% syntactic accuracy reported for CodeT5 in Liu et al. (2023). As noted in Liu et al. (2023), results for ML translation approaches may be overestimates due to duplication of data between the training and evaluation datasets (this is around 9.5% for CodeXGLUE). We also noted considerable similarity of examples within the CodeXGLUE validation dataset itself, which may also inflate accuracy results for CodeXGLUE (using any translation approach).

Table 14 Java to C# evaluation: CodeXGLUE cases

Table 20 compares our results for semantic accuracy to the baselines for ML and rule-based approaches (Transcoder and java2python).

For the large external Java cases, we measure the accuracy of abstraction, by manually comparing the semantics of the abstraction to the original version for each test case (Table 15).

Table 15 Java evaluation: external cases

8.1.2 Translation of other programming languages

Table 16 summarises the computational accuracy results for translation between all the considered programming languages, and Table 17 summarises the runtime equivalence results.

For C we translate to Swift, C# and Go. There are 70 C examples, divided into categories of statements (14 cases and 33 tests), declarations/types (36 cases and 53 tests) and libraries (20 cases and 61 tests). The examples and tests are taken from Kernighan and Ritchie (1988).

Table 16 Evaluation cases for all languages: computational accuracy
Table 17 Evaluation cases for all languages: runtime equivalence

For JavaScript there are 100 examples, divided into categories of language constructs (49 cases and 71 tests), data types (36 cases and 91 tests) and libraries (15 cases and 35 tests). The examples and tests were taken from the online JavaScript manuals (Mozilla 2023). Translation is to Python 3.

There are 100 VB6/VBA examples, 60 for language constructs [123 tests taken from Microsoft Com (2022)], 30 for built-in functions [74 tests taken from Microsoft Com (2022)] and 10 parts of an industrial case of 2000 LOC used for bond pricing using a genetic algorithm for yield curve fitting (Lano et al. 2023), with 51 tests designed in consultation with the case provider.

We examined 100 examples of COBOL code, including examples from the language manual (ClearPath Enterprise Servers 2015) and a textbook (Parkin 1982). In addition, we applied the abstraction and translation process to parts of the large industrial case of Lano and Malik (1999). The cases are divided into language constructs (25 cases and 58 tests), statements (65 cases and 138 tests), and functions (10 cases and 33 tests). The tests were taken from the same sources as the examples. Translation is to Java and C#.

There are 100 Python cases, divided into language (27 cases and 45 tests), statements (34 cases and 74 tests), data types (30 cases and 72 tests) and library cases (9 cases and 35 tests). These cases and tests have been taken from the online reference manual at python.org. Translation is to Java and C#. In addition, we evaluated the Python to Java translation on the first 40 cases of the AVATAR AtCoder dataset (Ahmad et al. 2023), consisting of complete programs. Table 18 summarises the results for computational accuracy, compared with the best result (for Transcoder-ST, Roziere et al. 2022) reported in Ahmad et al. (2023) for the complete AVATAR dataset. The runtime equivalence of our approach for these cases is 78%. Whilst there appears to be less duplication in AVATAR compared to CodeXGLUE, many of the Python programs in the AVATAR dataset use poor coding practices, such as using one variable to hold multiple types of data.

Table 18 Comparison with Ahmad et al. (2023): computational accuracy

There are 50 Pascal cases, taken from textbooks and the examples of freepascal.org, and these are translated to Java.

Table 19 summarises the size distributions of the evaluation cases for each language.

Table 19 Evaluation cases: size distributions

It can be seen from Tables 16 and 17 that translations involving a large semantic distance between the source and target (such as C to Swift, COBOL to Java or Python to Java) are generally less accurate than those between similar languages (such as Java to C#). In particular, translation from Python to Java is hindered by the lack of explicit types in Python, which leads to many semantic errors in the generated Java. This issue could be partly addressed by more powerful type inference at the UML/OCL level. Nevertheless, the overall level of accuracy is quite high, and for the industrial Java and VB cases all numerical computations were translated without error.

Table 20 compares our results to the baselines for Java to C++ and Java-Python translation reported in Lachaux et al. (2020), using java2python as a rule-based approach and Transcoder as a ML approach, with the same computational accuracy measure for each approach.

Table 20 Comparison with baseline approaches: computational accuracy

Table 21 compares our results to the results reported in Jana et al. (2023) for Java-Python translation, using Transcoder and CoTran, and the runtime-equivalence accuracy measure.

Table 21 Comparison with baseline approaches: runtime-equivalence accuracy

In terms of the efficiency of the translation process, the abstraction stage is comparable to code generation. Table 22 shows the time taken for this step, and for code generation, for the Java application cases. The time is computed as the average of 3 executions, on a Windows 10 quad-core laptop (Intel i5 2.8GHz processor).
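The measurement procedure (averaging over 3 executions) can be sketched as follows. The harness and the placeholder workload are assumptions for illustration; they are not the scripts used to produce Table 22, and a real measurement would call the abstraction or code generation tool in place of the placeholder.

```python
import time
from typing import Callable


def average_time(task: Callable[[], object], runs: int = 3) -> float:
    """Return the mean wall-clock time in seconds of `runs` executions."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


def placeholder_abstraction() -> int:
    # Stand-in workload (hypothetical); a real measurement would invoke
    # the abstraction tool on a source file here.
    return sum(i * i for i in range(10_000))
```

Calling `average_time(placeholder_abstraction)` returns the mean of three timings in seconds.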

Table 22 Performance on Java application cases

Figures 9 and 10 show how abstraction time varies with the size of input source code, for Java and VB6. The times are computed as for Table 22. In each case a linear time complexity is observed. These can be compared with the inference time of 5 min for CodeXGLUE translation examples given in Lu et al. (2021).

In summary, for RQ1 it can be concluded that the accuracy of the translation is high compared with other translation approaches, and that the computational efficiency is satisfactory.

Fig. 9 Abstraction time for Java cases

Fig. 10 Abstraction time for VB6 cases

8.2 RQ2: Completeness of source language coverage

Completeness of the approach can be measured in terms of the percentage of the Java, JavaScript, Python, VB6, COBOL ‘85 and C grammar rules, including rule variants, which have corresponding abstraction rules in the reverse-engineering scripts for these languages.

Table 23 shows the percentages of ANTLR Java grammar rules/rule cases which have corresponding abstraction rules, for each of the main syntactic divisions of Java. All of the 48 arithmetic, relational, shift, assignment and other expression operators of Java 7 are abstracted. Each abstraction rule is exercised by at least one of the evaluation cases of Tables 13 or 15. For C, 138 of the 153 grammar rules/cases of Kernighan and Ritchie (1988) are covered by the abstraction script (90%), and 158 of 179 library operations (88%). For JavaScript, 258 of 324 grammar rules/rule options are covered (80%), and 39 of 53 library components (74%). For VB6, 197 of 229 statement kinds, built-in operators, functions or types from Microsoft Com (2022) are covered (86%). For COBOL ‘85, all statement kinds which occur both in the ANTLR Cobol85.g4 grammar and in ClearPath Enterprise Servers (2015) are covered; this is 38 of the 49 statement kinds in the ANTLR grammar (78%).
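As a worked check, the coverage percentages quoted above follow from the covered/total counts when rounded to the nearest whole number. The helper name and the dictionary labels below are illustrative assumptions; the counts themselves are those given in the text.

```python
def coverage_percent(covered: int, total: int) -> int:
    """Coverage as a percentage rounded to the nearest whole number."""
    return round(100 * covered / total)


# (covered, total) pairs quoted in the text above.
counts = {
    "C grammar rules": (138, 153),
    "C library operations": (158, 179),
    "JavaScript grammar rules": (258, 324),
    "JavaScript library components": (39, 53),
    "VB6 constructs": (197, 229),
    "COBOL 85 statement kinds": (38, 49),
}

coverage = {name: coverage_percent(c, t) for name, (c, t) in counts.items()}
```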

Table 23 Grammar rule coverage: Java

Table 24 summarises the grammar coverage for the considered source languages.

Table 24 Grammar coverage for all languages

Overall we can conclude for RQ2 that the completeness of the approach is satisfactory, and that it is possible in principle to achieve 100% language coverage.

8.3 RQ3: Flexibility and customisation of the approach

We can evaluate flexibility by quantifying the effort required to modify or extend the \({\mathcal {CSTL}}\) scripts and other artefacts used in our program translation process.

For the abstraction step, the critical artefacts are the abstraction \({\mathcal {CSTL}}\) scripts. The Java, JavaScript, VB6, Python, Pascal, COBOL ‘85 and C abstraction scripts are comparable in size and effort to the forward engineering code generation scripts for Java, Go and Swift (Table 25). There are 2427 LOC in the three code generation scripts, with an average production rate of 285.5 LOC per person month. There are 15533 LOC in the seven abstraction scripts, with an average production rate of 621 LOC/person month.

The writer/editor of a \({\mathcal {CSTL}}\) script needs to understand the grammar structure and categories of the source language, and the syntax of the target language, but does not need to know either the source or target language metamodels. Specific MDE expertise is not required. The scripts do not require compilation, instead they can be immediately re-executed after a modification, using the cgtl command line tool.

Table 25 \({\mathcal {CSTL}}\) script size and development effort

These figures indicate that the typical development and modification cost for a \({\mathcal {CSTL}}\) script is of the order of 536 LOC/person month, or 24 LOC/person day. Since \({\mathcal {CSTL}}\) code is approximately 3 times as concise as equivalent Java/3GL code, this represents a high level of productivity.
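The overall figure can be reproduced from the per-group sizes and production rates quoted above, assuming it is the total LOC divided by the total effort in person months. The conversion to a daily rate below assumes 22 working days per month, which is our assumption rather than a figure stated in the text.

```python
# Sizes (LOC) and production rates (LOC per person month) quoted in the text.
codegen_loc, codegen_rate = 2427, 285.5
abstraction_loc, abstraction_rate = 15533, 621.0

# Effort implied by each size/rate pair, in person months.
codegen_months = codegen_loc / codegen_rate              # about 8.5
abstraction_months = abstraction_loc / abstraction_rate  # about 25.0

total_loc = codegen_loc + abstraction_loc
overall_rate = total_loc / (codegen_months + abstraction_months)

# Assumption: 22 working days per person month.
WORKING_DAYS_PER_MONTH = 22
daily_rate = overall_rate / WORKING_DAYS_PER_MONTH
```

Under these assumptions `overall_rate` rounds to 536 and `daily_rate` rounds to 24, matching the figures in the text.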

Abstraction scripts may require more effort to develop, compared to code generation scripts, because programming languages typically have larger and more complex grammars than UML or OCL, and utilise extensive libraries. However the effort remains practicable, based on our experience with Java, JavaScript, Python, Pascal, VB6, COBOL ‘85 and C. We have found that use of \({\mathcal {CSTL}}\) substantially accelerates the creation and maintenance of abstraction scripts compared to the coding of abstraction mappings in Java. For Java abstraction there are both \({\mathcal {CSTL}}\) and Java-coded abstractors, developed in parallel. A change (such as adding a rule to handle an additional grammar production) that may take 5 min in the case of the \({\mathcal {CSTL}}\) script Java2UML.cstl could demand an hour or more of coding and debugging time for the corresponding Java-coded abstractor. The reason is that explicit processing of the input AST structures and output UML/OCL structures involves complex searching, navigation and feature access/update, whereas a \({\mathcal {CSTL}}\) script uses pattern matching and substitution to abstract from the details of AST and output language representations.
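The contrast between rule-based pattern matching and hand-coded tree processing can be illustrated with a small sketch. This is not \({\mathcal {CSTL}}\) syntax or its implementation: the rule table, the tuple encoding of parse trees and the function name are hypothetical, chosen only to show how adding one rule is a one-line change when matching and substitution are handled generically.

```python
import re
from typing import List, Tuple, Union

Tree = Union[str, tuple]  # leaf token, or (tag, child, child, ...)

# Hypothetical rule table: (node tag, pattern over children, output template).
# "_1", "_2" are metavariables bound to child subtrees; other entries must
# match leaf tokens exactly.
RULES: List[Tuple[str, List[str], str]] = [
    ("expr", ["_1", "+", "_2"], "_1 + _2"),
    ("expr", ["_1", "==", "_2"], "_1 = _2"),   # Java equality to OCL equality
    ("expr", ["!", "_1"], "not (_1)"),
    ("expr", ["_1"], "_1"),
]

METAVAR = re.compile(r"_\d+")


def rewrite(tree: Tree) -> str:
    """Translate a parse tree using the first rule whose pattern matches."""
    if isinstance(tree, str):
        return tree
    tag, children = tree[0], tree[1:]
    for rule_tag, pattern, template in RULES:
        if rule_tag != tag or len(pattern) != len(children):
            continue
        bindings = {}
        for item, child in zip(pattern, children):
            if METAVAR.fullmatch(item):
                bindings[item] = child      # bind subtree to metavariable
            elif item != child:
                break                       # literal token mismatch
        else:
            # Substitute each metavariable by its rewritten subtree.
            return METAVAR.sub(lambda m: rewrite(bindings[m.group(0)]), template)
    raise ValueError(f"no rule for {tree!r}")


# Toy parse tree for the Java expression: a == b + c
tree = ("expr", ("expr", "a"), "==",
        ("expr", ("expr", "b"), "+", ("expr", "c")))
```

Here `rewrite(tree)` yields `a = b + c`. Handling a further operator only requires one more entry in `RULES`, whereas a hand-coded abstractor would need new traversal and construction code for the same change.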

In the case of large and complex grammars such as the COBOL grammar, the antlr2cstl command line tool can be used to partly automate the construction of the abstraction script based on the ANTLR grammar definition. This tool also helps to ensure completeness of the abstraction script, because a template \({\mathcal {CSTL}}\) rule is generated for each valid grammar rule production. This technique was used extensively to create COBOL2UML.cstl. For such grammars it may require several person years to define an abstraction mapping in a conventional programming or model transformation language, and we did not attempt to do so.

New OCL libraries have been developed to represent program semantics (Table 26). These are UML/OCL text files of moderate size and required only 32 person days of effort in total to construct and test (with productivity of 96 LOC/day). Adding a new library is therefore a relatively low-cost activity, but the choice of representation has to be considered carefully, since the library should (ideally) be of general use for representing the particular software aspect, rather than being specific to a particular translation task.

Table 26 OCL library size and development effort

Overall it can be concluded for RQ3 that the level of effort needed to adapt the process to use an alternative translation policy is quite moderate, for \({\mathcal {CSTL}}\)-based tools. Software practitioners do not need specialised MDE expertise to understand or modify \({\mathcal {CSTL}}\) files; an understanding of software language grammars and parse tree structures is sufficient.

9 Related work

Our work is located in the field of model-driven reverse and re-engineering (MDRE), and also relates to program translation in general, which may involve machine learning or the coding of explicit language-to-language translation rules, instead of MDE techniques. We compare our approach to these research areas in the following subsections.

9.1 Model-driven reverse and re-engineering (MDRE)

There has been substantial research and application of MDE over the last 20 years; however, the focus of most MDE work has been predominantly on forward engineering, with relatively few approaches addressing reverse-engineering or re-engineering, such as Bruneliere et al. (2014); Fleurey et al. (2007); Fuhr et al. (2013); Izquierdo and Molina (2014); Krasteva et al. (2013); Perez (2003); Sabir et al. (2019); Sen and Mall (2016).

To fully examine related work in MDRE we conducted a systematic literature review (SLR) of published research in this field, over the period 2000–2023. We followed the SLR procedure of Kitchenham and Charters (2007). We defined search strings and applied these to five leading publication databases (Scopus, Xplore, WoS, ACM digital library, and Google Scholar), obtaining a total of 938 candidate papers. These were then screened for quality and relevance, resulting in 73 selected primary studies, which defined 55 distinct MDRE approaches.

In these 55 approaches, various MDE tools were used (Fig. 11), of which ATL was the most popular (9 approaches). The only specific MDRE tool used by multiple approaches (6) was MoDisco (Bruneliere et al. 2014).

Fig. 11 MDE tools/technologies used in MDRE approaches

In terms of the source and target programming languages supported by MDRE work, we found a strong bias towards Java, which has also been the case for the MDE field in general. 21 approaches operated upon Java as a source language, and 7 applied to versions of C, whilst only 4 handled COBOL. VB and Python were only treated by one approach each. In this respect our approach provides support for a wider range of input languages compared to the MDRE state-of-the-art. Regarding target languages, 6 approaches targeted Java versions, whilst other languages such as Python and C# were each only addressed by a single approach.

The target representation for reverse-engineering approaches was primarily UML models (20 approaches), with class diagrams the most popular (12 approaches). Otherwise, the Knowledge Discovery Metamodel (KDM) (Deltombe et al. 2012; Perez-Castillo et al. 2011) was the target of 9 approaches. 19 approaches produced abstract syntax trees (ASTs), 9 XML and 4 XMI. The purpose of reverse-engineering was primarily program comprehension and documentation (17 approaches), with 14 approaches addressing system modernisation, 10 system migration, and 7 program translation (Bruneliere et al. 2014; Candel 2019; Claudia et al. 2011; Fuhr et al. 2013; Heidenreich et al. 2011; Lano 2022; Reus et al. 2006). Overall there are few MDRE approaches which extract fine-grained semantic representations from programs (and only two approaches extract OCL—Claudia et al. 2011 and Lano 2022), and there is a lack of approaches which support translation of multiple input languages to multiple target languages.

MoDisco is an established MDRE approach and tool for reverse engineering which uses metamodels to represent the structure and syntax of source language programs (Bruneliere et al. 2014). This provides a consistent framework for program analysis tools, but does not provide precise abstraction of source program semantics, which is necessary for accurate program translation. Likewise, the REMICS methodology is focussed on the structural representation of legacy applications for migration and modernisation (Krasteva et al. 2013).

In Fleurey et al. (2007), a customised metamodel is used to represent reverse-engineered business applications. This includes partial representation of program semantics. Izquierdo and Molina (2014) define a text-to-model transformation language, Gra2Mol, to perform program abstraction to models. This is an ATL-style language which is considerably more complex than \({\mathcal {CSTL}}\). Similarly, the GReTL MT language is used by Fuhr et al. (2013) to define customised global queries to extract specifications from programs. Reverse-engineering of Java programs to statemachines is used in Sen and Mall (2016) to analyse reactive control systems. They analyse Java bytecode rather than source code. Similarly, Eichberg et al. (2010) and Keschenau (2004) use model transformations to abstract and analyse bytecode. Reverse-engineering bytecode may have advantages in reducing the complexity of the abstraction process and the number of source constructs which need to be abstracted; however, it could lead to obscure specifications and hence we choose to work from source code in our process. Formal semantic approaches have also been used for the reverse-engineering of database schemas. For example, Perez (2003) uses term rewriting to abstract database schemas to class diagrams.

As a result of our survey we found that most work on MDRE has focussed on the creation of code visualisations such as class, sequence and activity diagrams, for example El Beggar et al. (2013); Keschenau (2004); Korshunova et al. (2006); Sabir et al. (2019); Siala and El-Etri (2007). These approaches are aimed at supporting program comprehension and do not produce the fine-grained semantic models of program behaviour that are necessary to support semantically-precise program translation. The mainstream MDRE work using KDM or similar metamodels (Bruneliere et al. 2014; Krasteva et al. 2013; Perez-Castillo et al. 2010, 2011) to represent software systems also does not model source program semantics in sufficient detail to support semantics-preserving program translation.

To conclude, MDRE has the advantage that it produces models from code, hence having the potential for supporting system evolution to multiple different target platforms and languages. By extracting models from code it also facilitates design recovery, and system comprehension, quality improvement and reuse. However MDRE has the disadvantages that:

  1. It uses complex technologies requiring specialised MDE skills and significant resources. Customisation or extension of a re-engineering pipeline typically needs high expertise in the metamodels and transformation languages involved.

  2. It involves multiple translation or transformation steps, each of which needs to be shown to preserve semantics in order to establish that a source-to-target translation is semantics preserving.

As a program translation technique, MDRE is therefore most appropriate for cases where system models need to be produced in addition to translated code, and where source and target languages/platforms are significantly different, and not for translations between similar languages, where the effort required in MDRE would not be cost-effective compared to a code-to-code translation.

9.2 Program translation using explicit rules

Apart from MDRE approaches, two other general strategies have been used for program translation: (i) via heuristic manually-created rules (De Marco et al. 2018; Java2Python 2023; Sneed 2011); (ii) via machine learning approaches inducing implicit translation rules from large code datasets involving the source and target languages (Ahmad et al. 2023; Aggarwal et al. 2015; Chen et al. 2018; Guo et al. 2021; Jana et al. 2023; Liu et al. 2023; Malyaya et al. 2023; Nguyen et al. 2013; Roziere et al. 2022; Szafraniec et al. 2023; Zhu et al. 2022).

The first strategy has the advantage that precise rules are defined, which are deterministic and could, in principle, be shown to preserve semantics. However the approach has the disadvantage that a large number of specific rules are necessary to translate from one programming language to another, and the effort must be repeated for each pair of languages under consideration. The size of some source languages such as COBOL or Java in terms of the number of specialised facilities and supporting libraries for the languages means that it is infeasible to manually code a complete set of translation rules for such languages. The commercial tools using this approach do not usually support inspection or customisation of their mapping rules.

Examples of open-source and commercial program conversion tools are (Java2C 2023; Java2ObjectiveC 2023; Java2Python 2023; Tangible Software 2023). The reliability of such tools can be quite poor (Lachaux et al. 2020), and end-user customisation of these tools may be impossible or require substantial effort. Such approaches appear most suitable for translation between closely-related languages, including between versions of the same language (such as Python versions 2 and 3).

Internal intermediate languages may be used to generalise the rule-based approach to support multiple source and target languages. Intermediate languages are an established means of factoring compilation or reverse-engineering processes, e.g., Lattner and Adve (2004); van Zuylen (1993). Approaches to employing an intermediate specification language for re-engineering have made use of the Z formal language (Bowen et al. 1993) or other mathematical formal methods (Liu et al. 1997). Z is a highly-abstract language, and forward engineering from Z is a complex process and not suitable for the production of general software. The same consideration applies to related formalisms such as algebraic specification languages and temporal logics. In our approach, the UML/OCL application abstraction is intended as a human-readable intermediate representation and system specification, to support application analysis and evolution in addition to translation.

9.3 Program translation using machine learning

A different approach attempts to automate the creation of translation rules by using ML techniques, such as neural nets, including large language models (LLMs) (Zhao et al. 2023). One set of ML approaches uses supervised training of a neural net model using paired source and target examples, in an analogous manner to the training of neural machine translation systems for translation of natural languages (Chen et al. 2018; Nguyen et al. 2013). However, it is difficult to find sufficiently large and reliable datasets of corresponding program examples (Lachaux et al. 2020; Zhu et al. 2022), and validating the semantic equivalence of such examples also requires significant effort. An alternative is unsupervised machine learning, whereby training uses a task such as denoising autoencoding (Lachaux et al. 2020; Szafraniec et al. 2023) to learn language translations from large unpaired code datasets. This task is typically used for training code LLMs such as CodeGen or AlphaCode (Zhao et al. 2023). In either case, the quality of training data has high impact on the accuracy of the resulting translation, and the induced mappings are based on syntactic correlations of the program codes in different languages, rather than upon semantic correspondences (Le et al. 2020). Incorporation of further semantic knowledge into training, such as dataflow information, can improve ML code translation accuracy (Guo et al. 2021). In Szafraniec et al. (2023), a low-level compiler representation (LLVM) of code is used to embed semantic knowledge into ML training, which results in improved accuracy compared to the baseline approach of Lachaux et al. (2020). In Jana et al. (2023), LLMs are used for translation between Java and Python, with training based on symbolic execution to take account of program semantics. This achieves state-of-the-art accuracy for Python to Java translation; however, this level of accuracy is still below 50% for the runtime equivalence measure of Jana et al. (2023). Because both training and inference for neural net ML approaches remain fundamentally stochastic processes, 100% accuracy in translation using such approaches is generally unattainable.

Currently the accuracy of ML program translation approaches appears to be inadequate for practical use as a stand-alone tool (Liu et al. 2023; Malyaya et al. 2023). Typical translation flaws are the introduction of spurious code in the target, or the erroneous copying of source code unchanged to the target (Malyaya et al. 2023). Output programs may be syntactically close to the correct target, but have very different semantics (Szafraniec et al. 2023). LLMs can also exhibit high variability in the outputs given for the same input (Camara et al. 2023; Ouyang et al. 2023). Thus these approaches do not assure semantic preservation. The mapping rules produced by ML approaches are implicit and cannot be inspected nor easily customised. Only a limited form of customisation via pre and post processing steps is possible (Malyaya et al. 2023). Moreover, ML program translation approaches do not produce software specifications or designs. Thus these approaches seem mainly useful as assistants to support processes of manually-based program translation and platform migration.

An alternative to non-symbolic ML is a symbolic ML approach, such as model transformation by example (MTBE) (Balogh and Varro 2009; Lano et al. 2021) or code generation by example (CGBE) (Lano and Xue 2023). These techniques learn explicit symbolic rules which can be inspected and modified. They typically use much smaller datasets for training (e.g., datasets of KB size instead of MB or GB). In Lano and Xue (2023), CGBE is applied to the FUN2LAM code translation case of Chen et al. (2018), resulting in the learning of 100% accurate rules. However this research area is only at an initial state and the application of symbolic ML to learn translations of large-scale languages remains unproven.

9.4 Summary

In Table 27 we summarise the different approaches for program translation, with their advantages and disadvantages.

Table 27 Program translation approaches

In relation to the previous work discussed here, our approach is based on explicit rules, but uses a semantics-based abstraction transformation to accurately represent detailed source program static and behavioural semantics in a UML/OCL specification, and MDE forward engineering to map the specification to a complete and executable target program. Thus we can support translation between substantially different programming languages. In addition, a specification of the source application is produced in UML/OCL, and this can be used for further refactoring, analysis and re-engineering activities. In contrast to existing rule-based tools such as Java2Python (2023), we enable end-user customisation and extension of the code abstraction and code-generation scripts. In contrast to MDRE approaches using specialised metamodels such as KDM, we use UML and OCL for semantic representation. These notations have substantial tool support and seem preferable to a custom notation for long-term retention of software asset specifications.

10 Threats to validity

Threats to validity include bias in the construction of the evaluation, inability to generalise the results, inappropriate constructs and inappropriate measures.

10.1 Threats to internal validity

10.1.1 Instrumental bias

This concerns the consistency of measures over the course of the analysis. To ensure consistency, all analysis and measurement was carried out in the same manner by a single individual (the first author) on all cases. The comparison with the results of Lachaux et al. (2020) used the same accuracy measure and a similar set of test cases to the evaluation in Lachaux et al. (2020). Analysis and measurement for the results of Tables 13, 15, 16, 17 were repeated in order to ensure the consistency of the results. Test case selection and evaluation were performed manually. In principle this could be automated as in Lachaux et al. (2020), using a form of model-based testing (Jin and Lano 2022). However, manual evaluation enables us to check semantic correctness for programs which involve file processing, process creation or other complex effects, in contrast to simple testing of function result values (Lachaux et al. 2020).

10.1.2 Selection bias

We selected example cases for evaluation of program translation based on the grammars of the source languages, in order to cover the widest possible range of grammar constructs and options. The large external cases of Table 15 originated from a wide range of sources, and were not filtered by the authors to facilitate analysis: we applied our tools to all of the ANTLR evaluation example AllInOne7.java, and to the first 50 Java validation cases from Lachaux et al. (2020). 50 validation cases (10% of the dataset) from the CodeXGLUE Java to C# code dataset were also used for evaluation; this dataset has been widely used to evaluate ML-based program translation approaches (Liu et al. 2023; Lu et al. 2021; Zhu et al. 2022). The first 40 cases from the AVATAR dataset (Ahmad et al. 2023) were used to evaluate Python-to-Java translation. Examples of VB6 programs from real-world financial applications were provided by a collaborator in the finance industry (Holistic Risk Solutions Ltd) (Lano et al. 2023), e.g., Fig. 4. An externally-provided COBOL program was analysed (the industrial case of Lano and Malik 1999). Other JavaScript, VB6 and COBOL examples were taken from official reference sources (ClearPath Enterprise Servers 2015; Microsoft Com 2022; Mozilla 2023), together with test cases for these examples. Python cases were taken from the Python documentation at python.org.

Although our cases are predominantly less than 100 LOC in size, the same is true for benchmark translation datasets such as CodeXGLUE (Lu et al. 2021), where the 500 Java to C# validation examples have average size 175 bytes. The average size of Python cases in the AVATAR dataset is 148 tokens (Ahmad et al. 2023). Likewise, the average size of the Transcoder Java examples (Lachaux et al. 2020) is 460 bytes. The cases in Lu et al. (2021) and Lachaux et al. (2020) are predominantly program fragments (individual operations), whilst our cases are always complete programs.

10.2 Threats to external validity

10.2.1 Generalisation to different samples

We considered examples of the three mainstream programming language categories in our program translation work: Java represents the category of classical object-oriented languages, C, Pascal, COBOL ‘85 and VB6 represent the category of procedural languages, and JavaScript and Python are representative of prototype-based languages with implicit typing. Thus a wide spectrum of programming languages has been considered, facilitating the generalisation of our work to other source languages in these categories, such as C# or C++. Other parsing tools could be used to produce ASTs, because \({\mathcal {CSTL}}\) is independent of the parsing technology used to produce parse trees, and other MDE tools such as Papyrus (2023) could be used for forward engineering from UML/OCL specifications (footnote 6).

10.3 Threats to construct validity

10.3.1 Inexact characterisation of constructs

Our concept of program translation is consistent with the usual understanding of this term in software engineering. We have given a detailed semantics of \({\mathcal {CSTL}}\) (Sect. 2.2), a metamodel for parse trees (Fig. 1) and a precise definition of semantic preservation by abstraction transformations (Sect. 6).

10.4 Threats to content validity

10.4.1 Relevance

The approach has been shown to be applicable to the processing of programming language source code for a range of languages, which include three (Java, Python and C) of the five most-popular programming languages according to the TIOBE index (footnote 7). Thus the approach should be relevant to other 3GL program translation tasks.

10.4.2 Representativeness

The 3GL code translation tasks we have examined (translation of Java, VB6, Python, Pascal, COBOL ‘85, C and JavaScript) are representative of typical program translation tasks for 3GLs. The examples used for evaluation are also representative of actual programs in the languages, and we have included examples which have been used for evaluation of other translation tools: the Transcoder validation cases of Lachaux et al. (2020), the ANTLR source code used by Chen et al. (2018), the AVATAR dataset used by Ahmad et al. (2023), and the CodeXGLUE validation cases of Lu et al. (2021).

10.5 Threats to conclusion validity

We used the concept of computational accuracy (Lachaux et al. 2020) to measure the quality of program translations. The related, but generally stronger, runtime equivalence measure of Jana et al. (2023) has also been used. These are semantic measures and appear to be more appropriate to evaluate software translation than syntactic similarity measures such as the BLEU score, used to evaluate machine translation of natural languages (Tran et al. 2019), or syntactic accuracy, used in many studies of ML-based program translation (Guo et al. 2021; Liu et al. 2023). Syntactic similarity of a translated program t to a ‘gold standard’ reference solution g does not ensure that t and g have the same or even similar semantics. On the other hand, syntactically quite dissimilar programs may have the same semantics (Lachaux et al. 2020).
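To make the distinction between semantic and syntactic measures concrete, the following is a minimal illustrative sketch, not the evaluation procedure used in this paper (which was performed manually). It assumes that computational accuracy is taken as the fraction of translated programs that produce the same result as the source program on every test input. All names here (`run`, `agrees`, `computational_accuracy`, and the three example functions) are hypothetical and introduced only for illustration.

```python
from typing import Any, Callable, Sequence, Tuple

Program = Callable[..., Any]
# One evaluation case: (source program, translated program, test inputs)
Case = Tuple[Program, Program, Sequence[tuple]]


def run(program: Program, args: tuple) -> Tuple[str, Any]:
    """Run a program on one input; record either its result or the error type."""
    try:
        return ("ok", program(*args))
    except Exception as exc:  # an exception is treated as observable behaviour
        return ("error", type(exc).__name__)


def agrees(source: Program, translated: Program, inputs: Sequence[tuple]) -> bool:
    """True if both programs behave identically on every test input."""
    return all(run(source, a) == run(translated, a) for a in inputs)


def computational_accuracy(cases: Sequence[Case]) -> float:
    """Fraction of cases whose translation agrees with its source on all inputs."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    passed = sum(1 for src, tgt, inputs in cases if agrees(src, tgt, inputs))
    return passed / len(cases)


# Hypothetical source program: sum of 1..n computed with a loop.
def sum_loop(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


# Syntactically dissimilar, but semantically equal for n >= 0.
def sum_formula(n: int) -> int:
    return n * (n + 1) // 2


# Syntactically very close to sum_loop, but semantically wrong (off by one).
def sum_off_by_one(n: int) -> int:
    total = 0
    for i in range(1, n):
        total += i
    return total


inputs = [(0,), (1,), (5,), (10,)]
accuracy = computational_accuracy([
    (sum_loop, sum_formula, inputs),
    (sum_loop, sum_off_by_one, inputs),
])
```

In this sketch `sum_formula` counts as a correct translation of `sum_loop` despite sharing almost no syntax with it, whereas `sum_off_by_one` would score highly on a token-overlap measure yet fails the semantic check, so `accuracy` is 0.5.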

In comparing our results to those of other approaches, we have used similar evaluation example sets, or subsets of the same evaluation sets used in other work. In Table 14 we compare our results to ML-based translation on a subset of the CodeXGLUE Java/C# translation dataset (Lu et al. 2021). This subset consisted of the first 50 evaluation cases of the dataset that were complete programs (i.e., could be compiled using javac). We also used a subset (the first 50 cases) of the Java validation examples from Lachaux et al. (2020), and the first 40 cases of the AtCoder examples from Ahmad et al. (2023). Thus the figures for comparative accuracy of our approach with other approaches may be affected by discrepancies between the evaluation datasets.

11 Limitations and future work

Because languages such as Java and Python have very large sets of library classes, it is unavoidable that any language translation approach will be incomplete. In cases where the semantics of a library class are unknown to the translator, it is translated in the same manner as an application class, but with no semantics provided for its operations. Users of our tools may extend the \({\mathcal {CSTL}}\) abstraction scripts as required to add semantics for additional program libraries or constructs.

As discussed above, GUI code and asynchronous execution are not currently treated by our approach, but could be addressed in principle. We assume that source programs are syntactically correct. We do not preserve code comments or annotations, or provide any guarantee that the execution time of a transformed program is similar to that of the source program. Some loss of efficiency may arise due to the abstraction of arrays into sequences.

A fundamental limitation of program translation is that in some cases a target language has no effective means to implement an aspect of the source language. For example, ANSI C provides no multithreading capability, nor any graphical UI capability, hence such aspects of a source program could not be translated into C.

The adoption of UML/OCL as the intermediate language restricts our approach to languages which can be effectively translated into/generated from UML/OCL, that is, procedural, object-oriented and object-based languages. It would not be appropriate for translations involving languages with substantially different paradigms such as logic programming languages.

This paper has focussed on translations from Java 6/7 to Python, Swift, C#, C++, Go, C and Java 8; from JavaScript and VB6 to Python; from COBOL ‘85 and Python to Java and C#; from Pascal to Java; and from C to C#, Swift and Go. In future work we will extend the syntactic and semantic coverage of these abstraction tools, and define abstraction scripts for other source languages, e.g., C# or C++; the latter is of interest because of its widespread use in safety-critical systems. The possibility of using symbolic machine learning to automatically derive abstraction scripts from examples will also be investigated, following the approach of Lano and Xue (2023). The possibility of combining symbolic and non-symbolic ML approaches will also be investigated, for example by using an LLM to abstract individual expressions such as library calls, with statement structures abstracted by precise rules (Siala 2024).

We have focussed on the re-engineering use case where a customer requires a literal translation of a source program to a target, preserving as much semantics as possible. Other situations which arise in re-engineering can involve substantial rewriting/rationalisation of the business logic (Lano and Malik 1999; Lano et al. 2023), or reverse engineering for program comprehension, quality improvement or restructuring (Canfora et al. 2011). Our approach partly supports such cases, by abstracting source program facilities to high-level UML/OCL representations which are more implementation-independent (e.g., abstracting database result sets and pointers to iterators). These representations can then be processed by standard MDE tooling for quality analysis and application design.

12 Conclusions

We have shown that a program translation approach based upon the use of MDE can be effective in supporting practical program translation tasks. The approach has advantages over previous approaches, by enabling users to effectively customise the translation rules used, and by providing a rigorous semantically-based abstraction and forward-engineering process, to support the semantic preservation of application functionality. By reverse-engineering their software resources from a source language into UML and OCL, an organisation acquires the ability to translate their software into multiple target languages at no additional cost, and hence to support diverse software modernisation and application porting activities.