Traceability and Reuse Mechanisms, the most important Properties of Model Transformation Languages

Dedicated model transformation languages (MTLs) are claimed to provide many benefits over the use of general purpose languages (GPLs) for developing model transformations. However, the actual advantages associated with the use of MTLs are poorly understood empirically. There is little knowledge and empirical assessment about what advantages and disadvantages hold and where they originate from. In a prior interview study, we elicited expert opinions on which advantages result from which factors, as well as a number of factors that moderate these influences. We aim to quantitatively assess the interview results to confirm or reject the effects posed by different factors. We intend to gain insights into how valuable different factors are so that future studies can draw on these data for designing targeted and relevant studies. We gather data on the factors and quality attributes using an online survey. To analyse the data, we use universal structure modelling (USM) based on a structure model. We use significance values and path coefficients produced by USM for each hypothesised interdependence to confirm or reject correlation and to weigh the strength of influence present. We analysed 113 responses. The results show that Tracing and Reuse Mechanisms are most important overall, though the observed effects were generally 10 times lower than anticipated. Additionally, we found that a more nuanced view of moderation effects is warranted. Their moderating influence differed significantly between the different influences, with the strongest effects being 1000 times higher than the weakest. The empirical assessment of MTLs is a complex topic that cannot be solved by looking at a single stand-alone factor. Our results provide clear indication that evaluation should consider transformations of different sizes and use-cases. Language development should focus on providing transformation specific reuse mechanisms.


Introduction
Model driven engineering (MDE) envisions the use of model transformations as a main activity during development (Sendall and Kozaczynski ). When practising MDE, model transformations are used for a wide array of tasks such as manipulating and evolving models (Metzger ), deriving artefacts like source code or documentation, simulating system behaviour or analysing system aspects (Schmidt ). Numerous dedicated model transformation languages (MTLs) of different form, aim and syntax (Kahani et al. ) have been developed to aid with model transformations. Using MTLs is associated with many benefits compared to using general purpose languages (GPLs), though little evidence for this has been brought forth (Götz, Tichy, and Groner ). The number of claimed benefits is enormous and includes, but is not limited to, better Comprehensibility, Productivity and Maintainability as well as easier development in general (Götz, Tichy, and Groner ). The existence of such claims can partially be attributed to the advantages that are ascribed to domain specific languages (DSLs) (Hermans, Pinzger, and Deursen ; Johannes et al. ). In a prior systematic literature review, we have shown that it is still uncertain whether these advantages exist and where they arise from (Götz, Tichy, and Groner ). Due to this uncertainty it is hard to convincingly argue for the use of MTLs over GPLs for transformation development. This problem is exacerbated when considering recent GPL advancements, like Java Streams, LINQ in C# or advanced pattern matching syntax, that help reduce boilerplate code (Höppner, Kehrer, and Tichy ) and have put GPLs back into the discussion for transformation development. Even a community discussion held at the th edition of the International Conference on Model Transformations (ICMT' ) acknowledges GPLs as suitable contenders (Cabot and Gérard ). Moreover, the few existing empirical studies on this topic provide mixed and limited results. Hebig et al. found no direct advantage for the development of transformations, but did find an advantage for the comprehensibility of transformation code in their limited setup (Hebig et al. ). A study conducted by us found that certain use cases favour the use of MTLs, while in others the versatility of GPLs prevails (Höppner, Kehrer, and Tichy ). Overall there exists a gap in knowledge about what the exact benefits of MTLs are, how strong their impact really is and what parts of the language they originate from.
To bridge this gap, we conducted an interview study with experts from research and industry to discuss the topic of advantages and disadvantages of model transformation languages (Höppner et al. ). Participants were queried about their views on the advantages and disadvantages of model transformation languages and the origins thereof. The results point towards three main areas that are relevant to the discussion, namely General Purpose Languages Capabilities, Model Transformation Languages Capabilities and Tooling. From the responses of the interviewees we identified which claimed MTL properties are influenced by which sub-areas and why. The interviewees also provided us with insights on moderation effects on these interdependencies caused by different Use-Cases, Skill & Experience levels of users and Choice of Transformation Language.
All results of the interview study are qualitative and therefore limited in their informative value, as they give no indication of the strength of influence between the involved variables. It is also not clear whether the influence model is complete and whether the views presented by the interview participants withstand community scrutiny. Therefore they only represent an initial data set that requires a quantitative and detailed analysis.
In this paper, we report on the results of a study to confirm or deny the interdependencies hypothesised from our interview results. We provide quantification of the influence strengths and moderation effects. To ensure a more complete theory of interactions, we also present the results of exploring interdependencies between factors and quality properties not hypothesised in the interviews.
Due to limited resources, this study focuses on the effects of MTL capabilities (namely Bidirectionality, Incrementality, Mappings, Model Management, Model Navigation, Model Traversal, Pattern Matching, Reuse Mechanisms and Traceability) on MTL properties (namely Comprehensibility, Ease of Writing, Expressiveness, Productivity, Maintainability, Reusability and Tool Support) in the context of their use-case (namely bidirectional or unidirectional, incremental or non-incremental, meta-model sanity, meta-model, model and transformation size and semantic gap between input and output), the skills & experience of users and language choice. Further studies can follow the same approach and focus on different areas. Descriptions for all MTL capabilities and MTL properties can be found in Section and thorough explanations can be found in our previous works (Götz, Tichy, and Groner ; Höppner et al. ).
The goal of our study is to provide quantitative results on the influence strengths of interdependences between model transformation language Capabilities and claimed Quality Properties as perceived by users. Additionally, we provide data on the strength of moderation expressed by contextual properties. The study is structured around the hypothesised interdependencies between these variables, and their more detailed breakdown, extracted from our previous interview study. Each presumed influence of an MTL capability on an MTL property forms one hypothesis which is to be examined in this study. All hypotheses are extended with an assumption of moderation by the context variables. The system of hypotheses that arises from these deliberations is visualised in a structure model, which forms the basis for our study. The structure model is depicted in Figure . The model shows exogenous variables on the left and right and endogenous variables at the centre. Exogenous variables depicted in an ellipse with a dashed outline constitute the hypothesised moderating variables.
All hypotheses investigated in our study are of the form: "<MTL Property> is (positively or negatively) influenced by <MTL Capability>". They are represented by arrows from exogenous variables on the left of Figure to endogenous variables at the centre. A moderation on the hypothesised influence is assumed from all exogenous variables on the right of the figure connected to the considered endogenous variable. In total we investigate hypothesised influences, i.e. the number of outgoing arrows from the exogenous variables on the left of Figure . Our study is guided by the following research questions:
RQ1 Which of the hypothesised interdependencies withstand a test of significance?
RQ2 How strong are the influences of model transformation language capabilities on the properties thereof?
RQ3 How strong are moderation effects expressed by the contextual factors use-case, skills & experience and MTL choice?
RQ4 What additional interdependencies arise from the analysis that were not initially hypothesised?
As the first study on this subject, it contains confirmatory and exploratory elements. We intend to confirm which of the interdependencies between MTL capabilities, MTL properties and contextual properties withstand quantitative scrutiny (RQ1). We explore how strong the influence and moderation effects between variables are (RQ2 & RQ3), to gain new insights and to confirm their significance and relevance (minor influence strengths might suggest irrelevance even if goodness of fit tests confirm a correlation that is not purely accidental). Lastly, we utilise the exploratory elements of USM to identify interdependencies not hypothesised by the experts in our interviews (RQ4).
We use an online survey to gather data from researchers and practitioners on their language use and the quality they perceive. The responses are analysed using universal structure modelling (USM) (Buckler and Hennig-Thurau ) based on the structure model developed from the interview responses. This results in a quantified structure model with influence weights, significance values and effect strengths.
Based on the responses from participants, the key contributions of this paper are:
• An adjusted structure model with newly discovered interdependencies;
• Quantitative data on the influence weight and effect strength of all factors as well as significance values for the influences;
• Quantitative data on the moderation strength of context factors;
• An analysis of the implications of the results for further empirical studies and language development;
• Reflections on the use of USM for investigating large hypotheses systems in software engineering research.
The method used in the reported study has been reviewed and published as part of the Registered Reports track at ESEM' (Höppner and Tichy ).
The structure of this paper is as follows: Section provides an extensive overview of model-driven engineering, domain-specific languages, model transformation languages and structural equation modelling as well as universal structure modelling. Afterwards, in Section the methodology is outlined. Demographic data of the responses is reported in Section and the results of the analysis are presented in Section . In Section we discuss implications of the results and report our reflections on the use of USM. Section discusses threats to validity of our study and how we met them. Lastly, in Section we present related work before giving concluding remarks on our study in Section .

Background
In this section we provide the necessary background for our study. Since it is a follow-up study to our interview study (Höppner et al. ), much of the background is the same and is therefore taken from those descriptions. To stay self-contained we still provide these descriptions. This concerns Sections . to . . Sections . and . contain an extension of our descriptions from the registered report (Höppner and Tichy ).

Model-driven engineering
The Model-Driven Architecture (MDA) paradigm was first introduced by the Object Management Group in (OMG ). It forms the basis for an approach commonly referred to as Model-driven development (MDD) (Brown, Conallen, and Tropeano ), introduced as a means to cope with the ever growing complexity associated with software development. At the core of it lies the notion of using models as the central artefact for development. In essence, this means that models are used both to describe and reason about the problem domain and to develop solutions (Brown, Conallen, and Tropeano ). An advantage ascribed to this use of models is that they can be expressed with concepts closer to the related domain than when using regular programming languages (Selic ).
When fully utilized, MDD envisions automatic generation of executable solutions specialized from abstract models (Selic ; Schmidt ). To be able to achieve this, the structure of models needs to be known. This is achieved through so called meta-models which define the structure of models. The structure of meta-models themselves is then defined through meta-models of their own. For this setup, the OMG developed a modelling standard called Meta-object Facility (MOF) (OMG ) on the basis of which a number of modelling frameworks such as the Eclipse Modelling Framework (EMF) (Steinberg et al. ) and the .NET Modelling Framework (Hinkel ) have been developed.
Domain-specific languages
Domain-specific languages (DSLs) are languages designed with a notation that is tailored for a specific domain by focusing on relevant features of the domain (Van Deursen and Klint ). In doing so, DSLs aim to provide domain specific language constructs that let developers feel as if they were working directly with domain concepts, thus increasing speed and ease of development (Sprinkle et al. ). Because of these potential advantages, a well defined DSL can provide a promising alternative to using general purpose tools for solving problems in a specific domain. Examples of this include languages such as shell scripts in Unix operating systems (Kernighan and Pike ), HTML (Raggett, Le Hors, Jacobs, et al. ) for designing web pages or AADL, an architecture design language (SAEMobilus ).

External and Internal transformation languages
Domain specific languages, and MTLs by extension, can be distinguished on whether they are embedded into another language, the so called host language, or whether they are fully independent languages that come with their own compiler or virtual machine.
Examples for transformation rules are the rules that make up transformation modules in ATL, but also functions, methods or procedures that implement a transformation from input elements to output elements. The fundamental difference between model transformation languages and general-purpose languages that originates in this definition lies in dedicated constructs that represent rules. The difference between a transformation rule and any other function, method or procedure is not clear cut when looking at GPLs. It can only be made based on the contents thereof. An example of this can be seen in Listing , which contains exemplary Java methods. Without detailed inspection of the two methods it is not apparent which method does some form of transformation and which does not.
In an MTL, on the other hand, transformation rules tend to be dedicated constructs within the language that allow a definition of a mapping between input and output (elements). The example rules written in the model transformation language ATL in Listing make this apparent. They define mappings between model elements of type Member and model elements of type Male as well as between Member and Female using rules.

Rule Application Control: Location Determination
Location determination describes the strategy that is applied for determining the elements within a model onto which a transformation rule should be applied. We differentiate two forms of location determination, based on the kind of matching that takes place during traversal. There is the basic automatic traversal in languages such as ATL or QVT, where single elements are matched to which transformation rules are applied. The other form of location determination, used in languages like Henshin, is based on pattern matching, meaning a model or graph pattern is matched to which rules are applied. This allows developers to define sub-graphs consisting of several model elements and references between them which are then manipulated by a rule.
The automatic traversal of ATL applied to the example from Listing will result in the transformation engine automatically executing the Member2Male rule on all model elements of type Member where the function isFemale() returns false and the Member2Female rule on all other model elements of type Member.
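The behaviour described above can be imitated in a general-purpose language. The following is a minimal Python sketch, not ATL and not the way the ATL engine is implemented: the `Rule` class, the `transform` function and the attribute names are hypothetical and only mirror the rule names Member2Male and Member2Female from the text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Member:
    """Source model element (hypothetical stand-in for the Member type)."""
    first_name: str
    female: bool

    def is_female(self) -> bool:
        return self.female


@dataclass
class Male:
    full_name: str


@dataclass
class Female:
    full_name: str


@dataclass
class Rule:
    """A transformation rule: source type, guard and element mapping."""
    name: str
    source_type: type
    guard: Callable[[object], bool]
    apply: Callable[[object], object]


RULES = [
    Rule("Member2Male", Member, lambda m: not m.is_female(), lambda m: Male(m.first_name)),
    Rule("Member2Female", Member, lambda m: m.is_female(), lambda m: Female(m.first_name)),
]


def transform(model: List[object], rules: List[Rule]) -> List[object]:
    """Automatic traversal: visit every element and apply the first matching rule."""
    output = []
    for element in model:
        for rule in rules:
            if isinstance(element, rule.source_type) and rule.guard(element):
                output.append(rule.apply(element))
                break  # at most one rule per element in this sketch
    return output
```

The developer only declares rules and guards; the loop that decides where each rule applies belongs to the (here simulated) engine.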
The pattern matching of Henshin can be demonstrated using Figure . It describes a transformation that creates a couple connection between two actors that play in two films together. When the transformation is executed the transformation engine will try and find instances of the defined graph pattern and apply the changes on the found matches.
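A minimal Python sketch of this kind of matching, assuming a hypothetical model in which each film maps to the set of actors playing in it. The function name `find_couples` and the data layout are illustrative assumptions and do not reflect how Henshin represents or matches graphs.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def find_couples(films: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return actor pairs matching the pattern: both play together in at least two films."""
    shared_films: Dict[Tuple[str, str], int] = {}
    for cast in films.values():
        # Every pair of actors in one cast shares this film.
        for pair in combinations(sorted(cast), 2):
            shared_films[pair] = shared_films.get(pair, 0) + 1
    # A match of the pattern yields one new "couple" connection.
    return {pair for pair, count in shared_films.items() if count >= 2}
```

The match involves several elements at once (two actors, two films and the references between them), which is what distinguishes it from visiting single elements.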
This highlights the main difference between automatic traversal and pattern matching, as the engine will search for a sub-graph within the model instead of applying a rule to single elements within the model.
The directionality of a model transformation describes whether it can be executed in one direction, called a unidirectional transformation, or in multiple directions, called a multidirectional transformation (Czarnecki and Helsen ). For the purpose of our study the distinction between unidirectional and bidirectional transformations is relevant. Some languages provide dedicated support for executing a transformation both ways based on only one transformation definition, while others require users to define transformation rules for both directions. General-purpose languages cannot provide bidirectional support and also require both directions to be implemented explicitly.
The ATL transformation from Listing defines a unidirectional transformation. Input and output are defined and the transformation can only be executed in that direction.
The QVT-R relation defined in Listing is an example of a bidirectional transformation definition (for simplicity, the transformation omits the condition that males are only created from members that are not female). Instead of a declaration of input and output, it defines how two elements from different domains relate to one another. As a result, given a Member element its corresponding Male element can be inferred, and vice versa.
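The idea of deriving both directions from a single definition can be sketched in Python. This is an illustration only and not QVT-R semantics: the `Relation` class and the attribute names `firstName` and `fullName` are hypothetical, and only simple one-to-one attribute correspondences are supported.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Relation:
    """One declarative definition: pairs of corresponding attribute names."""
    correspondences: Dict[str, str]  # source attribute -> target attribute

    def forward(self, source: Dict[str, object]) -> Dict[str, object]:
        """Infer the target element from a source element."""
        return {t: source[s] for s, t in self.correspondences.items()}

    def backward(self, target: Dict[str, object]) -> Dict[str, object]:
        """Infer the source element from a target element."""
        return {s: target[t] for s, t in self.correspondences.items()}


# Hypothetical relation between a Member and a Male element.
member_to_male = Relation({"firstName": "fullName"})
```

Neither direction is written by hand; both are read off the same correspondence table, which is the property that a unidirectional definition lacks.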

Incrementality
Incrementality of a transformation describes whether existing models can be updated based on changes in the source models without rerunning the complete transformation (Czarnecki and Helsen ). This feature is sometimes also called model synchronisation. Providing incrementality for transformations requires active monitoring of input and/or output models as well as information on which rules affect what parts of the models. When a change is detected, the corresponding rules can then be executed. It can also require additional management tasks to be executed to keep models valid and consistent.
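A minimal Python sketch of change propagation, assuming a deliberately simple setting in which every source element maps to exactly one target element through a single rule. The class `IncrementalTransformation` and its methods are hypothetical and are not taken from any MTL.

```python
from typing import Callable, Dict


class IncrementalTransformation:
    """Keeps a target model in sync by re-running the rule only for changed elements."""

    def __init__(self, rule: Callable[[str], str]) -> None:
        self.rule = rule
        self.source: Dict[str, str] = {}
        self.target: Dict[str, str] = {}
        self.rule_runs = 0  # counts rule executions to make the saved work visible

    def set_source(self, key: str, value: str) -> None:
        """Record a source change and propagate it to the affected target element only."""
        if key in self.source and self.source[key] == value:
            return  # nothing changed, nothing to propagate
        self.source[key] = value
        self.target[key] = self.rule(value)
        self.rule_runs += 1

    def delete_source(self, key: str) -> None:
        """Remove a source element together with the target element derived from it."""
        self.source.pop(key, None)
        self.target.pop(key, None)
```

Changing one element triggers one rule execution, whereas a non-incremental transformation would re-run the rule for every element of the source model.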

Tracing
According to Czarnecki and Helsen ( ) tracing "is concerned with the mechanisms for recording different aspects of transformation execution, such as creating and maintaining trace links between source and target model elements". Several model transformation languages, such as ATL and QVT have automated mechanisms for trace management. This means that traces are automatically created during runtime. Some of the trace information can be accessed through special syntax constructs while some of it is automatically resolved to provide seamless access to the target elements based on their sources.
An example of tracing in action can be seen in line 16 of Listing . Here the partner attribute of a Female element that is being created is assigned to s.companion. The s.companion reference points towards an element of type Member within the input model. When creating a Female or Male element from a Member element, the ATL engine will resolve this reference into the corresponding element that was created from the referred Member element via either the Member2Male or Member2Female rule. ATL achieves this by automatically tracing which target model elements are created from which source model elements.
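What an engine does implicitly has to be written out by hand in a general-purpose language. The following Python sketch shows one common way to do so, using an explicit trace map and two phases. It is an assumption about how such resolution can be implemented, not a description of ATL internals; the classes `Member` and `Person` and the function `transform` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class Member:
    name: str
    female: bool
    companion: Optional["Member"] = None


@dataclass(eq=False)
class Person:
    name: str
    kind: str  # "Male" or "Female"
    partner: Optional["Person"] = None


def transform(members: List[Member]) -> List[Person]:
    """Transform members and resolve companion references through trace links."""
    trace: Dict[int, Person] = {}  # trace links: id(source element) -> target element
    # Phase 1: create target elements and record a trace link for each.
    for m in members:
        trace[id(m)] = Person(m.name, "Female" if m.female else "Male")
    # Phase 2: resolve source references to the targets created from them.
    # Assumes every companion is itself part of the input model.
    for m in members:
        if m.companion is not None:
            trace[id(m)].partner = trace[id(m.companion)]
    return [trace[id(m)] for m in members]
```

The second phase is the part that automated trace management removes from the developer's responsibilities.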

Dedicated Model Navigation Syntax
Languages or syntax constructs for navigating models are not part of any feature classification for model transformation languages. However, they were often discussed in our interviews and thus require an explanation as to what interviewees refer to.
Languages such as OCL (OMG ), which is used in transformation languages like ATL, provide dedicated syntax for querying and navigating models. As such they provide syntactical constructs that aid users in navigation tasks. Different model transformation languages provide different syntax for this purpose. The aim is to provide specific syntax so users do not have to manually implement queries using loops or other general purpose constructs. OCL provides a functional approach for accumulating and querying data based on collections while Henshin uses graph patterns for expressing the relationship of sought-after model elements.
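The contrast between hand-written loops and collection-style queries can be illustrated in Python. The `Member` class with an `age` attribute and both functions are hypothetical examples; the comprehension is only comparable in spirit to chaining OCL collection operations such as select and collect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    name: str
    age: int


def adult_names_loop(members: List[Member]) -> List[str]:
    """Manual navigation with general-purpose loops and conditionals."""
    names = []
    for member in members:
        if member.age >= 18:
            names.append(member.name)
    return names


def adult_names_query(members: List[Member]) -> List[str]:
    """Collection-style query: filter the elements, then project one attribute."""
    return [m.name for m in members if m.age >= 18]
```

Both functions compute the same result; the query form states what is sought rather than how to iterate.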

Structural equation modelling and (Universal) Structural Equation Modelling
Structural equation modelling (SEM) is an approach used for confirmatory factor analysis (Graziotin et al. ). It defines a set of methods used to "investigate complex relationship structures between variables and allows for quantitative estimates of interdependencies thereof. Its goal is to map the a-priori formulated cause-effect relationships into a linear system of equations and to estimate the model parameters in such a way that the initial data, collected for the variables, are reproduced as well as possible" (Weiber and Mühlhaus ). Structural equation modelling distinguishes between two sets of variables: manifest and latent. Manifest variables are variables that are empirically measured, and latent variables describe theoretical constructs that are hypothesised to interact with each other. Latent variables are further divided into exogenous or independent and endogenous or dependent variables.
So called structural equation models, a sample of which can be seen in Figure , are comprised of manifest and latent variables and form the heart of the analysis. They are made up of three connected sub-models: the structure model, the measurement model of the exogenous latent variables and the measurement model of the endogenous latent variables.
The structure model defines all hypothesised interactions between exogenous (ξ) and endogenous (η) latent variables. Each exogenous variable is linked, by arrow, to all endogenous variables that are presumed to be influenced by it. Each of these connections is given a variable (γ) that measures the influence strength. If an exogenous variable moderates the influences on an endogenous variable, the exogenous variable is depicted with a dashed outline and a moderation variable is assigned to the connection. In addition, a residual (or error) variable (ζ) is appended to each endogenous latent variable to represent the influence of variables not represented in the model. Figure shows an example structural equation model for the hypothesis that "Mappings help with the comprehensibility of transformations, depending on the developer's experience.". The structure model, seen at the centre of the figure, is comprised of the exogenous latent variable ξ1 (Mappings), the moderating exogenous variable ξ2 (Experience), the endogenous latent variable η1 (Comprehensibility), a presumed influence of Mappings on Comprehensibility via γ11 and the error variable ζ1. Lastly, the model also contains a moderation of Experience on all influences on Comprehensibility. As described earlier, this moderation effect is assigned the variable γ11_2. The moderation variables are not depicted in our graphical representation of the structure model because of their high number and associated visibility issues.
To illustrate moderation, arrows are usually shown from the moderating exogenous variable to the arrow representing the moderated influence, i.e., an arrow between an exogenous variable and an endogenous variable. However, our illustration deviates from this due to the size and makeup of our hypothesis system. Standard representations can be found in the basic literature such as Weiber and Mühlhaus ( ).
The measurement model of the exogenous latent variables reflects the relationships between all exogenous latent variables and their associated manifest variables. Each manifest variable is linked, by arrow, to all exogenous latent variables that are measured through it. Each of these connections is given a variable that measures the indication strength of the manifest variable for the latent variable. Additionally, an error variable for each manifest variable is introduced that represents measurement errors. In Figure , the measurement model for exogenous latent variables, seen at the left of the figure, is comprised of the exogenous latent variables ξ1 (Mappings) and ξ2 (Experience), the manifest variables x1 (% of code using Mappings), x2 (number of years a person has been a programmer) and x3 (number of hours per month spent developing transformations), their measurement accuracy for Mapping usage λ11, their measurement accuracy for Experience λ22 and λ32, and the associated measurement errors δ1, δ2 and δ3.
The measurement model of the endogenous latent variables reflects the relationships between all endogenous latent variables and their associated manifest variables. It is structured the same way as the measurement model of the exogenous latent variables. In Figure , it is shown on the right of the figure.
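For readers who prefer equations, the example model described above can be written out as follows. This is a sketch under assumptions not made in the text: all relationships are taken to be linear, the measurement models are taken to be reflective, and the moderation of Experience is represented by a product term. The symbols follow common SEM notation and are likewise an assumption: ξ1 (Mappings) and ξ2 (Experience) are the exogenous latent variables, η1 (Comprehensibility) is the endogenous latent variable, γ denotes influence weights, ζ1 the residual, x1 to x3 the manifest variables, λ their loadings and δ the measurement errors.

```latex
\begin{align}
\eta_1 &= \gamma_{11}\,\xi_1 + \gamma_{11\_2}\,\xi_1\,\xi_2 + \zeta_1 \\
x_1 &= \lambda_{11}\,\xi_1 + \delta_1 \\
x_2 &= \lambda_{22}\,\xi_2 + \delta_2 \\
x_3 &= \lambda_{32}\,\xi_2 + \delta_3
\end{align}
```

The first line corresponds to the structure model and the remaining lines to the measurement model of the exogenous latent variables; the measurement model of the endogenous latent variable would take the same form.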
Given a structural equation model and measurements for manifest variables, the SEM approach calls for estimating the influence weights and latent variables within the models. This is done in alternation for the measurement models and the structure model until a predefined quality criterion is reached. Traditional methods (covariance-based structural equation modeling & partial least squares) use different mathematical approaches such as maximum-likelihood estimation or least squares (Weiber and Mühlhaus ) to estimate influence weights.

Universal Structure Modeling (USM) is an exploratory approach that complements the traditional confirmatory SEM methods (Buckler and Hennig-Thurau ). It combines the iterative methodology of partial least squares with a Bayesian neural network approach using a multilayer perceptron architecture. USM derives a starting value for latent variables in the model via principal component analysis and then applies the Bayesian neural network to discover an optimal system of linear, nonlinear and interactive paths between the variables. This enables USM to identify complex relationships that may not be detected using traditional SEM approaches, including hidden structures within the data, and to highlight unproposed model paths, nonlinear relations among model variables, and moderation effects.
The primary measures calculated in USM are the 'Average Simulated Effect' (ASE), 'Overall Explained Absolute Deviation' (OEAD), 'interaction effect' (IE) and 'parameter significance'. ASE measures the average change in the endogenous variable resulting from a one-unit change in the exogenous variable across all simulations. OEAD assesses the degree of fit between the observed and simulated values of the endogenous variable, capturing the overall explanatory power of the model. IE evaluates the extent to which the effect of one exogenous variable on the endogenous variable depends on the level of another variable. Parameter significance determines whether the estimated coefficients for each exogenous variable in the model are statistically significant at a predetermined level of confidence, which indicates whether the exogenous variable has a meaningful impact on the endogenous variable. It is calculated through a bootstrapping routine (Mooney et al. ). These metrics together provide a comprehensive assessment of the performance and explanatory power of a USM model.
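The bootstrapping idea behind parameter significance can be illustrated on a much simpler estimator than a neural network. The following Python sketch assumes an ordinary least-squares slope as a stand-in for a path coefficient and a percentile interval as the significance criterion. Neither is claimed to be what USM or NEUSREL computes, and the function names are hypothetical.

```python
import random
from typing import List, Tuple


def slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_interval(
    xs: List[float],
    ys: List[float],
    resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the slope, resampling observations with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_x = [xs[i] for i in idx]
        sample_y = [ys[i] for i in idx]
        if len(set(sample_x)) < 2:
            continue  # slope is undefined when all x values coincide
        estimates.append(slope(sample_x, sample_y))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


def is_significant(interval: Tuple[float, float]) -> bool:
    """Treat the effect as significant when the interval excludes zero."""
    lower, upper = interval
    return lower > 0 or upper < 0
```

In this simplified picture, an influence counts as significant when the effect estimated on resampled data keeps the same sign in almost all resamples.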
USM is recommended for use in situations where traditional SEM approaches may not be sufficient to fully explore the relationships between variables. Using USM instead of traditional structural equation modelling approaches is suggested for studies where there are still uncertainties about the completeness of the underlying hypotheses system and for exploring nonlinearity in the influences (Weiber and Mühlhaus ; Buckler and Hennig-Thurau ). Moreover, its use of a neural network also reduces the requirements for the scale levels of data, thus allowing the introduction of categorical variables in addition to metric variables (Weiber and Mühlhaus ). At present, NEUSREL (https://www.neusrel.com) is the only tool available for conducting USM.

MTL Quality Properties
There exists a large body of quality properties that get associated with model transformation languages. In literature many claims are made about advantages or disadvantages of MTLs in these different properties. We categorised these properties in a previous work of ours (Götz, Tichy, and Groner ). This study focuses on a subset of all the identified quality properties of MTLs which requires them to be properly explained. In this section, we give a brief description of our definitions of each of the quality properties of MTLs relevant to the study.
Comprehensibility describes the ease of understanding the purpose and functionality of a transformation based on reading code.
Ease of Writing describes the ease at which a developer can produce a transformation for a specific purpose.
Expressiveness describes the amount of useful dedicated transformation concepts in a language.
Productivity describes the degree of effectiveness and efficiency with which transformations can be developed and used.
Maintainability describes the degree of effectiveness and efficiency with which a transformation can be modified.
Reusability describes the ease of reusing transformations or parts of transformations to create new transformations (with different purposes).
Tool Support describes the amount of quality tools that exist to support developers in their efforts.

Methodology
The methodology used in this study has been reviewed and published as part of the Registered Reports track at ESEM' (Höppner and Tichy ). In the following, we provide a more detailed description and highlight all deviations from the reported method as well as justification for the changes.
The study itself is comprised of the following steps, which were executed sequentially and are reported on in this section. The steps executed differ in two ways from those reported in the registered report. First, we did not contact potential participants for a second time after two weeks. This was deemed unnecessary based on the number of participants at that point in time. Moreover, we did not want to bother those who had already participated and had no way of knowing their identity. Second, we kept the survey open weeks longer than intended due to receiving several requests to do so.

Survey Design
In this section we detail the design of the used questionnaire and methodology used to develop and distribute it.

Questionnaire
The questions in the questionnaire are designed to query data for measuring the latent variables from the structure model in Figure . The complete questionnaire can be found in Appendix B. In the following, we describe each latent variable and explain how we measure it through questions in the questionnaire.
There are latent variables relevant to our study. Variables 1..19 describe exogenous variables and 1..7 describe endogenous variables. Each latent variable is measured through one or more manifest variables. Extending the structure model from Figure with the manifest variables produces the complete structural equation model evaluated in this study. Note that USM reduces the requirements for the scale levels of data, thus allowing the use of categorical variables in addition to metric variables (Weiber and Mühlhaus ). All latent variables related to MTL capabilities ( 1..9 ) are associated with a single manifest variable 1..9 , which measures how frequently the participants utilised the MTL capabilities in their transformations. This measurement is represented as a ratio ranging from % to %. The higher the value of 1..9 , the more frequently the participants used the MTL capabilities in their transformations. Similarly, latent variables related to MTL properties ( 1..7 ) are associated with a single manifest variable 1..7 which measures the perceived quality of the property on a -point Likert scale (e.g., very good, good, neither good nor bad, bad, very bad).
The use of single-item scales is a debated topic. We justify their usage for the described latent variables on multiple grounds. First, the latent variables are of high complexity due to the abstract concepts they represent. Second, our study aims to produce first results that need to be investigated in more detail in follow-up studies that focus on single aspects of the model. Third, due to the size of our structural equation model, multi-item scales for all latent variables would increase the size of the survey, potentially putting off many subjects. The validity of these deliberations for using single-item scales is supported by Fuchs and Diamantopoulos ( ).
The latent variable language choice ( 10 ) is measured by asking participants to list their most recently used transformation languages. In our registered report we planned to also request participants to give an estimate on the percentage of their respective use % ( 10 ). This was discarded during pilot testing as it was seen as unnecessarily prolonging the questionnaire. Pilot testers had difficulties providing accurate data and questioned whether this data was actually used in the analysis.
Language skills ( 11 ) is measured through 11 and 12 for which participants are asked to give the number of years they have been using each language ( 11 ) and the number of hours they use the language per month ( 12 ).

Meta-model size
To measure the semantic gap between input and output ( 16 ), we elicit the similarity of the structure ( 18 ) and data types ( 19 ) on a five-point Likert scale (very similar, similar, neither similar nor dissimilar, dissimilar, very dissimilar). Participants are asked to give the percentage of all their meta-models that fall within each of the five assessments.
The meta-model sanity ( 17 ) is measured by how good participants perceive the structure ( 20 ) and the documentation ( 21 ) of their meta-models to be on a five-point scale (very well, well, neither well nor bad, bad, very bad). Participants are asked to give the percentage of all their meta-models that fall within each of the five assessments.
Lastly, for both bidirectional uses ( 18 ) and incremental uses ( 19 ) we query participants on the ratio of bidirectional ( 22 ) and incremental ( 23 ) transformations compared to simple uni-directional transformations they have written.
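The percentage-per-assessment answers described above can be pictured as one distribution over five ordered categories per participant. The following Python sketch is illustrative only: the function name `summarise_distribution`, the coding of the categories as 1 to 5, and the reduction to a weighted mean are assumptions made for this example and are not taken from the study.

```python
from typing import Sequence

# Ordered answer categories as listed in the text for structural similarity.
CATEGORIES = (
    "very similar",
    "similar",
    "neither similar nor dissimilar",
    "dissimilar",
    "very dissimilar",
)


def summarise_distribution(percentages: Sequence[float], tolerance: float = 0.5) -> float:
    """Return the weighted mean category (1 = first, 5 = last category)."""
    if len(percentages) != len(CATEGORIES):
        raise ValueError("expected one percentage per category")
    if any(p < 0 for p in percentages):
        raise ValueError("percentages must be non-negative")
    total = sum(percentages)
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"percentages should sum to 100, got {total}")
    # Assumed coding: categories are scored 1..5 in the order given above.
    weighted = sum(score * p for score, p in enumerate(percentages, start=1))
    return weighted / total


if __name__ == "__main__":
    # Made-up answer of a single participant, not data from the survey.
    print(summarise_distribution([10, 50, 20, 15, 5]))
```

A validation step of this kind is mainly useful for catching answers whose percentages do not add up before they enter any analysis.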

. . Pilot Study
We pilot tested the study with three researchers from the institute. All pilot testers are researchers in the field of model driven engineering with more than years of experience. Based on their feedback, we reworded some questions, removed the usage percentage part of the question for language choice and added more precise descriptions of the queried concepts. We then made the questionnaire publicly available and distributed a link to it via email.

. . Target Subjects & Distribution
The target subjects are both researchers and professionals from industry that have used dedicated model transformation languages to develop model transformations in the last five years. We used voluntary and convenience sampling to select our study participants. Both authors reached out via mail to researchers and professionals they knew personally and requested them to fill out the online survey. We further reached out, via mail, to all authors of publications listed in ACM Digital Library, IEEE Xplore, Springer Link and Web of Science that contain the key word model transformation from the last five years. A third source of subjects was social media. The authors used their available social media channels to recruit further subjects by posting about the online survey on the platforms. The social media platform used for distribution was MDE-Net, a community platform dedicated to model driven engineering.
The sampling method differs from the intended method in that it does not include snowball sampling as a secondary sampling method. We decided on this to have more control over which subjects receive a link to the study, as we believe secondary and tertiary contacts might be too far removed from our target subjects.
Participation was voluntary and we did not incentivise participation through offering rewards. This decision is rooted in our experience in previous studies: one other survey with subjects (Groner et al. ) and the interview study we are basing this study on with subjects (Höppner et al. ).
It is suggested in literature to have between to times as many participants as the largest number of parameters to be estimated in each structural equation (i.e., the largest number of incoming paths for a latent model variable) (Buckler and Hennig-Thurau ). Thus, the minimal number of subjects for our study to achieve stable results is . To gain any meaningful results a sample size of must not be undercut (Buckler and Hennig-Thurau ). In total, we contacted potential participants and got responses, exceeding the minimum requirement for stable results.
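The sample size rule above is simple arithmetic: a multiplier times the largest number of incoming paths of any latent variable. The sketch below assumes a hypothetical helper `minimum_sample_size`; the path counts and the multiplier in the usage example are made up for illustration and are not the values used in the study.

```python
from typing import Mapping


def minimum_sample_size(incoming_paths: Mapping[str, int], factor: int) -> int:
    """Return factor times the largest number of incoming paths.

    incoming_paths maps each endogenous latent variable to the number of
    paths (parameters to be estimated) that point at it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not incoming_paths:
        raise ValueError("at least one latent variable is required")
    return factor * max(incoming_paths.values())


if __name__ == "__main__":
    # Hypothetical path counts; the real counts follow from the structure model.
    example_paths = {"Comprehensibility": 12, "Productivity": 9}
    print(minimum_sample_size(example_paths, factor=5))
```

Because the rule depends on the largest in-degree only, adding newly discovered paths to an already heavily influenced variable raises the required sample size directly.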

. Data Analysis
We use USM to examine the hypotheses system modelled by the structure model shown in Figure . USM is chosen over its structural equation modelling alternatives because it is better able to handle uncertainty about the completeness of the hypothesis system under investigation, has more capabilities to analyse moderation effects and is able to investigate non-linear correlations (Weiber and Mühlhaus ). USM requires a declaration of an initial likelihood of an interdependence between two variables. This is used as a starting point for calculating influence weights but can change over the course of calculation. For this, Buckler and Hennig-Thurau ( ) suggest to only assign a value of to those relationships that are known to be wrong. We use the results of our interview study (Höppner et al. ), shown in the structure model, to assign these values. For each path that is present in the model, we assume a likelihood of %. To check for interdependencies that might have been missed by interview participants, we also use a likelihood of % for all missing paths between 1..19 and 1..7 . Our plan was to use a likelihood of % for these interdependencies but the tool available to us only allowed for either % or % to be put as input.
Footnote (MDE-Net): https://mde-network.com/
Footnote (number of responses): This constitutes a response rate of 4.8%. We do however not know how many responses are a result of our social media posting.
The tool NEUSREL is used on the extracted empirical data and the described additional input to estimate path weights and moderation weights within the extended structure model, i.e., the structure model where each exogenous latent variable is connected to all endogenous latent variables. It also runs significance tests via a bootstrapping routine (Buckler and Hennig-Thurau ; Mooney et al. ) and produces the significance value estimates for each influence. The following procedures are then followed to answer the research questions from Section .
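To illustrate the general idea of a bootstrapped significance estimate, the following sketch resamples paired observations with replacement and checks how often the resampled coefficient changes sign. It is a generic percentile-style bootstrap on a Pearson correlation, written for this illustration only; it is not NEUSREL's routine, and the names `pearson` and `bootstrap_p_value` are hypothetical.

```python
import random
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for degenerate (constant) samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def bootstrap_p_value(xs: Sequence[float], ys: Sequence[float],
                      resamples: int = 2000, seed: int = 0) -> float:
    """Approximate two-sided p-value for 'the coefficient is zero'."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 items")
    rng = random.Random(seed)
    n = len(xs)
    stats: List[float] = []
    for _ in range(resamples):
        # Draw n observation indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    # Share of resampled coefficients at or below, and at or above, zero.
    below = sum(s <= 0 for s in stats) / resamples
    above = sum(s >= 0 for s in stats) / resamples
    return min(1.0, 2 * min(below, above))


if __name__ == "__main__":
    xs = [float(i) for i in range(30)]
    ys = [2.0 * x + 1.0 for x in xs]  # made-up, perfectly linear data
    print(bootstrap_p_value(xs, ys))
```

The estimate depends on the number of resamples and on the random seed, which is one reason bootstrapped significance values are less exact than closed-form tests.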
RQ . We reject all hypothesised influences, i.e., those present in our structure model in Figure , that do not pass the statistical significance test. The threshold we set for this is 0.01. Moreover, we discard hypothesised influences with minimal effect strengths that are several orders of magnitude lower than the median of all coefficients. If, for example, the median of all path coefficients is . all influences with a coefficient lower than or equal to . are discarded. We do so because such low coefficients suggest that the influence is negligible.
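The two rejection criteria above can be sketched as a simple filter. The significance threshold of 0.01 is taken from the text; everything else is an assumption made for this example: the `Influence` record, the use of absolute coefficient values, the strict comparison for the p-value, and the default of two orders of magnitude for "several". The coefficients in the usage example are made up.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass(frozen=True)
class Influence:
    source: str         # exogenous variable, e.g. an MTL capability
    target: str         # endogenous variable, e.g. an MTL property
    coefficient: float  # path coefficient estimated by the analysis
    p_value: float      # significance estimate for the path


def retained_influences(influences: Sequence[Influence],
                        alpha: float = 0.01,
                        magnitudes: int = 2) -> List[Influence]:
    """Keep influences that are significant and not negligibly small."""
    if not influences:
        return []
    # Assumed cutoff: median absolute coefficient, lowered by `magnitudes`
    # orders of magnitude. Coefficients at or below it are discarded.
    cutoff = median(abs(i.coefficient) for i in influences) / 10 ** magnitudes
    return [
        i for i in influences
        if i.p_value < alpha and abs(i.coefficient) > cutoff
    ]


if __name__ == "__main__":
    # Made-up values for illustration, not results of the study.
    candidates = [
        Influence("A", "X", 0.29, 0.001),
        Influence("B", "X", 0.10, 0.200),    # fails the significance test
        Influence("C", "X", 0.0005, 0.001),  # negligible effect strength
        Influence("D", "X", 0.05, 0.005),
    ]
    for influence in retained_influences(candidates):
        print(influence.source, influence.target, influence.coefficient)
```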

RQ & RQ .
All path coefficients produced that were not rejected in RQ will then provide direct values for the influence and moderation strengths to answer RQ . The same significance criteria we applied to all hypothesised influences for RQ , we also apply to the extended influences, i.e., those not present in the structure model from Figure . Those influences that pass the significance test are added to the initial structural model as newly discovered influences.
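The resulting update of the structure model amounts to a few set operations over paths. The sketch below assumes that each influence is identified by a (source, target) pair and that the set of significant paths has already been determined; the function name and the example paths are hypothetical and do not reflect results of the study.

```python
from typing import Set, Tuple

Path = Tuple[str, str]  # (exogenous variable, endogenous variable)


def update_structure_model(hypothesised: Set[Path], significant: Set[Path]):
    """Split paths into confirmed, rejected and newly discovered ones."""
    confirmed = hypothesised & significant   # hypothesised and significant
    rejected = hypothesised - significant    # hypothesised, not significant
    discovered = significant - hypothesised  # significant, not hypothesised
    updated_model = confirmed | discovered
    return confirmed, rejected, discovered, updated_model


if __name__ == "__main__":
    # Placeholder names for illustration only.
    hypothesised = {("Capability A", "Property X"), ("Capability B", "Property X")}
    significant = {("Capability A", "Property X"), ("Capability C", "Property Y")}
    confirmed, rejected, discovered, model = update_structure_model(
        hypothesised, significant
    )
    print(sorted(confirmed), sorted(rejected), sorted(discovered), sorted(model))
```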
. Privacy and Ethical concerns
All participants were informed of the data collection procedure, handling of the data and their rights, prior to filling out the questionnaire. Participation was completely voluntary and not incentivised through rewards.
During selection of potential participants the following data was collected and processed.
The questionnaire did not collect any sensitive or identifiable data.
No data collected during the study was shared with any person outside of the group of authors.
The complete information and consent form can be found in Appendix D.
The study design was not presented to an ethical board. This decision is based on the rules of the German Research Foundation (DFG) on when to use an ethical board in humanities and social sciences. We refer to these guidelines because there are none specifically for software engineering research and humanities and social sciences are the closest related branch of science for our research.

Demographics
We detail the background and experience of the participants in our study in the following sections.
. Experience in developing model transformations ( 12 )
Our survey captured model transformation developers with a wide range of experience. The experience span ( 13 ) ranges from the least experienced participant with half a year of experience up to the most experienced one with years. Figure shows a histogram of the experience stated by participants. Over half of all participants have between to ten years of experience in writing model transformations. Three stated that they have more than years in total. On average our participants have years of experience.
How much time participants spend developing transformations each month ( 14 ) also varies greatly. Some participants have not developed transformations recently whereas others stated that they spend or more hours each month on transformation development. Figure shows an overview of the hours participants spend each month developing transformations. The vast majority spends around to hours each month on transformation development. Nine stated that they did not develop any transformation in recent times. On average our participants spend about hours per month developing model transformations.
. Languages used for developing model transformations ( 10 ) and experience therein ( 11 )
To develop their transformations, participants use a wide array of languages. In total languages ( 10 ) have been named of which are unique languages used only by a single participant. Surprisingly, the language that has been used by the most participants is Java, a general purpose language. Java has been used by of the participants. The most used MTL is ATL with users, closely followed by another GPL, namely Xtend, with users. Table shows how many participants use each of the ten most used languages for developing transformations. Overall, the prevalence of general purpose programming languages is higher than expected. This might be explained by the large number of existing MTLs, which reduces the number of users per language, while only four different GPLs are used.
. Sizes ( 12 , 14 )
The size distribution of meta-models ( 15 ) transformed by participants is shown in Figure . On the x-axis the given intervals of meta-model sizes are shown and on the y-axis the distribution for each participant is shown. For example, the first ridge line at the bottom of Figure shows the answers of a participant who has stated that % of their transformations revolve around meta-models with or fewer meta-model elements.
The figure illustrates that most transformations involve meta-models with to meta-model elements. Moreover, most participants have some experience with small meta-models while only a handful of them have experience with transformations involving large meta-models of more than . elements.
The size distribution of model transformations ( 17 ) written by participants is shown in Figure . Similarly to the meta-model sizes, the figure illustrates that most participants have some experience with small transformations of sizes up to lines of code. Most also have experience with large transformations up to . lines of code. More than % of all participants also have experience with large and very large transformations ranging from . up to more than . lines of transformation code.
Overall the experience of our participants includes many moderately large to large transformations. This strengthens our assumption that their answers are meaningful for our study.
. Meta-model sanity ( 17 )
Participants agreed that the vast majority of meta-models they transform are well structured ( 20 ). This means there is little to no additional burden put onto development solely due to unfavourably structured meta-models. The distribution of structure assessment per participant is shown in Figure . The situation is different with documentation ( 21 ). Most participants stated that they have experience with badly or even very badly documented meta-models (Figure ). For many participants, this constitutes the majority of meta-models they work with.

Results
In this section, we present the results of our analysis of the questionnaire responses using universal structure modelling structured around the research questions RQ -. The quantitative results for all influences between MTL capabilities and MTL properties are shown in Table in  The rest of this section presents our results in context of the four research questions. We focus on the most salient influences that we deem interesting for the respective research question. Detailed interpretation and discussion of the implications of the presented results are done in Section .
. RQ : Which of the hypothesised interdependencies withstands a test of significance? & RQ : What additional interdependencies arise from the analysis that were not initially hypothesised?
Our first research question is aimed at evaluating the accuracy of the structure model developed in the previous study (Höppner et al. ). We do so by subjecting all hypothesised influences to a significance test during analysis. The significance test can also be used to directly gain insights into interdependencies missed in the initial model. Thus we discuss both the rejection of previously hypothesised influences as well as the extension of the model through newly discovered significant interdependencies in this section.
Most initially hypothesised influences withstand the test of significance but there are several exceptions. Most notably all but one (Maintainability) of the hy-
Regarding the moderating effects, our findings suggest that a nuanced view is warranted. The hypothesis that context moderates all influences on an MTL Property still holds but the strength of the moderation effects varies greatly.
As hypothesised, we are able to observe that Comprehensibility and Ease of Writing are the two properties moderated by the most context variables. But the moderation is only significant for a handful of influences on these properties. This can be seen, e.g., in the moderation effects of Meta-Model Size on influences on Comprehensibility depicted in Table in Appendix A. Changes in the meta-model sizes participants worked with had next to no effect on how their usage of Bidirectionality functionality affected their view on the Comprehensibility of transformations. The impact on the influence of Model Management on Comprehensibility is orders of magnitude higher.
Another observation that stands out is the impact of Language Choice and Language Experience. The moderation effects of both variables are negligible or even for all influences. We believe this is due to the large number of languages considered in this study. It makes analysing the effects of choosing one of the languages difficult.
Overall the results for research questions RQ & suggest that our initial structure model contains many relevant interdependencies but several more have to be considered as well. We do have to reject several direct influences due to low significance and moderation effects have to be considered on a per influence basis instead of being generalised for each MTL Property.
. RQ : How strong are the influences of model transformation language capabilities on the properties thereof?
Our second research question is intended to provide numbers that can help to identify the most important factors to consider when evaluating the advantages and disadvantages of model transformation languages empirically. We do this by considering both the average simulated effect of influences calculated by NEUSREL as well as the overall explained absolute deviation of influences compared to each other. As explained earlier in this section, all numbers can be found in Table . Overall, the effects identified in our analysis are lower than anticipated. They range from . down to . e-. We expected some effects to be low, mainly those from non-significant interdependencies, but the fact that even significant effects are on the order of . is surprising. We assume this stems from the large number of variables that are involved and the overall complexity of the matter under investigation. Nonetheless, we believe there are meaningful insights that can be drawn when comparing the influences for each MTL Property with each other.
Of the influences hypothesised from our previous interview study, Traceability is the most impactful MTL Capability. Its usage exerts the highest influence on perceived Comprehensibility with 0.29. Similarly, it has the highest influence on Ease of Writing, though with a value of 0.0021 the effect is small. We were, however, already able to show empirical evidence that MTLs utilising automatic trace handling provide clear advantages for writing transformations compared to GPLs (Höppner, Kehrer, and Tichy ).
Please note that the significance values obtained through the NEUSREL tool may exhibit reduced accuracy compared to standard approaches due to the bootstrapping method used for their estimation.

For the properties Tool Support, Maintainability and Productivity the availability of Reuse Mechanisms seems to be the strongest driving factor with an average simulated effect (ASE) of 0.1, 0.1, 0.1 and 0.2, respectively. No other factor has an ASE or effect strength as high as Reuse Mechanisms for these properties. This result is surprising as the influences were not raised even once during our interview study. Overall, automatic tracing and reuse mechanisms appear to be the most influential factors for MTL properties. This suggests to us two main pathways for further research. First, to improve model transformation languages, more research should be devoted to developing effective ways to reuse transformations or parts of transformations. From our experience, current mechanisms are hard to use and are especially unsuited for different use-cases. Second, the first area to address for improved adoption of model transformation concepts in general purpose languages should be the development of mechanisms for automatic trace handling.
. RQ : How strong are moderation effects expressed by the contextual factors use-case, skills & experience and MTL choice?
As expressed in Section . the results of our analysis suggest that a more nuanced view of moderation effects is warranted. In this section we go into detail on these nuances.
As hypothesised, the size of meta-models moderates the influences on Comprehensibility. The moderation strength differs greatly between the different causing factors though. For example, Meta-model size exerts the strongest moderation on the influence of Model Management on Comprehensibility with 0.14. All other moderation effects are far lower. The second highest moderation effect, the moderation of Meta-model size on the influence of Traceability on Comprehensibility, is about half as strong (0.0778) and the lowest, the moderation of Meta-model size on the influence of Bidirectionality functionality on Comprehensibility, is only 0.0009. The moderations make sense intuitively as larger meta-models would make implementing these tasks manually more labour intensive and thus clutter the code unnecessarily.
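The relative sizes of the three moderation effects named above can be checked with a few lines of arithmetic. The values 0.14, 0.0778 and 0.0009 are taken from the text; the variable names and the choice to normalise by the strongest effect are assumptions made for this sketch.

```python
# Moderation of Meta-model size on influences on Comprehensibility,
# keyed by the influencing MTL capability (values as reported in the text).
moderation_on_comprehensibility = {
    "Model Management": 0.14,
    "Traceability": 0.0778,
    "Bidirectionality": 0.0009,
}

strongest = max(moderation_on_comprehensibility.values())
weakest = min(moderation_on_comprehensibility.values())

# Each effect expressed as a share of the strongest one.
relative_strength = {
    capability: value / strongest
    for capability, value in moderation_on_comprehensibility.items()
}
# How many times stronger the strongest effect is than the weakest.
spread = strongest / weakest

if __name__ == "__main__":
    for capability, share in relative_strength.items():
        print(f"{capability}: {share:.2f} of the strongest effect")
    print(f"strongest / weakest: {spread:.0f}")
```

With these values, the Traceability moderation is roughly 0.56 of the strongest effect, and the strongest effect is roughly 156 times the weakest.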
Model size exerts similar moderation effects as metamodel size. Its strongest moderation effect is also on the influence of Model Management on Comprehensibility (0.36). Moreover, Model size also strongly moderates the influence of Traceability functionality on the Ease of Writing transformations (0.17). Most other moderation effects of Model size are far lower than 0.1.
Transformation size seems to be the most relevant moderating factor across the board. It has many noteworthy moderation effects on all influences of MTL Capabilities on Tool Support, none being less than 0.16, and Productivity, most being above 0.12. We assume this is because the larger transformations get, the more reliant developers are on tooling and abstractions that reduce the development effort.
Another interesting effect we found is, that devel-
Overall, the size of transformations is, in our opinion, the most relevant moderating variable. The assumption on the relevance of language choice could, however, not be confirmed. This is most likely due to the large number of languages each participant has had experience with, which weakens the ability to elicit the effect of differences in language choice between participants.

Discussion
The results of our analysis provide useful insights for research on model transformation languages. In this section, we discuss the implications of our results for evaluation and development of MTLs. Additionally, we provide a critical evaluation of our methodology with regards to the goals of this study.

. Implications of results
The topic of influences on the quality properties of model transformation languages is vastly complex, as reflected in the already large structure model which we set out to analyse. While we were able to reject some of the hypothesised influences, our analysis also identified several new influences. As a result, the structure model depicting the influences grew in complexity, further highlighting the need for comprehensive studies of the factors that influence MTL quality properties. The updated structure model can be seen in Figure . It contains more interdependencies than the one we started our analysis with.
Our analysis produced a number of interesting observations that have important implications for further research. In particular, we now discuss the implications for empirical evaluations. Additionally, we highlight the implications of our results for further development of MTLs and domain-specific features thereof.

. . Suggestions for further empirical evaluation studies
Traceability is one of the most important factors to consider when it comes to the development of model transformations. This is because it has the strongest influence on the perceived quality of both the ease of writing and the comprehensibility of the resulting code. It is crucial to consider scenarios where tracing is involved in order to properly evaluate the value of MTL abstractions for writing and comprehending transformations. Additionally, it is important to evaluate scenarios where tracing is not necessary to understand the difference that MTL abstractions can make. To truly understand the relevance of this feature, it is also important to assess how many real-world use cases require it. By taking all of these factors into account, it is possible to gain a comprehensive understanding of the value of MTL abstractions for writing and comprehending transformations.
For evaluation of Maintainability, Reuse Mechanisms as well as Model Traversal functionality are important capabilities to consider. We therefore believe that researchers focusing on such an evaluation must make sure to use transformations that utilise these capabilities. Moreover, the most important context to consider is the semantic gap between input and output metamodels. Empirical evaluations focusing on maintainability should therefore make sure to evaluate transformation cases with varying degrees of differences between input and output meta-models. These studies should then analyse how much the effectiveness of MTLs and GPLs changes in light of the semantic gap between input and output. When selecting transformations for evaluation, it is essential to consider their size. Our results have shown that size has the most significant impact on the influence of other factors on properties. Put differently, the larger the transformation, the more noticeable the effect of all capabilities will be. As such, it is imperative to focus on large transformation use-cases when designing a study to evaluate MTLs.

. . Suggestions on language development
For us, the most surprising finding of this study is the importance of reuse functionality. The quality attributes tool support, maintainability, productivity and reusability are all most influenced by it. This is especially surprising because there was no indication of this in our interviews (Höppner et al. ). We suppose this influence stems from the fact that reuse mechanisms allow for more abstraction and thus less code that can be developed and maintained more efficiently.
As a result we believe that more focus should be put on developing transformation specific reuse mechanisms. We are aware that some languages, e.g. ATL, already provide general reuse mechanisms through concepts like inheritance. However, these concepts are limited by the fact that they rely on the object-oriented nature of the involved models. This means that they can only be used to define reusable code within transformations of a single meta-model. Defining transformation behaviour that can be reused between different meta-models is not possible. But this would be important to further reduce redundancy in transformation development.
We therefore believe that it is important to focus on the development of reuse mechanisms tailored to MTs. In order to stand out compared to the reuse mechanisms of GPLs, it may be valuable to explore ways to define and reuse common transformation patterns independently of meta-models. Higher order transformations are sometimes used to allow reuse too (Kusel et al. ), but from our experience current implementations are too cumbersome to be used productively. Chechik et al. ( ) provide a number of suggestions for transformation specific reuse mechanisms but to the best of our knowledge there exist no implementations of their concepts.
. Interesting observations outside of USM
When discussing model transformation languages, it is often stated that they are only demonstrated on 'toy examples' that have little to no real world value. This argumentation has for example been raised several times in our previous interview study (Höppner et al. ). However, the demographic data collected in our study disputes this.
There are several participants who stated that they have worked solely on small transformations with small meta- and input models. But this group is opposed by a similarly large group of participants who have worked with huge transformations, dissimilar and large meta-models as well as large inputs. From this we conclude that there are large use-cases where model transformations and MTLs are applied but they rarely get described in publications. It seems likely that such examples are not used for highlighting important aspects authors want to discuss due to the space that describing such cases would take up. However, we argue that it is paramount that such case studies are published to diminish the cynicism that MTLs are only useful for small examples.
Another noteworthy observation based on the demographic data of our participants is that documentation pertaining to meta-models is predominantly perceived as inadequate. We believe that this is primarily due to the fact that many of the meta-models stem from research projects that prioritize expeditious prototyping over the long-term viability of the artefacts. Nonetheless, we are convinced that there is an urgent need to enhance the documentation surrounding model transformations. This issue is not limited solely to the meta-models, but also extends to the languages that are known for their challenging learning curve because of lack of tutorials (Höppner et al. ).
. Critical Assessment of the used methodology
The appeal of using structural equation modelling for analysing the responses to our survey was to have a method of analysis that can be used to investigate a complex hypothesis system in its entirety. Moreover, analysis is straightforward after an initial setup due to the sophisticated tooling for this methodology. Instead of presenting participants with a case that they should assess, we also opted for querying them on their overall assessment of MTL quality attributes. These design decisions have implications and ramifications that we discuss in this section.
First, the effects observed in our study are small. We assume this stems from the intricate and large structure model and the comparatively small sample size. As explained in Section it is suggested to have between to times as many participants as the largest number of parameters to be estimated in each structural equation. In light of the newly discovered paths in our structure model, the total number of participants is close to the minimum sample size required. Moreover, because of the large number of influences we do expect the influence of a single factor to be much smaller than in structure models where only -factors are relevant. The results therefore reinforce our assessment that it is a very complex topic.
We also ran into some difficulties when using NEUSREL to analyse our data. The structure model was so large that sometimes the tool crashed during calculations. The online tooling to set everything up was also painfully inefficient leading to more problems during setup like browser crashes. It took us some trial and error to find a way to get everything set up and run the analysis without crashes.
We chose to execute a study based on our study design in hopes of producing a complete theory independent of the use case under consideration. The results exhibit less effect strength but we believe them to be more externally valid. Nonetheless, we think that several additional studies need to be conducted to confirm our results for different use-cases.

Threats to validity
Our study is carefully designed and follows standard procedures for this type of study. There are, however, still threats to validity that stem from design decisions and limitations. In this section we discuss these threats.

. Internal Validity
Internal validity is threatened by manual errors and biases of the involved researchers throughout the process.
The two activities where such errors and biases can be introduced are the subject selection and question creation. The selection criteria for study subjects are designed in such a way that no ambiguities exist during selection. This prevents researcher bias.
The survey questions and answers to the questions pose another threat to internal validity. We used neutral questions to prevent subconsciously influencing the opinions of research subjects. We also provide explanations for ambiguous terms used in the survey. However, there are several instances where we can not fully ensure that each participant interprets terms the same way. The questions on quality properties of model transformation languages allow room for interpretation in that we do not provide a clear metric what terms such as 'Very Comprehensible' or 'Very Hard to write' mean. Similarly, the questions on meta-model quality leave room for interpretation on the side of participants. We opted for this limitation because there are no universal ways to quantify such estimates and because the subjective assessment is what we want to collect. The reason for this is, that subjective experiences are the main driving factor for all discussions on development when people are the main subject.
To ensure overall understandability and prevent errors in the setup of the survey we used a pilot study.
External Validity
External validity is threatened by our subject sampling strategy and the limitations on the survey questions imposed by the complexity of the subject matter.
We utilise convenience sampling. Convenience sampling can limit how representative the final group of participants is. Since we do not know the target population's makeup, it is difficult to assess the extent of this problem.
Using research articles as a starting point introduces a bias towards researchers. There is little potential to mitigate this problem during the study design, because there exists no systematic way to find industry users.
Due to the complexity and abstractness of the concepts under investigation, a measurement via reflective or formative indicators is not possible. Instead, we use single-item questions. We further assume that positive and negative effects of a feature are more prominent if the feature is used more frequently. This can have a negative effect on the external validity of our results. However, we consciously accepted these limitations to be able to create a study that concerns itself with all factors and influences at once.

Construct Validity
Construct validity is threatened by inappropriate methods used for the study.
Using the results of online surveys as input for structural equation modelling techniques is common practice in market research (Weiber and Mühlhaus ). It is less common in computer science. However, we argue that for the purpose of our study it is an appropriate methodology. This is because the goal of extracting influence strengths and moderation effects of factors on different properties aligns with the goals of market research studies that employ structural equation modelling.

Conclusion Validity
Conclusion validity is mainly threatened by biases of our survey participants.
It is possible that people who do research on model transformation languages or have used them for a long time are more likely to see them in a positive light. As such, there is the risk that too few negative experiences will be reported in our survey. However, this problem did not present itself in a previous study by us on the subject matter (Höppner et al. ). In fact, researchers were far more critical in dealing with the subject. As a result, there might be a slight positive bias in the survey responses, but we believe this to be negligible.

Related Work
There are numerous works that explore the possibilities gained through the usage of MTLs, such as automatic parallelisation (Sanchez Cuadrado et al. ; Biermann, Ermel, and Taentzer ; Benelallam et al. ), verification (Lano, Clark, and Kolahdouz-Rahimi ; Ko, Chung, and Han ) or simply the application of difficult transformations (Anastasakis et al. ). There are, however, only a few works that evaluate the languages to gain insights into where specific advantages or disadvantages associated with the use of MTLs originate.
Several other works that can be related to our study also exist. ). The goal of the survey was to identify reasons why developers decided to use or dismiss MTLs for writing transformations. They also tried to gauge the community's sentiment on the future of model transformation languages. At ICMT' , where the results of the survey were presented, they then held an open discussion on this topic and collected the responses of participants. Their results show that MTLs have fallen in popularity. They attribute this to three types of issues (technical issues, tooling issues and social issues) as well as to the fact that GPLs have assimilated many ideas from MTLs. The results of their study are a major driver in the motivation of our work. While they identified issues and potential avenues for future research, their results are qualitative and broad, which we try to improve upon with our study.
In a prior study of ours (Götz, Tichy, and Groner ), we conducted a structured literature review, which has formed the basis of much of our work since then. The literature review aimed at extracting and categorising claims about the advantages and disadvantages of model transformation languages as well as the state of empirical evaluation thereof. We searched over publications for this purpose and extracted that directly claim properties of MTLs. In total claims were found and categorised into quality properties of model transformation languages. The results of the study show that few to no empirical studies evaluating MTLs exist and that there is a severe lack of context and background information that further hinders their evaluation.
Lastly, there is our interview study (Höppner et al. ), the data of which forms the basis for the reported study. We interviewed people on what they believe the most relevant factors are that facilitate or hamper the advantages of MTLs for the different quality properties identified in the prior literature review. The interviews brought forth insights into the factors from which the advantages and disadvantages of MTLs originate and suggested a number of moderation effects on the effects of these factors. These results form the data basis for this study.

Empirical Studies on Model Transformation Languages
Hebig et al. ( ) report on a controlled experiment to evaluate how the use of different languages, namely ATL, QVTo and Xtend, affects the outcome of students solving several transformation tasks. During the study, student participants had to complete a series of three model transformation tasks. One task focused on comprehension, one on modifying an existing transformation, and one required participants to develop a transformation from scratch. The authors compared how the use of ATL, QVTo and Xtend affected the outcome of each of the tasks. Unfortunately, their results show no clear evidence of an advantage when using a model transformation language compared to Xtend. However, they concede that the conditions under which the observations were made were narrow.
We published a study on how much complexity stems from what parts of ATL transformations (Götz and Tichy ) and compared these results with data for transformations written in Java (Höppner, Kehrer, and Tichy ) to elicit advantageous features in ATL and to explore what use-cases justify the use of a general purpose language over a model transformation language. In the study, the complexity of transformations written in ATL was compared to the same transformations written in Java SE and Java SE , allowing for a comparison and a historical perspective. The Java transformations were translated from the ATL transformations using a predefined translation schema. The results show that, while new language features in Java, like the Streams API, allow for significant improvement over older Java code, the relative amount of complexity aspects that ATL can hide stays the same between the two versions.
Gerpheide, Schiffelers, and Serebrenik ( ) use a mixed-method study consisting of expert interviews, a literature review and introspection to formalize a quality model for the QVTo model transformation standard. The quality model is validated using a survey and used to identify the necessity of quality tool support for developers.
We know of two study templates for evaluating model transformation languages that have been proposed but not yet used. Kramer et al. ( ) propose a template for a controlled experiment to evaluate the comprehensibility of MTLs. The template envisages using a questionnaire to evaluate the ability of participants to understand what the presented transformation code does. The influence of the language used for the transformation should then be measured by comparing the average number of correct answers and the average time spent filling out the questionnaire. Strüber and Anjorin ( ) also propose a template for a controlled experiment. The aim of the study is to evaluate the benefits and drawbacks of rule refinement and variability-based rules for reuse. The quality of reusability is assessed by measuring the comprehensibility as well as the changeability observed in bug-fixing and modification tasks.

Conclusion
Our study provides the first quantification of the importance of model transformation language capabilities for the perception of quality attributes by developers. It once again highlights the complexity of the subject matter, as the effect sizes of the influences are small and the final structure model grew in size.
As demonstrated by the number of influences contained in the structure model, many language capabilities need to be considered when designing empirical studies on MTLs. The results, however, point towards Traceability and Reuse Mechanisms as the two most important MTL capabilities. Moreover, the size of the transformations provides the strongest moderation effects for many of the influences and is thus the most important context factor to consider.
Apart from implications for further empirical studies, our results also paint a clear picture for further language development. Transformation-specific reuse mechanisms should be the main focus, as shown by their relevance for many development-lifecycle-focused quality attributes such as Maintainability and Productivity.

Conflict of Interests
The authors have no competing interests to declare that are relevant to the content of this article.

We now aim to quantitatively assess the interview results to confirm or reject the influences and moderation effects posed by different factors and to gain insights into how valuable different factors are to the discussion.
As an expert in the field of model2model transformations, your opinion is of high value for us because your answers can provide meaningful insights.
Participating in the survey will take about 25 minutes.
There are 3 pages in this survey.
There are 35 questions in this survey.

Quality properties of Model Transformation Languages
In the following you will assess quality attributes of model transformations and the languages used for writing them.
Each question presents a description of the quality attribute that is being assessed. Comprehensibility describes the degree of effectiveness and efficiency with which the purpose and functionality of a transformation can be understood. Tool Support describes the degree of effectiveness and efficiency with which tools support developers in their effort.

Capability utilisation of Model Transformation Languages
In the following you will be asked to estimate how often you use certain capabilities of model transformation languages.

How many elements do the meta-models involved in your transformations have? Please estimate the percentage of your use cases that fall in the following ranges.
Each answer must be between 0 and 100. The sum must be at most 100. Only integer values may be entered in these fields.
Please write your answer(s) here: E.g. if half of the meta-models in your transformations have 25 elements and the other half are meta-models with 4 elements, you would put 50 for #elements ≤ 10 and 50 for 20 < #elements ≤ 50.
#elements ≤ 10; 10 < #elements ≤ 20; 20 < #elements ≤ 50; 50 < #elements ≤ 100; 100 < #elements ≤ 1.000; #elements > 1.000

How large are the models you transform measured in number of model elements? Please estimate the percentage of your use cases that fall in the following ranges.
Each answer must be between 0 and 100. The sum must be at most 100. Only integer values may be entered in these fields.
Please write your answer(s) here: E.g. if 1/3 of all models you transform contain 200 elements and the rest are larger than 100.000 elements, you would put 33 for 100 < #elements ≤ 1.000 and 66 for #elements > 100.000.
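
The answer format used in these questions (integer percentages per range, summing to at most 100) can be sketched as follows. This is an illustrative sketch only and not part of the survey tooling: the function names are hypothetical, the range boundaries are those listed for the meta-model size question, and rounding down is an assumption that matches the 33/66 example.

```python
from bisect import bisect_left
from typing import List, Sequence

# Upper bounds of the meta-model size ranges listed in the survey;
# the last range (#elements > 1.000) is open-ended.
UPPER_BOUNDS = [10, 20, 50, 100, 1000]


def range_index(num_elements: int) -> int:
    """Index of the range a size falls into (0 is '#elements <= 10')."""
    return bisect_left(UPPER_BOUNDS, num_elements)


def to_percentages(element_counts: Sequence[int]) -> List[int]:
    """Turn a list of (meta-)model sizes into integer percentages per range."""
    if not element_counts:
        raise ValueError("at least one size is required")
    counts = [0] * (len(UPPER_BOUNDS) + 1)
    for n in element_counts:
        counts[range_index(n)] += 1
    total = len(element_counts)
    # Integer division rounds down, so the sum can stay below 100.
    return [100 * c // total for c in counts]


def is_valid_answer(percentages: Sequence[int]) -> bool:
    """Check the survey constraints: integers in [0, 100], sum at most 100."""
    return (
        all(isinstance(p, int) and 0 <= p <= 100 for p in percentages)
        and sum(percentages) <= 100
    )


if __name__ == "__main__":
    answer = to_percentages([25, 4])  # half with 25 elements, half with 4
    print(answer, is_valid_answer(answer))
```

Rounding down rather than to the nearest integer is one way to guarantee that the entered values never exceed 100 in total, which is why thirds become 33 and 66 rather than 33 and 67.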
Each answer must be between 0 and 100. The sum must be at most 100. Only integer values may be entered in these fields.
Please write your answer(s) here: The structure of a meta-model is defined by the number of elements and their associations with each other.

How Dissimilar or Similar are the attribute types of input and output elements that are related to each other in your transformations? Please estimate the percentage of your use cases that fall in the following ranges.
Each answer must be between 0 and 100. The sum must be at most 100. Only integer values may be entered in these fields.
Please write your answer(s) here: E.g. when mapping a Class to a

How Bad or Well structured are the meta-models in your transformations? Please estimate the percentage of your use cases that fall in the following ranges.
Each answer must be between 0 and 100. The sum must be at most 100. Only integer values may be entered in these fields.
Please write your answer(s) here: A well structured meta-model does for example not split related data over a large number of meta-model elements if it can be avoided.

How Bad or Well documented are the meta-models in your transformations? Please estimate the percentage of your use cases that fall in the following ranges. Only consider documentation of the meta-model itself, not documentation in your code.
Each answer must be between 0 and 100. The sum must be at most 100. Only integer values may be entered in these fields.
Please write your answer(s) here: Documentation means description of the meta-model elements, their attributes and associations as well as any invariants on them.

What percentage of your use cases require Synchronization between "input" and "output"?
Only numbers may be entered in this field. Your answer must be between 0 and 100.
Please write your answer here:
very bad; bad; neither well nor bad; well; very well