Empirical Software Engineering, Volume 23, Issue 4, pp 2323–2358

ProMeTA: a taxonomy for program metamodels in program reverse engineering

  • Hironori Washizaki
  • Yann-Gaël Guéhéneuc
  • Foutse Khomh
Open Access

Abstract

To support program comprehension, maintenance, and evolution, metamodels are frequently used during program reverse engineering activities to describe and analyze constituents of a program and their relations. Reverse engineering tools often define their own metamodels according to the intended purposes and features. Although each metamodel has its own advantages, its limitations may be addressed by other metamodels. Existing works have evaluated and compared metamodels and tools, but none have considered all the possible characteristics and limitations to provide a comprehensive guideline for classifying, comparing, reusing, and extending program metamodels. To aid practitioners and researchers in classifying, comparing, reusing, and extending program metamodels and their corresponding reverse engineering tools according to the intended goals, we establish a conceptual framework with definitions of program metamodels and related concepts. We confirmed that any reverse engineering activity can be clearly described as a pattern based on the framework from the viewpoint of program metamodels. Then the framework is used to provide a comprehensive taxonomy, named Program Metamodel TAxonomy (ProMeTA), which incorporates newly identified characteristics into those stated in previous works, which were identified via a systematic literature review (SLR) on program metamodels, while keeping the orthogonality of the entire taxonomy. Additionally, we validate the taxonomy in terms of its orthogonality and usefulness through the classification of popular metamodels.

Keywords

Reverse engineering Program metamodels Program comprehension and analysis Taxonomy 

1 Introduction

Program reverse engineering plays an important role during software maintenance and evolution activities. This is because reliable information is often only embedded in the source code when maintaining and/or evolving a software system (Canfora et al. 2011). Program reverse engineering is the process of analyzing program source code written in general purpose programming languages (Garwick 1968; Buchner and Matthes 2006) to identify program code elements and create representations of a program at a certain level of abstraction.

Metamodels exist to describe and process software programs in program reverse engineering for program comprehension, maintenance, and evolution. They are essential for developing reverse engineering tools because they define constituents and relations to be identified in programs, enabling and circumscribing the features of the tools.

Reverse engineering tools often define their own metamodels according to their purposes and intended features (Ebert et al. 2002). The code representation (i.e., metamodel) depends on the actual reverse engineering problem and the aspired program analysis technique. Each reverse engineering tool must choose the appropriate abstraction level of the metamodel. For many reverse engineering activities, only a broad overview of the system is necessary. Consequently, the amount of data extracted by language analyzers (such as compilers based on low-level metamodels) may become too large to comprehend or analyze in a reasonable amount of time (Sim and Koschke 2001). On the other hand, some analyses require details to ensure high precision and recall in the analysis results.

Each metamodel has advantages as well as limitations, which may be resolved by other metamodels. By conducting a systematic literature review (SLR) as a rigorous survey on program metamodels, we found that metamodels can be characterized by the following exhaustive orthogonal features: target language, abstraction level, meta-metamodel, exchange format, processing environment, definition, program metadata and history data, and quality.

Regarding the abstraction level, low-level metamodels represent the complete code syntax, high-level ones represent abstract architectural constituents, while mid-level ones represent neither the complete code syntax nor the architectural constituents (Lethbridge et al. 2004). Due to the differences between the metamodels, it is difficult to compare reverse engineering tools. These differences also lead to problems when exchanging information among tools (Lin and Holt 2004). For example, fact extractors often disagree and emit different facts for the same source program, undermining the users’ understanding of the program and decreasing their confidence in the extractor (Lin and Holt 2004). Likewise, the FAMOOS Information Exchange Model (FAMIX) (Demeyer et al. 1999) and the Knowledge Discovery Metamodel (KDM) (OMG 2011b), which are popular metamodels, are applicable to the same programs but have slightly different structures.

To assist practitioners and researchers in classifying, comparing, reusing, and extending program metamodels and the corresponding reverse engineering tools according to their goals, some works have evaluated and compared metamodels and tools (Jin and Cordy 2006; Izquierdo and Molina 2014). However, the comparisons and evaluations were conducted independently and do not provide a comprehensive guide of all the possible characteristics and limitations of metamodels.

The goal of this paper is to provide a comprehensive taxonomy and use this taxonomy to classify some popular metamodels. Our taxonomy, named Program Metamodel TAxonomy (ProMeTA), and the classification results support the classification, comparison, reuse, and extension of program metamodels and reverse engineering tools in various usage scenarios. To make the taxonomy and classification results consistent, we establish a conceptual framework with definitions of program metamodels and related concepts. The framework allows our taxonomy to incorporate newly identified characteristics into existing ones while keeping the orthogonality of the entire taxonomy.

We address the following research questions.
RQ1: Does ProMeTA cover all possible characteristics and limitations in existing works that evaluate and compare program metamodels?

RQ2: Does ProMeTA have orthogonality in its classification features?

RQ3: Is ProMeTA useful for guiding practitioners and researchers? Possible use cases include creating or choosing reverse engineering tools, and communicating or researching program metamodels and reverse engineering tools.

This paper is an extended version of a paper presented at the 32nd International Conference on Software Maintenance and Evolution (ICSME) (Washizaki et al. 2016a). We have substantially extended the explanations of the taxonomy construction process and of all the program metamodels found in the SLR. Moreover, we have added a reverse engineering pattern to make the conceptual framework and related terminology comprehensive. We summarize our contributions as follows:
  • We developed a conceptual framework along with a pattern for program reverse engineering from the viewpoint of metamodels.

  • Using an SLR to identify necessary features, we created a comprehensive taxonomy called ProMeTA. ProMeTA characterizes program metamodels in reverse engineering based on our framework.

  • We classified existing popular program metamodels based on our taxonomy.

The remainder of this paper is organized as follows. Section 2 provides the background. Section 3 proposes our conceptual framework together with a program reverse engineering pattern, while we show the taxonomy construction process in Section 4. Section 5 shows our taxonomy. Section 6 validates and discusses our work. Finally, we provide our conclusion and future work in Section 7.

2 Background

2.1 Program Reverse Engineering

Reverse engineering is the process of analyzing a subject system in order to identify the system’s constituents and create representations in other forms or at higher levels of abstraction (Chikofsky and Cross 1990). Although reverse engineering can be initiated from any level of abstraction, this paper focuses on program reverse engineering, i.e., the process of analyzing program source code to identify the program’s constituents and create a representation of the program. This work is motivated by the fact that, when maintaining a software system, the source code of the program is often the only artifact that contains reliable information (Canfora et al. 2011).

Moreover, this paper limits the target program codes to those written in general purpose programming languages (GPLs) (Garwick 1968), such as C and Java. Compared to domain specific languages (DSLs), which are used for a specific problem, GPLs are used to solve a broad spectrum of problems (Buchner and Matthes 2006). DSLs usually offer higher-level constructs (e.g., rules) in comparison to GPLs (Jouault et al. 2006). Thus, it is challenging for metamodels to describe GPL programs at appropriate abstraction levels according to specific purposes, such as program analysis (Washizaki and Fukazawa 2005), visualization (Ishizue et al. 2016), etc.

For example, the first author developed an automatic component-extraction system with its visualization (shown in Fig. 1) targeting Java source code (Washizaki and Fukazawa 2005), which parses the Java source code, and selects only basic structural data such as classes, methods, fields, and dependencies among them with respect to a Java program metamodel.
Fig. 1

Example of a view of our component-extraction system

Due to the abovementioned limitation, metamodels that handle only domain specific languages (DSLs) such as SQL and XML are outside the scope of this paper. Moreover, because of this limitation, the SLR includes metamodels and relevant reverse engineering approaches that handle program source code, and excludes those that handle only program bytecode, since bytecode is not written in GPLs but in machine-executable language specifications such as Java bytecode instructions.1

2.2 Program Metamodel

Fact extraction from source codes aims to find pieces of information about a program (e.g., the name of a class or what function calls what function) (Knodel and Calderon-Meza 2004). Fact extraction is often the first step when analyzing a software system during reverse engineering. Before performing any high-level reverse engineering activity, the available information (i.e., facts) must be extracted and aggregated in a fact base or repository (Knodel and Calderon-Meza 2004). The metamodel (i.e., schema), which specifies the constituents and relations to be extracted, is essential to a fact extractor (Knodel and Calderon-Meza 2004).

In addition, schemas are crucial to develop reverse engineering tools since they also specify the underlying semantic model of various analysis services (Favre et al. 2003). From the viewpoint of modeling technology, herein schemas for fact extraction from programs are regarded as program metamodels while the extracted facts are regarded as models of the programs that conform to the corresponding schemas used for the extraction.

3 Terminology and Conceptual Framework

Although program metamodels are used under various contexts (e.g., forward engineering and reverse engineering) and at different abstraction levels from architecture to code, the concept of metamodels is not clearly defined. Indeed, there are many synonyms for “metamodel” including “schema”, “representation”, “format”, and “form”. Moreover, metamodels are often discussed along with standard exchange formats (SEFs) without a clear distinction between the two. For example, Sim and Koschke (2001) report that a workshop focused on SEFs had a presentation addressing “a family of related SEFs including MOF, XMI, UML, XML and CDIF”. However, the Meta Object Facility (MOF) (OMG 2015a) is a meta-metamodel, whereas the others are SEFs, although UML can also serve as a program metamodel.

3.1 Terminology

To establish a common vocabulary, we first define the following core concepts:
  • A model is a simplification of a system with an intended goal (Favre and Nguyen 2005). For example, a diagram showing only the program modules and their dependencies is a model of a program created with the goal of understanding the basic structure.

  • A metamodel is a model of the language that captures the essential properties and features of a model (Clark et al. 2015). In this context, a model is an abstraction of an aspect of the real world, while a metamodel is a further abstraction to describe the model. Although metamodels have primarily been developed and advertised by the Object Management Group (OMG) with its MOF standard (Alanen and Porres 2003) in the context of modelware, metamodels are not limited to MOF-based models. Examples of metamodels include program metamodels in modelware, schemas (or exchange formats) in dataware, and grammars in grammarware (Favre and Nguyen 2005), which are models of program modeling languages, data languages, and programming languages, respectively, in different technological spaces (Kurtev et al. 2002; Wimmer and Kramler 2005). In the ISO/IEC 42010:2011 terminology (ISO/IEC/IEEE 2011), a model is a “view” conforming to a “viewpoint” (i.e., metamodel) (Brunelière et al. 2014).2

3.2 Conceptual Framework

According to the above concepts, program metamodels and related concepts are defined below. Figure 2 shows the relationships among them following the OMG four-layer metamodel hierarchy (OMG 2015a; Kurtev et al. 2002) with some modifications to make it comparable with other model-driven engineering frameworks and views.
  • A program metamodel is a model of a programming language grammar, which represents target programs according to a specific purpose. The elements of any program metamodel must be mapped to (a set of) elements of the corresponding grammar. As shown in Fig. 2, “Program metamodel” is mapped by “Grammar”. A program model must conform to its program metamodel. In Fig. 2, “Program model” conforms to “Program metamodel”. Examples include KDM, FAMIX, and UML.

  • A program metalanguage is a language to describe program metamodels. In Fig. 2, “Program metamodel” conforms to “Metalanguage”. Metalanguages can be classified as metasyntaxes of grammar such as Extended BNF (EBNF) (ISO/IEC 1996) in textual presentation or meta-metamodels of metamodels at certain abstraction levels such as MOF and Eclipse Modeling Framework (EMF) meta model Ecore (Steinberg et al. 2008) usually in a graphic presentation.3

  • A context-free grammar (or simply grammar) is a formal device to specify which strings are part of the language, where the language is a set of strings over a finite set of symbols (Earley 1970).

  • A concrete syntax tree (CST) is a parse tree that pictorially shows how a string in a language is derived from the start symbol of the grammar (Aho et al. 2006).

  • An abstract syntax tree (AST) is a simplified syntactic representation of the source code, excluding superficial distinctions of form and constituents that are unimportant for translation from the tree (Aho et al. 2006). An AST follows an abstract grammar, which is a representation of the original concrete grammar at a higher level of abstraction.

  • An abstract syntax model is a graphical representation of an abstract syntax (tree). Abstract syntax models can be seen as low-level program metamodels. Examples include programming-language-independent AST models such as ASTM (OMG 2011a)4 and programming-language-specific AST models such as Java Metamodel (Kollmann and Gogolla 2001).

  • A standard exchange format (SEF) (or simply an exchange format) is a metamodel (i.e., schema) of model data used to store program models so that they are exchangeable among different tools. In Fig. 2, “Model data” conforms to “Exchange format”. Most of the elements in the exchange format can be mapped to (a set of) elements in the corresponding program metamodel. The exchange format may contain additional information (e.g., visual layout information) that is not included in the corresponding program model. Thus, “Exchange format” might be mapped by “Program metamodel” in Fig. 2. Examples include XML, XML Metadata Interchange format (XMI), Resource Description Framework (RDF), Rigi Standard Form (RSF), Tuple-Attribute Language (TA), GraX (Sim and Koschke 2001), CASE Data Interchange Format (CDIF) (Imber 1991) and MSE (Ducasse et al. 2011). Some of these (e.g., XMI and RDF) are general-purpose exchange formats that can be adapted to software, while others are specific to software (Sim and Koschke 2001).

Fig. 2

Conceptual framework of program metamodels
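The "conforms to" relation between a program model and its program metamodel can be illustrated with a minimal sketch. The metamodel below, with element kinds `Class` and `Method` and relation kinds `contains` and `calls`, is a hypothetical toy and does not reproduce KDM, FAMIX, or UML; all names in the code are assumptions made for illustration. The function `conforms` checks that a model uses only the element and relation kinds that the metamodel declares.

```python
from dataclasses import dataclass, field


@dataclass
class Metamodel:
    """A toy program metamodel (hypothetical, for illustration only)."""
    element_kinds: frozenset
    # relation name -> (required source kind, required target kind)
    relation_kinds: dict


@dataclass
class Model:
    """A program model: named elements plus relations between them."""
    elements: dict = field(default_factory=dict)   # element name -> kind
    relations: list = field(default_factory=list)  # (relation, source, target)


def conforms(model: Model, metamodel: Metamodel) -> bool:
    """Return True if every element and relation is allowed by the metamodel."""
    for kind in model.elements.values():
        if kind not in metamodel.element_kinds:
            return False
    for relation, source, target in model.relations:
        if relation not in metamodel.relation_kinds:
            return False
        source_kind, target_kind = metamodel.relation_kinds[relation]
        if model.elements.get(source) != source_kind:
            return False
        if model.elements.get(target) != target_kind:
            return False
    return True


TOY_METAMODEL = Metamodel(
    element_kinds=frozenset({"Class", "Method"}),
    relation_kinds={
        "contains": ("Class", "Method"),
        "calls": ("Method", "Method"),
    },
)

good_model = Model(
    elements={"Parser": "Class", "parse": "Method", "next_token": "Method"},
    relations=[
        ("contains", "Parser", "parse"),
        ("contains", "Parser", "next_token"),
        ("calls", "parse", "next_token"),
    ],
)

# A class cannot be the source of a "calls" relation in this toy metamodel.
bad_model = Model(
    elements={"Parser": "Class", "parse": "Method"},
    relations=[("calls", "Parser", "parse")],
)

print(conforms(good_model, TOY_METAMODEL))  # True
print(conforms(bad_model, TOY_METAMODEL))   # False
```

In this sketch the metamodel plays the role of a schema: a fact extractor that emits only conforming models gives downstream tools a fixed vocabulary to rely on.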

3.3 Program Reverse Engineering as a Pattern

Based on the conceptual framework, now we can clearly explain any program reverse engineering activity and tool from the viewpoint of program metamodels. Program reverse engineering consists of various transformations such as extraction and abstraction.

Table 1 shows the pattern Transformation to higher abstraction levels that describes a common fundamental process of software transformation in any reverse engineering activity (Washizaki et al. 2016b). The pattern is described in the pattern form consisting of an Alias name (if necessary), a specific Context, a recurrent Problem under the context, its corresponding Solution, and a Known implementation.
Table 1

Pattern: Transformation to higher abstraction levels

Name: Transformation to higher abstraction levels

Context: You are analyzing software to comprehend or maintain it.

Problem: The description of the software contains too much data to be comprehended or analyzed in a reasonable amount of time. You are interested in certain aspects of the software; however, its description is too complex to focus on the particular aspects of interest.

Solution: Transform the software (i.e., Lower-base in Fig. 3) as a source to another as a target at a higher or the same level of abstraction (Higher-base). This is usually done by defining rules mapping from a metamodel at a lower level (i.e., Lower-meta) as the domain to another metamodel at a higher or the same level (i.e., Higher-meta) as the range. Figure 3 shows the elements involved in the transformation. Concrete transformations can be classified into four types: Extraction, Abstraction, View and Store.

  • Extraction transforms code artifacts based on a certain grammar to a set of program facts based on a certain program metamodel. It is usually done by a parser that parses code artifacts.

  • Abstraction transforms program models based on a certain lower metamodel to another model based on a certain higher metamodel. It is usually done by a filter component that queries, selects, and joins necessary data with respect to the higher metamodel; target higher metamodels are sometimes implicitly declared for the purpose of interactive ad hoc abstraction.

  • View transforms program models based on a metamodel to another model based on another visualization metamodel at a similar or almost the same abstraction level. The transformation results are then displayed. Typical examples are HTML tables, UML diagrams, and any general graph representation.

  • Store transforms program models based on a metamodel to model data according to an exchange format at a similar or almost the same abstraction level. Then the results are stored in a repository. Typical examples are XMI files, RDF files, and relational databases.

Known implementation: Any reverse engineering tool.

Related patterns: The following patterns are based on combinations of multiple concrete transformations.

  • Integrated program reverse engineering performs Extraction, Abstraction, Store, and View in its solution.

  • Fact extraction performs Extraction and Store in its solution.

  • Architecture recovery performs Extraction, Abstraction and View in its solution.

The pattern gives specific reverse engineering activities (i.e., Integrated program reverse engineering, Fact extraction, and Architecture recovery; Washizaki et al. 2016b) a common context, problem, and solution. By referring to this pattern, practitioners and researchers can recognize when, why, and how to perform reverse engineering together with underlying metamodels (Fig. 3).
Fig. 3

Structure of Transformation to higher abstraction levels

For example, by referring to the pattern, maintainers may develop a new tool or use an existing tool environment such as MOOSE (Ducasse et al. 2000) to comprehend Java source code. MOOSE can extract program entities and their relationships from Java source code by treating the abstract grammar of Java and FAMIX as the Lower-meta and the Higher-meta, respectively. MOOSE can visualize the extracted information in various forms such as a graph representation. Moreover, MOOSE can store the information in a compact serialization format called MSE.
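The four concrete transformations of the pattern can be sketched as a small pipeline. This is an illustration under stated assumptions, not the implementation of MOOSE or of any tool discussed here: the target is Python source parsed with the standard `ast` module (whose node classes act as the Lower-meta), the fact tuples `(kind, owner, name)` are a hypothetical Higher-meta, and JSON stands in for an exchange format.

```python
import ast
import json

SOURCE = '''
class Parser:
    def parse(self):
        return self.next_token()

    def next_token(self):
        return None
'''


def extract(source: str) -> list:
    """Extraction: parse code into facts of the form (kind, owner, name)."""
    facts = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            facts.append(("class", None, node.name))
            for member in node.body:
                if isinstance(member, ast.FunctionDef):
                    facts.append(("method", node.name, member.name))
    return facts


def abstract(facts: list) -> dict:
    """Abstraction: keep only each class name and its number of methods."""
    summary = {}
    for kind, owner, name in facts:
        if kind == "class":
            summary.setdefault(name, 0)
        elif kind == "method":
            summary[owner] = summary.get(owner, 0) + 1
    return summary


def view(summary: dict) -> str:
    """View: render the abstracted model as plain text for display."""
    return "\n".join(
        f"{name}: {count} method(s)" for name, count in sorted(summary.items())
    )


def store(facts: list) -> str:
    """Store: serialize the facts to JSON, standing in for an exchange format."""
    return json.dumps(
        [{"kind": kind, "owner": owner, "name": name} for kind, owner, name in facts]
    )


facts = extract(SOURCE)
print(view(abstract(facts)))  # Parser: 2 method(s)
print(store(facts))
```

In terms of the related patterns, calling only `extract` and `store` corresponds to Fact extraction, whereas chaining `extract`, `abstract`, and `view` corresponds to Architecture recovery at a toy scale.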

4 Taxonomy Construction

Based on the background described in Section 2 and the vocabulary presented in Section 3, we identify various characteristics to distinguish existing program metamodels. We propose a comprehensive taxonomy, called the Program Metamodel TAxonomy (ProMeTA), to classify program metamodels in the form of feature diagrams based on our conceptual framework. ProMeTA integrates the characteristics stated in existing works with those newly identified, while maintaining the orthogonality of the entire taxonomy. Its construction process is described in detail below.

4.1 Construction Process

The development of a taxonomy can be approached in two different ways: top-down and bottom-up (Unterkalmsteiner et al. 2014; Glass 2002). In the top-down approach, the taxonomy is built upon existing knowledge structures, allowing established definitions and categorizations to be reused, increasing the probability of achieving an objective classification procedure (Unterkalmsteiner et al. 2014).

As previously mentioned, existing works have evaluated and compared program metamodels and tools, but none have provided a comprehensive guide that takes all possible characteristics into account. Therefore, we adopt a top-down approach to design ProMeTA based on our conceptual framework as follows. Figure 4 outlines the process.
  1. A specific taxonomy is designed to accommodate a single, well-defined purpose, which is applicable to various circumstances (Unterkalmsteiner et al. 2014). First, we clearly define the specific purpose of ProMeTA: to support stakeholders in classifying, comparing, reusing, and extending program metamodels in program reverse engineering. Additionally, the taxonomy should support communications among stakeholders, improving the accessibility of the research results in program metamodels and reverse engineering.

  2. Evidence-based Software Engineering (EBSE) (Kitchenham et al. 2004) has been used to provide detailed insights regarding different topics in software engineering research and practice. A Systematic Literature Review (SLR) is known as the recommended EBSE method for aggregating evidence (Kitchenham et al. 2009). Using an SLR, existing works on classification and quality properties of program metamodels and tools are identified. Regarding the paper selection process within the SLR, we referred to the process adopted in another successful SLR (Sharafi et al. 2015). The aim of an SLR is to aggregate existing evidence on the research questions and to support the development of evidence-based guidelines for researchers and practitioners (Kitchenham et al. 2009). During the SLR, several popular metamodels are also identified.

  3. Then, existing classifications and comparisons of program metamodels and related concepts (Lethbridge et al. 2004; Jin and Cordy 2006; Izquierdo and Molina 2014; Bellay and Gall 1997; Armstrong and Trudeau 1998; Lethbridge 1998; Sim et al. 2000; Ferenc et al. 2001, 2002; Arcelli et al. 2005; Amelunxen et al. 2006) are analyzed. This information is merged into one structure in the form of feature diagrams (Kang et al. 1990) by referring to the basic term classification defined in our conceptual framework. Feature diagrams are trees that visualize the following relationships between a parent feature and its subfeatures (i.e., child features): “Mandatory”, “Optional”, “Or”, and “Alternative”. Mandatory means that the subfeatures are required. Optional indicates that the subfeatures do not have to be selected. Or implies that at least one subfeature must be selected. Alternative denotes that exactly one subfeature must be selected. A feature diagram essentially defines a taxonomy (Czarnecki and Helsen 2003). Additionally, the quality properties of program metamodels and related concepts discussed in papers (Favre et al. 2003; Kurtev et al. 2002; Clark et al. 2015; Sim et al. 2000; Ferenc et al. 2002; Tilley et al. 1994; Saint-Denis et al. 2000; Jin 2001; Jin et al. 2002; Czarnecki and Helsen 2003; Christopher 2006; Wu 2010) as identified by the SLR are combined.

  4. Then, by referring to the basic term classification defined in the framework, all the identified characteristics in existing metamodels are added to the taxonomy while maintaining the orthogonality.

  5. Finally, the taxonomy is validated in terms of its orthogonality, coverage, and usefulness by using it to classify the five popular metamodels identified in the SLR.

     
Fig. 4

Overview of taxonomy construction process
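The four feature-diagram relationships used in step 3 (Mandatory, Optional, Or, Alternative) can be made concrete with a small validity check. The sketch below is a minimal illustration; the example diagram is hypothetical and is not the ProMeTA feature diagram, and the class and function names are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    # Relationship to the parent feature:
    # "mandatory", "optional", "or", or "alternative".
    relation: str = "mandatory"
    children: list = field(default_factory=list)


def names(feature):
    """Yield the names of a feature and all of its descendants."""
    yield feature.name
    for child in feature.children:
        yield from names(child)


def is_valid(feature, selected):
    """Check a selection of feature names below an already selected feature."""
    groups = {"mandatory": [], "optional": [], "or": [], "alternative": []}
    for child in feature.children:
        groups[child.relation].append(child)

    # Mandatory: every such subfeature is required.
    if any(c.name not in selected for c in groups["mandatory"]):
        return False
    # Or: at least one subfeature of the group must be selected.
    if groups["or"] and not any(c.name in selected for c in groups["or"]):
        return False
    # Alternative: exactly one subfeature of the group must be selected.
    if groups["alternative"]:
        if sum(c.name in selected for c in groups["alternative"]) != 1:
            return False

    for child in feature.children:
        if child.name in selected:
            if not is_valid(child, selected):
                return False
        elif any(n in selected for n in names(child)):
            # A descendant cannot be selected without its parent.
            return False
    return True


# Hypothetical example diagram (not the ProMeTA diagram itself).
ROOT = Feature("Metamodel", children=[
    Feature("Abstraction level", "mandatory", [
        Feature("Low", "alternative"),
        Feature("Middle", "alternative"),
        Feature("High", "alternative"),
    ]),
    Feature("Exchange format", "optional", [
        Feature("XMI", "or"),
        Feature("RDF", "or"),
    ]),
])

print(is_valid(ROOT, {"Abstraction level", "Low"}))          # True
print(is_valid(ROOT, {"Abstraction level", "Low", "High"}))  # False
```

A classification of a metamodel under such a diagram is then simply a valid selection of feature names.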

4.2 Systematic Literature Review

We searched for papers about program metamodels in reverse engineering using Engineering Village,5 which is a search platform providing access to 12 trusted engineering document databases, such as Ei Compendex and Inspec. Engineering Village allows us to search all recognized scholarly engineering journals, conference proceedings, and workshop proceedings across different databases with a single search query. Moreover, Engineering Village allows us to detect and remove most duplicates in the search results automatically.

Because our main goal is to study the characteristics of GPL-based program metamodels, and the advantages and limitations that they offer to reverse engineering tools, we regard metamodels that deal with program source code as the study subjects, and the program source code processed during reverse engineering tasks as the stimuli. Thus, we define the following three sets of keywords for the search query shown in Fig. 5. In our search query, “*” at the end of a word is a truncation operator that can be replaced with zero or more characters.
  • Subject: “meta model”6 OR “meta models” OR metamodel* . We use this category to find papers that define and/or use a metamodel.

  • Stimuli: “source code” OR “source codes” OR program* . We define this set to find studies based on the types of stimuli that are usually used in program metamodel studies.

  • Task: extract* OR transform* OR generat* . These are simple yet sufficient to identify relevant papers since any reverse engineering objective and application must employ some sort of transformation; for example, extraction and generation can be regarded as types of vertical transformations (Gray et al. 2004).

Fig. 5

Search query
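As a rough sketch of how the three keyword sets could be combined into one query, the code below joins the terms within each set with OR and the sets themselves with AND. The AND combination and the generic Boolean syntax are assumptions made for illustration; they are not confirmed by the text above and need not match the exact query in Fig. 5 or the Engineering Village syntax. The helper `matches` illustrates the stated truncation rule, under which a trailing “*” stands for zero or more characters.

```python
# Keyword sets as listed above (straight quotes replace typographic ones).
SUBJECT = ['"meta model"', '"meta models"', "metamodel*"]
STIMULI = ['"source code"', '"source codes"', "program*"]
TASK = ["extract*", "transform*", "generat*"]


def or_group(terms):
    """Join the terms of one keyword set with OR."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine keyword sets with AND (an assumption, see the lead-in)."""
    return " AND ".join(or_group(group) for group in groups)


def matches(term, word):
    """Apply the truncation rule: a trailing '*' matches zero or more characters."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


QUERY = build_query(SUBJECT, STIMULI, TASK)
print(QUERY)
print(matches("generat*", "generation"))   # True
print(matches("metamodel*", "metamodel"))  # True
```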

4.3 Inclusion and Exclusion Criteria

We defined the following inclusion and exclusion criteria for the SLR. The relevance was verified by reviewing the title, the abstract, and if necessary, the body.

Inclusion criteria:
  a) Studies published in journals or conference proceedings in the form of papers employing metamodels for program reverse engineering targeting program source code written in GPLs. For example, we included studies on program reengineering such as modernization and refactoring only if they employed program metamodels for the explicit reverse engineering phase as part of the entire reengineering process.

  b) Studies that present details and/or complete results if a group reported more than one study on the same topic.

     
Exclusion criteria:
  a) Studies that do not employ a program metamodel.

  b) Studies that are not directly related to program reverse engineering targeting program source code written in GPLs. For example, we exclude studies on model refactoring or transformation if they do not include any reverse engineering phase in the proposed refactoring or transformation process. We also exclude studies on parsers used only for program compilation, even though such parsers (e.g., javac and the Eclipse Java Development Tools (JDT) for Java) have internal metamodels (Heidenreich et al. 2010). For the same reason, we exclude studies focusing only on program transformations, such as Bravenboer et al. (2008). None of these implementations provides an integration with standard metamodelling tools (Heidenreich et al. 2010); i.e., they are not originally intended for program reverse engineering objectives.

  c) Elements of “grey” literature that are not published by trusted, well-known publishers, and do not use a well-defined referee process (Budgen et al. 2011).

  d) Articles not published in English.

     
Fig. 6

The selection process with numbers showing number of papers remaining after each activity

4.4 Paper Selection Process

The process that we adopt to select the relevant papers is presented in Fig. 6. The figure shows the set of activities that we undertook on the left and the number of papers remaining after each activity on the right, as of October 13, 2015. Below, we explain each activity in detail.
  a) Initial search: we execute our query in Engineering Village. The search engine searches the title, the abstract, and the keywords section of the papers for the keywords defined in our query to find the matching papers. The original set of papers returned by our search query contains 1587 papers.

  b) Automatic duplicate removal: we apply Engineering Village’s duplicate removal feature to automatically find and remove duplicate papers for the first 1000 results.7 1462 papers remain in the list.

  c) Manual duplicate removal: we find and remove duplicates manually. Looking at the title, abstract and source (such as the conference name), we check whether the paper is duplicated or not. Although the Engineering Village search engine performed the duplicate removal process, there are still some duplicates in the result because different publishers use different formats to save and display the names of the authors, e.g., using initials instead of full names. 1234 papers remain in the list.

  d) Apply inclusion and exclusion criteria: we perform this activity to check whether each paper is relevant or not by using the criteria. For each selected paper, one author conducts the initial check, while another author confirms the result of the initial check. In case of disagreement, all authors discuss the paper until they reach agreement. We apply the inclusion/exclusion criteria and reduce the number of papers to 50. We use the title, the abstract and the body of the papers to remove irrelevant papers such as model-driven development works without any program metamodels for reverse engineering. 51 non-English papers are removed. 11 proceedings and books are also removed, because they list all of the accepted papers or chapters; we have already selected those related to the study.

  e) Full analysis: we check whether there are multiple papers on the same approach reported by authors belonging to the same group; in that case we include only the paper that presents the most details. Moreover, we replace work summary papers, such as a summary of a Ph.D. work, with their complete versions. We perform the full analysis and remove 6 more papers from the list, leaving 44 papers (Ebert 2008; Bergmayr and Wimmer 2013; Naik and Bahulkar 2004; Chirila and Jebelean 2010; Izquierdo and Molina 2010, 2014; Martinez et al. 2014; Owens and Anderson 2013; Soden and Eichler 2007; Antoniol et al. 2003; Vidács 2009; Strein et al. 2006; Lethbridge et al. 2004; Lin and Holt 2004; Knodel and Calderon-Meza 2004; Brühlmann et al. 2008; Tripathi et al. 2009; Lanza 2003; Pinzger et al. 2005; Mens and Lanza 2002; Tichelaar et al. 2000; Antoniol et al. 2005; Gómez et al. 2009; Reus et al. 2006; Reus et al. 2004; Cho 2005; Heidenreich et al. 2010; Kollmann and Gogolla 2001; Favre 2008; Pérez-Castillo et al. 2013; Santibáñez et al. 2015; Durelli et al. 2014; Izquierdo and Molina 2010; Martinez et al. 2014; Arcelli et al. 2010; Guéhéneuc and Albin-Amiot 2001; Harmer and Wilkie 2002; Wu 2010; Gómez and Ducasse 2012; Alikacem and Sahraoui 2009; Ossher et al. 2009; Keller et al. 2001; Abdi et al. 2006; Krasovec and Howell 1995; Sora 2012a, 2012b). We remove two papers because Chirila and Jebelean (2010) presents the details of the proposed reverse engineering approach, which employs the logic-based program representation common to those three papers. We also remove Izquierdo and Molina (2009) because Izquierdo and Molina (2014) is its corresponding complete journal paper. There are three papers whose text we could not find online.

  f) Perform the focused-snowballing process: for each selected paper, one author is assigned to go through the list of all references in order to find additional papers about classifications and quality properties of program metamodels. By performing the snowballing process, we additionally identified 9 papers about classification and comparisons of program metamodels and related concepts (Jin and Cordy 2006; Izquierdo and Molina 2014; Bellay and Gall 1997; Armstrong and Trudeau 1998; Sim et al. 2000; Ferenc et al. 2001; Ferenc et al. 2002; Arcelli et al. 2005; Amelunxen et al. 2006) and 11 about quality properties (Favre et al. 2003; Clark et al. 2015; Kurtev et al. 2002; Sim et al. 2000; Ferenc et al. 2002; Tilley et al. 1994; Saint-Denis et al. 2000; Jin 2001; Jin et al. 2002; Czarnecki and Helsen 2003; Christopher 2006).

     

5 Program Metamodel TAxonomy (ProMeTA)

ProMeTA consists of nine features that represent major points of variation (Fig. 7). Each feature is described below. Some of the features are designed to include concrete artifacts such as concrete programming languages if those are well known and accepted.
Fig. 7

Feature diagram for the program metamodels

5.1 Feature: Target Language

Language independence varies by metamodel; it depends on what kind of “Grammars” are supported and mapped by the “Program metamodel” as shown in Fig. 2. Some metamodels only handle a certain language, while others handle multiple languages in a specific or any category; in the latter case, usually only the concepts common to those languages are addressed. Even if a metamodel is stated to be “language independent”, our analysis reveals that it often supports only a very limited number of languages. To define these characteristics precisely, the target language feature consists of two parts: language independence and current supported languages (Fig. 8).8
Fig. 8

Feature diagram for the target languages

5.2 Feature: Abstraction Level

A representation (i.e., model) conforming to a metamodel must be as abstract as possible (Kunert 2008) within the limits of its reverse engineering objectives. Metamodels can be classified into three abstraction levels (Fig. 9): 1) low where the metamodel represents the complete syntax of the code, 2) high where the metamodel represents abstract architectural elements, and 3) middle where the metamodel represents neither of the above (Lethbridge et al. 2004). In Fig. 2, “Grammar metamodel” corresponds to 1), while “Architecture / design metamodel” corresponds to 2) and 3).
Fig. 9

Feature diagram for the abstraction levels

According to the requirements (Lethbridge 1998), SEFs should address classes (i.e., modules),9 associations (i.e., relationships), and attributes. The same requirements can commonly be applied to high- or mid-level program metamodels; the domain ontology for integrating several reverse engineering tools (based on high- or mid-level metamodels) (Jin and Cordy 2006) specifies these characteristics. The ontology also contains other concepts such as System, Module (i.e., self-contained entity), SubProgram (i.e., non-self-contained entity), Variable, Containment relationship, and Use relationship (Jin and Cordy 2006), which are applicable to mid-level metamodels.

Regarding low-level metamodels, we follow the three representation aspects (Ferenc et al. 2001): Lexical Structure, Syntax, and Semantics. Moreover, we add Dialects such as non-standard language specifiers as well as Preprocessor Artifacts (Ferenc et al. 2002) and Static/Dynamic semantics (Amelunxen et al. 2006), taken from existing schema comparisons (Ferenc et al. 2002; Amelunxen et al. 2006). For example, ASTM supports creating AST models for specific general purpose languages and DSLs as well as dialects and preprocessor artifacts of these languages.

5.3 Feature: Metalanguage

The data structures of SEFs used to represent software are classified as a Tree, a Graph, or Structured Data (i.e., data that is not a tree or a graph) (Jin 2001). We adopt the same classification for classifying metalanguages with our conceptual framework.

Based on the classification shown in Fig. 10, several well-accepted standard meta-metamodels together with the metasyntax of grammar, including MOF, EMF/Ecore, Kernel MetaMetaModel (KM3) (Jouault and Bezivin 2006), UML, and EBNF, are listed. In Fig. 2, EBNF corresponds to “Metasyntax of grammar”, while others correspond to “Meta-metamodel”. KM3 is a meta-metamodel that has concepts similar to those found in MOF but is simpler than MOF (Jouault et al. 2006). Although UML is originally a modeling language classified in the M2 layer of the OMG four-layer metamodel hierarchy, it is often used to model program metamodels.
Fig. 10

Feature diagram for the metalanguages

5.4 Feature: Exchange Format

Program metamodels may depend on or have a high affinity with specific SEFs (i.e., “Exchange format” in Fig. 2). However, it is preferable if program metamodels are independent from any SEF in order to exchange models among tools. For example, a reverse engineering tool environment called MOOSE (Ducasse et al. 2000) defines its own program metamodel called FAMIX, but it adopts CDIF (and later XMI and MSE) to exchange FAMIX-based information between different tools (Nierstrasz et al. 1998; Jin 2001).

Figure 11 shows the characteristic properties and considerations of SEFs (Jin 2001; Jin et al. 2002). Among them, most quality characteristics, including scalability, simplicity, neutrality, formality, flexibility, evolvability, identity, solution reuse, and legibility, are examined according to the exchange patterns (Jin et al. 2002) (i.e., combinations of clarity and locality of the exchange format on which the metamodel depends).
Fig. 11

Feature diagram for the exchange formats

The exchange format satisfies integrity only if a special mechanism to ensure an errorless exchange is provided (Jin et al. 2002). If supported by many different tools, it satisfies popularity (Jin et al. 2002). The exchange format satisfies completeness only if all the information in the metamodel can be included. On the other hand, the exchange format satisfies transparency only if no loss, alteration, or gain in the transferred information occurs due to the use of encoders and decoders (Jin et al. 2002).
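
The transparency condition can be read as a round-trip check: a model that is encoded and then decoded should come back unchanged. The following Python sketch illustrates the idea under stated assumptions. The dictionary-based model, the function names, and the use of JSON as a stand-in exchange format are all hypothetical; JSON is not one of the SEFs discussed here.

```python
import json


def is_transparent(model, encode, decode) -> bool:
    """Return True if encoding and then decoding gives back the same model."""
    return decode(encode(model)) == model


# Hypothetical model: one class with two methods.
model = {"classes": [{"name": "Parser", "methods": ["parse", "reset"]}]}

# A lossless encoder/decoder pair preserves the model.
print(is_transparent(model, json.dumps, json.loads))  # True


def lossy_encode(m):
    """Hypothetical encoder that drops method information during exchange."""
    stripped = {"classes": [{"name": c["name"]} for c in m["classes"]]}
    return json.dumps(stripped)


# The lossy encoder loses information, so the exchange is not transparent.
print(is_transparent(model, lossy_encode, json.loads))  # False
```

A passing check for one model only shows that nothing was lost for that model; it does not establish the transparency of a format in general.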

As for the Abstract Syntax property, we list well-accepted SEFs, including Annotated Terms (ATerms) (van den Brand and Klint 2007), InterMediate Language (IML) and Resource Graph (RG) (Czeranski et al. 2000), Multi-Layer and Multi-Edge-Set (MLMES) graph (Lin et al. 1998), CASE Data Interchange Format (CDIF) (Imber 1991), Tuple-Attribute Language (TA) (Holt 1998), TA++ (Lethbridge 1998), Datrix-TA (Lapierre et al. 2001), PROgramming with Graph Rewriting Systems (PROGRES) graph specification (Schürr 1997), GraX/TGraph (Ebert et al. 1999), Graph Exchange Language (GXL) (Holt et al. 2006), Rigi Standard Form (RSF) (Kienle and Müller 2010), and MSE (Ducasse et al. 2011), along with general-purpose exchange formats, including XML (W3C 2000) and XMI (OMG 2015b).

5.5 Feature: Processing Environment

By providing mechanisms to query (i.e., navigate) and transform program models, language toolkits, including reverse engineering tools, can fulfill analysis and comprehension tasks as well as maintenance and source code transformation tasks (Antoniol et al. 2003). All these tasks follow the fundamental process of transformation as described in the Transformation to higher abstraction levels pattern. Specific processing environments to provide such mechanisms for navigation, transformation, analysis, and extraction are often provided together with the program metamodels. Figure 12 represents the major points of variation in the processing environment.
Fig. 12

Feature diagram for the processing environments

5.6 Feature: Definition

Typically, program metamodels are defined manually. Some approaches exist to automatically generate program metamodels from grammars (Kunert 2008; Bergmayr and Wimmer 2013), but they were originally intended for DSLs. Regarding the clarity and locality of the definitions, program metamodels can be classified into four exchange patterns similar to SEFs (Jin 2001; Jin et al. 2002): implicitly-internally defined, implicitly-externally defined, explicitly-internally defined, and explicitly-externally defined (Fig. 13).
Fig. 13

Feature diagram for the definition

5.7 Feature: Program Metadata and History Data

According to the requirements for SEFs (Lethbridge 1998), they should be able to store basic data (i.e., metadata) about the software systems they represent, including programming language versions, software system versions, file creation dates, and file versions. We believe that program metamodels should handle such metadata together with the name of the programming languages.

Moreover, several program metamodels such as Ring (Gómez and Ducasse 2012) directly support the history data, allowing reverse engineering tools to work easily with source code versioning systems to conduct history analysis at some abstraction level. Figure 14 shows these characteristic properties.
Fig. 14

Feature diagram for the program metadata and history data
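
As an illustration of the data involved, the sketch below models the metadata items listed above and a simple version history in Python. The class names, field names, and the file_history helper are assumptions made for this example; they are not taken from Ring or any other metamodel.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class FileMetadata:
    path: str
    created: date   # file creation date
    version: str    # file version


@dataclass
class ProgramMetadata:
    language_name: str     # name of the programming language
    language_version: str  # programming language version
    system_version: str    # software system version
    files: List[FileMetadata] = field(default_factory=list)


def file_history(snapshots: List[ProgramMetadata], path: str) -> List[str]:
    """Return the versions of one file across snapshots, in the given order."""
    return [f.version for s in snapshots for f in s.files if f.path == path]


# Two hypothetical snapshots of the same system.
v1 = ProgramMetadata("Java", "8", "1.0",
                     [FileMetadata("src/Main.java", date(2016, 1, 10), "1")])
v2 = ProgramMetadata("Java", "8", "1.1",
                     [FileMetadata("src/Main.java", date(2016, 1, 10), "2")])

print(file_history([v1, v2], "src/Main.java"))  # ['1', '2']
```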

5.8 Feature: Quality

We use the standard quality model ISO/IEC 25010:2011 (ISO/IEC 2011) as the basis to specify the quality properties of the program metamodels in a comprehensive and consistent manner. During the SLR, we found 12 papers discussing quality properties that are applicable to program metamodels. They are requirements for SEFs (Saint-Denis et al. 2000; Jin 2001; Jin et al. 2002), requirements for C++ schemas (Ferenc et al. 2002), requirements for reverse engineering tools enabled by schemas (Favre et al. 2003; Tilley et al. 1994), comparative considerations for program comprehension tools (Sim et al. 2000), evaluation properties for static analysis frameworks (Christopher 2006), comparative issues for technological spaces (Kurtev et al. 2002), tracing features for model transformations (Czarnecki and Helsen 2003), formality levels of metamodeling (Clark et al. 2015), and correctness of metamodels (Wu 2010).

We categorized these properties along with those newly identified such as available form and verification into seven quality characteristics and their sub-characteristics defined in the ISO/IEC 25010:2011 quality model. Figure 15 shows the feature diagram for functional suitability, while Fig. 16 shows the feature diagram for the other quality characteristics. They can be summarized as follows:
  • Functional suitability consists of three sub-characteristics: 1) functional appropriateness, which is mostly concerned with traceability (Kurtev et al. 2002; Czarnecki and Helsen 2003) from model elements to the corresponding portion of the source code, 2) functional correctness regarding how the program metamodel is verified (Wu 2010), and 3) functional completeness regarding the applicability of the metamodel (i.e., general purpose metamodels or task-specific ones) (Tilley et al. 1994). In general, low-level metamodels are good for executability since any GPL should provide executable semantics, whereas most mid- or high-level metamodels lack executable semantics.

  • Performance efficiency addresses the quantity of extracted data (Sim et al. 2000) and primarily depends on the granularity of the metamodel. A metamodel sacrifices such resource utilization if the ratio of the extracted information to code is very high.

  • Compatibility addresses the interoperability among different tools and environments, which is broken down into several concrete properties. The identity (i.e., the identity preservation during transformation), solution reuse, and neutrality are primarily determined by the exchange patterns (Jin et al. 2002). A metamodel satisfies integrity only if some special mechanism to ensure an errorless exchange has been provided with the metamodel (Jin et al. 2002). A metamodel satisfies the instance representation (Ferenc et al. 2002) if a model can be easily represented in any SEF. This property is almost identical to the content-presentation separation (Kurtev et al. 2002).

  • Usability addresses the learnability that is supported by the existence of documentation, samples, and user communities (Christopher 2006).

  • Reliability addresses the availability of the program metamodel in terms of licensing (Christopher 2006). Although metamodels should be fully available through websites or other means, sometimes only parts of a metamodel are provided.

  • Maintainability encompasses five sub-characteristics. Among them, simplicity and evolvability are primarily determined by the exchange patterns (Jin et al. 2002). Some metamodels have specific modularity mechanisms (such as packages) and–or reuse mechanisms (such as the inheritance and logical composition of metamodel elements) (Czarnecki and Helsen 2003) to improve maintainability. The formality is specified as partially formalized or completely formalized (Clark et al. 2015) according to the available metamodel definition.

  • Portability addresses adaptability and is composed of three concrete properties. Flexibility and scalability are primarily determined by the exchange patterns (Jin et al. 2002). A metamodel satisfies popularity if many different organizations besides the original developers have used it.

Fig. 15

Feature diagram for the functional suitability

Fig. 16

Feature diagram for the performance efficiency, compatibility, usability, reliability, maintainability, and portability

6 Validation of ProMeTA

A taxonomy can be validated by demonstrating the orthogonality of its classification features, benchmarking against existing classification schemes, and demonstrating its utility to classify existing knowledge (Smite et al. 2014). In our case, orthogonality means that a metamodel is classified as only one category of possible combinations of concrete features in the feature diagram. For example, the feature diagram of Definition yields 12 possible combinations of concrete features.10 Each metamodel is classified into only one category such as (Manually, Implicit, Internal). We validated ProMeTA by classifying the popular metamodels identified in the SLR.
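
Orthogonality in this sense can be checked mechanically: enumerate the combinations and confirm that a classification selects exactly one alternative per dimension. The Python sketch below is an illustration only. It uses just the three dimensions that appear in the example category above, so it enumerates 8 combinations rather than the 12 counted from the full feature diagram; the dictionary layout and the function names are assumptions.

```python
from itertools import product

# Alternatives per dimension; only the three dimensions named in the text.
DIMENSIONS = {
    "how": ("Manually", "Automatically"),
    "clarity": ("Implicit", "Explicit"),
    "locality": ("Internal", "External"),
}


def categories(dimensions):
    """Enumerate every combination of one alternative per dimension."""
    return list(product(*dimensions.values()))


def in_exactly_one_category(selected, dimensions):
    """True if exactly one valid alternative is selected in every dimension."""
    return all(
        len(selected.get(name, set())) == 1
        and selected[name] <= set(alternatives)
        for name, alternatives in dimensions.items()
    )


# The example category from the text: (Manually, Implicit, Internal).
example = {"how": {"Manually"}, "clarity": {"Implicit"},
           "locality": {"Internal"}}
# A hypothetical classification that violates orthogonality.
ambiguous = {"how": {"Manually"}, "clarity": {"Implicit", "Explicit"},
             "locality": {"Internal"}}

print(len(categories(DIMENSIONS)))                     # 8
print(in_exactly_one_category(example, DIMENSIONS))    # True
print(in_exactly_one_category(ambiguous, DIMENSIONS))  # False
```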

6.1 Target Popular Metamodels

For the concrete reverse engineering techniques or tools described in the set of 44 original papers obtained during the SLR, a total of 35 named and unnamed program metamodels were adopted.11 Table 2 shows the list of the metamodels and corresponding papers. Surprisingly, most papers adopted their own metamodels even though these metamodels differ little in characteristics and objectives. For example, there are seven similar AST-based metamodels (i.e., “Abstract Syntax *” in the table) defined independently. Moreover, there are four similar metamodels specific to the Java language (i.e., “Java * Model” in the table) but defined independently.
Table 2

List of metamodels found in SLR

| Metamodel | List of papers |
| --- | --- |
| Abstract Syntax Graph in TGraph | (Ebert 2008) |
| Abstract Syntax Metamodel in ECORE/EMF | (Bergmayr and Wimmer 2013) |
| Abstract Syntax Model in a graph grammar | (Naik and Bahulkar 2004) |
| Abstract Syntax Tree in logic representation | (Chirila and Jebelean 2010) |
| Abstract Syntax Tree Metamodel (ASTM) | (Izquierdo and Molina 2010, 2014; Martinez et al. 2014; Owens and Anderson 2013) |
| Abstract Syntax Tree Model in MOF | (Soden and Eichler 2007) |
| Abstract Syntax Tree Model in UML | (Antoniol et al. 2003) |
| Architecture Model in TGraph | (Ebert 2008) |
| Columbus Schema | (Vidács 2009) |
| Common Meta-Model in common tree grammar | (Strein et al. 2006) |
| Dagstuhl Middle Metamodel | (Lethbridge et al. 2004) |
| Datrix schema | (Lin and Holt 2004) |
| Delphi metamodel in UML | (Knodel and Calderon-Meza 2004) |
| Generic AST model in MOF | (Reus et al. 2006) |
| Grammar by EBNF | (Bergmayr and Wimmer 2013) |
| GXL schema in UML | (Meng and Wong 2004) |
| Hismo | (Gómez et al. 2009) |
| Integrated Meta-model of Reengineering in UML | (Cho 2005) |
| JaMoPP Java Model | (Heidenreich et al. 2010) |
| Java Meta Model in UML | (Kollmann and Gogolla 2001) |
| Java MetaModel in grUML | (Ebert 2008) |
| Java Metamodel in MOF | (Favre 2008) |
| KDM | (Pérez-Castillo et al. 2013; Santibáñez et al. 2015; Durelli et al. 2014; Izquierdo and Molina 2010; Martinez et al. 2014) |
| FAMIX | (Brühlmann et al. 2008; Tripathi et al. 2009; Lanza 2003; Pinzger et al. 2005; Mens and Lanza 2002; Tichelaar et al. 2000; Antoniol et al. 2005; Gómez et al. 2009) |
| MARPLE model in ECORE/EMF | (Arcelli et al. 2010) |
| Meta-model for design patterns and source code | (Guéhéneuc and Albin-Amiot 2001) |
| Program entities and relationships in RDB | (Harmer and Wilkie 2002) |
| Program Metamodel in UML | (Wu 2010) |
| Ring meta-model | (Gómez and Ducasse 2012) |
| Source Code Meta-Model in UML | (Alikacem and Sahraoui 2009) |
| SourcererDB Metamodel | (Ossher et al. 2009) |
| SPOOL repository schema | (Keller et al. 2001; Abdi et al. 2006) |
| System Engineering Technology Interface metamodel | (Krasovec and Howell 1995) |
| UNIQ-ART Meta-model | (Sora 2012a, 2012b) |

In Table 2, we identified that there are five program metamodels adopted in multiple papers:12
  • M1. Abstract Syntax Tree Metamodel (ASTM): four papers (Izquierdo and Molina 2010, 2014; Martinez et al. 2014; Owens and Anderson 2013)

  • M2. Knowledge Discovery Meta-Model (KDM): five papers (Pérez-Castillo et al. 2013; Santibáñez et al. 2015; Durelli et al. 2014; Izquierdo and Molina 2010; Martinez et al. 2014)

  • M3. FAMOOS Information Exchange Model (FAMIX): eight papers (Brühlmann et al. 2008; Tripathi et al. 2009; Lanza 2003; Pinzger et al. 2005; Mens and Lanza 2002; Tichelaar et al. 2000; Antoniol et al. 2005; Gómez et al. 2009)

  • M4. SPOOL Metamodel: two papers (Keller et al. 2001; Abdi et al. 2006)13

  • M5. UNIQ-ART Metamodel: two papers (Sora 2012a, 2012b)14
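
The selection of M1–M5 amounts to filtering the rows of Table 2 by the number of papers that adopt each metamodel. A minimal Python sketch of that step follows; the paper counts are taken from Table 2, only a subset of its rows is reproduced, and the variable and function names are assumptions.

```python
# Number of papers per metamodel, counted from the rows of Table 2
# (only a subset of the rows is reproduced here).
papers_per_metamodel = {
    "Abstract Syntax Tree Metamodel (ASTM)": 4,
    "Columbus Schema": 1,
    "Hismo": 1,
    "KDM": 5,
    "FAMIX": 8,
    "Ring meta-model": 1,
    "SPOOL repository schema": 2,
    "UNIQ-ART Meta-model": 2,
}


def adopted_in_multiple_papers(counts):
    """Return (metamodel, count) pairs with count > 1, most adopted first."""
    multiple = [(name, n) for name, n in counts.items() if n > 1]
    # Sort by descending count, then by name for a stable order.
    return sorted(multiple, key=lambda item: (-item[1], item[0]))


for name, n in adopted_in_multiple_papers(papers_per_metamodel):
    print(f"{name}: {n}")
```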

6.2 Classification Results

We classified the aforementioned metamodels M1–M5 using ProMeTA (Fig. 17). The findings and corresponding suggestions for practitioners and researchers are summarized as follows:
  • Target language: Of the five metamodels, three are language independent, while two handle object-oriented source code. Regardless of the language independence, all support the Java language since it seems to be the most common, especially in the context of reverse engineering research and practice. The second most common language is C++.

    If the target language is a major one like Java or C++, existing program metamodels and their corresponding reverse engineering tools may be reused, but if the target language is a minor one, a specific metamodel must be selected or a new one must be created.

  • Abstraction level: All of the five metamodels can be used as mid-level metamodels, but only one metamodel (M2) can be used as a high-level one. According to the coverage of the low-level metamodel features, M1 and M2 are more useful even though they still miss some lexical structure features such as Token, Separator, and Layout. Support for language dialects is limited.

    Practitioners and researchers can choose an appropriate metamodel and its corresponding reverse engineering tool according to their abstraction level requirements. However, our classification results indicate that none of the existing metamodels supports all of the required features at certain abstraction levels; in this case, it may be necessary to extend existing metamodels or create a new one to cover the missing features.

  • Metalanguage: Four of the five metamodels adopt the standard meta-metamodel MOF or the unified language UML, which are explicitly and externally defined, while only M5 adopts a specific implicitly-internally defined one.

    If practitioners and researchers adopt various tools for long-term usage, it may be better to choose or create program metamodels (like M1–M4) defined by widely accepted, explicitly-externally defined metalanguages (especially MOF and UML).

    In addition, the existence of user communities of metamodels could contribute to the ease of usage of their metalanguages; for example, since M3 has a large user community as identified regarding the feature Q9: Learnability, its metalanguage UML could be a good choice for creating (or selecting) program metamodels.

  • Exchange format: Corresponding to the metalanguage used, three of the five metamodels adopt standard SEFs such as XMI, which are explicitly-externally defined, while M5 supports a specific binary-based implicitly-internally defined data exchange.

    If practitioners and researchers consider utilizing various tools for long-term usage, selecting or creating program metamodels with a good exchange format quality (like M1, M2 and M4), which support the widely accepted, explicitly-externally defined SEFs (especially XMI), may be a better choice. However, the impact of this feature on selection or creation could be smaller than that of other features (such as the abstraction level) since specific exchange formats can be additionally supported by preparing converters among exchange formats, unless the metamodel originally supports explicitly-externally defined SEFs.

  • Processing environment: Due to their popularity, all of the five metamodels have dedicated extractors and navigation support. Extractors and navigation support should clearly be prepared to improve the ease of use of any program metamodel.

    Three of the five have dedicated transformation support, including refactoring facilities. Most of the metamodels (except for M5) are suitable for transformations and program analysis. Practitioners and researchers should check whether the processing environment and facilities are available to meet their reverse engineering objectives.

  • Definition: All of the five metamodels are manually defined. All except M5 are explicitly defined, leading to high quality metamodels with high compatibility, maintainability, and portability. Three of them are externally defined and fully formalized. The other two (M4 and M5) are internally defined.

    If practitioners and researchers utilize various tools for long-term usage, selecting or creating explicitly-externally defined metamodels (like M1–M3) is a better choice.

  • Program metadata and history data: Support for describing metadata and history data in metamodels is scarce; only the programming language name and the file version are supported by M1 and M2, respectively.15

    During the SLR, several history-aware metamodels were found to explicitly address the version history: Ring (Gómez and Ducasse 2012), Hismo (Gîrba and Ducasse 2006; Gómez et al. 2009), FAMIX-based RHDB code model (Antoniol et al. 2005) and FAMIX-based ArchEvoDB schema (Pinzger et al. 2005). If practitioners and researchers conduct reverse engineering in which history analysis is taken into account, selecting a history-aware metamodel, especially the RHDB code model and the ArchEvoDB schema, may be better since these are defined as extensions of FAMIX, which is a widely accepted popular metamodel.

  • Functionality: Two metamodels (M1 and M2) support most of the functional suitability features, including executability, traceability, and transformability, since these are low-level metamodels supporting static and dynamic semantics shown in the abstraction level features. None explicitly state how these have been verified. Although most can be used for various purposes, only M5 is for several specific tasks such as the dependency analysis.

    Practitioners and researchers should verify whether the potential program metamodels satisfy their reverse engineering functionality requirements. If a metamodel is used for various reverse engineering purposes, selecting a general one (like M1–M4) is better.

  • Non-functionality: Only M1 sacrifices the performance efficiency since it contains all of the statement-level code descriptions. Three (M1–M3) have good usability since documents and samples with communities are well prepared. These three metamodels also have good compatibility, maintainability, and portability since these are explicitly-externally defined, fully formalized, and fully available. Unfortunately, the definitions of M4 and M5 seem to be unavailable elsewhere on the Internet or in the literature. Most of the metamodels (except M5) support inheritance and logical composition as reuse mechanisms. However, only M2 supports the dedicated modularity mechanism.

    Practitioners and researchers should check whether potential program metamodels satisfy their non-functionality requirements. If existing metamodels are to be reused, they must select fully available and formalized metamodels (like M1–M3).

The above-mentioned findings and suggestions can be summarized as follows. Existing program metamodels can be reused for major languages such as Java and C++. It is better to choose and/or create program metamodels defined by explicitly-externally defined major metalanguages. It is better to choose and/or create program metamodels associated with explicitly-externally defined SEFs. Most popular program metamodels are suitable for transformations and program analysis; however, few support describing metadata and history data.
Fig. 17

Classification results using ProMeTA (M1: ASTM, M2: KDM, M3: FAMIX, M4: SPOOL Metamodel, M5: UNIQ-ART Metamodel, X: supports the characteristic indicated, ++: particularly satisfies the characteristic/requirement indicated, +: satisfies the characteristic/requirement indicated, -: sacrifices or does not satisfy the characteristic/requirement indicated, Exp: Explicit, Imp: Implicit, Ext: External, Int: Internal)

6.3 Discussions

6.3.1 RQ1: Does ProMeTA cover all possible characteristics and limitations in existing works that evaluate and compare program metamodels?

During the construction process of ProMeTA, the important characteristics from existing classification schemes/frameworks and comparisons (Lethbridge et al. 2004; Jin and Cordy 2006; Izquierdo and Molina 2014; Bellay and Gall 1997; Armstrong and Trudeau 1998; Lethbridge 1998; Sim et al. 2000; Ferenc et al. 2001, 2002; Arcelli et al. 2005; Amelunxen et al. 2006) as well as discussions on quality properties (Favre et al. 2003; Clark et al. 2015; Kurtev et al. 2002; Sim et al. 2000; Ferenc et al. 2002; Tilley et al. 1994; Saint-Denis et al. 2000; Jin 2001; Jin et al. 2002; Czarnecki and Helsen 2003; Christopher 2006; Wu 2010) for program metamodels and related concepts identified by the SLR were included or mapped to the items, implying that it has adequate coverage. Thus, ProMeTA is implicitly benchmarked against existing classification schemes.

6.3.2 RQ2: Does ProMeTA have orthogonality in its classification features?

We successfully classified popular program metamodels from the SLR according to the characteristics defined in ProMeTA and show how it can help classify program metamodels. Moreover, the classification did not result in the characteristics fitting into more than one category, demonstrating the orthogonality of the classification features.

6.3.3 RQ3: Is ProMeTA useful for guiding practitioners and researchers?

ProMeTA can guide practitioners and researchers in the following possible use cases UC1–UC3.
  • UC1. Developing new reverse engineering tools: When engineers want to build their own reverse engineering tools, they must define the requirements in program metamodels that enable and circumscribe the features of the tools. ProMeTA supports the requirements definition and guides reuse, extension, or creation of metamodels because engineers can recognize features included in ProMeTA as possible requirement items. Moreover, if a ProMeTA-based classification result of a potential metamodel for reuse or extension is available like M1–M5 in the above validation, engineers can easily determine whether the metamodel satisfies their requirements.

  • UC2. Choosing existing reverse engineering tools: When engineers want to reuse and eventually extend existing reverse engineering tools, they must compare and then select the appropriate one according to how the underlying program metamodels meet their objectives. ProMeTA can help by providing comparison criteria and by comparing the metamodels according to the characteristics it defines. Moreover, ProMeTA may help by comparing existing classification results of metamodels (if available).

  • UC3. Communicating or researching program metamodels and reverse engineering tools: ProMeTA can serve as a reference for the reverse engineering community, including practitioners and researchers. It can be extended by peers, providing the community with an important body of knowledge to guide future communications and research on program metamodels and the corresponding reverse engineering tools since it incorporates the characteristics of metamodels into a single orthogonal structure based on a conceptual framework that defines common terminology. For example, ProMeTA can serve as the basis for building an open repository of information of existing program metamodels (and corresponding tools) by accumulating classification results. The above-mentioned classification results of M1–M5 can be a starting point.
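
For UC2, accumulated classification results can be queried against a set of requirements. The following Python sketch shows one possible shape of such a query. The attribute values for M1–M5 restate findings from Section 6.2 (clarity and locality of the definition, usability as a high-level metamodel, and general-purpose versus task-specific use); the class, field, and function names are hypothetical and are not part of ProMeTA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    name: str
    clarity: str           # "Explicit" or "Implicit"
    locality: str          # "External" or "Internal"
    high_level: bool       # usable as a high-level metamodel
    general_purpose: bool  # False means task-specific


# Values restated from the classification findings in Section 6.2.
RESULTS = {
    "M1": Classification("ASTM", "Explicit", "External", False, True),
    "M2": Classification("KDM", "Explicit", "External", True, True),
    "M3": Classification("FAMIX", "Explicit", "External", False, True),
    "M4": Classification("SPOOL Metamodel", "Explicit", "Internal", False, True),
    "M5": Classification("UNIQ-ART Metamodel", "Implicit", "Internal", False, False),
}


def candidates(results, **required):
    """Return the keys of metamodels whose attributes match all requirements."""
    return [
        key
        for key, c in results.items()
        if all(getattr(c, attr) == value for attr, value in required.items())
    ]


print(candidates(RESULTS, clarity="Explicit", locality="External"))
# ['M1', 'M2', 'M3']
print(candidates(RESULTS, high_level=True))
# ['M2']
```

Keeping each result as a flat record makes it straightforward to add further features as more classification results accumulate.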

6.4 Limitations

Five popular metamodels are identified solely based on their adoption in papers selected by the SLR. It is plausible that such “popularity” does not reflect actual popularity in program reverse engineering tools and projects. In the future, we will investigate actual adoptions of metamodels in active projects on reverse engineering tools and classify these metamodels using ProMeTA.

The classification of the five popular metamodels based on ProMeTA was conducted by the first author of this paper and reviewed by the second and third authors. Therefore, it is possible that our classification results may not be completely correct. To mitigate this threat to validity, we have opened the classification results and ProMeTA to the public and call for comments at our Website.16 In particular, we contacted the original developers of the metamodels addressed in the paper and requested a review.

Thus, we received six sets of complementary information regarding the metamodels ASTM/KDM, FAMIX, SPOOL, and UNIQ-ART. These sets validated our findings, as reported in the previous sections, but also confirmed the known limitation of this and any SLR: the limitation due to circumscribing the review and thus missing interesting papers. For example, our discussions with the colleagues working on FAMIX pointed us to works in which FAMIX was used to model program metadata and related data, including, but not limited to: Evolizer to analyze source code and software project data (Gall et al. 2009), ChEOPS to represent changes as first-class entities for change-oriented software engineering (Ebraert et al. 2007), and Orion to model simultaneous versions in a software version repository (Laval et al. 2011). We did not include these works in our review because they were beyond the borders of our review. These discussions with colleagues show the relevance of our review and taxonomy and the need to open the taxonomy so that it can be augmented incrementally to encompass more metamodels and usages thereof, even beyond program reverse engineering.

Our taxonomy also does not include infrastructure such as srcML17 (Collard et al. 2011), which describes source code in an alternative, well specified format, because such infrastructure is at a lower level of abstraction than the metamodels described in our taxonomy. Indeed, srcML, although useful and well used in research, does not directly provide a metamodel abstracting source code elements but rather describes source code elements systematically, using an XML format. Thus, it is not included in our taxonomy although well worth mentioning here.

We used Engineering Village as the initial document base of the SLR. Although it is adopted in other SLRs (Sharafi et al. 2015), relevant papers may be missed. Additionally, we may have missed relevant papers even after double-checking the paper selection results. To mitigate these threats, we plan to use other databases, extend our SLR, and elicit public review of the revised results.

Although our rigorous systematic literature survey identified the characteristics of program metamodels, other characteristics to be used for classification of metamodels may be omitted. ProMeTA is expected to efficiently incorporate such missing characteristics into the single structure because the form of feature diagrams should make such an extension of the taxonomy easy.
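The extensibility argument can be illustrated with a minimal sketch in Python, assuming a feature diagram is represented as a plain tree of named features. The `Feature` class and the "Hypothetical Characteristic" node are illustrative assumptions and are not part of ProMeTA; the other feature names are borrowed from the footnotes only to make the tree concrete. Under this assumption, adding a missing characteristic attaches a new subtree and leaves every existing classification path untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A node of a feature diagram: a named characteristic with subfeatures."""
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a subfeature and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, name):
        """Depth-first search for a feature by name; None if absent."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def paths(self, prefix=()):
        """Yield every root-to-leaf path as a tuple of feature names."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


# A small fragment of a taxonomy (names used for illustration only).
root = Feature("Program Metamodel")
root.add(Feature("Supported Language")).add(Feature("Java"))
independence = root.add(Feature("Language Independence"))
independence.add(Feature("Category Specific")).add(Feature("Object-Oriented"))

before = list(root.paths())

# Incorporating a characteristic missed by the survey (hypothetical name):
# a purely local addition that does not alter existing paths.
missing = root.add(Feature("Hypothetical Characteristic"))
missing.add(Feature("Hypothetical Option"))

after = list(root.paths())
```

In this sketch, every path in `before` still appears unchanged in `after`, which is the property that makes incremental extension of a tree-shaped taxonomy cheap. Cross-tree constraints, such as the dependency described in footnote 8, are deliberately left out.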

Any taxonomy can only unleash its full potential through widespread awareness and a large number of contributions (Engström and Petersen 2015). Therefore, as future work, we will follow a popularization strategy (Engström and Petersen 2015).

10 Conclusion and Future Work

In this paper, we propose a conceptual framework with definitions of program metamodels and related concepts as well as build a comprehensive taxonomy named ProMeTA based on this framework. ProMeTA incorporates newly identified characteristics into those stated in existing works via a systematic literature survey on program metamodels, while maintaining the orthogonality of the entire taxonomy. This feat is accomplished by referring to the basic term classification defined in the framework. Additionally, we validate the taxonomy in terms of its orthogonality and usefulness through the classification of five popular metamodels from the survey. We have made ProMeTA available to the reverse engineering community, including practitioners and researchers, through our Website.

In the near future, we plan to validate ProMeTA by conducting experiments involving the three use cases (UC1–UC3) in Section 6. This should provide improved answers to the research questions, especially RQ3. We are also planning a collaborative Wiki to let the community refine or modify ProMeTA online.

Over the long term, we plan to extend our SLR using additional databases and share the revised results to obtain reviews from the public. We expect that the research community will further validate ProMeTA as well as the SLR results from the viewpoints of practitioners and researchers. Public input should not only lead to standard terminology and classification characteristics in the taxonomy, but also extend the taxonomy to include new categories and datasets that reflect its usage.

Footnotes

  1. Another reason to exclude those handling only bytecodes is that bytecodes often do not contain all the source code information. For example, Java bytecodes lack information on type parameters in generic types due to a mechanism called “Type Erasure”.

  2. Usually a single viewpoint corresponds to a single metamodel. However, KDM is a multi-viewpoint metamodel because a KDM specification provides a set of viewpoints that define a set of metamodel elements.

  3. In some environments (Amelunxen et al. 2008; Minas 2006; Pedro et al. 2009), a graphical (i.e., visual) syntax can be used to define meta-models. However, such graphical syntax is usually intended to define domain specific modeling languages, which are graphical DSLs.

  4. ASTM can also be a basis for deriving programming-language-specific AST models.

  5.

  6. It also finds papers having a keyword “meta-model”.

  7. The Engineering Village performs duplicate removal only for the first 1000 results.

  8. For simplification and comprehension, possible exclusion dependencies between subfeatures of Category Specific and subfeatures of Supported Language are omitted. For example, if Java is selected as the Supported Language and Category Specific is selected as the Language Independence, then Object-Oriented must also be selected as Category Specific. However, listing all such dependencies makes the taxonomy complex.

  9. “Abstraction level” does not discriminate methods and classes from functions and modules since the former pair is applied on objects; “language independent” specifies whether metamodels cope with objects.

  10. These are the product of three subfeature sets: { Automatically, Manually, Automatically & Manually } × { Implicit, Explicit } × { Internal, External }.

  11. For illustrative purposes, we named the “unnamed” metamodels by a combination of concepts in our conceptual framework and metalanguages such as the “Abstract Syntax Tree Model in UML” in Table 2.

  12. This selection does not necessarily reflect actual adoption in program reverse engineering tools and projects.

  13. A more comprehensive paper (Schauer et al. 2002) using the SPOOL Metamodel is excluded from the SLR result.

  14. Some recent papers (Sora 2015; Sora and Todinca 2016) using the UNIQ-ART Metamodel are not included in the SLR result.

  15. Some of the metamodels (especially M1) can be extended to include metadata and history data.

  16.

  17.

Notes

Acknowledgements

We thank the reviewers as well as the original developers of the metamodels addressed in the paper for their valuable comments, which significantly improved our paper. This work was supported by JSPS KAKENHI Grant Number 16H02804, IISF SSR Forum 2015 and 2016.

References

  1. Abdi MK, Lounis H, Sahraoui HA (2006) Analyzing change impact in object-oriented systems. In: Proceedings of the 32nd EUROMICRO conference on software engineering and advanced applications (EUROMICRO-SEAA). IEEE Computer Society, pp 310–319
  2. Aho AV, Lam MS, Sethi R, Ullman JD (eds) (2006) Compilers: principles, techniques, and tools, 2nd edn. Addison-Wesley Professional, Boston
  3. Alanen M, Porres I (2003) A relation between context-free grammars and meta object facility metamodels. TUCS Technical Report 606:1–13
  4. Alikacem EH, Sahraoui HA (2009) A metric extraction framework based on a high-level description language. In: Proceedings of the 9th IEEE International working conference on source code analysis and manipulation (SCAM). IEEE Computer Society, pp 159–167
  5. Amelunxen C, Königs A, Rötschke T (2006) MOSL: composing a visual language for a metamodeling framework. In: Proceedings of the IEEE symposium on visual languages and human-centric computing (VL/HCC). IEEE Computer Society, pp 81–84
  6. Amelunxen C, Klar F, Königs A, Rötschke T, Schürr A (2008) Metamodel-based tool integration with MOFLON. In: Proceedings of the 30th international conference on software engineering (ICSE). ACM, pp 807–810
  7. Antoniol G, Penta MD, Merlo E (2003) YAAB (yet another AST browser): using OCL to navigate ASTs. In: Proceedings of the 11th international workshop on program comprehension (IWPC). IEEE Computer Society, pp 13–22
  8. Antoniol G, Penta MD, Gall HC, Pinzger M (2005) Towards the integration of versioning systems, bug reports and source code meta-models. Electron Notes Theor Comput Sci 127(3):87–99
  9. Arcelli F, Masiero S, Raibulet C, Tisato F (2005) A comparison of reverse engineering tools based on design pattern decomposition. In: Proceedings of the Australian software engineering conference (ASWEC). IEEE Computer Society, pp 262–269
  10. Arcelli F, Zanoni M, Porrini R, Vivanti M (2010) A model proposal for program comprehension. In: Proceedings of the 16th international conference on distributed multimedia systems (DMS). Knowledge Systems Institute, pp 23–28
  11. Armstrong M, Trudeau C (1998) Evaluating architectural extractors. In: Proceedings of the 5th working conference on reverse engineering (WCRE). IEEE Computer Society, pp 30–39
  12. Bellay B, Gall H (1997) A comparison of four reverse engineering tools. In: Proceedings of the 4th working conference on reverse engineering (WCRE). IEEE Computer Society, pp 2–11
  13. Bergmayr A, Wimmer M (2013) Generating metamodels from grammars by chaining translational and by-example techniques. Proceedings of the 1st international workshop on model-driven engineering by example co-located with ACM/IEEE 16th international conference on model driven engineering languages and systems (MoDELS). CEUR Workshop Proc 1104:22–31
  14. Bravenboer M, Kalleberg KT, Vermaas R, Visser E (2008) Stratego/XT 0.17. A language and toolset for program transformation. Sci Comput Program 72:52–70
  15. Brühlmann A, Gîrba T, Greevy O, Nierstrasz O (2008) Enriching reverse engineering with annotations. In: Proceedings of the 11th international conference on model driven engineering languages and systems (MoDELS). Springer-Verlag, pp 660–674
  16. Brunelière H, Cabot J, Dupé G, Madiot F (2014) MoDisco: a model driven reverse engineering framework. Inf Softw Technol 56(8):1012–1032
  17. Buchner T, Matthes F (2006) Introspective model-driven development. Proceedings of the 3rd European workshop on software architecture (EWSA). Lect Notes Comput Sci 4344:33–49
  18. Budgen D, Burn A, Brereton O, Kitchenham B, Pretorius R (2011) Empirical evidence about the UML: a systematic literature review. Softw Pract Exper 41(4):363–392
  19. Canfora G, Penta MD, Cerulo L (2011) Achievements and challenges in software reverse engineering. Commun ACM 54(4):142–151
  20. Chikofsky EJ, Cross JH (1990) Reverse engineering and design recovery: a taxonomy. IEEE Softw 7(1):13–17
  21. Chirila CB, Jebelean C (2010) Towards programs logic based representation driven by grammar and conforming to a metamodel. In: Proceedings of the IEEE international joint conferences on computational cybernetics and technical informatics (ICCC-CONTI). IEEE Computer Society, pp 107–112
  22. Cho ES (2005) Integrated meta-model approach for reengineering from legacy into CBD. In: Proceedings of the international conference on computational science and its applications (ICCSA). Springer-Verlag, pp 868–877
  23. Christopher CN (2006) Evaluating static analysis frameworks. http://www.cs.cmu.edu/~aldrich/courses/654/tools/christopher-analysis-frameworks-06.pdf
  24. Clark T, Sammut P, Willans J (2015) Applied metamodelling: a foundation for language driven development, 3rd edn. ArXiv e-prints pp 1–228
  25. Collard ML, Decker MJ, Maletic JI (2011) Lightweight transformation and fact extraction with the srcml toolkit. In: Proceedings of the 11th IEEE working conference on source code analysis and manipulation (SCAM). IEEE Computer Society, pp 173–184
  26. Czarnecki K, Helsen S (2003) Classification of model transformation approaches. In: Proceedings of the OOPSLA workshop on generative techniques in the context of model-driven architecture, pp 1–17. http://www.s23m.com/oopsla2003/mda-workshop.html
  27. Czeranski J, Eisenbarth T, Kienle HM, Koschke R, Plödereder E, Simon D, Zhang Y, Girard JF, Würthner M (2000) Data exchange in bauhaus. In: Proceedings of the 7th working conference on reverse engineering (WCRE). IEEE Computer Society, pp 293–295
  28. Demeyer S, Ducasse S, Tichelaar S (1999) Why unified is not universal? UML Shortcomings for coping with round-trip engineering. In: Proceedings of the 2nd international conference on the unified modeling language (UML). Springer-Verlag, pp 630–644
  29. Ducasse S, Lanza M, Tichelaar S (2000) MOOSE: an extensible language-independent environment for reengineering object-oriented systems. In: Proceedings of the 2nd international symposium on constructing software engineering tools (COSET). IEEE, pp 1–7
  30. Ducasse S, Anquetil N, Bhatti U, Hora AC, Laval J, Girba T (2011) MSE and FAMIX 3.0: an interexchange format and source code model family. HAL 00646884, pp 1–39
  31. Durelli RS, Santibáñez DSM, Delamaro ME, de Camargo VV (2014) Towards a refactoring catalogue for knowledge discovery metamodel. In: Proceedings of the 15th IEEE international conference on information reuse and integration (IRI). IEEE Computer Society, pp 569–576
  32. Earley J (1970) An efficient context-free parsing algorithm. Commun ACM 13(2):94–102
  33. Ebert J (2008) Metamodels taken seriously: the TGraph approach. In: Proceedings of the 12th European conference on software maintenance and reengineering (CSMR). IEEE Computer Society, p 2
  34. Ebert J, Kullbach B, Winter A (1999) GraX - an interchange format for reengineering tools. In: Proceedings of the 6th working conference on reverse engineering (WCRE). IEEE Computer Society, p 89
  35. Ebert J, Kullbach B, Riediger V, Winter A (2002) GUPRO - generic understanding of programs, an overview. Electron Notes Theor Comput Sci 72(2):47–56
  36. Ebraert P, Vallejos J, Costanza P, Paesschen E V, D’Hondt T (2007) Change-oriented software engineering. In: Proceedings of the 2007 international conference on dynamic languages (ICDL). ACM, pp 3–24
  37. Engström E, Petersen K (2015) Mapping software testing practice with software testing research - SERP-test taxonomy. In: Proceedings of the 8th IEEE international conference on software testing, verification and validation (ICST) workshops. IEEE Computer Society, pp 1–4
  38. Favre L (2008) Formalizing MDA-based reverse engineering processes. In: Proceedings of the 6th International conference on software engineering research, management and applications (SERA). IEEE Computer Society, pp 153–160
  39. Favre JM, Nguyen T (2005) Towards a megamodel to model software evolution through transformations. Proceedings of the workshop on software evolution through transformations: model-based vs. implementation-level solutions (SETra). Electron Notes Theor Comput Sci 127(3):59–74
  40. Favre JM, Godfrey M, Winter A (2003) First international workshop on meta-models and schemas for reverse engineering ateM 2003. In: van Deursen A, Stroulia E, Storey MAD (eds) Proceedings of the 10th working conference on reverse engineering (WCRE). IEEE Computer Society, pp 366–367
  41. Ferenc R, Sim SE, Holt RC, Koschke R, Gyimothy T (2001) Towards a standard schema for C/C++. In: Proceedings of the 8th working conference on reverse engineering (WCRE). IEEE Computer Society, pp 49–58
  42. Ferenc R, Beszédes Á, Tarkiainen M, Gyimóthy T (2002) Columbus - reverse engineering tool and schema for C++. In: Proceedings of the 18th international conference on software maintenance (ICSM). IEEE Computer Society, pp 172–181
  43. Gall HC, Fluri B, Pinzger M (2009) Change analysis with evolizer and changedistiller. Softw IEEE 26(1):26–33
  44. Garwick JV (1968) Programming languages: GPL, a truly general purpose language. Commun ACM 11(9):634–638
  45. Gîrba T, Ducasse S (2006) Modeling history to analyze software evolution. J Softw Maint Evol Res Pract 18(3):207–236
  46. Glass RL (2002) Sorting out software complexity. Commun ACM 45(11):19–21
  47. Gómez VU, Ducasse S (2012) Ring: a unifying meta-model and infrastructure for smalltalk source code analysis tools. Comput Lang Syst Struct 38(1):44–60
  48. Gómez VU, Kellens A, Brichau J, D’Hondt T (2009) Time warp, an approach for reasoning over system histories. In: Proceedings of the joint international and annual ERCIM workshops on principles of software evolution (IWPSE) and software evolution (EVOL) workshops. IEEE Computer Society, pp 79–88
  49. Gray J, Zhang J, Lin Y, Roychoudhury S, Wu H, Sudarsan R, Gokhale A, Neema S, Shi F, Bapty T (2004) Model-driven program transformation of a large avionics framework. Proceedings of the 3rd international conference on generative programming and component engineering (GPCE). Lect Notes Comput Sci 3286:361–378
  50. Guéhéneuc Y, Albin-Amiot H (2001) Using design patterns and constraints to automate the detection and correction of inter-class design defects. In: Proceedings of the 39th international conference and exhibition on technology of object-oriented languages and systems (TOOLS). IEEE Computer Society, pp 296–306
  51. Harmer TJ, Wilkie FG (2002) An extensible metrics extraction environment for object-oriented programming languages. In: Proceedings of the 2nd IEEE international workshop on source code analysis and manipulation (SCAM). IEEE Computer Society, pp 26–35
  52. Heidenreich F, Johannes J, Seifert M, Wende C (2010) Closing the gap between modelling and java. Proceedings of the international conference on software language engineering (SLE). Lect Notes Comput Sci 5969:374–383
  53. Holt R (1998) An introduction to TA: the tuple attribute language. Department of Computer Science, University of Waterloo and Toronto, pp 1–10. http://plg.uwaterloo.ca/~holt/papers/ta.html
  54. Holt RC, Schürr A, Sim SE, Winter A (2006) GXL: a graph-based standard exchange format for reengineering. Sci Comput Program 60(2):149–170
  55. Imber M (1991) CASE data interchange format standards. Inf Softw Technol 33(9):647–655
  56. Ishizue R, Washizaki H, Fukazawa Y, Inoue S, Hanai Y, Kanazawa M, Namba K (2016) Metrics visualization technique based on the origins and function layers for OSS-based development. In: Proceedings of the IEEE working conference on software visualization (VISSOFT). IEEE Computer Society, pp 71–75
  57. ISO/IEC (1996) ISO/IEC 14977:1996 information technology - syntactic metalanguage - extended BNF
  58. ISO/IEC (2011) ISO/IEC 25010:2011 systems and software engineering - systems and software quality requirements and evaluation (SQuaRE) - system and software quality models
  59. ISO/IEC/IEEE (2011) ISO/IEC/IEEE 42010:2011 systems and software engineering - architecture description
  60. Izquierdo JLC, Molina JG (2009) A domain specific language for extracting models in software modernization. In: Proceedings of the 5th European conference on model driven architecture-foundations and applications (ECMDA-FA). Springer-Verlag, pp 82–97
  61. Izquierdo JLC, Molina JG (2010) An architecture-driven modernization tool for calculating metrics. IEEE Softw 27(4):37–43
  62. Izquierdo JLC, Molina JG (2014) Extracting models from source code in software modernization. Softw Syst Model 13(2):713–734
  63. Jin D (2001) Exchange of software representations among reverse engineering tools. Technical Report 2001–454:1–131
  64. Jin D, Cordy J, Dean T (2002) Where’s the schema? A taxonomy of patterns for software exchange. In: Proceedings of the 10th international workshop on program comprehension (IWPC). IEEE Computer Society, pp 65–74
  65. Jin D, Cordy JR (2006) Integrating reverse engineering tools using a service-sharing methodology. In: Proceedings of the 14th IEEE international conference on program comprehension (ICPC). IEEE Computer Society, pp 94–99
  66. Jouault F, Bezivin J (2006) KM3: a DSL for metamodel specification. Proceedings of 8th IFIP International conference on formal methods for open object-based distributed systems. Lect Notes Comput Sci 4037:171–185
  67. Jouault F, Bezivin J, Kurtev I (2006) TCS: a DSL for the specification of textual concrete syntaxes in model engineering. In: Proceedings of the 5th international conference on generative programming and component engineering (GPCE). ACM, pp 249–254
  68. Kang KC, Cohen SG, Hess JA, Novak WE, Peterson AS (1990) Feature-oriented domain analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-21, pp 1–148
  69. Keller RK, Bédard J, Saint-Denis G (2001) Design and implementation of a UML-based design repository. In: Proceedings of the 13th international conference on advanced information systems engineering (CAiSE). Springer-Verlag, pp 448–464
  70. Kienle HM, Müller HA (2010) Rigi - an environment for software reverse engineering, exploration, visualization, and redocumentation. Sci Comput Program 75(4):247–263
  71. Kitchenham BA, Dyba T, Jorgensen M (2004) Evidence-based software engineering. In: Proceedings of the 26th international conference on software engineering (ICSE). IEEE Computer Society, pp 273–281
  72. Kitchenham B, Brereton OP, Budgen D, Turner M, Bailey J, Linkman S (2009) Systematic literature reviews in software engineering - a systematic literature review. Inf Softw Technol 51(1):7–15
  73. Knodel J, Calderon-Meza G (2004) A meta-model for fact extraction from delphi source code. Electron Notes Theor Comput Sci 94:19–28
  74. Kollmann R, Gogolla M (2001) Capturing dynamic program behaviour with UML collaboration diagrams. In: Proceedings of the 5th conference on software maintenance and reengineering (CSMR). IEEE Computer Society, pp 58–67
  75. Krasovec G, Howell S (1995) Applying the system engineering environment to the reengineering process. J Syst Integr 5(4):309–336
  76. Kunert A (2008) Semi-automatic generation of metamodels and models from grammars and programs. Electron Notes Theor Comput Sci 211:111–119
  77. Kurtev I, Bézivin J, Aksit M (2002) Technological spaces: an initial appraisal. In: International symposium on distributed objects and applications (DOA). Springer-Verlag, pp 1–6. http://doc.utwente.nl/55814/
  78. Lanza M (2003) CodeCrawler - lessons learned in building a software visualization tool. In: Proceedings of the 7th European conference on software maintenance and reengineering (CSMR). IEEE Computer Society, pp 409–418
  79. Lapierre S, Laguë B, Leduc C (2001) Datrix(™) source code model and its interchange format: lessons learned and considerations for future work. ACM SIGSOFT Softw Eng Notes 26(1):53–56
  80. Laval J, Denier S, Ducasse S, Falleri J (2011) Supporting simultaneous versions for software evolution assessment. Sci Comput Program 76(12):1177–1193
  81. Lethbridge TC (1998) Requirements and proposal for a software information exchange format (SIEF) standard http://www.site.uottawa.ca/~tcl/papers/sief/standardProposal.html
  82. Lethbridge TC, Tichelaar S, Ploedereder E (2004) The Dagstuhl middle metamodel: a schema for reverse engineering. Electron Notes Theor Comput Sci 94:7–18
  83. Lin Y, Holt RC (2004) Formalizing fact extraction. Electron Notes Theor Comput Sci 94:93–102
  84. Lin T, Robert Cheung Z H, Smith K (1998) Exploration of data from modeling and simulation through visualization. In: Proceedings of the 3rd international SimTect conference, pp 303–308
  85. Martinez L, Pereira C, Favre L (2014) Recovering sequence diagrams from object-oriented code - an ADM approach. In: Proceedings of the 9th international conference on evaluation of novel approaches to software engineering (ENASE). SciTePress, pp 188–195
  86. Meng C, Wong K (2004) A GXL Schema For Story Diagrams. Electron Notes Theor Comput Sci 94:29–38
  87. Mens T, Lanza M (2002) A graph-based metamodel for object-oriented software metrics. Electron Notes Theor Comput Sci 72(2):57–68
  88. Minas M (2006) Generating visual editors based on Fujaba/MOFLON and DiaMeta. In: Proceedings of the 4th international Fujaba days, pp 35–42
  89. Naik R, Bahulkar A (2004) A programmable analysis and transformation framework for reverse engineering. Electron Notes Theor Comput Sci 94:39–49
  90. Nierstrasz O, Tichelaar E, Demeyer S (1998) CDIF as the interchange format between reengineering. In: Proceedings of the OOPSLA workshop on model engineering, methods and tools integration with CDIF. ACM, pp 1–8
  91. OMG (2011a) Architecture-driven modernization: abstract syntax tree metamodel (ASTM), version 1.0
  92. OMG (2011b) Architecture-driven modernization: knowledge discovery meta-model (KDM), version 1.3
  93. OMG (2015a) Meta object facility (MOF) core specification, version 2.5
  94. OMG (2015b) XML metadata interchange (XMI), Version 2.5.1. http://www.omg.org/spec/XMI/2.5.1/
  95. Ossher J, Bajracharya SK, Linstead E, Baldi P, Lopes CV (2009) SourcererDB: an aggregated repository of statically analyzed and cross-linked open source java projects. In: Proceedings of the 6th international working conference on mining software repositories (MSR). IEEE Computer Society, pp 183–186
  96. Owens D, Anderson M (2013) A generic framework for automated quality assurance of software models - application of an abstract syntax tree. In: Proceedings of science and information conference (SAI). IEEE, pp 207–211
  97. Pedro L, Risoldi M, Buchs D, Barroca B, Amaral V (2009) Composing visual syntax for domain specific languages. In: Proceedings of the 13th international conference on human-computer (HCI). Springer-Verlag, pp 889–898
  98. Pérez-Castillo R, de Guzmán IGR, Gómez-Cornejo R, Fernández-Ropero M, Piattini M (2013) ANDRIU. A technique for migrating graphical user interfaces to android. In: Proceedings of the 25th international conference on software engineering and knowledge engineering (SEKE). Knowledge Systems Institute Graduate School, pp 516–519
  99. Pinzger M, Gall HC, Fischer M (2005) Towards an integrated view on architecture and its evolution. Electron Notes Theor Comput Sci 127(3):183–196
  100. Reus T, Geers H, van Deursen A (2006) Harvesting software systems for MDA-based reengineering. In: Proceedings of the Second European conference on model driven architecture - foundations and applications (ECMDA-FA). Springer-Verlag, pp 213–225
  101. Saint-Denis G, Schauer R, Keller RK (2000) Selecting a model interchange format: the SPOOL case study. In: Proceedings of the 33rd Annual Hawaii international conference on system sciences (HICSS). IEEE Computer Society, pp 1–10
  102. Santibáñez DSM, Durelli RS, de Camargo VV (2015) A combined approach for concern identification in KDM models. J Braz Comput Soc 21(1):1–20
  103. Schauer R, Keller RK, Lague B, Knapen G, Robitaille S, Saint-Denis G (2002) The SPOOL design repository: architecture, schema, and mechanisms. In: Erdogmus H, Tanir O (eds) Advances in software engineering, chap 13. Springer-Verlag, pp 269–294
  104. Schürr A (1997) Developing graphical (software engineering) tools with PROGRES. In: Proceedings of the 19th international conference on software engineering (ICSE). ACM, pp 618–619
  105. Sharafi Z, Soh Z, Guéhéneuc Y (2015) A systematic literature review on the usage of eye-tracking in software engineering. Inf Softw Technol 67:79–107
  106. Sim SE, Koschke R (2001) WoSEF: workshop on standard exchange format. ACM SIGSOFT Softw Eng Notes 26(1):44–49
  107. Sim SE, Storey MA, Winter A (2000) A structured demonstration of five program comprehension tools: lessons learnt. In: Proceedings of the 7th working conference on reverse engineering (WCRE). IEEE Computer Society, pp 210–212
  108. Smite D, Wohlin C, Galvina Z, Prikladnicki R (2014) An empirically based terminology and taxonomy for global software engineering. Empir Softw Eng 19(1):105–153
  109. Soden M, Eichler H (2007) An approach to use executable models for testing. In: Proceedings of the 2nd international workshop on enterprise modelling and information systems architectures (EMISA). GI, pp 75–85
  110. Sora I (2012a) A meta-model for representing language-independent primary dependency structures. In: Proceedings of the 7th international conference on evaluation of novel approaches to software engineering (ENASE). SciTePress, pp 65–74
  111. Sora I (2012b) Unified modeling of static relationships between program elements. Proceedings of the 7th international conference on evaluation of novel approaches to software engineering (ENASE). Commun Comput Inf Sci 410:95–109
  112. Sora I (2015) Helping program comprehension of large software systems by identifying their most important classes. Proceedings of the 10th international conference on evaluation of novel approaches to software engineering (ENASE), vol 599
  113. Sora I, Todinca D (2016) Using fuzzy rules for identifying key classes in software systems. In: Proceedings of the IEEE 11th International symposium on applied computational intelligence and informatics (SACI). IEEE, pp 317–322
  114. Steinberg D, Budinsky F, Paternostro M, Merks E (eds) (2008) EMF: eclipse modeling framework, 2nd edn. Addison-Wesley Professional, Boston
  115. Strein D, Kratz H, Löwe W (2006) Cross-language program analysis and refactoring. In: Proceedings of the 6th IEEE international workshop on source code analysis and manipulation (SCAM). IEEE Computer Society, pp 207–216
  116. Tichelaar S, Ducasse S, Demeyer S, Nierstrasz O (2000) A meta-model for language-independent refactoring. In: Proceedings of the international symposium on principles of software evolution (ISPSE). IEEE Computer Society, pp 154–164
  117. Tilley SR, Wong K, Storey MD, Müller HA (1994) Programmable reverse engineering. Int J Softw Eng Knowl Eng 4(4):501–520
  118. Tripathi V, Mahesh TSG, Srivastava A (2009) Performance and language compatibility in software pattern detection. In: Proceedings of the IEEE international advance computing conference (IACC). IEEE, pp 1639–1643
  119. Unterkalmsteiner M, Feldt R, Gorschek T (2014) A taxonomy for requirements engineering and software test alignment. ACM Trans Softw Eng Methodol 23(2):16:1–16:38
  120. van den Brand M, Klint P (2007) ATerms for manipulation and exchange of structured data: it’s all about sharing. Inf Softw Technol 49(1):55–64
  121. Vidács L (2009) Refactoring of C/C++ preprocessor constructs at the model level. In: Proceedings of the 4th international conference on software and data technologies (ICSOFT). INSTICC Press, pp 232–237
  122. W3C (2000) Extensible markup language (XML). http://www.w3.org/XML/
  123. Washizaki H, Fukazawa Y (2005) A technique for automatic component extraction from object-oriented programs by refactoring. Sci Comput Program 56(1–2):99–116
  124. Washizaki H, Guéhéneuc Y, Khomh F (2016a) A taxonomy for program metamodels in program reverse engineering. In: Proceedings of the 32nd IEEE international conference on software maintenance and evolution (ICSME). IEEE Computer Society, pp 44–55
  125. Washizaki H, Guéhéneuc Y, Khomh F (2016b) Patterns for program reverse engineering from the viewpoint of metamodel. In: Proceedings of the 23rd conference on pattern languages of programs (PLoP). ACM, pp 1–9
  126. Wimmer M, Kramler G (2005) Bridging grammarware and modelware. Proceedings of the satellite events at the MoDELS conference. Lect Notes Comput Sci 3844:159–168
  127. Wu H (2010) Test case generation for programming language metamodels. Proceedings of the 1st doctoral symposium of the international conference on software language engineering (SLE). CEUR Workshop Proc 64:27–30

Copyright information

© The Author(s) 2018

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  • Hironori Washizaki (1, 2, 3, 5)
  • Yann-Gaël Guéhéneuc (4)
  • Foutse Khomh (4)

  1. Department of Computer Science and Engineering, Waseda University, Tokyo, Japan
  2. National Institute of Informatics, Tokyo, Japan
  3. SYSTEM INFORMATION CO., LTD, Tokyo, Japan
  4. PTIDEJ–SWAT, DGIGL, Polytechnique Montréal, Montréal, Canada
  5. eXmotion Co., Ltd, Tokyo, Japan
