MemoRec: A Recommender System for Assisting Modelers in Specifying Metamodels

Model Driven Engineering (MDE) has been widely applied in software development, aiming to facilitate the coordination among various stakeholders. Such a methodology allows for a more efficient and effective development process. Nevertheless, modeling is a strenuous activity that requires proper knowledge of components, attributes, and logic to reach the level of abstraction required by the application domain. In particular, metamodels play an important role in several paradigms, and specifying wrong entities or attributes in metamodels can negatively impact the quality of the produced artifacts as well as other elements of the whole process. During the metamodeling phase, modelers can benefit from assistance to avoid mistakes, e.g., getting recommendations such as metaclasses and structural features relevant to the metamodel being defined. However, suitable machinery is needed to mine data from repositories of existing modeling artifacts and compute recommendations. In this work, we propose MemoRec, a novel approach that makes use of a collaborative filtering strategy to recommend valuable entities related to the metamodel under construction. Our approach can provide suggestions related to both metaclasses and structural features that should be added to the metamodel under definition. We assess the quality of the work with respect to different metrics, i.e., success rate, precision, and recall. The results demonstrate that MemoRec is capable of suggesting relevant items given a partial metamodel and supporting modelers in their task.


Introduction
Model-Driven Engineering (MDE) [36] development relies heavily on metamodels and models, which are used to represent an abstraction of real-world entities as well as to produce application code automatically. As a result, the modeling activity represents the core of this paradigm and should be carefully addressed to avoid possible errors in application deployment. Nowadays, modelers are equipped with several tools that support their tasks with different features, e.g., graphical environments, drag-and-drop utilities, and auto-completion. So far, different modeling assistants have been proposed to support modelers in their daily activities [3,12,23,24]. Nevertheless, most of them deal with testing or repairing [2], and to the best of our knowledge there have been no approaches dedicated to supporting metamodel specification. In particular, packages and metaclasses are the building blocks of a metamodel, but so far there exists no tool to help modelers effectively specify these artifacts. In fact, while working on a metamodel, modelers might expect to get recommendations consisting of relevant metaclasses or structural features that can be further integrated. However, due to the huge amount of available resources, searching for suitable artifacts is a daunting task. Under these circumstances, we see an urgent need for suitable machinery to mine data from open source platforms such as GitHub. Among others, we are interested in finding which packages and metaclasses can be added to the metamodel under development, given that other packages and metaclasses are already defined.
In this work, we aim to provide modelers with an automated assistant that supports them during metamodeling activities. We propose MemoRec, a recommender system that exploits a context-aware collaborative filtering technique [5] to recommend relevant artifacts related to the modeling domain. In particular, MemoRec has been conceptualized by learning from a series of recommender systems developed in the scope of the CROSSMINER project [8] to mine open source software, providing developers with various artifacts, including topics [6], API invocations, and source code [29]. MemoRec goes one step further to assist metamodeling activities by processing input data with four different encoding techniques (i.e., different selections of what information has to be kept from a metamodel). More importantly, we tailor the internal design to compute similarity among metamodels in an efficient way. Given a metamodel partially specified by the modeler, MemoRec is able to suggest two types of artifacts, namely (i) metaclasses at the level of a package; and (ii) structural features for a given metaclass. To properly capture the current context, we rely on the data encoding technique that has been successfully conceptualized in our previous work [28]. To the best of our knowledge, this is the first attempt to support MDE application development by exploiting collaborative filtering techniques. The contributions of our work are summarized as follows.
- A recommender system, named MemoRec, to provide modelers with classes and structural features relevant to the metamodels under development;
- An empirical evaluation of the conceived system on two real metamodel datasets, employing a set of well-defined metrics commonly used in the recommender systems domain, i.e., success rate, precision, recall, and F1 score;
- The replication package of the tool has been made available to facilitate future research.

Through careful observation, we realized that there exist no other tools that perform the same tasks, and thus it is not possible to compare our tool with any baseline. In the scope of this paper, we evaluate the performance of our proposed tool by relying on the k-fold cross-validation technique [18], aiming to investigate its practicality in real-world settings.
The paper is structured into the following sections. Section 2 shows a motivating example as well as background related to the context-aware collaborative filtering technique. In Section 3, we present MemoRec and its main components. The evaluation is presented in Section 4, while the results are analyzed in Section 5. A qualitative discussion of MemoRec is given in Section 6. Section 7 presents the work that is related to the approach presented in this paper. We conclude and discuss possible future work in Section 8.

Motivations and Background
This section describes the issues we tackle in this work and provides an overview of the background. In particular, Section 2.1 gives a motivating example where recommendations for completing a metamodel are needed. Afterward, Section 2.2 presents the context-aware collaborative filtering technique and its application in recommending API function calls and usage patterns, as a base for further presentation.

Motivating example
We present a motivating example to illustrate the need for proper recommendations. The considered context is a modeler who is developing the Web metamodel depicted in Fig. 1(a). The metamodel consists of two packages, i.e., Web and Data. The former models the presentation concepts and contains three classes, i.e., Page together with Static and Dynamic, which inherit from Page. The latter groups data concepts such as Entity and Field. By considering a metamodel and an active context as input, a modeling assistant is expected to recommend both additional classes and structural features that the metamodel under development should incorporate. An active context is a package or a class that the modeler is attempting to define when asking for recommendations. In Fig. 1, let us suppose that the modeler is asking for additional classes and structural features for two given active contexts, i.e., the Web package and the Page class, marked in blue and red, respectively.
Fig. 1(b) depicts two possible recommendations for the given contexts, i.e., the css structural feature (marked in red) within the Page class and the Form class (marked in blue) within the Web package. To be concrete, the Form class allows the modeler to define complete HTML resources. As for the possible structural feature recommendations for the Page class, css adds common and interesting information to pages in the Web domain. In other words, a modeling assistant should consider the active context to predict suitable recommendations that help the modeler complete that context.

Context-aware collaborative filtering technique
A context-aware collaborative filtering recommender system recommends to users items that have been bought by similar users in similar contexts [34,35]. Based on this premise, we successfully developed a recommender system named FOCUS to provide developers with API function calls and usage patterns [27]. We modeled the mutual relationships among projects using a tensor and mined API usage from the most similar projects.
The successful deployment of different variants of collaborative filtering techniques allows us to transfer the acquired knowledge into the MDE domain. We build MemoRec by redesigning and customizing FOCUS to support the completion of metamodels. Mining from similar objects is a building block of collaborative filtering techniques, and we exploit this to provide recommendations for metamodels under development. In particular, our approach works based on the assumption that "if metamodels share some common artifacts, then they should probably have other common artifacts." In this respect, MemoRec mines metaclasses and structural features from similar metamodels, given an input metamodel. The following section presents our conceived approach to recommend useful elements for a metamodel under development.

Proposed Approach
MemoRec provides modelers with recommendations that can be helpful while defining a metamodel, by considering the existing portion of the metamodel as the active context. The system makes use of a graph representation to encode the relationships among various metamodel artifacts, and generates recommendations by employing a context-aware collaborative filtering technique [35]. In addition, we exploit a tailored textual representation of metamodels, which are encoded so that the knowledge they contain can be extracted and fed as input to the recommendation engine.
The architecture of MemoRec is depicted in Fig. 2. To provide recommendations, MemoRec accepts as input a set of Metamodel Repositories (1). Afterwards, the Metamodel Encoder component (2) extracts package and class pairs as well as class and structural feature pairs from the metamodel being developed. Metamodel Comparator, a subcomponent of Similarity Calculator (3), measures the similarity between the metamodels stored in the repositories and the metamodel under specification. Using the set of metamodels and the information extracted by Metamodel Encoder, the Data Encoder component (4) computes rating matrices. Given an active context of a metamodel, i.e., a package or a class, Item Comparator computes the similarities between packages and classes. From the similarity scores, Recommendation Engine (5) generates recommendations, either as a ranked list of classes if the active context is a package, or as a ranked list of structural features if the active context is a class. In the remainder of this section, we present each of these components in greater detail.

Metamodel Encoder
A metamodel defines the abstract concepts of a domain where concepts, as well as the relationships among them, are expressed by the used modeling infrastructure. In particular, a metamodel consists of Packages that aggregate similar concepts expressed by Classes. Classes consist of structural features, i.e., attributes and references. Moreover, Classes can inherit structural features from other classes.
Inspired by our recent work [28], we employ four encoding schemes to represent different views concerning the terms extracted from packages, classes, and structural features named instance. Each scheme has been used to elicit relevant information from the input metamodels according to different granularity levels. In particular, we make use of two standard encoding (SE) schemes and two improved encoding (IE) schemes; within each pair, one scheme supports recommending structural features within a class context and the other supports recommending classes within a package context. We consider the following definitions:
- SE_s: it includes pairs in the form of <class name>#<structural feature name> for each structural feature contained within a class. This encoding scheme is used to suggest additional structural features within a given class context;
- IE_s: it consists of pairs <class name>#<structural feature name> for each structural feature contained within a class. Moreover, it includes structural features inherited from the super classes. In our previous work [9], we studied how structural features are used with hierarchies. On one hand, we found out that increasing the number of metaclasses with super-types decreases the average number of structural features directly specified in a metaclass, since structural features are spread through the class hierarchies. On the other hand, the average number of structural features including the inherited ones is uncorrelated with the number of metaclasses with super-types. We anticipate that, by including the inherited structural features in IE_s, we will be able to increase the informative part of the encoding. We use this scheme to suggest additional structural features within a given class context;
- SE_c: it includes pairs <package name>#<class name> for each class contained within a package. This encoding scheme is utilized in providing additional classes within a given package context;
- IE_c: it flattens packages and classes and encodes classes within a default artificial package. We use this encoding scheme to suggest additional classes within a given package context. Since metamodels consist of few ePackages [9], we envision that a flattened representation of classes, i.e., one that bypasses the Package/Class containment, can help MemoRec consider more metamodels and extract classes from different top similar metamodels.
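The four schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MemoRec implementation: the `MetaClass` container, the `encode` function, and the artificial package name `default` are hypothetical, and classes are assumed to have unique names across packages.

```python
from dataclasses import dataclass, field


@dataclass
class MetaClass:
    """Hypothetical in-memory view of a metaclass."""
    name: str
    features: list = field(default_factory=list)    # directly owned structural features
    supertypes: list = field(default_factory=list)  # names of super classes


def all_features(cls, index, seen=None):
    """Own features followed by those inherited through the supertype chain."""
    seen = set() if seen is None else seen
    if cls.name in seen:  # guard against cyclic inheritance
        return []
    seen.add(cls.name)
    feats = list(cls.features)
    for parent in cls.supertypes:
        if parent in index:
            for f in all_features(index[parent], index, seen):
                if f not in feats:
                    feats.append(f)
    return feats


def encode(metamodel, scheme):
    """metamodel: dict mapping a package name to a list of MetaClass objects."""
    classes = [c for pkg in metamodel.values() for c in pkg]
    index = {c.name: c for c in classes}
    if scheme == "SE_s":  # class#feature, direct features only
        return [f"{c.name}#{f}" for c in classes for f in c.features]
    if scheme == "IE_s":  # class#feature, inherited features included
        return [f"{c.name}#{f}" for c in classes for f in all_features(c, index)]
    if scheme == "SE_c":  # package#class
        return [f"{p}#{c.name}" for p, cs in metamodel.items() for c in cs]
    if scheme == "IE_c":  # classes flattened into one artificial package (name assumed)
        return [f"default#{c.name}" for c in classes]
    raise ValueError(f"unknown encoding scheme: {scheme}")
```

For a package Web holding Page (with feature name) and Static (inheriting from Page), SE_s yields only `Page#name`, whereas IE_s also yields `Static#name`.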
An encoding scheme depends on two main factors:
- the purpose of the recommendation; depending on the type of the recommended items (e.g., structural features, classes, specialization/generalization of a metaclass, etc.) the encoding scheme should be tailored to support the identified recommendation goal;
- the prediction performance; the encoding scheme strongly impacts the prediction performance. For this reason, the identification of a suitable encoding scheme is an iterative process where the encodings are incrementally improved to maximize the prediction performance for a specific purpose.
As future work, further encoding schemes could be provided to target different kinds of recommendations. For instance, an encoding scheme representing the inheritance relations between classes could suggest a possible set of generalizations or specializations for a given metaclass in the active context. In addition, a different encoding scheme could be used to include types for the recommended structural features.
Fig. 3 depicts an extract of the four encoding schemes related to the metamodel depicted in Fig. 1(a). In the following subsection, we show how the package/class and class/structural feature relationships are encoded. In Section 4, we evaluate MemoRec by considering the four encoding schemes described in this section, i.e., SE_s, IE_s, SE_c, and IE_c.

Data Encoder
Once package and class pairs as well as class and structural feature pairs have been extracted, MemoRec represents the relationships among them using two rating matrices to support class and structural feature recommendations. Given a metamodel, each row in the matrix corresponds to a package (class), and each column represents a class (structural feature). A cell is set to 1 if the package (class) in the corresponding row contains the class (structural feature) in the column; otherwise it is set to 0. Table 1 and Table 2 illustrate how the metamodel depicted in Fig. 1(a) is encoded into corresponding rating matrices. In particular, Table 1 shows the rating matrix combined with SE_c, whereas Table 2 reports the rating matrix combined with IE_s.
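As a minimal sketch of this step, the function below turns a list of encoded pairs of the form `<row>#<column>` into a binary rating matrix. The function name and the list-of-lists representation are assumptions made for illustration; they are not taken from the tool.

```python
def rating_matrix(pairs):
    """Build a binary rating matrix from '<row>#<column>' pairs.

    Rows are packages (or classes) and columns are classes (or structural
    features), both kept in order of first appearance.
    """
    rows, cols = [], []
    present = set()
    for pair in pairs:
        row, col = pair.split("#", 1)
        if row not in rows:
            rows.append(row)
        if col not in cols:
            cols.append(col)
        present.add((row, col))
    # A cell is 1 when the row element contains the column element, 0 otherwise.
    matrix = [[1 if (r, c) in present else 0 for c in cols] for r in rows]
    return rows, cols, matrix
```

Feeding it SE_c pairs produces the package-by-class matrix, while IE_s pairs produce the class-by-structural-feature matrix.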
A 3D context-based ratings matrix is introduced to model the intrinsic relationships among metamodels, packages (classes), and classes (structural features). The third dimension of this matrix represents a metamodel, which is analogous to the so-called context in context-aware collaborative filtering systems. For example, Fig. 4 depicts three metamodels M = (m_a, m_1, m_2) represented by three slices with four classes and five structural features: m_a is the active metamodel, and its active context is highlighted in dark gray. Both m_1 and m_2 are complete metamodels similar to m_a, and they are called background data, as they serve as a base for the recommendation process. On one hand, the more background metamodels we have, the better the chance that we recommend relevant structural features. On the other hand, increasing the number of top similar metamodels enlarges the ratings matrix and thus adds computational complexity.

Similarity Calculator
The recommendation of suitable metamodel items, i.e., classes or structural features, is derived from similar metamodels and the active context, i.e., packages or classes. Similarity Calculator is a generic component that can be used to compute similarity for both classes and structural features. Given an active context of a metamodel under development, it is essential to find the subset of the most similar metamodels, and then the most similar contexts in that set of metamodels. Based on the active context type, we create a weighted directed graph that models the relationships among metamodels and structural features to compute similarities. Moreover, we implemented a graph-based similarity function [7,27] to calculate the similarities among metamodels.
In particular, we used two graph representations to support both class and structural feature recommendations. Each node in the graph represents either a metamodel or a structural feature. If metamodel m contains structural feature f, then there is a directed edge from m to f. The weight of the edge m → f corresponds to the number of times m includes f. Figure 5 depicts the graph for the set of metamodels in Fig. 4: white nodes represent structural features, and blue nodes represent the metamodels most similar to the input metamodel, which is depicted in green. For instance, the Web metamodel has five classes and two of them define the attribute name. As a result, the edge Web → name has a weight of 2. In the graph, a question mark represents missing information, i.e., for the active class in Web, we need to find out whether the structural features links and css should be included or not.
Given the node representing a metamodel m, there are nodes connected to m via different edges, and they are called neighbor nodes. Considering (n_1, n_2, .., n_l) as the set of neighbor nodes of m, the feature set of m is the vector φ = (φ_1, φ_2, .., φ_l), where φ_k is the weight of node n_k, computed using the term-frequency inverse document frequency function as follows:

$$\phi_k = f_{n_k} \times \log\left(\frac{|M|}{a_{n_k}}\right) \qquad (1)$$

where f_{n_k} is the weight of the edge m → n_k; |M| is the number of all considered metamodels; and a_{n_k} is the number of metamodels connected to n_k.
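The similarity between two metamodels m and n is computed as the cosine between their feature vectors φ = {φ_k}, k = 1,..,l, and ω = {ω_j}, j = 1,..,z, as given below:

$$sim_1(m, n) = \frac{\sum_{t=1}^{\pi} \phi_t \times \omega_t}{\sqrt{\sum_{t=1}^{\pi} \phi_t^2} \times \sqrt{\sum_{t=1}^{\pi} \omega_t^2}} \qquad (2)$$

where π is the cardinality of the union of the sets of nodes connected to m and n.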
Finally, the similarity between classes c and d is calculated with the Jaccard index given below:

$$sim_2(c, d) = \frac{|F(c) \cap F(d)|}{|F(c) \cup F(d)|} \qquad (3)$$

where F(c) and F(d) are the sets of structural features for c and d, respectively. By referring to the motivating example proposed in Section 2.1 and depicted in Fig. 5, we present a concrete application of the proposed formalisms. Table 3 reports the φ vectors for the Web, M1, and M2 metamodels. Then, Table 4 lists the cosine similarity among vectors. Each cell reports the sim_1 score between the metamodels represented in the corresponding column and row. According to the results reported in Table 4, the Web metamodel is more similar to M1 than to M2.
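The three similarity ingredients described in this subsection (TF-IDF node weights, cosine similarity between metamodels, and the Jaccard index between classes) can be sketched as follows. This is an illustrative sketch, not the authors' code: the dictionary-based graph representation and the function names are assumptions, and the natural logarithm is used because the base is not stated in the text (the cosine score does not depend on it). The toy edge weights in the usage note are invented and are not the values of Table 3 or Table 4.

```python
import math


def tfidf_vectors(graph):
    """graph maps each metamodel to {structural feature: edge weight}.

    Returns, for each metamodel, {feature: phi}, where
    phi = edge weight * log(|M| / number of metamodels connected to the feature).
    """
    n_models = len(graph)
    degree = {}
    for feats in graph.values():
        for f in feats:
            degree[f] = degree.get(f, 0) + 1
    return {
        m: {f: w * math.log(n_models / degree[f]) for f, w in feats.items()}
        for m, feats in graph.items()
    }


def cosine(phi, omega):
    """Cosine similarity over the union of the nodes of two metamodels."""
    keys = set(phi) | set(omega)
    dot = sum(phi.get(k, 0.0) * omega.get(k, 0.0) for k in keys)
    norm = math.sqrt(sum(v * v for v in phi.values())) * math.sqrt(
        sum(v * v for v in omega.values())
    )
    return dot / norm if norm else 0.0


def jaccard(features_c, features_d):
    """Jaccard index between the structural feature sets of two classes."""
    fc, fd = set(features_c), set(features_d)
    union = fc | fd
    return len(fc & fd) / len(union) if union else 0.0
```

With the toy graph `{"Web": {"name": 2, "css": 1}, "M1": {"name": 1, "css": 1}, "M2": {"name": 1, "links": 1}}`, the feature name occurs in every metamodel and therefore receives weight zero, so Web and M1 are compared only on css, while Web and M2 share no weighted feature.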

Recommendation Engine
This component is used to generate a ranked list of relevant items, i.e., classes and structural features that depend on the metamodel context, i.e., package and class. In the rest of this section, we present structural feature recommendations based on the class context. Analogously, we apply the same approach for recommending classes within packages.
Figure 4 depicts an instance of structural feature rating matrices. In particular, the active metamodel m_a already includes three classes, and the modeler is working on the fourth class, corresponding to the last row of the matrix. The active class c_a contains two structural features, represented in the last two columns of the matrix, i.e., cells marked with 1. The first two cells are filled with a question mark (?), implying that at the time of consideration, it is not clear whether these two structural features should also be added to c_a. Recommendation Engine computes the missing ratings to predict additional structural features for the active class by exploiting the following collaborative filtering formula [5,27]:

$$r_{c,f,m} = \bar{r}_c + \frac{\sum_{d \in topsim(c)} (R_{d,f,m} - \bar{r}_d) \cdot sim_2(c, d)}{\sum_{d \in topsim(c)} sim_2(c, d)} \qquad (4)$$

Equation 4 is used to compute a score for the cell representing structural feature f, class c of metamodel m, where topsim(c) is the set of top-N similar classes of c; sim_2(c, d) is the similarity between two classes c and d, computed by Equation 3; \bar{r}_c and \bar{r}_d are calculated by averaging out all the ratings of c and d, respectively; and R_{d,f,m} is the combined rating of d for f in all the similar metamodels, computed in Equation 5 [5].
$$R_{d,f,m} = \frac{\sum_{n \in topsim(m)} r_{d,f,n} \cdot sim_1(m, n)}{\sum_{n \in topsim(m)} sim_1(m, n)} \qquad (5)$$

where topsim(m) is the set of top similar metamodels of m; and sim_1(m, n) is the similarity between metamodels m and n, calculated by means of Equation 2. Equation 5 suggests that, given the active metamodel, a more similar metamodel is assigned a higher weight. This makes sense in practice, since similar metamodels contain more relevant structural features than less similar metamodels. Using Equation 4, we compute all the missing ratings in the active class and get a ranked list of structural features with real scores in descending order. The list is then provided to modelers as recommendations.
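The prediction step can be sketched as below. It is a simplified illustration under several assumptions that do not come from the paper: classes are matched across metamodels by name, the similarity scores are passed in precomputed, and the mean rating of a similar class d is taken over its combined ratings. All function names and the data layout are hypothetical.

```python
def combined_rating(d, f, ratings, sim_models):
    """Similarity-weighted rating of class d for feature f over similar metamodels.

    ratings: {metamodel: {class: {feature: 0 or 1}}}
    sim_models: {metamodel: similarity to the active metamodel}
    """
    num = sum(ratings[n].get(d, {}).get(f, 0) * s for n, s in sim_models.items())
    den = sum(sim_models.values())
    return num / den if den else 0.0


def predict(active_row, f, features, ratings, sim_models, sim_classes):
    """Score of feature f for the active class.

    sim_classes: {class name: similarity to the active class}
    """
    r_c = sum(active_row.get(g, 0) for g in features) / len(features)
    num = den = 0.0
    for d, s in sim_classes.items():
        combined = {g: combined_rating(d, g, ratings, sim_models) for g in features}
        r_d = sum(combined.values()) / len(features)  # assumption: mean over combined ratings
        num += (combined[f] - r_d) * s
        den += s
    return r_c + num / den if den else r_c


def recommend(active_row, features, ratings, sim_models, sim_classes):
    """Rank the features missing from the active class by predicted score."""
    missing = [g for g in features if not active_row.get(g)]
    scored = [
        (g, predict(active_row, g, features, ratings, sim_models, sim_classes))
        for g in missing
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch a feature that appears in the more similar background metamodel is ranked above one that appears only in the less similar metamodel, which mirrors the weighting intuition stated for Equation 5.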

Evaluation
This section describes the datasets and the process we conceived to evaluate MemoRec. In particular, Section 4.1 presents the research questions to study the performance of our proposed approach. Section 4.2 gives an overview of the datasets used in the evaluation. The methodology and metrics are described in Section 4.3 and Section 4.4, respectively.

Research Questions
The following research questions are considered to investigate MemoRec's recommendation performance:
- RQ1: How well can MemoRec provide recommendations with different configurations? We examine different configurations of MemoRec, i.e., the number of recommended items as well as the number of neighbor metamodels, to find the settings that bring the best performance;
- RQ2: How does the training data affect the performance of MemoRec? We study the outputs of MemoRec by considering two different datasets to assess to what extent their quality can have ripple effects on the prediction accuracy of MemoRec;
- RQ3: How do the encoding schemes affect the performance of MemoRec? Since the definition of a suitable encoding scheme is an iterative process, we compared refined versions of the encodings, i.e., IE_c and IE_s, with the initial ones, i.e., SE_c and SE_s, to pin down which of them facilitates the best recommendation outcome for MemoRec.

Data Extraction
To evaluate the proposed approach, we exploited two independent datasets, namely D1 and D2, as shown in Table 5. They are described below.
- D1 is a curated dataset [43], which consists of 555 metamodels mined from GitHub and already labeled by humans. In particular, its metamodels have already been classified into nine categories, i.e., Bibliography, Issue tracker, Project build, Review system, Database, Office tools, Petrinet, State machine, and Requirements specification. Though MemoRec does not require the input data to be labeled, such predefined categories are beneficial to the recommendation as there is a high similarity among the metamodels within a category. This is important since MemoRec heavily relies on similarity to function (see Section 3), i.e., given a metamodel, it searches for relevant items from similar metamodels;
- D2 is a raw and randomly collected dataset obtained using the GitHub API [1]. To aim for a reliable evaluation of MemoRec, we identified and filtered out from the dataset all the duplicated metamodels, resulting in a final set with 2,151 metamodels. By means of the GitHub API [1] we searched for files with the .ecore extension, which corresponds to Ecore metamodels. Due to the restrictions imposed by GitHub, e.g., it returns a maximum of 1,000 elements per query, we had to perform the searches by iteratively varying the query keywords. In particular, we used the extension qualifier as a base to search for ecore files. Then, we refined the query by adding typical ecore keywords, i.e., ePackage, xml, eClass, to name a few. Afterward, all the discovered metamodels were downloaded and collected in a dedicated folder. We removed from the corpus of collected artifacts all the files that we could not directly parse with the EMF facilities [39]. Finally, we removed the duplicated metamodels by the following process: (i) a hash is computed for every collected ecore file based on its content; (ii) the obtained hashes are used to build a hashmap where the key is the hash itself, and the value is the corresponding file; (iii) if a duplicated key occurs in the map, we assume that the corresponding ecore file is a duplicate and it is discarded.
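The three-step deduplication process above can be sketched as follows. The hash function is not named in the text, so SHA-256 is an assumption here, as are the function name and the optional `read` parameter (added only so that the sketch can be exercised without touching the file system).

```python
import hashlib
from pathlib import Path


def deduplicate(paths, read=lambda p: Path(p).read_bytes()):
    """Keep the first file seen for each distinct content hash.

    paths: iterable of file paths to collected .ecore files.
    read: callable returning the raw bytes of a path (hypothetical hook).
    """
    by_hash = {}
    for path in paths:
        digest = hashlib.sha256(read(path)).hexdigest()  # (i) content-based hash
        # (ii) the hash is the key, the file is the value;
        # (iii) a key that already exists marks a duplicate, which is skipped.
        by_hash.setdefault(digest, path)
    return list(by_hash.values())
```

Note that a content hash only detects byte-identical files; metamodels that differ in whitespace or element order would be kept as distinct entries.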

Methodology
As described in Section 3, MemoRec can recommend classes or structural features, i.e., attributes and references, depending on the recommendation context, i.e., packages or classes. For this reason, we perform the experiments by exploiting both classes and structural feature recommendations. In the rest of this section, we use classes within packages and structural features within classes as the recommendation objective.
To study if MemoRec is applicable in real-world settings, we perform an offline evaluation by simulating the behavior of a modeler who partially defines a metamodel and needs practical recommendations on what to do next. Figure 6 depicts the evaluation process with three consecutive steps, i.e., Data Preparation, Recommendation Production, and Outcome Evaluation, explained as follows.
- Data Preparation. As seen in Fig. 6

Metrics
We introduce the following notations as a base for further presentation:
- N is the cut-off value for the ranked list of recommendations. MemoRec returns a ranked list of recommendation items, and those on the top of the list are considered to be more relevant to the given context. The cut-off N is used to select the top items. For instance, if MemoRec retrieves four items, e.g., FormElement, Form, Link, and Page, and the cut-off value N is set to 2, then only the first two recommended items, i.e., FormElement and Form, are considered.

Then the metrics, i.e., success rate, precision, and recall, are defined as follows.
SR@N. Given the testing set of metamodels M, success rate measures the ratio of queries that have at least one matched item among the total number of queries.
In particular, SR stands for success rate, and @N corresponds to a cut-off value of N. For example, SR@2 means the success rate for a recommended list with 2 items.
Precision, Recall, and F1-score (F-measure). These metrics are utilized to measure the accuracy of the recommendation results. In particular, precision is the ratio of the number of items in TP_N(m) to the number of recommended items, recall is the ratio of the number of items in TP_N(m) to the number of items in GT(m), whereas F1-score is the harmonic mean of precision and recall.
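The metrics can be made concrete with a short sketch. It assumes that TP_N(m) is the set of top-N recommended items that also appear in the ground truth GT(m), and that precision, recall, and F1 are averaged over all queries; both points are assumptions of this illustration, as is the function name.

```python
def evaluate(queries, n):
    """Compute SR@N and average precision, recall, and F1 at cut-off n.

    queries: list of (ranked_recommendations, ground_truth_set) pairs.
    """
    hits, precisions, recalls, f1s = 0, [], [], []
    for ranked, ground_truth in queries:
        top = ranked[:n]                                  # apply the cut-off N
        tp = [item for item in top if item in ground_truth]
        hits += bool(tp)                                  # query has at least one match
        p = len(tp) / len(top) if top else 0.0
        r = len(tp) / len(ground_truth) if ground_truth else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    total = len(queries)
    return {
        "SR": hits / total,
        "precision": sum(precisions) / total,
        "recall": sum(recalls) / total,
        "F1": sum(f1s) / total,
    }
```

For the ranked list FormElement, Form, Link, Page with N = 2 and a hypothetical ground truth of Form and Link, only Form is matched, giving a precision and a recall of 0.5 for that query.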
Recommendation time. To measure the time needed to perform a prediction, we used a laptop with a 2.7 GHz quad-core Intel Core i7 processor, 16 GB of RAM, and macOS Catalina 10.15.5.

Experimental Results
This section analyzes the performance obtained by running MemoRec on the considered datasets. The three research questions are addressed in Section 5.1, Section 5.2, and Section 5.3. Section 5.4 discusses possible threats to validity.

RQ1: How well can MemoRec provide recommendations with different configurations?
We conducted experiments on the curated dataset (i.e., D1) by varying the number of neighbour nodes k of the input metamodel, i.e., k = {1, 5, 10, 15, 20}, and the value of N, i.e., N = {1, 10, 20}. The rationale behind the selection of those values is as follows. First, we should not present a long list of recommended items since it may confuse the modeler; thus we select N = 20 as the maximum value. Second, since the number of neighborhood items impacts the computational complexity (cf. Equation (5)), it is impractical to use a large number of metamodels as neighbors. Therefore, we consider the values k = {1, 5, 10, 15, 20}. In the experiments, we use SE_c and SE_s as encoding schemes to compute recommendations for classes and structural features, respectively (the results related to the adoption of the other encoding schemes are presented in the discussion of RQ3).
Table 6 and Table 7 show the average success rates obtained by running the ten-fold cross-validation technique to recommend structural features and classes, respectively, by using different cut-off values N. In particular, Table 6 depicts the success rate obtained for recommending structural features within classes. According to Table 6, we can observe that using more neighbour metamodels to compute recommendations brings a better success rate when only the first recommended item is considered. For instance, MemoRec gets a success rate@1 of 0.153 and 0.202 when k=1 and k=20, respectively. This is not confirmed when a longer list of recommendation items is considered, i.e., N = {10, 20}. Moreover, MemoRec yields a better performance when we increase the cut-off value N. For example, for k=20 and N=1, the obtained success rate is 0.202, which is less than half of 0.479, the corresponding value when N=20. A longer list of recommended items means an increase in the match rate; however, the modeler may tire of skimming through it. Thus, in practice, we should choose a suitable cut-off value N. 6, we see that incorporating more neighbors to compute recommendations is useful for a small k, i.e., k = {1, 5, 10}. However, starting from k=15, there is a decrease in success rate for all the cut-off values N. We suppose that this happens because the newly adopted neighbors introduce only noise.
Answer to RQ1. Considering a certain number of similar metamodels contributes to more relevant recommendations. Using data encoded with the SE_c and SE_s encoding schemes allows MemoRec to predict classes within a package better than structural features within a class. Moreover, by considering a longer list of recommended items, MemoRec obtains an increase in success rate.

RQ 2 : How does the training data affect the performance of MemoRec?
We conducted experiments similar to those presented previously, also measuring the performance induced by the adoption of the dataset D 2. As previously described in Section 4.2, D 1 and D 2 are different in terms of size and quality. In particular, D 1 contains different groups of similar metamodels. Each group is labeled with the application domain that its metamodels are intended to describe. Thus, as done for RQ 1, we performed experiments by varying the number of neighbour nodes of the input metamodel and the value of N. Table 8(a) and Table 8(b) show the average success rates obtained by running the ten-fold cross-validation technique to recommend structural features and classes, respectively, using different cut-off values N.
According to Table 8(a), it is evident that using more neighbour metamodels to compute recommendations brings a better success rate for both datasets when only the first recommended item is considered. An increasing number of neighbors k does not improve the success rate when a longer list of recommendations is considered, i.e., SR@10 and SR@20. However, with the randomly created dataset D 2, the success rate is lower than that of D 1. For instance, with D 1, MemoRec gets a success rate@1 of 0.159 and 0.202 when k=1 and k=20, respectively, whereas with D 2 the corresponding values are 0.114 and 0.161. The same trend can also be seen with other cut-off values. With D 2 as well, MemoRec yields a better performance when we increase the cut-off value N. For example, with D 2 and k=20, N=1, the corresponding success rate is 0.178, which is less than half of 0.373, the corresponding value when N=20. In any case, the success rate related to the adoption of D 2 is always lower than that of D 1.
Table 8(b) reports the success rate obtained with class recommendations by comparing the adoption of D 1 and D 2. The decrease in accuracy related to the adoption of D 2 shown in Table 8(a) is also confirmed in Table 8(b). To further study MemoRec's performance, we compute and report in Table 9, Table 10, and Table 11 the precision, recall, and f-measure for D 1 and D 2. For this setting, the number of recommended items N was varied from 1 to 10, attempting to examine the performance for a considerably long list of items. The value of k was varied from 1 to 20 with a step of 5, considering a large number of neighbors.
Table 9 and Table 10 confirm that by considering a curated dataset with more similar metamodels, MemoRec has better prediction performance than when using a raw dataset. Moreover, MemoRec reaches better prediction performance in recommending structural features within classes than classes within packages.
Answer to RQ 2. The quality of the input data plays a key role in MemoRec's performance. Curated datasets with more similar metamodels allow MemoRec to improve its prediction performance, even if the size of such datasets is smaller than that of those randomly collected.

RQ 3 : How do the encoding schemes affect the performance of MemoRec?
Before addressing this research question, we discuss the recommendations produced by MemoRec on the example by considering the Bibtex package as the active context. We briefly summarize the four schemes proposed in Section 3.1 as follows. SE s consists of class-structural feature pairs for each structural feature directly defined within a class, while IE s also includes structural features inherited from the superclasses; IE c consists of package-class pairs, while SE c flattens packages within a default artificial package.
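The four schemes summarized above can be sketched on a toy metamodel as follows. This is only an illustration under stated assumptions: the dictionary layout, the helper names, the `default` package label, and every element name except `Bibtex` and `Article` are hypothetical, and single inheritance is assumed for brevity.

```python
# Hypothetical toy metamodel; apart from "Bibtex" and "Article",
# all names are invented for illustration.
METAMODEL = {
    "packages": {"Bibtex": ["Entry", "Article"]},
    "classes": {
        "Entry": {"super": None, "features": ["key", "title"]},
        "Article": {"super": "Entry", "features": ["journal"]},
    },
}

def all_features(classes, name):
    """Own features of a class followed by those of its superclasses."""
    features, seen = [], set()
    while name is not None and name not in seen:  # guard against cycles
        seen.add(name)
        features.extend(classes[name]["features"])
        name = classes[name]["super"]
    return features

def se_s(mm):
    """SE s: (class, structural feature) pairs for directly defined features."""
    return [(c, f) for c, spec in mm["classes"].items() for f in spec["features"]]

def ie_s(mm):
    """IE s: as SE s, plus the features inherited from superclasses."""
    return [(c, f) for c in mm["classes"] for f in all_features(mm["classes"], c)]

def ie_c(mm):
    """IE c: (package, class) pairs that keep the package structure."""
    return [(p, c) for p, cs in mm["packages"].items() for c in cs]

def se_c(mm, default_package="default"):
    """SE c: every class placed in one artificial default package."""
    return [(default_package, c) for cs in mm["packages"].values() for c in cs]

print(se_s(METAMODEL))
print(ie_s(METAMODEL))
print(ie_c(METAMODEL))
print(se_c(METAMODEL))
```

In this toy case, IE s adds the pairs (Article, key) and (Article, title) to what SE s produces, which illustrates why IE s carries more information per class.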
Table 12 shows the list of top-10 recommended items for each encoding scheme, i.e., SE c, IE c, SE s, and IE s. As described in Section 3.1, the type of the active context determines which encoding MemoRec needs to adopt. By observing the outcomes in Table 12, it can be seen that using IE s and SE s produces different recommended items. The result provided by IE s seems to contain more generic recommendations, e.g., id, actor, attachment, to name a few, while SE s provides more focused recommended structural features. This intuition is confirmed by manually analyzing significant recommendation examples. Though both IE s and SE s provide useful suggestions to the modeler, the augmented information of SE c allows MemoRec to predict more specific structural features. Referring to the scores obtained with IE c and SE c, we conclude that the results depend on the similarities among the training metamodels. By flattening the package structure, SE c mainly uses class names to compute similarities, while IE c also includes the package structure. So, when a similar metamodel is identified, SE c can recommend classes that belong to any package.
To answer RQ 3, we conducted experiments on D 1 by comparing the encoding schemes pairwise, i.e., SE s versus IE s and SE c versus IE c. As mentioned before, we should select a suitable number of neighbor metamodels to compute recommendations to maintain a trade-off between efficiency and effectiveness. Thus, to simplify the evaluation, we set the number of neighbors k to 5. The evaluation metrics computed by the encoding schemes for structural features and classes are reported in Table 13(a) and Table 13(b), respectively.
Table 13(a) demonstrates an evident outcome: by using IE s as the encoding scheme, we obtain a better prediction performance than when using SE s. For instance, given N = 1, we get a success rate of 0.241 and 0.181 for the IE s and SE s encodings, respectively. Similarly, for the other evaluation metrics, i.e., precision and recall, IE s helps achieve a superior performance. When we consider larger cut-off values, i.e., N = {5, 10, 15, 20}, IE s is always beneficial to the recommendation outcomes, as it brings higher quality indicators. For example, with N=20, using IE s yields a success rate of 0.604, which is much higher than 0.489, the corresponding value for SE s.
Next, we analyze the recommendations obtained by using SE c and IE c as shown in Table 13(b). For success rate and precision, using IE c helps MemoRec perform better than using SE c. However, IE c negatively impacts the recall values. In our opinion, this is due to the flattening operation, which affects the similarity function, making metamodels too similar. In this case, the recommender engine is limited to suggesting the most common classes. For instance, because of metamodelling best practice, NamedElement is commonly used in a metamodel. This benefits success rate and precision, but recall goes down because the recommended items do not depend on the metamodel context.
Through the experiment, we see that a suitable encoding scheme fosters better prediction performance. In this respect, we believe that the introduction of Natural Language Processing (NLP) steps, i.e., stemming, lemmatization, and stop word removal, can boost the accuracy of MemoRec. In particular, preprocessing steps can be employed to reduce the usage of different terms with very close semantics and, thus, to increase the corresponding term usages. For instance, the word "reference" and its plural form "references" are two inflections of the same noun. Even though the two words have the same semantics, MemoRec currently does not match them and considers them different, which negatively affects its performance. We consider the integration of NLP techniques as our future work.
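As a rough illustration of the preprocessing envisioned here, the sketch below normalizes element names so that "reference" and "references" map to the same term. It is an assumption-laden stand-in rather than the planned integration: the suffix rules are a deliberately crude substitute for a real stemmer or lemmatizer, and the stop-word list and function names are hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "of", "a", "an"}

def split_identifier(name):
    """Split camelCase / snake_case identifiers into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]

def naive_stem(token):
    """Crude plural stripping; a real stemmer or lemmatizer would replace this."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def normalize(name):
    """Tokenize, drop stop words, and stem an element name."""
    tokens = split_identifier(name)
    return tuple(naive_stem(t) for t in tokens if t not in STOP_WORDS)

print(normalize("reference"))
print(normalize("references"))
print(normalize("listOfReferences"))
```

With such a step applied before encoding, two metamodels using "reference" and "references" would contribute to the same rating-matrix entry instead of two distinct ones.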
The average execution time among ten folds on various values of k for both datasets is depicted in Fig. 7(a) and Fig. 7(b). It can be seen that while IE s helps MemoRec achieve a good prediction performance, it incurs a high computational complexity, resulting in prolonged execution time. This is understandable since, compared to the other techniques, this encoding scheme incorporates more information from metamodels for its computation.
Answer to RQ 3. Inherited structural features (IE s) enable MemoRec to achieve a superior performance compared to using SE s, despite a higher computational complexity. With IE s, MemoRec predicts structural features within a class better than classes within a package.

Threats to validity
In this section, we discuss the threats that might harm the validity of the performed experiments. In particular, we discuss threats with respect to internal and external validity as follows.
Internal validity. Such threats refer to internal factors that might affect the outcomes of the performed experiments. A possible threat is represented by the datasets that have been used for the experiments. We mitigated such a threat by using two completely different datasets, one of which has been randomly created without performing any data curation activities. Another internal threat is represented by the adopted encoding schemes. Also in this case, we mitigated the issue by employing different encoding schemes. By considering the data and encoding dimensions, we managed to identify distinctive characteristics of the approach that proved to be valid independently of the adopted encoding schemes and input datasets, i.e., graph builder, similarity calculator, and recommendation engine.
External validity. It is related to factors that can affect the generalizability of our findings, possibly making the obtained results not valid outside the scope of this study. We mitigated the issue by evaluating MemoRec in different scenarios, with the aim of simulating several usages of the system, e.g., by varying the number of neighbour metamodels and the size of the list of recommended items. Another threat to validity is that, currently, we do not consider the sequences of actions that lead to a given metamodel. We believe that alternative approaches like LSTM (Long Short-Term Memory) [16] can be a possible candidate to produce recommendations that rely on creation sequences. Moreover, it is crucial to investigate how modelers perceive MemoRec. In this respect, we plan to conduct a user study where human evaluators are asked to give their assessment of the recommendations provided by MemoRec.

Discussion
Mussbacher et al. [25] proposed a conceptual framework to characterize and compare approaches for intelligent modeling assistance (IMA). The proposed assessment grid consists of nine properties, namely Model, Autonomy, Relevance, Confidence, Trust, Explainability, Quality Degree, Timeliness, and Quality regarding external sources. We provide a qualitative discussion of MemoRec with respect to such properties as follows.
With Model, the authors [25] refer to the quality level of the models recommended by the considered IMA. Such a quality is measured in terms of syntactic, semantic, and pragmatic quality. Concerning such a dimension, MemoRec ensures syntactic quality as long as it is trained with syntactically correct artifacts.
With the Autonomy property, the authors refer to what extent the IMA under analysis is autonomous in gathering context information or user feedback without any user intervention. Such a property is mainly related to the tool gearing the considered IMA. MemoRec cannot be analyzed concerning the Autonomy dimension because, at this stage, we focused on the algorithmic part of the approach and deferred its integration in a supporting environment as a next step.
The Relevance property is related to the degrees of precision and recall of the recommendations provided by the adopted IMA with respect to the modeler's intentions. As shown in Section 5, MemoRec has been evaluated by simulating different configurations; the obtained accuracy is satisfactory, and it very much depends on the quality of the training data.
With the Confidence property, Mussbacher et al. [25] aim at measuring how often the IMA under analysis provides a confidence value for the recommended items. Similarly to all the analyzed IMAs [25], MemoRec does not provide any confidence value for the recommended metamodel elements. Indeed, this represents an interesting and important improvement for MemoRec.
Trust has been defined as "the perception that the modeler has about the quality of an IMA." This represents an important property that we plan to assess once we evaluate the quality of the supporting tool integrating MemoRec in an existing IDE, e.g., Eclipse.
The Explainability property is another essential characteristic that is related to the emerging Explainable AI research field [14], which aims at enabling existing AI techniques and tools to make their outcomes understandable by humans. The current version of MemoRec cannot provide any explanations related to recommended elements. An interesting extension can be complementing recommended metamodel elements with the sources of the most representative metamodels that triggered the given recommendations.
The Quality Degree dimension measures "the degree of excellence of the IMA to address the needs of a modeler." Similar to all the approaches that have been analyzed [25], MemoRec relies on external sources to provide modelers with recommendations that are valid for the active context.
Since Timeliness refers to the user satisfaction for a given IMA, such a quality metric cannot be assessed for MemoRec at this stage. Instead, we plan to evaluate it once MemoRec is integrated into an existing IDE, like Eclipse.
The Quality regarding external sources property refers to the quality of the IMA under analysis concerning its external sources. As shown in Fig. 2, MemoRec is repository independent as long as it is possible to download the available modeling artifacts. However, as also discussed in Section 5.2, since MemoRec is a data-driven approach, the quality of the mined metamodels has an impact on its performance. Consequently, if the system is trained with low-quality metamodels, the recommendations would also be of limited quality.
Related work

Existing modeling assistants
The Extremo tool [24] has been proposed to assist modelers in a platform-independent way. By relying on miscellaneous resources excerpted from the context, it creates shared data employed to develop a flexible query mechanism. This querying system is used to explore and find useful entities that help the modeler complete the model under construction. To assess the quality of the work, Extremo has been used to implement a DSL for the financial domain and validate its soundness through different use cases. The tool has been fully integrated into the Eclipse IDE as a plugin. In contrast with MemoRec, which extracts the query directly from the metamodel under development, Extremo requires a modeler to perform custom or pre-defined queries to search for information chunks.
Papyrus [12] is a system to support domain-specific model specification by exploiting UML profiles. The mapping between each profile's metaclass and real Java classes is performed using EMF generator model utilities. Furthermore, the tool embeds a palette and a context menu to graphically specify the selected metaclasses. Though the results are promising, more features could be added to enrich the experience, e.g., proactive triggering of recommendations or fine-grained customizations using EMF utilities. By transforming UML profiles to EMF metamodels, Papyrus supports the development of domain-specific environments. Although the generated domain-specific environments facilitate editing a model, Papyrus does not provide the modeler with suggestions either to complete the input profile from which the EMF metamodel is generated or to edit a model that conforms to those specifications.
Batot and Sahraoui [3] introduced a modeling assistant based on a multi-objective optimization problem (MOOP). A well-founded evolutionary algorithm, namely NSGA-II, is employed to obtain representative models using an initial set provided by the user. According to the Pareto optimality definition, the algorithm is able to solve the MOOP to find relevant candidates. In this way, NSGA-II retrieves partial models to be completed by domain-expert users, who can personalize the obtained results by selecting a different coverage degree or changing pre-defined minimality criteria. To assess the quality of the work, the proposed NSGA-II adaptation was compared with random and mono-objective functions. Experimental results showed that the MOOP adaptation outperforms the baselines. Different from MemoRec, which suggests possible metamodel elements, the approach in [3] generates a set of models for various MDE tasks, e.g., testing and automated learning.
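The notion of Pareto optimality mentioned above can be illustrated with a small non-dominated filter. This sketch shows only the dominance test that multi-objective algorithms such as NSGA-II build upon; it is not the adaptation proposed in [3], and the two-objective tuples (both assumed to be minimized) are hypothetical values.

```python
def dominates(a, b):
    """a dominates b if a is no worse on every objective and strictly better on at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidates scored on two objectives to minimize.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(candidates))
```

Here (3, 3) is discarded because (2, 2) is better on both objectives, while the remaining candidates are incomparable trade-offs among which a domain expert would choose.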
López-Fernández et al. [23] presented an example-driven tool to recommend a complete metamodel starting from model fragments specified by graphical tools, e.g., Visio, PowerPoint, Dia. The tool extracts untyped model fragments using initial examples as the starting point. Then, it infers an agnostic metamodel from them that the modeler can then enrich. In contrast with MemoRec, metamodels are generated by an iterative and inductive process where model fragments are either sketched by domain experts using drawing tools or given in a compact textual notation.
AVIDA-MDE [15] extends the original AVIDA tool and facilitates the generation of behavioral models starting from the requirements' specification. At the beginning of the process, an instinctual knowledge is formed by considering state diagrams, their inner elements, and the alphabet to specify such models. By relying on such information, AVIDA-MDE constructs new transitions and proposes different behavior models to support scenarios and meet the constraints specified in the initial phase by the user. The approach was evaluated using a robot navigation system as the testing scenario. AVIDA-MDE aims at generating a set of behavioral models starting from the given parameters, whereas MemoRec recommends additional model elements starting from the context consisting of the metamodel under development.
A model assistant based on clone detection has been recently envisioned [40]. The envisioned prototype is able to edit an incomplete input model as well as propose fine-grained operations on the model itself. The approach makes use of the Simulink Virtual Modeling Assistant (SimVMA), a well-founded technique to detect model clones. SimVMA is capable of finding all possible intersections between the initial metamodel and the clones belonging to the knowledge base. By exploiting the Type-3 clone similarity, the system can provide the two types of recommendations, i.e., retrieving complete models or single operation suggestions. In contrast, MemoRec applies a context-aware collaborative filtering technique using a given number of similar metamodels to predict additional model elements.
MAR [22] employs a query-by-example approach to search for similar metamodels/models. First, the model structure is encoded as bags of paths before being indexed and stored on Apache HBase. Given a model, MAR uses it as a query to search for similar artifacts using a similarity score. Though modelers can learn by inspecting similar metamodels, they have to manually examine the results to extract useful information.
Sen et al. [37] proposed a model assistant based on constraint logic programming (CLP) to support the definition of domain-specific models in the modeling framework AToM3. Given a partial model, the proposed tool is able to synthesize the complete model by relying on several constraints specified in Prolog. The designer can eventually ask for additional recommendations using the generated domain-specific model editor. Similarly, an extension of the Diagram Predicate Framework (DPF) has been proposed to rewrite partial models by adding new elements graphically [31]. In particular, the proposed framework guarantees the compliance of the edited model using several termination rules adapted from the layered graph grammar technique. Differently from MemoRec, which mines existing metamodels to predict additional model elements, the proposed approach does not use information from existing models, but makes use of i) the notation of the metamodel under development, ii) the constraints expressed on this metamodel, and iii) the partial model built by a domain expert to generate a visual model editor for the DSML supporting recommendations for possible completions.
The ASketch tool [42] supports the completion of Alloy partial models with holes using automated analysis. First, the input interpreter parses the partial model to generate possible candidates. Afterward, these model fragments are encoded with the partial model and AUnit test files. Thus, ASketch can find possible solutions in a large search space by relying on a SAT solver. ASketch fills an Alloy partial model with concrete candidate fragments such that predefined tests, i.e., unit testing, test execution, are satisfied. Therefore, the final model fragments are not extracted from existing models/metamodels, which is what MemoRec actually does.
Recently, we presented MORGAN [10], a recommender system based on a graph neural network (GNN) to assist modelers in performing the specification of metamodels and models. Similar to MemoRec, MORGAN makes use of tailored model and metamodel parsers to excerpt relevant information in textual data format. Then, the encoder builds the graphs from the text produced by the parser. Finally, the generated graphs feed a GNN-based engine to compute additional metamodel or model parts. Unlike MemoRec, MORGAN does not need an active context where the recommender suggests additional elements. At the same time, MemoRec attempts to provide recommendations closely related to the context in which the modeler is working. For this reason, we cannot directly compare MORGAN with MemoRec.
Our work distinguishes itself from the studies mentioned above as it can provide missing classes and structural features for a metamodel under development, exploiting a context-aware collaborative filtering technique. In addition, we anticipate that applying natural language processing techniques can help MemoRec improve the prediction performance, and we consider the issue in our future work.

Code recommender systems
In the context of open-source software, developing new systems by reusing existing components raises relevant challenges in (i) searching for relevant modules; and (ii) adapting the selected components to meet pre-defined requirements. To this end, recommender systems in software engineering have been developed to support developers in their daily tasks [8,32]. Such systems have gained traction in recent years as they can provide developers with a wide range of valuable items, including code snippets [27,29], tags/topics [11], third-party libraries [26], documentation [33], to mention but a few.
Sourcerer [20] performs code search in large-scale repositories by exploiting different components. The first component is the crawler, which automatically downloads repositories to build a knowledge base. Then, it parses the source code to represent it as a database entity. Additionally, Apache Lucene and fingerprints are used to support keyword-based search and structural representation of the repository, respectively. Finally, the ranker retrieves the most relevant results.
StackOverflow has been exploited to enrich code queries, with the aim of getting relevant source code. In particular, FaCoY [17] is a code-to-code search engine that recommends relevant GitHub snippets to a project being developed. It is based on an alternate query technique to augment the possible retrieved results. The initial query is built from StackOverflow posts, and the additional query is performed directly on GitHub local repositories to deliver the final recommendation items.
Differently from the work that proposes tools able to provide developers with specific recommendations, Korotaev et al. [19] introduce a GRU-based recurrent neural network (RNN) to build a universal recommender system. To this end, the approach supports the recommendation phase using a client-server architecture equipped with different components. The data collection and processing phases are conducted taking into consideration the user's behavior. The data mining module is used to feed a GRU-based RNN. To support user profiling, the approach uses ontologies by building an external knowledge representation module. The proposed network outperforms the long short-term memory (LSTM) technique concerning accuracy.
MemoRec has been built following a series of recommender systems developed through the CROSSMINER project [8]. Rather than recommending source code or libraries, MemoRec provides modelers with artifacts related to metamodeling activities, using a collaborative filtering technique. Such a technique has been successfully exploited to build recommender systems to suggest API calls [29] and third-party libraries [26]. MemoRec processes input data with four different encoding techniques. Its internal design is also tailored to compute similarity among metamodels in an efficient way.

Application of ML in MDE
In recent years, we have witnessed a proliferation of Artificial Intelligence (AI) in various aspects of human life. In the MDE domain, though learning algorithms have been successfully applied to tackle various issues, the adoption of AI and Machine Learning (ML) techniques in this domain is still in its infancy. This section recalls some of the most important work on this topic.
Mussbacher et al. [25] conducted an initial investigation on Intelligent Modeling Assistants (IMAs) using a comprehensive assessment grid. To elicit critical IMA features, the authors analyzed the well-founded Reference Framework for Intelligent Modeling Assistance (RF-IMA). The main finding of the work is that existing IMAs obtain low scores in the extracted features and that they can be further enhanced in terms of performance as well as in the underpinning structure.
Breuker [4] reviews the main modeling languages used in ML as well as inference algorithms and corresponding software implementations. The aim of this work is to explore the opportunities of defining a DSML for probabilistic modeling. To allow developers to design solutions that solve machine learning-based problems by automatically generating code for other technologies as a transparent bridge, a language and a technology-independent development environment are introduced in [13]. A similar tool named OptiML has been proposed [41], aiming to bridge the gap between ML algorithms and heterogeneous hardware to provide a productive programming environment.
In collaborative modeling, Barriga et al. [2] propose the adoption of an unsupervised and Reinforcement Learning (RL) approach to repair broken models, which have been corrupted because of conflicting changes. The main intent is to potentially reach human-quality model repair without requiring supervision.
AURORA [28] can be considered as the first attempt to classify metamodels exploiting a Machine Learning algorithm. The approach is built on top of a neural network to learn from labeled metamodels and classify unlabeled data. Despite its simplicity, the tool is efficient and, on a small dataset, it classifies the metamodels with high prediction accuracy. In a recent work [30], we further improved the performance by employing a convolutional neural network to classify metamodels.
Heterogeneity issues in customizable recommender systems have been analyzed by involving two different use cases [38]. The participants had to tune the system according to their preferences. In the first session, users configured a travel recommender by means of different facets of the trip, i.e., costs, food, and location. The second use case involved a personal exercise recommender system for the training activity. The results show that even homogeneous groups of users select different system configurations. Thus, a tailored recommender system might consider the mental model of the target users, namely their preferences and custom algorithms.
Blended recommending [21] introduces a similar strategy embedded in a movie recommender. It implements several filtering techniques used in the domain, i.e., content-based filtering and collaborative filtering. Using the blended recommending strategy, users can specify a recommendation algorithm as well as refine its parameters in a hybrid filtering fashion. In this landscape, our work aims to cope with these challenges by promoting the adoption of a low-code platform. To the best of our knowledge, this is the first attempt to use such a technology in this domain.
A recent work [40] envisioned a new idea for supporting the modeler with step-wise guidance or entire model examples. The proposed approach involves model clone detection techniques to find metamodels similar to the one that the modeler is defining. In contrast to our approach, the discovery of similar metamodels does not include language syntaxes, but it relies on the Simone model clone detector.

Conclusions and future work
In this paper, we introduced MemoRec, a novel approach that uses a context-aware collaborative filtering technique to support the modeler in completing the specification of a metamodel. By encoding metamodels and their contents in four different schemes, we built rating matrices and applied a syntactic-based similarity function to predict missing items, i.e., classes and structural features. An evaluation on two independent datasets, i.e., D 1 and D 2, and four encoding schemes, i.e., SE s, IE s, SE c, and IE c, exploiting ten-fold cross-validation demonstrates that the tool is able to provide decent recommendations.
We plan to extend MemoRec by adding other similarity functions, e.g., structural and semantic-based methods. Moreover, we can improve the encoding schemes by introducing Natural Language Processing (NLP) techniques. We will augment the recommendation outcomes with additional information, e.g., type and cardinality. Afterward, we are going to conduct a proper user study with the involvement of modelers to evaluate the usability of MemoRec. Last but not least, now that we have validated the algorithmic accuracy of the proposed technique, we will integrate the conceived tool into the Eclipse IDE, providing modelers with support embedded in their development environment.
(a) The Web metamodel with the Web package and the Page class. (b) An example of recommended elements for the active contexts.

Fig. 4 Matrix representation of metamodels w.r.t. structural features and classes.

Fig. 5 Graph representation of metamodels and structural features.

Table 1
Package-class feature rating matrix combined with SE c for the Web metamodel.

Table 2
Class-structural feature rating matrix combined with IE s for the Web metamodel.

Table 3
φ vectors for the metamodels depicted in Fig. 5.

Table 4
sim 1 matrix for the metamodels depicted in Fig. 5.
- Starting from an input Dataset, we split it into two independent parts, i.e., Training data and Testing data (Split ten-fold). The former corresponds to the metamodels collected ex-ante, whereas the latter represents the metamodel being modeled, or the active metamodel. The ten-fold cross-validation technique is used to conduct the evaluation as follows. The dataset is split into ten equal parts; one part represents the testing set and the remaining nine parts are combined to create a training set. We consider a modeler who is defining a metamodel m, so some parts of m are removed to mimic an actual metamodeling task: some packages/classes are already available in the active metamodel and the system should recommend additional packages/classes to be incorporated. For each metamodel in the Testing data, by the Split input data phase, a random package (class) is selected that, together with the remaining packages/classes, is used as Query data. In the considered context, the first class/structural feature is kept as query data and all the others are taken out to be used as ground-truth (GT) data. In other words, by the Split input data phase, a package (class) is selected as the active context c. For c, only the selected classes are provided as query, while the rest is removed and saved as ground-truth data.
- Recommendation production. In this phase, the extracted Query data and Training data are fed as input for MemoRec, which in turn computes the final Recommendations. It is important to remark that the current version of MemoRec can recommend classes and structural features. The types of the recommended attributes and relationships are not supported yet. This represents our next step to further develop the proposed approach.
- Outcome evaluation. The performance of MemoRec is measured by comparing the recommendation outcomes with the ground-truth data (GT data), exploiting the quality metrics, i.e., Success rate, Precision, and Recall, which are presented in Section 4.4.
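The splitting procedure described above can be sketched as follows. This is a minimal illustration, assuming metamodels are identified by name and an active context is an ordered list of items; the function names, the fixed seed, and the striding used to form the folds are choices made for this sketch rather than details taken from the evaluation scripts.

```python
import random

def ten_fold_splits(metamodels, folds=10, seed=0):
    """Yield (training, testing) lists so that each metamodel is tested exactly once."""
    items = list(metamodels)
    random.Random(seed).shuffle(items)  # fixed seed keeps the sketch reproducible
    for i in range(folds):
        testing = items[i::folds]
        training = [m for j, m in enumerate(items) if j % folds != i]
        yield training, testing

def split_query_and_ground_truth(context_items):
    """Keep the first item of the active context as query; the rest becomes ground truth."""
    return context_items[:1], context_items[1:]

# Hypothetical dataset of 20 metamodel names.
names = [f"m{i}" for i in range(20)]
for training, testing in ten_fold_splits(names):
    print(len(training), len(testing))

query, ground_truth = split_query_and_ground_truth(["Page", "Form", "FormElement"])
print(query, ground_truth)
```

Each testing metamodel would then be reduced to its query part before being passed to the recommender, and the removed items kept aside for the outcome evaluation.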
FormElement and Form are provided as the final recommendation;
- k corresponds to the number of top-similar neighbor metamodels MemoRec considers to predict suggested items;
- REC_N(m) is the top-N recommended items for m;
- GT(m) is defined as the list of classes/structural features that are saved as ground-truth data for metamodel m;
- TP_N(m) is the set of true positives, i.e., items in the top-N list that match with those in the ground-truth data, TP_N(m) = GT(m) ∩ REC_N(m).
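Given these definitions, the quality metrics can be computed along the following lines. The formulas used here are the commonly adopted ones (success rate as the fraction of queries with at least one true positive, precision as the number of true positives over N, recall as the number of true positives over the size of the ground truth, f-measure as their harmonic mean); they are assumed for this sketch and should be checked against Section 4.4. The sample recommendations are hypothetical.

```python
def true_positives(recommended, ground_truth, n):
    """Items of the top-n list that also appear in the ground truth."""
    return set(recommended[:n]) & set(ground_truth)

def success_rate(results, n):
    """Fraction of (recommended, ground_truth) pairs with at least one match in the top-n."""
    hits = sum(1 for rec, gt in results if true_positives(rec, gt, n))
    return hits / len(results)

def precision(recommended, ground_truth, n):
    return len(true_positives(recommended, ground_truth, n)) / n

def recall(recommended, ground_truth, n):
    if not ground_truth:
        return 0.0  # nothing to retrieve
    return len(true_positives(recommended, ground_truth, n)) / len(ground_truth)

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Hypothetical outcomes for two queries.
rec = ["Form", "Link", "FormElement"]
gt = ["Form", "FormElement"]
results = [(rec, gt), (["Book"], ["Author"])]

print(success_rate(results, 1))
print(precision(rec, gt, 3), recall(rec, gt, 3))
print(f_measure(precision(rec, gt, 1), recall(rec, gt, 1)))
```

Note that precision divides by the cut-off N even when fewer than N items are returned, which is one of several possible conventions.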

Table 7
report the success rate obtained with class recommendations.Partly similar to the results presented for recommending structural features classes in Table

Table 7
Success rate for class recommendations, k = {1, 5, 10, 15, 20}, by considering the D 1 dataset., we performed experiments by varying the number of neighbour nodes of the input metamodel and the value of N.Table8(a) and Table8(b) show the average success rates obtained by running the ten-fold cross-validation technique to recommend structural features and classes, respectively by using different cut-off values N .

Table 12
Predicted recommendations for the Article metaclass and Bibtex package of the metamodel in Fig. 8.

Table 13
Success rate, precision and recall for class recommendations (a) using SE s and IE s encodings.