Keywords

1 Introduction

Models and modeling are essential parts of every engineering endeavors [1]. Unified Modeling Language (UML) is a nonindustrial standard for object-oriented modeling [2]. UML Class Diagram is used to describe the static structure of a system. A Class Diagram could be very large if a system is huge and complex. Designers easily become overwhelmed with details when dealing with large Class Diagrams. Model transformation is an essential part of MDE [15]. UML Class Diagram abstraction transforms a low-level class diagram to a high-level Class Diagram [3]. Well-designed Class Diagrams can lead to an eased development process towards a more ensured result system since they can be understood by stakeholders easily. Most existing model abstraction approaches fit for the scenario that stakeholders decide a few key entities usually according to their understanding of the significance of the entities. These key entities are used to represent the other entities. Then the main task of the abstraction is to expose the direct relationships among the key entities through mostly relationship abstraction. However this scenario is not always true especially when stakeholders are not familiar with the modeling techniques and the global perspective of a project. Then abstract need to be done without input of key entities. We can this scenario as unguided abstraction. Towards resolving the abstraction covering both automatic locating of representative entities and abstracting of link among these entities in Class Diagrams, we proposed a combination of class rank algorithm which prioritizes classes similar to page rank algorithm [14]. Page rank algorithm works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. We regard classes as pages, relationships between classes as hyperlinks between pages, so we can apply page rank algorithm to compute the importance of a class in a Class Diagram. After the locating of the key entities, we introduce the heuristic relationship abstraction rules from Dr. Egyed [4] which heuristically determine the representative semantics of relationships towards improving the efficiency and effectiveness of class abstraction. We validate our approach with case studies.

In the rest of this paper, we give an overview of this paper in Sect. 2 firstly. Then we elaborate class rank algorithm to compute class ranks in Sect. 3 and a method to compute correlations between classes in Sect. 4. Approaches to compute relationships between classes are represented in Sect. 5. The related works are elaborated in Sect. 6. We give our conclusions in Sect. 7.

2 Overview

We propose a method to abstract class diagrams based on a class rank algorithm and relationship abstraction rules as Fig. 1 shows. We map a Class Diagram to a graph with nodes and edges. Nodes in a graph stand for classes in a Class Diagram and edges stand for relationships. On a mapped graph we implement abstraction through following operations.

Fig. 1.
figure 1

Process of Class Diagram abstraction

  1. (1)

    Compute ranks of classes: A rank of a class represents the importance of the class in a Class Diagram. Class with a high rank is important in a Class Diagram. We use class rank algorithm to calculate ranks of classes in a Class Diagram. Then we obtain a one-dimensional class rank vector R. We suppose that classes ranked in the top thirty percent are important classes and abstracted Class Diagram should only contain important classes. We need to get relationships between important classes and decide which relationships should be presented in the abstracted Class Diagram.

  2. (2)

    Compute correlations between classes: Correlation between two nodes is used to judge whether a strong association exists between them. If there is a strong association between two nodes, they can be connected through a line. We calculate correlations of two nodes based on the distance and relationships between them. Then we get a weighted graph G(V,E 1 ). V is a node set and E 1 is an edge set with weighted correlations. Relationship between two classes can be presented in an abstracted Class Diagram if the correlation between two classes is bigger than 0.4.

  3. (3)

    Compute the direct relationship between two classes: If there is a path between two classes, a direct relationship may exist between them. We use relationship abstraction rules to get the direct relationship. Then we can obtain a weighted graph G(V,E 2 ). E 2 is an edge set with weighted relationships.

  4. (4)

    Generate the abstracted graph: According to G(V,E 1 ), G(V,E 2 ) and R, we abstract G(V,E) and get the abstracted graph G 1 (V,E) which is the abstraction of original graph G(V,E). G 1 (V,E) only contains nodes that represent important classes and relationships between them.

  5. (5)

    Generate abstracted Class Diagram: We generate an abstracted Class Diagram based on graph G 1 (V,E). We generate classes and relationships of the abstracted Class Diagram according to nodes and edges in G 1 (V,E). The abstracted Class Diagram is the abstraction of original Class Diagram.

3 Calculate Ranks of Classes

We locate key entities in a Class Diagram by calculating ranks of classes. The concept of rank is firstly used in Google’s page rank algorithm to describe the importance of a page and rank pages [14]. We adopt the idea of page rank algorithm to calculate ranks of classes in a Class Diagram and name it class rank algorithm. Class rank algorithm works by counting the number and semantic influence of relationships of classes. We regard classes as pages and relationships between classes as links between pages. But there are some differences between links and relationships. Different relationships have different ranks and different semantic influence.

3.1 Definitions

Table 1 shows logograms of relevant relationships. For example, AG is an abbreviation for aggregation. An expression such as “A × AG × B” stands for “A aggregate B”. Class rank (Rc) is represented with a float value between [0, 1000]. The value of Rc stands for the importance of a class in a Class Diagram. A class is deemed as more important if the class is labeled with a greater value. Relationship rank (Rr) is represented with an integer value between [0, 10]. The value of Rr stands for the semantic influence of a relationship in a Class Diagram. For example, if the rank of GL is larger than that of AS, it indicates that GL is of higher semantic influence than AS. We set ranks for different relationships as Table 2 shows.

Table 1. Logograms of relationships
Table 2. Ranks of relationships

Transition probability (TP) represents the probability of a rank transition from one class to another. Transition probability between class i and class j is defined according to Eq. 1. Rr(i,j) is the relationship rank of edge (i,j). For example, Rr(i,j) is equal to 7 if the relationship of edge (i,j) is AG. Transition probability matrix represents all transition probabilities of class ranks from one class to another in a Class Diagram.

$$ {\text{TP}}(i,j) = \frac{{R_{r} (i,j)}}{{\sum\limits_{(i,k) \in E} {R_{r} (i,k)} }} . $$
(1)

3.2 Calculate Class Ranks

We calculate the rank of a class based on its adjacent classes’ ranks and transition probabilities from the class to its adjacent classes. We calculate a class rank according to Eq. 2:

$$ {\text{R}}c(i) = \sum\limits_{(i,j) \in E} {Rc(j) \bullet TP(j,i)} = \sum\limits_{(i,j) \in E} {Rc(j) \bullet \frac{Rr(j,i)}{{\sum\limits_{(j,k) \in E} {Rr(j,k)} }}} $$
(2)

where Rc(i) is the class rank of class i. (i,j) is the edge from class i to class j. Rr(j,i) is the relationship rank of edge (j,i). The formulated rules of this article will be evaluated using class diagram of a shopping management system shown in Fig. 2 at the analysis phase. Process of computing class ranks is as following shows:

  1. (1)

    Set an initial vector of class ranks: There are 11 classes in the Class Diagram of shopping management system. Class rank is initialized to the same value for all classes. We suppose that the initial rank of each class is equal to 1000/11.

  2. (2)

    Calculate transition probability matrix (M): We construct transition probability matrix based on rank transition probabilities of any two different classes having relationship with each other.

  3. (3)

    Iteration: We use the last class rank vector to multiply the transition probability matrix and obtain a new class rank vector which is represented as Eq. 3:

    $$ R_{i} = R_{i - 1} \bullet M . $$
    (3)
Fig. 2.
figure 2

A low-level Class Diagram of shopping management system

where R i is the class rank vector after iterating i − 1 times. M is a constant transition probability matrix. In each iteration, a class will transfer its value of rank to its adjacent classes. After iterating i times, rank of a class tends to be stable. R i approximates R i1 . We skip out of the iteration and get the final class rank R i . A class is deemed as more important if the class is labeled with a greater value of rank. We suppose that classes ranked in the top 30% are important classes. After computing, rank values of the 11 classes in Fig. 2 are shown in Table 3. So important classes in the Class Diagram of shopping management system are Order, Payment and Customer.

Table 3. Ranks values of classes in Fig. 2

4 Compute Correlations Between Classes

In order to compute correlations between classes, we assume that two classes could be connected if the correlation between them is of high semantic influence. Correlation between two classes is related to the distance and relationships between them.

4.1 Definition

Correlation between two classes depends on paths between them. Paths between two classes contain intermediate classes and relationships between them. We use a line to connect two classes if the correlation between them is of high semantic influence. We suppose that correlation between classes is of high semantic influence if value of the correlation is bigger than 0.4. Strength of a relationship is similar to a relationship rank. We use strength of relationship to calculate the correlation of two classes. We give the strength of each relationship in Table 4. For example, correlation between class Customer and class Order is 0.5 because the relationship between them is AS.

Table 4. Strength of relationships

4.2 Calculate the Correlation Between Two Classes

We need to compute the correlation if two classes are not directly connected with each other. In Fig. 3, class A and class D are not directly connected with each other. There are two intermediate classes that are class B and class C between them. Steps of computing correlation between class A and class D are as follows:

  1. (1)

    Find strength of intermediate relationships in the path: Correlation between class A and class B is 0.5 because the relationship between them is AS and strength of AS is 0.5. Correlation between class B and class C is 0.7 because the relationship between class B and class C is AG and strength of AG is 0.7. Correlation between class C and class D is 1.0 because the relationship between class C and D is GL and strength of GL is 1.0.

  2. (2)

    Multiply the strengths: We multiply strengths of AS, AG and GL. Then we get the correlation between class A and class D which is C(A,D). C(A,D) is equal to 0.35. We need to compute correlation of two classes in each path if there are multiple paths between them according to Eq. 5:

    $$ C(i,j) = \mathop {Max}\limits_{Path(n) \in (i,j)} \left\{ {\prod\limits_{k \in Path(n)} {S(k)} } \right\} $$
    (5)

    where Path(n) is a path between class i and j. K is a relationship in Path(n) such as AG. S(k) is the strength of relationship k. In Fig. 4, there are two paths between class A and class D. For Path 1, correlation between class A and class D is C 1 (A,D) which is equal to 0.5(1.0 * 0.5). For Path 2, correlation between class A and class D is C 2 (A,D) which is equal to 0.56(0.8 * 0.7). We choose C 2 (A,D) as the correlation between class A and class D because C 2 (A,D) is bigger than C 1 (A,D).

    Fig. 3.
    figure 3

    Different relationships in a path between two classes (

  3. (3)

    Calculate the correlation matrix of a Class Diagram: For any two classes in a Class Diagram, we compute the correlation between them and get a correlation matrix of the Class Diagram. Then we obtain graph G(V,E1) of the Class Diagram based on the correlation matrix. E1 is an edge set with weighted correlations. In this way, we get correlations between classes in Fig. 2 as Table 5 shows. Correlation between classes is of high semantic influence if value of the correlation is bigger than 0.4 for instance correlation between class Customer and Order.

    Table 5. Correlations between classes
Fig. 4.
figure 4

Multiple paths between two classes.

5 Compute Relationships Between Classes

Getting the direct relationship between two classes can ease the cognitive load of checking the consistency in a Class Diagram. If all intermediate relationships in the path between two classes are the same, it is easy to know the direct relationship between the two classes. On the left side of Fig. 5, class A inherits class C if class A inherits class B and class B inherits class C. Class A depends on class C if class A depends on class B and class B depends on class C. Class A aggregates class C if class A aggregates class B and class B aggregates class C. Class A associates with class C if class A associates with class B and class B associates with class C. However, it is not easy to obtain the direct relationship between class A and class C if different relationships are grouped together as shown on the right side of Fig. 5.

Fig. 5.
figure 5

Combination of relationships

5.1 Abstraction Rules

Dr. Egyed proposed a series of abstraction rules for automatically getting direct relationships from relationship combinations as Fig. 6 shows. The left side depicts the class input patterns and the right side (after “equals”) depicts the class output patterns. Input and output patterns are allowed to be more complex as long as the output pattern is simpler than the input pattern. If not, the abstraction algorithm could be non-deterministic. We also analyzed the semantic dependencies between other classes and their relationships. Note that the direction of relations is indicated through their name. If the relation name is used with no add-on, then a forward relation (a relation from left to right) is meant. If the string “r” is added then a backward relation (a relation from right to left) is meant. The number following each rule indicates the reliability of result obtained from the implementation of this rule. For example, rule(30){AS × Class × DP equals DP 50} indicates that class A depends on class C if class A associates with class B and class B depends on class C. Value of the reliability of the result obtained from implementing rule(30) is equal to 0.5(50/100). According to rule(15), class A associates with class C if class A inherits class B and class B associates with class C. Value of the reliability of the result obtained from implementing rule(15) is 1.

Fig. 6.
figure 6

Rules for abstraction

5.2 Compute the Relationship Between Classes

  1. (1)

    Compute direct relationship between two classes in a path: We can get all intermediate relationships in the path between two classes and form a one-dimensional array of relationships. For any two adjacent relationships in a relationship array, we can abstract them if they satisfy a rule as shown in Fig. 6. In Fig. 3, relationship array that we got from the path is equal to Arr = [AS, AG, GL]. For any two adjacent relationships in Arr, we judge whether they satisfy a rule. For example, AS and AG meets rule(2). AG and GL meets rule(27). Then we need to consider which rule should be executed firstly and whether different execution orders of rules will lead to different results. We get the final relationship between class A and class D which is AS if we execute rule(2) firstly following the first order shown in Fig. 7. Relationship array A = [AS, AG, GL] changes to A1 = [AS, GL] and relationship array A1 = [AS, GL] changes to A2 = [AS] if we execute rule(28) secondly. The final relationship between class A and class D is also equal to AS if we execute the rule(27) firstly following the second order shown in Fig. 7. Relationship array A = [AS, AG, GL] changes to A3 = [AS, AG] and relationship array A3 = [AS, AG] changes to A4 = [AS] if we execute rule(2) secondly.

    Fig. 7.
    figure 7

    Execution orders of abstraction rules

In Fig. 7, result of execution following the first order is A2 = [AS] and result of execution following the second order is A4 = [AS]. Value of the reliability of A2 = [AS] is equal to 0.7. Value of the reliability of A4 = [AS] is equal to 0.5. We choose the relationship with a higher reliability. We need to consider whether different execution orders will lead to different relationship results and find a way to get a relationship result with the highest reliability. For a relationship array A = [X, Y, Z] which contains only 3 relationships, there are 343(7 * 7 * 7) types of A (e.g. [AS, AG, AS]). For each type of A, we analyzed two execution orders of A like Fig. 7 shows. We found that results of two execution orders are the same regardless of the type. Table 6 gives the analysis results of 343 relationship array types.

Table 6. Analysis results of 343 types

Invalid result means that combination of the latter two relationships in array A does not conform to any of the abstraction rules. For relationship array A[AS, GL, GLr] shown in Fig. 8, there is no rule that meets the two adjacent relationships GL and GLr. Thus, there is only one order for A to choose which leads to a unique result. We will get only one relationship result for the invalid type. We can conclude that there is only one relationship result no matter in which array type that contains 3 relationships. We apply heuristic abstraction rules to abstract relationships between two classes as algorithm 1 shows.

Fig. 8.
figure 8

Execution orders of relationship array A[AS, GL, GLr]

figure a

Compute direct relationship between two classes if there are two or more paths: We will get more than one relationship result if there are multiple paths between two classes. We should consider which result to choose. Our approach is to choose result with the highest reliability. We choose result with the highest relationship rank if reliabilities of two different results are equal.

  1. (2)

    Calculate the relationship matrix of a Class Diagram: After computing direct relationship between any two classes in a Class Diagram, we create a relationship matrix of the Class Diagram and form graph G(V,E2). We obtain relationships between classes in Fig. 2 as Table 7 shows.

    Table 7. Relationship between classes in Fig. 2

We set the level of abstraction by controlling requirements for the number of important classes and strength of relationships. We get four important classes if we suppose that classes ranked in the top 40% are important. They are Payment, Customer, Order and OrderDetail. We reduce the number of important classes to three if the standard that is set for the important classes is ranked in top 30%. We locate key entities by using class rank algorithm. From Tables 5 and 7 we can know the correlations and relationships between important classes. Therefore, we can get abstracted Class Diagrams of different levels of shopping management system as Fig. 9 shows which only contains important classes and relationships between them. The Class Diagram on the right side has a higher abstraction than the one on the left.

Fig. 9.
figure 9

Abstracted Class Diagram of different levels

6 Related Work

It is more and more difficult for people to program, understand, and modify software with the increasingly complexity of a software system [13]. Dr. Eyged proposed a series of abstraction rules for Class Diagram which contain class abstraction rules and relationship abstraction rules. Direct relationship between any two classes in a Class Diagram can be calculated by using heuristic abstraction rules [4]. His abstraction method supports the scenario when key entities such as classes or instances are identified by stakeholders before an abstraction process or during interactions of a process. The pattern matching not only is costly [5] but also depends on the experience or project knowledge of the stakeholders in identifying the key entities/relationships. The scenario of abstraction without relying on the experience of stakeholders is a more challenging situation which need to be explored but not well addressed in existing literature. The first obstacle for unguided abstraction is to create or find the representative entities which can represent the trivial data, information and knowledge which are distributed in the mess of the trivial elements in model diagrams. Based on our hypothesis that some existing classes can represent the others properly, we uses class rank algorithm to calculate ranks of classes in a Class Diagram and use the ranks to represent the importance of the entities. During the implementation of the visualization [12], we adopted the hierarchical layout by Sugiyama to display our layered abstraction of UML Class Diagrams [6, 8] and also referred to the rank-direct layout method [9] which centers on higher ranked classes and clusters lower ranked classed to higher ranked classes. Harald found that various factors like layout quality, modeler experience, and diagram type led to significant differences in diagram reading strategies [10, 11]. Helen investigated the preference of UML modelers and found that joined inheritance arcs and directional indicators were preferred for Class Diagrams [7].

7 Conclusion

Well designed hierarchical abstraction can provide an effective means to efficiently understand and maintain complex models. Most existing model abstraction approaches assume that target classes are provided ahead. Thereafter the abstraction focuses on the abstraction at relationship level. Dr. Egyed had proposed a series of heuristic rules which are capable to efficiently fulfill the relationship abstraction. However unguided abstraction which doesn’t come with guided information such as predetermined target classes is not fully studied in existing literature. We analyzed that unguided abstraction requires automated recognition of target classes which represent important entity concepts of abstracted models. Then based on the hypothesis that some existing classes fit well to represent other classes, we proposed a class prioritizing algorithm to recognize these representative classes through ranking them with different values. We further integrate the entity abstraction with the heuristic relationship abstraction to form an approach of unguided automatic abstraction of models. In the future, we propose to explore the entity abstraction scenario where no existing classes fit as the representative entity for other classes through the data, information and knowledge recreation of existing classes and relationships with the introduction of knowledge graphs.