Model Based Test Case Prioritization Using Association Rule Mining

  • Arup Abhinna Acharya
  • Prateeva Mahali
  • Durga Prasad Mohapatra
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 33)

Abstract

Regression testing has gained importance because change requests are made more frequently during the software maintenance phase. The retesting required by regression testing increases cost and time. Prioritization is an important step in regression testing that makes debugging easier. This paper discusses a novel approach to test case prioritization using Association Rule Mining (ARM). The system under test is modelled using a UML Activity Diagram (AD), which is then converted into an Activity Graph (AG). A historical data store is maintained to keep details about the parts of the system that reveal the most faults. Whenever a change is made in the system, the frequent patterns of highly affected nodes are found. These frequent patterns reveal the probable affected nodes, which are used to prioritize the test cases. This approach effectively prioritizes the test cases, yielding a higher Average Percentage of Fault Detection (APFD) value.

Keywords

Regression testing · Association rule mining · Test case prioritization and test case

1 Introduction

Software testing is a time-, effort- and cost-consuming phase of the software development life cycle. During the maintenance phase, new versions of the product are retested using the previously developed test suite to identify issues related to the modifications [1]. The biggest challenge for software testers is to complete this retesting within time and budget. Researchers have proposed various techniques for regression testing, such as test case selection, test suite reduction and test case prioritization [2]. Of these techniques, test case prioritization is the most effective at achieving a higher percentage of fault detection [2].

Test case prioritization is a methodology for scheduling test cases with the intention of detecting the maximum number of faults in minimum time and at minimum cost [3]. Prioritization also aims to increase the rate of detection of high-risk faults and to establish the reliability of software products sooner. Researchers have proposed several criteria for scheduling test cases during prioritization, such as coverage-based, requirement-based and risk-factor-based criteria [4, 5]. However, test cases with maximum coverage do not guarantee maximum fault detection. Hence a technique is needed that gives maximum coverage together with high fault detection capability.

The Average Percentage of Fault Detection (APFD) metric is used to check the efficiency of test suites [6].

In this paper we try to overcome this problem by generating frequent patterns of the nodes that reveal the most faults, collected from the last few releases of the software. These patterns help identify the test cases that detect the maximum number of faults. If those test cases are given the highest priority and retested first in the next release, the probability of finding defects early is increased and the quality of the software is improved by the time it is delivered. Hence the software analyst may be able to reduce the estimated budget for the new release. The frequent patterns of nodes are discovered using a data mining mechanism.

Data mining is the process of discovering or extracting patterns from data; here it helps in identifying frequently occurring faults [7]. A frequent pattern can be generated by using association rules, the Apriori algorithm or the vertical data format [7]. In this paper, association rules are used to generate frequent patterns for the most affected nodes.

Association rule mining is a popular method for discovering interesting relations between variables in large databases. Its main goal is to discover regularities between products in large-scale databases [7]. In association rule mining, a rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅, and I is the set of all items. A rule must satisfy a user-specified minimum support and a user-specified minimum confidence simultaneously.
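
For reference, the two thresholds can be written using the standard textbook definitions of support and confidence. The transaction database D and the notation supp and conf are introduced here for illustration only and are not taken from the original text:

$$ \operatorname{supp}(X) = \frac{\left| \{\, t \in D : X \subseteq t \,\} \right|}{|D|}, \qquad \operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} $$

Under these definitions, a rule X ⇒ Y is retained only if supp(X ∪ Y) is at least the minimum support and conf(X ⇒ Y) is at least the minimum confidence.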

The rest of the paper is organized as follows: Sect. 2 discusses the related work and its analysis and Sect. 3 describes the proposed approach. Section 4 presents the implementation of proposed approach using a case study of Shopping Mall Management System (SMMS) and Sect. 5 compares the current approach with the related work. Section 6 discusses the conclusion and future work.

2 Related Work

Many researchers [8, 9, 10, 11] have proposed different mechanisms for model-based test case prioritization. Khandai et al. [3] proposed an approach for prioritizing test cases according to their Business Criticality Test Value (BCTV), in which functionalities that pose more risk to the business are given higher priority. Haidry and Miller [8] proposed a set of techniques for functional test case prioritization based on the inherent structure of dependencies between tests, called Dependency Structure Prioritization (DSP). They evaluated the fault detection capability of DSP relative to existing coarse-grained and random fine-grained techniques and compared its efficiency with that of existing techniques.

Mahali et al. [12] proposed another model-based method for test case prioritization, in which prioritization is carried out on an optimized test suite. The generated test suite is optimized using a genetic algorithm, and the optimized suite is then prioritized by comparing the current version of the software with its previous version. Acharya et al. [13] proposed an approach that assigns priority to user requirements with respect to parameters such as database, networking and Graphical User Interface (GUI) parameters.

Muthusamy et al. [14] proposed a test case prioritization process that identifies severe faults. Prioritization is performed using practical weight factors such as customer-allotted priority, developer-observed code execution complexity, change in requirements, fault impact, completeness and traceability.

3 Proposed Approach

This paper presents a new heuristic approach to test case prioritization. A UML activity diagram is used as the system model, and prioritization is performed by analysing the modification history of different versions of the system under test (SUT). First, the system model is converted to a model dependency graph called the Activity Graph (AG), and test cases are generated from the AG for the new version of the SUT. Then the proposed forward slicing algorithm is applied to the AG to track each modified node and trace the nodes it affects. At the same time, historical data on the affected nodes of previous versions of the SUT are analysed. The historical data contain Graph Data (GD) and Observation Data (OD). Graph data are collected from the AG without knowledge of the system's behaviour; the tester knows only the objective and the proposed testing technique. Observation data are collected from the low-level design of the system; here the tester knows the system's behaviour and has the technical knowledge required to implement the proposed technique with an automation tool and observe the actual result. The GD and OD for each modified node are used to find a common pattern of affected nodes that reveal the most faults. Test case prioritization is then performed using the patterns of affected nodes generated by ARM. The proposed framework is shown in Fig. 1 and elaborated in Sect. 4 with a case study of a Shopping Mall Management System (SMMS).

Fig. 1 Proposed framework for test case prioritization

4 Implementation of Proposed Approach

4.1 Test Case Generation Using Activity Diagram

In this paper the UML Activity Diagram (AD) is used for system modelling. The AD of the Shopping Mall Management System (SMMS) is shown in Fig. 2. The AD is then converted into a model dependency graph called the Activity Graph (AG), shown in Fig. 3, where each activity is represented as a node and the flow between two activities is represented as an edge between the corresponding nodes. The graph contains two decision nodes, one fork node and one join node, five control nodes and eighteen normal nodes. Test cases are generated from the activity graph using the Breadth-First Search (BFS) algorithm [15]. Each individual path in the graph is considered a test scenario. There are 38 paths in the activity graph and hence 38 test scenarios for the SMMS, which are listed in Table 1. Due to space constraints, only seven test cases are shown.

Fig. 2 Activity diagram of shopping mall management system (SMMS)

Fig. 3 Activity graph of shopping mall management system

Table 1 Test scenarios of SMMS

Sl no (path) | Test case id | Independent path
1 | T1 | 1→2→3→4→5→6→7→8→9→11→12→13→14→15→17→19→20→21→22→23→25→26→27
2 | T2 | 1→2→3→4→5→6→7→8→9→11→12→13→14→15→17→19→20→21→22→24→25→26→27
3 | T3 | 1→2→3→4→5→6→7→8→9→11→17→19→20→21→22→23→25→26→27
4 | T4 | 1→2→3→4→5→6→7→8→9→11→17→19→20→21→22→24→25→26→27
5 | T5 | 1→2→3→4→5→6→7→8→9→10→11→12→13→14→15→17→19→20→21→22→23→25→26→27
… | … | …
37 | T37 | 1→2→3→18→19→20→21→22→23→25→26→27
38 | T38 | 1→2→3→18→19→20→21→22→24→25→26→27
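
The path enumeration step can be sketched as follows. This is a minimal illustration, assuming the AG is stored as an adjacency list and that each test scenario is a loop-free path from the start node to the end node; the function name `enumerate_paths` and the small graph are hypothetical and are not the SMMS graph of Fig. 3.

```python
from collections import deque


def enumerate_paths(graph, start, end):
    """Return every loop-free path from start to end, explored breadth-first."""
    paths = []
    queue = deque([[start]])  # each queue entry is a partial path
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            paths.append(path)
            continue
        for succ in graph.get(node, []):
            if succ not in path:  # do not revisit a node on the same path
                queue.append(path + [succ])
    return paths


# Hypothetical activity graph: nodes 2 and 5 each branch into two flows.
toy_graph = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6, 7], 6: [8], 7: [8]}

for i, p in enumerate(enumerate_paths(toy_graph, 1, 8), start=1):
    print(f"T{i}: " + "->".join(map(str, p)))
```

Each printed path corresponds to one test scenario, in the same way that each row of Table 1 corresponds to one path through the AG.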

4.2 Creation of Historical Data Table Using Forward Slicing

Representing the historical database schema is one of the goals of the proposed approach. The schema contains information about the developed system, such as Project ID, Project Name, Project Type, Total No. of Nodes, Total No. of Modified Nodes, and Total No. of Affected Nodes for each modified node. The database that holds information about the oldest version of the system is called the historical database, and it is updated for each new version of the system. It contains two types of data: Graph Data (GD) and Observation Data (OD).

As per the proposed framework, modified nodes are collected from the AG and the affected nodes are obtained using the proposed forward slicing algorithm (given in Algorithm 1). The affected nodes for each modified node are shown in Table 2.

Table 2 Modified nodes and affected nodes

Modified node | Affected nodes
2 | 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
3 | 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
4 | 5,6,7,8,9,11,12,13,14,15,17,19,20,21,22,23,24,25,26,27
5 | 6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
6 | 5,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
7 | 5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
8 | 5,6,7,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
9 | 6,7,8,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
10 | 6,7,8,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
15 | 17,19,20,21,22,23,24,25,26,27
16 | 5,6,7,9,10,11,12,13,14,15,17,19,20,21,22,23,24,25,26,27
18 | 5,6,7,9,10,11,12,13,14,15,16,17,19,20,21,22,23,24,25,26,27
19 | 20,21,22,23,24,25,26,27
26 | 27
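
Algorithm 1 is not reproduced in this text. As a rough illustration only, a forward slice can be approximated by collecting every node reachable from the modified node along the edges of the AG. The sketch below makes that assumption; the function name `forward_slice` and the small graph are hypothetical, and the authors' algorithm may track additional dependencies that plain reachability does not capture.

```python
from collections import deque


def forward_slice(graph, modified):
    """Return the sorted list of nodes reachable from `modified` (excluding it)."""
    affected = set()
    queue = deque([modified])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ != modified and succ not in affected:
                affected.add(succ)
                queue.append(succ)
    return sorted(affected)


# Hypothetical activity graph, not the SMMS graph.
toy_graph = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6]}

print(forward_slice(toy_graph, 2))  # nodes downstream of modified node 2
```

Running the function once per modified node would give a table of the same shape as Table 2, with one row of affected nodes per modified node.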

4.3 Finding Common or Frequent Patterns Using Association Rule Mining

After collecting the required data for the system, ARM [7] is used to generate a common (frequent) pattern. To generate association rules, the items are first collected and stored in a database as itemsets. Here the items are the nodes affected by a particular modification made in different versions of the system. Several types of modification are possible in a system, such as requirement modification, design modification and functionality modification. In this paper, we consider requirement modification for each version of the system and collect the nodes affected by the modification. These affected nodes are stored in the historical database. The historical data for the last four versions (C1, C2, C3, C4) due to requirement modification are given below.

C1:

GD - {2, 3, 8, 9, 13, 14, 15}

OD - {3, 4, 5, 8, 9, 15}

C2:

GD - {4, 5, 6, 8, 10, 11, 17, 18, 19}

OD - {4, 6, 7, 11, 12, 15, 18, 19}

C3:

GD - {6, 7, 8, 10, 12, 14, 15, 16, 19, 20, 21}

OD - {4, 5, 8, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22}

C4:

GD - {3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}

OD - {4, 5, 6, 7, 10, 14, 16, 20, 21, 22, 26}

These affected nodes are treated as itemsets. The first step of association rule mining is then applied, i.e. the generation of frequent itemsets from the whole set of itemsets. To generate a frequent itemset, the support of individual items and of combinations of items is calculated and compared with the user-specified minimum support value. The confidence value is then calculated for the generated frequent patterns and compared with the user-specified minimum confidence value. The resulting pattern is called the common pattern or frequent pattern of the frequently occurring itemsets. We used MATLAB R2012a (7.14.0.739) to mine the itemsets with a minimum support of 40 % and a minimum confidence of 85 %; the resulting frequent pattern graph is shown in Fig. 4.

Fig. 4 Frequent pattern graph found for SMMS with min-support = 40 %, min-confidence = 85 %

The nodes that affect other nodes the maximum number of times are collected from the frequent pattern graph, and the resulting node sequence N is given by

N = {6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
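
The support and confidence filtering described above can be sketched in a few lines. This is an illustration only, not the authors' MATLAB implementation: it assumes that each GD and OD set is one transaction (the text does not state how transactions are formed), it stops at item pairs, and it is not expected to reproduce the graph of Fig. 4 or the sequence N. The names `support`, `confidence` and `rules` are hypothetical.

```python
from itertools import combinations

# Historical itemsets C1..C4 from the text; treating each GD and OD set as a
# separate transaction is an assumption made for this sketch.
transactions = [
    {2, 3, 8, 9, 13, 14, 15},                            # C1 GD
    {3, 4, 5, 8, 9, 15},                                 # C1 OD
    {4, 5, 6, 8, 10, 11, 17, 18, 19},                    # C2 GD
    {4, 6, 7, 11, 12, 15, 18, 19},                       # C2 OD
    {6, 7, 8, 10, 12, 14, 15, 16, 19, 20, 21},           # C3 GD
    {4, 5, 8, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22},   # C3 OD
    set(range(3, 9)) | set(range(10, 28)),               # C4 GD: 3..8 and 10..27
    {4, 5, 6, 7, 10, 14, 16, 20, 21, 22, 26},            # C4 OD
]

MIN_SUPPORT = 0.40
MIN_CONFIDENCE = 0.85


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Support of the union divided by the support of the antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


items = sorted(set().union(*transactions))
frequent_items = [i for i in items if support({i}) >= MIN_SUPPORT]
frequent_pairs = [p for p in combinations(frequent_items, 2)
                  if support(p) >= MIN_SUPPORT]

# Keep a rule a => b only when it also clears the confidence threshold.
rules = [(a, b) for x, y in frequent_pairs for a, b in ((x, y), (y, x))
         if confidence({a}, {b}) >= MIN_CONFIDENCE]

print(len(frequent_items), len(frequent_pairs), len(rules))
```

A full Apriori implementation would continue from pairs to larger itemsets, pruning any candidate that has an infrequent subset.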

The prioritized node sequence (Np) is

Np = {27, 26, 25, 24, 23, 22, 21, 20, 19, 17, 15, 14, 13, 12, 11, 9, 8, 7, 6, 16, 18}

This prioritized node sequence is used for test case prioritization, which aims at early detection of the nodes affected by a particular modification in the system. The modification information was collected from the last few configurations of the system. Hence the highest priority is given to the node that was affected the maximum number of times with the maximum confidence value. We assign a priority level from 1 to 21 to each node in Np: the first node of Np receives the highest level (21), and each subsequent node receives the next lower level, down to 1 for the last node.

Test cases are prioritized according to their total priority value. The total priority value of a test scenario (test scenarios are shown in Table 1) is the sum of the priority levels of all nodes present in the scenario, as given in Eq. 1:
$$ \text{Total priority value} = \sum_{i = 1}^{n} \text{PL}_{i} $$
(1)
where PL_i is the priority level of the i-th node in the test scenario and n is the number of nodes in the scenario. If the node is present in Np, its priority level is used; otherwise it is zero.

For example, Total Priority value for T1 = 0 + 0 + 0 + 0 + 0 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 19 + 20 + 21 = 210
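
The calculation in Eq. 1 can be checked with a short script. This is a sketch, assuming the level assignment implied by the T1 example (node 27 has level 21 and node 18 has level 1); the names `LEVEL` and `total_priority` are hypothetical, and only the four scenarios whose paths appear in full in Table 1 are included.

```python
# Prioritized node sequence Np from the text.
NP = [27, 26, 25, 24, 23, 22, 21, 20, 19, 17, 15, 14, 13, 12, 11, 9, 8, 7, 6, 16, 18]

# First node of Np gets level 21, last node gets level 1.
LEVEL = {node: len(NP) - pos for pos, node in enumerate(NP)}


def total_priority(path):
    """Sum of priority levels over the path; nodes outside Np count as zero."""
    return sum(LEVEL.get(node, 0) for node in path)


scenarios = {
    "T1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 17, 19, 20, 21, 22, 23, 25, 26, 27],
    "T2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 17, 19, 20, 21, 22, 24, 25, 26, 27],
    "T37": [1, 2, 3, 18, 19, 20, 21, 22, 23, 25, 26, 27],
    "T38": [1, 2, 3, 18, 19, 20, 21, 22, 24, 25, 26, 27],
}

# Rank scenarios by descending total priority value.
ranked = sorted(scenarios, key=lambda t: total_priority(scenarios[t]), reverse=True)

for name in ranked:
    print(name, total_priority(scenarios[name]))
# T1 evaluates to 210, matching the worked example.
```

For these four scenarios the ranking places T2 ahead of T1 and T38 ahead of T37, because node 24 has a higher priority level than node 23.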

Similarly, the total priority values of all test cases are found. The proposed metric was applied to obtain the prioritized test suite. The prioritized test suite (TP) is T24, T23, T32, T2, T6, T12, T16, T31, T1, T5, T11, T15, T20, T19, T26, T25, T34, T4, T14, T18, T33, T3, T13, T17, T28, T27, T8, T7, T22, T36, T10, T21, T35, T9, T30, T29, T38 and T37. The prioritized test suite obtained with the proposed approach has a higher APFD value than the non-prioritized test suite. The APFD values of the prioritized and non-prioritized test cases are shown in Fig. 5.

Fig. 5 APFD value of prioritized and non-prioritized test cases
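
The text does not list the fault data behind Fig. 5, so the comparison cannot be reproduced here. For orientation, the sketch below computes APFD with the commonly used definition APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults and TF_i the position of the first test case that reveals fault i. The fault matrix and the function name `apfd` are hypothetical.

```python
def apfd(order, faults_by_test, faults):
    """APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2 * n)."""
    n, m = len(order), len(faults)
    first_pos = {}
    for pos, test in enumerate(order, start=1):
        for fault in faults_by_test.get(test, ()):
            first_pos.setdefault(fault, pos)  # keep the earliest position only
    missing = set(faults) - set(first_pos)
    if missing:
        raise ValueError(f"faults never detected: {sorted(missing)}")
    return 1 - sum(first_pos[f] for f in faults) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix, not taken from the SMMS case study.
faults_by_test = {"T1": {"f1"}, "T2": {"f2", "f3"}, "T3": set()}
faults = ["f1", "f2", "f3"]

print(apfd(["T2", "T1", "T3"], faults_by_test, faults))  # fault-revealing tests first
print(apfd(["T3", "T1", "T2"], faults_by_test, faults))  # fault-revealing tests last
```

An ordering that runs fault-revealing test cases earlier yields a larger APFD value, which is the property the prioritized suite is meant to exhibit.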

5 Comparison with Related Work

The approaches proposed by Khandai et al. [3], Mahali et al. [12] and Acharya et al. [13] are not efficient enough where high-risk functions, project scalability and frequency of change are concerned. In real industrial practice, the volume of the historical database grows as projects become more complex, and the number of faults and affected nodes grows with it. No technique is available that searches a large database for the number of modifications, faults and affected nodes and identifies the highly affected test cases. Our approach attempts to overcome this problem by mining the historical data of projects using ARM and identifying a frequent pattern containing the highly affected test scenarios.

6 Conclusion and Future Work

In this paper the author has used historical data on the occurrence of past faults to prioritize the test cases. Frequent patterns generated using association rule mining helped effectively in prioritization. This approach gives a better result w.r.t. the average percentage of fault detection. Here the author has only considered the functional requirements of the system. A change during the maintenance phase also affects the non-functional behaviour of the system. Future research can be carried out on prioritizing test cases for testing the non-functional aspects of the system.

References

  1. Mall, R.: Fundamental of Software Engineering, 3rd edn. PHI Learning Private Limited, New Delhi (2009)
  2. Chauhan, N.: Software Testing Principles: Practices, 3rd edn. Oxford University Press, New Delhi (2010)
  3. Khandai, S., Acharya, A.A., Mohapatra, D.P.: Prioritizing test cases using business test criticality value. Int. J. Adv. Comput. Sci. Appl. 3(5), 103–110 (2011)
  4. Askarunisa, A., Shanmugariya, L., Ramaraj, N.: Cost and coverage metrics for measuring the effectiveness of test case prioritization techniques. INFOCOMP J. Comput. Sci. 9(1), 43–52 (2010)
  5. Aggrawal, K.K., Singh, Y., Kaur, A.: Code coverage based technique for prioritizing test cases for regression testing. ACM SIGSOFT Softw. Eng. Notes 29(5), 1–4 (2004)
  6. Srivastava, P.R.: Test case prioritization. J. Theor. Appl. Inf. Technol. 178–181 (2008)
  7. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers, San Francisco (2010)
  8. Haidry, S.Z., Miller, T.: Using dependency structures for prioritization of functional test suites. IEEE Trans. Softw. Eng. 39(2), 258–275 (2013)
  9. Kaushik, N., Salehie, M., Tahvildari, L., Li, S., Moore, M.: Dynamic prioritization in regression testing. In: Fourth International Conference on Software Testing, Verification and Validation Workshops, pp. 135–138 (2011)
  10. Malhotra, R., Tiwari, D.: Development of a framework for test case prioritization using genetic algorithm. ACM SIGSOFT Softw. Eng. Notes 38(3), 1–6 (2013)
  11. Yoo, S., Harman, M.: Regression testing, minimisation, selection and prioritisation: a survey. Softw. Test. Verif. Reliab. 1–60 (2007)
  12. Mahali, P., Acharya, A.A.: Model based test case prioritization using UML activity diagram and evolutionary algorithm. Int. J. Comput. Sci. Inf. 3, 42–47 (2013)
  13. Acharya, A.A., Budha, G., Panda, N.: A novel approach for test case prioritization using priority level technique. Int. J. Comput. Sci. Inf. Technol. 2, 1054–1060 (2011)
  14. Muthusamy, T., Seetharaman, K.: A new effective test case prioritization for regression testing based on prioritization algorithm. Int. J. Appl. Inf. Syst. 6(7), 21–26 (2014)
  15. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. PHI Learning Private Limited, New Delhi (2010)

Copyright information

© Springer India 2015

Authors and Affiliations

  • Arup Abhinna Acharya (1)
  • Prateeva Mahali (1)
  • Durga Prasad Mohapatra (2)
  1. School of Computer Engineering, KIIT University, Bhubaneswar, India
  2. Department of Computer Science Engineering, National Institute of Technology, Rourkela, India
