1 Introduction

Testing plays a pivotal role in the software development process, whether for desktop or mobile applications [1]. The testing process enhances software reliability by eliminating faults and ensuring fault-free performance [2]. Software is modified continuously, which calls for regression testing to report the effect of changes to one or more modules or of newly added functionality. Because testing and maintenance budgets are typically high, re-running the whole test suite after every change is undesirable and expensive. A better strategy is to choose the most critical and practical subset of test cases for re-testing. Regression testing is generally carried out in three ways: test case selection [3], test case reduction [4], and test case prioritization. Test case prioritization (TCP) is widely acknowledged as the most favored of these approaches, followed by test case selection and test case reduction. Common factors for prioritizing test cases include total coverage, mutant coverage, and fault detection [5]. The literature has extensively compared several TCP solutions, such as firefly [6], genetic algorithm [7], ant colony optimization, integer linear programming [8], greedy, and particle swarm optimization, as listed in Table 1.

Table 1 List of related work

This manuscript focuses on regression testing and test case prioritization, exploring various algorithms, including greedy, meta-heuristic, and optimization techniques. Regression testing involves the following steps: efficient selection of test cases, reduction of their number to avoid complexity, and prioritization to increase the rate of fault detection [9]. Multi-objective optimization techniques, particularly multi-objective test case prioritization (MOTCP), aim to optimize several objectives simultaneously, such as code coverage and execution time. While higher code coverage enhances fault detection, this paper emphasizes identifying test cases that cover modified code or segments likely to affect functionality. The main contribution is prioritizing test cases based on identified target points that highlight fault-prone code areas. In addition to code complexity, the paper considers the historical behavior of test cases to determine their prioritized order.

2 Proposed methodology

The MOTCP task using NSGA-2 requires carefully balancing conflicting objectives. Our goal is to maximize the values of APFD and the sensitivity index (SI), prioritizing test cases with a higher potential for fault detection and coverage of critical code areas. This enhances the effectiveness and comprehensiveness of regression testing, ultimately leading to improved software quality. At the same time, we strive to minimize the execution cost to optimize resource utilization and reduce the time required for test case execution. Minimizing the execution cost ensures efficient allocation of testing resources and streamlines the overall regression testing process.

We formulate and optimize three fitness functions, APFD, sensitivity index, and cost, to accomplish these goals. Through a comprehensive discussion of these fitness functions and their formulation, we provide insight into how they contribute to the overarching objective of effective and efficient test case prioritization. To illustrate the proposed methodology and the formulation of the fitness functions, we employ a small-scale project named Project P as a case study [15]. This project comprises five modules and seven classes. It also includes two matrices: a test case vs. fault matrix and a test case vs. class matrix, describing the relationship of test cases to faults and to project classes, respectively. This small project is used in later sections to calculate the various parameters of the proposed methodology.

2.1 Average percentage of fault detection (APFD)

The primary objective function measures the rate of fault detection achieved by a given ordering of test cases, expressed as a value between 0 and 1 (equivalently, 0 to 100%). A higher APFD value indicates a better fault detection rate. Equation 1 calculates APFD, where \({TC}_{i}\) is the position, within the prioritized sequence, of the first test case that reveals fault i, n is the number of test cases, and m is the total number of faults.

$$APFD=1-\frac{{TC}_{1}+{TC}_{2}+\dots +{TC}_{m}}{nm}+\frac{1}{2n}$$
(1)
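
For illustration, a minimal Python sketch of Eq. 1 is given below; the function and the small fault matrix are our own hypothetical example, not artifacts from the paper.

```python
def apfd(order, detects):
    """Compute APFD (Eq. 1) for a prioritized test order.

    order   -- list of test case indices in prioritized execution order
    detects -- detects[t][f] is True if test case t reveals fault f
    Assumes every fault is revealed by at least one test case.
    """
    n = len(order)                 # number of test cases
    m = len(detects[order[0]])     # number of faults
    # Position (1-based) of the first test in 'order' revealing each fault
    first_positions = [
        next(i + 1 for i, t in enumerate(order) if detects[t][f])
        for f in range(m)
    ]
    return 1 - sum(first_positions) / (n * m) + 1 / (2 * n)

# Hypothetical example: 4 test cases, 3 faults
detects = [
    [True, False, False],   # T0 reveals fault 0
    [False, True, False],   # T1 reveals fault 1
    [False, False, True],   # T2 reveals fault 2
    [True, True, False],    # T3 reveals faults 0 and 1
]
print(apfd([3, 2, 0, 1], detects))  # approx. 0.79 for this order
```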

2.2 Sensitivity index (SI)

The objective of regression testing is to evaluate the impact of software modifications by giving priority to the test cases that cover these changes. Fault sensitivity, assessed using weighted assignments, is critical in determining priorities. The key target points are: test cases that have detected a significant number of faults, newly created test cases, test cases that cover modified or newly written code, test cases with a history of high failure rates, test cases that depend on fault-prone or complex areas of the code, and test cases that cover highly complex code or classes. Prioritizing such test cases significantly enhances the effectiveness of regression testing and ensures comprehensive coverage of critical areas within the software.

This paper operationalizes the points mentioned above by introducing the sensitivity index (SI), based on two crucial parameters: the code complexity of classes and the history/status of test cases. From these parameters we derive two weight matrices: \({W}_{c}\), a weight matrix for classes, and \({W}_{t}\), a weight matrix for test cases. Computing the second objective function, SI, involves calculating an area under the curve, which offers valuable insight into the prioritization process. The internal complexity of the code and its functional dependencies are used to identify critical test cases. As regression testing is conducted on Java applications, object-oriented metrics such as McCabe's cyclomatic complexity and other custom metrics are employed to assess code complexity [16,17,18].

2.2.1 Weight corresponding to classes (\({W}_{c}\))

For regression testing, the software tester assigns weights to the code complexity properties according to Table 2, focusing on Project P [15]. \({W}_{c}\) is calculated using this weight matrix. The class-wise code complexity metric values are computed and checked for outliers, and each metric value is then replaced by a binary flag (1 or 0) depending on the outcome of the outlier check. The weight for each class is then calculated accordingly.

Table 2 Weights assigned to CCM by tester

Example: suppose the coupling between objects (CBO) metric ranges from 0 to 47 and the acceptable band is defined by a 15% margin at each end of this range (i.e., the 15–85% band), so that values between approximately 7.5 and 39 are accepted. If the CBO value of class C2 is 32 and the CBO value of class C1 is 4, then C1 is treated as an outlier because its value falls below the 15% lower limit (7.5). Consequently, the binarized CBO value for class C1 is 0, while that for C2 is 1. Table 3 shows the class complexity metric (CCM) values of class C1 after outlier detection; the same process is applied to the other classes.

Table 3 CCM value of class C1 after outlier detection

After identifying outliers, the weight of a class is determined by multiplying the weight specified by the tester by the binarized value obtained after outlier identification and summing over all metrics, as in Eq. 2. For class C1 of Project P, the calculation of \({W}_{c}\) is shown below:

$${W}_{c}={\sum }_{m\in M}{W}_{m}\times {V}_{m}$$
(2)

\({W}_{c}\) for C1 = 2 × 0 + 1 × 1 + 2 × 0 + 1 × 0 + 1 × 0 + 2 × 1 + 2 × 0 + 1 × 1 + 2 × 1 = 6.

The \({W}_{c}\) values of all classes covered by a test case are calculated and summed to obtain the total weight \({W}_{tc}\) of that test case with respect to the classes it covers [15].
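
To make this computation concrete, the following Python sketch (with our own hypothetical metrics, weights, and names, not the paper's data) binarizes metric values using the 15% lower and upper limits as in the C1/C2 example above, computes \({W}_{c}\) per Eq. 2, and sums the weights of the classes covered by a test case to obtain \({W}_{tc}\). The direction of the binarization follows the worked example (out-of-band values marked 0).

```python
# Hypothetical tester-assigned weights for each complexity metric (cf. Table 2)
metric_weights = {"CBO": 2, "WMC": 1, "DIT": 2, "LCOM": 1}

# Hypothetical raw metric values per class
raw_metrics = {
    "C1": {"CBO": 4,  "WMC": 12, "DIT": 3, "LCOM": 40},
    "C2": {"CBO": 32, "WMC": 25, "DIT": 1, "LCOM": 10},
}

# Observed value range of each metric (used for the 15%/85% limits)
metric_ranges = {"CBO": (0, 47), "WMC": (0, 30), "DIT": (0, 6), "LCOM": (0, 60)}

def binarize(value, lo, hi, margin=0.15):
    """Return 0 for values outside the 15-85% band (outliers), 1 otherwise.
    This reading follows the C1/C2 worked example above."""
    lower = lo + margin * (hi - lo)
    upper = hi - margin * (hi - lo)
    return 1 if lower <= value <= upper else 0

def class_weight(cls):
    """W_c: sum over metrics of (tester weight x binarized value), Eq. 2."""
    return sum(
        metric_weights[m] * binarize(v, *metric_ranges[m])
        for m, v in raw_metrics[cls].items()
    )

def test_case_class_weight(covered_classes):
    """W_tc for a test case: sum of W_c over the classes it covers."""
    return sum(class_weight(c) for c in covered_classes)

print(class_weight("C1"), test_case_class_weight(["C1", "C2"]))
```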

2.2.2 Weight corresponding to test cases (\({W}_{t}\))

The test case weight (\({W}_{t}\)) is determined by analyzing test case behavior through the historical and status data of the test case. The calculation involves modified class coverage, code coverage, dependency, faults, cost, new functionality, and status history. \({W}_{t}\) is computed with the same outlier method (Eq. 3), using a 15% threshold to identify outliers.

$${W}_{t}={\sum }_{m\in M}{W}_{m}\times {V}_{m}$$
(3)

Once the values of \({W}_{t}\) and \({W}_{tc}\) have been computed, their cumulative sums, \({CW}_{t}\) and \({CW}_{tc}\), are calculated according to the prioritized order of test cases. The area under the curve (AUC) defined by \({CW}_{t}\) and \({CW}_{tc}\) is our sensitivity index [15]. The AUC is determined using the trapezoidal rule, as shown in Eq. 4.

$${\int }_{{x}_{1}}^{{x}_{n}}f\left(x\right)dx=\left({x}_{2}-{x}_{1}\right)\frac{f\left({x}_{1}\right)+f\left({x}_{2}\right)}{2}+\left({x}_{3}-{x}_{2}\right)\frac{f\left({x}_{2}\right)+f\left({x}_{3}\right)}{2}+\dots +\left({x}_{n}-{x}_{n-1}\right)\frac{f\left({x}_{n-1}\right)+f\left({x}_{n}\right)}{2}$$
(4)
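
A minimal Python sketch of this step is shown below, assuming for illustration that \({CW}_{t}\) serves as the x-coordinates and \({CW}_{tc}\) as the y-coordinates; the cumulative weights are placeholders, not values from Project P.

```python
def trapezoidal_auc(x, y):
    """Area under the curve via the trapezoidal rule (Eq. 4).
    x, y -- cumulative weight sequences CW_t and CW_tc, in prioritized order."""
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
        for i in range(len(x) - 1)
    )

# Hypothetical cumulative weights for a prioritized order of four test cases
cw_t  = [3, 7, 12, 15]   # cumulative W_t
cw_tc = [6, 10, 18, 22]  # cumulative W_tc
sensitivity_index = trapezoidal_auc(cw_t, cw_tc)
print(sensitivity_index)  # 162.0 for these placeholder values
```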

2.3 Cost

The cost is the third objective function, defined as the execution time associated with the order of test cases. A straightforward cumulative sum is computed to determine the total cost of the test cases based on their execution order.

$$Cost=\sum_{i=1}^{n}CumSum\left({ET}_{i}\right)$$
(5)

Based on the provided order T0 [15] of test cases and the corresponding execution times presented in Table 4, the total cost, computed using Eq. 5, amounts to 735 s.

Table 4 Execution cost of test cases used in projects P
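
The cumulative-sum cost of Eq. 5 can be sketched in Python as follows; the execution times are placeholders rather than the values of Table 4.

```python
from itertools import accumulate

def order_cost(execution_times):
    """Cost of a prioritized order: sum of the cumulative execution times (Eq. 5).
    Earlier test cases contribute to every later partial sum, so the cost
    depends on the order in which the test cases are executed."""
    return sum(accumulate(execution_times))

# Hypothetical execution times (seconds) in prioritized order
print(order_cost([10, 25, 5, 40]))  # 10 + 35 + 40 + 80 = 165
```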

3 Experimental assessment

This section evaluates the effectiveness and performance of the proposed MOTCP approach. Our experimental procedure consists of the series of steps listed below, as depicted in Fig. 1.

Fig. 1 Block diagram of proposed methodology

Step 1: Dataset (subjects): Utilizing custom Java applications [15] and three open-source Java projects from SIR [19], we create different versions of each custom application and gather information about the size, lines of code, test cases, and faults for the open-source projects.

Step 2: Fault seeding: We employ fault seeding and mutation fault techniques to introduce artificial faults and simulate potential code mutations to evaluate our test case prioritization methods comprehensively.

Step 3: Test case generation: Test cases are explicitly generated and tailored for assessing module functionality, maintaining diverse test cases at two levels: test class and test method. No reduction or prioritization of test cases occurs during this phase.

Step 4: Software metrics generation: Various software metrics are generated, resulting in five data files: TestCases_Faults.csv, TestCases_Classes.csv, Class_Weights.csv, TestCases_Weights.csv, and TestCases.csv.

Step 5: Objective function optimization: Using the extracted data, we calculate three proposed objective functions (APFD, cost, and sensitivity index) and optimize them using the NSGA-2 algorithm. Parameters include an initial population size equal to the number of test cases, iterations twice the number of test cases, and crossover/mutation rates set to 0.5 and 0.25, respectively.
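
To illustrate how the three objectives feed into the multi-objective search, the following Python sketch (our own simplification, not the paper's implementation) evaluates one candidate permutation of test cases; an NSGA-2 implementation would minimize this vector, with APFD and SI negated because the algorithm minimizes by convention. The helpers apfd, trapezoidal_auc, and order_cost refer to the earlier sketches, and the population and iteration settings mirror the parameters stated in Step 5.

```python
from itertools import accumulate

def evaluate(order, detects, w_t, w_tc, execution_times):
    """Return the three-objective vector for one candidate test order.
    NSGA-2 minimizes, so the objectives to be maximized are negated."""
    cw_t  = list(accumulate(w_t[i]  for i in order))
    cw_tc = list(accumulate(w_tc[i] for i in order))
    f1 = -apfd(order, detects)                            # maximize APFD
    f2 = -trapezoidal_auc(cw_t, cw_tc)                    # maximize SI
    f3 = order_cost([execution_times[i] for i in order])  # minimize cost
    return f1, f2, f3

# NSGA-2 settings used in our experiments (Step 5), for a hypothetical suite
n_tests = 20
population_size = n_tests          # initial population = number of test cases
iterations = 2 * n_tests           # iterations = twice the number of test cases
crossover_rate, mutation_rate = 0.5, 0.25
```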

3.1 Evaluation of model

When assessing the performance of our proposed model, we rely on APFD as the primary criterion for comparing the effectiveness of the various models. APFD gauges how effectively a testing technique or strategy detects faults. The evaluation determines the approach's ability to prioritize test cases and its impact on regression testing. To guide our evaluation, we address the following research questions.

(a) How does the performance of the proposed MOTCP approach compare to existing methods in terms of prioritization effectiveness?

(b) What is the trade-off between maximizing APFD and SI and minimizing execution cost in MOTCP, and how does including each objective affect the prioritization results?

4 Result analysis

This section provides an overview of the outcomes obtained with the proposed methodology. The performance of NSGA-2 is compared against various state-of-the-art algorithms and recent TCP publications, including greedy [20], 2-Opt [20], genetic algorithm (GA) [21], TCP using honey bee optimization (HB) [22], MOTCP using African buffalo optimization (MOBF) [23], and analytic hierarchy process (AHP) based TCP [24]. These algorithms were applied to the dataset presented in the previous section.

Table 5 compares the various models on the provided dataset. Our NSGA-2 approach performs better than the other algorithms. Comparing NSGA-2 with the MOBF algorithm shows that MOBF lags in performance; the reason is that NSGA-2 preserves diversity and provides a better balance of exploration and exploitation than the other multi-objective algorithms. MOBF also lags behind HB and GA but performs much better than 2-Opt, AHP, and greedy. Comparing the single-objective algorithms GA and HB with NSGA-2, HB produces better results for medium-sized problems, where the number of test cases is not large, but becomes stuck in local optima for large test suites. Conversely, GA maintains its performance and gives good results for large problems compared to HB. Among 2-Opt, AHP, and greedy, greedy performs worst, while the results of 2-Opt and AHP are mixed.

Table 5 Comparison of APFD value calculated from different algorithms

To analyze the performance differences statistically, we performed the Wilcoxon–Mann–Whitney test between NSGA-2 and each of the other algorithms, with a p value threshold of 0.05; the results are presented in Table 6. The test indicates significant differences between the performance of NSGA-2 and the other algorithms. In response to our first research question, NSGA-2 therefore outperforms the other algorithms, and the proposed approach demonstrates superior performance for larger data sizes.

Table 6 Wilcoxon–Mann–Whitney statistical test result
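
For reference, such a pairwise comparison can be run with SciPy as sketched below; the APFD samples are hypothetical placeholders, not values from Table 5 or 6.

```python
from scipy.stats import mannwhitneyu

# Hypothetical APFD samples collected over repeated runs/subjects
apfd_nsga2 = [0.91, 0.88, 0.93, 0.90, 0.92]
apfd_other = [0.84, 0.86, 0.80, 0.83, 0.85]

stat, p_value = mannwhitneyu(apfd_nsga2, apfd_other, alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference in APFD is statistically significant at the 0.05 level")
```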

4.1 Three-point analysis

To address the second research question, concerning the trade-off between the objective functions in MOTCP and the impact of their inclusion on the prioritization results, we generated two tables. Table 7 shows the effect on the performance of our model when each objective function is removed individually. The proposed MOTCP exhibits the poorest performance when APFD is removed as an objective function, whereas removing Cost or SI yields comparatively better results.

Table 7 APFD comparison while removing one objective at a time

The response is mixed when removing SI as one of the objective functions. This is because having a high SI value does not necessarily correspond to a high APFD value. The purpose of including SI is to ensure adequate coverage of target points for effective regression testing. It is possible that test cases covering target points may not contain any faults to detect, resulting in their execution being delayed in the sequence. Consequently, for the same SI values, multiple APFD values can be obtained.

Removing cost as an objective function has the least impact on the proposed MOTCP. This is because a test case with a lower execution cost is unlikely, on that basis alone, to achieve a high APFD score. Note that this cost refers to the cumulative sum of execution times rather than the total execution time, and the cumulative sum depends entirely on the order of the test cases. Table 8 also reveals that incorporating Cost or SI alongside APFD improves the model's performance. Single-objective approaches sometimes become trapped in local optima, and including cost or SI acts as a catalyst for escaping such traps. Furthermore, to ascertain the individual importance of each objective function, we ran GA on each objective independently, as shown in Table 8.

Table 8 APFD comparison considering one objective at a time

5 Conclusion and future scope

This paper employs NSGA-2 for multi-objective test case prioritization, emphasizing APFD, SI, and Cost as objective functions. SI, calculated with a focus on fault generation and identification, significantly contributes to our approach. Comparative analysis with state-of-the-art algorithms, using APFD as the criterion, demonstrates the effectiveness of our methodology. NSGA-2 outperforms other algorithms, providing balanced solutions across all objectives, and proves valuable in regression testing scenarios. However, future work should explore integrating SI and APFD into a unified objective to enhance control in test case prioritization.