1 Introduction

Testing is an essential quality assurance technique for modern software-intensive systems, which often has to be performed under severe pressure (Felderer et al. 2014b; Perry and Rice 1997). A challenging time schedule, limited resources (Felderer et al. 2014b), and increasing pressure from senior management, who often see testing as “something that has to be done” (Perry and Rice 1997), are the major causes. In addition, complete software testing is virtually impossible (Redmill 2004; Pries and Quigley 2010). As a result, effective and efficient software testing must be selective (Redmill 2004) to ensure the right amount of testing (Graham et al. 2008). Risk-based testing (Felderer and Schieferdecker 2014), which utilizes identified product risks of a software system for testing purposes, has a high potential to support and improve testing in this context (complete testing is not possible, and the time schedule and resources are limited). It optimizes the allocation of resources and time, is a means for mitigating risks, helps to identify critical areas early, and provides decision support for deciding when to release (Felderer et al. 2014b).

Recently, the international standard ISO/IEC/IEEE 29119 Software Testing (2013) on testing techniques, processes, and documentation has even explicitly mentioned risks as an integral part of the testing process. In addition, several risk-based testing approaches which consider risks of the software product as the guiding factor to support decisions in all phases of the test process have been proposed in the literature (Erdogan et al. 2014; Felderer and Schieferdecker 2014).

A core activity in every risk-based testing process is risk assessment because it determines the significance of the underlying risk values and, therefore, the quality (effectiveness and efficiency) of the overall risk-based testing process (Felderer et al. 2012). Risk assessment is often done in an ad hoc manual way which is expensive, time-consuming, and has low reliability.

Because product risk can be seen as a factor that could result in future negative consequences (ISTQB 2015), which are usually system and software defects in the field of software testing (Redmill 2004), one can argue that risk represents missing software product quality and therefore should be measured via software quality assessment. The recent standard ISO/IEC 25010 (2011) decomposes software quality into characteristics which further consist of subcharacteristics and even sub-subcharacteristics. Quality Modeling and Control (QuaMoCo) operationalizes ISO/IEC 25010 by providing a tool-supported quality assessment method for defining and assessing software quality (Wagner et al. 2015; Deissenböck et al. 2011). Based on this quality model assessment, it is also possible to provide an objective and automation-supported risk assessment for risk-based testing.

As risk represents missing software quality, and quality assessments based on quality models already describe the quality-related risks of a whole software product (Zhang et al. 2006; Wagner 2013; ISO/IEC 25010 2011), this article addresses the research objective of how quality assessments based on quality models can be used and applied in risk assessment for testing purposes. This objective is investigated by showing how quality models can in principle be used for risk assessment and by providing a concrete integration of a quality assessment based on QuaMoCo into risk-based testing, together with its evaluation.

An exploratory study of available risk-based testing approaches by Felderer et al. (2015) showed that, until now, the potential of quality models for risk assessment in the context of risk-based testing has not been investigated. The contribution of this article is, on the one hand, to show the potential usage of quality models for risk-based testing by presenting two integration approaches and, on the other hand, to provide a concrete integration including tool support and an empirical evaluation for the quality model QuaMoCo. In addition, the presented integration approach bridges the gap between the international standard ISO/IEC 25010 on Software Quality, which is operationalized by the quality model QuaMoCo, and the international standard ISO/IEC/IEEE 29119 on Software Testing which explicitly mentions risks as an integral part of the testing process.

The evaluation of the developed integration approach, based on a case study of five open source products, showed that a risk-based testing strategy outperforms a line of code-based testing strategy in terms of the number of classes which must be tested in order to find all defects. In addition, a significant positive relationship between the risk coefficient and the associated number of defects of a class was found. Moreover, on average, 80 % of all defects of the five analyzed software products were found by testing 30 % of all classes when a risk-based testing strategy was applied. For the sake of comprehensibility, and because an explicit distinction is not always required, we use the term defect according to Wagner (2013) as a superset of faults (=bugs) and failures in this article. This is because there is always some relationship between the two, and at a certain abstraction layer, it is useful to have a common term for both.

The remainder of this article is structured as follows. Section 2 discusses background on risk-based testing and software quality models as well as related work on their integration. Section 3 presents two generic integration approaches of how quality assessments based on quality models can be used for further risk-based testing of the investigated software product. Section 4 presents the concrete integration of quality assessments and risk-based testing on the basis of the open quality model QuaMoCo. Section 5 describes the applied research design including the research questions, the case selection as well as the data collection, analysis, and validity procedures. In Section 6, the results of the case study and threats to validity are discussed. Finally, Section 7 draws conclusions and presents possible future work.

2 Background and related work

This section discusses background on risk-based testing and software quality models as well as related work about their integration. Section 2.1 provides background on risk-based testing and Section 2.2 on software quality models. Finally, Section 2.3 discusses related work on the integration of quality models and risk-based testing.

2.1 Risk-based testing

Risk-based testing (RBT) is a testing approach which considers risks of the software product as the guiding factor to support decisions in all phases of the test process (Gerrard and Thompson 2002; Felderer and Schieferdecker 2014). A risk is a factor that could result in future negative consequences and is usually expressed by its probability and impact (ISTQB 2015). In software testing, the probability is typically determined by the likelihood that a defect assigned to a risk occurs, and the impact is determined by the cost or severity of a defect if it occurs in operation. Mathematically, we can define the risk exposure (R) of an arbitrary risk item or asset (a) as a multiplication of the probability factor (P) and the impact factor (I):

$$ R(a)=P(a)*I(a) $$

In the context of testing, a risk item is anything of value (i.e., an asset) under test, for instance, a requirement, a component, or a defect one explicitly wants to avoid. Risk exposure values are estimated during development before the information whether a risk item is actually defective or not is available. Based on the risk exposure values, the risk items are typically prioritized and assigned to risk levels. The resulting risk information is used to support decisions in all phases of the test process.
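To make this concept concrete, the following minimal Java sketch computes risk exposure values for a few hypothetical risk items, assigns illustrative risk levels, and prioritizes the items. The item names, factor values, and level thresholds are our own assumptions for illustration and are not taken from the cited approaches.

```java
import java.util.Comparator;
import java.util.List;

public class RiskExposureExample {

    // A risk item (asset under test) with its estimated probability and impact factors.
    record RiskItem(String name, double probability, double impact) {
        // Risk exposure R(a) = P(a) * I(a)
        double exposure() {
            return probability * impact;
        }
        // Assign a coarse risk level based on the exposure (thresholds are illustrative only).
        String riskLevel() {
            double r = exposure();
            return r >= 40 ? "high" : r >= 15 ? "medium" : "low";
        }
    }

    public static void main(String[] args) {
        List<RiskItem> items = List.of(
                new RiskItem("Money transfer", 7, 9),
                new RiskItem("Report generator", 4, 5),
                new RiskItem("Settings dialog", 2, 2));

        // Prioritize risk items by descending risk exposure.
        items.stream()
                .sorted(Comparator.comparingDouble(RiskItem::exposure).reversed())
                .forEach(i -> System.out.printf("%s: R = %.1f (%s)%n", i.name(), i.exposure(), i.riskLevel()));
    }
}
```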

For the determination of risk, probability, and impact, several proposals have been made in the literature (Felderer et al. 2015). The probability of defect occurrence is often determined by technical factors, whereas impact is often determined by business factors. For instance, Van Veenendaal (2012) proposes complexity, number of changes, new technology and methods, size, or defect history as probability factors and critical areas, visible areas, most used areas, business importance, or cost of rework as impact criteria. In another paper, Felderer et al. (2012) propose, for instance, code complexity, functional complexity, or testability as probability criteria as well as importance or usage as impact criteria. All listed factors are typically estimated manually and not guided by software quality models (Felderer et al. 2015) as proposed in this article. However, the guidance of risk-based testing by software quality models is of high practical importance. In previous studies (Felderer and Ramler 2014a, b; Felderer and Ramler 2016), we found that essential benefits of risk-based testing are (1) detecting additional defects during testing so that fewer defects slip through to the field and (2) prioritizing tests so that the most critical defects are detected first, which reduces overall stabilization costs and time. Both aspects of effectiveness can be addressed by suitable software quality factors (systematically) selected from quality models. Guidance by software quality models thus also supports the development of risk models for testing purposes in a structured way.

2.2 Software quality models

Software quality models are, according to Deissenböck et al. (2009), a well-accepted means for managing and describing software quality, as the study by Wagner et al. (2012a, b, c) showed. In the last 30 years, plenty of quality models have been developed by various researchers to understand and measure the quality of software (Kitchenham and Pfleeger 1996; Deissenböck et al. 2009). A complete coverage of all research contributions, existing literature, approaches, and concepts in the area of software quality models is beyond the scope of this article. Therefore, the following subsection aims to provide a general understanding of software quality models and presents the ISO/IEC 25010 standard on software quality, which is operationalized by QuaMoCo, in more detail.

Based on the long history of research effort on quality models, they can be separated into different groups, for example, hierarchical and richer quality models (Wagner et al. 2015), meta-model-based and implicit quality models (Wagner 2013), or basic and tailored quality models (Miguel et al. 2014). Further, Deissenböck et al. (2009) suggest classifying software quality models according to their different purposes.

The predominant group consists of hierarchical quality models (Wagner et al. 2015), which decompose the concept of quality into different factors, criteria, and metrics (Factor Criteria Metrics models) (Cavano and McCall 1978). Examples are McCall’s quality model (McCall et al. 1977), Boehm’s quality model (Boehm et al. 1978), the FURPS quality model (Grady 1992), QuaMoCo (Wagner et al. 2015), or the ISO/IEC 25010 quality model (ISO/IEC 25010 2011).

The ISO/IEC 25010 standard, as a hierarchical definition quality model, decomposes software quality into characteristics which can further consist of subcharacteristics and even sub-subcharacteristics (Wagner 2013). The aim of this decomposition is to reach a level where the characteristics can be measured in order to evaluate the software quality (Wagner 2013). In detail, ISO/IEC 25010 (2011) defines two quality models, the quality-in-use model and the product quality model, to evaluate and define software quality (Wagner 2013). The product quality model uses eight characteristics for describing a software product’s quality in a comprehensive way. Figure 1 graphically illustrates these eight characteristics with their corresponding subcharacteristics.

Fig. 1 Product quality model of ISO/IEC 25010 based on Wagner (2013)

Many different quality models have been developed in the last decades (Wagner 2013). According to Al-Qutaish (2010), it is a real challenge to select which model to use. A comprehensive overview of the different quality models and concepts can be found in Al-Qutaish (2010), Miguel et al. (2014), or Wagner (2013, Chapter 2).

2.3 Approaches integrating risk-based testing and quality models

This section reviews related work on quality models in connection with risk-based testing. Research interest in software quality is, according to Miguel et al. (2014), as old as software construction itself. Although the concepts of software quality models and risk-based testing have been addressed by several research papers and contributions (e.g., Deissenböck et al. (2007), Franch and Carvallo (2003), or Felderer and Schieferdecker (2014)), we found no related work that explicitly deals with integrating quality models and risk-based testing. Moreover, no existing risk-based testing approach in the literature specifically considers quality-related information as a basis for the further execution of the risk-based testing process. Nor was a tool available that supports risk-based testing by using quality-related information. However, two contributions were identified in the literature which can be seen as related work, as they partially deal with integrating quality-related information and software testing.

The first contribution (Neubauer et al. 2014) deals with the extension of the “Active Continuous Quality Control” (ACQC) approach (Windmüller et al. 2013) to support risk-based testing. ACQC is an approach that aims to automatically maintain software test models by employing automata learning technology. Neubauer et al. (2014) extended the ACQC approach with risk analysts in order to support risk-based testing. Although the ACQC approach is not a quality model, we considered this contribution relevant because it actively integrates risk analysts into the ACQC approach in order to prioritize critical user interactions with the software system.

The second contribution (Zeiss et al. 2007) adapted the ISO/IEC 9126-1 (2001) quality model to test specifications. Concretely, the authors developed a quality model for test specifications which is based on seven internal quality characteristics provided by the ISO/IEC 9126 domain. We considered this contribution important because the usage of quality models that instantiate test specifications seems promising for the integration with risk-based testing.

3 Integration of quality models into risk-based testing

In this section, we present two generic approaches for integrating quality models into risk-based testing. The presented approaches are based on previous work of Felderer et al. (2012), who defined a model-based risk assessment procedure integrated into a generic risk-based testing process.

The risk assessment procedure defined by Felderer et al. applies automatic risk assessment by static analysis. Therefore, Felderer et al. suggest the usage of automatic metrics for determining the probability and impact factors. Due to the fact that metrics for the impact factor are typically derived from requirements and depend on the evaluator’s viewpoint, they are usually evaluated manually. At least for web applications, Google Analytics (2005) could be used to determine, for example, the frequency of use and therefore the importance of single parts of the system. For instance, the function for changing the look and feel of an online banking system’s graphical user interface is typically used far less often than the function for transferring money. A further possibility to determine the frequency of use is to analyze earlier deployed versions of a software system. In case a software system is developed completely from scratch and no previous versions are available, the frequency of use can be determined by analyzing similar software systems from the same category (e.g., web browsers, accounting software systems).

According to Van Veenendaal (2009, p. 9), determining the probability factor means predicting where the most defects are located in the software. The most defects are typically located in the worst areas of the software. Redmill (2004, p. 8) further suggests observing the quality of the documentation and the structure of the software code for determining the likelihood of defects. Moreover, Van Veenendaal (2009, p. 9) claims that one of the most important defect generators is complexity. For determining complexity, a lot of different metrics (e.g., McCabe (1976)) are available (Van Veenendaal 2009, p. 9). Nagappan et al. (2006) found that code complexity metrics are an effective means to predict defects in software code. Further research (e.g., Catal et al. (2007), Jiang et al. (2008), Radjenovic et al. (2013)) showed that software metrics in general are useful for predicting defects and their location in software products. Accordingly, Felderer et al. (2012, p. 163) claim that the employment of different metrics, which can usually be determined automatically, can serve as a basis for determining the probability factor.

Current quality models use an integrated chain from rather abstract software characteristics down to specific metrics. Therefore, quality models typically decompose the concept of quality into different factors, criteria, and metrics (Factor Criteria Metrics models (Cavano and McCall 1978)). A quality assessment further evaluates and specifies the quality of a software product based on a defined Factor Criteria Metric hierarchy.

Based on the previous explanation and discussion in this section, one can argue that the probability factor can be used to integrate quality assessments based on quality models into risk-based testing. Due to the fact that the probability factor is computed mainly based on metrics (Felderer et al. 2012, p. 163) and considers complexity (Van Veenendaal 2009, p. 9) as well as the quality of the structure of the software product (Redmill 2004, p. 8), it seems appropriate to use the probability factor for integrating quality assessments, which are based on metrics that define the quality characteristics, into risk-based testing. In addition, Redmill (2005, p. 13) mentions the possibility of using quality factors as “surrogates” for determining the probability factor.

Hence, our basic integration idea is to process the information from a quality assessment based on a quality model in such a way that it represents the probability factor in the concept of risk-based testing. Because the impact factor is mainly determined manually (Felderer et al. 2012, p. 166) and varies based on different possible perspectives (Redmill 2004, p. 7), it does not seem appropriate to integrate quality assessments based on quality models into risk-based testing via the impact factor. Therefore, the impact factor must be determined manually in our two presented integration approaches.

Felderer et al. suggest assigning the probability factor to units or components of a software system. Units are the technical implementations of a software system and components can contain several units. As a result, the approaches presented in the following aim to integrate the quality assessments of quality models into risk-based testing by mapping the information provided by the assessments to an adequate probability factor for each component or unit. For the sake of comprehensibility, the following description of the two approaches uses the more generic term component.

3.1 Approach 1

First, a quality assessment based on the defined quality model is conducted for each component (the setting of the quality model (i.e., which quality factors are used) must be the same for each quality assessment).

The quality assessment of each component must then be further processed into an adequate probability factor for each component under test. This means that the results of the quality assessments (which are represented by the quality factors at the highest level of the quality model hierarchy) are used to determine the probability factor for each component. Hence, components with high quality are assumed to have a low probability of defects and vice versa. The impact factor must further be determined for each component, and finally, the risk coefficient can be calculated for each component. Figure 2 illustrates the approach with an example. The rounded rectangles in Fig. 2 represent the quality factors, the octagons the criteria, and the ellipses the metrics of the Factor Criteria Metric hierarchy. Further, the risk-based testing concept is represented by rectangles (Probability, Impact, and Risk).

Fig. 2 Approach 1

Components are illustrated by rectangles with cut corners. The red arrows represent the determination of the probability factor. In Fig. 2, a quality assessment is conducted for two components (component x and component y). Supposing the quality assessment specifies a high quality for component x and a low quality for component y, the probability factor for component x is then assumed to be lower than for component y because the high quality of component x indicates a low probability of defects. Further, the impact factor for both components must be determined. Finally, the risk coefficients for components x and y can be calculated by multiplying the probability and impact factors.

3.2 Approach 2

The second approach aims to directly use the metrics on the lowest level of the quality model hierarchy. Therefore, only one single quality assessment of the software product is necessary, and the measured values of each metric for each component must be further processed into an adequate probability factor for each component.

Moreover, the impact factor must be determined for each component. Finally, the risk coefficient can be calculated by multiplying both factors. Figure 3 illustrates the approach with an example (the symbols used have the same meaning as defined above). Suppose there are three metrics (Metric a, Metric b, and Metric c) defined in the quality model and two components (component x and component y). Metric a, which measures the lines of code of each component, measures 340 lines of code for component x and 780 lines of code for component y. Metric b, which measures the nesting depth of each component, measures the values 16 for component x and 20 for component y. For the sake of comprehensibility, Metric c is skipped in the further illustration of the example.

Fig. 3 Approach 2

Based on the measured values of each metric, an adequate probability factor can be calculated for each component. As a result, component y is assumed to have a higher probability factor than component x because the measured values indicate a higher complexity and therefore a higher probability of defects for component y.

To summarize, the first difference between the two approaches is that approach 1 requires a separate quality assessment for each component of a software product, whereas approach 2 only requires one single quality assessment per software product, no matter how many components the software product has. The second difference is that approach 1 directly uses the quality assessment results, which are represented by the quality factors at the highest level of the quality model hierarchy, to determine the probability factor, whereas approach 2 directly uses the metrics on the lowest level of the quality model hierarchy to determine the probability factor for each component. As the quality factors are based on the defined criteria and further on the assigned metrics, the defined aggregations and evaluations (i.e., which metric affects which criterion and how the metrics are aggregated) play a key role in approach 1, whereas approach 2 is not affected by this (it uses only the metrics).

The decision of which of these two approaches to apply must be made for each concrete quality model. In the next section, we demonstrate the decision process and the integration with the open quality model QuaMoCo and compare both integration approaches.

4 Usage of QuaMoCo in risk-based testing

In this section, we present the integration of quality assessments and risk-based testing on the basis of the open quality model QuaMoCo. First, Section 4.1 introduces the main concepts of the open quality model QuaMoCo. Section 4.2 illustrates the application of the selected integration approach. Finally, Section 4.3 illustrates the implementation of the integration approach.

4.1 QuaMoCo

The open quality model QuaMoCo (Quality Modeling and Control) is an operationalized software quality model together with a tool chain containing a quality assessment method for defining as well as assessing software quality (Wagner et al. 2015; Deissenböck et al. 2011). The declared aim of QuaMoCo was to close the gap between generic and abstract software quality characteristics and concrete quality metrics (Wagner et al. 2012a, 2015). QuaMoCo was developed by software quality experts from academia (Fraunhofer IESE and Technische Universität München) and industry (Siemens, SAP AG, Capgemini, and itestra) (Wagner et al. 2015).

The main concept used in the QuaMoCo quality model is a factor. “A factor expresses a property of an entity” (Wagner et al. 2015, p. 104), while an entity “describe[s] the things that are important for quality” (Wagner et al. 2015, p. 104). The attributes of those entities (entities are, for example, a class, a method, an interface, or the whole software product) are described by properties. For example, Maintainability would be a property of the entity Software Product and Detail Complexity a property of the entity Class. In order to bridge the gap between concrete metrics and abstract quality characteristics, Wagner et al. use the concept of a factor on two different levels of abstraction.

The first type of factor is named Quality Aspect and describes the abstract quality characteristics provided by ISO/IEC 25010 (e.g., Maintainability, Security, Portability). Quality aspects have the whole software product as their entity (e.g., Maintainability of the whole software product). Product Factors are the second type of factor and represent attributes (properties) of parts of the software product (e.g., Detail Complexity, Duplication). Both types of factors can consist of several sub-aspects (in the case of quality aspects) and sub-factors (in the case of product factors). An important requirement regarding the leaf product factors is that they must be measurable. Therefore, Wagner et al. require them “to be concrete enough to be measured” (Wagner et al. 2015, p. 104). For example, the product factor Detail Complexity of Method can be measured by nesting depth and length. Further, the separation of entities and their properties allows decomposing the product factors either regarding their entity or their property. For example, the entity Class can be decomposed into the entities Field and Method.

This addresses the common problem of the difficult decomposition of quality attributes. In order to bridge the gap between the measurable properties of a software product and the abstract quality aspects, Wagner et al. set the abstract quality aspects in relation to the product factors. Concretely, product factors can have either a positive or a negative Impact on quality aspects. For example, the presence of the product factor Detail Complexity of Method negatively affects the quality aspects Analysability and Maintainability. Further, the presence of the product factor Conforming to Naming Convention of Class name positively affects the quality aspects Analysability and Testability.

For measuring the leaf product factors, Wagner et al. introduced the concept of a measure (metric). Although some authors (e.g., Pressman 2010, p. 614f) see a subtle difference between the terms metric and measure, we use the term metric in this article for both terms, as suggested by Wagner (2013, p. 43). “A measure [metric] is a concrete description of how a specific product factor should be quantified for a specific context” (Wagner et al. 2015, p. 105). For example, the product factor Detail Complexity of Method is measured by the metric deep nesting. There can be multiple metrics for one product factor, and a metric can further be used for quantifying multiple product factors. Instruments are used as “a concrete implementation of a measure [metric]” (Wagner et al. 2015, p. 105). The metrics are separated from their instruments in order to provide the possibility of collecting data for metrics with different tools or manually.

For getting a complete quality evaluation of a software product, Evaluations are assigned to quality aspects and product factors. The evaluations consist of formulas which aggregate the measured metrics from the instruments (for the product factors) as well as the evaluation results caused by the impacts of the product factors on the quality aspects. The left side of Fig. 4 illustrates the quality model concepts discussed so far. On the right side, some concrete quality aspects, product factors, metrics, and instruments are shown. Here, it can be seen that the product factor Detail Complexity of the entity Method is measured by deep nesting, which is determined by the quality assessment tool ConQAT. In addition, the product factor Duplication of the entity Source Code Part is measured by clone coverage and clone overhead, which are both determined by ConQAT. Both product factors negatively impact the quality aspects Analysability and Modifiability, which are both sub-quality aspects of the quality aspect Maintainability. In the following, a more detailed example of a quality assessment is presented.

Fig. 4 Quality model concepts (adapted from Wagner et al. (2015))
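The following simplified Java sketch illustrates these QuaMoCo concepts (entities, product factors, measures, instruments, and impacts on quality aspects) as an object model. The class design is our own illustration and does not reflect QuaMoCo's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified object model of the concepts described above: a product factor is a
// property of an entity, is quantified by measures collected by instruments, and
// has positive or negative impacts on quality aspects. Illustrative sketch only.
public class QualityModelSketch {

    enum ImpactDirection { POSITIVE, NEGATIVE }

    record Entity(String name) {}                      // e.g., "Method", "Class", "Product"

    record Measure(String name, String instrument) {}  // e.g., "deep nesting" measured by ConQAT

    static class ProductFactor {
        final String property;                         // e.g., "Detail Complexity"
        final Entity entity;                           // e.g., Method
        final List<Measure> measures = new ArrayList<>();
        ProductFactor(String property, Entity entity) {
            this.property = property;
            this.entity = entity;
        }
    }

    static class QualityAspect {                       // ISO/IEC 25010 characteristic, e.g., Analysability
        final String name;
        record Impact(ProductFactor factor, ImpactDirection direction) {}
        final List<Impact> impacts = new ArrayList<>();
        QualityAspect(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        ProductFactor detailComplexity = new ProductFactor("Detail Complexity", new Entity("Method"));
        detailComplexity.measures.add(new Measure("deep nesting", "ConQAT"));

        QualityAspect analysability = new QualityAspect("Analysability");
        analysability.impacts.add(new QualityAspect.Impact(detailComplexity, ImpactDirection.NEGATIVE));

        System.out.println(detailComplexity.property + " of " + detailComplexity.entity.name()
                + " negatively impacts " + analysability.name);
    }
}
```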

Example

Figure 5 shows a limited quality assessment example of the Java Platform (version 6) with three metrics M1 (#Doomed test for equality to NaN), M2 (#Lines of source code), and M3 (Floating Point equality) as well as one leaf product factor F1.1 (General expression applicability of comparison expression). The example is taken from Wagner et al. (2015, p. 110f).

Fig. 5 Quality assessment example (adapted from Wagner et al. 2015)

The values of the three metrics are M1 = 6, M2 = 2 759 369, and M3 = 9. For ensuring comparability across different software products, Wagner et al. defined normalization metrics (e.g., number of classes, LoC) for normalizing the metrics. The normalization metrics were defined by two measurement experts for each metric (Wagner et al. 2012a, p. 1138). The metrics M1 and M3 in the example are normalized based on metric M2, which results in the normalized metric \( {M}_4=\frac{M_1}{M_2}=2.17E-6 \) for M1 and \( {M}_5=\frac{M_3}{M_2}=3.19E-6 \) for M3. As a result of this normalization, the metrics M1 and M3 can be compared with other software products.

In the next step, the utility functions for M4 and M5 are defined, whereby the utility of both is represented by a decreasing function. For specifying the utility, each metric has a linearly decreasing or increasing utility function according to its associated leaf product factor (Wagner et al. 2012a, p. 1138). These utility functions provide a value between 0 and 1, where thresholds for the maximal utility (1) and the minimal utility (0) are determined by a benchmarking approach based on a large number of software products. The minimum and maximum thresholds for M4 and M5 are min(M4) = 0, max(M4) = 8.50E−6 and min(M5) = 0, max(M5) = 3.26E−6. Based on the defined thresholds, the utility values are 0.74 for M4 (U(M4)) and 0.89 for M5 (U(M5)). Figure 6 illustrates the utility function for metric M4 with the two thresholds as well as the resulting utility value of 0.74.

Fig. 6 Utility function (Wagner et al. 2015)

Finally, the utility values are aggregated based on their weights. The weights were assigned based on expert opinion or available data (Wagner et al. 2015, p. 111). The assigned weight for M4 is 0.25 \( \left({w}_{M_4}\right) \) and 0.75 \( \left({w}_{M_5}\right) \) for M5. This means that the metric M5 is three times more important than metric M4 for determining the leaf product factor General expression applicability of comparison expression (F1.1).

As a result, the aggregated utility value of this product factor (U(F1.1)) is 0.25 * 0.74 + 0.75 * 0.89 = 0.85. For determining the higher-level product factors as well as quality aspects (e.g., F1 (Functional Correctness)), the same aggregation principle can be applied. As a last step, the aggregated utility values are mapped onto a German ordinal school grade scale. The school grade scale ranges from 1 (best grade) to 6 (worst grade); the thresholds used are shown in Fig. 7.

Fig. 7 Interpretation model (Wagner et al. 2015)
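The following Java sketch illustrates this evaluation step, assuming a linearly decreasing utility function clamped to [0, 1]; with the thresholds quoted above it reproduces U(M4) ≈ 0.74, and the subsequent weighted aggregation reproduces U(F1.1) = 0.85. The exact form of QuaMoCo's utility functions may be configured differently per measure.

```java
public class UtilityAggregationExample {

    // Linear decreasing utility clamped to [0, 1]: values at or below min get utility 1,
    // values at or above max get utility 0 (a simplifying assumption).
    static double decreasingUtility(double value, double min, double max) {
        double u = 1.0 - (value - min) / (max - min);
        return Math.max(0.0, Math.min(1.0, u));
    }

    public static void main(String[] args) {
        // Normalized metric M4 with the thresholds quoted in the example above.
        double uM4 = decreasingUtility(2.17E-6, 0.0, 8.50E-6);
        System.out.printf("U(M4) = %.2f%n", uM4); // ~0.74

        // Weighted aggregation to the leaf product factor F1.1,
        // using the utility values and weights stated in the example.
        double uF11 = 0.25 * 0.74 + 0.75 * 0.89;
        System.out.printf("U(F1.1) = %.2f%n", uF11); // 0.85
    }
}
```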

For the application of the QuaMoCo approach, a tool chain which supports editing, building, and adapting quality models, assessing software products, as well as visualizing the quality assessment results was developed. The QuaMoCo tool chain is freely available on the internet under the Apache license.

The tool chain consists of two main parts: the QuaMoCo quality editor and the quality assessment engine (Deissenböck et al. 2011). The aim of the quality editor is to provide the possibility of editing quality models by defining metrics, weights, or utility functions. The quality assessment engine automates the quality assessment procedure and is based on the toolkit ConQAT (version 2013.10 is used in this article). ConQAT is a quality assessment toolkit which integrates several state-of-the-art code analysis tools (e.g., FindBugs, Gendarme, PMD, FxCop) and quality metrics.

4.2 Integration approach

This section presents the integration of the open quality model QuaMoCo and risk-based testing. We chose approach 2 for the integration of QuaMoCo and risk-based testing because the metrics in the QuaMoCo quality model were calibrated by benchmarking whole software products (Wagner et al. 2015) and are therefore not appropriate for use on single components (approach 1).

Our integration approach is limited to the programming language Java because we decided to focus the integration approach on a specific programming language. Java was chosen because it provides a huge repository of open source projects on which the integration approach can be applied. This means that the integration approach only uses quality assessments based on the Java quality model of QuaMoCo (Java module). An extension to other modules of QuaMoCo (e.g., C#) is planned as possible future work.

In the next step, the components for which the risk coefficient should be calculated must be specified. We chose classes of software products as components because, on the one hand, Java software products are typically structured hierarchically into packages and classes. On the other hand, QuaMoCo already provides the measured values for each metric at the class level.

As a result, the main principle of the integration approach is to analyze all metrics provided by the Java quality model of QuaMoCo based on their values for each class in order to determine the probability factor of the risk-based testing concept. As already stated, the impact factor is not determined based on the quality assessment results of QuaMoCo and must be determined manually. In the following, the determination of the probability factor is presented (Section 4.2.1). Further, a suggestion for how to determine the impact factor in a manual way is outlined (Section 4.2.2). Lastly, the final integration approach for determining the risk coefficient is presented (Section 4.2.3).

4.2.1 Determination of the probability factor

As a first step, we investigated all 23 metrics provided by QuaMoCo (Java module) and excluded those which were not appropriate for further usage (e.g., metrics which describe the overall software product and do not represent relevant information for a class, such as the number of classes).

The remaining 13 metrics were divided into two groups. The first group (referred to in the following as Complexity Metrics) comprises 10 general metrics which measure properties of the source code directly for each class (e.g., nesting depth, number of methods, lines of code). The second group (referred to in the following as Rule Checking Metrics) consists of metrics which are based on common rule checkers (i.e., FindBugs, Checkstyle, or PMD). These instruments aim to look for defects in the code, to find common programming flaws (e.g., empty catch blocks, unused variables), or to check whether the code adheres to a coding standard (FindBugs 2003; PMD 2015; Checkstyle 2001). These instruments typically use defined rules for analyzing the code and provide their results as a list of findings. Concretely, QuaMoCo contains 361 rules/metrics for FindBugs, 4 rules/metrics for PMD, and 2 rules/metrics for Javadoc.

For further development of the integration approach, it is important to differentiate between these two groups. Table 1 presents the selected metrics for both groups.

Table 1 Quality assessment metrics

The basic idea of determining the probability factor is to calculate one factor for the Complexity Metrics, named the Complexity factor, because all Complexity Metrics have in common that the higher their value, the more complex the associated class.

Furthermore, one factor is calculated each for the Javadoc findings (Javadoc factor), the PMD findings (PMD factor), and the FindBugs findings (FindBugs factor). These four factors (Complexity, Javadoc, PMD, and FindBugs) are finally used to calculate the probability factor. In the following, the calculation of the different factors is outlined. The calculation is based on the suggestion of Felderer et al. (2012) for calculating the risk coefficient. They propose weighting the criteria used for determining each factor.

Further, Felderer et al. (2012) suggest using a range from 0 to 9 as the scale for the probability factor and also for the impact factor. The idea behind this range from 0 to 9 is that the probability can be seen as a percentage, whereas the value 10 is skipped because we assume that no risk item fails for sure. Values between 0 and 1 are suggested for the weights so that the weights can be seen as scaling factors.

Complexity metrics (Complexity factor)

To calculate the Complexity factor for each class, all measured values for each metric and class are analyzed. The lowest measured value of a metric is projected to 0 and the highest to 9. The remaining values are interpolated to values between 0 and 9. As a result, every complexity metric yields a value between 0 and 9 for each class. Afterwards, weights are assigned to each metric in order to represent its importance. These weights can be freely adapted according to the actual needs and the software product under investigation. Later, we provide concrete recommendations for the weights. In the next step, the weighted interpolated values of all metrics for each class are summed up and divided by the sum of the weights in order to obtain the Complexity factor for each class.

Example complexity factor

In the following, an example illustrates this procedure. Table 2 shows the initial measured values for each metric and four classes. For example, class B has two methods and 43 lines of code. Class A has a nesting depth of 5 and 3 field declarations.

Table 2 Measured metric values

As a next step, the measured values for each metric are projected and interpolated onto a scale between 0 and 9. The following formula illustrates this projection for the scale value S(m, c), where m represents the metric and c the class:

$$ \mathrm{S}\left(m,c\right)=\left(\mathrm{measured}\ \mathrm{value}\left(m,c\right)-\mathrm{lowest}\ \mathrm{value}(m)\right)*\frac{9}{\left(\mathrm{highest}\ \mathrm{value}(m)-\mathrm{lowest}\ \mathrm{value}(m)\right)} $$

As an example for metric “LoC” and class C:

$$ S\left(LoC,C\right)=\left(\mathrm{measured}\ \mathrm{value}\left(LoC,C\right)-\mathrm{lowest}\ \mathrm{value}(LoC)\right)\ast \frac{9}{\left(\mathrm{highest}\ \mathrm{value}(LoC)-\mathrm{lowest}\ \mathrm{value}(LoC)\right)}=\left(171-34\right)\ast \frac{9}{\left(190-34\right)}=7.9 $$

In the case that highest value(m) = lowest value(m), the value 4.5 is assigned for S(m, c) in order to avoid a division by zero. Table 3 shows the final projected values for each metric and each class.

Table 3 Interpolated metric values
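A minimal Java sketch of this projection, including the special case for identical lowest and highest values, is shown below; the concrete numbers are taken from the worked example above.

```java
public class MetricProjectionExample {

    // Project a measured metric value onto the 0..9 scale: the lowest measured value
    // maps to 0, the highest to 9, values in between are interpolated linearly.
    // If all classes have the same value, 4.5 is assigned to avoid a division by zero.
    static double project(double value, double lowest, double highest) {
        if (highest == lowest) {
            return 4.5;
        }
        return (value - lowest) * 9.0 / (highest - lowest);
    }

    public static void main(String[] args) {
        // From the worked example: lowest and highest measured LoC values are 34 and 190,
        // class C has 171 LoC and class B has 43 LoC.
        double lowest = 34, highest = 190;
        System.out.printf("S(LoC, C) = %.1f%n", project(171, lowest, highest)); // 7.9
        System.out.printf("S(LoC, B) = %.1f%n", project(43, lowest, highest));  // 0.5
    }
}
```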

The next step consists of adding a weight (w) to each metric (m) and multiplying the projected values (S(m, c)) by the weights. The determination of the weights is described later.

$$ {\mathrm{S}}_{\mathrm{w}}\left(\mathrm{m},\mathrm{c}\right)=\mathrm{w}\left(\mathrm{m}\right)*\mathrm{S}\left(\mathrm{m},\mathrm{c}\right) $$

As an example for metric “LoC” and class C:

$$ {\mathrm{S}}_{\mathrm{w}}\left(LoC,\mathrm{C}\right)=\mathrm{w}(LoC)*\mathrm{S}\left(LoC,\mathrm{C}\right)=1*7.9=7.9 $$

Finally, all weighted values for each class are summed up and divided by the sum of the weights to calculate the Complexity factor for each class C(c).

$$ C(c)=\frac{\sum_{m=\mathrm{first}\ \mathrm{metric}}^{\mathrm{last}\ \mathrm{metric}}{\mathrm{S}}_{\mathrm{w}}\left(m,c\right)}{\sum_{m=\mathrm{first}\ \mathrm{metric}}^{\mathrm{last}\ \mathrm{metric}}\mathrm{w}(m)} $$

As an example for class A:

$$ C(A)=\frac{3+3.6+1.44+7.2+0+2.25+9+0+9}{0.5+0.5+0.8+0.8+0.5+0.5+1+0.8+1}=5.55 $$

Table 4 presents the final calculated complexity factors for each class.

Table 4 Complexity factors

As a result, every class is represented by a calculated complexity factor between 0 and 9.

$$ C(A)=5.55,\quad C(B)=0.59,\quad C(C)=6.23,\quad C(D)=1.13 $$
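The following Java sketch computes the Complexity factor as the weighted average described above. The projected values for class A are recovered from the weighted numerator of the worked example and are used for illustration only.

```java
public class ComplexityFactorExample {

    // C(c) = sum over all metrics of w(m) * S(m, c), divided by the sum of the weights w(m).
    static double complexityFactor(double[] projected, double[] weights) {
        double weightedSum = 0.0, weightSum = 0.0;
        for (int i = 0; i < projected.length; i++) {
            weightedSum += weights[i] * projected[i];
            weightSum += weights[i];
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        // Projected metric values S(m, A) for class A (derived from the weighted values
        // in the worked example) and the weights used there.
        double[] projectedA = {6.0, 7.2, 1.8, 9.0, 0.0, 4.5, 9.0, 0.0, 9.0};
        double[] weights    = {0.5, 0.5, 0.8, 0.8, 0.5, 0.5, 1.0, 0.8, 1.0};
        System.out.printf("C(A) = %.2f%n", complexityFactor(projectedA, weights)); // 5.55
    }
}
```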

Rule checking metrics (FindBugs, PMD, Javadoc factors)

The Rule Checking Metrics consist of three different metrics (FindBugs findings, PMD findings, Javadoc findings). These three metrics are very similar with respect to the results they provide: each of them provides a set of findings for each investigated class. Hence, their usage for calculating the probability factor is similar. The main difference between them is that the Javadoc findings and the PMD findings are limited with respect to the categories or rules used for inspecting the source code. In detail, the PMD findings provided by QuaMoCo use the PMD ruleset “Unused Code Rules,” which consists of four rules. In addition, the Javadoc findings applied by QuaMoCo consist of only two rules. In contrast, the FindBugs findings are based on more than 400 different rules. Although the numbers of rules differ for these three metrics, the calculation of their factors (PMD, Javadoc, and FindBugs factor) is identical. As a first step, the findings for each rule are counted for each class and expressed as a percentage of the total number of findings for that rule. As a next step, the percentages are projected and interpolated onto a scale between 0 and 9. Then, the interpolated findings are multiplied by weights and finally summed up to build the corresponding factor for each class.

Example rule checking metrics

For simplicity, the calculation in the following is shown in a general way for the four PMD rules. Table 5 shows the number of PMD findings for four classes A, B, C, and D.

Table 5 PMD findings

Next, Table 6 presents the findings as percentages for each class. For calculating the findings as a percentage (F(m, c)), the following formula is used, where m stands for the PMD rule used and c for the class.

Table 6 PMD findings in percent
$$ \mathrm{F}\left(m,c\right)=\frac{\mathrm{findings}\left(m,c\right)}{\sum_{c=\mathrm{Class}\ \mathrm{A}}^{\mathrm{Class}\ \mathrm{D}}\mathrm{findings}\left(m,c\right)} $$

As an example for the rule “UnusedLocalVariable” and Class B:

$$ \mathrm{F}\left(\mathrm{UnusedLocalVariable},\mathrm{B}\right)=\frac{\mathrm{findings}\left(\mathrm{UnusedLocalVariable},\mathrm{B}\right)}{\sum_{c=\mathrm{Class}\ \mathrm{A}}^{\mathrm{Class}\ \mathrm{D}}\mathrm{findings}\left(\mathrm{UnusedLocalVariable},c\right)}=\frac{4}{12+4+0+6}=0.18 $$

As a next step, the percentages are projected and interpolated to a scale between 0 and 9. This procedure is based on the same formula which was used for projecting the values for the Complexity factor. Table 7 presents the interpolated values.

Table 7 PMD interpolated values

Now, the interpolated findings are multiplied by weights, as in the calculation of the Complexity factor, and summed up to determine the PMD factor for each class, as shown in Table 8. How the weights are determined is described later in this section.

Table 8 PMD weighted interpolated values

Finally, every class has a calculated PMD factor PMD(Class) between 0 and 9:

$$ PMD\left(\mathrm{A}\right)=7.20 $$
$$ PMD\left(\mathrm{B}\right)=2.80 $$
$$ PMD\left(\mathrm{C}\right)=0.81 $$
$$ PMD\left(\mathrm{D}\right)=5.37 $$

As already mentioned, the same calculations are used for determining the Javadoc factor JD(Class) and the FindBugs factor FB(Class). As a result, every class has three determined factors of type PMD, FindBugs, and Javadoc, each between 0 and 9.
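The following Java sketch illustrates this calculation for one rule: it computes the share of findings per class and projects the shares onto the 0..9 scale; these projected values would then be weighted and summed over all rules of the respective instrument to obtain the PMD factor (and analogously the Javadoc and FindBugs factors). Only the UnusedLocalVariable findings quoted above are used; the remaining values are illustrative.

```java
import java.util.Arrays;

public class RuleCheckingFactorExample {

    // Share of a class's findings for one rule, relative to all classes' findings for that rule.
    static double findingShare(int findingsOfClass, int[] findingsAllClasses) {
        int total = Arrays.stream(findingsAllClasses).sum();
        return total == 0 ? 0.0 : (double) findingsOfClass / total;
    }

    // Linear projection onto the 0..9 scale (same formula as for the complexity metrics).
    static double project(double value, double lowest, double highest) {
        return highest == lowest ? 4.5 : (value - lowest) * 9.0 / (highest - lowest);
    }

    public static void main(String[] args) {
        // UnusedLocalVariable findings for classes A..D as quoted in the example above.
        int[] unusedLocalVariable = {12, 4, 0, 6};
        System.out.printf("F(UnusedLocalVariable, B) = %.2f%n",
                findingShare(unusedLocalVariable[1], unusedLocalVariable)); // 0.18

        // Project the shares of all classes onto 0..9; weighting and summation over all
        // rules of the instrument then yields the PMD factor per class.
        double[] shares = new double[unusedLocalVariable.length];
        for (int i = 0; i < shares.length; i++) {
            shares[i] = findingShare(unusedLocalVariable[i], unusedLocalVariable);
        }
        double lowest = Arrays.stream(shares).min().orElse(0);
        double highest = Arrays.stream(shares).max().orElse(0);
        for (int i = 0; i < shares.length; i++) {
            System.out.printf("projected(class %c) = %.2f%n",
                    (char) ('A' + i), project(shares[i], lowest, highest));
        }
    }
}
```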

Determination of the weights

As a last step, the weights for each metric must be determined. The weights for the metrics of the Complexity factor are determined based on common literature on software complexity (e.g., Singh et al. (2011), Zhang (2009), Zimmermann et al. (2008), Zimmermann et al. (2007), Gyimothy et al. (2005), Jureczko (2011), Huang and Liu (2013), Krusko (2003), Basili et al. (1995), Radjenovic et al. (2013)). Metrics that the literature identifies as highly relevant for predicting defects were weighted higher than the others.

Hence, metrics which are highly relevant for predicting defects were weighted with 1 (only the Lines of Code metric got an assigned weight of 1 because it was mentioned significantly more often in the literature than other metrics). Metrics which were mentioned by at least one author or contribution were weighted with 0.8. Lastly, metrics which were not mentioned were weighted with 0.5.

The idea behind using these values is that metrics which are highly relevant for predicting defects are at least twice as important as metrics which were not mentioned by any author or contribution (1 versus 0.5). The weight 0.8 was chosen because we think that metrics mentioned by at least one author or contribution should get a weight closer to that of the highly relevant metrics (1) than to that of metrics which were not mentioned by any author or contribution (0.5). Table 9 shows the metrics which were mentioned by at least one author or contribution. The assigned weights for each metric of the probability factor are shown in Fig. 8.

Table 9 Metrics relevant for predicting defects
Fig. 8 Integration concept

Due to the fact that the FindBugs instrument uses a huge base of rules, an analysis of ten open source Java projects (Table 10) was conducted in order to determine the most frequently violated FindBugs rules.

Table 10 Software products

Concretely, these FindBugs rules are provided as a standard recommendation for determining the FindBugs factor and therefore have an assigned weight of 1. Other FindBugs rules which are violated can be freely added to the set of FindBugs rules and must be weighted by the user of the implemented tool support (standard weight 0.5).

The 12 most frequently violated FindBugs rules, which got an assigned weight of 1, are shown in Table 11. Further, all ten software products were analyzed according to the occurrence of the four PMD rules. Based on the results, the weights for the four PMD rules were assigned according to their frequency of occurrence. Due to the fact that the Javadoc instrument only uses two rules (missing documentation and insufficient comment), no analysis was conducted. Based on the results of a study conducted by Dixon (2008), who states that both the number of comments and their quality (e.g., a missing @return tag in Javadoc) are effective predictors for defects, both rules are weighted with 1.

Table 11 Most violated FindBugs rules

Final probability factor

The final probability factor is calculated by summing up the four determined factors (Complexity, Javadoc, PMD, and FindBugs factor). Each factor has a weight in order to reflect its importance. For determining the weights, we used the frequency of their usage in the QuaMoCo quality model and recommendations from the literature in the field of software complexity. Due to the fact that the QuaMoCo quality model is mainly based on FindBugs rules, we recommend assigning a weight of 0.5 to the FindBugs factor.

Several contributions in the literature (e.g., Zhang (2009), Singh et al. (2011)) state that complexity and size are important predictors for software defects; thus, we recommend a weight of 0.3 for the Complexity factor. A weight of 0.1 is assigned to both the Javadoc and the PMD factor because they only provide two (Javadoc) and four (PMD) rules for the analysis of the source code and therefore do not have the expressiveness of the other two factors. Of course, all weights are just our recommendations and can be manually adjusted by test managers, engineers, and other users based on their actual needs and the characteristics of the investigated software products. Finally, the probability factor (P) for a class (c) can be computed as follows:

$$ P(c)=0.3*C(c)+0.1*JD(c)+0.1*PMD(c)+0.5*FB(c) $$
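A minimal Java sketch of this combination with the recommended default weights is shown below; the factor values passed in are hypothetical.

```java
public class ProbabilityFactorExample {

    // P(c) = 0.3 * C(c) + 0.1 * JD(c) + 0.1 * PMD(c) + 0.5 * FB(c)
    // The weights are the recommended defaults from the text and can be adjusted.
    static double probability(double complexity, double javadoc, double pmd, double findBugs) {
        return 0.3 * complexity + 0.1 * javadoc + 0.1 * pmd + 0.5 * findBugs;
    }

    public static void main(String[] args) {
        // Hypothetical factor values for one class (all factors lie between 0 and 9).
        double p = probability(5.55, 2.0, 7.20, 4.0);
        System.out.printf("P(c) = %.2f%n", p);
    }
}
```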

4.2.2 Determination of the impact factor

As stated earlier, the impact factor is not appropriate for being determined by the quality assessment results provided by QuaMoCo. According to the literature (Felderer et al. 2012, p. 166), the impact factor must be determined manually by product managers or the customers themselves based on the requirements documents (Felderer et al. 2012, p. 173).

For the sake of comprehensibility, the impact factor in our integration approach is described by two criteria suggested by Felderer et al. (2012), namely Importance and Usage. Usage describes the frequency of use by users (Felderer et al. 2012), whereas the Importance criterion is defined according to Van Veenendaal (2009) as a criterion which describes the most critical areas with respect to the cost and consequences of defects. As suggested by Felderer et al. (2012), the impact factor is also determined using a scale from 0 to 9. In addition, the usage frequency (Usage criterion) and the defect consequence (Importance criterion) are also rated on a scale from 0 to 9. The scale used for determining the Usage criterion is based on Felderer et al. (2012, p. 178). They provide a scale with textual values (used seldom, sometimes, average, often, highest) which are projected to numeric values. We added the textual value “never” in order to utilize the whole range of the scale. For determining the Importance criterion, a scale provided by Van Veenendaal (2009, p. 8) is used as a basis. This scale assigns values to possible consequences of defects (defect is irrelevant, annoying, hindering, damaging, or even catastrophic). Compared to the scale provided by Van Veenendaal, we added the consequence “irrelevant” to utilize the whole range of the scale. The scales (textual values as well as their corresponding numeric values) used to rate the Usage and Importance criteria are illustrated in Fig. 8.

In the case of huge software products, the rating should be done on the package or component level. The subordinate classes are then rated with the same value as their superordinate package or component.

These two criteria are used as factors to determine the impact factor. The weight, representing the importance of each factor, must be determined manually by product managers or the customers themselves for each individual software product (Felderer et al. 2012). We suggest a weight of 0.5 for both factors as a standard value. Considering the two factors Importance (Impo) and Usage (U), the impact factor (I) can be calculated as follows, where c stands for the corresponding class:

$$ I(c)=0.5*\mathrm{Impo}(c)+0.5*U(c) $$

4.2.3 Determination of the risk coefficient

After the probability and impact factor are determined, the final risk coefficient can be calculated by multiplying the probability and impact factor (R(c) = P(c) * I(c)).

As a result, every class has an assigned risk coefficient which indicates its risk. Now, the classes can be sorted according to the risk coefficient, as suggested by Felderer et al. (2014a). This makes it possible to start the software testing procedure with those classes which correspond to the highest risk coefficients. Figure 8 illustrates the final integration approach, including all used metrics and factors as well as the weights for determining the probability and impact factors in order to compute the risk coefficient.
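The following Java sketch combines the manually rated impact criteria with the probability factor, computes the risk coefficient per class, and ranks the classes for testing. Class names and values are hypothetical, and the 0.5/0.5 impact weights are the suggested defaults.

```java
import java.util.Comparator;
import java.util.List;

public class RiskRankingExample {

    // Per-class values: probability P(c) derived from the quality assessment, and the
    // manually rated usage and importance criteria (each on the 0..9 scale).
    record ClassRisk(String className, double probability, double usage, double importance) {
        // I(c) = 0.5 * Impo(c) + 0.5 * U(c), using the suggested default weights.
        double impact() {
            return 0.5 * importance + 0.5 * usage;
        }
        // R(c) = P(c) * I(c)
        double riskCoefficient() {
            return probability * impact();
        }
    }

    public static void main(String[] args) {
        // Hypothetical classes; names and values are illustrative only.
        List<ClassRisk> classes = List.of(
                new ClassRisk("PaymentService", 6.1, 8, 9),
                new ClassRisk("ReportExporter", 3.4, 4, 5),
                new ClassRisk("AboutDialog", 1.2, 2, 1));

        // Rank the classes by descending risk coefficient and test the riskiest classes first.
        classes.stream()
                .sorted(Comparator.comparingDouble(ClassRisk::riskCoefficient).reversed())
                .forEach(c -> System.out.printf("%s: R = %.2f%n", c.className(), c.riskCoefficient()));
    }
}
```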

4.3 Tool implementation

This section presents the implementation of tool support for the integration approach described in the previous section. The tool was developed with Eclipse Luna and the programming language Java. The aim was to develop platform-independent, extensible, and customizable tool support which can be used for computing the risk coefficient based on any quality assessment conducted with QuaMoCo (Java module).

Figure 9 shows the QuaMoCo tool chain with its main components. On the left side, the QuaMoCo quality editor is shown, which generates the ConQAT analysis configuration. The toolkit ConQAT then uses this configuration, the source code, as well as optional findings of manual assessments to generate the final quality assessment report. QuaMoCo provides the possibility to export the quality assessment report as an XML file. The tool support uses this XML file to calculate the different factors. All weights and factors can be changed and customized in the tool. After the computation is completed, the tool shows all classes of the investigated software product ranked by the final risk coefficient. Furthermore, the probability and impact values as well as the values of each calculated factor are shown in separate columns. Note that the final risk coefficient is normally only relevant for ranking; its difference to other risk coefficients cannot be interpreted directly. For this purpose, the decomposition into factors is provided. Moreover, the results can be exported as an Excel file for possible further usage. One can now start testing with those classes with the highest risk coefficient. Figure 9 illustrates this procedure and shows a screenshot of the results based on a quality assessment of the testing framework JUnit. The implemented tool is available online at http://bit.ly/1QGhGwJ.

Fig. 9 Tool support

5 Study design

To investigate the effectiveness of the developed integration approach, we conducted a case study on five different open source software products. This section presents the applied research design which follows the guidelines for conducting and reporting case study research proposed by Runeson and Höst (2009). We first present the research questions (Section 5.1) addressed in this article. Afterwards, we illustrate the case selection (Section 5.2), data collection, analysis, and validity procedures to answer the research questions (Sections 5.3 to 5.5).

5.1 Research questions

Section 4 presented an integration approach of quality assessments and risk-based testing on the basis of the quality model QuaMoCo. The integration approach is limited to the programming language Java and focuses on the risk assessment of classes. As a result, all classes of the software product under test have an assigned risk coefficient which can be used for further test prioritization. In addition, the integration approach was implemented in a tool (see Section 4.3). Typically, a user of the tool support starts testing with those classes with the highest risk coefficients (risk-based testing strategy). Therefore, the developed integration approach works effectively if the classes with the highest risk coefficients of a software product are those classes where the most defects are located and the consequence of a defect in these classes is higher than in classes with lower risk coefficients.

In order to provide further empirical evidence for researchers and get relevant insights for practitioners, the developed integration approach is investigated on the basis of the following two research questions.

  1. (RQ1)

    Is there a relationship between the risk coefficient and the number of defects of a class? This research question investigates the relationship between the ranking of all analyzed classes based on the risk coefficient and the ranking of all analyzed classes based on the number of defects.

  2. (RQ2)

    How does the performance of a risk-based testing strategy compare with that of a line of code-based testing strategy? As research has shown (e.g., Zimmermann et al. (2007, 2008), Gyimothy et al. (2005), Jureczko (2011)), lines of code are a good predictor for defects in classes. Therefore, this research question compares a test strategy based on the risk coefficient (starting testing with the classes with the highest risk coefficients) with a test strategy based on the lines of code (starting testing with the classes with the most lines of code) with respect to their performance, i.e., testing the “right” classes (those with defects) first.

To answer these two research questions, the developed integration approach is applied to selected releases of five software products. First, a quality assessment based on QuaMoCo is conducted for these selected releases. Based on these quality assessments, the tool support is used to apply the integration approach and calculate risk coefficients for each class of the selected releases. The calculated risk coefficients of each class are then analyzed together with the number of corresponding defects reported up to the latest stable release in order to answer the research questions.

As we did not have sufficient knowledge to determine the impact factor for the different classes and components of each software product, we set the impact factor to a constant value (5) for all classes in this study. Hence, the risk coefficient in this study is the result of the probability factor multiplied by a constant value. This means the risk coefficient in our study is just a multiple of the probability factor and therefore describes the probability that a class is defect-prone.
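As a minimal worked instance of this simplification (the probability values are invented purely for illustration): a class with probability factor 0.62 receives the risk coefficient 0.62 × 5 = 3.1, while a class with probability factor 0.18 receives 0.18 × 5 = 0.9. Because the factor 5 is identical for all classes, ranking the classes by the risk coefficient yields exactly the same order as ranking them by the probability factor.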

5.2 Case selection

The following five open source software products were selected as units of analysis in our case study: JUnit, Mockito, Apache Commons IO, Apache PDFBox, and Google Guava. The five software products were selected because they fulfill the requirements of providing source as well as binary files and all necessary libraries for compilation. Table 12 shows the software products with their latest stable release, their lines of code, and a short description. All five software products and their releases are hosted on GitHub. GitHub currently describes itself as the “world’s largest code host” (GitHub Inc. 2008b) and is the single largest host for Git repositories (Chacon and Straub 2014, p. 195). Git is a free and open source distributed version control system (Git 2005) which provides several features for maintaining and sharing software code (Chacon and Straub 2014). A repository is a directory where Git stores code, text, or image files of projects (Orsini 2013). The words “version” and “release” are used as synonyms in the further course of this article.

Table 12 Selected software products

5.3 Data collection procedure

The case study is based on the source code and the information about defects of the selected software products. To determine the number of defects for each software product’s classes, features provided by GitHub were used. One feature is that GitHub uses a bug tracker called “Issues” where tasks and defects are tracked. A further feature, provided by Git, is the ability to “tag” specific points in the history of the software code, so the code of different releases and versions can be tagged. This makes it convenient to gather the code of different software versions and releases. In addition, every change in the software code is represented by a “commit.” Besides the change itself, a commit provides additional information such as the commit’s author, the date of the commit, the commit message, and the changed files (classes). Therefore, we decided to use the commits provided by GitHub to determine the defects of each software product’s classes.

As a first step, we had to select a past release of each software product to which the developed integration approach could be applied and for which the risk coefficients could be calculated by the tool support. For determining this past release, we tried to ensure that the same number of major releases lay between the selected past release and the latest stable release for all five software products. Because each software product uses a different procedure for defining releases (e.g., 1.0, 1.1 or 1.1.1, 1.1.2), the past release could not be selected in the same way for every software product. To solve this problem, we chose the past release based on two prerequisites. First, the chosen past release had to provide source as well as binary files or at least had to be compilable to obtain the binaries, because source and binary files of the software product are needed for conducting the quality assessment with QuaMoCo. Second, there had to be a minimum of ten commits between the chosen past release and the latest stable release. Based on these prerequisites, the following past releases (column “Selected Release” in Table 13) and the following ranges of releases to determine the defects (column “Releases Defects” in Table 13) were chosen for the case study.

Table 13 Software versions

We then determined the defects for each class by mining each commit in the defined range of releases, as Fig. 10 illustrates. Every commit was automatically checked for whether the word “bug” occurred in the commit header or the commit message. The term “bug” was used because a large-scale study on tens of thousands of GitHub software projects (Bissyande et al. 2013) found that the most popular tag used in GitHub issue reports is bug. Hence, we assume that most programmers also use the term bug when committing defect fixes. In the next step, the commits were manually analyzed, and commits which did not indicate a defect (e.g., debugging-related commits or commits which address an issue that is not related to a defect) were removed. From the remaining commits, which all indicate a defect, the number of defects for each class was determined by counting how often each class was listed in these commits. Finally, a list of all classes of the selected releases (for each software product) with their associated number of defects was available.
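A minimal sketch of the automated part of this mining step is shown below, assuming a local Git clone and using the Google Guava tag names of the later example. The repository path, tag names, and the derivation of class names from file names are assumptions, and the subsequent manual review of each candidate commit is not shown.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class BugCommitMiner {

    public static void main(String[] args) throws Exception {
        // Illustrative repository path and tag range (Google Guava example).
        Map<String, Integer> defectsPerClass = mine("/path/to/guava", "v10.0", "v18.0");
        defectsPerClass.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }

    // Counts, for every commit between two tags whose message contains "bug"
    // (case-insensitive), how often each changed .java file (class) appears.
    static Map<String, Integer> mine(String repoDir, String fromTag, String toTag) throws Exception {
        Process git = new ProcessBuilder(
                "git", "log", fromTag + ".." + toTag,
                "-i", "--grep=bug",              // "bug" in commit subject or body
                "--name-only",                   // list the files changed by each commit
                "--pretty=format:COMMIT %h %s")
            .directory(new java.io.File(repoDir))
            .redirectErrorStream(true)
            .start();

        Map<String, Integer> defectsPerClass = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(git.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank() || line.startsWith("COMMIT ")) continue;
                if (line.endsWith(".java")) {
                    // Use the file name without path and extension as the class name.
                    String file = line.substring(line.lastIndexOf('/') + 1);
                    String className = file.substring(0, file.length() - ".java".length());
                    defectsPerClass.merge(className, 1, Integer::sum);
                }
            }
        }
        git.waitFor();
        return defectsPerClass;
    }
}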

Fig. 10 Procedure

Classes which were added to the software product after the selected release were excluded from the analysis. For example, assume release 10.0, containing the classes A, B, C, and D, is selected for conducting a quality assessment and for further analysis by the tool. Class E is added in release 11.0 and is associated with a defect. If the latest stable release is 18.0, class E is listed as a defect-prone class. Since class E did not exist in the selected and analyzed release 10.0, no risk coefficient is computed for this class. Therefore, classes which did not exist in the selected release were excluded from the analysis.
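This exclusion step can be sketched as a simple filter, assuming a map of defect counts and the set of classes contained in the selected release; the class names mirror the example above and are purely illustrative.

import java.util.*;

public class ReleaseFilter {

    // Keep only defect counts of classes that already exist in the selected release.
    static Map<String, Integer> restrictToSelectedRelease(
            Map<String, Integer> defectsPerClass, Set<String> classesInSelectedRelease) {
        Map<String, Integer> filtered = new HashMap<>(defectsPerClass);
        filtered.keySet().retainAll(classesInSelectedRelease);
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Integer> defects = new HashMap<>(Map.of("A", 2, "B", 1, "E", 1));
        Set<String> release10 = Set.of("A", "B", "C", "D"); // class E was added later
        // Only classes A and B remain; class E is excluded from the analysis.
        System.out.println(restrictToSelectedRelease(defects, release10));
    }
}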

As a next step, a quality assessment of the selected past release of each software product had to be conducted with QuaMoCo to provide the XML file needed by the tool support to calculate the corresponding risk coefficients of each class. The XML file provided by the quality assessment was then imported into the tool support in order to calculate the risk coefficients for each class of the selected releases. Finally, all classes of the selected releases had a calculated risk coefficient which could be analyzed further together with the corresponding number of defects of each class.

Figure 10 graphically sketches this procedure, exemplified by the software product Google Guava. Here, the range of selected releases starts with release 10.0 (selected release) and ends with release 18.0. All commits committed in this range of releases are checked for whether the word “bug” occurs in the commit header or message. The resulting commits are then manually analyzed to determine whether they are associated with real defects. Commits which do not indicate a defect are excluded from further analysis. In the next step, the number of defects for a class is determined by counting how often the class is listed in the remaining commits. For example, if class A is mentioned in five commits, five defects are assigned to class A. Figure 10 shows two example commits which contain the word “bug”. For the left commit in Fig. 10, the corresponding classes are “CharStreamsTest” and “CharStreams.” Hence, both classes are associated with one defect (provided these classes are not mentioned in any other commit in the defined range of releases). A quality assessment is then conducted for release 10.0, and the resulting XML file is used by the tool support to calculate the risk coefficient for each class. Finally, each class of the software product Google Guava has a computed risk coefficient.

The tool support was configured as follows. The values of the weights, which were presented in Section 4, were used. All findings provided by the PMD and Javadoc analyses were used. Further, all metrics from the complexity factor were included in the analyses.

The FindBugs factor was determined by using the five findings with the highest number of occurrences plus the findings suggested by the reference set, independently of how many occurrences they had. As mentioned before, the weights for the FindBugs factor were also left unchanged: all violated rules that are contained in the reference set received the weight 1, and all others received the weight 0.5.
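The rule weighting just described can be sketched as follows; the rule names are illustrative placeholders and not the actual FindBugs rules or reference set.

import java.util.*;

public class FindBugsWeights {

    // Violated rules contained in the reference set get weight 1.0, all others 0.5.
    static Map<String, Double> assignWeights(Set<String> violatedRules, Set<String> referenceSet) {
        Map<String, Double> weights = new HashMap<>();
        for (String rule : violatedRules) {
            weights.put(rule, referenceSet.contains(rule) ? 1.0 : 0.5);
        }
        return weights;
    }

    public static void main(String[] args) {
        Set<String> violated = Set.of("RULE_FROM_REFERENCE_SET", "OTHER_RULE");
        Set<String> reference = Set.of("RULE_FROM_REFERENCE_SET");
        // Prints the reference-set rule with weight 1.0 and the other rule with weight 0.5.
        System.out.println(assignWeights(violated, reference));
    }
}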

As stated, the impact factor of the risk coefficient typically has to be determined manually (Felderer et al. 2012) and depends strongly on the perspective from which the consequences are determined (Redmill 2004). Thus, the determination of the impact factor is usually done by managers, strategists, customers, or users (Redmill 2004). Because we did not have sufficient knowledge to determine the impact factor for the different classes and components of each software product, the impact factor was set to a constant value (5) for all classes in this study. Hence, the risk coefficient in this study is the result of the probability factor multiplied by a constant value. This means the risk coefficient is just a multiple of the probability factor and, therefore, describes the probability that a class is defect-prone. As a consequence, classes with high risk coefficients should contain the most defects.

5.4 Analysis procedure

The analysis was mainly conducted with quantitative methods. As stated in the previous section, classes with a high risk coefficient should contain the most defects because the risk coefficient in this study describes only the probability factor, as the impact factor is set to a constant value. To answer research question RQ1, the relationship between the risk coefficient and the associated number of defects of each class is analyzed.

In the literature, several methods for analyzing relationships (e.g., logistic regression (Basili et al. 1995) or Spearman rank correlation (Singh et al. 2011)) have been proposed. Because the tool ranks the classes according to their risk coefficient, an appropriate way of answering research question RQ1 is Spearman’s rank correlation coefficient (Spearman 1904). The range of Spearman’s correlation coefficient (ρ) is between −1 and 1, where 1 means the two variables under study are perfectly concordant (positively correlated), −1 means they are perfectly discordant (negatively correlated), and 0 means that there is no relation between the two variables under study (Grzegorzewski and Ziembinska 2011). The values between −1 and 0 as well as between 0 and 1 provide a relative indication of the degree of correlation between the two variables under study (Grzegorzewski and Ziembinska 2011). In order to determine the significance of the correlation coefficient, the p value must be calculated (McDonald 2014). The p value represents the probability that the results occurred by chance (Spearman 1904; McDonald 2014). We assumed a significance level of 0.05, meaning that all results with a p value smaller than 0.05 were considered significant in the case study.
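For illustration, a minimal sketch of Spearman’s rank correlation, computed as the Pearson correlation of average ranks (which handles tied values by assigning average ranks, cf. Section 6.1), is given below. The input values are invented, and the p value computation, which in practice is obtained from standard statistics tooling, is omitted.

import java.util.*;

public class SpearmanRho {

    // Assigns ranks 1..n by ascending value; tied values share their average rank.
    static double[] averageRanks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> values[i]));
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) j++;
            double avgRank = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) ranks[order[k]] = avgRank;
            i = j + 1;
        }
        return ranks;
    }

    // Spearman's rho as the Pearson correlation of the two rank vectors.
    static double spearman(double[] x, double[] y) {
        double[] rx = averageRanks(x), ry = averageRanks(y);
        double mx = Arrays.stream(rx).average().orElse(0), my = Arrays.stream(ry).average().orElse(0);
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < rx.length; i++) {
            cov += (rx[i] - mx) * (ry[i] - my);
            vx += (rx[i] - mx) * (rx[i] - mx);
            vy += (ry[i] - my) * (ry[i] - my);
        }
        return cov / Math.sqrt(vx * vy);
    }

    public static void main(String[] args) {
        // Illustrative values only: risk coefficients and defect counts per class.
        double[] risk = {3.1, 0.9, 1.8, 2.4, 0.5};
        double[] defects = {5, 0, 1, 2, 1};
        System.out.printf("rho = %.2f%n", spearman(risk, defects));
    }
}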

To address research question RQ2 which deals with comparing a test strategy based on the risk coefficient (risk-based testing strategy) with a test strategy based on the lines of code (lines of code-based testing strategy), the same range of software releases and the same set of determined defects as for research question RQ1 were used.

5.5 Validity procedure

As suggested by Runeson and Höst (2009), threats to validity according to construct validity, reliability as well as conclusion and external validity were analyzed. In order to address common threats to validity, countermeasures were taken. The threats to validity are discussed in Section 6.3.

6 Results and discussion

In this section, we answer the two research questions based on the studied cases. For each research question, we present the results as well as discuss the findings. Finally, threats to validity are discussed. The files which include all data and computations are available at https://git.uibk.ac.at/c703409/sqm-rbt.

6.1 Is there a relationship between the risk coefficient and the number of defects of a class? (RQ1)

To answer the first research question, we analyzed the risk coefficients (with the impact factor assumed to be constant) and the number of defects (#defects) of each class of the analyzed software releases. For calculating the Spearman correlation coefficient, the risk coefficients and the associated numbers of defects of all classes were first ranked independently using the same ranking scheme (either from smallest to largest or from largest to smallest) (Grzegorzewski and Ziembinska 2011). In the case that some classes were ranked equally, each of these classes was assigned the average rank (Sharma 2005).

All five correlation coefficients indicate a low (Taylor 1990) positive correlation and are significant (p value below 0.05). In conclusion, the results show that a positive relationship exists between the ranking of the classes based on the risk coefficient and the ranking of the classes based on the number of defects. That is, the higher the rank of a class (based on the risk coefficient), the higher the associated number of defects of that class.

In addition, we calculated the correlation between the ranking of the classes based on the lines of code (LoC) and the ranking of the classes based on the number of defects (#defects). As expected (according to the literature on software complexity (Section 4.2.1)), the results indicate a low positive and significant correlation between the ranking of the classes based on the lines of code and the ranking of the classes based on the number of defects (correlation coefficients between 0.11 and 0.25).

Table 14 shows the calculated Spearman correlation coefficients (ρ), the associated p values, and the number of observations (n) for both testing strategies (risk-based and line of code-based).

Table 14 Spearman’s correlation coefficient

According to these results, both testing strategies seem to be consistent, showing no notable differences with respect to the correlation with the number of defects. Therefore, the next subsection compares the performance of both testing strategies in more detail.

6.2 How does the performance of a risk-based testing strategy compare with that of a line of code-based testing strategy? (RQ2)

To answer the second research question, we used the selected software releases which were analyzed by the tool and applied both the risk-based testing strategy and the line of code-based testing strategy. With a risk-based testing strategy, testing is started with the classes with the highest risk coefficients (the impact factor is assumed to be constant); hence, all classes are tested in descending order of their calculated risk coefficients. In contrast, a line of code-based testing strategy starts testing with the classes with the most lines of code. For both testing strategies, we analyzed how many defects each tested class contained. The tested classes as well as the associated number of defects of each class were then cumulated. To illustrate the results, we use diagrams which show the cumulative number of tested classes on the x-axis and the cumulative associated number of defects on the y-axis. As a result, the diagrams show the distribution of the defects over the cumulative share of tested classes. In the following, we present the results for each of the five analyzed software products. The dashed black line in each diagram represents a testing strategy based on the lines of code, and the solid blue line a testing strategy based on the risk coefficient.
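The curves underlying Figs. 11 to 15 can be derived as sketched below; the record fields and sample data are illustrative assumptions. The classes are ordered by the chosen criterion, and the defects are accumulated class by class.

import java.util.*;
import java.util.function.ToDoubleFunction;

public class StrategyComparison {

    // Illustrative per-class data: risk coefficient, size, and mined defect count.
    record ClassInfo(String name, double riskCoefficient, int linesOfCode, int defects) {}

    // Cumulative number of defects found after testing the first 1..n classes in the given order.
    static int[] cumulativeDefects(List<ClassInfo> classes, ToDoubleFunction<ClassInfo> criterion) {
        List<ClassInfo> ordered = new ArrayList<>(classes);
        ordered.sort(Comparator.comparingDouble(criterion).reversed()); // highest criterion first
        int[] cumulative = new int[ordered.size()];
        int sum = 0;
        for (int i = 0; i < ordered.size(); i++) {
            sum += ordered.get(i).defects();
            cumulative[i] = sum;
        }
        return cumulative;
    }

    public static void main(String[] args) {
        List<ClassInfo> classes = List.of(
            new ClassInfo("A", 3.1, 120, 2),
            new ClassInfo("B", 0.9, 400, 0),
            new ClassInfo("C", 1.8, 250, 1),
            new ClassInfo("D", 2.4, 80, 3));
        System.out.println("risk-based: " + Arrays.toString(cumulativeDefects(classes, ClassInfo::riskCoefficient)));
        System.out.println("LoC-based:  " + Arrays.toString(cumulativeDefects(classes, c -> c.linesOfCode())));
    }
}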

Figure 11 illustrates the two testing strategies for the software product JUnit (release 4.6 with 247 classes). Although a testing strategy based on the lines of code (starting with the classes with the most lines of code) outperforms a testing strategy based on the risk coefficient (starting with the classes with the highest risk coefficients) over nearly the entire testing process, applying a risk-based testing strategy results in finding all defects (6) earlier. Eighty-three percent of all defects are found by testing 13 % of the classes when a risk-based testing strategy is applied, whereas the same share of defects is found by testing only 6 % of the classes when using a test strategy based on the lines of code.

Fig. 11 Risk coefficient vs. LoC – JUnit (r4.6-r4.12)

Also in the case of the software product Mockito (version 1.0 with 95 classes), a test strategy based on the lines of code outperforms a risk-based testing strategy at the beginning. Testing 17 % of all classes results in finding 63 % of the defects when applying a risk-based test strategy and 75 % of the defects when applying a lines of code-based testing strategy. For finding all defects (8), a risk-based testing strategy requires testing 44 % of all classes, whereas 50 % of all classes need to be tested when applying a line of code-based testing strategy. The diagram with the two testing strategies is shown in Fig. 12.

Fig. 12 Risk coefficient vs. LoC – Mockito (v1.0-v1.10.19)

A comparison of the two testing strategies for the software product Google Guava (version 10.0 with 317 classes) is shown in Fig. 13. The diagram shows a trend similar to the previous two diagrams for the two testing strategies. At the beginning, a line of code-based testing strategy outperforms a risk-based testing strategy. For finding 88 % of all defects, 66 % of all classes must be tested when applying a risk-based testing strategy, whereas only 52 % must be tested when applying a line of code-based testing strategy. However, for finding all defects (48), the risk-based testing strategy is more efficient than the strategy based on the lines of code.

Fig. 13 Risk coefficient vs. LoC – Google Guava (v10.0-v18.0)

The analysis of the last two software products, Apache Commons IO (version 1.5 with 72 classes) and Apache PDFBox (version 1.0.0 with 433 classes), revealed a different picture than the previous three diagrams. For both software products, the comparison of the two testing strategies shows that a testing strategy based on the risk coefficient outperforms a testing strategy based on the lines of code over the entire testing process. Finding all defects (6) requires testing 13 % of the Apache Commons IO classes with a risk-based testing strategy, whereas 47 % of the classes need to be tested when applying a lines of code-based testing strategy. Testing 66 % of Apache PDFBox’s classes results in finding all defects (18), whereas 73 % are needed when starting testing with the classes which have the most lines of code. Figures 14 and 15 show the diagrams for the two software products Apache Commons IO and Apache PDFBox.

Fig. 14 Risk coefficient vs. LoC – Apache Commons IO (1.4–2.1)

Fig. 15 Risk coefficient vs. LoC – Apache PDFBox (v1.0.0-v1.8.9)

The results show that, although a testing strategy based on the lines of code outperformed a risk-based testing strategy in three out of five software products over nearly the entire testing process, applying a risk-based testing strategy resulted in finding all defects of the five software products earlier than applying a lines of code-based testing strategy. Table 15 shows, for both testing strategies, the cumulative percentage of classes that need to be tested to find all defects. For finding all defects, on average 51.6 % of the classes must be tested when applying a risk-based testing strategy. In contrast, on average 61.8 % of the classes must be tested when applying a line of code-based testing strategy. For finding on average 80 % of all defects, a risk-based testing strategy requires an average test coverage of 30 % of the classes.

Table 15 Classes needed to test for finding all defects

To summarize, a risk-based testing strategy clearly outperforms a testing strategy based on the lines of code with respect to finding all defects earlier.

6.3 Threats to validity

In this section, we discuss threats to validity of our results and the applied countermeasures. Referring to Runeson and Höst (2009), we discuss threats to the construct validity, reliability, conclusion validity, and external validity of our case study along with countermeasures taken to reduce the threats.

6.4 Construct validity

Construct validity reflects to what extent the phenomenon under study really represents what the researchers have in mind and what is investigated according to the research questions. In order to avoid threats to construct validity, we first defined and discussed all relevant terms and concepts in Section 2. We explicitly defined risk in the context of risk-based testing in terms of software defects to ensure a common understanding of this term in the case and research context. The developed integration approach is grounded on these defined terms and concepts, and the research questions were formulated based on them as well. Moreover, the development of the integration approach was conducted in a methodologically systematic way: the proposed factors, metrics, and values of the weights were determined based on the existing body of literature and on open source analysis. Nevertheless, one threat to construct validity is the chosen weight values of the metrics for the complexity factor (1, 0.8, and 0.5). Construct validity should therefore be improved by conducting a sensitivity analysis of the chosen weight values. This also applies to the weight values of the metrics for the PMD, FindBugs, and Javadoc factors as well as to their initial weighting for determining the probability factor.

In addition, GitHub provides a strict guideline on how issues should be fixed (GitHub, Inc. 2008c). However, there is no guarantee that every user follows this procedure. Further, not every issue indicates a defect; other tasks are also addressed by opening and closing issues.

Therefore, all classes which were assumed to be defect-prone were analyzed manually to reduce the number of false positives. Furthermore, there is no guarantee that all defects in the software products were closed with a corresponding commit.

Moreover, we only used the word bug to determine the commits which were assumed to deal with defects. Therefore, commits which deal with defects but do not include the word bug in the commit header or message (e.g., commits which use the words “defect,” “fault,” “failure,” or “issue” but not the word bug) were not considered in our analysis. If the commit header or message had also been checked for other words (e.g., “defect,” “fault,” “failure,” or “issue”), the resulting number of commits to be analyzed would have been too large for a manual analysis. Thus, a manual investigation of whether a commit is associated with a real defect (as performed in our analysis) is not feasible when more words are used. As mentioned above, not every commit indicates a defect (other tasks are also addressed by commits); therefore, the number of false positives would be too large when more words are considered. As a result, we decided to only use the word bug and to check each commit manually in order to focus on real defects. Hence, this is an essential threat to construct validity which must be considered when interpreting the results and should be addressed in future work.

In addition, GitHub does not provide any information about the criticality or severity of the committed defects. Hence, an important threat to construct validity is that the defects considered in the study might not have the same level of criticality or severity. These facts must be kept in mind when interpreting the results. Moreover, it is possible that defects in the software products had not been fixed before the analysis was conducted.

Lastly, a major threat to construct validity is the fact that we did not have suitable knowledge to determine the impact factor. Hence, we set the impact factor to a constant value. As a result, our case study actually evaluates the probability factor, since the risk coefficient is just a multiple of it when the impact factor is constant.

6.5 Reliability

Reliability focuses on whether the data are collected and the analysis is conducted in a way that it can be repeated by other researchers with the same results. This is a threat in any study using qualitative and quantitative data. In order to ensure reliability, the data collection, data processing, and data analyses procedures were well documented. We explicitly documented which software products and releases were used for the case study and exactly described the procedure for mining the commits (i.e., used words and manual investigation).

6.6 Conclusion validity

Conclusion validity focuses on whether one can be sure that the used treatment of an experiment really is related to the outcome observed. Hence, conclusion validity is of concern when there is a statistically significant effect on the outcome. To address threats according to conclusion validity, we calculated the p value and set the significance level to 5 % in order to minimize the probability that the results occurred by chance.

6.7 External validity

External validity is concerned with to what extent it is possible to generalize the findings and to what extent the findings are of interest to other people outside the investigated cases. Due to the usage of a quality model based on the programming language Java and the analysis of only Java open source software products, one should be careful when generalizing the results to other programming languages like C# or to commercial software products. Further, the quality model provided by the QuaMoCo tool chain is a hierarchical quality model; therefore, the results cannot be generalized without reservation to all groups of quality models. However, using quality assessments based on a hierarchical quality model in risk-based testing is particularly beneficial because hierarchical quality models are the predominant group of quality models (Wagner et al. 2015). Moreover, the integration approach is limited to the class testing level. Based on the results of this case study, a generalization to other testing levels (e.g., the system testing level) seems promising but must be investigated further.

7 Conclusion and future work

In this article, we explored the integration of quality models and risk-based testing. Therefore, we first presented two generic approaches showing how quality assessments based on quality models can be integrated into risk-based testing. We further illustrated a concrete integration of quality assessments and risk-based testing on the basis of the open quality model QuaMoCo. In addition, we implemented the integration approach as a tool which can be used by practitioners in the risk assessment phase of a risk-based testing process.

A case study of the developed integration approach based on five open source products showed that a risk-based testing strategy outperforms a line of code-based testing strategy with respect to the number of classes which must be tested in order to find all defects. On average, all defects of the five analyzed software products were found by testing 51.6 % of all classes when a risk-based testing strategy was applied. In contrast, 63.8 % of the classes had to be tested, on average, when a testing strategy based on the lines of code was applied. In addition, a significant positive relationship between the risk coefficient (with the impact factor assumed to be constant) and the associated number of defects of a class was found. Hence, the case study presented in this article showed promising results, which motivate the following future work.

First, we intend to perform additional case studies as well as comprehensive field studies, where the developed approach and its tool implementation are applied in an industrial context by testers. Second, we plan to extend the approach to other programming languages and other types of components besides classes. Third, we want to compare our risk-based testing strategy with other testing strategies (e.g., complexity-based testing strategy). Finally, we plan to improve tool support, for instance with functionality to automatically generate stubs for the tested components.