Overhead for MCC compared to MC/DC
We analyzed the number of test cases needed for MCC and MC/DC, respectively, for all possible Boolean expressions with up to 5 conditions: For a Boolean expression with 2 conditions, 2 expressions are possible: A && B and A || B. For a Boolean expression with 3 conditions, 4 expressions are possible: A && B && C, A && B || C, A || B && C, and A || B || C. For a Boolean expression with 4 conditions, 8 different expressions are possible, and for a Boolean expression with 5 conditions, 16 different expressions are possible.
In addition, parentheses can be applied, which further increases the number of possible Boolean expressions (OP ... operator): For a Boolean expression with 4 conditions, 6 different ways to apply parentheses are possible (for each of the 8 expressions): (A OP B) OP C OP D; A OP (B OP C) OP D; A OP B OP (C OP D); (A OP B) OP (C OP D); (A OP B OP C) OP D; A OP (B OP C OP D). For a Boolean expression with 5 conditions, 14 different ways to apply parentheses are possible (for each of the 16 expressions).
For the evaluation we only considered variants of the Boolean expressions for which the parentheses have an impact on the result of the expression: A && B && C is the same as (A && B) && C, whereas A || B && C differs from (A || B) && C. The idea of these permutations is to cover as many different Boolean expressions as possible.
We then determined the number of required test cases for each Boolean expression by enumeration.
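This counting can be sketched as a small recursion over the expression tree: under short-circuit evaluation the number of MCC test cases equals the number of distinct evaluation paths, while a minimal MC/DC test set for an expression with N conditions (each occurring once) commonly contains N + 1 test cases. The following C program is our own minimal sketch of this idea, not the tooling used for the analysis; for A && B || C it yields the 5 vs. 4 test cases discussed later.

```c
/* Minimal sketch (not the analysis tooling): count the test cases required
 * for MCC under short-circuit evaluation as the number of distinct
 * evaluation paths of the expression tree. */
#include <stdio.h>

typedef struct Expr {
    char op;                      /* 'c' = condition, '&' = AND, '|' = OR */
    struct Expr *left, *right;
} Expr;

typedef struct { long t, f; } Paths;   /* number of paths ending true / false */

static Paths count_paths(const Expr *e) {
    if (e->op == 'c') {
        Paths leaf = {1, 1};
        return leaf;
    }
    Paths l = count_paths(e->left), r = count_paths(e->right), p;
    if (e->op == '&') {           /* right operand is skipped when left is false */
        p.t = l.t * r.t;
        p.f = l.f + l.t * r.f;
    } else {                      /* right operand is skipped when left is true  */
        p.t = l.t + l.f * r.t;
        p.f = l.f * r.f;
    }
    return p;
}

int main(void) {
    /* (A && B) || C, i.e. A && B || C under C precedence rules */
    Expr A = {'c', 0, 0}, B = {'c', 0, 0}, C = {'c', 0, 0};
    Expr ab  = {'&', &A, &B};
    Expr abc = {'|', &ab, &C};
    Paths p = count_paths(&abc);
    printf("MCC (short-circuit): %ld, MC/DC (N+1): %d\n", p.t + p.f, 3 + 1);
    return 0;                     /* prints: MCC (short-circuit): 5, MC/DC (N+1): 4 */
}
```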
Analyzing the number of test cases required for MCC with short-circuit evaluation for all possible Boolean expressions with up to 5 conditions (without parentheses, wp, and including parentheses, ip) shows the following results; see Table 6.
Table 6 Summary of the results
The columns are described as follows.
- Cond: Number of conditions.
- MCDC: Number of test cases for MC/DC.
- MCC_maxwp: Maximum number of test cases for MCC over all Boolean expressions without parentheses.
- MCC_maxip: Maximum number of test cases for MCC over all Boolean expressions including parentheses.
- OH_max: Maximum overhead in the number of test cases for MCC compared to MC/DC, assuming the system under test contains decisions with Boolean expressions that require the maximum observed number of MCC test cases.
- OH_avwp: Average overhead in the number of test cases for MCC compared to MC/DC, assuming the system under test contains all possible Boolean expressions with N conditions without parentheses (uniformly distributed).
- OH_avip: Average overhead in the number of test cases for MCC compared to MC/DC, assuming the system under test contains all possible Boolean expressions with N conditions, also including parentheses (uniformly distributed).
An overhead of 60 % means that a test set for MCC requires 60 % more test cases than the test set for MC/DC. Example: if the number of test cases for MC/DC is 100, the number of test cases for MCC is 160.
The maximum overhead of 116 % means that a test set for MCC is approximately twice as large as the test set for MC/DC (e.g., 100 test cases for MC/DC means 216 test cases for MCC).
Observations from this survey:
- For 2 conditions the number of test cases for MCC is always equal to the number of test cases for MC/DC.
- For 3 and 4 conditions the maximum overhead for MCC is 25 % and 60 %, respectively. So in the worst case (many decisions with Boolean expressions containing 4 conditions), MCC testing means an overhead of 60 % in test cases compared to MC/DC. This overhead is acceptable.
- For 3 and 4 conditions the average overhead for MCC (also including expressions with parentheses) is approximately 9 % and 20 %, respectively, which is almost negligible.
- Even for 5 conditions the average overhead is around 35 % (compared to the number of test cases for MC/DC), which is still feasible for testing. Based on the experiences from our case studies, the number of conditions within software for the automotive domain is often limited to 5, so the resulting overhead is acceptable.
Error-detection effectiveness of MCC versus MC/DC
The main attribute of a test set we are interested in is the error-detection effectiveness, i.e., how many errors are detected or remain undetected, respectively. To compare the error-detection effectiveness of the MCC-test set with that of the MC/DC-test set, we are interested in errors in the program that are detected by the MCC-test set but remain undetected by the MC/DC-test set.
Fault versus error: Although the terms fault, error, and failure are well-defined, they are sometimes used in a confusing way in the literature. Referring to [9], we distinguish three terms for erroneous system behavior.
- A fault is the cause of an error, and thus the indirect cause of a failure.
- An error is an unintended or incorrect internal state of a computer.
- A failure is an event that denotes a deviation between the actual service and the specified or intended service, occurring at a particular point in real time. Most computer-system failures can be traced to an incorrect internal state of the computer, e.g., a wrong data element in the memory or a register.
We consider a programming mistake a fault; its consequence upon activation is an error in the software; the error becomes effective when it produces erroneous data that affect the delivered service, and then a failure occurs. In the literature about testing and the effectiveness of a test set, both terms occur: fault-detection effectiveness and error-detection effectiveness. Indeed we are interested in determining both, the errors and the underlying faults, but within our testing process we can only observe the errors. Therefore, in this work we stick to the term error-detection effectiveness.
Minimization of the test set: The initial test set includes redundant test cases that are not necessary to achieve the intended coverage. In a first step we reduce the test set to a minimal test set, i.e., we remove the redundant test cases. This minimal test set is used for the test runs to determine the error-detection effectiveness.
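One simple way to perform such a reduction is a greedy pass that keeps a test case only if it covers at least one coverage item (e.g., an MCC evaluation path) not already covered by the test cases kept so far. The following C sketch uses made-up coverage masks purely for illustration; it is not necessarily the procedure used in our case study, and a greedy pass yields a reduced set rather than a guaranteed minimum.

```c
/* Greedy test-set reduction sketch: one bit per coverage item;
 * keep a test case only if it adds a not-yet-covered item. */
#include <stdio.h>

#define NUM_TESTS 5

int main(void) {
    /* hypothetical coverage masks, one per test case of the initial set */
    unsigned coverage[NUM_TESTS] = { 0x03, 0x01, 0x0C, 0x08, 0x10 };
    unsigned accumulated = 0;

    for (int i = 0; i < NUM_TESTS; i++) {
        if ((coverage[i] | accumulated) != accumulated) {
            accumulated |= coverage[i];
            printf("keep test case %d\n", i + 1);
        } else {
            printf("drop test case %d (redundant)\n", i + 1);
        }
    }
    return 0;
}
```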
Program mutations: To determine the real error-detection effectiveness we introduce faults into the original programs. These faults can be mutations of an operator, of a variable name, or of concrete values.
Consider the example given in Listing 1. Let us assume the programmer omits the parentheses in the Boolean expression in line 4, resulting in the program given in Listing 2. The required test sets for MCC and MC/DC are given in Table 7. The test case marked in bold is the one capable of detecting the introduced fault.
Table 7 Test sets for MCC and MC/DC for Listing 1
Running these test sets on the original program results in full MCC and full MC/DC, respectively. When executing the mutated program with the MC/DC-test set, all test cases pass and no erroneous behavior is observed. Executing the mutated program with the MCC-test set shows an error for test case 7 (marked in bold in Table 7). So for this example the MCC-test set is capable of detecting the introduced mutation, whereas the MC/DC-test set does not detect the error.
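To make this kind of mutation concrete, the following self-contained C sketch (a hypothetical illustration with invented names, not the paper's Listing 1/2) shows how dropping parentheses changes a decision from A && (B || C) to (A && B) || C; a test set detects the mutation only if it contains an input, such as a = 0, b = 0, c = 1, on which the two versions differ.

```c
/* Hypothetical illustration of an "omitted parentheses" mutation
 * (not the paper's Listing 1/2). */
#include <stdio.h>

static int original(int a, int b, int c) { return a && (b || c); }
static int mutated (int a, int b, int c) { return a && b || c;   }  /* parentheses dropped */

int main(void) {
    /* a = 0, b = 0, c = 1: the original yields 0, the mutant yields 1,
     * so a test case with this input reveals the fault. */
    printf("original = %d, mutated = %d\n", original(0, 0, 1), mutated(0, 0, 1));
    return 0;
}
```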
Latent faults/errors: We always have to be aware that there may be mutations in the program that have no effect on the observable values. Such faults/errors are called latent; see, for instance, the example given in Listing 3: the mutation occurs in line 3, where switch2 is written instead of switch1. This mutation of a variable name has no impact on the control flow of the program and thus no effect on the resulting value of the variable erg. This kind of mutation cannot be detected by any test case, neither by the MC/DC-test set nor by the MCC-test set.
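A minimal sketch of such a latent mutation (assumed code in the spirit of Listing 3, not the listing itself): inside the guarded branch both switches are known to be 1, so replacing switch1 with switch2 cannot change the observable result for any input.

```c
/* Sketch of a latent mutation: the mutated assignment (see comment)
 * produces the same erg value for every possible input. */
#include <stdio.h>

static int compute(int switch1, int switch2) {
    int erg = 0;
    if (switch1 && switch2) {
        erg = switch1;           /* mutant: erg = switch2; both are 1 here */
    }
    return erg;
}

int main(void) {
    for (int s1 = 0; s1 <= 1; s1++)
        for (int s2 = 0; s2 <= 1; s2++)
            printf("switch1=%d switch2=%d -> erg=%d\n", s1, s2, compute(s1, s2));
    return 0;
}
```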
Error-detection effectiveness for the case study: For our case study we considered 100 different mutations; 4 of them produced errors that were not detected by the MC/DC-test set but were detected by the MCC-test set. This may seem a low rate of undetected errors, but keeping in mind that we deal with safety-relevant software, for which we aim at high reliability, 4 % undetected errors is a very high rate.
Discussion of the results
Based on the observations from the case study, we question the reasonableness of using MC/DC instead of MCC for software from the automotive domain (with a manageable complexity, i.e., a limited number of conditions) realized in a programming language with short-circuit evaluation. We learned that the overhead of the MCC-test set (regarding the number of test cases) is almost negligible in comparison to an MC/DC-test set. As we showed in the analysis, the number of test cases required for MCC (for a system implemented in a language with short-circuit evaluation) causes only a small overhead for testing in comparison to MC/DC (5 % for our case study). This can be explained in the following way: Many of the decisions with a complex Boolean expression contain only 2 conditions. For 2 conditions the number of test cases required for MCC is equal to the number required for MC/DC (3 test cases for both), so the MCC-test set is the same as the MC/DC-test set for these decisions. Some decisions contain more conditions, but even for these decisions only a few additional test cases are needed for MCC (e.g., 5 vs. 4 test cases for A && B || C, see Example_B in Sect. 3.4). Furthermore, we showed in the comparison of the error-detection effectiveness of MCC vs. MC/DC that some errors are only detected by the MCC-test set. In contrast to an MC/DC-test set, the MCC-test set covers the whole possible input-data space, so it guarantees that all detectable errors are identified. With the restricted MC/DC-test set, not all errors may be identified.
The usage of MC/DC makes sense as a qualitative means to assess the maturity of the software development process. The metric can be used to check whether the requirements defined in the system specification map to the implemented code (a poor MC/DC value for a requirement-based test set indicates a lack in the specification or unspecified functionality in the implemented code). Such deviations indicate a gap between the specification and the implementation. The use of MC/DC as a quantitative measure is reasonable when it serves as an alternative to stronger coverage metrics, like MCC, whenever full testing is not feasible (stronger in this context means that the test set of MCC is a superset of the test set of MC/DC, i.e., the test cases of MCC cover a larger part of the input-data space, so the ability to detect errors in the program is higher). But as long as the overhead for MCC is so low, MCC is better suited as a quantitative measure for the evaluation of the testing process for safety-relevant programs implemented in a programming language with short-circuit evaluation.
Regarding the guidelines of the standard ISO 26262 [3] and addressing the aim of high reliability required for safety-relevant programs, it would be desirable to combine the benefits of both metrics: As deriving the MC/DC-test set is a non-trivial task, this process requires a detailed analysis of the structure (the control flow) of the program. So building a test set to achieve maximum MC/DC for a system under test forces the test engineers to study both the specification and the implementation very precisely. This activity by itself strengthens the quality of the testing process. On the other hand, achieving full MCC guarantees that all detectable errors are identified (non-detectable errors are latent faults or errors, i.e., deviations in the program that have no impact on the resulting output; these errors are not detectable, no matter which test cases are applied).
Besides that, the test engineer should always be aware that a structural code coverage metric is evaluated based only on the implementation. Achieving a specific coverage goal by incremental test-case generation until \(x\,\%\) coverage is reached may increase the portion of tested code. But in the sense of a structured verification process, i.e., checking whether the system conforms to the specification or not, this is by far not sufficient, see also [10]. In this white paper, Büchner defines some commonly used code coverage measures and discusses their strengths and weaknesses. Small examples illustrate some measures and indicate common traps and pitfalls. Two main weaknesses of code coverage are identified: (1) Code coverage measurements cannot detect omissions, e.g., missing or incomplete code. (2) Code coverage measurement is insensitive to calculations (example: given a complex calculation as part of the control flow, a single input may cover this calculation with respect to a structural code coverage metric, thus achieving 100 % coverage with only one test input; this single test input, however, does not verify the correctness of the complex calculation). An increasing value for code coverage indicates progress in the testing process; nevertheless, achieving 100 % code coverage is not sufficient to rely on the proper functioning of a system.
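Weakness (2) can be illustrated with a small, self-contained C sketch (the function and the fault are invented for illustration): a single input achieves full statement and branch coverage of the function, yet cannot distinguish the correct calculation from a faulty one.

```c
/* One input gives 100 % structural coverage of this function,
 * but does not verify the calculation itself. */
#include <stdio.h>

static double celsius_to_fahrenheit(double c) {
    return c * 9.0 / 5.0 + 32.0;   /* a faulty variant, e.g. c * 5.0 / 9.0 + 32.0,
                                      behaves identically for the input below */
}

int main(void) {
    /* the single input 0.0 covers every statement and branch,
     * yet the correct and the faulty variant both return 32.0 */
    printf("%.1f\n", celsius_to_fahrenheit(0.0));
    return 0;
}
```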
Structural code coverage metrics should only be a supplement to approaches like requirement-based testing, in which the requirements (and not the test data) guide the testing process; see [11] and [12].
[13] mentions some misunderstandings of the MC/DC Objective:
- Not understanding the intent of structural coverage.
- Trying to meet the MC/DC objective apart from requirement-based testing (that is, using the source code to derive inputs for all test cases).
- Using MC/DC as a testing method (that is, expecting MC/DC to find errors instead of assuring that requirements-based testing is adequate).
- Etc.
A coverage criterion is only a means to define a set of test cases and to provide a quantitative measure of which parts of the control flow and which subset of the input-data space are covered by this test set. Test sets for the DC-, MC/DC-, and MCC-criterion guarantee that all branches of the control flow graph are covered by running the test set. But the MCC-test set contains more test cases than the MC/DC-test set, i.e., it covers more values of the input-data space. With MCC all possible inputs for a decision are considered for testing, so it covers the complete input-data space and assures that all detectable faults (i.e., all faults except the latent faults) are detected by testing. The MC/DC-test set, as a subset of the MCC-test set, contains fewer test cases and thus covers a smaller subset of the input-data space than the MCC-test set. This results in a decreased fault-detection sensitivity. As long as the overhead for an MCC-test set is reasonable, it is always better to use MCC instead of MC/DC.
The problem we see with MC/DC in the context of ISO 26262 is that it is only mentioned as a metric to be fulfilled for the testing of software. The standard does not give any guidelines about requirement-based testing, nor does it give any advice on how to use MC/DC as a technique within the software testing process. The danger is that achieving full MC/DC may be used as an argument for a sufficient testing process. But it has the following fundamental restrictions:
- (a) MC/DC used as a quantitative measure is only reasonable if the test cases are derived directly from the requirements (and not by any other means, like static analysis of the source code).
- (b) MC/DC cannot be used to argue for reliability of a system regarding the confidence in error-freeness.
To cover as much of the input-data space as possible and thus maximize the probability of detecting faults in the program, we recommend using MCC instead of MC/DC as a code coverage metric (as long as the overhead is acceptable).