
Does choice of mutation tool matter?

Software Quality Journal

Abstract

Though mutation analysis is the primary means of evaluating the quality of test suites, it suffers from inadequate standardization. Mutation analysis tools vary based on language, when mutants are generated (phase of compilation), and target audience. Mutation tools rarely implement the complete set of operators proposed in the literature, and most implement at least a few domain-specific mutation operators; thus, different tools may not always agree on the mutant kills of a test suite. Few criteria exist to guide a practitioner in choosing the right tool, whether for evaluating the effectiveness of a test suite or for comparing different testing techniques. We investigate an ensemble of measures for evaluating the efficacy of mutants produced by different tools: the traditional difficulty of detection, the strength of minimal sets, and the diversity of mutants, as well as the information carried by the mutants produced. We find that mutation tools rarely agree. The disagreement between scores can be large, and the variation due to characteristics of the project, even after accounting for differences due to test suites, is a significant factor. However, the mean difference between tools is very small, indicating that no single tool consistently skews mutation scores high or low across projects. These results suggest that experiments yielding small differences in mutation score, especially those using a single tool or a small number of projects, may not be reliable. There is a clear need for greater standardization of mutation analysis, and we propose one approach for such standardization.




Notes

  1. Very often, a single high-level statement is implemented as multiple lower-level instructions. Hence, a simple change in assembly may not have an equivalent source representation. See the Pit switch mutator (Coles 2016b) for an example that has no direct source equivalent.

  2. See Pit return values mutator (Coles 2016b) for an example where first-order source changes imply much larger bytecode changes.

  3. By semantics, we mean the actual behavior (in contrast to the static syntax) of the mutants. That is, some mutants, while syntactically different, are indistinguishable in their behavior. Similarly, mutants may be hard or easy to detect, and one set of mutants may encode more behavioral difference than another. We use measures such as mutual information and entropy to assess the ability of a set of mutants to provide a diverse set of behaviors.

  4. For any set of mutants, the strength of a test suite required to detect them depends on the number of non-redundant mutants within that set. Thus, for this paper, we define the strength of a set of mutants as the number of non-redundant mutants within that set.

  5. Diversity of a set of mutants refers to how different one can expect any two mutants from the set to be, in terms of the tests that kill them. For example, say mutant set A has killing tests \(\{(m_1,t_1), (m_2,t_2)\}\) and mutant set B has killing tests \(\{(m_1,t_1), (m_2,t_2), (m_3,t_3)\}\); the two have similar diversity, while a set C with killing tests \(\{(m_1,t_1), (m_2,t_1)\}\) has lower diversity, since both of its mutants are killed by the same test.

  6. Note that the LOC given by Delahaye et al. is ambiguous. The text suggests that the LOC is that of the program; however, checking the LOC of some of the programs, such as jopt-simple and commons-lang, suggests that the given LOC is that of the test suite (it is reported in the table among the details of the test suite). Hence we do not include LOC details here.

  7. The Siemens test suite was curated by researchers (Untch 2009), and it is at best a questionable representative of real-world test suites.

  8. Even though a script mode is available, it still requires a GUI to be present, and communication with the tool's authors did not yield any assistance on this point.

  9. In the case of Pit, we extended it to provide a more complete set of mutants, a modification that was later accepted into the mainline (Pit 1.0).

  10. Statistical significance is the confidence we have in our estimates; it says nothing about the effect size. That is, we can be highly confident of a small, consistent difference that is nonetheless not practically relevant.

  11. Analysis of variance (ANOVA) is a statistical procedure for comparing the goodness of fit of statistical models. It can tell us whether a variable contributes significantly (statistically) to the variation in the dependent variable, by comparison against a model that does not contain that variable. If the p value (given in the tables as \(Pr({>}F)\)) is not statistically significant, that is an indication that the variable contributes little to the model fit. Note that the \(R^2\) reported is the adjusted \(R^2\), which accounts for the complexity of the model due to the number of variables considered.

References

  • Acree, Jr. A. T. (1980). On mutation. Ph.D. dissertation, Georgia Institute of Technology, Atlanta, GA, USA.

  • Ammann, P. (2015a). Problems with jester. https://sites.google.com/site/mutationworkshop2015/program/MutationKeynote.

  • Ammann, P. (2015b). Transforming mutation testing from the technology of the future into the technology of the present. In International conference on software testing, verification and validation workshops. IEEE.

  • Ammann, P., Delamaro, M. E., & Offutt, J. (2014). Establishing theoretical minimal sets of mutants. In International conference on software testing, verification and validation (pp. 21–30). Washington, DC, USA: IEEE Computer Society.

  • Andrews, J. H., Briand, L. C., & Labiche, Y. (2005). Is mutation an appropriate tool for testing experiments? In International conference on software engineering (pp. 402–411). IEEE.

  • Andrews, J. H., Briand, L. C., Labiche, Y., & Namin, A. S. (2006). Using mutation analysis for assessing and comparing testing coverage criteria. IEEE Transactions on Software Engineering, 32(8), 608–624.


  • Apache Software Foundation. (2016). Apache commons. http://commons.apache.org/.

  • Baldwin, D., & Sayward, F. (1979). Heuristics for determining equivalence of program mutations. Tech. rep., DTIC Document.

  • Barbosa, E. F., Maldonado, J. C., & Vincenzi, A. M. R. (2001). Toward the determination of sufficient mutant operators for C. Software Testing, Verification and Reliability, 11(2), 113–136.


  • Budd, T. A. (1980). Mutation analysis of program test data. Ph.D. dissertation, Yale University, New Haven, CT, USA.

  • Budd, T. A., DeMillo, R. A., Lipton, R. J., & Sayward, F. G. (1980). Theoretical and empirical studies on using program mutation to test the functional correctness of programs. In ACM SIGPLAN-SIGACT symposium on principles of programming languages (pp. 220–233). ACM.

  • Budd, T. A., Lipton, R. J., DeMillo, R. A., & Sayward, F. G. (1979). Mutation analysis. Yale University, Department of Computer Science.

  • Budd, T. A., & Gopal, A. S. (1985). Program testing by specification mutation. Computer Languages, 10(1), 63–73.


  • Cai, X., & Lyu, M. R. (2005). The effect of code coverage on fault detection under different testing profiles. In ACM SIGSOFT software engineering notes (Vol. 30, no. 4, pp. 1–7). ACM.

  • Chevalley, P., & Thévenod-Fosse, P. (2003). A mutation analysis tool for Java programs. International Journal on Software Tools for Technology Transfer, 5(1), 90–103.


  • Coles, H. (2016). Pit mutation testing. http://pitest.org/.

  • Coles, H. (2016a). Mutation testing systems for java compared. http://pitest.org/java_mutation_testing_systems/.

  • Coles, H. (2016b). Pit mutators. http://pitest.org/quickstart/mutators/.

  • Daran, M., & Thévenod-Fosse, P. (1996). Software error analysis: A real case study involving real faults and mutations. In ACM SIGSOFT international symposium on software testing and analysis (pp. 158–171). ACM.

  • Delahaye, M., & Du Bousquet, L. (2013). A comparison of mutation analysis tools for Java. In Quality software (QSIC), 2013 13th international conference on (pp. 187–195). IEEE.

  • DeMillo, R. A., Guindi, D. S., McCracken, W., Offutt, A., & King, K. (1988). An extended overview of the mothra software testing environment. In International conference on software testing, verification and validation workshops (pp. 142–151). IEEE.

  • DeMillo, R. A., Lipton, R. J., & Sayward, F. G. (1978). Hints on test data selection: Help for the practicing programmer. Computer, 11(4), 34–41.


  • Derezińska, A., & Hałas, K. (2014). Analysis of mutation operators for the Python language. In International conference on dependability and complex systems, ser. Advances in Intelligent Systems and Computing (Vol. 286, pp. 155–164). Springer.

  • Do, H., & Rothermel, G. (2006). On the use of mutation faults in empirical assessments of test case prioritization techniques. IEEE Transactions on Software Engineering, 32(9), 733–752.


  • Duraes, J., & Madeira, H. (2002). Emulation of software faults by educated mutations at machine-code level. International Symposium on Software Reliability Engineering, 2002, 329–340.


  • GitHub Inc. (2016). Software repository. http://www.github.com.

  • Gligoric, M., Groce, A., Zhang, C., Sharma, R., Alipour, M. A., & Marinov, D. (2013). Comparing non-adequate test suites using coverage criteria. In ACM SIGSOFT international symposium on software testing and analysis. ACM.

  • Gligoric, M., Jagannath, V., & Marinov, D. (2010). Mutmut: Efficient exploration for mutation testing of multithreaded code. In Software testing, verification and validation (ICST), 2010 third international conference on (pp. 55–64). IEEE.

  • Gopinath, R. (2015). Replication data for: Does choice of mutation tool matter?. http://eecs.osuosl.org/rahul/sqj2015.

  • Gopinath, R., Alipour, A., Ahmed, I., Jensen, C., & Groce, A. (2015). Do mutation reduction strategies matter? Oregon State University, tech. rep., August 2015, under review for Software Quality Journal. http://hdl.handle.net/1957/56917.

  • Gopinath, R., Alipour, A., Ahmed, I., Jensen, C., & Groce, A. (2016). On the limits of mutation reduction strategies. In Proceedings of the 38th international conference on software engineering. ACM.

  • Gopinath, R., Alipour, A., Iftekhar, A., Jensen, C., & Groce, A. (2015). How hard does mutation analysis have to be, anyway? In International symposium on software reliability engineering. IEEE.

  • Gopinath, R., Jensen, C., & Groce, A. (2014). Code coverage for suite evaluation by developers. In International conference on software engineering. IEEE.

  • Gopinath, R., Jensen, C., & Groce, A. (2014). Mutations: How close are they to real faults? In Software reliability engineering (ISSRE), 2014 IEEE 25th international symposium on (pp. 189–200), November 2014.

  • Harder, M., Mellen, J., & Ernst, M.D. (2003). Improving test suites via operational abstraction. In International conference on software engineering (pp. 60–71). IEEE Computer Society.

  • Harder, M., Morse, B., & Ernst, M. D. (2001). Specification coverage as a measure of test suite quality. MIT Lab for Computer Science: tech. rep.

  • Irvine, S. A., Pavlinic, T., Trigg, L., Cleary, J. G., Inglis, S., & Utting, M. (2007). Jumble Java byte code to measure the effectiveness of unit tests. In Testing: Academic and industrial conference practice and research techniques-MUTATION (TAICPART-MUTATION 2007) (pp. 169–175). IEEE.

  • Jia, Y., & Harman, M. (2008). Milu: A customizable, runtime-optimized higher order mutation testing tool for the full C language. In Testing: Academic & industrial conference, practice and research techniques (TAIC PART’08) (pp. 94–98). IEEE.

  • Jia, Y., & Harman, M. (2011). An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering, 37(5), 649–678.


  • Just, R. (2014). The Major mutation framework: Efficient and scalable mutation analysis for Java. In Proceedings of the 2014 international symposium on software testing and analysis, ser. ISSTA 2014 (pp. 433–436). New York, NY: ACM.

  • Just, R., Kapfhammer, G. M., & Schweiggert, F. (2012). Do redundant mutants affect the effectiveness and efficiency of mutation analysis? In Software testing, verification and validation (ICST), 2012 IEEE fifth international conference on (pp. 720–725). IEEE.

  • Just, R., Jalali, D., Inozemtseva, L., Ernst, M. D., Holmes, R., & Fraser, G. (2014). Are mutants a valid substitute for real faults in software testing? ACM SIGSOFT symposium on the foundations of software engineering (pp. 654–665). Hong Kong: ACM.


  • Kintis, M., Papadakis, M., & Malevris, N. (2010). Evaluating mutation testing alternatives: A collateral experiment. In Asia Pacific software engineering conference (APSEC) (pp. 300–309). IEEE.

  • Kurtz, B., Ammann, P., Delamaro, M. E., Offutt, J., & Deng, L. (2014). Mutant subsumption graphs. In Software testing, verification and validation workshops (ICSTW), 2014 IEEE seventh international conference on (pp. 176–185). IEEE.

  • Kusano, M., & Wang, C. (2013). CCmutator: A mutation generator for concurrency constructs in multithreaded C/C++ applications. In Automated software engineering (ASE), 2013 IEEE/ACM 28th international conference on (pp. 722–725). IEEE.

  • Langdon, W. B., Harman, M., & Jia, Y. (2010). Efficient multi-objective higher order mutation testing with genetic programming. Journal of systems and Software, 83(12), 2416–2430.


  • Le, D., Alipour, M. A., Gopinath, R., & Groce, A. (2014). MuCheck: An extensible tool for mutation testing of Haskell programs. In Proceedings of the 2014 international symposium on software testing and analysis (pp. 429–432). ACM.

  • Lipton, R. J. (1971). Fault diagnosis of computer programs. Carnegie Mellon University, Tech. rep.

  • Ma, Y.-S., Kwon, Y.-R., & Offutt, J. (2002). Inter-class mutation operators for Java. In International symposium on software reliability engineering (pp. 352–363). IEEE.

  • Ma, Y.-S., Offutt, J., & Kwon, Y.-R. (2006). Mujava: A mutation system for Java. In Proceedings of the 28th international conference on software engineering, ser. ICSE’06 (pp. 827–830). New York, NY: ACM.

  • Macedo, M. G. (2016). Mutator. http://ortask.com/mutator/.

  • Madeyski, L., & Radyk, N. (2010). Judy: A mutation testing tool for Java. IET Software, 4(1), 32–42.


  • Ma, Y.-S., Offutt, J., & Kwon, Y. R. (2005). Mujava: An automated class mutation system. Software Testing, Verification and Reliability, 15(2), 97–133.


  • Mathur, A. (1991). Performance, effectiveness, and reliability issues in software testing. In Annual international computer software and applications conference, COMPSAC (pp. 604–605), 1991.

  • Mathur, A. P., & Wong, W. E. (1994). An empirical comparison of data flow and mutation-based test adequacy criteria. Software Testing, Verification and Reliability, 4(1), 9–31.


  • Moore, I. (2001). Jester—a junit test tester. In International conference on extreme programming (pp. 84–87).

  • Namin, A. S., & Andrews, J. H. (2009). The influence of size and coverage on test suite effectiveness. In ACM SIGSOFT international symposium on software testing and analysis (pp. 57–68). ACM.

  • Namin, A. S., Andrews, J. H., & Murdoch, D. J. (2008). Sufficient mutation operators for measuring test effectiveness. In International conference on software engineering (pp. 351–360). ACM.

  • Nanavati, J., Wu, F., Harman, M., Jia, Y., & Krinke, J. (2015). Mutation testing of memory-related operators. In Software testing, verification and validation workshops (ICSTW), 2015 IEEE eighth international conference on (pp. 1–10). IEEE.

  • Nica, S., & Wotawa, F. (2012). Using constraints for equivalent mutant detection. In Workshop on formal methods in the development of software, WS-FMDS (pp. 1–8).

  • Nimmer, J. W., & Ernst, M. D. (2002). Automatic generation of program specifications. ACM SIGSOFT Software Engineering Notes, 27(4), 229–239.


  • Offutt, J. (2016a). Problems with jester. https://cs.gmu.edu/offutt/documents/personal/jester-anal.html.

  • Offutt, J. (2016b). Problems with parasoft insure++. https://cs.gmu.edu/offutt/documents/handouts/parasoft-anal.html.

  • Offutt, A. J., & Untch, R. H. (2000). Mutation, uniting the orthogonal. In Mutation testing for the new century (pp. 34–44). Springer.

  • Offutt, A. J., & Voas, J. M. (1996). Subsumption of condition coverage techniques by mutation testing. Tech. rep. ISSE-TR-96-01, Information and Software Systems Engineering, George Mason University.

  • Offutt, A. J., Rothermel, G., & Zapf, C. (1993). An experimental evaluation of selective mutation. In International conference on software engineering (pp. 100–107). IEEE Computer Society Press.

  • Offutt, A. J. (1989). The coupling effect: Fact or fiction? ACM SIGSOFT Software Engineering Notes, 14(8), 131–140.


  • Offutt, A. J. (1992). Investigations of the software testing coupling effect. ACM Transactions on Software Engineering and Methodology, 1(1), 5–20.


  • Offutt, A. J., & Craft, W. M. (1994). Using compiler optimization techniques to detect equivalent mutants. Software Testing, Verification and Reliability, 4(3), 131–154.


  • Offutt, A. J., Lee, A., Rothermel, G., Untch, R. H., & Zapf, C. (1996). An experimental determination of sufficient mutant operators. ACM Transactions on Software Engineering and Methodology, 5(2), 99–118.


  • Offutt, A. J., & Pan, J. (1997). Automatically detecting equivalent mutants and infeasible paths. Software Testing, Verification and Reliability, 7(3), 165–192.


  • Okun, V. (2004). Specification mutation for test generation and analysis. Ph.D. dissertation, University of Maryland Baltimore County.

  • Papadakis, M., Jia, Y., Harman, M., & Traon, Y. L. (2015). Trivial compiler equivalence: A large scale empirical study of a simple, fast and effective equivalent mutant detection technique. In International conference on software engineering.

  • Parasoft. (2014). Insure++. www.parasoft.com/products/insure/papers/tech_mut.htm.

  • Parasoft. (2015). Insure++ mutation analysis. http://www.parasoft.com/jsp/products/article.jsp?articleId=291&product=Insure.

  • Schuler, D., & Zeller, A. (2009). Javalanche: Efficient mutation testing for java. In ACM SIGSOFT symposium on the foundations of software engineering (pp. 297–298). August, 2009.

  • Schuler, D., Dallmeier, V., & Zeller, A. (2009). Efficient mutation testing by checking invariant violations. In ACM SIGSOFT international symposium on software testing and analysis (pp. 69–80). ACM.

  • Schuler, D., & Zeller, A. (2013). Covering and uncovering equivalent mutants. Software Testing, Verification and Reliability, 23(5), 353–374.


  • Shannon, C. E. (2001). A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1), 3–55.


  • Singh, P. K., Sangwan, O. P., & Sharma, A. (2014). A study and review on the development of mutation testing tools for Java and AspectJ programs. International Journal of Modern Education and Computer Science (IJMECS), 6(11), 1.


  • Smith, B. H., & Williams, L. (2007). An empirical evaluation of the mujava mutation operators. In Testing: academic and industrial conference practice and research techniques-MUTATION, 2007. TAICPART-MUTATION 2007 (pp. 193–202). IEEE.

  • Sridharan, M., & Namin, A. S. (2010). Prioritizing mutation operators based on importance sampling. In International symposium on software reliability engineering (pp. 378–387). IEEE.

  • Untch, R. H. (2009). On reduced neighborhood mutation analysis using a single mutagenic operator. In Annual southeast regional conference, ser. ACM-SE 47 (pp. 71:1–71:4). New York, NY: ACM.

  • Usaola, M. P., & Mateo, P. R. (2012). Bacterio: Java mutation testing tool: A framework to evaluate quality of tests cases. In Proceedings of the 2012 IEEE international conference on software maintenance (ICSM), ser. ICSM’12 (pp. 646–649). Washington, DC: IEEE Computer Society.

  • Wah, K. S. H. T. (2000). A theoretical study of fault coupling. Software Testing, Verification and Reliability, 10(1), 3–45.


  • Wah, K. S. H. T. (2003). An analysis of the coupling effect i: Single test data. Science of Computer Programming, 48(2), 119–161.


  • Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4(1), 66–82.


  • Wong, W. E. (1993). On mutation and data flow. Ph.D. dissertation, Purdue University, West Lafayette, IN, USA, uMI Order No. GAX94-20921.

  • Wong, W., & Mathur, A. P. (1995). Reducing the cost of mutation testing: An empirical study. Journal of Systems and Software, 31(3), 185–196.


  • Yao, X., Harman, M., & Jia, Y. (2014). A study of equivalent and stubborn mutation operators using human analysis of equivalence. In International conference on software engineering (pp. 919–930).

  • Zhang, L., Gligoric, M., Marinov, D., & Khurshid, S. (2013). Operator-based and random mutant selection: Better together. In IEEE/ACM automated software engineering. ACM.

  • Zhang, L., Hou, S.-S., Hu, J.-J., Xie, T., & Mei, H. (2010). Is operator-based mutant selection superior to random mutant selection? In International conference on software engineering (pp. 435–444). New York, NY: ACM.

  • Zhang, J., Zhu, M., Hao, D., & Zhang, L. (2014). An empirical study on the scalability of selective mutation testing. In International symposium on software reliability engineering. ACM.

  • Zhou, C., & Frankl, P. (2009). Mutation testing for Java database applications. In Software testing verification and validation (ICST’09), international conference on (pp. 396–405). IEEE.


Corresponding author

Correspondence to Rahul Gopinath.

Appendix

1.1 Measures of correlation

We rely on two different measures of correlation here. The first is \(R^2\), the square of Pearson's correlation coefficient, which indicates how closely the variables are related linearly. \(R^2\) is a statistical measure of goodness of fit: the fraction of the variation in one variable that is explained by variation in the other. For our purposes, it is the ability of the mutation scores produced by one tool to predict the scores of another. We expect \(R^2 = 1\) if either (a) both tools give the same score for the same program, or (b) their scores are always separated by the same amount. The second is Kendall's \(\tau _b\), a nonparametric rank correlation coefficient that measures the monotonicity of the association between the two variables in terms of the difference between concordant and discordant pairs. It requires only that the dependent and independent variables (here, mutation scores from two different tools) are connected by a monotonic function. It is defined as

$$\begin{aligned} \tau _b = \frac{{\text {concordant\,pairs}} - {\text {discordant\,pairs}}}{\frac{1}{2}n(n-1)} \end{aligned}$$

\(R^2\) and \(\tau _b\) provide information along two different dimensions of comparison. \(R^2\) can be close to 1 if the scores from the two tools differ only by a small amount, even if there is no consistency in which tool reports the larger score; such data would nevertheless yield a very low \(\tau _b\), since the difference between concordant and discordant pairs would be small. Conversely, suppose the mutation scores of one tool grow linearly with test suite strength while those of another grow quadratically. In that case, \(R^2\) would be low, since the relation between the two tools is not linear, while \(\tau _b\) would be high. Hence both measures provide useful comparative information. Note that a low \(\tau _b\) indicates that the troubling situation in which two tools rank a pair of test suites in opposite order of effectiveness is more frequent; this could change the results of software testing experiments that use mutation analysis to evaluate techniques, merely by changing the tool used for measurement.

While what can be considered high and low correlation is subjective, for the purpose of this paper, we consider \(R^2 \le 0.40\) to be low correlation, and \(R^2 \ge 0.60\) to be high correlation.
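As a sketch of how the two measures can diverge, the following computes both from raw scores in pure Python, following the \(\tau _b\) formula above. The scores are illustrative toy data, not results from the study:

```python
from itertools import combinations

def r_squared(xs, ys):
    """Squared Pearson correlation: how well ys is linearly predicted from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

def tau_b(xs, ys):
    """Kendall's tau (no ties): (concordant - discordant) / (n(n-1)/2)."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        concordant += s > 0
        discordant += s < 0
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical mutation scores for six projects under two tools: the
# scores are close in value, but the tools swap adjacent rankings.
tool_a = [0.60, 0.61, 0.70, 0.71, 0.80, 0.81]
tool_b = [0.61, 0.60, 0.71, 0.70, 0.81, 0.80]
```

On this data the scores track each other closely (\(R^2\) near 1), yet every adjacent pair of projects is ranked in opposite order by the two tools, which depresses \(\tau _b\).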

1.2 Covariance

We showed (Gopinath et al. 2015) that for mutation analysis, the number of mutants to be sampled for a given tolerance has an upper bound provided by the binomial distribution, and that the actual number is determined by the covariance between mutants.

Let the random variable \(D_n\) denote the number of detected mutants out of a sample of n. The estimated mutation score is \(M_n = \frac{D_n}{n}\). The random variable \(D_n\) can be modeled as the sum of the random variables representing the individual mutants \(X_{1 \ldots n}\); that is, \(D_n = \sum _{i}^{n} X_i\). The expected value \(E(M_n)\) is given by \(\frac{1}{n}E(D_n)\). The variance \(V(M_n)\) is given by \(\frac{1}{n^2}V(D_n)\), which can be written in terms of the component random variables \(X_{1 \ldots n}\) as:

$$\begin{aligned} \frac{1}{n^2}V(D_n) = \frac{1}{n^2}\left[ \sum _{i}^{n} V(X_i) + 2\sum _{i<j}^{n}Cov(X_i,X_j)\right] \end{aligned}$$

Under the simplifying assumption that mutants are more similar to each other than dissimilar, we have

$$\begin{aligned} 2\sum _{i<j}^{n}Cov(X_i,X_j) \ge 0 \end{aligned}$$

The sum of covariances is zero when the mutants are independent. That is, the variance \(V(M_n)\) of the estimated mutation score is greater than or equal to that of a comparable distribution of independent random variables.

This means that the covariance between mutants determines the size of the sample required; the larger the covariance (or correlation) between mutants, the smaller their diversity.
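The decomposition above can be checked numerically. A minimal sketch on a toy kill matrix (rows are sampled suites, columns are mutants; illustrative data, not from the study):

```python
# kill[s][m] == 1 if sampled test suite s detects mutant m (toy data).
kill = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
n = len(kill[0])  # number of mutants

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

cols = list(zip(*kill))

# V(M_n) computed directly from the per-suite mutation scores ...
v_direct = var([sum(row) / n for row in kill])

# ... agrees with (1/n^2) [ sum_i V(X_i) + 2 sum_{i<j} Cov(X_i, X_j) ].
v_decomp = (sum(var(c) for c in cols)
            + 2 * sum(cov(cols[i], cols[j])
                      for i in range(n)
                      for j in range(i + 1, n))) / n ** 2

assert abs(v_direct - v_decomp) < 1e-12
```

When the covariance sum is positive, as the similarity assumption suggests, \(V(M_n)\) exceeds what an independent-mutant model would predict, so more samples are needed for the same tolerance.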

1.3 Mutual information

The mutual information between two variables is defined as the reduction in uncertainty about one variable due to knowledge of the other. That is, given two variables X and Y, the redundancy between them is estimated as:

$$\begin{aligned} I(X;Y) = I(Y;X) = \sum _{y\in Y} \sum _{x\in X} p(x,y) \log \bigg (\frac{p(x,y)}{p(x)p(y)} \bigg ) \end{aligned}$$

To extend this to a set of mutants, we use one of the multivariate generalizations of mutual information proposed by Watanabe (1960): multi-information, also called total correlation. The aspects of multi-information relevant to us are that it is well behaved: it is never negative, and it is zero only when all variables are completely independent. The multi-information for a set of random variables \(X_1 \ldots X_n\) is defined formally as:

$$\begin{aligned} C(X_1 \ldots X_n) = \sum _{x_1\in X_1} \ldots \sum _{x_n\in X_n} p(x_1 \ldots x_n)\log \bigg (\frac{p(x_1 \ldots x_n)}{p(x_1) \ldots p(x_n)} \bigg ). \end{aligned}$$
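A minimal sketch of this computation over a toy kill matrix (two mutants observed across four tests; illustrative data, not from the study), treating each mutant as a binary random variable over the tests:

```python
from math import log2
from collections import Counter

# kill[t] holds the kill outcome of each mutant for test t (toy data).
kill = [(1, 1), (1, 1), (0, 0), (0, 1)]
total = len(kill)

joint = Counter(kill)                             # joint outcome frequencies
marginals = [Counter(col) for col in zip(*kill)]  # per-mutant frequencies

# C(X_1 .. X_n) = sum over observed joint outcomes x of
#                 p(x) * log( p(x) / prod_i p(x_i) )
c = 0.0
for outcome, count in joint.items():
    p = count / total
    p_indep = 1.0
    for m, v in enumerate(outcome):
        p_indep *= marginals[m][v] / total
    c += p * log2(p / p_indep)
```

For this matrix \(c \approx 0.31\) bits: the two mutants are partially redundant. Fully independent mutants would give zero.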

1.4 Entropy

In information theory, Shannon entropy (Shannon 2001) is a measure of the information content of the given data. Entropy is related to multi-information: multi-information is the difference between the sum of the individual entropies of the random variables and their joint entropy. Formally,

$$\begin{aligned} C(X_1 \ldots X_n) = \sum _{i=1}^{N} H(X_i) - H(X_1 \ldots X_n) \end{aligned}$$

We are also interested in the entropy of a set of mutants because the properties of entropy match what we expect of a measure of mutant set quality. First, the value can never be negative: adding a mutant to a set of mutants should not decrease the utility of the set. Second, a mutant set in which all mutants are killed by all test cases has minimal value (consider the minimal set of mutants for such a matrix); this is mirrored by the entropy property that \(I(1) = 0\). Similarly, a mutant set in which no mutants are killed by any test case is also of no value (again, consider the minimal set of mutants for such a matrix), mirrored by \(I(0) = 0\). Third, if two mutant sets representing independent failures are combined, the measure should be the sum of their utilities; with entropy, the joint information of two independent random variables is the sum of their respective information. Finally, the entropy of a set of mutants is maximal when no mutant in the set is subsumed by any other mutant in the set. The entropy of a random variable is given by:

$$\begin{aligned} I(p) = - p\times {}\log _2(p). \end{aligned}$$
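These properties can be checked numerically through the identity relating entropy and multi-information above. A sketch with two toy kill matrices (illustrative data, not from the study):

```python
from math import log2
from collections import Counter

def H(outcomes):
    """Shannon entropy (bits) of the empirical distribution of outcomes."""
    total = len(outcomes)
    return -sum((n / total) * log2(n / total)
                for n in Counter(outcomes).values())

def total_correlation(kill):
    """Multi-information: sum of marginal entropies minus joint entropy."""
    return sum(H(list(col)) for col in zip(*kill)) - H(kill)

# Two mutants killed in unrelated ways: all joint outcomes are distinct,
# the variables are independent, and the total correlation is zero.
independent = [(1, 0), (1, 1), (0, 0), (0, 1)]
# Two mutants always killed together: one full bit of information is shared.
redundant = [(1, 1), (1, 1), (0, 0), (0, 0)]

assert total_correlation(independent) == 0.0
assert total_correlation(redundant) == 1.0
# A mutant killed by every test carries no information, mirroring I(1) = 0.
assert H([1, 1, 1, 1]) == 0.0
```

The fully redundant pair contributes no more behavioral information than a single mutant, which is exactly the redundancy that minimal mutant sets are meant to strip away.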


Gopinath, R., Ahmed, I., Alipour, M.A. et al. Does choice of mutation tool matter?. Software Qual J 25, 871–920 (2017). https://doi.org/10.1007/s11219-016-9317-7
