Abstract
Mutation analysis is a well-studied, fault-based testing technique. It requires testers to design tests based on a set of artificial defects, and it assesses test thoroughness by measuring the ratio of those defects that the candidate tests reveal. Unfortunately, applying mutation to real-world programs requires automated tools due to the vast number of defects involved. In that case, the effectiveness of the method depends strongly on the peculiarities of the employed tools: their implementation inadequacies can lead to inaccurate results. To deal with this issue, we cross-evaluate four mutation testing tools for Java, namely PIT, muJava, Major, and the research version of PIT, PITRV, with respect to their fault-detection capabilities. We investigate the strengths of the tools based on: a) a set of real faults and b) manual analysis of the mutants they introduce. We find that there are large differences in the tools' effectiveness and demonstrate that no tool subsumes the others. We also provide results indicating the application cost of the method. Overall, we find that PITRV achieves the best results; in particular, it finds 6% more faults than the other tools combined.
Notes
The killable mutant set is the set of all the generated mutants excluding the equivalent ones.
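As a sketch of how this set is used, the mutation score is conventionally computed over the killable mutants, i.e., all generated mutants minus the equivalent ones. The class name and the numbers below are hypothetical, chosen only for illustration; they do not come from the study's data:

```java
// Hypothetical sketch: mutation score computed over the killable
// mutant set (generated mutants minus equivalent ones).
public class MutationScore {
    static double score(int generated, int equivalent, int killed) {
        int killable = generated - equivalent; // the killable mutant set
        return (double) killed / killable;
    }

    public static void main(String[] args) {
        // Illustrative numbers: 200 generated, 20 equivalent, 90 killed.
        System.out.println(score(200, 20, 90)); // 90 / 180 = 0.5
    }
}
```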
We chose to generate only one test suite with Randoop because it generates far more tests than EvoSuite.
The RIP model states that a mutant is killed by a test case that reaches (executes) the mutated statement, causes an infection of the program state (the mutant and the original program are in different states at the mutated point), and propagates that infection to the program output (the differing program states result in different outputs).
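The three RIP conditions can be illustrated with a minimal sketch. The class, method names, and mutant below are hypothetical (an arithmetic-operator mutation replacing + with -), not an example taken from the study:

```java
// Hypothetical sketch of the RIP (Reach, Infect, Propagate) conditions.
// The "mutant" simulates an arithmetic-operator mutation: '+' becomes '-'.
public class RipDemo {
    // Original program under test.
    static int original(int a, int b) {
        return Math.abs(a + b);
    }

    // Mutated version: the '+' in the returned expression is now '-'.
    static int mutant(int a, int b) {
        return Math.abs(a - b);
    }

    public static void main(String[] args) {
        // Reach without infection: with b == 0, a + b and a - b are
        // equal, so executing the mutated statement changes nothing.
        System.out.println(original(5, 0) == mutant(5, 0)); // true: survives

        // Infection without propagation: a + b = 3 vs a - b = -3 differ
        // at the mutated point, but Math.abs masks the difference.
        System.out.println(original(0, 3) == mutant(0, 3)); // true: survives

        // Reach, infect, and propagate: |2 + 3| = 5 vs |2 - 3| = 1,
        // so the outputs differ and the mutant is killed.
        System.out.println(original(2, 3) != mutant(2, 3)); // true: killed
    }
}
```

Only the third test case satisfies all three conditions and therefore kills the mutant under strong mutation.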
Acknowledgements
Marinos Kintis and Nicos Malevris are partly supported by the Research Centre of Athens University of Economics and Business (RC AUEB).
Additional information
Communicated by: Michaela Greiler and Gabriele Bavota
Appendix A
Cite this article
Kintis, M., Papadakis, M., Papadopoulos, A. et al. How effective are mutation testing tools? An empirical analysis of Java mutation testing tools with manual analysis and real faults. Empir Software Eng 23, 2426–2463 (2018). https://doi.org/10.1007/s10664-017-9582-5