
How effective are mutation testing tools? An empirical analysis of Java mutation testing tools with manual analysis and real faults

Empirical Software Engineering

Abstract

Mutation analysis is a well-studied, fault-based testing technique. It requires testers to design tests based on a set of artificial defects, called mutants. The defects support the testing process by measuring the ratio of defects that the candidate tests reveal, known as the mutation score. Unfortunately, applying mutation to real-world programs requires automated tools due to the vast number of defects involved, and in that case the effectiveness of the method strongly depends on the peculiarities of the employed tools: their implementation inadequacies can lead to inaccurate results. To deal with this issue, we cross-evaluate four mutation testing tools for Java, namely PIT, muJava, Major, and the research version of PIT, PITRV, with respect to their fault-detection capabilities. We investigate the strengths of the tools based on: a) a set of real faults and b) manual analysis of the mutants they introduce. We find that there are large differences between the tools' effectiveness and demonstrate that no tool subsumes the others. We also provide results indicating the application cost of the method. Overall, we find that PITRV achieves the best results; in particular, PITRV finds 6% more faults than the other tools combined.
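As background for the abstract's terminology, the following self-contained Java sketch illustrates what a single mutant looks like and why test quality determines whether it is killed. The method, the mutant, and the test inputs are hypothetical examples written for this summary, not code from the paper or from any of the studied tools; the mutation shown is a conditional-boundary replacement of the kind tools such as PIT apply.

    // MutationDemo.java -- minimal sketch of mutation analysis.
    // Method, mutant, and inputs are hypothetical, not from the paper.
    public class MutationDemo {

        // Original method under test.
        static boolean isAdult(int age) {
            return age >= 18;
        }

        // A conditional-boundary mutant: '>=' replaced with '>'.
        static boolean isAdultMutant(int age) {
            return age > 18;
        }

        public static void main(String[] args) {
            // Weak test input: both versions return true, so a test using
            // it cannot kill (reveal) this mutant.
            System.out.println(isAdult(30) + " vs " + isAdultMutant(30)); // true vs true

            // Boundary test input: the versions disagree, so a test
            // asserting that isAdult(18) is true kills the mutant.
            System.out.println(isAdult(18) + " vs " + isAdultMutant(18)); // true vs false
        }
    }

The mutation score such tools report is then the ratio of killed mutants to killable mutants (see Note 1 below).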


Notes

  1. The killable mutant set is the set of all the generated mutants excluding the equivalent ones.

  2. https://github.com/AlDanial/cloc

  3. https://github.com/rjust/defects4j

  4. http://www.jacoco.org/

  5. The reason we chose to generate only one test suite with Randoop is that it generates far more tests than EvoSuite.

  6. The RIP model states that a mutant is killed by a test case that Reaches (executes) the mutated statement, causes an Infection of the program state (the mutant and the original program are in different states at the mutated point), and Propagates the infection to the program output (the differing program states result in different outputs); see the sketch after these notes.
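To make Notes 1 and 6 concrete, here is a second hypothetical sketch (illustrative code, not taken from the paper). The mutant below is reached and infects the program state, but the infection never propagates to the output, so no test can kill it; under Note 1, such an equivalent mutant is excluded from the killable mutant set.

    // RipDemo.java -- sketch of the RIP conditions (Reach, Infect, Propagate).
    // Method names and the mutation are hypothetical illustrations.
    public class RipDemo {

        // Original: squares the absolute value of x.
        static int squareOfAbs(int x) {
            int y = Math.abs(x);  // the statement targeted by the mutation
            return y * y;
        }

        // Mutant: Math.abs(x) replaced by -x.
        static int squareOfAbsMutant(int x) {
            int y = -x;
            return y * y;
        }

        public static void main(String[] args) {
            // For x = 5 the mutated statement is Reached, and the state is
            // Infected (y is 5 in the original but -5 in the mutant), yet the
            // infection does not Propagate: (-5)*(-5) == 5*5 == 25. The outputs
            // agree for every input, so the mutant is equivalent and unkillable.
            System.out.println(squareOfAbs(5) + " == " + squareOfAbsMutant(5));
        }
    }

This is the motivation for measuring tools against the killable mutant set of Note 1 rather than against all generated mutants.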


Acknowledgements

Marinos Kintis and Nicos Malevris are partly supported by the Research Centre of Athens University of Economics and Business (RC AUEB).

Author information

Correspondence to Marinos Kintis.

Additional information

Communicated by: Michaela Greiler and Gabriele Bavota

Appendix A

Table 10 Fault revelation of Major, PITRV, PIT, and muJava on the real faults studied


Cite this article

Kintis, M., Papadakis, M., Papadopoulos, A. et al. How effective are mutation testing tools? An empirical analysis of Java mutation testing tools with manual analysis and real faults. Empir Software Eng 23, 2426–2463 (2018). https://doi.org/10.1007/s10664-017-9582-5
