
How effective are mutation testing tools? An empirical analysis of Java mutation testing tools with manual analysis and real faults

Empirical Software Engineering

Abstract

Mutation analysis is a well-studied, fault-based testing technique. It requires testers to design tests based on a set of artificial defects, called mutants. The defects support the testing process by measuring the ratio of defects that the candidate tests reveal, known as the mutation score. Unfortunately, applying mutation to real-world programs requires automated tools due to the vast number of defects involved, and in that case the effectiveness of the method strongly depends on the peculiarities of the employed tools: their implementation inadequacies can lead to inaccurate results. To deal with this issue, we cross-evaluate four mutation testing tools for Java, namely PIT, muJava, Major, and the research version of PIT, PITRV, with respect to their fault-detection capabilities. We investigate the strengths of the tools based on: a) a set of real faults and b) manual analysis of the mutants they introduce. We find that there are large differences between the tools' effectiveness and demonstrate that no tool subsumes the others. We also provide results indicating the application cost of the method. Overall, we find that PITRV achieves the best results; in particular, PITRV finds 6% more faults than the other tools combined.
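As background for the abstract's terminology, the following self-contained Java sketch illustrates what a single mutant looks like and why test quality determines whether it is killed. The method, the mutant, and the test inputs are hypothetical examples written for this summary, not code from the paper or from any of the studied tools; the mutation shown is a conditional-boundary replacement of the kind tools such as PIT apply.

    // MutationDemo.java -- minimal sketch of mutation analysis.
    // Method, mutant, and inputs are hypothetical, not from the paper.
    public class MutationDemo {

        // Original method under test.
        static boolean isAdult(int age) {
            return age >= 18;
        }

        // A conditional-boundary mutant: '>=' replaced with '>'.
        static boolean isAdultMutant(int age) {
            return age > 18;
        }

        public static void main(String[] args) {
            // Weak test input: both versions return true, so a test using
            // it cannot kill (reveal) this mutant.
            System.out.println(isAdult(30) + " vs " + isAdultMutant(30)); // true vs true

            // Boundary test input: the versions disagree, so a test
            // asserting that isAdult(18) is true kills the mutant.
            System.out.println(isAdult(18) + " vs " + isAdultMutant(18)); // true vs false
        }
    }

The mutation score such tools report is then the ratio of killed mutants to killable mutants (see Note 1 below).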


Notes

  1. The killable mutant set is the set of all the generated mutants excluding the equivalent ones.

  2. https://github.com/AlDanial/cloc

  3. https://github.com/rjust/defects4j

  4. http://www.jacoco.org/

  5. The reason we chose to generate only one test suite with Randoop is that it generates far more tests than EvoSuite.

  6. The RIP model states that a mutant is killed by a test case that Reaches (executes) the mutated statement, causes an Infection of the program state (the mutant and the original program are in different states at the mutated point), and Propagates the infection to the program output (the differing program states result in different outputs); see the sketch after these notes.
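To make Notes 1 and 6 concrete, here is a second hypothetical sketch (illustrative code, not taken from the paper). The mutant below is reached and infects the program state, but the infection never propagates to the output, so no test can kill it; under Note 1, such an equivalent mutant is excluded from the killable mutant set.

    // RipDemo.java -- sketch of the RIP conditions (Reach, Infect, Propagate).
    // Method names and the mutation are hypothetical illustrations.
    public class RipDemo {

        // Original: squares the absolute value of x.
        static int squareOfAbs(int x) {
            int y = Math.abs(x);  // the statement targeted by the mutation
            return y * y;
        }

        // Mutant: Math.abs(x) replaced by -x.
        static int squareOfAbsMutant(int x) {
            int y = -x;
            return y * y;
        }

        public static void main(String[] args) {
            // For x = 5 the mutated statement is Reached, and the state is
            // Infected (y is 5 in the original but -5 in the mutant), yet the
            // infection does not Propagate: (-5)*(-5) == 5*5 == 25. The outputs
            // agree for every input, so the mutant is equivalent and unkillable.
            System.out.println(squareOfAbs(5) + " == " + squareOfAbsMutant(5));
        }
    }

This is the motivation for measuring tools against the killable mutant set of Note 1 rather than against all generated mutants.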


Acknowledgements

Marinos Kintis and Nicos Malevris are partly supported by the Research Centre of Athens University of Economics and Business (RC AUEB).

Author information

Correspondence to Marinos Kintis.

Additional information

Communicated by: Michaela Greiler and Gabriele Bavota

Appendix A

Table 10 Fault revelation of Major, PITRV, PIT, and muJava on the real faults studied


Cite this article

Kintis, M., Papadakis, M., Papadopoulos, A. et al. How effective are mutation testing tools? An empirical analysis of Java mutation testing tools with manual analysis and real faults. Empir Software Eng 23, 2426–2463 (2018). https://doi.org/10.1007/s10664-017-9582-5
