
Empirical Software Engineering, Volume 23, Issue 4, pp 2426–2463

How effective are mutation testing tools? An empirical analysis of Java mutation testing tools with manual analysis and real faults

  • Marinos Kintis
  • Mike Papadakis
  • Andreas Papadopoulos
  • Evangelos Valvis
  • Nicos Malevris
  • Yves Le Traon

Abstract

Mutation analysis is a well-studied, fault-based testing technique. It requires testers to design tests based on a set of artificial defects (mutants) that are seeded into the program under test. These defects support testing by providing an effectiveness measure: the ratio of the defects that the candidate tests reveal. Unfortunately, applying mutation analysis to real-world programs requires automated tools, owing to the vast number of defects involved, and in that case the effectiveness of the method depends strongly on the peculiarities of the employed tools; their implementation inadequacies can lead to inaccurate results. To deal with this issue, we cross-evaluate four mutation testing tools for Java, namely PIT, muJava, Major and PITRV (the research version of PIT), with respect to their fault-detection capabilities. We investigate the strengths of the tools based on (a) a set of real faults and (b) manual analysis of the mutants they introduce. We find large differences between the tools' effectiveness and demonstrate that no tool subsumes the others. We also provide results indicating the application cost of the method. Overall, we find that PITRV achieves the best results: it outperforms the other tools, finding 6% more faults than the other tools combined.
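
To make the abstract's terminology concrete, the sketch below (ours, not taken from the article; the class and method names are hypothetical) shows the kind of artificial defect, a mutant, that tools such as PIT, muJava, Major and PITRV seed into a program, and how a test input can "kill" it:

    // A minimal Java sketch of mutation analysis (illustrative only).
    public class MutationSketch {

        // Original method under test.
        static boolean isPositive(int x) {
            return x > 0;
        }

        // A relational-operator mutant: ">" replaced by ">=".
        static boolean isPositiveMutant(int x) {
            return x >= 0;
        }

        public static void main(String[] args) {
            // The input x = 0 distinguishes the mutant from the original,
            // so a test asserting isPositive(0) == false "kills" the mutant.
            System.out.println("original: " + isPositive(0));       // false
            System.out.println("mutant:   " + isPositiveMutant(0)); // true
        }
    }

The ratio of killed mutants to all generated mutants is the effectiveness measure mentioned above (the mutation score); the tools compared in this article differ in which mutants they generate, which is why their fault-detection capabilities can differ.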

Keywords

Mutation testing · Fault detection · Tool comparison · Human study · Real faults

Notes

Acknowledgements

Marinos Kintis and Nicos Malevris are partly supported by the Research Centre of Athens University of Economics and Business (RC AUEB).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg, Luxembourg
  2. Department of Informatics, Athens University of Economics and Business, Athens, Greece
