Software Quality Journal, Volume 25, Issue 3, pp 871–920

Does choice of mutation tool matter?

  • Rahul Gopinath
  • Iftekhar Ahmed
  • Mohammad Amin Alipour
  • Carlos Jensen
  • Alex Groce

Abstract

Though mutation analysis is the primary means of evaluating the quality of test suites, it suffers from inadequate standardization. Mutation analysis tools vary based on language, the phase of compilation at which mutants are generated, and target audience. Mutation tools rarely implement the complete set of operators proposed in the literature, and most implement at least a few domain-specific mutation operators. Thus different tools may not always agree on the mutant kills of a test suite. Few criteria exist to guide a practitioner in choosing the right tool, either for evaluating the effectiveness of a test suite or for comparing different testing techniques. We investigate an ensemble of measures for evaluating the efficacy of mutants produced by different tools. These include the traditional difficulty of detection, the strength of minimal sets, and the diversity of mutants, as well as the information carried by the mutants produced. We find that mutation tools rarely agree. The disagreement between scores can be large, and the variation due to characteristics of the project, even after accounting for differences due to test suites, is a significant factor. However, the mean difference between tools is very small, indicating that no single tool consistently skews mutation scores high or low for all projects. These results suggest that experiments yielding small differences in mutation score, especially those using a single tool or a small number of projects, may not be reliable. There is a clear need for greater standardization of mutation analysis. We propose one approach for such a standardization.
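As background for the abstract's claim that tools "may not always agree on the mutant kills of a test suite": a mutation score is the fraction of generated mutants that the test suite kills, so two tools that generate different mutant sets for the same program can report different scores for the same suite. The sketch below is illustrative only and is not from the paper; the tool names and kill matrices are hypothetical.

```python
# Minimal sketch (illustrative, not from the paper). A "mutant" is a small
# syntactic variant of the program under test; a test suite "kills" a mutant
# if at least one test fails when run against it.

def mutation_score(kill_matrix):
    """kill_matrix maps a mutant id to the set of tests that kill it.

    The score is the fraction of mutants killed by at least one test.
    """
    killed = sum(1 for tests in kill_matrix.values() if tests)
    return killed / len(kill_matrix)

# Two hypothetical tools mutating the same program produce different mutant
# sets, and hence different scores for the *same* test suite {t1, t2, t3}:
tool_a = {"m1": {"t1"}, "m2": set(), "m3": {"t2", "t3"}}
tool_b = {"n1": {"t1"}, "n2": set(), "n3": set(), "n4": {"t3"}}

print(mutation_score(tool_a))  # 2 of 3 mutants killed ≈ 0.667
print(mutation_score(tool_b))  # 2 of 4 mutants killed = 0.5
```

This is the sense in which a small observed difference in mutation score between two techniques can be an artifact of the tool's operator set rather than of the test suites being compared.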

Keywords

Mutation analysis · Empirical analysis · Software testing


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Rahul Gopinath (1)
  • Iftekhar Ahmed (2)
  • Mohammad Amin Alipour (2)
  • Carlos Jensen (2)
  • Alex Groce (2)

  1. EECS Department, Oregon State University, Corvallis, USA
  2. Oregon State University, Corvallis, USA
