Empirical Software Engineering, Volume 23, Issue 5, pp 2948–2979

A correlation study between automated program repair and test-suite metrics

  • Jooyong Yi
  • Shin Hwei Tan
  • Sergey Mechtaev
  • Marcel Böhme
  • Abhik Roychoudhury

Abstract

Automated program repair is gaining traction because of its potential to greatly reduce debugging cost. The feasibility of automated program repair has been shown in a number of works, and the research focus is gradually shifting toward the quality of generated patches. One promising direction is to control the quality of generated patches by controlling the quality of the test suites used for automated program repair. In this paper, we ask the following research question: “Can traditional test-suite metrics proposed for the purpose of software testing also be used for the purpose of automated program repair?” We empirically investigate whether traditional test-suite metrics such as statement/branch coverage and mutation score are effective in controlling the reliability of generated repairs (the likelihood that repairs cause regression errors). We conduct the largest-scale experiments of this kind to date with real-world software, and for the first time perform a correlation study between various test-suite metrics and the reliability of generated repairs. Our results show that, in general, as traditional test-suite metrics increase, the reliability of repairs tends to increase. In particular, this trend is most strongly observed for statement coverage. Our results imply that the traditional test-suite metrics proposed for software testing can also be used for automated program repair to improve the reliability of repairs.
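
As an illustration of what a correlation study of this kind computes, the following minimal sketch measures the rank correlation between a test-suite metric and a regression measure. It is not the authors' experimental code: the choice of Kendall's tau-b, the function name `kendall_tau_b`, the variable names, and all numbers are assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation, with the standard correction for ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominator terms
        elif dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1  # both variables move in the same direction
        else:
            discordant += 1  # variables move in opposite directions
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("correlation undefined: one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical data (NOT from the paper): statement coverage of five test
# suites, and the fraction of regression tests failed by the repairs
# generated with each suite.
statement_coverage = [0.2, 0.4, 0.6, 0.8, 1.0]
regression_ratio = [0.5, 0.4, 0.4, 0.2, 0.1]

tau = kendall_tau_b(statement_coverage, regression_ratio)
print(f"Kendall tau-b = {tau:.3f}")
```

With this made-up data the coefficient is negative: suites with higher coverage go with fewer regressions, which is the direction of the trend the abstract reports. A rank correlation is used here because it only assumes a monotonic relationship, not a linear one.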

Keywords

Automated program repair; Test suite; Empirical evaluation; Correlation

Notes

Acknowledgements

This research is supported in part by the National Research Foundation, Prime Minister’s Office, Singapore under its National Cybersecurity R&D Program (TSUNAMi project, Award No. NRF2014NCR-NCR001-21) and administered by the National Cybersecurity R&D Directorate. The first author thanks Innopolis University for its support.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Institute of Technologies and Software Development, Innopolis University, Innopolis, Russia
  2. School of Computing, National University of Singapore, Singapore, Singapore