Advertisement

Empirical Software Engineering

, Volume 23, Issue 5, pp 3007–3033 | Cite as

Overfitting in semantics-based automated program repair

  • Xuan Bach D. LeEmail author
  • Ferdian Thung
  • David Lo
  • Claire Le Goues
Article

Abstract

The primary goal of Automated Program Repair (APR) is to automatically fix buggy software, to reduce the manual bug-fix burden that presently rests on human developers. Existing APR techniques can be generally divided into two families: semantics- vs. heuristics-based. Semantics-based APR uses symbolic execution and test suites to extract semantic constraints, and uses program synthesis to synthesize repairs that satisfy the extracted constraints. Heuristic-based APR generates large populations of repair candidates via source manipulation, and searches for the best among them. Both families largely rely on a primary assumption that a program is correctly patched if the generated patch leads the program to pass all provided test cases. Patch correctness is thus an especially pressing concern. A repair technique may generate overfitting patches, which lead a program to pass all existing test cases, but fails to generalize beyond them. In this work, we revisit the overfitting problem with a focus on semantics-based APR techniques, complementing previous studies of the overfitting problem in heuristics-based APR. We perform our study using IntroClass and Codeflaws benchmarks, two datasets well-suited for assessing repair quality, to systematically characterize and understand the nature of overfitting in semantics-based APR. We find that similar to heuristics-based APR, overfitting also occurs in semantics-based APR in various different ways.

Keywords

Automated program repair Program synthesis Symbolic execution Patch overfitting 

Notes

Acknowledgements

We thank the authors of Angelix for making the tool publicly available. We also thank the authors of CVC4 and Enumerative synthesis engines for making these engines accessible.

References

  1. Abreu R, Zoeteweij P, Van Gemund AJ (2007) On the accuracy of spectrum-based fault localization. In: Testing: academic and industrial conference practice and research techniques-MUTATION, 2007. TAICPART-MUTATION 2007. IEEE, pp 89–98Google Scholar
  2. Alur R, Bodik R, Juniwal G, Martin MM, Raghothaman M, Seshia SA, Singh R, Solar-Lezama A, Torlak E, Udupa A (2015) Syntax-guided synthesis. Dependable Software Systems Engineering 40:1–25Google Scholar
  3. Cadar C, Dunbar D, Engler DR et al. (2008) Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Operating systems design and implementation, OSDI, pp 209–224Google Scholar
  4. DeMarco F, Xuan J, Le Berre D, Monperrus M (2014) Automatic repair of buggy if conditions and missing preconditions with SMT. In: Proceedings of the 6th international workshop on constraints in software testing, verification, and analysis, pp 30–39Google Scholar
  5. Fry ZP, Landau B, Weimer W (2012) A human study of patch maintainability. In: International symposium on software testing and analysis (ISSTA), pp 177–187Google Scholar
  6. Gulwani S, Esparza J, Grumberg O, Sickert S (2016) Programming by examples (and its applications in data wrangling). Verification and synthesis of correct and secure systemsGoogle Scholar
  7. Jha S, Gulwani S, Seshia SA, Tiwari A (2010) Oracle-guided component-based program synthesis. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, ICSE, pp 215–224Google Scholar
  8. Just R, Jalali D, Ernst MD (2014) Defects4j: a database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the 2014 international symposium on software testing and analysis, ISSTA, pp 437–440Google Scholar
  9. Ke Y, Stolee KT, Le Goues C, Brun Y (2015) Repairing programs with semantic code search. In: Proceedings of the 30th IEEE/ACM international conference on automated software engineering, pp 295–306Google Scholar
  10. Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: ACM/IEEE international conference on software engineering, ICSE, pp 802–811Google Scholar
  11. Könighofer R, Bloem R (2011) Automated error localization and correction for imperative programs. In: Formal methods in computer-aided design, IEEE, FMCAD, pp 91–100Google Scholar
  12. Könighofer R, Bloem R (2012) Repair with on-the-fly program analysis. In: Haifa verification conference. Springer, pp 56–71Google Scholar
  13. Le XBD (2016) Towards efficient and effective automatic program repair. In: International conference on automated software engineering (ASE). IEEE, pp 876–879Google Scholar
  14. Le TDB, Le XBD, Lo D, Beschastnikh I (2015a) Synergizing specification miners through model fissions and fusions. In: International conference on automated software engineering (ASE). IEEE, pp 115–125Google Scholar
  15. Le X BD, Le TDB, Lo D (2015b) Should fixing these failures be delegated to automated program repair?. In: International symposium on software reliability engineering (ISSRE). IEEE, pp 427–437Google Scholar
  16. Le X BD, Le Q L, Lo D, Le Goues C (2016a) Enhancing automated program repair with deductive verification. In: International conference on software maintenance and evolution (ICSME). IEEE, pp 428–432Google Scholar
  17. Le X BD, Lo D, Le Goues C (2016b) Empirical study on synthesis engines for semantics-based program repair. In: International conference on software maintenance and evolution (ICSME). IEEE, pp 423–427Google Scholar
  18. Le X BD, Lo D, Le Goues C (2016c) History driven program repair. In: Proceedings of the 23rd IEEE international conference on software analysis, evolution, and reengineering, SANER, pp 213–224Google Scholar
  19. Le X BD, Chu D H, David L, Le Goues C (2017a) Jfix: sematics-based repair of java programs via symbolic pathfinder. In: Proceedings of the 2017 ACM international symposium on software testing and analysis, ISSTAGoogle Scholar
  20. Le X BD, Chu DH, Lo D, Le Goues C, Visser W (2017b) S3: syntax-and semantic-guided repair synthesis via programming by examples. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 593–604Google Scholar
  21. Le Goues C, Dewey-Vogt M, Forrest S, Weimer W (2012) A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In: 2012 34th international conference on software engineering (ICSE). IEEE, pp 3–13Google Scholar
  22. Le Goues C, Nguyen T, Forrest S, Weimer W (2012) Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72CrossRefGoogle Scholar
  23. Le Goues C, Holtschulte N, Smith EK, Brun Y, Devanbu P, Forrest S, Weimer W (2015) The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Trans Softw Eng 41(12):1236–1256CrossRefGoogle Scholar
  24. Logozzo F, Ball T (2012) Modular and verified automatic program repair. SIGPLAN Not 47(10):133–146CrossRefGoogle Scholar
  25. Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 166–178Google Scholar
  26. Long F, Rinard M (2016a) An analysis of the search spaces for generate and validate patch generation systems. In: Proceedings of the 38th international conference on software engineering. ACM, pp 702–713Google Scholar
  27. Long F, Rinard M (2016b) Automatic patch generation by learning correct code. In: ACM SIGPLAN notices, vol 51. ACM, pp 298–312Google Scholar
  28. Martinez M, Monperrus M (2015) Mining software repair models for reasoning on the search space of automated program fixing. Empir Softw Eng 20(1):176–205CrossRefGoogle Scholar
  29. Mechtaev S, Yi J, Roychoudhury A (2015) Directfix: Looking for simple program repairs. In: Proceedings of the 37th ACM/IEEE international conference on software engineering, ICSE, pp 448–458Google Scholar
  30. Mechtaev S, Yi J, Roychoudhury A (2016) Angelix: scalable multiline program patch synthesis via symbolic analysis. In: Proceedings of the 38th ACM/IEEE international conference on software engineering, ICSE, pp 691–701Google Scholar
  31. Nguyen HD T, Qi D, Roychoudhury A, Chandra S (2013) Semfix: Program repair via semantic analysis. In: Proceedings of the 2013 international conference on software engineering, ICSE, pp 772–781Google Scholar
  32. Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th international conference on software engineering, ACM, ICSE, pp 254–265Google Scholar
  33. Qi Z, Long F, Achour S, Rinard M (2015) An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In: Proceedings of the 2015 international symposium on software testing and analysis, ISSTA, pp 24–36Google Scholar
  34. Reynolds A, Deters M, Kuncak V, Tinelli C, Barrett C (2015) Counterexample-guided quantifier instantiation for synthesis in SMT. In: International conference on computer aided verification. Springer, pp 198–216Google Scholar
  35. Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, ACM, ESEC/FSE, pp 532–543Google Scholar
  36. Tan SH, Yi J, Yulis, Mechtaev S, Roychoudhury A (2017) Codeflaws: a programming competition benchmark for evaluating automated program repair tools. In: ICSE Poster, to appearGoogle Scholar
  37. Tassey G (2002) The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology, RTI project 7007(011)Google Scholar
  38. Thung F, Le X BD, Lo D (2015) Active semi-supervised defect categorization. In: International conference on program comprehension (ICPC). IEEE Press, pp 60–70Google Scholar
  39. Weimer W, Forrest S, Le Goues C, Nguyen T (2010) Automatic program repair with evolutionary computation. Commun ACM 53(5):109–116CrossRefGoogle Scholar
  40. Weimer W, Fry ZP, Forrest S (2013) Leveraging program equivalence for adaptive program repair: models and first results. In: Proceedings of the 28th international conference on automated software engineering, IEEE, ASE, pp 356–366Google Scholar
  41. Xuan J, Martinez M, DeMarco F, Clément M, Lamelas S, Durieux T, Le Berre D, Monperrus M (2016) Nopol: Automatic repair of conditional statement bugs in java programs. IEEE Trans Softw Eng  https://doi.org/10.1109/TSE.2016.2560811. https://hal.archives-ouvertes.fr/hal-01285008/document

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.School of Information SystemsSingapore Management UniversitySingaporeSingapore
  2. 2.School of Computer ScienceCarnegie Mellon UniversityPittsburghUSA

Personalised recommendations