A manual inspection of Defects4J bugs and its implications for automatic program repair

Abstract

Automatic program repair techniques, which target to generate correct patches for real-world defects automatically, have gained a lot of attention in the last decade. Many different techniques and tools have been proposed and developed. However, even the most sophisticated automatic program repair techniques can only repair a small portion of defects while producing a large number of incorrect patches. A possible reason for the low performance is the test suites of real-world programs are usually too weak to guarantee the behavior of a program. To understand to what extent defects can be fixed with exiting test suites, we manually analyzed 50 real-world defects from Defects4J, where a large portion (i.e., 82%) of them were correctly fixed. This result suggests that there is much room for the current automatic program repair techniques to improve. Furthermore, we summarized seven fault localization and seven patch generation strategies that are useful in localizing and fixing these defects, and compared those strategies with current techniques. The results indicate potential directions to improve automatic program repair in the future.

This is a preview of subscription content, access via your institution.

References

  1. 1

    Mei H, Zhang L. Can big data bring a breakthrough for software automation? Sci China Inf Sci, 2018, 61: 056101

    Article  Google Scholar 

  2. 2

    Le Goues C, Nguyen T V, Forrest S, et al. Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng, 2012, 38: 54–72

    Article  Google Scholar 

  3. 3

    Long F, Rinard M. Staged program repair with condition synthesis. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2015. 166–178

    Google Scholar 

  4. 4

    Xiong Y F, Wang J, Yan R F, et al. Precise condition synthesis for program repair. In: Proceedings of the 39th International Conference on Software Engineering. New York: IEEE, 2017. 416–426

    Google Scholar 

  5. 5

    Mechtaev S, Yi J, Roychoudhury A. Angelix: scalable multiline program patch synthesis via symbolic analysis. In: Proceedings of the 38th International Conference on Software Engineering. New York: ACM, 2016. 691–701

    Google Scholar 

  6. 6

    Kim D, Nam J, Song J, et al. Automatic patch generation learned from human-written patches. In: Proceedings of the 2013 International Conference on Software Engineering. New York: IEEE, 2013. 802–811

    Google Scholar 

  7. 7

    Thien Nguyen H D, Qi D, Roychoudhury A, et al. Semfix: program repair via semantic analysis. In: Proceedings of the 2013 International Conference on Software Engineering. New York: IEEE, 2013. 772–781

    Google Scholar 

  8. 8

    Mechtaev S, Yi J, Roychoudhury A. Directfix: looking for simple program repairs. In: Proceedings of the 37th International Conference on Software Engineering. New York: IEEE, 2015. 448–458

    Google Scholar 

  9. 9

    Gao Q, Zhang H S, Wang J, et al. Fixing recurring crash bugs via analyzing q&a sites (t). In: Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering, 2015. 307–318

  10. 10

    Long F, Amidon P, Rinard M. Automatic inference of code transforms for patch generation. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2017. 727–739

    Google Scholar 

  11. 11

    Rolim R, Soares G, Dantoni L, et al. Learning syntactic program transformations from examples. In: Proceedings of the 39th International Conference on Software Engineering. New York: IEEE, 2017. 404–415

    Google Scholar 

  12. 12

    Long F, Rinard M. Automatic patch generation by learning correct code. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. New York: ACM, 2016. 298–312

    Google Scholar 

  13. 13

    Abreu R, Zoeteweij P, van Gemund A J. On the accuracy of spectrum-based fault localization. In: Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques, 2017. 89–98

  14. 14

    Abreu R, Zoeteweij P, van Gemund A J. Spectrum-based multiple fault localization. In: Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, 2009. 88–99

  15. 15

    Zhang X Y, Gupta N, Gupta R. Locating faults through automated predicate switching. In: Proceedings of the 28th International Conference on Software Engineering. New York: ACM, 2006. 272–281

    Google Scholar 

  16. 16

    Chandra S, Torlak E, Barman S, et al. Angelic debugging. In: Proceedings of the 33rd International Conference on Software Engineering. New York: ACM, 2011. 121–130

    Google Scholar 

  17. 17

    Marcote S L, Durieux T, Le Berre D. Nopol: automatic repair of conditional statement bugs in java programs. IEEE Trans Softw Eng, 2016, 43: 34–55

    Google Scholar 

  18. 18

    Perkins J H, Kim S, Larsen S, et al. Automatically patching errors in deployed software. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. New York: ACM, 2009. 87–102

    Google Scholar 

  19. 19

    Wen M, Chen J J, Wu R X, et al. Context-aware patch generation for better automated program repair. In: Proceedings of the 40th International Conference on Software Engineering. New York: ACM, 2018. 1–11

    Google Scholar 

  20. 20

    Jiang J J, Xiong Y F, Zhang H Y, et al. Shaping program repair space with existing patches and similar code. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2018. 298–309

    Google Scholar 

  21. 21

    Liu C, Yang J Q, Tan L, et al. R2fix: automatically generating bug fixes from bug reports. In: Proceedings of the 2013 IEEE 6th International Conference on Software Testing, Verification and Validation, 2013. 282–291

  22. 22

    Le Goues C, Dewey-Vogt M, Forrest S, et al. A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In: Proceedings of the 34th International Conference on Software Engineering. New York: IEEE, 2012. 3–13

    Google Scholar 

  23. 23

    Just R, Jalali D, Ernst M D. Defects4j: a database of existing faults to enable controlled testing studies for java programs. In: Proceedings of International Symposium on Software Testing and Analysis. New York: ACM, 2014. 437–440

    Google Scholar 

  24. 24

    Qi Z, Long F, Achour S, et al. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In: Proceedings of the 2015 International Symposium on Software Testing and Analysis. New York: ACM, 2015. 24–36

    Google Scholar 

  25. 25

    Martinez M, Durieux T, Sommerard R, et al. Automatic repair of real bugs in java: a large-scale experiment on the Defects4J dataset. Empir Softw Eng, 2017, 22: 1936–1964

    Article  Google Scholar 

  26. 26

    Xiong Y F, Liu X Y, Zeng M H, et al. Identifying patch correctness in test-based program repair. In: Proceedings of the 40th International Conference on Software Engineering. New York: ACM, 2018. 789–799

    Google Scholar 

  27. 27

    Chen L S, Pei Y, Furia C A. Contract-based program repair without the contracts. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. New York: IEEE, 2017. 637–647

    Google Scholar 

  28. 28

    Saha R K, Lyu Y J, Yoshida H, et al. Elixir: effective object oriented program repair. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. New York: IEEE, 2017. 648–659

    Google Scholar 

  29. 29

    Smith E K, Barr E T, Le Goues C, et al. Is the cure worse than the disease? overfitting in automated program repair. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2015. 532–543

    Google Scholar 

  30. 30

    Long F, Rinard M. An analysis of the search spaces for generate and validate patch generation systems. In: Proceedings of the 38th International Conference on Software Engineering. New York: ACM, 2016. 702–713

    Google Scholar 

  31. 31

    Zhong H, Su Z D. An empirical study on real bug fixes. In: Proceedings of the 37th International Conference on Software Engineering. New York: IEEE, 2015. 913–923

    Google Scholar 

  32. 32

    Tan S H, Yoshida H, Prasad M R, et al. Anti-patterns in search-based program repair. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. New York: ACM, 2016. 727–738

    Google Scholar 

  33. 33

    Dantoni L, Samanta R, Singh R. Qlose: program repair with quantiative objectives. In: Proceedings of International Conference on Computer Aided Verification. Berlin: Springer, 2016. 383–401

    Google Scholar 

  34. 34

    Wei Y, Pei Y, Furia C A, et al. Automated fixing of programs with contracts. In: Proceedings of the 19th International Symposium on Software Testing and Analysis. New York: ACM, 2010. 61–72

    Google Scholar 

  35. 35

    Gao Q, Xiong Y F, Mi Y Q, et al. Safe memory-leak fixing for c programs. In: Proceedings of IEEE/ACM 37th IEEE International Conference on Software Engineering, 2015. 459–470

  36. 36

    Cai Y, Cao L W. Fixing deadlocks via lock pre-acquisitions. In: Proceedings of the 38th International Conference on Software Engineering. New York: ACM, 2016. 1109–1120

    Google Scholar 

  37. 37

    Hassan F, Wang X Y. Hirebuild: an automatic approach to history-driven repair of build scripts. In: Proceedings of the 40th International Conference on Software Engineering. New York: ACM, 2018. 1078–1089

    Google Scholar 

  38. 38

    Martinez M, Monperrus M. Mining software repair models for reasoning on the search space of automated program fixing. Empir Softw Eng, 2015, 20: 176–205

    Article  Google Scholar 

  39. 39

    Soto M, Thung F, Wong C P, et al. A deeper look into bug fixes: patterns, replacements, deletions, and additions. In: Proceedings of the 13th International Workshop on Mining Software Repositories, 2016. 512–515

  40. 40

    Yang J Q, Zhikhartsev A, Liu Y F, et al. Better test cases for better automated program repair. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2017. 831–841

    Google Scholar 

  41. 41

    Baudry B, Fleurey F, Le Traon Y. Improving test suites for efficient fault localization. In: Proceedings of the 28th International Conference on Software Engineering. New York: ACM, 2006. 82–91

    Google Scholar 

  42. 42

    Artzi S, Dolby J, Tip F, et al. Directed test generation for effective fault localization. In: Proceedings of the 19th International Symposium on Software Testing and Analysis. New York: ACM, 2010. 49–60

    Google Scholar 

  43. 43

    Yang D H, Qi Y H, Mao X G. Evaluating the strategies of statement selection in automated program repair. In: Proceedings of International Conference on Software Analysis, Testing, and Evolution. Berlin: Springer, 2018. 33–48

    Google Scholar 

  44. 44

    Tao Y D, Kim J, Kim S, et al. Automatically generated patches as debugging aids: a human study. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. New York: ACM, 2014. 64–74

    Google Scholar 

  45. 45

    Lawrance J, Bogart C, Burnett M, et al. How programmers debug, revisited: an information foraging theory perspective. IEEE Trans Softw Eng, 2013, 39: 197–215

    Article  Google Scholar 

  46. 46

    LaToza T D, Myers B A. Hard-to-answer questions about code. In: Proceedings of Evaluation and Usability of Programming Languages and Tools. New York: ACM, 2010. 1–6

    Google Scholar 

  47. 47

    Murphy-Hill E, Zimmermann T, Bird C, et al. The design space of bug fixes and how developers navigate it. IEEE Trans Softw Eng, 2015, 41: 65–81

    Article  Google Scholar 

  48. 48

    Qi Y H, Mao X G, Lei Y, et al. The strength of random search on automated program repair. In: Proceedings of the 36th International Conference on Software Engineering. New York: ACM, 2014. 254–265

    Google Scholar 

  49. 49

    Le X B D, Lo D, Le Goues C. History driven program repair. In: Proceedings of IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering. New York: IEEE, 2016. 213–224

    Google Scholar 

  50. 50

    Xin Q, Reiss S P. Leveraging syntax-related code for automated program repair. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. New York: IEEE, 2017. 660–670

    Google Scholar 

  51. 51

    Agrawal H, Horgan J R. Dynamic program slicing. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. New York: ACM, 1990. 246–256

    Google Scholar 

  52. 52

    Zhang X Y, Gupta N, Gupta R. Pruning dynamic slices with confidence. In: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation. New York: ACM, 2006. 169–180

    Google Scholar 

  53. 53

    Rathore S S, Kumar S. Predicting number of faults in software system using genetic programming. In: Proceedings of International Conference on Soft Computing and Software Engineering, 2015. 62: 303–311

  54. 54

    Tahir A, MacDonell S G. A systematic mapping study on dynamic metrics and software quality. In: Proceedings of the 2012 IEEE International Conference on Software Maintenance, 2012. 326–335

  55. 55

    Wu R X, Zhang H Y, Cheung S C, et al. Crashlocator: locating crashing faults based on crash stacks. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis. New York: ACM, 2014. 204–214

    Google Scholar 

  56. 56

    Wong C P, Xiong Y F, Zhang H Y, et al. Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution, 2014. 181–190

  57. 57

    Zhong H, Mei H. Mining repair model for exception-related bug. J Syst Softw, 2018, 141: 16–31

    Article  Google Scholar 

  58. 58

    Cleve H, Zeller A. Locating causes of program failures. In: Proceedings of the 27th International Conference on Software Engineering. New York: ACM, 2005. 342–351

    Google Scholar 

  59. 59

    Le T B, Lo D, Goues C L, et al. A learning-to-rank based fault localization approach using likely invariants. In: Proceedings of the 25th International Symposium on Software Testing and Analysis. New York: ACM, 2016. 177–188

    Google Scholar 

  60. 60

    Ayewah N, Hovemeyer D, Morgenthaler J D, et al. Using static analysis to find bugs. IEEE Softw, 2008, 25: 22–29

    Article  Google Scholar 

  61. 61

    Weimer W, Fry Z P, Forrest S. Leveraging program equivalence for adaptive program repair: models and first results. In: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering, 2013. 356–366

Download references

Acknowledgements

This work was supported by National Key Research and Development Program of China (Grant No. 2017YFB1001803) and National Natural Science Foundation of China (Grant No. 61672045).

Author information

Affiliations

Authors

Corresponding author

Correspondence to Yingfei Xiong.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Jiang, J., Xiong, Y. & Xia, X. A manual inspection of Defects4J bugs and its implications for automatic program repair. Sci. China Inf. Sci. 62, 200102 (2019). https://doi.org/10.1007/s11432-018-1465-6

Download citation

Keywords

  • automatic defect repair
  • fault localization
  • manual repair
  • software maintenance
  • case study