Abstract
Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionality. Given the recurrence of bugs across projects, the associated similar patches can be leveraged to extract generic fix actions. While the literature includes various approaches that leverage similarity among patches to guide program repair, these approaches often do not yield fix patterns that are tractable and reusable as actionable input to automated program repair (APR) systems. In this paper, we propose FixMiner, a systematic and automated approach to mining relevant and actionable fix patterns based on an iterative clustering strategy applied to atomic changes within patches. The goal of FixMiner is to infer separate and reusable fix patterns that can be leveraged by other patch generation systems. FixMiner relies on Rich Edit Scripts, a specialized tree structure of edit scripts that captures the AST-level context of code changes. In each round of clustering, FixMiner uses a different tree representation of Rich Edit Scripts to identify similar changes: abstract syntax trees, edit action trees, and code context trees. We evaluated FixMiner on thousands of software patches collected from open source projects. Preliminary results show that FixMiner mines accurate patterns by efficiently exploiting the change information in Rich Edit Scripts. We further integrated the mined patterns into an automated program repair prototype, PARFixMiner, which correctly fixes 26 bugs of the Defects4J benchmark. Beyond this quantitative performance, we show that the mined fix patterns are sufficiently relevant to produce patches with a high probability of correctness: 81% of PARFixMiner's generated plausible patches are correct.
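The iterative clustering strategy summarized above can be illustrated with a minimal sketch. It assumes a heavily simplified, hypothetical Rich Edit Script node (`EditNode`, holding an edit action, an AST node type, and the changed code tokens) and, unlike FixMiner, it groups scripts by exact equality of each serialized tree view rather than by a tree-similarity measure. The three views follow the order listed in the abstract (AST, edit actions, code context), and the rule that a cluster needs at least two members is also an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EditNode:
    """One node of a simplified, hypothetical Rich Edit Script."""
    action: str     # edit action, e.g. "INS", "DEL", "UPD", "MOV"
    node_type: str  # AST node type, e.g. "IfStatement"
    tokens: str     # code tokens of the changed node, e.g. "x != null"
    children: List["EditNode"] = field(default_factory=list)


def project(node: EditNode, label: Callable[[EditNode], str]) -> str:
    """Serialize a tree in pre-order, keeping one label per node."""
    inner = "".join(project(child, label) for child in node.children)
    return f"({label(node)}{inner})"


# One view per clustering round, in the order listed in the abstract.
VIEWS: List[Callable[[EditNode], str]] = [
    lambda n: n.node_type,  # round 1: AST shape
    lambda n: n.action,     # round 2: edit actions
    lambda n: n.tokens,     # round 3: code context (tokens)
]


def iterative_cluster(scripts: List[EditNode], min_size: int = 2) -> List[List[EditNode]]:
    """Refine clusters round by round; each round only splits earlier clusters."""
    clusters = [scripts]
    for view in VIEWS:
        refined: List[List[EditNode]] = []
        for cluster in clusters:
            groups: Dict[str, List[EditNode]] = defaultdict(list)
            for script in cluster:
                groups[project(script, view)].append(script)
            # Keep only groups large enough to count as a recurring change.
            refined.extend(g for g in groups.values() if len(g) >= min_size)
        clusters = refined
    return clusters


def guarded_close(var: str) -> EditNode:
    # Hypothetical fix: wrap an existing call in a null check.
    return EditNode("INS", "IfStatement", f"{var} != null",
                    [EditNode("MOV", "ExpressionStatement", f"{var}.close();")])


scripts = [
    guarded_close("x"),
    guarded_close("x"),
    guarded_close("y"),
    EditNode("UPD", "InfixExpression", "i <= n"),
]
patterns = iterative_cluster(scripts)
print(len(patterns), [len(c) for c in patterns])  # 1 [2]
```

In this toy run, round 1 drops the lone `InfixExpression` update, round 2 keeps all three null-check insertions (same actions), and round 3 separates the `x` instances from the `y` one. Each round works inside the clusters of the previous round, which is what makes later, more specific views cheap to apply.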
Notes
The initial version of this paper was written concurrently with SimFix and CapGen.
The order of AST subtrees follows the order of hunks of the GNU diff format.
In this experiment, we excluded 34 patches from the Defects4J dataset that affect more than one file.
Semantic Patch Language
We used GZoltar version 0.1.1
Version 1.2.0 - https://github.com/rjust/defects4j/releases/tag/v1.2.0
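The exclusion criterion in the notes (patches that affect more than one file) can be applied mechanically. The following is a minimal sketch, assuming the patches are available as unified diffs (GNU diff or `git diff` output); the function names are hypothetical, and the header detection is a simplification: a removed line whose content begins with "-- " immediately followed by an added line whose content begins with "++ " would be misread as a file header.

```python
from typing import Set


def files_touched(unified_diff: str) -> Set[str]:
    """Collect the paths named in the '--- ' / '+++ ' file headers of a unified diff."""
    lines = unified_diff.splitlines()
    paths: Set[str] = set()
    for prev, cur in zip(lines, lines[1:]):
        # A file header is a '--- old' line immediately followed by a '+++ new' line.
        if prev.startswith("--- ") and cur.startswith("+++ "):
            for raw in (prev[4:], cur[4:]):
                path = raw.split("\t")[0].strip()  # drop an optional timestamp
                if path == "/dev/null":             # file creation or deletion
                    continue
                if path.startswith(("a/", "b/")):   # git-style prefixes
                    path = path[2:]
                paths.add(path)
    return paths


def is_single_file(unified_diff: str) -> bool:
    """True when the patch modifies exactly one file."""
    return len(files_touched(unified_diff)) == 1


one_file = """--- a/src/Foo.java
+++ b/src/Foo.java
@@ -1 +1 @@
-int x = 0;
+int x = 1;
"""
two_files = one_file + """--- a/src/Bar.java
+++ b/src/Bar.java
@@ -1 +1 @@
-return a;
+return b;
"""
print(is_single_file(one_file), is_single_file(two_files))  # True False
```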
References
Abreu R, Zoeteweij P, Van Gemund A J (2007) On the accuracy of spectrum-based fault localization. In: Testing: Academic and industrial conference practice and research techniques-MUTATION (TAICPART-MUTATION 2007), pp 89–98. IEEE
Al-Ekram R, Adma A, Baysal O (2005) Diffx: An algorithm to detect changes in multi-version xml documents. In: Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research, pp 1–11. IBM Press
Andersen J, Lawall JL (2010) Generic patch inference. Autom Softw Eng 17(2):119–148
Andersen J, Nguyen AC, Lo D, Lawall JL, Khoo SC (2012) Semantic patch inference. In: 2012 Proceedings of the 27th IEEE/ACM international conference on automated software engineering (ASE), pp 382–385. IEEE
Bhatia S, Singh R (2016) Automated correction for syntax errors in programming assignments using recurrent neural networks. arXiv:1603.06129
Bille P (2005) A survey on tree edit distance and related problems. Theor Comput Sci 337(1-3):217–239
Brunel J, Doligez D, Hansen RR, Lawall JL, Muller G (2009) A foundation for flow-based program matching: Using temporal logic and model checking. In: Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL ’09. ACM, New York, pp 114–126. https://doi.org/10.1145/1480881.1480897
Campos J, Riboira A, Perez A, Abreu R (2012) Gzoltar: an eclipse plug-in for testing and debugging. In: Proceedings of the 27th IEEE/ACM international conference on automated software engineering, pp 378–381. ACM
Chawathe SS, Rajaraman A, Garcia-Molina H, Widom J (1996) Change Detection in Hierarchically Structured Information. In: Proceedings of the 1996 ACM SIGMOD international conference on management of data, SIGMOD ’96. ACM, New York, pp 493–504. https://doi.org/10.1145/233269.233366
Chen L, Pei Y, Furia CA (2017) Contract-based program repair without the contracts. In: Proceedings of the 32nd IEEE/ACM international conference on automated software engineering. IEEE, Urbana, pp 637–647
Chilowicz M, Duris E, Roussel G (2009) Syntax tree fingerprinting for source code similarity detection. In: IEEE 17th international conference on program comprehension, 2009. ICPC’09, pp 243–247. IEEE
Coker Z, Hafiz M (2013) Program transformations to fix c integers. In: Proceedings of the international conference on software engineering. IEEE, San Francisco, pp 792–801
Dallmeier V, Zeller A, Meyer B (2009) Generating fixes from object behavior anomalies. In: Proceedings of the 2009 IEEE/ACM international conference on automated software engineering, pp 550–554. IEEE Computer Society
Duley A, Spandikow C, Kim M (2012) Vdiff: A program differencing algorithm for verilog hardware description language. Autom Softw Eng 19(4):459–490
Durieux T, Cornu B, Seinturier L, Monperrus M (2017) Dynamic patch generation for null pointer exceptions using metaprogramming. In: Proceedings of the 24th international conference on software analysis, evolution and reengineering, pp 349–358. IEEE
Falleri JR GumTree. https://github.com/GumTreeDiff/gumtree (last accessed Mar. 2018)
Falleri JR, Morandat F, Blanc X, Martinez M, Monperrus M (2014) Fine-grained and accurate source code differencing. In: Proceedings of ACM/IEEE international conference on automated software engineering. ACM, Vasteras, pp 313–324
Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: Proceeding of the 19th ICSM, pp 23–32. IEEE
Fluri B, Gall HC (2006) Classifying change types for qualifying change couplings. In: 14th IEEE international conference on program comprehension, 2006. ICPC 2006, pp 35–45. IEEE
Fluri B, Giger E, Gall HC (2008) Discovering patterns of change types. In: Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering. IEEE, L’Aquila, pp 463–466
Fluri B, Wuersch M, Pinzger M, Gall H (2007) Change distilling: Tree differencing for fine-grained source code change extraction. IEEE Trans Softw Eng 33(11)
Gupta R, Pal S, Kanade A, Shevade S (2017) Deepfix: Fixing common c language errors by deep learning. In: AAAI, pp 1345–1351
Hanam Q, Brito FSDM, Mesbah A (2016) Discovering bug patterns in javascript. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering, pp 144–156. ACM
Hashimoto M, Mori A (2008) Diff/ts: A tool for fine-grained structural change analysis. In: 2008 15th working conference on reverse engineering, pp 279–288. IEEE
Herzig K, Zeller A (2013) The impact of tangled code changes. In: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13. IEEE, San Francisco, pp 121–130
Hovemeyer D, Pugh W (2004) Finding bugs is easy. ACM SIGPLAN Not 39(12):92–106
Hua J, Zhang M, Wang K, Khurshid S (2018) Towards practical program repair with on-demand candidate generation. In: Proceedings of the 40th international conference on software engineering, pp 12–23. ACM
Huang K, Chen B, Peng X, Zhou D, Wang Y, Liu Y, Zhao W (2018) Cldiff: generating concise linked code differences. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, pp 679–690. ACM
Jaro MA (1989) Advances in record-linkage methodology as applied to matching the 1985 census of tampa, florida. J Am Stat Assoc 84(406):414–420
Jiang J, Xiong Y, Zhang H, Gao Q, Chen X (2018) Shaping program repair space with existing patches and similar code. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, pp 298–309. ACM
Just R, Jalali D, Ernst MD (2014) Defects4j: A database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the 2014 international symposium on software testing and analysis. ACM, San Jose, pp 437–440
Ke Y, Stolee KT, Le Goues C, Brun Y (2015) Repairing programs with semantic code search. In: Proceedings of the 30th IEEE/ACM international conference on automated software engineering (ASE). IEEE, Lincoln, pp 295–306
Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: Proceedings of the 2013 international conference on software engineering, pp 802–811. IEEE Press
Kim M, Notkin D (2009) Discovering and representing systematic code changes. In: Proceedings of the 31st international conference on software engineering, pp 309–319. IEEE Computer Society
Kim M, Notkin D, Grossman D (2007) Automatic inference of structural changes for matching across program versions. In: ICSE, vol 7, pp 333–343. Citeseer
Kim S, Pan K, Whitehead Jr E (2006) Memories of bug fixes. In: Proceedings of the 14th ACM SIGSOFT international symposium on foundations of software engineering, pp 35–45. ACM
Koyuncu A, Bissyandé T, Kim D, Klein J, Monperrus M, Le Traon Y (2017) Impact of tool support in patch construction. In: Proceedings of the 26th ACM SIGSOFT international symposium on software testing and analysis. ACM, New York, pp 237–248
Koyuncu A, Bissyandé TF, Kim D, Liu K, Klein J, Monperrus M, Le Traon Y (2019) D&C: A divide-and-conquer approach to IR-based bug localization. arXiv:1902.02703
Koyuncu A, Liu K, Bissyandé TF, Kim D, Monperrus M, Klein J, Le Traon Y (2019) iFixR: bug report driven program repair. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 314–325. ACM
Kreutzer P, Dotzler G, Ring M, Eskofier BM, Philippsen M (2016) Automatic clustering of code changes. In: Proceedings of the 13th international conference on mining software repositories, MSR ’16. ACM, New York, pp 61–72. https://doi.org/10.1145/2901739.2901749
Le XBD, Chu DH, Lo D, Le Goues C, Visser W (2017) S3: syntax-and semantic-guided repair synthesis via programming by examples. In: Proceedings of the 11th joint meeting on foundations of software engineering. ACM, Paderborn, pp 593–604
Le XBD, Lo D, Le Goues C (2016a) History driven program repair. In: Proceedings of the 23rd international conference on software analysis, evolution, and reengineering, vol 1, pp 213–224. IEEE
Le XBD, Le QL, Lo D, Le Goues C (2016b) Enhancing automated program repair with deductive verification. In: Proceedings of the international conference on software maintenance and evolution (ICSME). IEEE, Raleigh, pp 428–432
Le Goues C, Nguyen T, Forrest S, Weimer W (2012) Genprog: A generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72
Lee J, Kim D, Bissyandé TF, Jung W, Le Traon Y (2018) Bench4bl: reproducibility study on the performance of ir-based bug localization. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, pp 61–72. ACM
Lin W, Chen Z, Ma W, Chen L, Xu L, Xu B (2016) An empirical study on the characteristics of python fine-grained source code change types. In: 2016 IEEE international conference on software maintenance and evolution (ICSME), pp 188–199. IEEE
Liu K, Kim D, Bissyandé TF, Yoo S, Le Traon Y (2018a) Mining fix patterns for findbugs violations. IEEE Trans Softw Eng
Liu K, Kim D, Koyuncu A, Li L, Bissyandé TF, Le Traon Y (2018b) A closer look at real-world patches. In: 2018 IEEE international conference on software maintenance and evolution, pp 275–286. IEEE
Liu K, Koyuncu A, Kim D, Bissyandé TF (2019) Avatar: Fixing semantic bugs with fix patterns of static analysis violations. In: Proceedings of the IEEE 26th international conference on software analysis, evolution and reengineering, pp 456–467. IEEE
Liu K, Koyuncu A, Bissyandé TF, Kim D, Klein J, Le Traon Y (2019b) You cannot fix what you cannot find! an investigation of fault localization bias in benchmarking automated program repair systems. In: 2019 12th IEEE conference on software testing, validation and verification (ICST), pp 102–113. IEEE
Liu K, Koyuncu A, Kim D, Bissyandé TF (2019) TBar: revisiting template-based automated program repair. In: Proceedings of the 28th international symposium on software testing and analysis
Liu K, Koyuncu A, Kim K, Kim D, Bissyandé TF (2018) LSRepair: Live search of fix ingredients for automated program repair. In: Proceedings of the 25th Asia-Pacific software engineering conference, pp 658–662
Liu X, Zhong H (2018) Mining stackoverflow for program repair. In: Proceedings of the 25th international conference on software analysis, evolution and reengineering, pp 118–129. IEEE
Livshits B, Zimmermann T (2005) DynaMine: Finding common error patterns by mining software revision histories. In: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on foundations of software engineering, ESEC/FSE-13. ACM, New York, pp 296–305. https://doi.org/10.1145/1081706.1081754
Long F, Amidon P, Rinard M (2017) Automatic inference of code transforms for patch generation. In: Proceedings of the 11th joint meeting on foundations of software engineering. ACM, Paderborn, pp 727–739
Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, Bergamo, pp 166–178
Long F, Rinard M (2016) Automatic patch generation by learning correct code. In: Proceedings of the 43rd annual ACM SIGPLAN-SIGACT symposium on principles of programming languages. ACM, St. Petersburg, pp 298–312
Martinez M, Duchien L, Monperrus M (2013) Automatically extracting instances of code change patterns with ast analysis. In: 2013 29th IEEE international conference on software maintenance (ICSM), pp 388–391. IEEE
Martinez M, Durieux T, Sommerard R, Xuan J, Monperrus M (2017) Automatic repair of real bugs in java: A large-scale experiment on the defects4j dataset. Empir Softw Eng 22(4):1936–1964
Martinez M, Monperrus M (2015) Mining software repair models for reasoning on the search space of automated program fixing. Empir Softw Eng 20(1):176–205
Martinez M, Monperrus M (2016) Astor: A program repair library for java. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, Saarbrücken, pp 441–444
Martinez M, Monperrus M (2018) Ultra-large repair search space with automatically mined templates: The cardumen mode of astor. In: Proceedings of the 10th SSBSE, pp 65–86. Springer
Mechtaev S, Yi J, Roychoudhury A (2015) Directfix: Looking for simple program repairs. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE, Florence, pp 448–458
Meng N, Kim M, McKinley KS (2011) Systematic editing: Generating program transformations from an example. ACM SIGPLAN Not 46(6):329–342
Meng N, Kim M, McKinley KS (2013) Lase: locating and applying systematic edits by learning from examples. In: Proceedings of the 2013 international conference on software engineering, pp 502–511. IEEE Press
Molderez T, Stevens R, De Roover C (2017) Mining change histories for unknown systematic edits. In: Proceedings of the 14th international conference on mining software repositories, pp 248–256. IEEE Press
Monperrus M (2018) Automatic software repair: a bibliography. ACM Comput Surveys (CSUR) 51(1):17
Myers EW (1986) An O(ND) difference algorithm and its variations. Algorithmica 1(1-4):251–266
Neamtiu I, Foster JS, Hicks M (2005) Understanding source code evolution using abstract syntax tree matching. ACM SIGSOFT Softw Eng Notes 30(4):1–5
Nguyen HA, Nguyen AT, Nguyen TN (2013) Filtering noise in mixed-purpose fixing commits to improve defect prediction and localization. In: 2013 IEEE 24th international symposium on software reliability engineering (ISSRE), pp 138–147. IEEE
Nguyen HDT, Qi D, Roychoudhury A, Chandra S (2013) SemFix: program repair via semantic analysis. In: Proceedings of the 35th ICSE, pp 772–781. IEEE
Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi J, Nguyen TN (2010) Recurring bug fixes in object-oriented programs. In: 2010 ACM/IEEE 32nd international conference on software engineering, vol 1, pp 315–324. IEEE
Osman H, Lungu M, Nierstrasz O (2014) Mining frequent bug-fix code changes. In: 2014 software evolution week-IEEE conference on software maintenance, reengineering and reverse engineering (CSMR-WCRE), pp 343–347. IEEE
Oumarou H, Anquetil N, Etien A, Ducasse S, Taiwe KD (2015) Identifying the exact fixing actions of static rule violation. In: 2015 IEEE 22nd international conference on software analysis, evolution and reengineering (SANER), pp 371–379. IEEE
Padioleau Y, Lawall J, Hansen RR, Muller G (2008) Documenting and Automating Collateral Evolutions in Linux Device Drivers. In: Proceedings of the 3rd ACM SIGOPS/EuroSys european conference on computer systems 2008, Eurosys ’08. ACM, New York, pp 247–260. https://doi.org/10.1145/1352592.1352618
Pan K, Kim S, Whitehead EJ (2009) Toward an understanding of bug fix patterns. Empir Softw Eng 14(3):286–315
Park J, Kim M, Ray B, Bae DH (2012) An empirical study of supplementary bug fixes. In: Proceedings of the 9th IEEE working conference on mining software repositories, pp 40–49. IEEE Press
Pawlik M, Augsten N (2011) Rted: A robust algorithm for the tree edit distance. Proceedings of the VLDB Endowment 5(4):334–345
Rolim R, Soares G, Gheyi R, D’Antoni L (2018) Learning quick fixes from code repositories. arXiv:1803.03806
Saha RK, Lyu Y, Yoshida H, Prasad MR (2017) Elixir: Effective object-oriented program repair. In: 2017 32nd IEEE/ACM international conference on automated software engineering (ASE), pp 648–659. IEEE
Skiena SS (1997) The stony brook algorithm repository. http://www.cs.sunysb.edu/algorith/implement/nauty/implement.shtml
Sobreira V, Durieux T, Madeiral F, Monperrus M, Maia MA (2018) Dissection of a bug dataset: Anatomy of 395 patches from Defects4J. In: Proceedings of SANER
Tan SH, Roychoudhury A (2015) Relifix: Automated repair of software regressions. In: Proceedings of the 37th international conference on software engineering-volume 1, pp 471–482. IEEE Press
Tao Y, Kim S (2015) Partitioning composite code changes to facilitate code review. In: 2015 IEEE/ACM 12th working conference on mining software repositories, pp 180–190. IEEE
Thomas SW, Nagappan M, Blostein D, Hassan AE (2013) The impact of classifier configuration and classifier combination on bug localization. IEEE Trans Softw Eng 39(10):1427–1443
Tian Y, Lawall J, Lo D (2012) Identifying linux bug fixing patches. In: Proceedings of the 34th international conference on software engineering, pp 386–396. IEEE Press
Weimer W, Nguyen T, Le Goues C, Forrest S (2009) Automatically finding patches using genetic programming. In: Proceedings of the 31st international conference on software engineering, May 16-24. IEEE, Vancouver, pp 364–374
Weissgerber P, Diehl S (2006) Identifying refactorings from source-code changes. In: 21st IEEE/ACM international conference on automated software engineering, 2006. ASE’06, pp 231–240. IEEE
Wen M, Chen J, Wu R, Hao D, Cheung SC (2018) Context-aware patch generation for better automated program repair. In: Proceedings of the 40th international conference on software engineering, pp 1–11. ACM
Wen M, Wu R, Cheung SC (2016) Locus: Locating bugs from software changes. In: 2016 31st IEEE/ACM international conference on automated software engineering (ASE), pp 262–273. IEEE
Winkler WE (1990) String comparator metrics and enhanced decision rules in the fellegi-sunter model of record linkage
Xin Q, Reiss SP (2017) Leveraging syntax-related code for automated program repair. In: Proceedings of the 32nd IEEE/ACM international conference on automated software engineering, pp 660–670. IEEE
Xiong Y, Wang J, Yan R, Zhang J, Han S, Huang G, Zhang L (2017) Precise condition synthesis for program repair. In: Proceedings of the 39th international conference on software engineering. IEEE, Buenos Aires, pp 416–426
Xuan J, Martinez M, DeMarco F, Clement M, Marcote SL, Durieux T, Le Berre D, Monperrus M (2017) Nopol: Automatic repair of conditional statement bugs in java programs. IEEE Trans Softw Eng 43(1):34–55
Ying AT, Murphy GC, Ng R, Chu-Carroll MC (2004) Predicting source code changes by mining change history. IEEE Trans Softw Eng 30(9):574–586
Yue R, Meng N, Wang Q (2017) A characterization study of repeated bug fixes. In: 2017 IEEE international conference on software maintenance and evolution (ICSME), pp 422–432. IEEE
Acknowledgements
This work is supported by the Fonds National de la Recherche (FNR), Luxembourg, through RECOMMEND 15/IS/10449467 and FIXPATTERN C15/IS/9964569.
Additional information
Communicated by: Paolo Tonella
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Koyuncu, A., Liu, K., Bissyandé, T.F. et al. FixMiner: Mining relevant fix patterns for automated program repair. Empir Software Eng 25, 1980–2024 (2020). https://doi.org/10.1007/s10664-019-09780-z
DOI: https://doi.org/10.1007/s10664-019-09780-z