Empirical Software Engineering, Volume 22, Issue 1, pp 436–473

An empirical study of supplementary patches in open source projects


Abstract

Developers occasionally make more than one patch to fix a bug. Related patches are sometimes intentionally separated, but unintended omission errors require supplementary patches. Several change recommendation systems based on clone analysis, structural dependency, and historical change coupling have been proposed to reduce or prevent incomplete patches. However, very few studies have examined why incomplete patches occur and how real-world omission errors could be reduced. This paper systematically studies a group of bugs that were fixed more than once in open source projects in order to understand the characteristics of incomplete patches. Our study of Eclipse JDT core, Eclipse SWT, Mozilla, and Equinox p2 showed that a significant portion of resolved bugs require more than one fix attempt. Compared to single-fix bugs, multi-fix bugs did not have lower-quality bug reports, but more attribute changes (e.g., cc'ed developers or title) were made to multi-fix bugs than to single-fix bugs. Multi-fix bugs are more likely to have high severity levels than single-fix bugs, and accordingly more developers participated in discussions about multi-fix bugs. Multi-fix bugs also take more time to resolve than single-fix bugs do. Incomplete patches are longer and more scattered, and they touch more files than regular patches do. Our manual inspection showed that the causes of incomplete patches are diverse, including missed porting updates, incorrect handling of conditional statements, and incomplete refactoring. Only 7% to 17% of supplementary patches had content similar to their initial patches, which implies that supplementary patch locations cannot be predicted by code clone analysis alone. Furthermore, 16% to 46% of supplementary patches were beyond the scope of the immediate structural dependency of their initial patch locations. Historical co-change patterns also showed low precision in predicting supplementary patch locations. Code clone, structural dependency, and historical co-change analyses predicted different supplementary patch locations with little overlap between them, and even combining these analyses did not cover all supplementary patch locations. This study investigates the characteristics of incomplete patches and multi-fix bugs, which have not been systematically examined in previous research. We show that predicting supplementary patch locations is a difficult problem that existing change recommendation approaches cannot solve. New types of approaches should be developed and validated on supplementary patch data sets, which capture cases where developers in practice failed to make a complete patch at once.
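As background on the historical co-change analysis whose precision the study evaluates, such recommenders are typically association-rule computations over the sets of files changed together in past commits. The sketch below is illustrative, not the paper's implementation; the function names, the confidence formula, and the 0.5 threshold are assumptions for the example.

```python
from collections import defaultdict

def mine_cochange(commits):
    """Count, over a commit history, how often each file and each
    pair of files changed together. `commits` is a list of file lists."""
    pair_counts = defaultdict(int)   # (fileA, fileB) -> co-change count
    file_counts = defaultdict(int)   # file -> total change count
    for files in commits:
        files = sorted(set(files))
        for f in files:
            file_counts[f] += 1
        for i in range(len(files)):
            for j in range(i + 1, len(files)):
                pair_counts[(files[i], files[j])] += 1
    return pair_counts, file_counts

def predict_cochange(changed_file, pair_counts, file_counts, min_conf=0.5):
    """Suggest files whose co-change confidence with `changed_file`
    (co-change count / total changes of `changed_file`) meets min_conf."""
    suggestions = {}
    for (a, b), n in pair_counts.items():
        if changed_file in (a, b):
            other = b if a == changed_file else a
            conf = n / file_counts[changed_file]
            if conf >= min_conf:
                suggestions[other] = conf
    return suggestions
```

For example, given a history `[["A.java", "B.java"], ["A.java", "B.java", "C.java"], ["A.java"]]`, a patch touching `A.java` yields the suggestion `B.java` (confidence 2/3) but not `C.java` (confidence 1/3). The study's finding is that, on real supplementary patches, rules of this kind rarely point at the locations the developer actually missed.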

Keywords

Software evolution · Empirical study · Patches · Bug fixes


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, Korea
  2. Computer Science Department, University of California, Los Angeles, USA
