Studying high impact fix-inducing changes

An Erratum to this article was published on 01 September 2016

Abstract

As software systems continue to play an important role in our daily lives, their quality is of paramount importance. Therefore, a plethora of prior research has focused on predicting defect-prone software components. One line of this research focuses on predicting which software changes are fix-inducing. Although prior research on fix-inducing changes achieves highly accurate results, it has one main drawback: it assigns the same level of impact to all fix-inducing changes. We argue that treating all fix-inducing changes the same is not ideal, since a small typo in a change is easier for a developer to address than a thread-synchronization issue. Therefore, in this paper, we study high impact fix-inducing changes (HIFCs). Since the impact of a change can be measured in different ways, we first propose a measure of the impact of fix-inducing changes that takes into account the implementation work that developers must do in later (fixing) changes. Our measure of impact for a fix-inducing change uses the amount of churn, the number of files, and the number of subsystems modified by developers during an associated fix of the fix-inducing change. We perform our study using six large open source projects to build specialized models that identify HIFCs, determine the best indicators of HIFCs, and examine the benefits of prioritizing HIFCs. Using change factors, we are able to predict 56 % to 77 % of HIFCs with an average false alarm (misclassification) rate of 16 %. We find that the lines of code added, the number of developers who worked on a change, and the number of prior modifications to the files modified during a change are the best indicators of HIFCs. Lastly, we observe that a specialized model for HIFCs can provide inspection effort savings of 4 % over state-of-the-art models. We believe our results will help practitioners prioritize their efforts towards the most impactful fix-inducing changes and save inspection effort.
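Concretely, the impact of a fix-inducing change aggregates the rework recorded in its linked fixing changes. Below is a minimal sketch of that aggregation, assuming the linking of fix-inducing changes to their fixes has already been done; the `Fix` record and its field names are hypothetical illustrations, not the authors' implementation.

    from dataclasses import dataclass

    @dataclass
    class Fix:
        """A fixing change linked to one fix-inducing change (hypothetical record)."""
        lines_added: int
        lines_deleted: int
        files: set        # paths modified by the fix
        subsystems: set   # e.g., top-level directories of those paths

    def impact(fixes):
        """Total churn, distinct files, and distinct subsystems across all
        fixes associated with a single fix-inducing change."""
        churn = sum(f.lines_added + f.lines_deleted for f in fixes)
        files = set().union(*(f.files for f in fixes)) if fixes else set()
        subsystems = set().union(*(f.subsystems for f in fixes)) if fixes else set()
        return churn, len(files), len(subsystems)

    # Example: a fix-inducing change repaired by two later fixes.
    fixes = [
        Fix(30, 12, {"core/db.c"}, {"core"}),
        Fix(5, 2, {"core/db.c", "util/log.c"}, {"core", "util"}),
    ]
    print(impact(fixes))  # (49, 2, 2)

The three values are kept separate rather than collapsed into one score; as described in the Appendix, they are used jointly to separate high-impact from low-impact fix-inducing changes.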


Notes

  1. git clone git://git.gnome.org/gimp

  2. git clone git://git.apache.org/maven-2.git

  3. git clone git://perl5.git.perl.org/perl.git

  4. git clone https://github.com/postgres/postgres.git

  5. git clone https://github.com/rails/rails.git

  6. git clone https://github.com/mozilla/rhino.git
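Once these repositories are cloned, per-change metrics such as churn, the number of modified files, and the number of modified subsystems can be mined from the commit logs. The sketch below, which parses `git log --numstat`, is one possible extraction under the assumption that a path's top-level directory approximates its subsystem; it is illustrative, not the authors' exact tooling.

    import subprocess

    def change_metrics(repo_path):
        """Map each commit hash to its churn, modified files, and
        modified subsystems, parsed from `git log --numstat`."""
        out = subprocess.run(
            ["git", "-C", repo_path, "log", "--numstat", "--format=@%H"],
            capture_output=True, text=True, check=True,
        ).stdout
        metrics, sha = {}, None
        for line in out.splitlines():
            if line.startswith("@"):                  # commit header line
                sha = line[1:]
                metrics[sha] = {"churn": 0, "files": set(), "subsystems": set()}
            elif line.strip() and sha:                # "added<TAB>deleted<TAB>path"
                added, deleted, path = line.split("\t")
                if added != "-":                      # "-" marks binary files
                    metrics[sha]["churn"] += int(added) + int(deleted)
                metrics[sha]["files"].add(path)
                # Assumption: top-level directory stands in for the subsystem.
                metrics[sha]["subsystems"].add(path.split("/", 1)[0])
        return metrics

For example, `change_metrics("gimp")` would map every commit in the GIMP clone to the three change-size values discussed above.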


Acknowledgments

This research was partially supported by JSPS KAKENHI Grant Numbers 24680003 and 25540026.

Author information

Corresponding author

Correspondence to Ayse Tosun Misirli.

Additional information

An erratum to this article can be found at http://dx.doi.org/10.1007/s10664-016-9455-3.

Communicated by: Andrea de Lucia

Appendix

Table 12 presents the minimum, 25th percentile, median, 75th percentile, and maximum values of the three metrics for HIFCs and LIFCs in all studied projects. Figure 8 presents three-dimensional scatter diagrams of all projects in terms of the total churn, the total number of modified files, and the total number of modified subsystems used for clustering fix-inducing changes. The clusters depicting HIFCs are colored red, whereas the clusters for LIFCs are colored black. The diagrams are zoomed to the region where LIFCs are highly concentrated and redrawn to show the differences between the two clusters. Figure 9 shows box plots of all projects in terms of the same three metrics for the two clusters, i.e., the high-impact and low-impact clusters.
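The separation into HIFCs and LIFCs can be approximated with model-based clustering over these three metrics. The sketch below uses a two-component Gaussian mixture fit by EM; scikit-learn and the synthetic data are stand-ins for illustration, not the authors' exact tooling or data.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # One row per fix-inducing change: [total churn, files, subsystems]
    # of its associated fixes. Synthetic values, for illustration only.
    rng = np.random.default_rng(0)
    low = rng.poisson([20, 2, 1], size=(200, 3))    # many small-impact changes
    high = rng.poisson([400, 15, 5], size=(20, 3))  # few large-impact changes
    X = np.log1p(np.vstack([low, high]))            # log-scale the skewed metrics

    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    labels = gmm.predict(X)
    # Treat the component with the larger mean churn as the HIFC cluster.
    hifc = int(np.argmax(gmm.means_[:, 0]))
    print("changes labeled HIFC:", int((labels == hifc).sum()))

Labeling the component with the larger mean churn as high impact mirrors the pattern visible in the scatter diagrams: HIFCs involve substantially more rework along all three dimensions.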

Table 12 Descriptive statistics for two data clusters (HIFCs and LIFCs) in the studied projects
Fig. 8

Three-dimensional scatter diagrams presenting the clusters for HIFCs (red) and LIFCs (black)

Fig. 9

Box plots of all projects in terms of the three metrics used for identifying the clusters, HIFCs and LIFCs

About this article

Cite this article

Misirli, A.T., Shihab, E. & Kamei, Y. Studying high impact fix-inducing changes. Empir Software Eng 21, 605–641 (2016). https://doi.org/10.1007/s10664-015-9370-z


Keywords

  • Change metrics
  • Fix-inducing changes
  • Change impact