
Studying high impact fix-inducing changes


An Erratum to this article was published on 01 September 2016

Abstract

As software systems continue to play an important role in our daily lives, their quality is of paramount importance. Therefore, a plethora of prior research has focused on predicting the components of software that are defect-prone. One line of this research focuses on predicting software changes that are fix-inducing. Although prior research on fix-inducing changes yields highly accurate results, it has one main drawback: it assigns the same level of impact to all fix-inducing changes. We argue that treating all fix-inducing changes the same is not ideal, since a small typo in a change is easier for a developer to address than a thread synchronization issue. Therefore, in this paper, we study high impact fix-inducing changes (HIFCs). Since the impact of a change can be measured in different ways, we first propose a measure of the impact of fix-inducing changes that takes into account the implementation work that must be done by developers in later (fixing) changes. Our measure of impact for a fix-inducing change uses the amount of churn, the number of files, and the number of subsystems modified by developers during an associated fix of the fix-inducing change. We perform our study using six large open source projects to build specialized models that identify HIFCs, determine the best indicators of HIFCs, and examine the benefits of prioritizing HIFCs. Using change factors, we are able to predict 56% to 77% of HIFCs with an average false alarm (misclassification) rate of 16%. We find that the lines of code added, the number of developers who worked on a change, and the number of prior modifications to the files modified during a change are the best indicators of HIFCs. Lastly, we observe that a specialized model for HIFCs can provide inspection effort savings of 4% over state-of-the-art models. We believe our results will help practitioners prioritize their efforts towards the most impactful fix-inducing changes and save inspection effort.
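As a rough illustration of the impact measure described above (not the authors' implementation), the sketch below aggregates churn, distinct files, and distinct subsystems across the fixing changes linked to one fix-inducing change. The FixCommit fields and the subsystem notion (e.g., the root directory of a modified path) are hypothetical names introduced for this sketch.

    from dataclasses import dataclass

    @dataclass
    class FixCommit:
        # Hypothetical fields; any repository-mining tool exposes equivalents.
        lines_added: int
        lines_deleted: int
        files: set        # paths modified by the fixing change
        subsystems: set   # e.g., root directory of each modified path

    def impact(fixes):
        """Impact of one fix-inducing change, aggregated over all of its
        linked fixing changes: (total churn, distinct files, distinct subsystems)."""
        total_churn = sum(f.lines_added + f.lines_deleted for f in fixes)
        files = set().union(*(f.files for f in fixes))
        subsystems = set().union(*(f.subsystems for f in fixes))
        return total_churn, len(files), len(subsystems)

    fixes = [FixCommit(10, 2, {"src/a.c", "src/b.c"}, {"src"}),
             FixCommit(5, 5, {"lib/c.c"}, {"lib"})]
    print(impact(fixes))  # (22, 3, 2)

Aggregating over all fixing changes, rather than a single fix, reflects the total implementation work the fix-inducing change caused downstream.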


Notes

  1. git clone git://git.gnome.org/gimp

  2. git clone git://git.apache.org/maven-2.git

  3. git clone git://perl5.git.perl.org/perl.git

  4. git clone git://git.postgresql.org/git/postgresql.git

  5. git clone https://github.com/rails/rails.git

  6. git clone https://github.com/mozilla/rhino.git


Acknowledgments

This research was partially supported by JSPS KAKENHI Grant Numbers 24680003 and 25540026.

Author information


Corresponding author

Correspondence to Ayse Tosun Misirli.

Additional information

Communicated by: Andrea de Lucia

An erratum to this article can be found at http://dx.doi.org/10.1007/s10664-016-9455-3.

Appendix

Table 12 presents the minimum, 25th percentile, median, 75th percentile and maximum values of the three metrics for HIFCs and LIFCs, respectively, in all projects. Figure 8 presents three-dimensional scatter diagrams of all projects in terms of the total churn, the total number of modified files and the total number of modified subsystems used for clustering fix-inducing changes. The clusters depicting HIFCs are colored red, whereas the clusters for LIFCs are colored black. The diagrams are zoomed to the region where LIFCs are highly concentrated and redrawn to show the differences between the two clusters. Figure 9 shows box plots of all projects in terms of the total churn, the total number of modified files and the total number of modified subsystems in the two clusters, i.e., high impact and low impact (one way to derive such clusters is sketched after the figure captions below).

Table 12 Descriptive statistics for two data clusters (HIFCs and LIFCs) in the studied projects
Fig. 8 Three-dimensional scatter diagrams presenting the clusters for HIFCs (red) and LIFCs (black)

Fig. 9 Box plots of all projects in terms of the three metrics used for identifying the clusters, HIFCs and LIFCs
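For readers who want to experiment, the following is a minimal sketch of how two such clusters could be derived from the three metrics. It uses a two-component Gaussian mixture fit by EM via scikit-learn, with a log transform to tame the heavy right skew of churn data; these choices, and the heuristic that the component with the larger mean churn is the HIFC cluster, are assumptions of this sketch rather than the authors' exact tooling.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def label_hifcs(metrics):
        """metrics: one row per fix-inducing change, with columns
        [total_churn, n_modified_files, n_modified_subsystems].
        Returns a boolean array: True for rows in the high-impact cluster."""
        X = np.log1p(np.asarray(metrics, dtype=float))  # compress skewed scales
        gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
        labels = gmm.predict(X)
        # Assumption: the component with the larger mean (log) churn is HIFC.
        hifc = int(np.argmax(gmm.means_[:, 0]))
        return labels == hifc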


Cite this article

Misirli, A.T., Shihab, E. & Kamei, Y. Studying high impact fix-inducing changes. Empir Software Eng 21, 605–641 (2016). https://doi.org/10.1007/s10664-015-9370-z

