Empirical Software Engineering, Volume 21, Issue 2, pp 605–641

Studying high impact fix-inducing changes

Abstract

As software systems continue to play an important role in our daily lives, their quality is of paramount importance. Therefore, a plethora of prior research has focused on predicting which software components are defect-prone. One line of this research focuses on predicting software changes that are fix-inducing. Although prior research on fix-inducing changes achieves highly accurate results, it has one main drawback: it treats all fix-inducing changes as having the same impact. We argue that treating all fix-inducing changes the same is not ideal, since a small typo in a change is easier for a developer to address than a thread synchronization issue. Therefore, in this paper, we study high impact fix-inducing changes (HIFCs). Since the impact of a change can be measured in different ways, we first propose a measure of the impact of fix-inducing changes that takes into account the implementation work developers must perform in later (fixing) changes. Our measure of impact for a fix-inducing change uses the amount of churn, the number of files, and the number of subsystems modified by developers during an associated fix of the fix-inducing change. We perform our study on six large open source projects to build specialized models that identify HIFCs, determine the best indicators of HIFCs, and examine the benefits of prioritizing HIFCs. Using change factors, we are able to predict 56% to 77% of HIFCs with an average false alarm (misclassification) rate of 16%. We find that the lines of code added, the number of developers who worked on a change, and the number of prior modifications to the files modified during a change are the best indicators of HIFCs. Lastly, we observe that a specialized model for HIFCs can provide inspection effort savings of 4% over state-of-the-art models. We believe our results will help practitioners prioritize their efforts towards the most impactful fix-inducing changes and save inspection effort.
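
To make the proposed impact measure concrete, the sketch below (Python, not the authors' implementation) shows how the three impact factors of a fix-inducing change could be collected from its associated fix commits. The FixCommit type, the impact_factors helper, and the use of top-level directories as subsystems are illustrative assumptions; how these factors are actually combined and thresholded to label a change as an HIFC is described in the full paper.

    # Minimal sketch (illustrative, not the authors' exact method): gather the
    # churn, file, and subsystem impact factors of one fix-inducing change from
    # the fix commits later associated with it.
    from dataclasses import dataclass

    @dataclass
    class FixCommit:
        lines_added: int
        lines_deleted: int
        files: set        # paths touched by the fix commit
        subsystems: set   # e.g., top-level directories of those paths (assumption)

    def impact_factors(fixes: list) -> dict:
        """Aggregate churn, distinct files, and distinct subsystems over all
        fix commits that repair a single fix-inducing change."""
        churn = sum(f.lines_added + f.lines_deleted for f in fixes)
        files = set().union(*(f.files for f in fixes)) if fixes else set()
        subsystems = set().union(*(f.subsystems for f in fixes)) if fixes else set()
        return {"churn": churn, "n_files": len(files), "n_subsystems": len(subsystems)}

    # Example: one fix-inducing change later repaired by two fix commits.
    fixes = [
        FixCommit(120, 40, {"core/db/query.c", "core/db/index.c"}, {"core/db"}),
        FixCommit(15, 3, {"tests/db_test.c"}, {"tests"}),
    ]
    print(impact_factors(fixes))  # {'churn': 178, 'n_files': 3, 'n_subsystems': 2}

High values on these factors indicate that fixing the change spread work across many lines, files, and subsystems, which is the intuition behind labeling it high impact.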

Keywords

Change metrics · Fix-inducing changes · Change impact

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Ayse Tosun Misirli (1)
  • Emad Shihab (2)
  • Yasutaka Kamei (3)

  1. Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey
  2. Department of Computer Science and Software Engineering, Concordia University, Quebec, Canada
  3. Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
