Empirical Software Engineering, Volume 21, Issue 2, pp 303–336

The impact of tangled code changes on defect prediction models

  • Kim Herzig
  • Sascha Just
  • Andreas Zeller


When interacting with source control management systems, developers often commit unrelated or loosely related code changes in a single transaction. When analyzing version histories, such tangled changes make all changes to all modules appear related, possibly compromising the resulting analyses through noise and bias. In an investigation of five open-source Java projects, we found between 7 % and 20 % of all bug fixes to consist of multiple tangled changes. Using a multi-predictor approach to untangle changes, we show that on average at least 16.6 % of all source files are incorrectly associated with bug reports. These incorrect bug-to-file associations do not seem to significantly impact models that classify source files as having at least one bug or no bugs. However, our experiments show that untangling tangled code changes can result in more accurate regression bug prediction models when compared to models trained and tested on tangled bug datasets—in our experiments, the statistically significant accuracy improvements lie between 5 % and 200 %. We recommend better change organization to limit the impact of tangled changes.
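The mechanism behind the noise the abstract describes can be illustrated with a minimal sketch (hypothetical commit data, not the authors' tooling): when a fix transaction also carries unrelated changes, naive history mining links every touched file to the bug report.

```python
# Minimal sketch (hypothetical data): how a tangled fix commit
# inflates bug-file associations in naive history mining.

# Each commit: (message, files touched). The second commit is a
# tangled transaction: the actual fix plus an unrelated rename.
commits = [
    ("Add caching layer", ["Cache.java"]),
    ("Fix BUG-42; also rename helpers", ["Parser.java", "Utils.java"]),
]

def bug_file_associations(commits):
    """Naively link every file in a fix commit to the bug it mentions."""
    assoc = {}
    for message, files in commits:
        for token in message.replace(";", " ").split():
            if token.startswith("BUG-"):
                assoc.setdefault(token, set()).update(files)
    return assoc

links = bug_file_associations(commits)
# Utils.java was only touched by the unrelated rename, yet it is
# now counted as changed "to fix BUG-42" -- a false association.
print(links)  # {'BUG-42': {'Parser.java', 'Utils.java'}}
```

Untangling would split the second transaction into its fix part and its rename part, so that only `Parser.java` is associated with BUG-42.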


Keywords: Defect prediction · Untangling · Data noise



Acknowledgments

Jeremias Rößler and Nadja Altabari provided constructive feedback on earlier versions of this work. We thank the reviewers for their constructive comments.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Microsoft Research, Cambridge, UK
  2. Saarland University, Saarbrücken, Germany
