Empirical Software Engineering, Volume 23, Issue 1, pp 334–383

An empirical study of the integration time of fixed issues

  • Daniel Alencar da Costa
  • Shane McIntosh
  • Uirá Kulesza
  • Ahmed E. Hassan
  • Surafel Lemma Abebe


Predicting the required time to fix an issue (i.e., a new feature, bug fix, or enhancement) has long been the goal of many software engineering researchers. However, after an issue has been fixed, it must be integrated into an official release to become visible to users. In theory, issues should be quickly integrated into releases after they are fixed. In practice, however, the integration of a fixed issue may be prevented in one or more releases before reaching users. For example, a fixed issue might be held back from integration in order to assess the impact that it may have on the system as a whole. While one can often speculate, it is not always clear why some fixed issues are integrated immediately while others are held back. In this paper, we empirically study the integration of 20,995 fixed issues from the ArgoUML, Eclipse, and Firefox projects. Our results indicate that: (i) despite being fixed well before the release date, the integration of 34% to 60% of fixed issues in projects with a traditional release cycle (the Eclipse and ArgoUML projects), and 98% of fixed issues in a project with a rapid release cycle (the Firefox project), was prevented in one or more releases; (ii) using information that we derive from fixed issues, our models are able to accurately predict the release in which a fixed issue will be integrated, achieving Area Under the Curve (AUC) values of 0.62 to 0.93; and (iii) heuristics that estimate the effort that the team invests to fix issues are among the most influential factors in our models. Furthermore, we fit models to study fixed issues that suffer from a long integration time. Such models (iv) obtain AUC values of 0.82 to 0.96 and (v) derive much of their explanatory power from metrics that are related to the release cycle. Finally, we train regression models to study integration time in terms of number of days. Our models achieve R² values of 0.39 to 0.65, and indicate that the time at which an issue is fixed and the resolver of the issue have a large impact on the number of days that a fixed issue requires for integration. Our results indicate that, in addition to the backlog of issues that need to be fixed, the backlog of issues that need to be released introduces a software development overhead, which may lead to a longer integration time. Therefore, in addition to studying the triaging and fixing stages of the issue lifecycle, the integration stage should also be the target of future research and tooling efforts in order to reduce the time-to-delivery of fixed issues.
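The prediction task described above can be illustrated with a minimal sketch: a classifier trained on issue-level features predicts whether a fixed issue is integrated in the next release, and is evaluated with AUC. This is not the paper's actual model or feature set; the features below (days to fix, comment count, an effort proxy) and the synthetic data are assumptions for illustration only.

```python
# Hedged sketch of the general setup: predict whether a fixed issue ships in
# the next release, and evaluate the classifier with AUC. The features and
# data here are synthetic stand-ins, not the paper's actual metrics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical per-issue features (assumptions, not from the study):
days_to_fix = rng.exponential(10, n)          # time the fix itself took
comments = rng.poisson(4, n)                  # discussion on the issue
effort = days_to_fix * 0.5 + rng.normal(0, 2, n)  # crude fixing-effort proxy
X = np.column_stack([days_to_fix, comments, effort])
# Synthetic label: issues fixed quickly are more likely to ship next release.
y = (days_to_fix + rng.normal(0, 5, n) < 12).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 2))
```

Because the synthetic label is driven by `days_to_fix`, the classifier separates the classes well; with real issue data, AUC would depend on how informative the extracted features are, as the 0.62 to 0.93 range reported above suggests.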


Keywords: Integration time · Integration delay · Software maintenance · Mining software repositories



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of Informatics and Applied Mathematics (DIMAp), Federal University of Rio Grande do Norte, Natal, Brazil
  2. Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
  3. Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada
  4. Addis Ababa Institute of Technology, Addis Ababa University, Addis Ababa, Ethiopia
