Studying re-opened bugs in open source software

Empirical Software Engineering

Abstract

Bug fixing accounts for a large share of software maintenance resources. Generally, bugs are reported, fixed, verified and closed. However, in some cases bugs have to be re-opened. Re-opened bugs increase maintenance costs, degrade the overall user-perceived quality of the software and lead to unnecessary rework by busy practitioners. In this paper, we study and predict re-opened bugs through a case study on three large open source projects, namely Eclipse, Apache and OpenOffice. We structure our study along four dimensions: (1) the work habits dimension (e.g., the weekday on which the bug was initially closed), (2) the bug report dimension (e.g., the component in which the bug was found), (3) the bug fix dimension (e.g., the amount of time it took to perform the initial fix) and (4) the team dimension (e.g., the experience of the bug fixer). We build decision trees from these factors to predict re-opened bugs, and we perform top node analysis to determine which factors are the most important indicators of whether or not a bug will be re-opened. Our study shows that the comment text and the last status of the bug when it is initially closed are the most important factors related to whether or not a bug will be re-opened. Using a combination of these dimensions, we can build explainable prediction models that achieve a precision between 52.1% and 78.6% and a recall between 70.5% and 94.1% when predicting whether a bug will be re-opened. We find that the factors that best indicate which bugs might be re-opened vary by project: the comment text is the most important factor for the Eclipse and OpenOffice projects, while the last status is the most important one for Apache. These factors should be closely examined in order to reduce the maintenance costs caused by re-opened bugs.
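Two analysis steps in the abstract are easy to misread: top node analysis over the trained decision trees, and precision/recall with re-opened bugs as the positive class. The Python sketch below illustrates both under stated assumptions, not as the authors' implementation. It assumes each trained tree is reduced to a hypothetical `TreeSummary` holding the factor split at the root (level 0) and the factors split directly below it (level 1), and that many trees are available (for example, one per training run). The names `TreeSummary`, `top_node_counts`, `precision_recall` and the factor labels are illustrative.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Sequence, Tuple


class TreeSummary(NamedTuple):
    """Hypothetical summary of one trained decision tree (an assumption)."""
    level0: str              # factor used for the root split
    level1: Tuple[str, ...]  # factors used for the splits directly below the root


def top_node_counts(trees: Iterable[TreeSummary]) -> Tuple[Counter, Counter]:
    """Count how often each factor appears at level 0 and at level 1
    across many trees (assumed here: one tree per training run)."""
    level0: Counter = Counter()
    level1: Counter = Counter()
    for tree in trees:
        level0[tree.level0] += 1
        level1.update(tree.level1)  # counts each level-1 factor once per tree
    return level0, level1


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall, treating 're-opened' (True) as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # predicted re-opened, was re-opened
    fp = sum(1 for p, a in pairs if p and not a)  # predicted re-opened, was not
    fn = sum(1 for p, a in pairs if not p and a)  # a re-opened bug the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative factor names only, echoing the dimensions in the abstract.
    trees = [
        TreeSummary("comment_text", ("last_status", "component")),
        TreeSummary("comment_text", ("last_status",)),
        TreeSummary("last_status", ("comment_text",)),
    ]
    level0, level1 = top_node_counts(trees)
    print(level0.most_common())  # [('comment_text', 2), ('last_status', 1)]
    print(precision_recall([True, True, False, True], [True, False, False, True]))
```

Under this scheme, a factor that sits at the root of most trees is read as the strongest indicator, with level-1 counts used as a secondary signal. Precision answers "of the bugs flagged as likely to be re-opened, how many were?", while recall answers "of the bugs that were re-opened, how many were flagged?".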

Notes

  1. For example, in some cases developers discover a bug and already know how to fix it, but they still create a bug report and assign it to themselves for bookkeeping purposes.

  2. For re-opened bugs, we used all the comments posted before the bugs were re-opened.
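The second note describes a filtering step that keeps discussion written after a re-open out of the comment-text factor. A minimal Python sketch, assuming each comment carries a timestamp and the bug's re-open time is known. `Comment`, `comments_before_reopen` and the sample data are hypothetical, and keeping all comments for bugs that were never re-opened is an assumption, not something the note states.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class Comment(NamedTuple):
    """Hypothetical bug-comment record; field names are illustrative."""
    posted_at: datetime
    text: str


def comments_before_reopen(comments: Iterable[Comment],
                           reopened_at: Optional[datetime]) -> List[Comment]:
    """Return the comments usable as features for one bug.

    For a re-opened bug, keep only comments posted strictly before the
    re-open time. Assumption: a bug that was never re-opened
    (reopened_at is None) keeps all of its comments.
    """
    if reopened_at is None:
        return list(comments)
    return [c for c in comments if c.posted_at < reopened_at]


if __name__ == "__main__":
    history = [
        Comment(datetime(2010, 3, 1), "Fixed in HEAD."),
        Comment(datetime(2010, 3, 4), "Verified."),
        Comment(datetime(2010, 3, 9), "Still happens on Linux, re-opening."),
    ]
    kept = comments_before_reopen(history, reopened_at=datetime(2010, 3, 9))
    print([c.text for c in kept])  # ['Fixed in HEAD.', 'Verified.']
```

The strict `<` comparison excludes a comment stamped at the exact re-open moment, which is typically the re-opening comment itself and would otherwise reveal the label.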

Acknowledgements

This research was conducted as part of the Next Generation IT Program and was supported by a Grant-in-Aid for Young Scientists (B), 22700033, 2010, from the Ministry of Education, Culture, Sports, Science and Technology, Japan. It was also supported in part by research grants from the Natural Sciences and Engineering Research Council of Canada.

Author information

Correspondence to Emad Shihab.

Additional information

Editors: Giuliano Antoniol and Martin Pinzger

About this article

Cite this article

Shihab, E., Ihara, A., Kamei, Y. et al. Studying re-opened bugs in open source software. Empir Software Eng 18, 1005–1042 (2013). https://doi.org/10.1007/s10664-012-9228-6

  • DOI: https://doi.org/10.1007/s10664-012-9228-6
