Studying re-opened bugs in open source software

Shihab, Emad; Ihara, Akinori; Kamei, Yasutaka; Ibrahim, Walid M.; Ohira, Masao; Adams, Bram; Hassan, Ahmed E.; Matsumoto, Ken-ichi

doi:10.1007/s10664-012-9228-6

Published: 20 September 2012

Volume 18, pages 1005–1042, (2013)

Empirical Software Engineering

Emad Shihab¹,
Akinori Ihara²,
Yasutaka Kamei³,
Walid M. Ibrahim⁴,
Masao Ohira²,
Bram Adams⁵,
Ahmed E. Hassan⁴ &
Ken-ichi Matsumoto²

Abstract

Bug fixing consumes a large share of software maintenance resources. Generally, bugs are reported, fixed, verified and closed. In some cases, however, bugs have to be re-opened. Re-opened bugs increase maintenance costs, degrade the overall user-perceived quality of the software and lead to unnecessary rework by busy practitioners. In this paper, we study and predict re-opened bugs through a case study on three large open source projects: Eclipse, Apache and OpenOffice. We structure our study along four dimensions: (1) the work habits dimension (e.g., the weekday on which the bug was initially closed), (2) the bug report dimension (e.g., the component in which the bug was found), (3) the bug fix dimension (e.g., the amount of time it took to perform the initial fix) and (4) the team dimension (e.g., the experience of the bug fixer). Using these factors, we build decision trees that aim to predict re-opened bugs. We perform top node analysis to determine which factors are the most important indicators of whether or not a bug will be re-opened. Our study shows that the comment text and the last status of the bug when it is initially closed are the most important factors related to whether or not a bug will be re-opened. Using a combination of these dimensions, we can build explainable prediction models that achieve a precision of 52.1–78.6% and a recall of 70.5–94.1% when predicting whether a bug will be re-opened. We find that the factors that best indicate which bugs might be re-opened vary by project: the comment text is the most important factor for the Eclipse and OpenOffice projects, while the last status is the most important one for Apache. These factors should be closely examined in order to reduce the maintenance cost due to re-opened bugs.
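
To make the ideas of top node analysis and the precision and recall figures more concrete, the following is a minimal sketch and not the study's implementation. It assumes plain information gain as the split criterion and a leave-one-out resampling loop to tally which factor ends up at the root; the abstract specifies neither. The factor names (`last_status`, `close_weekday`), their values and the six toy bug reports are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, factor):
    """Entropy reduction obtained by splitting the reports on one categorical factor."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[factor], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def top_node(rows, labels):
    """Factor with the highest information gain, i.e. the root of a tree built on this data."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))

def top_node_counts(rows, labels):
    """Tally the root factor over leave-one-out resamples (assumed resampling scheme)."""
    counts = Counter()
    for skip in range(len(rows)):
        sub_rows = rows[:skip] + rows[skip + 1:]
        sub_labels = labels[:skip] + labels[skip + 1:]
        counts[top_node(sub_rows, sub_labels)] += 1
    return counts

def precision_recall(actual, predicted, positive=True):
    """Precision and recall for the 're-opened' class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data: one dict of factors per bug report.
rows = [
    {"last_status": "resolved_fixed", "close_weekday": "Mon"},
    {"last_status": "resolved_fixed", "close_weekday": "Fri"},
    {"last_status": "resolved_fixed", "close_weekday": "Mon"},
    {"last_status": "resolved_worksforme", "close_weekday": "Fri"},
    {"last_status": "resolved_worksforme", "close_weekday": "Mon"},
    {"last_status": "resolved_invalid", "close_weekday": "Fri"},
]
labels = [False, False, False, True, True, True]      # True = bug was re-opened
predicted = [False, True, False, True, True, False]   # made-up model output

print(top_node(rows, labels))
print(top_node_counts(rows, labels))
print(precision_recall(labels, predicted))
```

In this toy data the last status separates re-opened from non-re-opened reports perfectly, so it is selected as the root in every resample. On real bug data the tally would be spread across factors, and that spread is what a top node analysis summarizes.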

Notes

For example, in some cases developers discover a bug and know how to fix it; however, they create a bug report and assign it to themselves for book-keeping purposes.
For re-opened bugs, we used all the comments posted before the bugs were re-opened.


Acknowledgements

This research is being conducted as a part of the Next Generation IT Program and Grant-in-aid for Young Scientists (B), 22700033, 2010 by the Ministry of Education, Culture, Sports, Science and Technology, Japan. In addition, it is supported in part by research grants from the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

Department of Software Engineering, Rochester Institute of Technology, Rochester, NY, USA
Emad Shihab
Graduate School of Information Science, Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan
Akinori Ihara, Masao Ohira & Ken-ichi Matsumoto
Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Nishi-ku, Fukuoka, Japan
Yasutaka Kamei
Software Analysis and Intelligence Lab (SAIL), Queen’s University, Kingston, ON, Canada
Walid M. Ibrahim & Ahmed E. Hassan
Lab on Maintenance, Construction and Intelligence of Software (MCIS), École Polytechnique de Montréal, Montréal, QC, Canada
Bram Adams

Corresponding author

Correspondence to Emad Shihab.

Additional information

Editors: Giuliano Antoniol and Martin Pinzger

About this article

Cite this article

Shihab, E., Ihara, A., Kamei, Y. et al. Studying re-opened bugs in open source software. Empir Software Eng 18, 1005–1042 (2013). https://doi.org/10.1007/s10664-012-9228-6

Published: 20 September 2012
Issue Date: October 2013
DOI: https://doi.org/10.1007/s10664-012-9228-6
