An Analysis of Software Bug Reports Using Machine Learning Techniques

Tran, Ha Manh; Le, Son Thanh; Nguyen, Sinh Van; Ho, Phong Thanh

doi:10.1007/s42979-019-0004-1

Original Research
Published: 29 June 2019

Volume 1, article number 4, (2020)
SN Computer Science

Ha Manh Tran¹,
Son Thanh Le²,
Sinh Van Nguyen² &
…
Phong Thanh Ho¹

A Publisher Correction to this article was published on 28 September 2023

This article has been updated

Abstract

Bug tracking systems manage bug reports to assure the quality of software products. A bug report (also referred to as a trouble, problem, ticket or defect) contains several features for problem management and resolution purposes. Severity and priority are two essential features of a bug report that define the effect level and fixing order of the bug. Determining these features is challenging and depends heavily on human judgment, e.g., that of software developers or system operators, especially when assessing a large number of error and warning events occurring on software products or network services. This study first compares machine learning techniques for assessing the severity and priority of software bug reports and then chooses an approach based on optimal decision trees, or random forest, for further investigation. This approach constructs multiple decision trees from subsets of the existing bug dataset and features, and then selects the best decision trees to assess the severity and priority of new bugs. The approach can be applied to detecting and forecasting faults in large, complex communication networks and distributed systems. We present the applicability of random forest to bug report analysis and report several experiments on software bug datasets obtained from open source bug tracking systems. Random forest yields an average accuracy score of 0.75, which can be sufficient for assisting system operators in determining these features. We also provide some analysis of the experimental results.
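
The idea of building many trees on subsets of bugs and features can be illustrated with a minimal sketch. It is not the authors' implementation: the toy bug reports, the feature names, and the helper functions train_stump, train_forest and predict_forest are all hypothetical, each tree is reduced to a depth-one tree (a stump) on a single randomly chosen feature, and the trees are combined by plain majority vote rather than by any tree selection step the paper may use.

```python
import random
from collections import Counter

# Hypothetical toy bug reports: (component, symptom) features -> severity label.
BUGS = [
    (("kernel", "crash"), "critical"),
    (("kernel", "hang"), "critical"),
    (("network", "crash"), "critical"),
    (("network", "timeout"), "major"),
    (("network", "hang"), "major"),
    (("ui", "typo"), "minor"),
    (("ui", "layout"), "minor"),
    (("docs", "typo"), "minor"),
]


def majority(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, feature):
    """Depth-one tree: map each value of one feature to its majority label."""
    by_value = {}
    for x, y in rows:
        by_value.setdefault(x[feature], []).append(y)
    table = {value: majority(ys) for value, ys in by_value.items()}
    default = majority([y for _, y in rows])  # used for unseen feature values
    return feature, table, default


def predict_stump(stump, x):
    feature, table, default = stump
    return table.get(x[feature], default)


def train_forest(rows, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # random subset of bugs
        feature = rng.randrange(n_features)        # random subset of features
        forest.append(train_stump(sample, feature))
    return forest


def predict_forest(forest, x):
    """Assess a new bug by majority vote over all trees."""
    return majority([predict_stump(stump, x) for stump in forest])


forest = train_forest(BUGS)
print(predict_forest(forest, ("kernel", "crash")))
```

The same functions could assess priority by swapping the label column. A practical study would more likely rely on a library implementation with full-depth trees and a held-out test set to measure accuracy.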

Change history

28 September 2023
A Correction to this paper has been published: https://doi.org/10.1007/s42979-023-02168-3

Acknowledgements

This research activity is funded by the Vietnam National University in Ho Chi Minh City (VNU-HCM) under the Grant number C2019-28-06.

Author information

Authors and Affiliations

HongBang International University, 215 Dien Bien Phu, Ward 15, Binh Thanh District, Ho Chi Minh City, Vietnam
Ha Manh Tran & Phong Thanh Ho
International University, Vietnam National University, Block 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam
Son Thanh Le & Sinh Van Nguyen

Corresponding author

Correspondence to Ha Manh Tran.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Future Data and Security Engineering” guest edited by Tran Khanh Dang.

About this article

Cite this article

Tran, H.M., Le, S.T., Nguyen, S.V. et al. An Analysis of Software Bug Reports Using Machine Learning Techniques. SN COMPUT. SCI. 1, 4 (2020). https://doi.org/10.1007/s42979-019-0004-1

Received: 15 April 2019
Accepted: 10 June 2019
Published: 29 June 2019
DOI: https://doi.org/10.1007/s42979-019-0004-1
