An Analysis of Software Bug Reports Using Machine Learning Techniques

  • Original Research
  • SN Computer Science

A Publisher Correction to this article was published on 28 September 2023

This article has been updated

Abstract

Bug tracking systems manage bug reports to assure the quality of software products. A bug report (also referred to as a trouble, problem, ticket, or defect) contains several features for problem management and resolution purposes. Severity and priority are two essential features of a bug report that define the effect level and fixing order of the bug. Determining these features is challenging and depends heavily on humans, e.g., software developers or system operators, especially when assessing a large number of error and warning events occurring in software products or network services. This study first compares machine learning techniques for assessing the severity and priority of software bug reports and then chooses an approach using optimal decision trees, or random forest, for further investigation. This approach constructs multiple decision trees based on subsets of the existing bug dataset and features, and then selects the best decision trees to assess the severity and priority of new bugs. The approach can be applied to detecting and forecasting faults in today's large, complex communication networks and distributed systems. We present the applicability of random forest to bug report analysis and report several experiments on software bug datasets obtained from open source bug tracking systems. Random forest yields an average accuracy score of 0.75, which can be sufficient for assisting system operators in determining these features. We also provide some analysis of the experimental results.
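The idea summarized in the abstract (many trees built on subsets of the data and features, with the best trees kept for prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes depth-1 trees (stumps) instead of full decision trees, uses out-of-bag accuracy as the criterion for "best", and runs on invented toy data. The feature names, the labeling rule, and all function names are hypothetical.

```python
import random
from collections import Counter

def make_toy_reports(n, rng):
    """Synthetic bug reports (invented for illustration, not real tracker data).
    Features: (component id, number of comments, crash keyword present)."""
    reports = []
    for _ in range(n):
        features = (rng.randint(0, 4), rng.randint(0, 30), rng.randint(0, 1))
        # Hypothetical rule: crash keyword or long discussion means high severity.
        label = "high" if features[2] == 1 or features[1] > 25 else "low"
        reports.append((features, label))
    return reports

def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def train_stump(sample, features):
    """Fit a depth-1 tree: choose the (feature, threshold) split with the
    fewest training errors, considering only the given feature subset."""
    default = majority([y for _, y in sample], None)
    best = None
    for f in features:
        for t in sorted({x[f] for x, _ in sample}):
            left = majority([y for x, y in sample if x[f] <= t], default)
            right = majority([y for x, y in sample if x[f] > t], default)
            errors = sum((left if x[f] <= t else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, f, t, left, right)
    return best[1:]

def stump_predict(stump, x):
    f, t, left, right = stump
    return left if x[f] <= t else right

def train_forest(data, n_trees, n_features, keep, rng):
    """Train n_trees stumps on bootstrap samples and random feature subsets,
    then keep the `keep` stumps with the highest out-of-bag accuracy."""
    n_total = len(data[0][0])
    scored = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(data)) for _ in data]  # bootstrap sample
        chosen = set(idx)
        oob = [data[i] for i in range(len(data)) if i not in chosen]
        features = rng.sample(range(n_total), n_features)
        stump = train_stump([data[i] for i in idx], features)
        acc = sum(stump_predict(stump, x) == y for x, y in oob) / len(oob) if oob else 0.0
        scored.append((acc, stump))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [stump for _, stump in scored[:keep]]

def forest_predict(forest, x):
    """Majority vote over the retained stumps."""
    return Counter(stump_predict(s, x) for s in forest).most_common(1)[0][0]

rng = random.Random(0)
train = make_toy_reports(200, rng)
test = make_toy_reports(100, rng)
forest = train_forest(train, n_trees=25, n_features=2, keep=9, rng=rng)
accuracy = sum(forest_predict(forest, x) == y for x, y in test) / len(test)
print(f"kept {len(forest)} trees, test accuracy = {accuracy:.2f}")
```

In practice a library implementation such as scikit-learn's random forest classifier would replace the hand-written stumps; the sketch only makes the sampling, selection, and voting steps explicit.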

Acknowledgements

This research is funded by the Vietnam National University in Ho Chi Minh City (VNU-HCM) under grant number C2019-28-06.

Author information

Corresponding author

Correspondence to Ha Manh Tran.

Additional information

This article is part of the topical collection “Future Data and Security Engineering” guest edited by Tran Khanh Dang.

About this article

Cite this article

Tran, H.M., Le, S.T., Nguyen, S.V. et al. An Analysis of Software Bug Reports Using Machine Learning Techniques. SN COMPUT. SCI. 1, 4 (2020). https://doi.org/10.1007/s42979-019-0004-1

  • DOI: https://doi.org/10.1007/s42979-019-0004-1