Generation of Testing Metrics by Using Cluster Analysis of Bug Reports

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1288))

Abstract

One of the most significant challenges in defect reporting is computing and predicting testing metrics. Every software development project needs testing metrics suited to it, and cluster analysis can be used to generate them. Interpreting the resulting clusters helps to determine explicit and implicit characteristics of software testing and development.

This paper describes several software solutions for clustering bug reports. We extracted bug reports from three open-source JBoss projects and ran experiments on that data. The experiments demonstrate that effective results can be achieved in defect clustering. We report results for two clustering algorithms: k-means and EM. Our research shows that the EM algorithm yields more detailed information about the specifics of a project than the k-means algorithm does, so EM makes it possible to create more diverse testing metrics suited to project needs.
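
To make the contrast between the two algorithms concrete, the following minimal Python sketch clusters a single numeric attribute with both methods. It is an illustration under stated assumptions, not the implementation used in the paper: the `days_to_resolve` values are invented, the choice of one feature and two clusters is arbitrary, and the helper names `kmeans_1d` and `em_gmm_1d` are hypothetical. k-means returns one hard label per report, while EM fits a Gaussian mixture and returns per-report membership probabilities together with a weight and a variance for each component.

```python
import math

# Hypothetical feature: days each bug report took to resolve (invented data).
days_to_resolve = [1.0, 2.0, 2.5, 3.0, 9.0, 20.0, 22.0, 25.0, 30.0]


def kmeans_1d(values, centers, iterations=20):
    """Lloyd's k-means in one dimension: hard assignment to the nearest centre."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
                  for v in values]
        for j in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = sum(members) / len(members)
    return labels, centers


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(values, means, iterations=50, min_var=1e-3):
    """EM for a 1-D Gaussian mixture: soft assignment via responsibilities."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k      # arbitrary starting variance (assumption)
    weights = [1.0 / k] * k    # start with equal mixing weights
    resp = []
    for _ in range(iterations):
        # E-step: probability that each component generated each value.
        resp = []
        for v in values:
            dens = [weights[j] * gaussian_pdf(v, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            if total > 0:
                resp.append([d / total for d in dens])
            else:              # numerical underflow: fall back to uniform
                resp.append([1.0 / k] * k)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            spread = sum(r[j] * (v - means[j]) ** 2
                         for r, v in zip(resp, values)) / nj
            variances[j] = max(min_var, spread)
    return resp, means, variances, weights


# k-means, started from the smallest and largest observed values.
labels, centers = kmeans_1d(days_to_resolve,
                            [min(days_to_resolve), max(days_to_resolve)])

# EM, started from the k-means centres.
resp, means, variances, weights = em_gmm_1d(days_to_resolve, centers)

for value, label, r in zip(days_to_resolve, labels, resp):
    print(f"{value:5.1f} days  k-means label={label}  "
          f"EM memberships={[round(p, 3) for p in r]}")
print("EM component means:", [round(m, 2) for m in means])
print("EM component variances:", [round(v, 2) for v in variances])
print("EM component weights:", [round(w, 2) for w in weights])
```

One possible reading of the extra detail is that the per-component weights and variances are additional quantities on which a project-specific metric could be built, for example the typical resolution time and its spread within each group of reports.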

Corresponding authors

Correspondence to Anna Gromova, Iosif Itkin or Sergey Pavlov.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Gromova, A., Itkin, I., Pavlov, S. (2021). Generation of Testing Metrics by Using Cluster Analysis of Bug Reports. In: Kalenkova, A., Lozano, J.A., Yavorskiy, R. (eds) Tools and Methods of Program Analysis. TMPA 2019. Communications in Computer and Information Science, vol 1288. Springer, Cham. https://doi.org/10.1007/978-3-030-71472-7_14

  • DOI: https://doi.org/10.1007/978-3-030-71472-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71471-0

  • Online ISBN: 978-3-030-71472-7

  • eBook Packages: Computer Science, Computer Science (R0)
