Generation of Testing Metrics by Using Cluster Analysis of Bug Reports

Gromova, Anna; Itkin, Iosif; Pavlov, Sergey

doi:10.1007/978-3-030-71472-7_14

Conference paper
First Online: 17 March 2021

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1288)

Abstract

One of the most significant challenges in defect reporting is how to compute and predict testing metrics. Every software development project needs testing metrics suited to it, and cluster analysis can be used to generate them. Interpreting the resulting clusters helps to reveal explicit and implicit characteristics of software testing and development.
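To make the idea of interpreting clusters concrete, the sketch below tabulates, for each cluster, the share of bug reports carrying each value of one attribute. A cluster dominated by a single value (for example, Blocker priority) points to a characteristic that could be tracked as a testing metric. This is a minimal illustration under stated assumptions, not the authors' method: the field names `priority` and `resolution`, the helper `describe_clusters`, and the four reports are hypothetical, and the cluster labels are assumed to come from an earlier clustering step.

```python
from collections import Counter, defaultdict


def describe_clusters(reports, labels, attribute):
    """Hypothetical helper: for each cluster label, return the share of
    reports in that cluster carrying each value of `attribute`."""
    groups = defaultdict(list)
    for report, label in zip(reports, labels):
        groups[label].append(report[attribute])
    summary = {}
    for label, values in sorted(groups.items()):
        counts = Counter(values)
        total = len(values)
        summary[label] = {value: count / total for value, count in counts.items()}
    return summary


# Hypothetical bug reports and cluster labels (not data from the paper).
reports = [
    {"priority": "Blocker", "resolution": "Done"},
    {"priority": "Blocker", "resolution": "Won't Fix"},
    {"priority": "Minor", "resolution": "Done"},
    {"priority": "Minor", "resolution": "Done"},
]
labels = [0, 0, 1, 1]

summary = describe_clusters(reports, labels, "priority")
print(summary)  # {0: {'Blocker': 1.0}, 1: {'Minor': 1.0}}
```

Calling it with `"resolution"` instead gives cluster 0 an even split between Done and Won't Fix, the kind of per-cluster ratio that could be reported as a metric.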

This paper describes several software solutions for clustering bug reports. We extracted bug reports from three open-source JBoss projects and ran our experiments on that data. The experiments show that defect clustering can produce effective results. We present the results obtained with two clustering algorithms: k-means and EM (expectation-maximization). Our research shows that the EM algorithm yields more detailed information about the specifics of a project than the k-means algorithm does. EM therefore makes it possible to create a more diverse set of testing metrics tailored to project needs.
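The difference between the two algorithms can be seen on a toy example. k-means returns only a center per cluster and a hard label per report, whereas EM for a Gaussian mixture also estimates a variance and a mixing weight for each cluster and a membership probability for each report; this extra output is one plausible reason EM can reveal more about a project's specifics. The sketch is a minimal pure-Python illustration on a single hypothetical feature (days to resolve a bug report); the feature, the data, and the helper names `kmeans_1d` and `em_gmm_1d` are assumptions and do not reflect the paper's actual features or implementation.

```python
import math


def kmeans_1d(xs, centers, iterations=20):
    """Lloyd's k-means on 1-D data; returns final centers and hard labels."""
    centers = [float(c) for c in centers]

    def assign():
        # Assignment step: every point joins its nearest center.
        return [min(range(len(centers)), key=lambda j: abs(x - centers[j])) for x in xs]

    for _ in range(iterations):
        labels = assign()
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [x for x, label in zip(xs, labels) if label == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, assign()


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(xs, weights, means, variances):
    """Soft memberships: probability that each point belongs to each component."""
    resp = []
    for x in xs:
        p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(p)
        resp.append([pj / total for pj in p])
    return resp


def em_gmm_1d(xs, means, iterations=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture; returns weights, means, variances, memberships."""
    n, k = len(xs), len(means)
    means = [float(m) for m in means]
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k  # start from overall variance
    weights = [1.0 / k] * k
    for _ in range(iterations):
        resp = e_step(xs, weights, means, variances)
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # guard against a collapsing component
    return weights, means, variances, e_step(xs, weights, means, variances)


# Hypothetical "days to resolve" values for eight bug reports.
days = [1, 2, 2, 3, 30, 32, 35, 31]
centers, labels = kmeans_1d(days, [0, 40])
print(centers, labels)  # [2.0, 32.0] [0, 0, 0, 0, 1, 1, 1, 1]

# Seed EM with the k-means centers.
weights, means, variances, resp = em_gmm_1d(days, centers)
print([round(m, 2) for m in means], [round(v, 2) for v in variances])  # [2.0, 32.0] [0.5, 3.5]
```

Seeding EM with the k-means centers is a common initialization choice. Here both algorithms agree on the split, but only EM reports that the second group of resolution times is more spread out (variance of about 3.5 versus 0.5) and gives each report a probability of membership rather than a single label.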

Author information

Authors and Affiliations

Exactpro Systems, 2nd Yuzhnoportovy Projezd 20A Str. 4, Moscow, 115088, Russia
Anna Gromova & Sergey Pavlov
Exactpro Systems, Suite 3.02, St Clements House, 27 Clements Lane, London, EC4N 7AE, UK
Iosif Itkin

Corresponding authors

Correspondence to Anna Gromova, Iosif Itkin or Sergey Pavlov.

Editor information

Editors and Affiliations

University of Melbourne, Melbourne, VIC, Australia
Anna Kalenkova
Intelligent Systems Group, UPV/EHU, Donostia, Spain
Jose A. Lozano
Tomsk Polytechnic University, Tomsk, Russia
Rostislav Yavorskiy

About this paper

Cite this paper

Gromova, A., Itkin, I., Pavlov, S. (2021). Generation of Testing Metrics by Using Cluster Analysis of Bug Reports. In: Kalenkova, A., Lozano, J.A., Yavorskiy, R. (eds) Tools and Methods of Program Analysis. TMPA 2019. Communications in Computer and Information Science, vol 1288. Springer, Cham. https://doi.org/10.1007/978-3-030-71472-7_14

DOI: https://doi.org/10.1007/978-3-030-71472-7_14
Published: 17 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71471-0
Online ISBN: 978-3-030-71472-7
eBook Packages: Computer Science, Computer Science (R0)