Abstract
One of the most significant challenges in defect reporting is computing and predicting testing metrics. Every software development project needs testing metrics suited to it, and cluster analysis can be used to generate them. Interpreting the resulting clusters helps to determine explicit and implicit characteristics of software testing and development.
This paper describes several software solutions for clustering bug reports. We extracted bug reports from three open-source JBoss projects and ran our experiments on that data. The experiments demonstrate that defect clustering can produce effective results. We report the results obtained with two clustering algorithms: k-means and EM (expectation-maximization). Our research shows that the EM algorithm yields more detailed information about the specifics of a project than the k-means algorithm. EM therefore makes it possible to create more diverse testing metrics suited to project needs.
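The comparison of k-means and EM can be illustrated with a minimal sketch. It assumes scikit-learn's KMeans and GaussianMixture (a Gaussian mixture model fitted by EM) and runs on synthetic data; the function name cluster_reports, the two numeric features, and all parameter values are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def cluster_reports(features, n_clusters, seed=0):
    """Cluster a numeric feature matrix with k-means and with EM.

    Hypothetical helper: EM is represented here by a Gaussian mixture
    model, which is an assumption, not the paper's confirmed setup.
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    kmeans.fit(features)

    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    gmm.fit(features)

    return {
        # Hard assignment: each report belongs to exactly one cluster.
        "kmeans_labels": kmeans.labels_,
        # Most probable mixture component for each report.
        "em_labels": gmm.predict(features),
        # Soft assignment: per-report membership probability per cluster.
        "em_probabilities": gmm.predict_proba(features),
        # Bayesian information criterion of the fitted mixture (lower is better).
        "em_bic": gmm.bic(features),
    }


# Synthetic stand-in for bug-report features: two well-separated groups
# of 50 reports, each described by two made-up numeric attributes.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=[1.0, 2.0], scale=0.3, size=(50, 2)),
    rng.normal(loc=[6.0, 9.0], scale=0.3, size=(50, 2)),
])

result = cluster_reports(features, n_clusters=2)
```

In this sketch k-means returns only a hard label per report, while the mixture model also returns membership probabilities and an information criterion. That extra output is one plausible reading of why EM could give more detailed information about a project.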
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Gromova, A., Itkin, I., Pavlov, S. (2021). Generation of Testing Metrics by Using Cluster Analysis of Bug Reports. In: Kalenkova, A., Lozano, J.A., Yavorskiy, R. (eds) Tools and Methods of Program Analysis. TMPA 2019. Communications in Computer and Information Science, vol 1288. Springer, Cham. https://doi.org/10.1007/978-3-030-71472-7_14
DOI: https://doi.org/10.1007/978-3-030-71472-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71471-0
Online ISBN: 978-3-030-71472-7
eBook Packages: Computer Science, Computer Science (R0)