Abstract
Bug handling processes aimed at efficient defect resolution are an important part of the software development lifecycle and are usually formally defined in modern, professional, large software development organizations. One possible improvement of such processes is automated bug assignment, i.e., the task of selecting the right development team to investigate a bug report further. Because bug reports consist largely of natural language descriptions, bug assignment is a non-trivial task, especially when testing large-scale projects or complex systems. This research focuses on the impact of natural language preprocessing and vectorization on the accuracy of bug report assignment, based on real data captured in large software development projects. The experiments cover stemming and lemmatization techniques applied to preprocess bug descriptions, and the parametrization of term frequency–inverse document frequency (TF-IDF) as the vectorization method.
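The abstract does not describe an implementation. The following is a minimal, standard-library-only Python sketch of the kind of pipeline it outlines: tokens are normalized by stemming or lemmatization, weighted with TF-IDF, and a new report is assigned to the team of its most similar historical report. Several assumptions apply. `crude_stem` is a toy suffix stripper, not the Porter algorithm. `toy_lemmatize` uses a tiny, hypothetical lemma table in place of a dictionary-based lemmatizer such as a WordNet one. The TF-IDF formulas follow the smoothed IDF, optional sublinear TF and L2 normalization used by scikit-learn's `TfidfVectorizer`. Finally, the nearest-neighbour assignment step, the sample reports and the team names are all hypothetical and are not the classifier or data used in the paper.

```python
import math
import re
from collections import Counter

# Toy suffix-stripping stemmer (hypothetical stand-in for e.g. Porter stemming).
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(token):
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical lemma table; a real lemmatizer maps inflections to dictionary forms.
LEMMAS = {"crashes": "crash", "crashed": "crash", "fails": "fail",
          "failing": "fail", "failed": "fail"}

def toy_lemmatize(token):
    return LEMMAS.get(token, token)

def preprocess(text, normalize):
    # Lowercase, split on non-alphanumeric characters, normalize each token.
    return [normalize(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def fit_idf(docs, smooth_idf=True):
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    if smooth_idf:
        # Smoothed IDF: as if one extra document contained every term once.
        return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return {t: math.log(n / d) + 1 for t, d in df.items()}

def transform(doc, idf, sublinear_tf=False):
    weights = {}
    for term, count in Counter(doc).items():
        if term not in idf:  # terms unseen during fitting are ignored
            continue
        tf = 1 + math.log(count) if sublinear_tf else count
        weights[term] = tf * idf[term]
    norm = math.sqrt(sum(w * w for w in weights.values()))
    # L2-normalize so that a dot product equals cosine similarity.
    return {t: w / norm for t, w in weights.items()} if norm else weights

def cosine(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def assign_team(text, labelled, normalize, sublinear_tf=False, smooth_idf=True):
    # Fit IDF on historical reports only, then pick the most similar one (1-NN).
    train_docs = [preprocess(desc, normalize) for desc, _ in labelled]
    idf = fit_idf(train_docs, smooth_idf)
    train_vecs = [transform(d, idf, sublinear_tf) for d in train_docs]
    query = transform(preprocess(text, normalize), idf, sublinear_tf)
    scores = [cosine(query, v) for v in train_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)  # ties: first wins
    return labelled[best][1]

# Hypothetical historical reports with their resolving teams.
REPORTS = [
    ("Radio link crashes when handover fails", "RAN team"),
    ("Handover failing after radio reset", "RAN team"),
    ("Database query crashed during backup", "DB team"),
    ("Backup job failed with database timeout", "DB team"),
]

print(assign_team("Radio handover crashed", REPORTS, crude_stem))     # RAN team
print(assign_team("database backup failed", REPORTS, toy_lemmatize))  # DB team
```

The `sublinear_tf` and `smooth_idf` flags illustrate the kind of TF-IDF parameters such an experiment can vary; which settings the paper actually evaluated is not stated here. Note also that `crude_stem("during")` yields the non-word `"dur"`, a typical side effect of suffix stripping that dictionary-based lemmatization avoids.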
Acknowledgement
This work was carried out in cooperation between NOKIA and Wroclaw University of Science and Technology in the context of a Ph.D. grant under the fourth edition of the “Implementation Doctorate Programme”.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chmielowski, L., Kucharzak, M. (2022). Impact of Software Bug Report Preprocessing and Vectorization on Bug Assignment Accuracy. In: Choraś, M., Choraś, R.S., Kurzyński, M., Trajdos, P., Pejaś, J., Hyla, T. (eds) Progress in Image Processing, Pattern Recognition and Communication Systems. CORES IP&C ACS 2021. Lecture Notes in Networks and Systems, vol 255. Springer, Cham. https://doi.org/10.1007/978-3-030-81523-3_15
DOI: https://doi.org/10.1007/978-3-030-81523-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81522-6
Online ISBN: 978-3-030-81523-3
eBook Packages: Intelligent Technologies and Robotics (R0)