Impact of Software Bug Report Preprocessing and Vectorization on Bug Assignment Accuracy

  • Conference paper
Progress in Image Processing, Pattern Recognition and Communication Systems (CORES 2021, IP&C 2021, ACS 2021)

Abstract

Bug handling, aimed at efficient defect resolution, is an important part of the software development lifecycle and usually follows a formally defined process in large, professional software development organizations. Improvements to such a process may include automated bug assignment, the task of selecting the correct development team to investigate a bug report further. Because bug reports consist largely of natural language descriptions, bug assignment is a non-trivial task, especially when testing large-scale projects or complex systems. This research focuses on the impact of natural language preprocessing and vectorization on the accuracy of bug report assignment, based on real data captured in large software development projects. The experimental results cover stemming and lemmatization techniques applied to bug description preprocessing, and the parametrization of term frequency–inverse document frequency (TF-IDF) as the vectorization method.
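To make the preprocessing and vectorization steps named in the abstract concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a crude suffix-stripping function (`toy_stem`, hypothetical) in place of a real stemmer or lemmatizer, a smoothed IDF formula, and a single illustrative TF-IDF parameter (`min_df`, the minimum number of reports a term must appear in). The example reports are invented for illustration.

```python
import math
import re
from collections import Counter


def toy_stem(token):
    # Hypothetical stand-in for a real stemmer: strip a few common suffixes,
    # keeping at least three characters of the word.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    # Lowercase, keep alphabetic tokens only, then stem each token.
    return [toy_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def tfidf(docs, min_df=1):
    # Returns the vocabulary and one sparse {term: weight} vector per document.
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    # min_df drops terms that occur in too few documents.
    vocab = sorted(t for t, count in df.items() if count >= min_df)
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: tf[t] * idf[t] for t in vocab if t in tf})
    return vocab, vectors


# Invented bug report summaries, for illustration only.
reports = [
    "Crash when connecting to network",
    "Network connection crashes on startup",
    "Button label misaligned in settings",
]
vocab, vectors = tfidf(reports, min_df=2)
```

In this sketch, stemming maps "Crash" and "crashes" to the same term, so the first two reports share vocabulary that a raw token match would miss. Raising `min_df` shrinks the vocabulary. Choices of this kind are the sort of preprocessing and parametrization decisions whose effect on assignment accuracy the abstract says the paper evaluates.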


Acknowledgement

This work has been carried out in cooperation between NOKIA and Wroclaw University of Science and Technology in the context of a Ph.D. grant under the fourth edition of the “Implementation Doctorate Programme”.

Author information

Corresponding author

Correspondence to Lukasz Chmielowski.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chmielowski, L., Kucharzak, M. (2022). Impact of Software Bug Report Preprocessing and Vectorization on Bug Assignment Accuracy. In: Choraś, M., Choraś, R.S., Kurzyński, M., Trajdos, P., Pejaś, J., Hyla, T. (eds) Progress in Image Processing, Pattern Recognition and Communication Systems. CORES IP&C ACS 2021 2021 2021. Lecture Notes in Networks and Systems, vol 255. Springer, Cham. https://doi.org/10.1007/978-3-030-81523-3_15
