Improved Searchability of Bug Reports Using Content-Based Labeling with Machine Learning of Sentences

  • Yuki Noyori (email author)
  • Hironori Washizaki
  • Yoshiaki Fukazawa
  • Hideyuki Kanuka
  • Keishi Ooshima
  • Ryosuke Tsuchiya
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 108)


Most stakeholders refer to past bug reports when they encounter a problem because these reports contain useful information. However, finding specific content is difficult: the number of bug reports is large, and the content a stakeholder wants depends on that stakeholder's viewpoint. A full-text search also returns unwanted content, which makes searching costly. Although this problem has been noted previously, a solution has yet to be proposed. Herein we propose the Content-based Labeling Method as a solution. The method organizes the information in a bug report by labeling each sentence according to its content, which allows stakeholders' viewpoints to be taken into account during a search. We evaluate the resulting improvement in searchability experimentally. The results show that the Content-based Labeling Method improves searchability in terms of F-measure and precision.
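To make the idea concrete, the following is a minimal sketch of how sentence-level labels can narrow a search, and of how precision and F-measure are computed for a result set. It is not the authors' implementation: the label names ("symptom", "fix"), the toy sentences, the relevance judgments, and all function names are assumptions made for illustration, and the labels are assigned by hand here rather than by a trained classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    report_id: int
    text: str
    label: str  # hypothetical content label; the abstract does not name the label set


# Toy sentences invented for illustration; labels are assigned by hand.
SENTENCES = [
    Sentence(1, "The app crashes when the file is saved.", "symptom"),
    Sentence(1, "A null check was added before the save call.", "fix"),
    Sentence(2, "Restarting does not stop the crash.", "symptom"),
    Sentence(2, "The crash was fixed by updating the driver.", "fix"),
    Sentence(3, "The save dialog renders slowly.", "symptom"),
]

# Assumed ground truth for a stakeholder who wants fixes for crashes.
RELEVANT = [SENTENCES[1], SENTENCES[3]]


def full_text_search(sentences, keyword):
    """Return every sentence whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [s for s in sentences if needle in s.text.lower()]


def labeled_search(sentences, keyword, label):
    """Keyword search restricted to sentences that carry the requested label."""
    return [s for s in full_text_search(sentences, keyword) if s.label == label]


def precision_recall_f(retrieved, relevant):
    """Compute precision, recall, and the balanced F-measure of a result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    for name, results in [
        ("full text", full_text_search(SENTENCES, "crash")),
        ("labeled", labeled_search(SENTENCES, "crash", "fix")),
    ]:
        p, r, f = precision_recall_f(results, RELEVANT)
        print(f"{name}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this toy data the label filter discards the two symptom sentences that a plain keyword match returns, so precision and F-measure rise while recall is unchanged. The numbers only illustrate the metrics and say nothing about the experimental results reported in the paper.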


Keywords: Bug report · Machine learning · Labeling · Searchability



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yuki Noyori (1), email author
  • Hironori Washizaki (1)
  • Yoshiaki Fukazawa (1)
  • Hideyuki Kanuka (2)
  • Keishi Ooshima (2)
  • Ryosuke Tsuchiya (2)
  1. Waseda University, Tokyo, Japan
  2. Hitachi, Ltd. Research & Development Group, Tokyo, Japan
