A Text-Based Regression Approach to Predict Bug-Fix Time

  • Pasquale Ardimento
  • Nicola Boffoli
  • Costantino Mele
Part of the Studies in Computational Intelligence book series (SCI, volume 880)


Predicting bug-fixing time can help project managers select adequate resources when assigning bugs. In this work, we tackle the problem of predicting bug-fixing time through multiple regression analysis, using the textual information extracted from bug reports as predictor variables. Our model selects all and only the features useful for prediction, using, among other techniques, statistical procedures such as Principal Component Analysis (PCA). To validate the model, we performed an empirical investigation on the bug reports of four well-known open-source projects whose bugs are stored in Bugzilla installations (Bugzilla is an online, open-source Bug Tracking System, or BTS). For each project, we built regression models using the M5P model tree, Support Vector Machine (SVM), and Random Forests algorithms. Experimental results show that the model is effective: its results are slightly better than all those previously reported in the literature. In future work, we will apply and compare other regression approaches in order to select the best one for a given dataset.
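The abstract outlines a pipeline of textual features, PCA-based feature reduction, and regression on fix time. The sketch below is a rough illustration of that idea only, not the authors' implementation: it builds bag-of-words term-frequency vectors from bug-report text, extracts principal components by power iteration with deflation, and fits a principal component regression (a plain linear model) on the component scores. The paper's actual models are M5P, SVM and Random Forests; the linear regression here is a swapped-in stand-in. The function names (`fit_pcr`, `predict`, `top_components`), the simple tokenizer, the number of components `k`, and the idea that the target is measured in days are all hypothetical assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: text features -> PCA -> linear regression on fix time.
# Not the authors' pipeline (they used M5P, SVM and Random Forests).


def tokenize(text):
    # Assumed tokenizer: lower-case, keep alphabetic runs; no stemming or stop words.
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text, vocab):
    # Bag-of-words vector: raw count of each vocabulary term in the report.
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_components(cov, k, iters=1000, tol=1e-9):
    """Power iteration with deflation; returns up to k (eigenvalue, eigenvector) pairs."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy, deflation modifies it
    comps = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # deterministic starting vector
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < tol:
                return comps  # no variance left to explain
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(cov, v)))  # Rayleigh quotient
        if lam < tol:
            return comps
        comps.append((lam, v))
        for i in range(d):  # deflate: subtract the component just found
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return comps


def fit_pcr(texts, fix_days, k):
    """Principal component regression of fix time on term-frequency features."""
    vocab = sorted({t for text in texts for t in tokenize(text)})
    rows = [term_frequencies(t, vocab) for t in texts]
    n, d = len(rows), len(vocab)
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    comps = top_components(cov, k)
    y_mean = sum(fix_days) / n
    coefs = []
    for _, v in comps:
        s = [sum(a * b for a, b in zip(r, v)) for r in centered]  # component scores
        # Scores of distinct components are (approximately) uncorrelated, so the
        # multiple regression decouples into one least-squares slope per component.
        coefs.append(sum(si * (yi - y_mean) for si, yi in zip(s, fix_days))
                     / sum(si * si for si in s))
    return vocab, mu, comps, y_mean, coefs


def predict(model, text):
    """Predict fix time for a new report from its projection on the kept components."""
    vocab, mu, comps, y_mean, coefs = model
    x = [a - m for a, m in zip(term_frequencies(text, vocab), mu)]  # center with training mean
    pred = y_mean
    for b, (_, v) in zip(coefs, comps):
        pred += b * sum(a * c for a, c in zip(x, v))
    return pred
```

For instance, fitting on the four toy reports `"parser crash"`, `"parser crash crash"`, `"parser crash crash crash"` and `"parser"` with fix times 10, 20, 30 and 0 keeps a single component (only the `crash` count varies) and predicts roughly 40 for `"parser crash crash crash crash"`. A linear model on component scores keeps the sketch short because uncorrelated scores let each slope be computed independently; a model tree or forest, as used in the paper, would replace only the final fitting step.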



The research is partially supported by POR Puglia FESR-FSE 2014-2020, Priority Axis 1 (Research, technological development, innovation), Sub-Action 1.4.b, innolabs call (support for the creation of innovative solutions aimed at specific problems of social relevance), within the research project KOMETA (Knowledge Community for Efficient Training through Virtual Technologies), funded by Regione Puglia.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Pasquale Ardimento (1)
  • Nicola Boffoli (1)
  • Costantino Mele (1)

  1. Department of Informatics, University of Bari Aldo Moro, Bari, Italy
