Spark-Based Cluster Implementation of a Bug Report Assignment Recommender System

  • Adrian-Cătălin Florea
  • John Anvik
  • Răzvan Andonie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10246)


Recommenders for bug report triage decisions are especially important in large software development projects, where the high frequency of reported problems and the large number of active developers make it difficult to select the most appropriate developer to work on a given issue. From a machine learning perspective, bug report assignment may be regarded as a classification problem that can be solved by a recommender system. We describe a highly scalable SVM-based bug report assignment recommender that is able to run on massive datasets. Unlike previous desktop-based implementations of bug report assignment recommenders, our recommender is implemented on a cloud platform. The system uses a novel sequence of machine learning processing steps and compares favorably with other SVM-based bug report assignment recommender systems with respect to prediction performance. We validate our approach on real-world datasets from the NetBeans, Eclipse, and Mozilla projects.



The authors are grateful to the Mozilla Foundation for providing a dump of their Bugzilla database.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Adrian-Cătălin Florea (1)
  • John Anvik (2)
  • Răzvan Andonie (1, 3)
  1. Electronics and Computers Department, Transilvania University of Braşov, Braşov, Romania
  2. Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge, Canada
  3. Computer Science Department, Central Washington University, Ellensburg, USA
