Empirical Software Engineering, Volume 21, Issue 4, pp 1533–1578

Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts

  • Leif Jonsson
  • Markus Borg
  • David Broman
  • Kristian Sandahl
  • Sigrid Eldh
  • Per Runeson


Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG), which combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large-scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50 % to 89 % when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advise industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.
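The two evaluation concerns raised above (old training data and sole reliance on cross-validation) can be illustrated with a time-ordered split, in which every training report predates every test report and only the newest reports are kept for training. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `time_ordered_split`, the `(submitted_on, text, team)` tuple layout, and the toy reports are all hypothetical.

```python
from datetime import date


def time_ordered_split(reports, cutoff, max_train=None):
    """Split bug reports chronologically around a cutoff date.

    reports   -- iterable of (submitted_on, text, team) tuples
                 (hypothetical layout, chosen for illustration only)
    cutoff    -- reports submitted before this date may be used for training;
                 reports on or after it form the test set
    max_train -- if given, keep only the newest `max_train` training reports,
                 so that very old reports are dropped
    """
    ordered = sorted(reports, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    if max_train is not None:
        # Keep the most recent training reports only.
        train = train[max(0, len(train) - max_train):]
    return train, test


# Toy data (invented for this sketch), deliberately out of order.
reports = [
    (date(2013, 7, 2), "timeout in sync job", "team-b"),
    (date(2012, 11, 20), "crash on login", "team-a"),
    (date(2013, 3, 14), "wrong unit in report", "team-c"),
    (date(2013, 5, 30), "memory leak in parser", "team-a"),
    (date(2013, 9, 9), "login page misaligned", "team-b"),
]

train, test = time_ordered_split(reports, date(2013, 6, 1), max_train=2)
```

Unlike a shuffled cross-validation fold, this split never lets a classifier learn from reports submitted after the ones it is tested on. In a realistic setting `max_train` would be far larger than in this toy example; the abstract recommends at least 2,000 training reports.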


Machine learning; Ensemble learning; Classification; Bug reports; Bug assignment; Industrial scale; Large scale



This work was supported in part by the Industrial Excellence Center EASE – Embedded Applications Software Engineering.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Ericsson AB, Stockholm, Sweden
  2. Department of Computer and Information Science, Linköping University, Linköping, Sweden
  3. Department of Computer Science, Lund University, Lund, Sweden
  4. KTH Royal Institute of Technology, Kista, Sweden
  5. UC Berkeley, Berkeley, USA
