Cross-Project Issue Classification Based on Ensemble Modeling in a Social Coding World

  • Conference paper
Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11304)

Abstract

The simplified and less formal contribution mechanisms of social coding are attracting more and more contributors to collaborative software development. To reduce the burden on project core teams, various automated and intelligent approaches based on machine learning and data mining have been proposed, but they are often restricted by a lack of training data. In this paper, we conduct an extensive empirical study of transferring and aggregating reusable models across projects in the context of issue classification, based on a large-scale dataset of 799 open source projects and more than 795,000 issues. We propose a novel cross-project approach that integrates multiple models learned from various source projects to classify the issues of a target project. We evaluate our approach through comparative experiments against within-project classification and a typical cross-project method called Bellwether. The results show that our cross-project approach based on ensemble modeling achieves performance comparable to within-project classification and better than Bellwether.
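The idea of integrating models learned from several source projects can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which classifier or combination rule is used, so the small Naive Bayes text model, the majority vote, the `bug`/`feature` labels, the project names, and the toy issues below are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(issues):
    """Train a tiny multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of issues
    vocab = set()
    for text, label in issues:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict_nb(model, text):
    """Return the most probable label under one source-project model."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words unseen in this project are ignored
                # Laplace (add-one) smoothing
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def ensemble_predict(models, text):
    """Majority vote over the source-project models (assumed combination rule)."""
    votes = Counter(predict_nb(model, text) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical labeled issues from three source projects.
source_projects = {
    "project_a": [
        ("app crashes on startup with error", "bug"),
        ("null pointer exception when saving", "bug"),
        ("add dark mode support", "feature"),
        ("please support export to csv", "feature"),
    ],
    "project_b": [
        ("crash when opening file", "bug"),
        ("error message shown on login", "bug"),
        ("add option to export settings", "feature"),
        ("support for plugins would be nice", "feature"),
    ],
    "project_c": [
        ("exception thrown on save", "bug"),
        ("login fails with error", "bug"),
        ("please add keyboard shortcuts", "feature"),
        ("support dark theme", "feature"),
    ],
}

# One model per source project; the target project contributes no training data.
models = [train_nb(issues) for issues in source_projects.values()]

print(ensemble_predict(models, "crash with error on save"))
print(ensemble_predict(models, "please add support for dark mode"))
```

In this sketch the target project supplies only the text to classify, which is the situation the abstract describes for projects that lack training data. Ties in the vote go to the label that was encountered first; a weighted or probability-averaging scheme would be an equally plausible alternative.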

Notes

  1. https://bitbucket.org.

  2. https://github.com.

  3. https://github.com/blog/831-issues-2-0-the-next-generation.

  4. http://ghtorrent.org/downloads.html.

  5. https://cran.r-project.org/web/packages/nparcomp/index.html.

References

  1. Antoniol, G., Ayari, K., Di Penta, M., Khomh, F., Guéhéneuc, Y.G.: Is it a bug or an enhancement?: A text-based approach to classify change requests. In: Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, p. 23. ACM (2008)

  2. Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newslett. 6(1), 20–29 (2004)

  3. Bettenburg, N., Nagappan, M., Hassan, A.E.: Think locally, act globally: improving defect and effort prediction models. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp. 60–69. IEEE (2012)

  4. Bissyandé, T.F., Lo, D., Jiang, L., Réveillere, L., Klein, J., Le Traon, Y.: Got issues? Who cares about it? A large scale investigation of issue trackers from GitHub. In: 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE), pp. 188–197. IEEE (2013)

  5. Fan, Q., Yu, Y., Yin, G., Wang, T., Wang, H.: Where is the road for issue reports classification based on text mining? In: 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 121–130. IEEE (2017)

  6. Gousios, G., Pinzger, M., Deursen, A.V.: An exploratory study of the pull-based software development model. In: Proceedings of the 36th International Conference on Software Engineering, pp. 345–355. ACM (2014)

  7. He, P., Li, B., Ma, Y.: Towards cross-project defect prediction with imbalanced feature sets. arXiv preprint arXiv:1411.4228 (2014)

  8. Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D.M., Damian, D.: The promises and perils of mining GitHub. In: Proceedings of the 11th Working Conference on Mining Software Repositories, pp. 92–101. ACM (2014)

  9. Kitchenham, B.A., Mendes, E., Travassos, G.H.: Cross versus within-company cost estimation studies: a systematic review. IEEE Trans. Softw. Eng. 33(5), 316–329 (2007)

  10. Konietschke, F., Hothorn, L.A., Brunner, E., et al.: Rank-based multiple test procedures and simultaneous confidence intervals. Electron. J. Stat. 6, 738–759 (2012)

  11. Konietschke, F., Placzek, M., Schaarschmidt, F., Hothorn, L.A.: nparcomp: An R software package for nonparametric multiple comparisons and simultaneous confidence intervals (2015)

  12. Krishna, R., Menzies, T., Fu, W.: Too much automation? The bellwether effect and its implications for transfer learning. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 122–131. ACM (2016)

  13. Lan, L., Tao, D., Gong, C., Guan, N., Luo, Z.: Online multi-object tracking by quadratic pseudo-boolean optimization. In: IJCAI, pp. 3396–3402 (2016)

  14. Ma, Y., Luo, G., Zeng, X., Chen, A.: Transfer learning for cross-company software defect prediction. Inf. Softw. Technol. 54(3), 248–256 (2012)

  15. Menzies, T., Butcher, A., Marcus, A., Zimmermann, T., Cok, D.: Local vs. global models for effort estimation and defect prediction. In: Automated Software Engineering, pp. 343–351. IEEE (2011)

  16. Merten, T., Falis, M., Hübner, P., Quirchmayr, T., Bürsner, S., Paech, B.: Software feature request detection in issue tracking systems. In: 2016 IEEE 24th International Requirements Engineering Conference (RE), pp. 166–175. IEEE (2016)

  17. Nagappan, N., Ball, T., Zeller, A.: Mining metrics to predict component failures. In: Proceedings of the 28th International Conference on Software Engineering, pp. 452–461. ACM (2006)

  18. Nam, J., Pan, S.J., Kim, S.: Transfer defect learning. In: Proceedings of the 2013 International Conference on Software Engineering, pp. 382–391. IEEE Press (2013)

  19. Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 22(2), 199–210 (2011)

  20. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)

  21. Peters, F., Menzies, T., Marcus, A.: Better cross company defect prediction. In: Mining Software Repositories, pp. 409–418 (2013)

  22. Posnett, D., Filkov, V., Devanbu, P.: Ecological inference in empirical software engineering. In: Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, pp. 362–371. IEEE Computer Society (2011)

  23. Premraj, R., Herzig, K.: Network versus code metrics to predict defects: a replication study. In: 2011 International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 215–224. IEEE (2011)

  24. Turhan, B., Menzies, T., Bener, A.B., Di Stefano, J.: On the relative value of cross-company and within-company data for defect prediction. Empirical Softw. Eng. 14(5), 540–578 (2009)

  25. Uddin, J., Ghazali, R., Deris, M.M., Naseem, R., Shah, H.: A survey on bug prioritization. Artif. Intell. Rev. 47(2), 145–180 (2017)

  26. Van Der Veen, E., Gousios, G., Zaidman, A.: Automatically prioritizing pull requests. In: Proceedings of the 12th Working Conference on Mining Software Repositories, pp. 357–361. IEEE Press (2015)

  27. Yu, Y., Wang, H., Filkov, V., Devanbu, P., Vasilescu, B.: Wait for it: determinants of pull request evaluation latency on GitHub. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR), pp. 367–371. IEEE (2015)

  28. Yu, Y., Wang, H., Yin, G., Wang, T.: Reviewer recommendation for pull-requests in github: what can we learn from code review and bug assignment? Inf. Softw. Technol. 74, 204–218 (2016)

  29. Zanetti, M.S., Scholtes, I., Tessone, C.J., Schweitzer, F.: Categorizing bugs with social networks: a case study on four open source software communities. In: Proceedings of the 35th International Conference on Software Engineering, pp. 1032–1041. IEEE (2013)

  30. Zhang, F., Mockus, A., Keivanloo, I., Zou, Y.: Towards building a universal defect prediction model. In: Proceedings of the 11th Working Conference on Mining Software Repositories, pp. 182–191. ACM (2014)

  31. Zhang, F., Zheng, Q., Zou, Y., Hassan, A.E.: Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of the 38th International Conference on Software Engineering, pp. 309–320. ACM (2016)

  32. Zhou, Y., Tong, Y., Gu, R., Gall, H.: Combining text mining and data mining for bug report classification. J. Softw. Evol. Process 28(3), 150–176 (2016)

  33. Zimmermann, T., Nagappan, N., Gall, H., Giger, E., Murphy, B.: Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 91–100. ACM (2009)

Author information

Corresponding author

Correspondence to Yarong Zeng.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zeng, Y. et al. (2018). Cross-Project Issue Classification Based on Ensemble Modeling in a Social Coding World. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11304. Springer, Cham. https://doi.org/10.1007/978-3-030-04212-7_24

  • DOI: https://doi.org/10.1007/978-3-030-04212-7_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04211-0

  • Online ISBN: 978-3-030-04212-7

  • eBook Packages: Computer Science, Computer Science (R0)
