Advertisement

Towards One Reusable Model for Various Software Defect Mining Tasks

  • Heng-Yi Li
  • Ming LiEmail author
  • Zhi-Hua Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11441)

Abstract

Software defect mining is playing an important role in software quality assurance. Many deep neural network based models have been proposed for software defect mining tasks, and have pushed forward the state-of-the-art mining performance. These deep models usually require a huge amount of task-specific source code for training to capture the code functionality to mine the defects. But such requirement is often hard to be satisfied in practice. On the other hand, lots of free source code and corresponding textual explanations are publicly available in the open source software repositories, which is potentially useful in modeling code functionality. However, no previous studies ever leverage these resources to help defect mining tasks. In this paper, we propose a novel framework to learn one reusable deep model for code functional representation using the huge amount of publicly available task-free source code as well as their textual explanations. And then reuse it for various software defect mining tasks. Experimental results on three major defect mining tasks with real world datasets indicate that by reusing this model in specific tasks, the mining performance outperforms its counterpart that learns deep models from scratch, especially when the training data is insufficient.

Keywords

Software defect mining Machine learning Model reuse 

Notes

Acknowledgment

This research was supported by National Key Research and Development Program (2017YFB1001903) and NSFC (61751306).

References

  1. 1.
    Alemi, M., Haghighi, H., Shahrivari, S.: CCFinder: using Spark to find clustering coefficient in big graphs. J. Supercomput. 73(11), 4683–4710 (2017)CrossRefGoogle Scholar
  2. 2.
    Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a Siamese time delay neural network. In: Advances in Neural Information Processing Systems, pp. 737–744 (1993)Google Scholar
  3. 3.
    D’Ambros, M., Lanza, M., Robbes, R.: Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir. Softw. Eng. 17(4–5), 531–577 (2012)CrossRefGoogle Scholar
  4. 4.
    Gay, G., Haiduc, S., Marcus, A., Menzies, T.: On the use of relevance feedback in IR-based concept location. In: Proceedings of the 25th IEEE International Conference on Software Maintenance, pp. 351–360 (2009)Google Scholar
  5. 5.
    Huo, X., Li, M.: Enhancing the unified features to locate buggy files by exploiting the sequential nature of source code. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 1909–1915 (2017)Google Scholar
  6. 6.
    Huo, X., Li, M., Zhou, Z.H.: Learning unified features from natural and programming languages for locating buggy source code. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 1606–1612 (2016)Google Scholar
  7. 7.
    Jiang, L., Misherghi, G., Su, Z., Glondu, S.: DECKARD: scalable and accurate tree-based detection of code clones. In: Proceedings of the 29th International Conference on Software Engineering, pp. 96–105 (2007)Google Scholar
  8. 8.
    Johnson, R., Zhang, T.: Effective use of word order for text categorization with convolutional neural networks. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 103–112 (2015)Google Scholar
  9. 9.
    Kim, S., Zimmermann, T., Whitehead Jr., E.J., Zeller, A.: Predicting faults from cached history. In: Proceedings of the 29th International Conference on Software Engineering, pp. 489–498 (2007)Google Scholar
  10. 10.
    Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: Proceedings of the 32nd International Conference on Machine Learning Deep Learning Workshop, vol. 2 (2015)Google Scholar
  11. 11.
    Komondoor, R., Horwitz, S.: Using slicing to identify duplication in source code. In: Cousot, P. (ed.) SAS 2001. LNCS, vol. 2126, pp. 40–56. Springer, Heidelberg (2001).  https://doi.org/10.1007/3-540-47764-0_3CrossRefGoogle Scholar
  12. 12.
    Mueller, J., Thyagarajan, A.: Siamese recurrent architectures for learning sentence similarity. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence, pp. 2786–2792 (2016)Google Scholar
  13. 13.
    Roy, C.K., Cordy, J.R.: A survey on software clone detection research. Queen’s Sch. Comput. TR 541(115), 64–68 (2007)Google Scholar
  14. 14.
    Saha, R.K., Lease, M., Khurshid, S., Perry, D.E.: Improving bug localization using structured information retrieval. In: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering, pp. 345–355 (2013)Google Scholar
  15. 15.
    de Souza, S.C.B., Anquetil, N., de Oliveira, K.M.: A study of the documentation essential to software maintenance. In: Proceedings of the 23rd Annual International Conference on Design of Communication: Documenting & Designing for Pervasive Information, pp. 68–75 (2005)Google Scholar
  16. 16.
    Svajlenko, J., Islam, J.F., Keivanloo, I., Roy, C.K., Mia, M.M.: Towards a big data curated benchmark of inter-project code clones. In: Proceedings of the 30th IEEE International Conference on Software Maintenance and Evolution, pp. 476–480 (2014)Google Scholar
  17. 17.
    Wang, S., Liu, T., Tan, L.: Automatically learning semantic features for defect prediction. In: Proceedings of the 38th International Conference on Software Engineering, pp. 297–308 (2016)Google Scholar
  18. 18.
    Wei, H.H., Li, M.: Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 3034–3040 (2017)Google Scholar
  19. 19.
    White, M., Tufano, M., Vendome, C., Poshyvanyk, D.: Deep learning code fragments for code clone detection. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 87–98 (2016)Google Scholar
  20. 20.
    Yang, X., Lo, D., Xia, X., Zhang, Y., Sun, J.: Deep learning for just-in-time defect prediction. In: Proceedings of the 2015 IEEE International Conference on Software Quality, Reliability and Security, pp. 17–26 (2015)Google Scholar
  21. 21.
    Zhou, J., Zhang, H., Lo, D.: Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In: Proceedings of the 34th International Conference on Software Engineering, pp. 14–24 (2012)Google Scholar
  22. 22.
    Zimmermann, T., Premraj, R., Zeller, A.: Predicting defects for Eclipse. In: Proceedings of the 3rd International Workshop on Predictor Models in Software Engineering, p. 9 (2007)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.National Key Laboratory for Novel Software TechnologyNanjing UniversityNanjingChina

Personalised recommendations