SCVD: A New Semantics-Based Approach for Cloned Vulnerable Code Detection

  • Deqing Zou
  • Hanchao Qi
  • Zhen Li
  • Song Wu
  • Hai Jin
  • Guozhong Sun
  • Sujuan Wang
  • Yuyi Zhong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10327)


Copying existing code to reuse or modify its functionality is very common in software development. However, when developers clone existing code, they also clone any vulnerabilities it contains, which seriously affects the security of the system. In this paper, we propose a novel semantics-based approach called SCVD for cloned vulnerable code detection. We use a full path traversal algorithm to transform the Program Dependency Graph (PDG) into a tree structure while preserving all the semantic information carried by the PDG, and we apply the tree to cloned vulnerable code detection. We use an identifier name mapping technique to eliminate the impact of identifier name modification. Our key insights are to convert the complex graph similarity problem into a simpler tree similarity problem and to use identifier name mapping to improve the effectiveness of semantics-based cloned vulnerable code detection. We have developed a practical tool based on our approach and performed a large number of experiments to evaluate its performance in three respects: false positive rate, false negative rate, and time cost. The experimental results show that our approach significantly improves vulnerability detection effectiveness compared with existing approaches and has a lower time cost than subgraph isomorphism approaches.
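The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the SCVD implementation, whose details are not given here. It assumes a toy PDG stored as an adjacency dictionary, and the names `unfold_to_tree`, `map_identifiers`, and `C_KEYWORDS` are hypothetical. The first function unfolds a graph into a tree by following every path from a root, duplicating nodes that are reachable along several paths. The second replaces identifiers with positional placeholders so that renamed variables no longer affect comparison.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy PDG: node id -> successor node ids (dependence edges).
Graph = Dict[str, List[str]]
Tree = Tuple[str, list]

def unfold_to_tree(graph: Graph, root: str, _path: Tuple[str, ...] = ()) -> Tree:
    """Unfold a graph into a tree by following every path from root.

    A node reachable along several paths is copied once per path.
    A node already on the current path is skipped, so cycles terminate.
    """
    path = _path + (root,)
    children = [
        unfold_to_tree(graph, succ, path)
        for succ in graph.get(root, [])
        if succ not in path
    ]
    return (root, children)

# Assumption: a small keyword set for illustration; a real tool would use
# the full keyword list of the target language and likely a real lexer.
C_KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return"}

def map_identifiers(statement: str) -> str:
    """Rename identifiers to v0, v1, ... in order of first appearance."""
    mapping: Dict[str, str] = {}

    def rename(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name in C_KEYWORDS:
            return name
        # The placeholder index is the number of names seen so far.
        return mapping.setdefault(name, f"v{len(mapping)}")

    return re.sub(r"\b[A-Za-z_]\w*\b", rename, statement)

# Node "d" depends on both "b" and "c", so it appears twice in the tree.
pdg = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
tree = unfold_to_tree(pdg, "a")

# Two statements that differ only in variable names map to the same text.
same = map_identifiers("int len = n + len;") == map_identifiers("int size = m + size;")
```

Because shared nodes are duplicated per path, the unfolded tree can be much larger than the original graph; the sketch makes no attempt to bound that growth. It also treats function names as ordinary identifiers, which a practical detector might handle differently.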


Vulnerability detection, Cloned code, Semantics



This paper is supported by the National Science Foundation of China under grant No. 61672249, the National Basic Research Program of China (973 Program) under grant No. 2014CB340600, the National Key Research & Development (R&D) Plan of China under grant No. 2016YFB0200300, and the Natural Science Foundation of Hebei Province under grant No. F2015201089.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Deqing Zou (1)
  • Hanchao Qi (1)
  • Zhen Li (1, 2), corresponding author
  • Song Wu (1)
  • Hai Jin (1)
  • Guozhong Sun (3)
  • Sujuan Wang (1)
  • Yuyi Zhong (1)

  1. Services Computing Technology and System Lab, Big Data Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  2. School of Computer Science and Technology, Hebei University, Baoding, China
  3. Dawning Information Industry (Beijing) Co., Ltd., Beijing, China
