BovdGFE: buffer overflow vulnerability detection based on graph feature extraction

Applied Intelligence

Abstract

Automatically detecting buffer overflow vulnerabilities is an important research topic in software security. Recent studies have shown that deep learning-based techniques can significantly improve vulnerability detection performance. However, because information is lost during code representation, existing approaches cannot learn the features associated with vulnerabilities, which leads to a high false negative rate (FNR) and low precision. To address these problems, we propose BovdGFE, a method for buffer overflow vulnerability detection based on graph feature extraction in C/C++ programs. BovdGFE first constructs buffer overflow function samples. We then present a new representation structure, the code representation sequence (CoRS), which incorporates the control flow, data dependencies, and syntax structure of the vulnerable code to reduce information loss during code representation. After the function samples are transformed into CoRS, a deep learning model learns vulnerability features and performs vulnerability classification. Experimental results show that, compared with state-of-the-art methods, BovdGFE improves precision by 6.3% and reduces the FNR by 3.9%, which significantly improves vulnerability detection capability.
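For readers less familiar with the two metrics reported above, the following minimal Python sketch shows how precision and FNR are conventionally computed from confusion-matrix counts. The function names and the counts are illustrative assumptions and are not taken from the paper or its repository.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of samples flagged as vulnerable that are truly vulnerable."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def false_negative_rate(fn: int, tp: int) -> float:
    """Fraction of truly vulnerable samples that the detector missed."""
    vulnerable = fn + tp
    return fn / vulnerable if vulnerable else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the paper.
    tp, fp, fn = 90, 10, 30
    print(f"precision = {precision(tp, fp):.3f}")
    print(f"FNR       = {false_negative_rate(fn, tp):.3f}")
```

A lower FNR means fewer vulnerable functions are missed, and a higher precision means fewer safe functions are flagged, so the two figures in the abstract describe improvements in opposite numeric directions.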

Notes

  1. https://github.com/lvxinghang/BovdGFE

Acknowledgements

The authors would like to express their gratitude to the people who provided CGD. In addition, we would like to thank the anonymous reviewers for their comments and suggestions.

Funding

This work was supported by the Science Foundation of Hubei Province, China (Grant No. 2020BAB116), the Hubei Garment Information Engineering Research Center, and Ningbo Cixing Co. (Grant No. 2021Z069).

Author information

Corresponding authors

Correspondence to Tao Peng or Jia Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lv, X., Peng, T., Chen, J. et al. BovdGFE: buffer overflow vulnerability detection based on graph feature extraction. Appl Intell 53, 15204–15221 (2023). https://doi.org/10.1007/s10489-022-04214-8

  • DOI: https://doi.org/10.1007/s10489-022-04214-8
