Abstract
Automatically detecting buffer overflow vulnerabilities is an important research topic in software security. Recent studies have shown that deep learning-based techniques can significantly improve vulnerability detection performance. However, because information is lost when source code is converted into a representation, existing approaches fail to learn the features associated with vulnerabilities, which leads to a high false negative rate (FNR) and low precision. To address these problems, we propose BovdGFE, a method for detecting buffer overflow vulnerabilities in C/C++ programs based on graph feature extraction. BovdGFE first constructs buffer overflow function samples. We then present a new representation structure, the code representation sequence (CoRS), which incorporates the control flow, data dependencies, and syntax structure of the vulnerable code to reduce information loss during code representation. After the function samples are transformed into CoRS, a deep learning model learns the vulnerability features and performs vulnerability classification. Experimental results show that, compared with state-of-the-art methods, BovdGFE improves precision by 6.3% and reduces the FNR by 3.9%, significantly improving vulnerability detection capability.
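The two metrics named in the abstract are presumably the standard confusion-matrix definitions. Precision is TP / (TP + FP), and the false negative rate is FN / (TP + FN). The sketch below computes both, assuming binary labels in which 1 marks a vulnerable function sample. The helper names and the toy labels are illustrative assumptions and do not come from the paper's data or code.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for binary labels, where 1 means 'vulnerable'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # vulnerable, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # vulnerable, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # safe, not flagged
    return tp, fp, fn, tn


def precision(tp: int, fp: int) -> float:
    """Fraction of samples flagged as vulnerable that really are vulnerable."""
    return tp / (tp + fp) if tp + fp else 0.0


def false_negative_rate(tp: int, fn: int) -> float:
    """Fraction of truly vulnerable samples that the detector misses."""
    return fn / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Hypothetical toy labels and predictions, for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(f"precision={precision(tp, fp):.3f}  FNR={false_negative_rate(tp, fn):.3f}")
    # prints "precision=0.800  FNR=0.200"
```

A lower FNR is better because it counts missed vulnerabilities. Precision depends on false positives, while the FNR depends on false negatives, so the two metrics can move independently.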
Acknowledgements
The authors would like to express their gratitude to the people who provided CGD. In addition, we would like to thank the anonymous reviewers for their comments and suggestions.
Funding
This work was supported by the Science Foundation of Hubei Province, China (Grant No. 2020BAB116), the Hubei Garment Information Engineering Research Center, and Ningbo Cixing Co. (Grant No. 2021Z069).
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lv, X., Peng, T., Chen, J. et al. BovdGFE: buffer overflow vulnerability detection based on graph feature extraction. Appl Intell 53, 15204–15221 (2023). https://doi.org/10.1007/s10489-022-04214-8