BovdGFE: buffer overflow vulnerability detection based on graph feature extraction

Lv, Xinghang; Peng, Tao; Chen, Jia; Liu, Junping; Hu, Xinrong; He, Ruhan; Jiang, Minghua; Cao, Wenli

doi:10.1007/s10489-022-04214-8


Published: 12 November 2022

Volume 53, pages 15204–15221 (2023)

Applied Intelligence

Xinghang Lv^1,
Tao Peng^2,3,
Jia Chen^2,3 (ORCID: orcid.org/0000-0001-5275-3408),
Junping Liu^1,2,3,
Xinrong Hu^1,2,
Ruhan He^1,3,
Minghua Jiang^2,3 &
Wenli Cao^1


Abstract

Automatically detecting buffer overflow vulnerabilities is an important research topic in software security. Recent studies have shown that deep learning-based techniques can significantly improve vulnerability detection performance. However, because information is lost during code representation, existing approaches fail to learn the features associated with vulnerabilities, leading to a high false negative rate (FNR) and low precision. To address these problems, we propose BovdGFE, a method for buffer overflow vulnerability detection in C/C++ programs based on graph feature extraction. BovdGFE first constructs buffer overflow function samples. We then present a new representation structure, the code representation sequence (CoRS), which incorporates the control flow, data dependencies, and syntax structure of the vulnerable code to reduce information loss during code representation. After the function samples are transformed into CoRS, a deep learning model learns the vulnerable features and performs vulnerability classification. Experimental results show that, compared with state-of-the-art methods, BovdGFE improves precision by 6.3% and reduces the FNR by 3.9%, substantially improving vulnerability detection capability.
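The two metrics quoted in the abstract can be computed from a detector's confusion-matrix counts. The sketch below assumes the standard definitions, precision = TP / (TP + FP) and FNR = FN / (FN + TP); the abstract does not spell these out, so treat this as an illustration rather than the paper's evaluation code. The `Confusion` class, the function names, and the counts in the usage example are hypothetical and are not results from the paper. Note that a better detector has higher precision but lower FNR.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary vulnerable / non-vulnerable function classifier (hypothetical helper)."""
    tp: int  # vulnerable functions correctly flagged
    fp: int  # non-vulnerable functions wrongly flagged
    fn: int  # vulnerable functions the detector missed
    tn: int  # non-vulnerable functions correctly passed


def precision(c: Confusion) -> float:
    """Share of flagged functions that are truly vulnerable: TP / (TP + FP). Higher is better."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def false_negative_rate(c: Confusion) -> float:
    """Share of truly vulnerable functions that were missed: FN / (FN + TP). Lower is better."""
    vulnerable = c.fn + c.tp
    return c.fn / vulnerable if vulnerable else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from the paper.
    counts = Confusion(tp=90, fp=10, fn=5, tn=895)
    print(f"precision = {precision(counts):.3f}")
    print(f"FNR       = {false_negative_rate(counts):.3f}")
```

Both functions return 0.0 when their denominator is zero, which avoids a division error on degenerate test sets; other conventions (for example, reporting the metric as undefined) are equally reasonable.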


Notes

https://github.com/lvxinghang/BovdGFE


Acknowledgements

The authors would like to express their gratitude to the people who provided CGD. In addition, we would like to thank the anonymous reviewers for their comments and suggestions.

Funding

This work was supported by the Science Foundation of Hubei Province, China (Grant No. 2020BAB116), the Hubei Garment Information Engineering Research Center, and Ningbo Cixing Co. (Grant No. 2021Z069).

Author information

Authors and Affiliations

School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, Hubei, China
Xinghang Lv, Junping Liu, Xinrong Hu, Ruhan He & Wenli Cao
Hubei Provincial Engineering Research Center for Intelligent Textile and Fashion, Wuhan, 430200, Hubei, China
Tao Peng, Jia Chen, Junping Liu, Xinrong Hu & Minghua Jiang
Engineering Research Center of Hubei Province for Clothing Information, Wuhan, 430200, Hubei, China
Tao Peng, Jia Chen, Junping Liu, Ruhan He & Minghua Jiang


Corresponding authors

Correspondence to Tao Peng or Jia Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Lv, X., Peng, T., Chen, J. et al. BovdGFE: buffer overflow vulnerability detection based on graph feature extraction. Appl Intell 53, 15204–15221 (2023). https://doi.org/10.1007/s10489-022-04214-8


Accepted: 26 September 2022
Published: 12 November 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10489-022-04214-8

