Detecting vulnerable software functions via text and dependency features

Xu, Wenlin; Li, Tong; Wang, Jinsong; Tang, Yahui

doi:10.1007/s00500-022-07775-5

Data analytics and machine learning
Published: 07 January 2023

Volume 27, pages 5425–5435 (2023)
Soft Computing

Abstract

Detecting vulnerabilities in software is crucial to guaranteeing the security of software systems. Most previous methods train a classification or regression model on text features of the source code to predict vulnerabilities. However, labeled vulnerabilities are not always easy to obtain in practice, and text features alone are insufficient for finding vulnerabilities in complex software systems. To address these problems, we propose an unsupervised method for detecting vulnerable software functions that uses both text and dependency features of the source code to improve detection accuracy. Specifically, we first extract text and dependency features from the source code and concatenate them into a combined feature. We then train a deep autoencoder to transform the combined feature into a low-dimensional embedding. Finally, we apply an outlier detection method to the embedding to predict vulnerable functions. We extensively evaluated the proposed method on seven C/C++ program datasets; the results show that it improves the F1 score by 88% and 66% on average over the comparison tools RATS and Joern, respectively, which verifies its effectiveness.
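The pipeline in the abstract (extract text and dependency features, concatenate them, embed, then flag outliers) can be illustrated with a minimal, stdlib-only sketch. It rests on several assumptions that are not from the paper: the per-function features (counts of `memcpy`, `strcpy` and `free` tokens as the text part, call-graph fan-in and fan-out as the dependency part) are invented for illustration; the deep-autoencoder embedding step is omitted, so the combined features are only min-max scaled; and Local Outlier Factor (Breunig et al. 2000) stands in as the outlier detector because the abstract does not name one. The names `combine_features`, `min_max_scale`, `lof_scores` and `rank_suspicious` are hypothetical and are not the authors' code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def combine_features(text_vec: Sequence[float], dep_vec: Sequence[float]) -> List[float]:
    """Concatenate a function's text and dependency features into one vector."""
    return list(text_vec) + list(dep_vec)


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Scale each column to [0, 1] so no single feature dominates the distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def lof_scores(points: List[List[float]], k: int) -> List[float]:
    """Simplified Local Outlier Factor; scores well above 1 suggest outliers.

    Ties at the k-distance are ignored: exactly k neighbours are used per point.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_dist = [], []
    for i in range(n):
        # k nearest neighbours of point i, excluding the point itself
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / k + 1e-12))
    # LOF: mean ratio of the neighbours' densities to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def rank_suspicious(
    functions: Dict[str, Tuple[List[float], List[float]]], k: int = 2
) -> List[Tuple[str, float]]:
    """Rank functions from most to least outlying by LOF score."""
    names = list(functions)
    combined = [combine_features(*functions[name]) for name in names]
    # A deep autoencoder would embed these vectors here; this sketch skips that step.
    scores = lof_scores(min_max_scale(combined), k)
    return sorted(zip(names, scores), key=lambda item: item[1], reverse=True)


# Hypothetical features: text = counts of (memcpy, strcpy, free) tokens,
# dependency = (fan-in, fan-out) in the call graph.
functions = {
    "parse_header": ([1, 0, 1], [2, 3]),
    "read_block": ([1, 0, 1], [2, 2]),
    "write_block": ([1, 1, 1], [2, 3]),
    "close_file": ([0, 0, 1], [3, 2]),
    "copy_name": ([0, 5, 0], [9, 1]),
}

for name, score in rank_suspicious(functions):
    print(f"{name}: {score:.2f}")  # copy_name is expected to rank first
```

In a fuller version, the scaled vectors would first pass through a trained autoencoder, and the detector would run on the low-dimensional codes rather than on the raw combined features. Ranking by score, instead of applying a fixed threshold, avoids choosing a cut-off that this toy data cannot justify.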

Similar content being viewed by others

Detecting Software Vulnerabilities Based on Hierarchical Graph Attention Network

Vulnerability Detection Using Deep Learning Based Function Classification

Deep Learning-Based Vulnerable Function Detection: A Benchmark

Data Availability

Enquiries about data availability should be directed to the authors.

Acknowledgements

This work was supported by the Natural Science Foundation of Yunnan Provincial Department of Education (2019J0942).

Funding

The authors have not disclosed any funding.

Author information

Authors and Affiliations

School of Information Science and Engineering, Yunnan University, Kunming, 650500, Yunnan, China
Wenlin Xu & Yahui Tang
Information Management Center, Yunnan University of Finance and Economics, Kunming, 650221, Yunnan, China
Wenlin Xu & Jinsong Wang
School of Big Data, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
Tong Li
The Key Laboratory for Software Engineering of Yunnan Province, Kunming, 650201, Yunnan, China
Tong Li

Contributions

WX contributed to conceptualization, methodology, experiments and writing of the original draft. TL and JW contributed to writing (review and editing). YT contributed to data curation, resources and writing (review).

Corresponding author

Correspondence to Wenlin Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This material is the authors’ own original work, which has not been previously published elsewhere. The paper is not currently being considered for publication elsewhere. The paper reflects the authors’ own research and analysis in a truthful and complete manner.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, W., Li, T., Wang, J. et al. Detecting vulnerable software functions via text and dependency features. Soft Comput 27, 5425–5435 (2023). https://doi.org/10.1007/s00500-022-07775-5

Accepted: 20 December 2022
Published: 07 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s00500-022-07775-5
