Seq2Seq-AFL: Fuzzing via sequence-to-sequence model

Yang, Liqun; Wei, Chaoren; Yang, Jian; Ma, Jinxin; Guo, Hongcheng; Cheng, Long; Li, Zhoujun

doi:10.1007/s13042-024-02153-z

Abstract

Fuzzing is a technique in which anomalous data is fed into software to find potential bugs. It is mainly used to discover vulnerabilities including, but not limited to, buffer overflows, memory leaks, and crashes triggered by abnormal inputs. However, ensuring that all inputs are valid during fuzzing is infeasible in practice because of the high instrumentation overhead. Popular fuzzers (e.g., AFL) often generate a large number of invalid mutations, which prevents them from discovering potential paths that lead to new crashes. More importantly, it prevents them from making wise decisions about fuzzing operators. In this article, we propose Seq2Seq-AFL, a mutation-sensitive fuzzing solution in which the mutation operator and the mutation position are taken into account simultaneously, and different Seq2Seq models are designed to implement the optimization scheme. The optimization scheme efficiently trains a function that produces pairs of mutation operator and mutation position, and uses that function to conduct fuzzing. To verify the effectiveness of our scheme, we construct a dataset of two-dimensional vector data corresponding to the objdump, readelf, and nm programs. The experimental results demonstrate that the proposed scheme significantly improves the performance of the state-of-the-art AFL fuzzing tool, with coverage improvements of 13.7%, 17.6%, and 6.9% on objdump, readelf, and nm, respectively. In particular, Seq2Seq-AFL exposes a total of 21 crashes in objdump.
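As a rough illustration of what a "mutation operator and mutation position pair" could look like in practice, the Python sketch below applies a list of such pairs to a seed input. It is not the authors' implementation: the operator names, the `OPERATORS` table, and the `apply_pairs` helper are hypothetical, and the pairs are written by hand as stand-ins for what a trained model might predict.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical byte-level operators, loosely modeled on AFL-style mutations.
def flip_bit(data: bytearray, pos: int) -> None:
    data[pos] ^= 0x01                   # flip the lowest bit of one byte

def flip_byte(data: bytearray, pos: int) -> None:
    data[pos] ^= 0xFF                   # invert every bit of one byte

def add_one(data: bytearray, pos: int) -> None:
    data[pos] = (data[pos] + 1) % 256   # increment with wrap-around

def sub_one(data: bytearray, pos: int) -> None:
    data[pos] = (data[pos] - 1) % 256   # decrement with wrap-around

# Assumed operator vocabulary; a real fuzzer would have a larger set.
OPERATORS: Dict[str, Callable[[bytearray, int], None]] = {
    "flip_bit": flip_bit,
    "flip_byte": flip_byte,
    "add_one": add_one,
    "sub_one": sub_one,
}

def apply_pairs(seed: bytes, pairs: List[Tuple[str, int]]) -> bytes:
    """Apply each (operator, position) pair to a copy of the seed."""
    data = bytearray(seed)
    for op_name, pos in pairs:
        if not data:
            break                       # nothing to mutate in an empty seed
        # Wrap out-of-range positions so every pair stays applicable.
        OPERATORS[op_name](data, pos % len(data))
    return bytes(data)

# Hand-written pairs standing in for model predictions.
seed = b"\x7fELF"
pairs = [("flip_byte", 0), ("add_one", 1), ("sub_one", 6)]
mutated = apply_pairs(seed, pairs)
print(mutated)
```

Treating the operator and the position as one joint decision, rather than choosing each independently, is the idea the abstract highlights; the sketch only shows how such pairs would be consumed, not how they are learned.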

Data availability

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. U2333205, 62302025, 62276017), the State Grid Co., Ltd. Technology R&D Project (Project Name: Research on Key Technologies of Data Scenario-based Security Governance and Emergency Blocking in Power Monitoring System, Project No.: 5108-202303439A-3-2-ZN), the 2022 CCF-NSFOCUS Kun-Peng Scientific Research Fund, and the Opening Project of Shanghai Trusted Industrial Control Platform.

Author information

Authors and Affiliations

School of Cyber Science and Technology, Beihang University, Beijing, 100191, China
Liqun Yang & Chaoren Wei
State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China
Jian Yang, Hongcheng Guo & Zhoujun Li
China Information Technology Security Evaluation Center, Beijing, 100085, China
Jinxin Ma
Beijing KaiLan Aviation Technology Co., LTD, Beijing, 101312, China
Long Cheng

Contributions

Liqun Yang: Conceptualization, Methodology, Software, Funding acquisition. Chaoren Wei: Methodology, Formal analysis, Software, Writing, and Editing. Jian Yang: Review, Writing, Original draft, and Editing. Jinxin Ma: Methodology, Review, Editing, and Revision. Hongcheng Guo: Conceptualization, Methodology, and Review. Long Cheng: Conducted fuzzing tests on his self-developed equipment, and Review. Zhoujun Li: Review and Funding acquisition.

Corresponding author

Correspondence to Jian Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, L., Wei, C., Yang, J. et al. Seq2Seq-AFL: Fuzzing via sequence-to-sequence model. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02153-z

Received: 19 January 2024
Accepted: 18 March 2024
Published: 23 April 2024
DOI: https://doi.org/10.1007/s13042-024-02153-z
