Evaluating seed selection for fuzzing JavaScript engines

Wen, Ming; Wang, Yongcong; Xia, Yifan; Jin, Hai

doi:10.1007/s10664-023-10340-9

Published: 26 September 2023

Volume 28, article number 133 (2023)

Empirical Software Engineering

Ming Wen^1,2 (ORCID: orcid.org/0000-0001-5588-9618),
Yongcong Wang^1,2,
Yifan Xia^3 &
Hai Jin^1,4

Abstract

JavaScript (JS), a platform-independent programming language, has remained the most popular language over the years. However, popular JavaScript engines, which are widely used by web browsers to interpret JS code, have become the most common targets for attackers. Ensuring the security and reliability of JS engines is therefore important. Fuzzing is a simple yet effective method to unveil vulnerabilities. However, existing JS fuzzers focus on designing effective mutation mechanisms to generate diverse and valid seeds, and they often ignore the importance of the initial seed corpus selected to drive the fuzzing process. In this paper, we performed extensive experiments to systematically evaluate the impact of seed selection on fuzzing JavaScript engines. In particular, we investigate seed selection along three main dimensions: the sources from which seeds are collected (e.g., CVE PoCs, regression tests, etc.), their number and sizes, and a set of code properties of concern. Our major findings reveal that seeds collected from different sources can have a significant impact on fuzzing effectiveness (i.e., CVE PoCs are significantly better than the other types of seeds), and that seed files containing the code structures of concern can lead existing fuzzers to achieve superior results in terms of both code coverage and unique crashes identified. Inspired by our observations, we devised a simple heuristic to prioritize JavaScript files when selecting a seed corpus. Our experiments show that, when driven by our selected seed corpus, the existing state-of-the-art fuzzer is able to achieve significantly higher code coverage and identify more crashes.
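
The abstract does not spell out the prioritization heuristic, so the following is only an illustrative sketch of how seed files might be ranked by source and by the presence of certain code structures. It is not the authors' method: the property patterns, the source weights, and the names `Seed`, `score`, and `select_corpus` are all assumptions introduced for this example. The only point taken from the abstract is that CVE PoCs are ranked above other sources.

```python
import re
from dataclasses import dataclass

# Hypothetical code properties; the abstract does not list the ones studied.
PROPERTY_PATTERNS = {
    "loop": re.compile(r"\b(for|while)\b"),
    "function": re.compile(r"\bfunction\b|=>"),
    "try_catch": re.compile(r"\btry\b"),
}

# Hypothetical weights; only the ordering (CVE PoCs first) follows the abstract.
SOURCE_WEIGHT = {"cve_poc": 2.0, "regression": 1.0, "other": 0.5}


@dataclass
class Seed:
    name: str    # file name of the JavaScript seed
    source: str  # where the seed was collected from
    text: str    # JavaScript source code of the seed


def score(seed: Seed) -> float:
    """Weight the seed's source by how many property patterns it contains."""
    hits = sum(1 for p in PROPERTY_PATTERNS.values() if p.search(seed.text))
    return SOURCE_WEIGHT.get(seed.source, 0.5) * (1 + hits)


def select_corpus(seeds: list[Seed], k: int) -> list[Seed]:
    """Return the k highest-scoring seeds; ties go to smaller files, then name."""
    ranked = sorted(seeds, key=lambda s: (-score(s), len(s.text), s.name))
    return ranked[:k]


if __name__ == "__main__":
    corpus = [
        Seed("c.js", "other", "var x = 1;"),
        Seed("b.js", "regression", "function g() { return 1 }"),
        Seed("a.js", "cve_poc", "for (let i=0;i<3;i++) { try { f(i) } catch(e) {} }"),
    ]
    for s in select_corpus(corpus, 2):
        print(s.name, score(s))
```

A real selection step would likely parse each file into an AST rather than match regular expressions, and would calibrate the weights against measured coverage and crash counts.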

Data Availability Statement

The seeds we collected and the analysis results of the experiments are stored in the Git repository at https://github.com/CGCL-codes/JSFuzz.

Notes

A fuzzer usually performs a dry run on the seed corpus to obtain initial information.

References

(2019) A collection of javascript engine cves with pocs. https://github.com/tunz/js-vuln-db
Apple Javascriptcore (2014) The Built-in Javascript Engine for Webkit. https://trac.webkit.org/wiki/JavaScriptCore
Aschermann C, Frassetto T, Holz T, Jauernig P, Sadeghi AR, Teuchert D (2019) Nautilus: Fishing For Deep Bugs With Grammars. In: NDSS
Athanasakis M, Athanasopoulos E, Polychronakis M, Portokalidis G, Ioannidis S (2015) The devil is in the constants: Bypassing defenses in browser jit engines. In: NDSS
Böhme M, Pham VT, Roychoudhury A (2017) Coverage-based greybox fuzzing as markov chain. IEEE Trans Softw Eng 45(5):489–506
Chen Y, Zhong R, Hu H, Zhang H, Yang Y, Wu D, Lee W (2021) One engine to fuzz’em all: Generic language processor testing with semantic validation. In: Proc 42nd IEEE Symp Secur Priv (Oakland)
Cummins C, Petoumenos P, Murray A, Leather H (2018) Compiler fuzzing through deep learning. In: Proc 27th ACM SIGSOFT Int Symp Soft Test Anal pp 95–105
Ecma (2019) standard ecma-262. https://www.ecma-international.org/publications/standards/Ecma-262.htm
Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: Continual prediction with lstm. Neural Comput 12(10):2451–2471
Godefroid P, Peleg H, Singh R (2017) Learn&fuzz: Machine learning for input fuzzing. In: 2017 32nd IEEE/ACM Int Conf Autom Softw Eng (ASE) pp 50–59. https://doi.org/10.1109/ASE.2017.8115618
Han H, Oh D, Cha SK (2018) Codealchemist: Semantics-aware code generation to find vulnerabilities in javascript engines. In: NDSS
Herrera A, Gunadi H, Magrath S, Norrish M, Payer M, Hosking AL (2021) Seed selection for successful fuzzing. In: Proc 30th ACM SIGSOFT Int Symp Softw Test Anal ISSTA 2021 Assoc Comput Mach. New York, NY, USA pp 230–243. https://doi.org/10.1145/3460319.3464795
He X, Xie X, Li Y, Sun J, Li F, Zou W, Liu Y, Yu L, Zhou J, Shi W, Huo W (2021) Sofi: Reflection-augmented fuzzing for javascript engines. CCS ’21
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9(8):1735–1780
Holler C, Herzig K, Zeller A (2012) Fuzzing with code fragments. In: 21st USENIX Secur Symp (USENIX Security 12) pp 445–458. USENIX Association, Bellevue, WA. https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/holler
Klees G, Ruef A, Cooper B, Wei S, Hicks M (2018) Evaluating fuzz testing. In: Proc 2018 ACM SIGSAC Conf Comput Commun Secur CCS’18 pp 2123–2138. Assoc Comput Mach. New York, NY, USA. https://doi.org/10.1145/3243734.3243804
Language Ranking (2021). https://madnight.github.io/githut/#/pullrequests/2021/3 Accessed 28 Oct 2021
Lee S, Han H, Cha SK, Son S (2020) Montage: A neural network language model-guided javascript engine fuzzer. In: 29th USENIX Secur Symp (USENIX Security 20) pp 2613–2630
Lemieux C, Sen K (2018) Fairfuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage. In: Proc 33rd ACM/IEEE Int Conf Autom Softw Eng pp 475–485
LLVM Project (2015) Libfuzzer. https://llvm.org/docs/LibFuzzer.html#value-profile. Accessed 10 Jan 2021
Lyu C, Ji S, Zhang C, Li Y, Lee WH, Song Y, Beyah R (2019) MOPT: Optimized mutation scheduling for fuzzers. In: 28th USENIX Secur Symp (USENIX Security 19) pp 1949–1966
Mann HB, Whitney DR (1947) On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics 18(1):50–60. https://doi.org/10.1214/aoms/1177730491
Molinyawe M, Hariri AA, Spelman J (2016) Shell on earth: From browser to system compromise. Proc Black Hat USA
Official Ecmascript Conformance Test Suite (1997). https://github.com/tc39/test262
Patra J, Pradel M (2016) Learning to fuzz: Application-independent fuzz testing with probabilistic, generative models of input data. TU Darmstadt, Department of Computer Science, Tech. Rep. TUD-CS-2016-14664
Pham VT, Böhme M, Santosa AE, Căciulescu AR, Roychoudhury A (2019) Smart greybox fuzzing. IEEE Trans Softw Eng 47(9):1980–1997
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I et al (2019) Language models are unsupervised multitask learners. OpenAI blog 1(8):9
Raychev V, Bielik P, Vechev M, Krause A (2016) Learning programs from noisy data. ACM Sigplan Notices 51(1):761–774
Reddy S, Lemieux C, Padhye R, Sen K (2021) Quickly generating diverse valid test inputs with reinforcement learning. In: 2020 IEEE/ACM 42nd Int Conf Softw Eng (ICSE) pp 1410–1421. IEEE
Rohlf C, Ivnitskiy Y (2011) Attacking clientside jit compilers. Black Hat USA
Romano A, Lehmann D, Pradel M, Wang W (2021) Wobfuscator: Obfuscating javascript malware via opportunistic translation to webassembly
Swiecki R (2016) Honggfuzz. http://code.google.com/p/honggfuzz
The React.js Library (2013). https://reactjs.org. Accessed 28 Oct 2021
Theori INC (2019) pwn.js. https://github.com/theori-io/pwnjs
Veggalam S, Rawat S, Haller I, Bos H (2016) Ifuzzer: An evolutionary interpreter fuzzer using genetic programming. In: Askoxylakis I, Ioannidis S, Katsikas S, Meadows C (eds) Comput Secur - ESORICS 2016 pp 581–601. Springer International Publishing, Cham
Wang J, Chen B, Wei L, Liu Y (2017) Skyfire: Data-driven seed generation for fuzzing. In: 2017 IEEE Symp Secur Priv (SP) pp 579–594. IEEE
Wang J, Chen B, Wei L, Liu Y (2019) Superion: Grammar-aware greybox fuzzing. In: 2019 IEEE/ACM 41st Int Conf Softw Eng (ICSE) pp 724–735. IEEE
Ye G, Tang Z, Tan SH, Huang S, Fang D, Sun X, Bian L, Wang H, Wang Z (2021) Automated conformance testing for javascript engines via deep compiler fuzzing. In: PLDI pp 435–450

Acknowledgements

We sincerely thank the editor for their help in reviewing this paper and all anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Grant No. 62002125) and by the Young Elite Scientists Sponsorship Program by CAST (Grant No. 2021QNRC001).

Author information

Authors and Affiliations

Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Wuhan, China
Ming Wen, Yongcong Wang & Hai Jin
School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
Ming Wen & Yongcong Wang
Zhejiang University, Hangzhou, China
Yifan Xia
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Hai Jin

Corresponding author

Correspondence to Ming Wen.

Additional information

Communicated by: Tingting Yu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wen, M., Wang, Y., Xia, Y. et al. Evaluating seed selection for fuzzing JavaScript engines. Empir Software Eng 28, 133 (2023). https://doi.org/10.1007/s10664-023-10340-9

Accepted: 10 May 2023
Published: 26 September 2023
DOI: https://doi.org/10.1007/s10664-023-10340-9
