
Evaluating seed selection for fuzzing JavaScript engines

Empirical Software Engineering

Abstract

JavaScript (JS), a platform-independent programming language, has remained the most popular language for years. However, the JavaScript engines that web browsers widely use to interpret JS code have become the most common targets for attackers, so ensuring the security and reliability of JS engines is important. Fuzzing is a simple yet effective method for uncovering vulnerabilities. Existing JS fuzzers, however, focus mainly on designing effective mutation mechanisms that generate diverse and valid seeds, and they often overlook the importance of the initial seed corpus that drives the fuzzing process. In this paper, we perform extensive experiments to systematically evaluate the impact of seed selection on fuzzing JavaScript engines. In particular, we investigate seed selection along three main dimensions: the sources from which seeds are collected (e.g., CVE PoCs and regression tests), the number and size of the seeds, and a set of code properties of interest. Our major findings reveal that seeds collected from different sources have a significant impact on fuzzing effectiveness (i.e., CVE PoCs perform significantly better than the other types of seeds), and that seed files containing the code structures of interest lead existing fuzzers to achieve superior results in terms of both code coverage and the number of unique crashes identified. Inspired by these observations, we devise a simple heuristic for prioritizing JavaScript files when selecting a seed corpus. Our experiments show that, when driven by the seed corpus selected this way, an existing state-of-the-art fuzzer achieves significantly higher code coverage and identifies more crashes.
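The abstract does not spell out the prioritization heuristic, so the following is only a minimal sketch of the general idea, assuming that "code properties of interest" can be approximated by simple regular-expression checks. The property list in `PROPERTY_PATTERNS` and the functions `property_score` and `prioritize_seeds` are hypothetical illustrations, not the authors' implementation: each candidate JS file is scored by how many distinct properties it contains, and the highest-scoring files are kept, with smaller files preferred on ties.

```python
import re

# Hypothetical "code properties of interest"; the paper's actual list of
# properties is not given in the abstract, so these are illustrative only.
PROPERTY_PATTERNS = {
    "class": re.compile(r"\bclass\s+\w+"),
    "arrow_function": re.compile(r"=>"),
    "try_catch": re.compile(r"\btry\s*\{"),
    "typed_array": re.compile(r"\b(?:Int|Uint|Float)(?:8|16|32|64)(?:Clamped)?Array\b"),
    "proxy_or_reflect": re.compile(r"\b(?:Proxy|Reflect)\b"),
}


def property_score(source: str) -> int:
    """Count how many distinct properties of interest appear in a JS file."""
    return sum(1 for pattern in PROPERTY_PATTERNS.values() if pattern.search(source))


def prioritize_seeds(seeds: dict, budget: int) -> list:
    """Return up to `budget` seed names, richest in properties first.

    `seeds` maps a file name to its JS source text. Ties are broken by
    smaller source length (cheaper to execute and mutate), then by name
    so that the ordering is deterministic.
    """
    ranked = sorted(
        seeds,
        key=lambda name: (-property_score(seeds[name]), len(seeds[name]), name),
    )
    return ranked[:budget]


if __name__ == "__main__":
    seeds = {
        "poc.js": "class A { constructor() { this.x = new Float64Array(8); } }\n"
                  "try { new Proxy({}, {}); } catch (e) {}",
        "regress.js": "var f = (x) => x + 1; f(2);",
        "plain.js": "var a = 1 + 2;",
    }
    print(prioritize_seeds(seeds, budget=2))  # ['poc.js', 'regress.js']
```

A regex scan is used here only to keep the sketch dependency-free; a more faithful version would parse each file into an AST and check node types, which avoids false matches inside strings and comments.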


Figs. 1–11 (not reproduced here)


Data Availability Statement

The collected seeds and the analysis results of the experiments are available in the Git repository at https://github.com/CGCL-codes/JSFuzz.

Notes

  1. A fuzzer usually performs a dry run on the seed corpus to obtain initial information.
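As a rough illustration of what such a dry run involves, the sketch below executes every seed once and records which seeds add new coverage, which add nothing new, and which crash or time out. The `ExecResult` and `DryRunReport` types, the `dry_run` function, and the representation of coverage as a set of integer edge IDs are all assumptions made for this example; real fuzzers differ in what they collect and in how they treat crashing or hanging seeds.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ExecResult:
    """Outcome of running one seed (hypothetical representation)."""
    edges: frozenset  # coverage edge IDs hit by this input
    crashed: bool = False
    timed_out: bool = False


@dataclass
class DryRunReport:
    total_edges: set = field(default_factory=set)     # union of coverage so far
    useful: list = field(default_factory=list)        # seeds that added new edges
    redundant: list = field(default_factory=list)     # seeds adding no new edges
    crashing: list = field(default_factory=list)
    timeouts: list = field(default_factory=list)


def dry_run(seeds: Iterable[str], execute: Callable[[str], ExecResult]) -> DryRunReport:
    """Run every seed once through `execute` and summarize the initial state."""
    report = DryRunReport()
    for seed in seeds:
        result = execute(seed)
        if result.crashed:
            report.crashing.append(seed)
            continue
        if result.timed_out:
            report.timeouts.append(seed)
            continue
        new_edges = result.edges - report.total_edges
        if new_edges:
            report.useful.append(seed)
            report.total_edges |= new_edges
        else:
            report.redundant.append(seed)
    return report
```

In practice `execute` would launch the instrumented JS engine on the seed file and read back its coverage map; here it is left as a plain callable so that any engine harness, or a stub for testing, can be plugged in.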


Acknowledgements

We sincerely thank the editor for his/her help in reviewing this paper and all anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Grant No. 62002125) and by the Young Elite Scientists Sponsorship Program by CAST (Grant No. 2021QNRC001).

Author information

Corresponding author

Correspondence to Ming Wen.

Additional information

Communicated by: Tingting Yu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wen, M., Wang, Y., Xia, Y. et al. Evaluating seed selection for fuzzing JavaScript engines. Empir Software Eng 28, 133 (2023). https://doi.org/10.1007/s10664-023-10340-9


  • DOI: https://doi.org/10.1007/s10664-023-10340-9
