Abstract
Fuzzing is an effective approach to discovering bugs in programs, especially memory corruption bugs, using randomly generated test cases. However, without prior knowledge of the target program, a fuzzer can generate only a limited number of test cases that pass the program's sanity checks. To solve this problem, recent studies have proposed hybrid fuzzers that use symbolic execution to observe the context of the target program and generate test cases that bypass these sanity checks. Although hybrid fuzzers can reach “deep” bugs in the target program, they also generate many ineffective test cases. In this paper, we propose Dr.PathFinder, a hybrid fuzzing solution whose concolic execution algorithm is guided by deep reinforcement learning. When the reinforcement learning agent encounters a branch during concolic execution, it evaluates the state and determines which path to search. In this process, “shallow” paths are pruned and “deep” paths are searched first. This reduces unnecessary exploration, allowing more efficient memory usage and alleviating the state explosion problem. In experiments with the CB-multios dataset for deep bug cases, Dr.PathFinder discovered approximately five times more bugs than AFL and two times more than Driller-AFL. In addition to finding more bugs, Dr.PathFinder generated 19 times fewer test cases and used at least \(2\%\) less memory than Driller-AFL. Although it performed well in finding bugs located in deep paths, Dr.PathFinder was limited in finding bugs located in shallow paths; we discuss this limitation.
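The pruning step described in the abstract can be pictured as best-first search over symbolic states. The following is a minimal sketch, not Dr.PathFinder's implementation: the hypothetical `score` callable stands in for the agent's evaluation of a state, the hypothetical `successors` callable stands in for the concolic engine enumerating branch directions, and `prune_below` is an assumed fixed threshold below which a path is treated as "shallow" and dropped.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Path = Tuple[str, ...]  # sequence of branch directions taken so far

@dataclass(order=True)
class Pending:
    neg_score: float  # heapq is a min-heap, so negate to pop the best-scored state first
    seq: int          # insertion counter, breaks ties deterministically
    path: Path = field(compare=False)

def explore(successors: Callable[[Path], List[str]],
            score: Callable[[Path], float],
            prune_below: float,
            budget: int) -> List[Path]:
    """Best-first exploration: score each successor at a branch, drop those
    below the threshold ("shallow"), and expand the highest-scoring first.
    The root state is always expanded."""
    counter = itertools.count()
    frontier = [Pending(-score(()), next(counter), ())]
    expanded: List[Path] = []
    while frontier and len(expanded) < budget:
        state = heapq.heappop(frontier)
        expanded.append(state.path)
        for direction in successors(state.path):
            child = state.path + (direction,)
            s = score(child)
            if s >= prune_below:  # pruned children are never enqueued
                heapq.heappush(frontier, Pending(-s, next(counter), child))
    return expanded

# Toy usage: a depth-3 binary branch tree, with a hypothetical score that
# simply counts how often the "T" side of a branch was taken.
def toy_successors(path: Path) -> List[str]:
    return ["T", "F"] if len(path) < 3 else []

def toy_score(path: Path) -> float:
    return float(path.count("T"))

order = explore(toy_successors, toy_score, prune_below=1.0, budget=10)
print(order)
```

In the toy run the whole subtree starting with "F" is never enqueued, which is the memory saving the abstract attributes to pruning; in the real system the score would come from a learned model rather than a fixed rule.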
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig6_HTML.png)
![Fig. 7](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig7_HTML.png)
![Fig. 8](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig8_HTML.png)
![Fig. 9](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig9_HTML.png)
![Fig. 10](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig10_HTML.png)
![Fig. 11](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-022-07008-8/MediaObjects/521_2022_7008_Fig11_HTML.png)
Notes
This number only counts vulnerabilities reported at https://lcamtuf.coredump.cx/afl/.
Some applications based on reinforcement learning models start learning with a stochastic policy and then switch to a deterministic one. In this experiment, however, Dr.PathFinder kept a stochastic policy throughout, so that it could keep reflecting new states, and the actions available in them, as fuzzing proceeded.
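The difference between the two kinds of policy can be sketched as two action-selection rules over the same Q-values. Softmax (Boltzmann) sampling is only one common stochastic policy and is an assumption here; the note does not say which stochastic policy Dr.PathFinder used, and the two Q-values below are hypothetical.

```python
import math
import random
from typing import List, Sequence

def greedy_action(q_values: Sequence[float]) -> int:
    """Deterministic policy: always pick the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax_action(q_values: Sequence[float], temperature: float,
                   rng: random.Random) -> int:
    """Stochastic (Boltzmann) policy: sample action a with probability
    proportional to exp(Q(a) / temperature), so lower-valued actions keep
    being tried as new states appear."""
    top = max(q_values)  # subtract the max before exp() for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]

q = [1.0, 0.5]  # hypothetical Q-values for two actions at one branch
rng = random.Random(0)
counts: List[int] = [0, 0]
for _ in range(2000):
    counts[softmax_action(q, temperature=1.0, rng=rng)] += 1
print(greedy_action(q), counts)
```

The greedy rule never picks the second action, while the sampled rule still picks it a substantial fraction of the time, which is what lets a stochastic policy keep gathering experience about states whose values are not yet well estimated.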
In the DQN algorithm, the policy depends on the Q-network. Therefore, training the Q-network is equivalent to improving the policy.
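The point of this note, that improving the Q-function is how the policy improves, can be illustrated with tabular Q-learning, a deliberately simplified stand-in for DQN: a dictionary replaces the Q-network, and a single temporal-difference update replaces gradient descent on a replay batch. The state names, reward, and hyperparameters below are hypothetical.

```python
from typing import Dict, List

QTable = Dict[str, List[float]]  # stand-in for a Q-network: state -> Q-value per action

def greedy_policy(q: QTable, state: str) -> int:
    """The policy is read directly off the Q-values, as in DQN."""
    values = q[state]
    return max(range(len(values)), key=lambda a: values[a])

def q_update(q: QTable, s: str, a: int, r: float, s_next: str, terminal: bool,
             alpha: float = 0.5, gamma: float = 0.9) -> None:
    """One Q-learning step: move Q(s, a) toward the TD target r + gamma * max_a' Q(s', a')."""
    target = r if terminal else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

q: QTable = {"branch": [0.0, 0.1], "end": [0.0, 0.0]}
before = greedy_policy(q, "branch")   # action 1 has the higher initial value
q_update(q, "branch", 0, r=1.0, s_next="end", terminal=True)
after = greedy_policy(q, "branch")    # the update alone flips the policy to action 0
print(before, after, q["branch"])
```

No separate policy object is ever modified: the action choice changes only because the value estimate changed, mirroring how training the Q-network is the policy improvement step in DQN.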
References
Aschermann C, Schumilo S, Blazytko T, Gawlik R, Holz T (2019) REDQUEEN: fuzzing with input-to-state correspondence. In: Proceedings of NDSS, pp 1–15. https://doi.org/10.14722/ndss.2019.23371
Barreto A, Dabney W, Munos R, Hunt JJ, Schaul T, van Hasselt H, Silver D (2018) Successor features for transfer in reinforcement learning
Barrett C, Stump A, Tinelli C (2010) The SMT-LIB standard: version 2.0. In: Gupta A, Kroening D (eds) Proceedings of the international workshop on satisfiability modulo theories, p 14
Böhme M, Pham VT, Roychoudhury A (2016) Coverage-based greybox fuzzing as Markov chain. Proc ACM Conf Comput Commun Secur. https://doi.org/10.1145/2976749.2978428
Bottou L (2012) Stochastic gradient descent tricks. Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-642-35289-8_25
Brumley D, Jager I, Avgerinos T, Schwartz EJ (2011) BAP: a binary analysis platform. Lecture Notes in Computer Science, pp 464–469. https://doi.org/10.1007/978-3-642-22110-1_37
Cadar C, Dunbar D, Engler D (2008) KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proc USENIX Symp Oper Syst Des Impl, pp 209–224
Cha SK, Avgerinos T, Rebert A, Brumley D (2012) Unleashing Mayhem on binary code. In: Proc IEEE Symp Secur Priv, pp 380–394. https://doi.org/10.1109/SP.2012.31
Chen P, Chen H (2018) Angora: efficient fuzzing by principled search. In: Proceedings IEEE symposium on security and privacy, pp. 711–725. https://doi.org/10.1109/SP.2018.00046
Chipounov V, Kuznetsov V, Candea G (2011) S2E: a platform for in-vivo multi-path analysis of software systems. In: Proceedings international conference on architectural support for programming languages and operating systems, pp. 265–278. https://doi.org/10.1145/1950365.1950396
De Moura L, Bjørner N (2008) Z3: an efficient SMT solver. Lecture Notes in Computer Science, pp 337–340. https://doi.org/10.1007/978-3-540-78800-3_24
Defense Advanced Research Projects Agency (DARPA): Cyber Grand Challenge (CGC) (2016). https://www.darpa.mil/program/cyber-grand-challenge
Dolan-Gavitt B, Hulin P, Kirda E, Leek T, Mambretti A, Robertson W, Ulrich F, Whelan R (2016) LAVA: large-scale automated vulnerability addition. In: Proceedings IEEE Symposium on Security and Privacy, pp. 110–121. https://doi.org/10.1109/SP.2016.15
Enck W, Gilbert P, Han S, Tendulkar V, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN (2014) TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Trans Comput Syst. https://doi.org/10.1145/2619091
Ganai M, Lee D, Gupta A (2012) DTAM: dynamic taint analysis of multi-threaded programs for relevancy. Proc ACM SIGSOFT Int Symp Found Softw Eng. https://doi.org/10.1145/2393596.2393650
Ganesh V, Leek T, Rinard M (2009) Taint-based directed whitebox fuzzing. Proc Int Conf Softw Eng. https://doi.org/10.1109/ICSE.2009.5070546
Godefroid P, Klarlund N, Sen K (2005) DART: directed automated random testing. ACM SIGPLAN Not 40(6):213–223. https://doi.org/10.1145/1064978.1065036
Google: Honggfuzz (2016). https://github.com/google/honggfuzz
Haller I, Slowinska A, Neugschwandtner M, Bos H (2013) Dowsing for overflows: a guided fuzzer to find buffer boundary violations. In: Proceedings of USENIX security symposium, pp 49–64
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings IEEE international conference on computer vision, pp 1026–1034. https://doi.org/10.1109/ICCV.2015.123
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of international conference on machine learning, pp. 448–456
Ispoglou KK, Austin D, Mohan V, Payer M (2020) FuzzGen: automatic fuzzer generation. In: Proceedings of 29th USENIX security symposium
Kim S, Faerevaag M, Jung M, Jung S, Oh D, Lee J, Cha SK (2017) Testing intermediate representations for binary analysis. In: Proceedings of IEEE/ACM international conference on automated software engineering, pp 353–364. https://doi.org/10.1109/ASE.2017.8115648
Kingma DP, Ba J (2017) Adam: a method for stochastic optimization
Laf-Intel: circumventing fuzzing roadblocks with compiler transformations (2016). https://lafintel.wordpress.com/2016/08/15/circumventing-fuzzing-roadblocks-with-compiler-transformations/
Landi W (1992) Undecidability of static analysis. ACM Lett Program Lang Syst. https://doi.org/10.1145/161494.161501
Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: International symposium on code generation and optimization, pp 75–86. https://doi.org/10.1109/CGO.2004.1281665
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
Liang J, Jiang Y, Wang M, Jiao X, Chen Y, Song H, Choo KKR (2020) DeepFuzzer: accelerated deep Greybox fuzzing. IEEE Trans Depend Secur Comput. https://doi.org/10.1109/TDSC.2019.2961339
Liang J, Wang M, Chen Y, Jiang Y, Zhang R (2018) Fuzz testing in practice: obstacles and solutions. In: 25th IEEE international conference software analysis evolution reengineering, SANER 2018—Proceedings, vol 2018-March. https://doi.org/10.1109/SANER.2018.8330260
Lin LJ (1993) Reinforcement learning for robots using neural networks. Carnegie-Mellon Univ Pittsburgh PA School of Computer Science, Tech. rep
Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of ACM SIGPLAN conference on programming language design and implementation, pp 190–200
Lyu C, Ji S, Zhang C, Li Y, Lee WH, Song Y, Beyah R (2019) MOPT: optimized mutation scheduling for fuzzers. In: Proceedings of USENIX security symposium, pp 1949–1966
Masri W, Podgurski A, Leon D (2004) Detecting and debugging insecure information flows. In: Proceedings—international symposium on software reliability engineering, pp 198–209. https://doi.org/10.1109/issre.2004.17
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature. https://doi.org/10.1038/nature14236
Nethercote N, Seward J (2007) Valgrind: A framework for heavyweight dynamic binary instrumentation. In: Proceedings ACM SIGPLAN conference on programming language design and implementation, pp 89–100. https://doi.org/10.1145/1250734.1250746
Peng H, Shoshitaishvili Y, Payer M (2018) T-Fuzz: fuzzing by program transformation. In: Proceedings of IEEE symposium on security and privacy, pp 697–710. https://doi.org/10.1109/SP.2018.00056
Rawat S, Jain V, Kumar A, Cojocar L, Giuffrida C, Bos H (2017) VUzzer: application-aware evolutionary fuzzing. In: Proceedings of NDSS, pp 1–14. https://doi.org/10.14722/ndss.2017.23404
Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems, vol 37. Cambridge University, Engineering Department
Schwartz EJ, Avgerinos T, Brumley D (2010) All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In: Proceedings of IEEE symposium on security and privacy, pp 317–331. https://doi.org/10.1109/SP.2010.26
Sen K, Marinov D, Agha G (2005) CUTE: a concolic unit testing engine for C. In: Proceedings of European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on foundations of software engineering, pp 263–272
Sengupta S, Basak S, Peters RA (2018) Particle Swarm Optimization: a survey of historical and recent developments with hybridization perspectives. https://doi.org/10.3390/make1010010
Shoshitaishvili Y, Wang R, Salls C, Stephens N, Polino M, Dutcher A, Grosen J, Feng S, Hauser C, Kruegel C, Vigna G (2016) SoK: (State of) the art of war: offensive techniques in binary analysis. In: Proceedings of IEEE symposium on security and privacy, pp 138–157. https://doi.org/10.1109/SP.2016.17
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature. https://doi.org/10.1038/nature16961
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap T, Hui F, Sifre L, Van Den Driessche G, Graepel T, Hassabis D (2017) Mastering the game of Go without human knowledge. Nature. https://doi.org/10.1038/nature24270
Stephens N, Grosen J, Salls C, Dutcher A, Wang R, Corbetta J, Shoshitaishvili Y, Kruegel C, Vigna G (2016) Driller: augmenting fuzzing through selective symbolic execution. In: Proceedings of NDSS, pp 1–16. https://doi.org/10.14722/ndss.2016.23368
Sun M, Wei T, Lui JC (2016) TaintART: a practical multi-level information-flow tracking system for Android RunTime. In: Proceedings of ACM conference computer communications security, pp 331–342. https://doi.org/10.1145/2976749.2978343
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press
Trail of Bits: DARPA challenge sets for Linux, Windows, and macOS (2016). https://github.com/trailofbits/cb-multios
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv:1312.5602
Wang T, Wei T, Gu G, Zou W (2010) TaintScope: a checksum-aware directed fuzzing tool for automatic software vulnerability detection. In: Proceedings of IEEE symposium on security and privacy, pp 497–512. https://doi.org/10.1109/SP.2010.37
Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn. https://doi.org/10.1007/bf00992698
Wei CY, Hong YT, Lu CJ (2017) Online reinforcement learning in stochastic games
Yun I, Lee S, Xu M, Jang Y, Kim T (2018) QSYM: a practical concolic execution engine tailored for hybrid fuzzing. In: Proceedings of USENIX security symposium, pp 745–761
Zakeri Nasrabadi M, Parsa S, Kalaee A (2021) Format-aware learn & fuzz: deep test data generation for efficient fuzzing. Neural Comput Appl 33(5):1–17. https://doi.org/10.1007/s00521-020-05039-7
Zalewski M (2017) American fuzzy lop. https://lcamtuf.coredump.cx/afl/
Zhao L, Duan Y, Yin H, Xuan J (2019) Send hardest problems my way: probabilistic path prioritization for hybrid fuzzing. In: Proceedings of NDSS, pp 1–15. https://doi.org/10.14722/ndss.2019.23504
Acknowledgements
This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) and funded by the Ministry of Health & Welfare, Republic of Korea (Grant Number: HI19C0791).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Jeon, S., Moon, J. Dr.PathFinder: hybrid fuzzing with deep reinforcement concolic execution toward deeper path-first search. Neural Comput & Applic 34, 10731–10750 (2022). https://doi.org/10.1007/s00521-022-07008-8
DOI: https://doi.org/10.1007/s00521-022-07008-8