
Dr.PathFinder: hybrid fuzzing with deep reinforcement concolic execution toward deeper path-first search

  • Original Article
Neural Computing and Applications

Abstract

Fuzzing is an effective approach to discovering bugs in programs, especially memory corruption bugs, using randomly generated test cases. However, without prior knowledge of the target program, a fuzzer can generate only a limited number of test cases because of sanity checks. To solve this problem, recent studies have proposed hybrid fuzzers that observe the context of a target program using symbolic execution and generate test cases that bypass the sanity checks. While hybrid fuzzers can explore “deep” bugs in the target program, they also generate many ineffective test cases. In this paper, we propose Dr.PathFinder, a hybrid fuzzing solution built on a concolic execution algorithm that incorporates deep reinforcement learning. When the reinforcement learning agent encounters a branch during concolic execution, it evaluates the state and determines the search path. In this process, “shallow” paths are pruned and “deep” paths are searched first. This reduces unnecessary exploration, which allows efficient memory usage and alleviates the state explosion problem. In experiments on deep bug cases from the CB-multios dataset, Dr.PathFinder discovered approximately five times more bugs than AFL and two times more than Driller-AFL. In addition to finding more bugs, Dr.PathFinder generated 19 times fewer test cases and used at least 2% less memory than Driller-AFL. While it performed well in finding bugs located in deep paths, Dr.PathFinder was limited in finding bugs located in shallow paths; we discuss this limitation.
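The deep-path-first selection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Branch` tree, the `depth_below` scoring function (standing in for a learned value estimate), and the `prune_below` threshold are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Branch:
    """A node in a toy branch tree (hypothetical stand-in for a program branch)."""
    name: str
    children: List["Branch"] = field(default_factory=list)


def depth_below(node: Branch) -> int:
    """Longest chain of branches below `node`.

    Assumption: this exact depth replaces the value a trained agent
    would only estimate from the execution state.
    """
    if not node.children:
        return 0
    return 1 + max(depth_below(child) for child in node.children)


def deep_first_path(
    root: Branch,
    score: Callable[[Branch], float],
    prune_below: float,
) -> List[str]:
    """Follow the best-scored child at every branch.

    Children scoring under `prune_below` are treated as "shallow" and
    dropped; the walk stops when no candidate remains.
    """
    path = [root.name]
    node = root
    while node.children:
        candidates = [c for c in node.children if score(c) >= prune_below]
        if not candidates:
            break
        node = max(candidates, key=score)
        path.append(node.name)
    return path


# Toy program: "a" is a shallow dead end, "b" leads to a deeper chain.
toy_tree = Branch("root", [
    Branch("a"),
    Branch("b", [Branch("c", [Branch("d")]), Branch("e")]),
])

full_path = deep_first_path(toy_tree, depth_below, prune_below=0)
pruned_path = deep_first_path(toy_tree, depth_below, prune_below=1)
```

With no pruning the walk reaches the deepest leaf through `b` and `c`; with `prune_below=1` it stops at `c`, because the only remaining successor is scored as shallow. In a real hybrid fuzzer the score would come from a learned model rather than from the tree itself, which is unknown in advance.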

Notes

  1. This number only counts vulnerabilities reported at https://lcamtuf.coredump.cx/afl/.

  2. Some applications based on reinforcement learning models start learning with a stochastic policy and then switch to a deterministic policy. In this experiment, however, Dr.PathFinder maintained a stochastic policy so that it continuously reflects new states, and the actions taken in them, as fuzzing proceeds.

  3. In the DQN algorithm, the policy depends on the Q-network. Therefore, training the Q-network is equivalent to improving the policy.
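Notes 2 and 3 can be made concrete with a small sketch. It assumes a softmax (Boltzmann) rule for the stochastic policy and a tabular Q-learning update in place of a neural Q-network; the source specifies neither, and the names `softmax_policy`, `sample_action`, and `q_update` are hypothetical.

```python
import math
import random
from typing import List


def softmax_policy(q_values: List[float], temperature: float = 1.0) -> List[float]:
    """Turn Q-values into action probabilities (assumed Boltzmann policy)."""
    highest = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values: List[float], rng: random.Random,
                  temperature: float = 1.0) -> int:
    """Draw an action index at random, weighted by the policy."""
    probs = softmax_policy(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def q_update(q_sa: float, reward: float, next_q_values: List[float],
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', .)."""
    target = reward + gamma * max(next_q_values)
    return q_sa + alpha * (target - q_sa)


# Two actions at a branch, initially indistinguishable.
q = [0.0, 0.0]
before = softmax_policy(q)

# Action 1 earns a reward; only its Q-value is updated.
q[1] = q_update(q[1], reward=1.0, next_q_values=[0.0, 0.0])
after = softmax_policy(q)

action = sample_action(q, random.Random(0))
```

Because the policy is computed from the Q-values, the single update to `q[1]` shifts probability toward action 1 without any separate policy update, which is the point of Note 3. The policy stays stochastic after the update, so action 0 can still be sampled, in line with Note 2.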

Acknowledgements

This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) and funded by the Ministry of Health & Welfare, Republic of Korea (Grant Number: HI19C0791).

Author information

Corresponding author

Correspondence to Jongsub Moon.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jeon, S., Moon, J. Dr.PathFinder: hybrid fuzzing with deep reinforcement concolic execution toward deeper path-first search. Neural Comput & Applic 34, 10731–10750 (2022). https://doi.org/10.1007/s00521-022-07008-8

  • DOI: https://doi.org/10.1007/s00521-022-07008-8
