A Survey of Control Flow Graph Recovery for Binary Code

  • Conference paper
Computer Applications (CCF NCCA 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1960)

Abstract

With the rapid development of Internet applications, software security research has received increasing attention. Control flow graph recovery is one of the fundamental tasks in software security analysis and is essential for understanding the structure and execution flow of a program. Its accuracy is crucial for security techniques built on control flow graphs, such as vulnerability mining and code similarity comparison. In reverse analysis, recovering the control flow graph of binary code has become an active research topic. In this paper, we review methods for constructing control flow graphs from binary code, namely static, dynamic, and hybrid analysis, and compare their strengths and weaknesses. We then discuss the open challenges in control flow graph construction and summarize recent progress on the indirect jump problem. Finally, we summarize and discuss the focus and outlook of future research in this area.
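The following minimal Python sketch illustrates the distinction between static, dynamic, and hybrid recovery. It is not the authors' method. It assumes a hypothetical toy instruction format, `Insn`, with kinds `seq`, `jmp`, `jcc`, `ijmp`, and `ret`, rather than the API of any real disassembler. `recover_cfg` builds basic blocks from leaders and adds direct-branch and fall-through edges. A block that ends in an indirect jump is left unresolved, because its target depends on a runtime value. The equally hypothetical helper `resolve_with_trace` then fills that gap using successors observed in an execution trace. This shows why dynamic information helps, but also why it only covers the paths that actually ran.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    """Hypothetical toy instruction; not the format of any real disassembler."""
    addr: int
    kind: str                      # "seq", "jmp", "jcc", "ijmp" (indirect jump), "ret"
    target: Optional[int] = None   # direct branch target, if statically known
    size: int = 1


def recover_cfg(insns):
    """Static pass: leader-based basic blocks over an address-sorted listing."""
    by_addr = {i.addr: i for i in insns}
    leaders = {insns[0].addr}
    for i in insns:
        if i.kind in ("jmp", "jcc") and i.target in by_addr:
            leaders.add(i.target)          # a direct branch target starts a block
        if i.kind != "seq" and i.addr + i.size in by_addr:
            leaders.add(i.addr + i.size)   # so does the instruction after any branch
    blocks = {}
    current = None
    for i in insns:
        if i.addr in leaders:
            current = i.addr
            blocks[current] = []
        blocks[current].append(i)
    edges, unresolved = set(), []
    for start, body in blocks.items():
        last = body[-1]
        fall = last.addr + last.size
        if last.kind in ("jmp", "jcc") and last.target in by_addr:
            edges.add((start, last.target))  # direct branch successor
        if last.kind in ("seq", "jcc") and fall in by_addr:
            edges.add((start, fall))         # fall-through successor
        if last.kind == "ijmp":
            unresolved.append(start)         # target depends on a runtime value
    return blocks, edges, unresolved


def resolve_with_trace(edges, unresolved, block_trace):
    """Hybrid step: add successors of unresolved blocks observed in a trace."""
    pending = set(unresolved)
    merged = set(edges)
    for src, dst in zip(block_trace, block_trace[1:]):
        if src in pending:
            merged.add((src, dst))
    return merged


program = [
    Insn(0, "seq"),
    Insn(1, "jcc", target=4),
    Insn(2, "seq"),
    Insn(3, "jmp", target=5),
    Insn(4, "ijmp"),
    Insn(5, "ret"),
]
blocks, edges, unresolved = recover_cfg(program)
# One hypothetical execution that took the conditional branch into block 4.
hybrid_edges = resolve_with_trace(edges, unresolved, [0, 4, 5])
print(sorted(blocks), sorted(edges), unresolved, sorted(hybrid_edges))
# prints: [0, 2, 4, 5] [(0, 2), (0, 4), (2, 5)] [4] [(0, 2), (0, 4), (2, 5), (4, 5)]
```

Here the trace recovers the edge (4, 5), which static analysis alone could not determine. A real tool would also have to handle calls, non-returning functions, and overlapping code. It would also have to account for indirect jumps whose observed targets are only a subset of the feasible ones.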

Change history

  • 16 December 2023: A correction has been published.

Acknowledgements

This work is supported by the Key Science and Technology Program of Henan Province under Grant Nos. 182102210130, 232102210134, and 232102211088. We would like to thank the anonymous reviewers for their valuable comments on this paper.

Author information

Correspondence to Xiangdong Li.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, Q., Li, X., Yue, C., He, Y. (2024). A Survey of Control Flow Graph Recovery for Binary Code. In: Zhang, M., Xu, B., Hu, F., Lin, J., Song, X., Lu, Z. (eds) Computer Applications. CCF NCCA 2023. Communications in Computer and Information Science, vol 1960. Springer, Singapore. https://doi.org/10.1007/978-981-99-8761-0_16

  • DOI: https://doi.org/10.1007/978-981-99-8761-0_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8760-3

  • Online ISBN: 978-981-99-8761-0

  • eBook Packages: Computer Science, Computer Science (R0)
