Abstract
With the rapid development of Internet applications, software security has received increasing attention. Control flow graph recovery, one of the fundamental tasks in software security analysis, is essential for understanding the structure and execution flow of a program. Its accuracy is crucial to security techniques that build on control flow graphs, such as vulnerability mining and code similarity comparison. In the field of reverse analysis, recovering the control flow graph of binary code has become an active research topic. In this paper, we review methods for constructing control flow graphs from binary code, covering static analysis, dynamic analysis, and hybrid analysis, and compare their advantages and disadvantages. We then discuss the main difficulties in control flow graph construction and summarize recent research progress on the indirect jump problem. Finally, we discuss the focus of and outlook for future research in this area.
Change history
16 December 2023
A correction has been published.
Acknowledgements
This work is supported by the Key Science and Technology Program of Henan Province under Grants No. 182102210130, No. 232102210134, and No. 232102211088. We thank the anonymous reviewers for their valuable comments on this paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, Q., Li, X., Yue, C., He, Y. (2024). A Survey of Control Flow Graph Recovery for Binary Code. In: Zhang, M., Xu, B., Hu, F., Lin, J., Song, X., Lu, Z. (eds) Computer Applications. CCF NCCA 2023. Communications in Computer and Information Science, vol 1960. Springer, Singapore. https://doi.org/10.1007/978-981-99-8761-0_16
DOI: https://doi.org/10.1007/978-981-99-8761-0_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8760-3
Online ISBN: 978-981-99-8761-0
eBook Packages: Computer Science, Computer Science (R0)