Sound C Code Decompilation for a Subset of x86-64 Binaries

  • Conference paper
Software Engineering and Formal Methods (SEFM 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12310)

Abstract

We present FoxDec: an approach to C code decompilation that aims to produce sound and recompilable code. Formal methods are used during three phases of the decompilation process: control flow recovery, symbolic execution, and variable analysis. The use of formal methods minimizes the trusted code base and ensures soundness: the extracted C code behaves the same as the original binary. Soundness and recompilability enable C code decompilation to be used in the contexts of binary patching, binary porting, binary analysis, and binary improvement, with confidence that the recompiled code’s behavior is consistent with the original program. We demonstrate that FoxDec can be used to improve execution speed by recompiling a binary with different compiler options, to patch a memory leak with a code transformation tool, and to port a binary to a different architecture. FoxDec can also be leveraged to port a binary to run as a unikernel, a minimal and secure virtual machine that usually requires source access for porting.

Acknowledgments

This work is supported in part by the US Office of Naval Research (ONR) under grants N00014-17-1-2297, N00014-16-1-2104, and N00014-18-1-2022.

Author information

Corresponding author

Correspondence to Freek Verbeek.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Verbeek, F., Olivier, P., Ravindran, B. (2020). Sound C Code Decompilation for a Subset of x86-64 Binaries. In: de Boer, F., Cerone, A. (eds) Software Engineering and Formal Methods. SEFM 2020. Lecture Notes in Computer Science, vol 12310. Springer, Cham. https://doi.org/10.1007/978-3-030-58768-0_14

  • DOI: https://doi.org/10.1007/978-3-030-58768-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58767-3

  • Online ISBN: 978-3-030-58768-0

  • eBook Packages: Computer Science, Computer Science (R0)
