Function matching between binary executables: efficient algorithms and features

  • Chariton Karamitas
  • Athanasios Kehagias
Original Paper


Binary diffing is the task of identifying syntactic and semantic differences between two programs in binary form when source code is unavailable. It can be reduced to a graph isomorphism problem between the Control Flow Graphs (CFGs), Call Graphs (CGs) or other graph representations of the compared programs. Here we present REveal, a prototype tool which implements a binary diffing algorithm and an associated set of features extracted from a binary’s CG and CFGs. Additionally, we explore the potential of applying Markov lumping techniques to function CFGs. The proposed algorithm and features are evaluated in a series of experiments on executables compiled for i386, amd64, arm and aarch64. Furthermore, the effectiveness of REveal is assessed in a second series of experiments involving the clustering of a corpus of 18 malware samples into 5 malware families. REveal’s results are compared against those produced by Diaphora, the most widely used publicly available binary diffing software. We conclude that REveal improves the state of the art in binary diffing by achieving higher matching scores in most of the experiments conducted, at the cost of a slight increase in running time. Furthermore, REveal successfully partitions the malware corpus into clusters consisting of samples of the same malware family.
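
To make the idea of feature-based function matching concrete, the following minimal sketch pairs functions from two binaries whenever a simple CFG-derived signature is unique on both sides. It is an illustration only and not REveal's algorithm or feature set: the signature (block count, edge count, exit-block count), the helper names `cfg_signature` and `match_unique`, the function names, and the toy CFGs are all assumptions made for this example.

```python
from collections import defaultdict


def cfg_signature(cfg):
    """Summarise a CFG given as {block_id: [successor_block_ids]}.

    Hypothetical feature tuple: (number of basic blocks, number of edges,
    number of exit blocks). Real tools use far richer features.
    """
    blocks = len(cfg)
    edges = sum(len(succs) for succs in cfg.values())
    exits = sum(1 for succs in cfg.values() if not succs)
    return (blocks, edges, exits)


def match_unique(funcs_a, funcs_b):
    """Pair functions whose signature occurs exactly once in each binary."""

    def index(funcs):
        by_sig = defaultdict(list)
        for name, cfg in funcs.items():
            by_sig[cfg_signature(cfg)].append(name)
        return by_sig

    idx_a, idx_b = index(funcs_a), index(funcs_b)
    matches = {}
    for sig, names_a in idx_a.items():
        names_b = idx_b.get(sig, [])
        # Ambiguous signatures are skipped; a real matcher would refine them.
        if len(names_a) == 1 and len(names_b) == 1:
            matches[names_a[0]] = names_b[0]
    return matches


# Toy inputs: function names and CFG shapes are invented for illustration.
old = {
    "sub_401000": {0: [1, 2], 1: [3], 2: [3], 3: []},  # if/else diamond
    "sub_401050": {0: [1], 1: [0, 2], 2: []},          # simple loop
    "sub_4010a0": {0: []},                             # single block
}
new = {
    "sub_8000": {0: []},
    "sub_8040": {0: [1], 1: [0, 2], 2: []},
    "sub_8090": {0: [1, 2], 1: [3], 2: [3], 3: []},
    "sub_80f0": {0: [1], 1: []},                       # function added in new binary
}

matches = match_unique(old, new)
for a, b in sorted(matches.items()):
    print(a, "->", b)
```

Matching only on unique signatures keeps the sketch simple and avoids false pairings, at the price of leaving ambiguous functions unmatched; propagating matches along call-graph edges is one common way to resolve those, but it is outside the scope of this example.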




Copyright information

© Springer-Verlag France SAS, part of Springer Nature 2019

Authors and Affiliations

  1. CENSUS S.A., Thessaloniki, Greece
  2. Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
