
Binary Analysis Overview

Binary Code Fingerprinting for Cybersecurity

Abstract

When source code is unavailable, security applications such as malware detection, software license infringement detection, vulnerability analysis, and digital forensics need to extract meaningful fingerprints from binary code efficiently. Such fingerprints make reverse engineering tasks more effective and efficient because they provide a range of insights into program binaries. However, a great deal of important information is likely to be lost during compilation, including variable and function names, the original control and data flow structures, comments, and layout. In this chapter, we provide a comprehensive review of existing binary code fingerprinting frameworks. We systematize the study of binary code fingerprints along the most important dimensions: the applications that motivate it, the approaches used and their implementations, the specific aspects of the fingerprinting framework, and how the results are evaluated.


Notes

  1. Fingerprint generation based on a sequence of hash values, which guarantees that at least part of any sufficiently long match is detected.
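
To make the note above concrete, here is a minimal sketch of one well-known scheme with this property, winnowing, in which every k-gram is hashed and the minimum hash in each sliding window of w hashes is kept as a fingerprint. The function names, the parameters k and w, the use of CRC32 in place of a rolling hash, and the byte-string inputs are all illustrative assumptions, not details taken from the chapter.

```python
import zlib
from typing import List, Set, Tuple


def kgram_hashes(data: bytes, k: int) -> List[int]:
    """Hash every k-byte substring of data.

    CRC32 is an assumption made for brevity; a rolling hash would
    normally be used so that each hash costs O(1) to update.
    """
    return [zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)]


def winnow(data: bytes, k: int = 5, w: int = 4) -> Set[Tuple[int, int]]:
    """Return the (hash, position) fingerprints selected by winnowing.

    In each window of w consecutive k-gram hashes, the minimum hash is
    selected (the rightmost one on ties). Any substring shared by two
    inputs that is at least w + k - 1 bytes long then contributes at
    least one common hash value to both fingerprint sets.
    """
    hashes = kgram_hashes(data, k)
    fingerprints: Set[Tuple[int, int]] = set()
    if not hashes:
        return fingerprints
    # If there are fewer than w hashes, treat them as a single short window.
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        smallest = min(window)
        # Position of the rightmost occurrence of the minimum in the window.
        offset = len(window) - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


if __name__ == "__main__":
    # Two hypothetical byte sequences that share a long common fragment.
    first = b"xxxxPUSH EBP; MOV EBP, ESPyyyy"
    second = b"zzPUSH EBP; MOV EBP, ESPqq"
    shared = {h for h, _ in winnow(first)} & {h for h, _ in winnow(second)}
    print(len(shared) > 0)
```

With the default values assumed here, the detection guarantee applies to shared fragments of at least w + k - 1 = 8 bytes; shorter matches may or may not be detected. Larger windows produce fewer fingerprints at the cost of a longer guaranteed match length.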

References

  1. Malheur: Automatic Analysis of Malware Behavior. http://www.mlsec.org/malheur/, 2015.

  2. C++ refactoring tools for visual studio. http://www.wholetomato.com/, 2016. Accessed: February 2016.

  3. Refactoring tool. https://www.devexpress.com/Products/CodeRush/, 2018. Accessed: February 2018.

  4. EXEINFO PE. http://exeinfo.atwebpages.com/, 2019. Accessed: June 2019.

  5. Hex-Rays IDA Pro. https://www.hex-rays.com/products/ida/, 2019. Accessed: June 2019.

  6. HexRays: IDA Pro. https://www.hex-rays.com/products/ida/, 2019. Accessed: January 2019.

  7. OllyDbg, a 32-bit Assembler Level Analysing Debugger for Microsoft Windows. http://ollydbg.de/, 2019. Accessed: June 2019.

  8. PEfile. http://code.google.com/p/pefile/, 2019. Accessed: June 2019.

  9. RDG_Packer_Detector. http://www.rdgsoft.net/, 2019. Accessed: June 2019.

  10. The Paradyn Project. http://www.paradyn.org/html/dyninst9.0.0-features.html, 2019. Accessed: June 2019.

  11. PlanetMath. Symmetric Difference. https://planetmath.org/symmetricdifference, 2019. Accessed: 2019.

  12. Tigress, a Diversifying Virtualizer/Obfuscator for the C language. http://tigress.cs.arizona.edu/, 2019. Accessed: June 2019.

  13. Zynamics, BinNavi: Binary Code Reverse Engineering Tool. http://www.zynamics.com/binnavi.html, 2019. Accessed: June 2019.

  14. Laksono Adhianto, Sinchan Banerjee, Mike Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey, and Nathan R Tallent. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6):685–701, 2010.

    Google Scholar 

  15. Hiralal Agrawal and Joseph R Horgan. Dynamic program slicing. In ACM SIGPLAN Notices, volume 25, pages 246–256. ACM, 1990.

    Google Scholar 

  16. Agrawal, Parag and Arasu, Arvind and Kaushik, Raghav. On indexing error-tolerant set containment. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pages 927–938, 2010.

    Google Scholar 

  17. Shahinur Alam, R Nigel Horspool, and Issa Traore. MARD: a framework for metamorphic malware analysis and real-time detection. In The 28th International Conference on Advanced Information Networking and Applications (AINA), pages 480–489. IEEE, 2014.

    Google Scholar 

  18. Saed Alrabaee, Noman Saleem, Stere Preda, Lingyu Wang, and Mourad Debbabi. OBA2: an onion approach to binary code authorship attribution. Digital Investigation, 11:S94–S103, 2014.

    Article  Google Scholar 

  19. Saed Alrabaee, Paria Shirani, Lingyu Wang, and Mourad Debbabi. SIGMA: a semantic integrated graph matching approach for identifying reused functions in binary code. Digital Investigation, 12:S61–S71, 2015.

    Article  Google Scholar 

  20. Saed Alrabaee, Lingyu Wang, and Mourad Debbabi. BinGold: Towards robust binary analysis by extracting the semantics of binary code as semantic flow graphs (SFGs). Digital Investigation, 18:S11–S22, 2016.

    Article  Google Scholar 

  21. Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06)., pages 459–468. IEEE, 2006.

    Google Scholar 

  22. Dorian C Arnold, Dong H Ahn, Bronis R De Supinski, Gregory L Lee, Barton P Miller, and Martin Schulz. Stack trace analysis for large scale debugging. In IEEE International on Parallel and Distributed Processing Symposium (IPDPS), pages 1–10. IEEE, 2007.

    Google Scholar 

  23. Thanassis Avgerinos, Sang Kil Cha, Alexandre Rebert, Edward J Schwartz, Maverick Woo, and David Brumley. Automatic exploit generation. Communications of the ACM, 57(2):74–84, 2014.

    Article  Google Scholar 

  24. Gogul Balakrishnan, Radu Gruian, Thomas Reps, and Tim Teitelbaum. CodeSurfer/x86—A platform for analyzing x86 executables. In Compiler Construction, pages 250–254. Springer, 2005.

    Google Scholar 

  25. Gogul Balakrishnan and Thomas Reps. WYSINWYX: What you see is not what you eXecute. ACM Transactions on Programming Languages and Systems (TOPLAS), 32(6):23, 2010.

    Google Scholar 

  26. Tiffany Bao, Jonathan Burket, Maverick Woo, Rafael Turner, and David Brumley. BYTEWEIGHT: Learning to Recognize Functions in Binary Code. In 23rd USENIX Security Symposium (USENIX Security 14), pages 845–860, 2014.

    Google Scholar 

  27. Sébastien Bardin, Philippe Herrmann, Jérôme Leroux, Olivier Ly, Sighireanu M., R. Tabary, T. Touili, and Aymeric Vincent. Description of the BINCOA Model. In Deliverable J1.1 part 2 of ANR Project BINCOA, 2009.

    Google Scholar 

  28. Sébastien Bardin, Philippe Herrmann, Jérôme Leroux, Olivier Ly, Renaud Tabary, and Aymeric Vincent. The BINCOA framework for binary code analysis. In International Conference on Computer Aided Verification, pages 165–170. Springer, 2011.

    Google Scholar 

  29. Mayank Bawa, Tyson Condie, and Prasanna Ganesan. LSH forest: self-tuning indexes for similarity search. In Proceedings of the 14th international conference on World Wide Web, pages 651–660. ACM, 2005.

    Google Scholar 

  30. Laszlo A. Belady and Meir M Lehman. A model of large program development. IBM Systems journal, 15(3):225–252, 1976.

    Article  MATH  Google Scholar 

  31. Martial Bourquin, Andy King, and Edward Robbins. BinSlayer: accurate comparison of binary executables. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop, page 4. ACM, 2013.

    Google Scholar 

  32. David Brumley, Ivan Jager, Thanassis Avgerinos, and Edward J Schwartz. BAP: A binary analysis platform. In International Conference on Computer Aided Verification, pages 463–469. Springer, 2011.

    Google Scholar 

  33. Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga. Code normalization for self-mutating malware. IEEE Security & Privacy, (2):46–54, 2007.

    Google Scholar 

  34. Juan Caballero, Noah M Johnson, Stephen McCamant, and Dawn Song. Binary code extraction and interface identification for security applications. Technical report, University of California, Berkeley, Dept. of Electrical Engineering and Computer Science, 2009.

    Google Scholar 

  35. Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX conference on Operating Systems Design and Implementation, pages 209–224. USENIX Association, 2008.

    Google Scholar 

  36. Aylin Caliskan-Islam, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt, and Arvind Narayanan. When coding style survives compilation: De-anonymizing programmers from executable binaries. The 25th Annual Network and Distributed System Security Symposium (NDSS), pages 255–270, 2018.

    Google Scholar 

  37. Joan Calvet, José M Fernandez, and Jean-Yves Marion. Aligot: cryptographic function identification in obfuscated binary programs. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS), pages 169–182. ACM, 2012.

    Google Scholar 

  38. Silvio Cesare, Yang Xiang, and Wanlei Zhou. Control flow-based malware variantdetection. IEEE Transactions on Dependable and Secure Computing (TDSC), 11(4):307–317, 2014.

    Article  Google Scholar 

  39. Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. Unleashing mayhem on binary code. In IEEE Symposium on Security and Privacy (S&P), pages 380–394. IEEE, 2012.

    Google Scholar 

  40. Sang Kil Cha, Maverick Woo, and David Brumley. Program-adaptive mutational fuzzing. In IEEE Symposium on Security and Privacy (S&P), pages 725–741. IEEE, 2015.

    Google Scholar 

  41. Sagar Chaki, Cory Cohen, and Arie Gurfinkel. Supervised learning for provenance-similarity of binaries. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 15–23. ACM, 2011.

    Google Scholar 

  42. Chandra, Mahalanobis Prasanta and Others. On the generalised distance in statistics. Proceedings of the National Institute of Sciences of India, 2(1):49–55, 1936.

    Google Scholar 

  43. Mahinthan Chandramohan, Yinxing Xue, Zhengzi Xu, Yang Liu, Chia Yuan Cho, and Hee Beng Kuan Tan. BinGo: cross-architecture cross-OS binary search. In Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 678–689. ACM, 2016.

    Google Scholar 

  44. Eric Cheng. Binary Analysis and Symbolic Execution with angr. PhD thesis, The MITRE Corporation, 2016.

    Google Scholar 

  45. Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. The S2E platform: Design, implementation, and applications. ACM Transactions on Computer Systems (TOCS), 30(1):2, 2012.

    Google Scholar 

  46. Young Han Choi, Byoung Jin Han, Byung Chul Bae, Hyung Geun Oh, and Ki Wook Sohn. Toward extracting malware features for classification using static and dynamic analysis. In The 8th International Conference on Computing and Networking Technology (ICCNT), pages 126–129. IEEE, 2012.

    Google Scholar 

  47. Paolo Milani Comparetti, Guido Salvaneschi, Engin Kirda, Clemens Kolbitsch, Christopher Kruegel, and Stefano Zanero. Identifying dormant functionality in malware programs. In IEEE Symposium on Security and Privacy (S&P), pages 61–76. IEEE, 2010.

    Google Scholar 

  48. Christoph Csallner and Yannis Smaragdakis. Check‘n’crash: combining static checking and testing. In Proceedings of the 27th international conference on Software engineering, pages 422–431. ACM, 2005.

    Google Scholar 

  49. Ţăpuş, Cristian and Chung, I-Hsin and Hollingsworth, Jeffrey K and others. Active harmony: Towards automated performance tuning. In Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pages 1–11. IEEE Computer Society Press, 2002.

    Google Scholar 

  50. Yaniv David, Nimrod Partush, and Eran Yahav. Statistical similarity of binaries. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 266–280. ACM, 2016.

    Google Scholar 

  51. Yaniv David and Eran Yahav. Tracelet-based code search in executables. ACM SIGPLAN Notices, 49(6):349–360, 2014.

    Article  Google Scholar 

  52. De Maesschalck, Roy and Jouan-Rimbaud, Delphine, and Massart, Désiré L. The mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1): 1–18, 2000.

    Article  Google Scholar 

  53. Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.

    Google Scholar 

  54. Alessandro Di Federico, Mathias Payer, and Giovanni Agosta. REV.NG: a unified binary analysis framework to recover CFGs and function boundaries. In Proceedings of the 26th International Conference on Compiler Construction, pages 131–141. ACM, 2017.

    Google Scholar 

  55. Steven HH Ding, Benjamin Fung, and Philippe Charland. Kam1n0: Mapreduce-based assembly clone search for reverse engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 461–470. ACM, 2016.

    Google Scholar 

  56. Adel Djoudi and Sébastien Bardin. BINSEC: Binary Code Analysis with Low-Level Regions. In Tools and Algorithms for the Construction and Analysis of Systems, pages 212–217. Springer, 2015.

    Google Scholar 

  57. Tudor Dumitraş and Darren Shou. Toward a standard benchmark for computer security research: The Worldwide Intelligence Network Environment (WINE). In Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS workshop), pages 89–96. ACM, 2011.

    Google Scholar 

  58. Manuel Egele, Theodoor Scholte, Engin Kirda, and Christopher Kruegel. A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys (CSUR), 44(2):6, 2012.

    Google Scholar 

  59. Manuel Egele, Maverick Woo, Peter Chapman, and David Brumley. Blanket execution: Dynamic similarity testing for program binaries and components. In 23rd USENIX Security Symposium (USENIX Security 14), pages 303–317, 2014.

    Google Scholar 

  60. Khaled ElWazeer, Kapil Anand, Aparna Kotha, Matthew Smithson, and Rajeev Barua. Scalable variable and data type detection in a binary rewriter. In ACM SIGPLAN Notices, volume 48, pages 51–60. ACM, 2013.

    Google Scholar 

  61. Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. discovRE: Efficient cross-architecture identification of bugs in binary code. In Proceedings of the 23rd Symposium on Network and Distributed System Security (NDSS), 2016.

    Google Scholar 

  62. Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. Liblinear: A library for large linear classification. Journal of machine learning research, 9(Aug):1871–1874, 2008.

    MATH  Google Scholar 

  63. Wenbin Fang, Barton P Miller, and James A Kupsch. Automated tracing and visualization of software security structure and properties. In Proceedings of the ninth international symposium on visualization for cyber security, pages 9–16. ACM, 2012.

    Google Scholar 

  64. Mohammad Reza Farhadi, Benjamin Fung, Philippe Charland, and Mourad Debbabi. BinClone: detecting code clones in malware. In Eighth International Conference on Software Security and Reliability (SERE), pages 78–87. IEEE, 2014.

    Google Scholar 

  65. Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. Scalable Graph-based Bug Search for Firmware Images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 480–491. ACM, 2016.

    Google Scholar 

  66. Jeanne Ferrante, Karl J Ottenstein, and Joe D Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems (TOPLAS), 9(3):319–349, 1987.

    Article  MATH  Google Scholar 

  67. Halvar Flake. Graph-based binary analysis. Blackhat Briefings 2002, 2002.

    Google Scholar 

  68. Martin Fowler. Refactoring: improving the design of existing code. Pearson Education India, 1999.

    MATH  Google Scholar 

  69. Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. Locality-sensitive hashing scheme based on dynamic collision counting. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 541–552. ACM, 2012.

    Google Scholar 

  70. Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed automated random testing. In ACM Sigplan Notices, volume 40, pages 213–223. ACM, 2005.

    Google Scholar 

  71. Patrice Godefroid, Michael Y Levin, and David Molnar. SAGE: whitebox fuzzing for security testing. Communications of the ACM, 55(3):40–44, 2012.

    Article  Google Scholar 

  72. Ilfak Guilfanov. IDA fast library identification and recognition technology (FLIRT Technology): In-depth. https://www.hex\-rays.com/products/ida/tech/flirt/in_depth.shtml, 2012.

  73. Sumit Gulwani and George C Necula. Precise interprocedural analysis using random interpretation. In ACM SIGPLAN Notices, volume 40, pages 324–337. ACM, 2005.

    Google Scholar 

  74. Archit Gupta, Pavan Kuppili, Aditya Akella, and Paul Barford. An empirical study of malware evolution. In First International Communication Systems and Networks and Workshops (COMSNETS), pages 1–10. IEEE, 2009.

    Google Scholar 

  75. Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. TurboISO: towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 337–348. ACM, 2013.

    Google Scholar 

  76. Sean Heelan. Automatic generation of control flow hijacking exploits for software vulnerabilities. PhD thesis, University of Oxford, 2009.

    Google Scholar 

  77. Sean Heelan and Agustin Gianni. Augmenting vulnerability analysis of binary code. In Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC), pages 199–208. ACM, 2012.

    Google Scholar 

  78. Christian Heitman and Iván Arce. BARF: A multiplatform open source binary analysis and reverse engineering framework. In XX Congreso Argentino de Ciencias de la Computación (Buenos Aires, 2014), 2014.

    Google Scholar 

  79. Armijn Hemel, Karl Trygve Kalleberg, Rob Vermaas, and Eelco Dolstra. Finding software license violations through binary code clone detection. In Proceedings of the 8th Working Conference on Mining Software Repositories, pages 63–72. ACM, 2011.

    Google Scholar 

  80. Susan Horwitz, Thomas Reps, and David Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(1):26–60, 1990.

    Article  Google Scholar 

  81. Emily R Jacobson, Andrew R Bernat, William R Williams, and Barton P Miller. Detecting code reuse attacks with a model of conformant program execution. In Engineering Secure Software and Systems, pages 1–18. Springer, 2014.

    Google Scholar 

  82. Emily R Jacobson, Nathan Rosenblum, and Barton P Miller. Labeling library functions in stripped binaries. In Proceedings of the 10th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools (PASTE), pages 1–8. ACM, 2011.

    Google Scholar 

  83. Anil K Jain. Data clustering: 50 years beyond k-means. Pattern recognition letters, 31(8):651–666, 2010.

    Article  Google Scholar 

  84. Jiyong Jang, Abeer Agrawal, and David Brumley. ReDeBug: finding unpatched code clones in entire os distributions. In IEEE Symposium on Security and Privacy (S&P), pages 48–62. IEEE, 2012.

    Google Scholar 

  85. Jiyong Jang and David Brumley. Bitshred: Fast, scalable code reuse detection in binary code. CMU-CyLab-10-006, 16, 2009.

    Google Scholar 

  86. Jiyong Jang, Maverick Woo, and David Brumley. Towards automatic software lineage inference. In USENIX Security Symposium (USENIX Security 13), pages 81–96, 2013.

    Google Scholar 

  87. Yoon-Chan Jhi, Xinran Wang, Xiaoqi Jia, Sencun Zhu, Peng Liu, and Dinghao Wu. Value-based program characterization and its application to software plagiarism detection. In Proceedings of the 33rd International Conference on Software Engineering, pages 756–765. ACM, 2011.

    Google Scholar 

  88. Weiwei Jin, Sagar Chaki, Cory Cohen, Arie Gurfinkel, Jeffrey Havrilla, Charles Hines, and Priya Narasimhan. Binary function clustering using semantic hashes. In The 11th International Conference on Machine Learning and Applications (ICMLA), volume 1, pages 386–391. IEEE, 2012.

    Google Scholar 

  89. Jousselme, Anne-Laure and Maupin, Patrick. Distances in evidence theory: Comprehensive survey and generalizations. International Journal of Approximate Reasoning, 53(2), 118–145, 2012.

    Article  MathSciNet  MATH  Google Scholar 

  90. Pascal Junod, Julien Rinaldini, Johan Wehrli, and Julie Michielin. Obfuscator-LLVM: software protection for the masses. In Proceedings of the 1st International Workshop on Software PROtection (SPRO), pages 3–9. IEEE Press, 2015.

    Google Scholar 

  91. Md Enamul Karim, Andrew Walenstein, Arun Lakhotia, and Laxmi Parida. Malware phylogeny generation using permutations of code. Journal in Computer Virology, 1(1-2):13–23, 2005.

    Article  Google Scholar 

  92. Wei Ming Khoo, Alan Mycroft, and Ross Anderson. Rendezvous: a search engine for binary code. In Proceedings of the 10th Working Conference on Mining Software Repositories, pages 329–338. IEEE Press, 2013.

    Google Scholar 

  93. Johannes Kinder. Static analysis of x86 executables. PhD thesis, Technische Universität Darmstadt, 2010.

    Google Scholar 

  94. Johannes Kinder and Helmut Veith. Jakstab: A static analysis platform for binaries. In International Conference on Computer Aided Verification, pages 423–427. Springer, 2008.

    Google Scholar 

  95. Jonghoon Kwon and Heejo Lee. Bingraph: Discovering mutant malware using hierarchical semantic signatures. In Malicious and Unwanted Software (MALWARE), 2012 7th International Conference on, pages 104–111. IEEE, 2012.

    Google Scholar 

  96. Shuvendu K Lahiri, Chris Hawblitzel, Ming Kawaguchi, and Henrique Rebêlo. Symdiff: A language-agnostic semantic diff tool for imperative programs. In International Conference on Computer Aided Verification, pages 712–717. Springer, 2012.

    Google Scholar 

  97. Arun Lakhotia, Mila Dalla Preda, and Roberto Giacobazzi. Fast location of similar code fragments using semantic ‘juice’. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop, page 5. ACM, 2013.

    Google Scholar 

  98. Andrea Lanzi, Davide Balzarotti, Christopher Kruegel, Mihai Christodorescu, and Engin Kirda. Accessminer: using system-centric models for malware protection. In Proceedings of the 17th ACM conference on Computer and communications security (CCS), pages 399–412. ACM, 2010.

    Google Scholar 

  99. Meir M Lehman and Juan F Ramil. Rules and tools for software evolution planning and management. Annals of software engineering, 11(1):15–44, 2001.

    Google Scholar 

  100. Pierre Lestringant, Frédéric Guihéry, and Pierre-Alain Fouque. Automated identification of cryptographic primitives in binary code with data flow graph isomorphism. In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, pages 203–214. ACM, 2015.

    Google Scholar 

  101. Yuping Li, Sathya Chandran Sundaramurthy, Alexandru G Bardas, Xinming Ou, Doina Caragea, Xin Hu, and Jiyong Jang. Experimental study of fuzzy hashing in malware clustering analysis. In 8th Workshop on Cyber Security Experimentation and Test (CSET 15), 2015.

    Google Scholar 

  102. Michael Ligh, Steven Adair, Blake Hartstein, and Matthew Richard. Malware analyst’s cookbook and DVD: tools and techniques for fighting malicious code. Wiley Publishing, 2010.

    Google Scholar 

  103. Da Lin and Mark Stamp. Hunting for undetectable metamorphic viruses. Journal in computer virology, 7(3):201–214, 2011.

    Article  Google Scholar 

  104. Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Automatic reverse engineering of data structures from binary execution. In Proceedings of the 11th Annual Information Security Symposium, page 5. CERIAS-Purdue University, 2010.

    Google Scholar 

  105. Yingfan Liu, Jiangtao Cui, Zi Huang, Hui Li, and Heng Tao Shen. Sk-lsh: An efficient index structure for approximate nearest neighbor search. Proceedings of the VLDB Endowment, 7(9):745–756, 2014.

    Article  Google Scholar 

  106. Fan Long, Stelios Sidiroglou-Douskos, and Martin Rinard. Automatic runtime error repair and containment via recovery shepherding. In ACM SIGPLAN Notices, volume 49, pages 227–238. ACM, 2014.

    Google Scholar 

  107. Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, and Sencun Zhu. Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 389–400. ACM, 2014.

    Google Scholar 

  108. Matias Madou, Bertrand Anckaert, Bjorn De Sutter, and Koen De Bosschere. Hybrid static-dynamic attacks against software protection mechanisms. In Proceedings of the 5th ACM workshop on Digital rights management, pages 75–82. ACM, 2005.

    Google Scholar 

  109. Lorenzo Martignoni, Stephen McCamant, Pongsin Poosankam, Dawn Song, and Petros Maniatis. Path-exploration lifting: Hi-fi tests for lo-fi emulators. In ACM SIGARCH Computer Architecture News, volume 40, pages 337–348. ACM, 2012.

    Google Scholar 

  110. Sven Mattsen, Arne Wichmann, and Sibylle Schupp. A non-convex abstract domain for the value analysis of binaries. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pages 271–280. IEEE, 2015.

    Google Scholar 

  111. Eitan Menahem, Asaf Shabtai, Lior Rokach, and Yuval Elovici. Improving malware detection by applying multi-inducer ensemble. Computational Statistics & Data Analysis, 53(4):1483–1494, 2009.

    Article  MathSciNet  MATH  Google Scholar 

  112. Charith Mendis, Jeffrey Bosboom, Kevin Wu, Shoaib Kamil, Jonathan Ragan-Kelley, Sylvain Paris, Qin Zhao, and Saman Amarasinghe. Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide dsl code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 391–402. ACM, 2015.

    Google Scholar 

  113. Xiaozhu Meng. Fine-grained binary code authorship identification. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 1097–1099. ACM, 2016.

    Google Scholar 

  114. Xiaozhu Meng, Barton P Miller, and Kwang-Sung Jun. Identifying multiple authors in a binary program. In European Symposium on Research in Computer Security (ESORICS), pages 286–304. Springer, 2017.

    Google Scholar 

  115. Barton P Miller, Mark D Callaghan, Jonathan M Cargille, Jeffrey K Hollingsworth, R Bruce Irvin, Karen L Karavanic, Krishna Kunchithapadam, and Tia Newhall. The paradyn parallel performance measurement tool. Computer, 28(11):37–46, 1995.

    Article  Google Scholar 

  116. Jiang Ming, Meng Pan, and Debin Gao. iBinHunt: binary hunting with inter-procedural control flow. In Information Security and Cryptology–ICISC 2012, pages 92–109. Springer, 2012.

    Google Scholar 

  117. Mondaini, Rubem P. BIOMAT 2012: International Symposium on Mathematical and Computational Biology, Tempe, Arizona, USA, 6-10 November 2012. World Scientific, 2013.

    Google Scholar 

  118. James Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32–38, 1957.

    Article  MathSciNet  MATH  Google Scholar 

  119. Lakshmanan Nataraj, Dhilung Kirat, BS Manjunath, and Giovanni Vigna. SARVAM: Search and retrieval of malware. In Worshop on Next Generation Malware Attacks and Defense (NGMAD), 2013.

    Google Scholar 

  120. Beng Heng Ng and Aravind Prakash. Exposé: discovering potential binary code re-use. In IEEE 37th Annual Computer Software and Applications Conference (COMPSAC), pages 492–501. IEEE, 2013.

    Google Scholar 

  121. Pádraig OáSullivan, Kapil Anand, Aparna Kotha, Matthew Smithson, Rajeev Barua, and Angelos D Keromytis. Retrofitting security in cots software with binary rewriting. In Future Challenges in Security and Privacy for Academia and Industry, pages 154–172. Springer, 2011.

    Google Scholar 

  122. Karl J Ottenstein and Linda M Ottenstein. The program dependence graph in a software development environment. In ACM Sigplan Notices, volume 19, pages 177–184. ACM, 1984.

    Google Scholar 

  123. Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, and Thorsten Holz. Cross-architecture bug search in binary executables. In IEEE Symposium on Security and Privacy (S&P), pages 709–724. IEEE, 2015.

    Google Scholar 

  124. Jannik Pewny, Felix Schuster, Lukas Bernhard, Thorsten Holz, and Christian Rossow. Leveraging semantic signatures for bug search in binary programs. In Proceedings of the 30th Annual Computer Security Applications Conference (ACSAC), pages 406–415. ACM, 2014.

    Google Scholar 

  125. Van-Thuan Pham, Wei Boon Ng, Konstantin Rubinov, and Abhik Roychoudhury. Hercules: reproducing crashes in real-world application binaries. In Proceedings of the 37th International Conference on Software Engineering-Volume 1, pages 891–901. IEEE Press, 2015.

    Google Scholar 

  126. Jing Qiu, Xiaohong Su, and Peijun Ma. Library functions identification in binary code by using graph isomorphism testings. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pages 261–270. IEEE, 2015.

    Google Scholar 

  127. Jing Qiu, Xiaohong Su, and Peijun Ma. Using reduced execution flow graph to identify library functions in binary code. IEEE Transactions on Software Engineering (TSE), 42(2):187–202, 2016.

    Article  Google Scholar 

  128. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Notices, 48(6):519–530, 2013.

    Article  Google Scholar 

  129. Ashkan Rahimian, Paria Shirani, Saed Alrbaee, Lingyu Wang, and Mourad Debbabi. Bincomp: A stratified approach to compiler provenance attribution. Digital Investigation, 14:S146–S155, 2015.

    Article  Google Scholar 

  130. David A Ramos and Dawson Engler. Under-constrained symbolic execution: correctness checking for real code. In 24th USENIX Security Symposium (USENIX Security 15), pages 49–64, 2015.

    Google Scholar 

  131. Alexandre Rebert, Sang Kil Cha, Thanassis Avgerinos, Jonathan Foote, David Warren, Gustavo Grieco, and David Brumley. Optimizing seed selection for fuzzing. In 23rd USENIX Security Symposium (USENIX Security 14), pages 861–875, 2014.

    Google Scholar 

  132. Konrad Rieck, Philipp Trinius, Carsten Willems, and Thorsten Holz. Automatic analysis of malware behavior using machine learning. Journal of Computer Security, 19(4):639–668, 2011.

    Article  Google Scholar 

  133. Roman, Steven. Coding and Information Theory, vol. 134, Springer Science & Business Media, 1992.

    Google Scholar 

  134. Nathan Rosenblum, Barton P Miller, and Xiaojin Zhu. Recovering the toolchain provenance of binary code. In Proceedings of the International Symposium on Software Testing and Analysis, pages 100–110. ACM, 2011.

    Google Scholar 

  135. Nathan Rosenblum, Xiaojin Zhu, and Barton P Miller. Who wrote this code? identifying the authors of program binaries. In European Symposium on Research in Computer Security (ESORICS), pages 172–189. Springer, 2011.

  136. Nathan E Rosenblum, Barton P Miller, and Xiaojin Zhu. Extracting compiler provenance from program binaries. In Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, pages 21–28. ACM, 2010.

  137. Kevin A Roundy and Barton P Miller. Hybrid analysis and control of malware. In Recent Advances in Intrusion Detection (RAID), pages 317–338. Springer, 2010.

  138. Chanchal K Roy, James R Cordy, and Rainer Koschke. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming, 74(7):470–495, 2009.

  139. Brian Ruttenberg, Craig Miles, Lee Kellogg, Vivek Notani, Michael Howard, Charles LeDoux, Arun Lakhotia, and Avi Pfeffer. Identifying shared software components to support malware forensics. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pages 21–40. Springer, 2014.

  140. Andreas Sæbjørnsen, Jeremiah Willcock, Thomas Panas, Daniel Quinlan, and Zhendong Su. Detecting code clones in binary executables. In Proceedings of the eighteenth international symposium on Software testing and analysis, pages 117–128. ACM, 2009.

  141. Saul Schleimer, Daniel S Wilkerson, and Alex Aiken. Winnowing: local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 76–85. ACM, 2003.

  142. Matthew G Schultz, Eleazar Eskin, Erez Zadok, and Salvatore J Stolfo. Data mining methods for detection of new malicious executables. In IEEE Symposium on Security and Privacy (S&P), pages 38–49. IEEE, 2001.

  143. Farrukh Shahzad and Muddassar Farooq. ELF-Miner: using structural knowledge and data mining methods to detect new (Linux) malicious executables. Knowledge and information systems, 30(3):589–612, 2012.

  144. Eui Chul Richard Shin, Dawn Song, and Reza Moazzezi. Recognizing functions in binaries with neural networks. In 24th USENIX Security Symposium (USENIX Security 15), pages 611–626, 2015.

  145. Paria Shirani, Lingyu Wang, and Mourad Debbabi. BinShape: Scalable and robust binary library function identification using function shape. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pages 301–324. Springer, 2017.

  146. Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, et al. SoK: (State of) the art of war: Offensive techniques in binary analysis. In IEEE Symposium on Security and Privacy (SP), pages 138–157. IEEE, 2016.

  147. Asia Slowinska, Traian Stancescu, and Herbert Bos. Howard: A dynamic excavator for reverse engineering data structures. In NDSS. Citeseer, 2011.

  148. Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. BitBlaze: A new approach to computer security via binary analysis. In Information Systems Security, pages 1–25. Springer, 2008.

  149. Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment, 5(9):788–799, 2012.

  150. Johan AK Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural processing letters, 9(3):293–300, 1999.

  151. Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. Quality and efficiency in high dimensional nearest neighbor search. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 563–576. ACM, 2009.

  152. Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Transactions on Database Systems (TODS), 35(3):20, 2010.

  153. Julian R Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM (JACM), 23(1):31–42, 1976.

  154. Maarten Van Emmerik. Identifying library functions in executable file using patterns. In Proceedings of the 1998 Australian Software Engineering Conference, pages 90–97. IEEE, 1998.

  155. William M Waite and Gerhard Goos. Compiler construction. Springer Science & Business Media, 2012.

  156. Andrew Walenstein, Michael Venable, Matthew Hayes, Christopher Thompson, and Arun Lakhotia. Exploiting similarity between variants to defeat malware. In Proc. BlackHat DC Conf, 2007.

  157. Xinran Wang, Chi-Chun Pan, Peng Liu, and Sencun Zhu. SigFree: A signature-free buffer overflow attack blocker. IEEE Transactions on Dependable and Secure Computing, 7(1):65–79, 2010.

  158. Zheng Wang, Ken Pierce, and Scott McFarling. BMAT: A binary matching tool for stale profile propagation. The Journal of Instruction-Level Parallelism, 2:1–20, 2000.

  159. Daniel Weise, Roger F Crew, Michael Ernst, and Bjarne Steensgaard. Value dependence graphs: Representation without taxation. In Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 297–310. ACM, 1994.

  160. Tao Xie, Darko Marinov, Wolfram Schulte, and David Notkin. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Tools and Algorithms for the Construction and Analysis of Systems, pages 365–381. Springer, 2005.

  161. Fabian Yamaguchi, Alwin Maier, Hugo Gascon, and Konrad Rieck. Automatic inference of search patterns for taint-style vulnerabilities. In IEEE Symposium on Security and Privacy, pages 797–812. IEEE, 2015.

  162. Junyuan Zeng, Yangchun Fu, Kenneth A Miller, Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Obfuscation resilient binary code reuse through trace-oriented programming. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security (CCS), pages 487–498. ACM, 2013.

  163. Viviane Zwanger and Felix C Freiling. Kernel mode API spectroscopy for incident response and digital forensics. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop, page 3. ACM, 2013.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Alrabaee, S. et al. (2020). Binary Analysis Overview. In: Binary Code Fingerprinting for Cybersecurity. Advances in Information Security, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-030-34238-8_2

  • DOI: https://doi.org/10.1007/978-3-030-34238-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34237-1

  • Online ISBN: 978-3-030-34238-8

  • eBook Packages: Computer Science, Computer Science (R0)
