RouAlign: Cross-Version Function Alignment and Routine Recovery with Graphlet Edge Embedding

  • Conference paper
ICT Systems Security and Privacy Protection (SEC 2020)

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 580)

Abstract

Reverse engineering is the labor-intensive work of understanding the inner implementation of a program, and it is necessary for malware analysis, vulnerability hunting, and similar tasks. Cross-version function identification and subroutine matching would greatly reduce manual effort by identifying the known parts shared across different binary programs. Existing approaches mainly focus on function recognition and ignore the recovery of the relationships between functions, which makes it hard for researchers to locate the calling routine they are interested in.

In this paper, we propose a method that uses graphlet edge embedding to abstract high-level topological features of function call graphs and recover the relationships between functions. With the function relationships recovered, we reconstruct the calling routine of the program and then infer the specific functions in it. We implement a prototype called RouAlign, which automatically aligns the trunk routine of assembly code. We evaluate RouAlign on 65 groups of real-world programs containing over two million functions. RouAlign outperforms state-of-the-art binary comparison solutions by over 35%, with a high precision of 92% on average in pairwise function recognition.
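
The idea of aligning call edges by their local topology, and then inferring functions from the aligned edges, can be illustrated with a toy sketch. This is not RouAlign's embedding: the five-number edge signature below (degrees of the caller and callee plus a triangle count around the edge) is a hand-written stand-in for a learned graphlet edge embedding, and the names `edge_signatures`, `align_edges`, `infer_function_map`, `OLD_CALLS`, and `NEW_CALLS` are hypothetical. It assumes two small call graphs whose function names differ across versions but whose structure is unchanged.

```python
def edge_signatures(edges):
    """Map each call edge (caller, callee) to a small topology signature.

    Assumption: degrees plus a triangle count stand in for a real
    graphlet edge embedding; the paper's actual features are not shown here.
    """
    succ, pred = {}, {}
    for u, v in edges:
        for node in (u, v):
            succ.setdefault(node, set())
            pred.setdefault(node, set())
        succ[u].add(v)
        pred[v].add(u)

    sigs = {}
    for u, v in edges:
        neigh_u = succ[u] | pred[u]
        neigh_v = succ[v] | pred[v]
        # Nodes adjacent to both endpoints close a triangle around the edge.
        triangles = len((neigh_u & neigh_v) - {u, v})
        sigs[(u, v)] = (len(succ[u]), len(pred[u]),
                        len(succ[v]), len(pred[v]), triangles)
    return sigs


def align_edges(old_edges, new_edges):
    """Greedily match each old edge to the new edge with the closest signature."""
    old_sigs = edge_signatures(old_edges)
    new_sigs = edge_signatures(new_edges)

    def distance(sig_a, sig_b):
        # L1 distance between two signatures.
        return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

    return {
        e_old: min(new_sigs, key=lambda e_new: distance(s_old, new_sigs[e_new]))
        for e_old, s_old in old_sigs.items()
    }


def infer_function_map(matches):
    """Derive a function-to-function mapping from aligned call edges."""
    func_map = {}
    for (old_caller, old_callee), (new_caller, new_callee) in matches.items():
        func_map.setdefault(old_caller, new_caller)
        func_map.setdefault(old_callee, new_callee)
    return func_map


# Hypothetical call graphs: the same structure, with symbols stripped in the new version.
OLD_CALLS = [("main", "parse"), ("main", "run"), ("run", "parse"), ("run", "log")]
NEW_CALLS = [("sub_1", "sub_2"), ("sub_1", "sub_3"), ("sub_3", "sub_2"), ("sub_3", "sub_4")]

MATCHES = align_edges(OLD_CALLS, NEW_CALLS)
FUNC_MAP = infer_function_map(MATCHES)
```

In this toy case every edge has a distinct signature, so the greedy match is unambiguous. On real binaries signatures collide and call graphs change between versions, which is where a learned embedding and a more careful matching strategy would be needed.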

Acknowledgments

We thank the anonymous reviewers for their invaluable comments and suggestions. Can Yang and Jian Liu share co-first authorship.

Author information

Corresponding authors

Correspondence to Can Yang or Jian Liu.

Copyright information

© 2020 IFIP International Federation for Information Processing

About this paper

Cite this paper

Yang, C., Liu, J., Luo, M., Gong, X., Liu, B. (2020). RouAlign: Cross-Version Function Alignment and Routine Recovery with Graphlet Edge Embedding. In: Hölbl, M., Rannenberg, K., Welzer, T. (eds) ICT Systems Security and Privacy Protection. SEC 2020. IFIP Advances in Information and Communication Technology, vol 580. Springer, Cham. https://doi.org/10.1007/978-3-030-58201-2_11

  • DOI: https://doi.org/10.1007/978-3-030-58201-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58200-5

  • Online ISBN: 978-3-030-58201-2

  • eBook Packages: Computer Science, Computer Science (R0)