FuncNet: A Euclidean Embedding Approach for Lightweight Cross-platform Binary Recognition

  • Conference paper
Abstract

Reverse analysis is a necessary but labor-intensive technique for understanding how new malware works. Cross-platform binary recognition eases the work of reverse engineers by identifying duplicated or already-known code compiled for different platforms. However, existing approaches mainly rely on raw function bytes or cosine embedding representations, which suffer from either low recognition accuracy or high search overhead on real-world binary recognition tasks. In this paper, we propose a lightweight neural network-based approach that generates a Euclidean embedding (i.e., a numeric vector) for each binary function from its control flow graph and its callee's interface information, and classifies the embedding vectors with a Euclidean-distance-sensitive artificial neural network. We implement a prototype called FuncNet and evaluate it on real-world projects comprising 1980 binaries and about 2 million function pairs. The experimental results show that its accuracy outperforms state-of-the-art solutions by over 13% on average, and that binary search on large datasets can be done in constant time.
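To make the idea of comparing functions by Euclidean distance concrete, the following is a minimal sketch and not the FuncNet implementation. The function names, the two-dimensional toy embeddings, and the threshold of 0.5 are all assumptions chosen for illustration; the abstract does not state the embedding dimension or any decision threshold.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean (L2) distance between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_function(a, b, threshold=0.5):
    """Treat two embeddings as the same function if they lie close together.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return euclidean_distance(a, b) <= threshold


def nearest(query, database):
    """Return the name of the stored embedding closest to the query.

    This is a plain linear scan, so it costs O(n) per query.
    """
    return min(database, key=lambda name: euclidean_distance(query, database[name]))


# Toy data: hypothetical embeddings of two known functions.
known = {
    "memcpy_arm": [0.0, 1.0],
    "strlen_x86": [3.0, 4.0],
}
query = [0.1, 0.9]
match = nearest(query, known)
print(match, is_same_function(query, known[match]))
```

The linear scan shown here is only the simplest possible lookup. The constant-time search reported in the abstract would require an additional indexing scheme over the embedding space, which this sketch does not attempt to reproduce.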

Notes

  1. High performance, open source, cross platform CryptoNight CPU/GPU miners: https://xmrig.com/.

  2. Details can be found at: https://github.com/delia0204/FuncNet.

Acknowledgements

We would like to thank Can Yang, Yuchen Wei, and the anonymous reviewers for their constructive comments. This work is supported by the Chinese Academy of Sciences Key Laboratory of Network Assessment Technology and the Beijing Key Laboratory of Network Security and Protection Technology, as well as the Chinese National Natural Science Foundation (U1836209, 61602470, 61802394), the Strategic Priority Research Program of the CAS (XDC02040100, XDC02030200, XDC02020200), the National Key Research and Development Program of China (2016QY071405), and the Program of Beijing Municipal Science and Technology Commission (No. D181100000618004).

Author information

Corresponding author

Correspondence to Xiaorui Gong.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Luo, M., Yang, C., Gong, X., Yu, L. (2019). FuncNet: A Euclidean Embedding Approach for Lightweight Cross-platform Binary Recognition. In: Chen, S., Choo, KK., Fu, X., Lou, W., Mohaisen, A. (eds) Security and Privacy in Communication Networks. SecureComm 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 304. Springer, Cham. https://doi.org/10.1007/978-3-030-37228-6_16

  • DOI: https://doi.org/10.1007/978-3-030-37228-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37227-9

  • Online ISBN: 978-3-030-37228-6

  • eBook Packages: Computer Science, Computer Science (R0)
