SimCGE: Simple Contrastive Learning of Graph Embeddings for Cross-Version Binary Code Similarity Detection

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13407)

Abstract

Binary code similarity detection (BCSD), the task of detecting the similarity of two binary functions without access to their source code, has many applications in computer security. Recently, deep learning methods have shown better efficiency, accuracy, and potential in BCSD. Most of them minimize their loss with a Siamese network and ignore some of its shortcomings. In this paper, we introduce the idea of contrastive learning into graph neural networks and experimentally demonstrate that training graph models by contrastive learning is significantly better than training them with a Siamese network. In addition, we found that, among various graph neural networks, Principal Neighbourhood Aggregation for Graph Nets (PNA) is the best at extracting structural information from the control flow graph (CFG).
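
The abstract does not state the training objective, so the following is only a minimal sketch of what contrastive training over function embeddings could look like. It assumes a SimCSE-style in-batch loss (the paper's title echoes SimCSE): each function compiled from one version is paired with the same function from another version, and every other function in the batch serves as a negative. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE-style loss over a batch.

    positives[i] is assumed to embed the same function as anchors[i]
    (for example, compiled from a different version); every other row
    of `positives` is treated as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching index i as the target class.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy 2-D embeddings standing in for graph-network outputs (hypothetical data).
version_a = [[1.0, 0.0], [0.0, 1.0]]
version_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = in_batch_contrastive_loss(version_a, version_b)
shuffled = in_batch_contrastive_loss(version_a, list(reversed(version_b)))
```

With these toy vectors, the loss is lower when each row is paired with its true counterpart than when the pairs are swapped. A Siamese setup, by contrast, typically scores one pair at a time against a similar/dissimilar label, whereas this in-batch formulation compares each function against every other candidate in the batch at once.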

Keywords

  • Binary code similarity detection
  • Contrastive learning
  • Graph neural network

This work was supported by the Natural Science Foundation of Jiangsu Province, China (BK20141209).

Corresponding author

Correspondence to Guixing Wu.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Xia, F., Wu, G., Zhao, G., Li, X. (2022). SimCGE: Simple Contrastive Learning of Graph Embeddings for Cross-Version Binary Code Similarity Detection. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_25

  • DOI: https://doi.org/10.1007/978-3-031-15777-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15776-9

  • Online ISBN: 978-3-031-15777-6

  • eBook Packages: Computer Science, Computer Science (R0)