Abstract
Binary code similarity detection (BCSD), the task of determining how similar two binary functions are without access to their source code, has many applications in computer security. Recently, deep learning methods have shown better efficiency, accuracy, and potential for BCSD. Most of them optimize their loss with a Siamese network and overlook its shortcomings. In this paper, we introduce contrastive learning into graph neural networks and demonstrate experimentally that training graph models with contrastive learning performs significantly better than training with a Siamese network. In addition, we find that Principal Neighbourhood Aggregation for Graph Nets (PNA) extracts the structural information of control flow graphs (CFGs) better than the other graph neural networks we compared.
Keywords
- Binary code similarity detection
- contrastive learning
- graph neural network
This work was supported by the Natural Science Foundation of Jiangsu Province, China (BK20141209).
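The abstract contrasts Siamese training, which scores one pair of functions at a time, with contrastive training, which scores each function against every candidate in a batch. The abstract does not give the paper's loss, so the following is only a minimal sketch of a generic in-batch contrastive (InfoNCE-style) objective. The function names, the temperature value of 0.05, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """InfoNCE-style loss (illustrative, not the paper's stated loss).

    anchors[i] and positives[i] are assumed to embed two versions of the
    same function; every positives[j] with j != i acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching index i as the target.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Hypothetical embeddings: two functions, each compiled in two versions.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # matching versions sit close together
swapped = [[0.1, 0.9], [0.9, 0.1]]  # matching versions sit far apart

loss_aligned = in_batch_contrastive_loss(anchors, aligned)
loss_swapped = in_batch_contrastive_loss(anchors, swapped)
```

In this sketch the loss is small when each function's two versions are nearest neighbours in the batch and large when they are not. A pairwise Siamese loss, by comparison, would see only one labelled pair per term and would not use the other functions in the batch as negatives.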
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Xia, F., Wu, G., Zhao, G., Li, X. (2022). SimCGE: Simple Contrastive Learning of Graph Embeddings for Cross-Version Binary Code Similarity Detection. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_25
DOI: https://doi.org/10.1007/978-3-031-15777-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15776-9
Online ISBN: 978-3-031-15777-6
eBook Packages: Computer Science, Computer Science (R0)