AIHGAT: A novel method of malware detection and homology analysis using assembly instruction heterogeneous graph

Wang, Runzheng; Gao, Jian; Huang, Shuhua

doi:10.1007/s10207-023-00699-7

Regular contribution
Published: 14 May 2023

Volume 22, pages 1423–1443, (2023)

International Journal of Information Security

Runzheng Wang^1, Jian Gao^1,2 & Shuhua Huang^1,2


Abstract

Malicious code increasingly appears in families, and research on malware homology (the classification of malicious code into families) is of great significance for maintaining network security. To better express the overall characteristics of malicious code and improve detection and homology analysis, this paper proposes AIHGAT, a method for malware detection and homology analysis based on heterogeneous graphs of assembly instructions. We take the assembly instructions of malicious families as the research object and analyze the importance and correlation of assembly instructions across different families. Detection and homology analysis are carried out in three stages: feature extraction, feature preprocessing, and model construction. In feature extraction, static features are difficult to extract from malicious samples that use countermeasures such as packing and obfuscation; to alleviate this problem, we obtain binary files from dynamic memory through a sandbox and then analyze their assembly instruction sets. In feature preprocessing, we divide the assembly instructions into N-tuples (N-grams) and construct a heterogeneous graph based on assembly instructions according to the internal correlation of the gene sequence composed of the assembly N-gram features. Finally, in model construction, we analyze the homology determination performance of traditional graph neural networks and construct a Graph Attention Network based on residual connections, named ResGAT, to analyze the homology of malicious code. The experimental results show that ResGAT can gather the core characteristics of malicious families and enhance the ability to recognize malicious family variants. Our model achieves an accuracy of 98.83%, which is better than traditional machine learning detection methods, and can effectively determine the homology of malicious code families.
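The feature preprocessing step can be illustrated with a minimal sketch. The abstract does not specify how nodes and edges are defined, so everything below is an assumption made for illustration: the graph has two node types ("sample" and "ngram"), sample-to-N-gram edges are weighted by occurrence counts, and N-gram-to-N-gram edges are weighted by how often two N-grams appear next to each other in a sequence. The function names `opcode_ngrams` and `build_hetero_graph` and the toy opcode lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return the contiguous n-grams (as tuples) of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def build_hetero_graph(samples, n=2):
    """Build a toy heterogeneous graph with 'sample' and 'ngram' nodes.

    samples: dict mapping a sample name to its opcode list (hypothetical input).
    Returns two Counters of edge weights:
      - sample_ngram: (sample name, n-gram) -> occurrence count
      - ngram_ngram:  (n-gram, n-gram), stored in sorted order -> adjacency count
    The edge definitions are an assumption, not the authors' construction.
    """
    sample_ngram = Counter()
    ngram_ngram = Counter()
    for name, opcodes in samples.items():
        grams = opcode_ngrams(opcodes, n)
        for gram in grams:
            sample_ngram[(name, gram)] += 1
        # Link n-grams that occur consecutively in the same sequence.
        for a, b in zip(grams, grams[1:]):
            if a != b:
                ngram_ngram[tuple(sorted((a, b)))] += 1
    return sample_ngram, ngram_ngram


# Toy opcode sequences standing in for instructions recovered from memory dumps.
samples = {
    "sample_a": ["push", "mov", "call", "mov", "call"],
    "sample_b": ["mov", "call", "ret"],
}
sample_ngram, ngram_ngram = build_hetero_graph(samples, n=2)
```

In this sketch, N-grams shared by several samples (here `("mov", "call")`) become the nodes through which samples are connected, which is one plausible way for a graph model to pick up instruction patterns common to a family.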

Data availability statements

All data generated or analyzed during this study are included in this article. The dataset that supports the findings of this study is available at virusshare.com.

Acknowledgements

This work was supported by the Double First-Class Innovation Research Project of the People’s Public Security University of China (No. 2023SYL07).

Author information

Authors and Affiliations

College of Information and Cyber Security, People’s Public Security University of China, Beijing, China
Runzheng Wang, Jian Gao & Shuhua Huang
Key Laboratory of Safety Precautions and Risk Assessment, Beijing, China
Jian Gao & Shuhua Huang

Contributions

RW contributed to the acquisition and analysis of data, the conception and design of the methodology, writing the original draft, and review and editing. JG contributed to the conception and design of the methodology, supervision, and review. SH contributed to the conception and design of the methodology, supervision, and validation.

Corresponding author

Correspondence to Jian Gao.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, R., Gao, J. & Huang, S. AIHGAT: A novel method of malware detection and homology analysis using assembly instruction heterogeneous graph. Int. J. Inf. Secur. 22, 1423–1443 (2023). https://doi.org/10.1007/s10207-023-00699-7

Accepted: 12 April 2023
Published: 14 May 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10207-023-00699-7
