AIHGAT: A novel method of malware detection and homology analysis using assembly instruction heterogeneous graph

  • Regular contribution
International Journal of Information Security

Abstract

Malicious code is increasingly organized into families, and research on malware homology (the classification of malicious code into families) is of great significance for maintaining network security. To better express the overall characteristics of malicious code and improve detection and homology analysis, this paper proposes AIHGAT, a method for malware detection and homology analysis based on heterogeneous graphs of assembly instructions. We take the assembly instructions of malicious families as the research object and analyze the importance and correlation of assembly instructions across different families. Detection and homology analysis are carried out in three stages: feature extraction, feature preprocessing, and model construction. In feature extraction, static features are difficult to extract from malicious samples that use countermeasures such as packing and obfuscation; to alleviate this problem, we obtain binary files from dynamic memory through a sandbox and then analyze their assembly instruction sets. In feature preprocessing, we divide the assembly instructions into N-tuples and construct a heterogeneous graph of assembly instructions according to the internal correlation of the gene sequence composed of the assembly N-gram features. Finally, in model construction, we analyze the homology determination performance of traditional graph neural networks and construct a Graph Attention Network with residual connections, named ResGAT, to analyze the homology of malicious code. The experimental results show that ResGAT can aggregate the core characteristics of malicious families and enhance the ability to recognize malicious family variants. Our model achieves an accuracy of 98.83%, which is better than traditional machine learning detection methods, and can effectively determine the homology of malicious code families.
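
The feature preprocessing step described in the abstract (splitting assembly instructions into N-tuples and linking them in a heterogeneous graph) can be illustrated with a minimal sketch. The abstract does not specify node types, edge definitions, or weights, so everything below is an assumption made for illustration: the function names `opcode_ngrams` and `build_hetero_graph` are hypothetical, sample-to-N-gram edges are weighted by relative frequency, and N-gram-to-N-gram edges count co-occurrence within a sample. It should not be read as the authors' construction.

```python
from collections import Counter
from itertools import combinations


def opcode_ngrams(opcodes, n=3):
    """Return the contiguous n-grams (as tuples) of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def build_hetero_graph(samples, n=3):
    """Build a toy heterogeneous graph with sample nodes and n-gram nodes.

    samples: dict mapping a sample name to its list of opcodes.
    Returns (nodes, edges): nodes maps node id -> node type,
    edges maps a (u, v) pair of node ids -> weight.
    """
    nodes = {}
    edges = {}
    for name, opcodes in samples.items():
        sample_id = ("sample", name)
        nodes[sample_id] = "sample"
        counts = Counter(opcode_ngrams(opcodes, n))
        total = sum(counts.values())
        for gram, count in counts.items():
            gram_id = ("ngram", gram)
            nodes[gram_id] = "ngram"
            # Assumed weighting: relative frequency of the n-gram in the sample.
            edges[(sample_id, gram_id)] = count / total
        # Assumed n-gram/n-gram relation: co-occurrence in the same sample.
        for a, b in combinations(sorted(counts), 2):
            key = (("ngram", a), ("ngram", b))
            edges[key] = edges.get(key, 0) + 1
    return nodes, edges


# Hypothetical opcode traces; real input would come from disassembled dumps.
samples = {
    "family_a_1": ["push", "mov", "call", "push", "mov", "call"],
    "family_a_2": ["push", "mov", "call", "ret"],
}
nodes, edges = build_hetero_graph(samples, n=3)
```

In this toy input, both samples share the N-gram `("push", "mov", "call")`, so the two sample nodes are connected through a common N-gram node. A graph neural network such as the ResGAT model named in the abstract could then propagate information along shared nodes of this kind.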

Data availability statements

All data generated or analyzed during this study are included in this article. The dataset that supports the findings of this study is available at virusshare.com.

Notes

  1. https://hex-rays.com/ida-pro/.

  2. https://cuckoosandbox.org/.

  3. https://www.aldeid.com/wiki/PEiD.

  4. https://hex-rays.com/products/ida/support/idapython_docs/.

Acknowledgements

This work was supported by the Double First-Class Innovation Research Project for People’s Public Security University of China (No. 2023SYL07).

Author information

Contributions

RW contributed to the acquisition and analysis of data, the conception and design of the methodology, writing the original draft, and review and editing. JG contributed to the conception and design of the methodology, supervision, and review. SH contributed to the conception and design of the methodology, supervision, and validation.

Corresponding author

Correspondence to Jian Gao.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, R., Gao, J. & Huang, S. AIHGAT: A novel method of malware detection and homology analysis using assembly instruction heterogeneous graph. Int. J. Inf. Secur. 22, 1423–1443 (2023). https://doi.org/10.1007/s10207-023-00699-7

  • DOI: https://doi.org/10.1007/s10207-023-00699-7
