Malware Classification Based on Graph Neural Network Using Control Flow Graph

Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 346)

Abstract

Malware family classification relies on the similarity among samples of the same family, including similarity of program structure and content. Because a control flow graph is non-Euclidean structured data, it was previously difficult to use features extracted from its content and structure directly for classification. With the introduction of graph neural networks, classifying non-Euclidean graphs has become possible. We propose a malware family classification system based on the control flow graph and Term Frequency-Inverse Document Frequency (TF-IDF). In this system, both the branch structure of the control flow graph and the instruction sequences in its basic blocks are used as input, and a graph neural network generates the graph feature representation of the malware family. Experimental results on the Microsoft Malware Classification Challenge dataset show that retaining the graph-structure features effectively improves family classification performance, and that instruction features based on TF-IDF also improve it.
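The pipeline outlined in the abstract can be illustrated with a small, untrained sketch. It treats each basic block as a "document" of opcodes, computes a TF-IDF vector per block, averages each block's vector with those of its control flow neighbours, and mean-pools the result into one graph-level vector. Everything here is an assumption made for illustration: the toy opcodes, the smoothed IDF formula, the undirected treatment of edges, and the function names `tfidf_node_features`, `propagate`, and `readout`. The abstract does not specify the authors' network architecture, and a real system would use learned graph neural network layers rather than plain averaging.

```python
import math
from collections import Counter

def tfidf_node_features(blocks, vocab):
    """Return one TF-IDF vector per basic block (each block is a list of opcodes)."""
    n = len(blocks)
    # Document frequency: number of blocks in which each opcode appears.
    df = Counter(op for ops in blocks for op in set(ops))
    feats = []
    for ops in blocks:
        tf = Counter(ops)
        total = len(ops) or 1
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1
        feats.append([
            (tf[op] / total) * (math.log((1 + n) / (1 + df[op])) + 1.0)
            for op in vocab
        ])
    return feats

def propagate(feats, edges):
    """One untrained message-passing step: average each node with its neighbours."""
    n = len(feats)
    neigh = [{i} for i in range(n)]   # every node includes itself (self-loop)
    for src, dst in edges:
        neigh[src].add(dst)
        neigh[dst].add(src)           # CFG edges treated as undirected (assumption)
    dim = len(feats[0])
    return [
        [sum(feats[j][k] for j in neigh[i]) / len(neigh[i]) for k in range(dim)]
        for i in range(n)
    ]

def readout(feats):
    """Mean-pool node vectors into a single graph-level vector."""
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]

# Toy CFG: three basic blocks, with block 0 branching to blocks 1 and 2.
blocks = [["mov", "cmp", "jz"], ["mov", "add", "jmp"], ["xor", "ret"]]
edges = [(0, 1), (0, 2)]
vocab = sorted({op for ops in blocks for op in ops})

node_feats = tfidf_node_features(blocks, vocab)
graph_vec = readout(propagate(node_feats, edges))
print(len(vocab), len(graph_vec))
```

The graph-level vector would then be passed to a classifier over malware families. The averaging step is what lets branch structure influence the representation: two programs with identical opcode counts but different control flow produce different propagated node vectors.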


Corresponding author

Correspondence to Baojiang Cui.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xia, R., Cui, B. (2022). Malware Classification Based on Graph Neural Network Using Control Flow Graph. In: Barolli, L. (eds) Advances on Broad-Band Wireless Computing, Communication and Applications. BWCCA 2021. Lecture Notes in Networks and Systems, vol 346. Springer, Cham. https://doi.org/10.1007/978-3-030-90072-4_13

  • DOI: https://doi.org/10.1007/978-3-030-90072-4_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-90071-7

  • Online ISBN: 978-3-030-90072-4

  • eBook Packages: Engineering, Engineering (R0)