Imbalance Malware Classification by Decoupling Representation and Classifier

  • Conference paper
Advances in Artificial Intelligence and Security (ICAIS 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1424)

Abstract

In recent years, the widespread use of automatic and semi-automatic malicious code generation tools has caused the volume of malware to increase dramatically. Given the growing number of malware variants, automatic detection and classification of malware samples has become indispensable. A long-tailed class distribution is one of the most common problems in malware datasets, and general-purpose deep learning models struggle to identify few-shot classes when frequent classes dominate. To address this challenge, we propose a new training scheme that classifies imbalanced malware by decoupling representation learning from classifier learning. Specifically, in representation learning, the model is trained without balancing or oversampling the tail classes in order to learn the best representation. In classifier learning, we fine-tune the classifier under a class-balanced sampling scheme. In addition, at this stage we keep the feature extractor fixed and optimize only the classifier, which prevents the re-balancing from affecting the backbone learned in the first stage. To assess the performance of the proposed approach, experiments are conducted on the BIG 2015 dataset. The experimental results demonstrate that our approach provides far higher classification performance, particularly on few-shot malware families, than approaches proposed in the existing literature.
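The difference between the two sampling regimes described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give a sampling formula, so the rule used here, p_j = n_j^q / Σ_i n_i^q (q = 1 for ordinary instance-based sampling, q = 0 for class-balanced sampling), is an assumption borrowed from common practice in long-tailed recognition. The function names and the toy label list are hypothetical, and only the sampling step is shown, not the frozen feature extractor.

```python
import random
from collections import Counter, defaultdict


def class_sampling_probs(labels, q):
    """Probability of drawing each class: p_j = n_j**q / sum_i n_i**q.

    q = 1 reproduces the natural (instance-based) distribution, as in
    the representation-learning stage; q = 0 gives every class the same
    probability, as in the classifier fine-tuning stage.
    (Assumed formulation, not taken from the paper.)
    """
    counts = Counter(labels)
    weights = {c: n ** q for c, n in counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def sample_indices(labels, q, k, rng):
    """Draw k sample indices: pick a class first, then a uniform instance of it."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    probs = class_sampling_probs(labels, q)
    classes = sorted(probs)
    class_weights = [probs[c] for c in classes]
    picked = rng.choices(classes, weights=class_weights, k=k)
    return [rng.choice(by_class[c]) for c in picked]


# Hypothetical long-tailed label list: 900 head, 90 medium, 10 tail samples.
labels = ["head"] * 900 + ["mid"] * 90 + ["tail"] * 10
rng = random.Random(0)

stage1 = sample_indices(labels, q=1, k=3000, rng=rng)  # no re-balancing
stage2 = sample_indices(labels, q=0, k=3000, rng=rng)  # class-balanced

stage1_counts = Counter(labels[i] for i in stage1)
stage2_counts = Counter(labels[i] for i in stage2)
print("stage 1:", dict(stage1_counts))
print("stage 2:", dict(stage2_counts))
```

Under these assumptions the tail class is drawn about 1% of the time in the first stage and about a third of the time in the second. In a full training loop, the second stage would update only the classifier parameters while the backbone weights stay fixed.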

Acknowledgement

This work is supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJB510022 and by the Research Startup Funds for the Introduction of High-level Talents at Jiangsu Police Institute under Grant No. JSPIGKZ. We also thank the Jiangsu Electronic Data Forensics Analysis and Research Center (No. 2019SJPT002) and the Key Laboratory of Digital Forensics of Jiangsu Public Security Department.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, J., Zhuge, C., Wang, Q., Guo, X., Li, Z. (2021). Imbalance Malware Classification by Decoupling Representation and Classifier. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Advances in Artificial Intelligence and Security. ICAIS 2021. Communications in Computer and Information Science, vol 1424. Springer, Cham. https://doi.org/10.1007/978-3-030-78621-2_7

  • DOI: https://doi.org/10.1007/978-3-030-78621-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78620-5

  • Online ISBN: 978-3-030-78621-2

  • eBook Packages: Computer Science, Computer Science (R0)
