Imbalance Malware Classification by Decoupling Representation and Classifier

Liu, Jiayin; Zhuge, Chengchen; Wang, Qun; Guo, Xiangmin; Li, Ziyuan

doi:10.1007/978-3-030-78621-2_7


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1424)

Included in the following conference series:

International Conference on Artificial Intelligence and Security

Abstract

In recent years, with the widespread use of automatic and semi-automatic malicious code generation tools, the volume of malware has increased dramatically. Given the growing number of malware variants, automatic detection and classification of malware samples has become a necessity. A long-tailed class distribution is one of the most common problems in malware datasets, and existing general-purpose deep learning models struggle to identify few-shot classes when frequent classes dominate the data. To address this challenge, we propose a new training scheme that classifies imbalanced malware by decoupling representation learning from classifier learning. Specifically, in representation learning, the model is trained without balancing or oversampling the tail classes in order to learn the best representation. In classifier learning, we fine-tune the classifier under a class-balanced sampling scheme. In addition, at this stage we keep the feature extractor fixed while optimizing the classifier, so that re-balancing does not disturb the backbone learned in the first stage. To assess the performance of the proposed approach, experiments are conducted on the BIG 2015 dataset. The experimental results demonstrate that our approach achieves far higher classification performance than approaches in the existing literature, particularly on few-shot malware families.
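
The difference between the two sampling schemes described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a formulation common in the long-tailed recognition literature, in which class j is drawn with probability proportional to n_j ** q, where n_j is the number of samples in the class. Setting q = 1 reproduces the natural (unbalanced) distribution used in the representation stage, and q = 0 gives the class-balanced distribution used when fine-tuning the classifier. The function names, the label names, and the class counts are all hypothetical.

```python
import random
from collections import Counter


def class_sampling_probs(labels, q):
    """Probability of drawing each class, proportional to n_j ** q.

    q = 1 keeps the natural distribution (no re-balancing);
    q = 0 gives every class the same probability (class-balanced).
    This formula is an assumption, not taken from the paper.
    """
    counts = Counter(labels)
    total = sum(n ** q for n in counts.values())
    return {c: (n ** q) / total for c, n in counts.items()}


def sample_batch(labels, q, batch_size, rng):
    """Pick a class according to the sampling probabilities,
    then pick one sample index uniformly within that class."""
    probs = class_sampling_probs(labels, q)
    by_class = {}
    for idx, c in enumerate(labels):
        by_class.setdefault(c, []).append(idx)
    classes = list(probs)
    weights = [probs[c] for c in classes]
    chosen = rng.choices(classes, weights=weights, k=batch_size)
    return [rng.choice(by_class[c]) for c in chosen]


# Hypothetical long-tailed label set (made-up families and counts).
labels = ["head"] * 900 + ["mid"] * 90 + ["tail"] * 10
rng = random.Random(0)

stage1_probs = class_sampling_probs(labels, q=1)  # representation learning
stage2_probs = class_sampling_probs(labels, q=0)  # classifier fine-tuning

# Share of tail-class samples in a large class-balanced batch.
batch = sample_batch(labels, q=0, batch_size=3000, rng=rng)
tail_share = sum(labels[i] == "tail" for i in batch) / len(batch)
```

In this toy setting the tail class makes up 1% of draws in the first stage and about one third in the second. The sketch covers sampling only; in the second stage the scheme additionally keeps the feature extractor fixed, so only the classifier parameters would be updated on the class-balanced batches.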


Acknowledgement

This work is supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJB510022 and by the Research Startup Funds for the Introduction of High-level Talents at Jiangsu Police Institute under Grant No. JSPIGKZ. We also thank the Jiangsu Electronic Data Forensics Analysis and Research Center (No. 2019SJPT002) and the Key Laboratory of Digital Forensics of Jiangsu Public Security Department.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

Author information

Authors and Affiliations

Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing, 210031, China
Jiayin Liu, Chengchen Zhuge, Qun Wang, Xiangmin Guo & Ziyuan Li
Jiangsu Electronic Data Forensics and Analysis Engineering Research Center, Jiangsu Police Institute, Nanjing, 210031, China
Jiayin Liu, Chengchen Zhuge, Qun Wang & Xiangmin Guo
Jiangsu Provincial Public Security Department Key Laboratory of Digital Forensics, Jiangsu Police Institute, Nanjing, 210031, China
Jiayin Liu, Chengchen Zhuge, Qun Wang & Xiangmin Guo

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Xingming Sun
Nanjing University of Information Science and Technology, Nanjing, China
Xiaorui Zhang
Jinan University, Guangzhou, China
Zhihua Xia
Purdue University, West Lafayette, IN, USA
Elisa Bertino

About this paper

Cite this paper

Liu, J., Zhuge, C., Wang, Q., Guo, X., Li, Z. (2021). Imbalance Malware Classification by Decoupling Representation and Classifier. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Advances in Artificial Intelligence and Security. ICAIS 2021. Communications in Computer and Information Science, vol 1424. Springer, Cham. https://doi.org/10.1007/978-3-030-78621-2_7

DOI: https://doi.org/10.1007/978-3-030-78621-2_7
Published: 29 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78620-5
Online ISBN: 978-3-030-78621-2
eBook Packages: Computer Science, Computer Science (R0)
