Abstract
This paper introduces a deep learning-based model for the classification of malicious software (malware). Malware is growing exponentially every year, and malware writers try to evade antivirus software by producing polymorphic and metamorphic malware. Most antivirus products rely on signature detection, which is not sufficient against the new generation of malware. In response, antivirus vendors have started to use machine learning approaches, which have had a positive impact on malware detection and classification. Recently, deep learning algorithms, and specifically Convolutional Neural Networks (CNNs), have attracted growing attention for malware classification because CNNs are particularly effective at extracting features from images. By integrating a CNN with the Extreme Gradient Boosting (XGBoost) algorithm, we obtain a powerful model for classifying malware images into their classes or families. The input source for the model is the Malimg dataset [1], an open collection of malware samples already converted to grayscale images. Many papers have used CNN-SVM, CNN-Softmax, and other models for malware image classification and reported good accuracy; this paper proposes a CNN-XGBoost model that achieves higher accuracy than the algorithms previously used for malware classification.
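To make the input representation concrete, the following is a minimal sketch of how a binary can be viewed as a grayscale image, along the lines described by Nataraj et al. (listed in the references): each byte is read as an 8-bit unsigned integer and used as one pixel. The function name `bytes_to_grayscale`, the fixed `width`, and the zero padding of the last row are assumptions made for illustration; they are not taken from this paper or from the Malimg dataset itself.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay out raw bytes as rows of grayscale pixels (0 = black, 255 = white).

    `width` is an assumed fixed row length; the last row is padded with zeros.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Pad with zero bytes so the data fills a complete height x width grid.
    padded = data.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Hypothetical 10-byte "binary" rendered as a 4-pixel-wide image.
image = bytes_to_grayscale(bytes(range(10)), width=4)
for row in image:
    print(row)
```

In a CNN-XGBoost pipeline of the kind the abstract describes, images like this would be fed to the CNN for feature extraction, and the extracted features would then be passed to the XGBoost classifier.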
References
Malimg Dataset. https://www.kaggle.com/c/malware_classification/discussion/73433. Accessed 2019
Malware statistic. https://www.av-test.org/en/statistics/malware/. Accessed 31 Oct 2019
Internet security threat report 2019, volume 24. https://resource.elq.symantec.com/e/f2ISTR_24_2019_April_en.pdf
Ren X et al. (2017) A novel image classification method with CNN-XGBoost model. In: IWDW
Nataraj L, Karthikeyan S, Jacob G, Manjunath B (2011) Malware images: visualization and automatic classification. In: Proceedings of the 8th international symposium on visualization for cyber security. ACM, p 4
Siddiqui M, Wang MC, Lee J (2008) A survey of data mining techniques for malware detection using file features. In: Proceedings of the 46th annual southeast regional conference on XX. ACM, pp 509–510
Drew J, Moore T, Hahsler M (2016) Polymorphic malware detection using sequence classification methods. In: Security and privacy workshops. IEEE, pp 81–87
Microsoft malware classification challenge (big 2015) (2017) https://www.kaggle.com/c/malware-classification. Accessed 30 Sept 2019
Microsoft malware classification challenge (big 2015) first place team: Say no to overfitting. http://blog.kaggle.com/2015/05/26/. Accessed 20 Nov 2019
Ahmadi M, Ulyanov D, Semenov S, Trofimov M, Giacinto G Novel feature extraction, selection and fusion for effective malware family classification. In: Proceedings of the sixth ACM conference
Gibert D (2016) Convolutional neural networks for malware classification. Universitat de Barcelona
Cui Z, Xue F (2018) Detection of malicious code variants based on deep learning. IEEE Trans Ind Informat 14(7)
Kabanga EK, Kim CH (2018) Malware images classification using convolutional neural network. J Comput Commun 6:153–158. https://doi.org/10.4236/jcc.2018.61016
Elleuch M, Maalej R, Kherallah M (2016) A new design based-SVM of the CNN classifier architecture with dropout for offline Arabic handwritten recognition. Proc Comput Sci 80:1712–1723
Intro to convolutional neural networks (2019) https://web.stanford.edu/class/cs231a/lectures/intro_cnn
Lin M, Chen Q, Yan S (2014) Network in network. In: ICLR
A comparison of different classifiers’ accuracy & performance for high-dimensional data. https://www.freecodecamp.org/news/multi-class-classification-with-sci-kit-learn-xgboost-a-case-study-using-brainwave-data-363d7fca5f69/. Published 9 May 2019. Accessed 20 Oct 2019
A comprehensive guide to boosting machine learning algorithms. https://www.edureka.co/blog/boosting-machine-learning/. Accessed Apr 2020
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saadat, S., Joseph Raymond, V. (2021). Malware Classification Using CNN-XGBoost Model. In: Hemanth, D., Vadivu, G., Sangeetha, M., Balas, V. (eds) Artificial Intelligence Techniques for Advanced Computing Applications. Lecture Notes in Networks and Systems, vol 130. Springer, Singapore. https://doi.org/10.1007/978-981-15-5329-5_19
DOI: https://doi.org/10.1007/978-981-15-5329-5_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5328-8
Online ISBN: 978-981-15-5329-5
eBook Packages: Intelligent Technologies and Robotics (R0)