Malware Classification Using CNN-XGBoost Model

Saadat, Sumaya; Joseph Raymond, V.

doi:10.1007/978-981-15-5329-5_19

Sumaya Saadat & V. Joseph Raymond

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 130)

Abstract

This paper introduces a deep learning-based model for classifying malicious software (malware). The volume of malware grows exponentially every year, and malware writers try to evade antivirus software by producing polymorphic and metamorphic malware. Most antivirus products rely on signature-based detection, which is not sufficient against this new generation of malware. In response, antivirus vendors have started to use machine learning approaches, which have had a positive impact on malware detection and classification. Recently, deep learning algorithms, and Convolutional Neural Networks (CNNs) in particular, have attracted growing attention for malware classification because they are well suited to extracting features from images. Integrating a CNN with the Extreme Gradient Boosting (XGBoost) algorithm yields a powerful model for classifying malware images into their classes or families. The input to the model is the Malimg dataset [1], an open collection of malware samples that have already been converted to grayscale images. Many earlier papers used CNN-SVM, CNN-Softmax, and other models for malware image classification and reported good accuracy; this paper proposes a CNN-XGBoost model that achieves higher accuracy than those previously used algorithms.
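
The abstract relies on malware samples that have already been converted to grayscale images. As a rough illustration of that idea only, the sketch below treats each byte of a binary as one 8-bit pixel and wraps the byte stream into fixed-width rows. The function name `bytes_to_grayscale`, the default width of 32, and the zero padding of the last row are assumptions made for this example; they are not taken from the paper or from the Malimg dataset.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 32) -> List[List[int]]:
    """Arrange raw bytes as rows of 8-bit grayscale pixels (0 = black, 255 = white).

    Hypothetical helper: the width and the zero padding are illustrative choices.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Zero-pad the tail so that the last row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Slice the padded byte stream into consecutive rows of `width` pixels.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Stand-in for the bytes of a binary; a real pipeline would read a file instead.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
print(image)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

The resulting 2D array of intensities is the kind of input an image classifier consumes; the CNN and XGBoost stages described in the abstract are not shown here.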

References

1. Malimg Dataset. https://www.kaggle.com/c/malware_classification/discussion/73433. Accessed 2019
2. Malware statistic. https://www.av-test.org/en/statistics/malware/. Accessed 31 Oct 2019
3. Internet security threat report 2019, volume 24. https://resource.elq.symantec.com/e/f2ISTR_24_2019_April_en.pdf
4. Ren X et al. (2017) A novel image classification method with CNN-XGBoost model. In: IWDW
5. Nataraj L, Karthikeyan S, Jacob G, Manjunath B (2011) Malware images: visualization and automatic classification. In: Proceedings of the 8th international symposium on visualization for cyber security. ACM, p 4
6. Siddiqui M, Wang MC, Lee J (2008) A survey of data mining techniques for malware detection using file features. In: Proceedings of the 46th annual southeast regional conference on XX. ACM, pp 509–510
7. Drew J, Moore T, Hahsler M (2016) Polymorphic malware detection using sequence classification methods. In: Security and privacy workshops. IEEE, pp 81–87
8. Microsoft malware classification challenge (big 2015) (2017) https://www.kaggle.com/c/malware-classification. Accessed 30 Sept 2019
9. Microsoft malware classification challenge (big 2015) first place team: Say no to overfitting. http://blog.kaggle.com/2015/05/26/. Accessed 20 Nov 2019
10. Ahmadi M, Ulyanov D, Semenov S, Trofimov M, Giacinto G. Novel feature extraction, selection and fusion for effective malware family classification. In: Proceedings of the sixth ACM conference
11. Gibert D (2016) Convolutional neural networks for malware classification. Universitat de Barcelona
12. Cui Z, Xue F (2018) Detection of malicious code variants based on deep learning. IEEE Trans Ind Informat 14(7)
13. Kabanga EK, Kim CH (2018) Malware images classification using convolutional neural network. J Comput Commun 6:153–158. https://doi.org/10.4236/jcc.2018.61016
14. Elleuch M, Maalej R, Kherallah M (2016) A new design based-SVM of the CNN classifier architecture with dropout for offline Arabic handwritten recognition. Proc Comput Sci 80:1712–1723
15. Intro to convolutional neural networks (2019) https://web.stanford.edu/class/cs231a/lectures/intro_cnn
16. Lin M, Chen Q, Yan S (2014) Network in network. In: ICLR
17. A comparison of different classifiers’ accuracy & performance for high-dimensional data. https://www.freecodecamp.org/news/multi-class-classification-with-sci-kit-learn-xgboost-a-case-study-using-brainwave-data-363d7fca5f69/. Published 9 May 2019. Accessed 20 Oct 2019
18. A comprehensive guide to boosting machine learning algorithms. https://www.edureka.co/blog/boosting-machine-learning/. Accessed April 2020

Author information

Authors and Affiliations

Department of Information Security and Cyber Forensics, SRM Institute of Science and Technology, Chennai, India
Sumaya Saadat & V. Joseph Raymond

Corresponding author

Correspondence to Sumaya Saadat.

Editor information

Editors and Affiliations

Karunya Institute of Technology and Sciences, Coimbatore, India
D. Jude Hemanth
SRM Institute of Science and Technology, Chennai, India
G. Vadivu
SRM Institute of Science and Technology, Chennai, India
M. Sangeetha
Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Balas

About this paper

Cite this paper

Saadat, S., Joseph Raymond, V. (2021). Malware Classification Using CNN-XGBoost Model. In: Hemanth, D., Vadivu, G., Sangeetha, M., Balas, V. (eds) Artificial Intelligence Techniques for Advanced Computing Applications. Lecture Notes in Networks and Systems, vol 130. Springer, Singapore. https://doi.org/10.1007/978-981-15-5329-5_19

DOI: https://doi.org/10.1007/978-981-15-5329-5_19
Published: 24 July 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5328-8
Online ISBN: 978-981-15-5329-5
eBook Packages: Intelligent Technologies and Robotics (R0)
