Malware Classification Using CNN-XGBoost Model

  • Conference paper
Artificial Intelligence Techniques for Advanced Computing Applications

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 130))

Abstract

This paper introduces a deep learning-based model for classifying malicious software (malware). Malware grows exponentially every year, and malware writers try to evade antivirus software by producing polymorphic and metamorphic malware. Most antivirus products rely on signature detection, which is not sufficient against the new generation of malware. In response, antivirus vendors have started to use machine learning approaches, which have had a positive impact on malware detection and classification. Recently, deep learning algorithms, and specifically Convolutional Neural Networks (CNN), have attracted more attention for malware classification, because the CNN is considered the best deep learning algorithm for extracting features from images. By integrating the CNN with the gradient boosting algorithm XGBoost, we obtain a powerful model that classifies malware images into their classes or families. The input to the model is the Malimg dataset [1], an open collection of malware already converted to grayscale images. Many papers have used CNN-SVM, CNN-Softmax, and other models for malware image classification and obtained good accuracy; this paper proposes a CNN-XGBoost model that achieves higher accuracy than the algorithms previously used for malware classification.
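As a rough illustration of what "malware converted to a grayscale image" means, the sketch below follows the byte-to-pixel idea described by Nataraj et al. [5]: each byte of a binary becomes one 8-bit pixel, and the byte stream is wrapped into rows of a fixed width. The function name `bytes_to_grayscale`, the default width of 256, and the zero-padding of the last row are assumptions made for this example, not details taken from the paper. The CNN and XGBoost stages are not shown.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte to one pixel (0-255) and wrap the stream into rows.

    `width` and the zero-padding policy are illustrative assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the tail with zero bytes so the final row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Slice the padded stream into consecutive rows of `width` pixels.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Hypothetical 10-byte "binary" laid out 4 pixels wide.
image = bytes_to_grayscale(bytes(range(10)), width=4)
```

The resulting rows can be handed to any image library or converted to an array before being fed to a feature extractor.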

References

  1. Malimg Dataset. https://www.kaggle.com/c/malware_classification/discussion/73433. Accessed 2019

  2. Malware statistic. https://www.av-test.org/en/statistics/malware/. Accessed 31 Oct 2019

  3. Internet security threat report, vol 24 (2019). https://resource.elq.symantec.com/e/f2ISTR_24_2019_April_en.pdf

  4. Ren X et al. (2017) A novel image classification method with CNN-XGBoost model. In: IWDW

  5. Nataraj L, Karthikeyan S, Jacob G, Manjunath B (2011) Malware images: visualization and automatic classification. In: Proceedings of the 8th international symposium on visualization for cyber security. ACM, p 4

  6. Siddiqui M, Wang MC, Lee J (2008) A survey of data mining techniques for malware detection using file features. In: Proceedings of the 46th annual southeast regional conference on XX. ACM, pp 509–510

  7. Drew J, Moore T, Hahsler M (2016) Polymorphic malware detection using sequence classification methods. In: Security and privacy workshops. IEEE, pp 81–87

  8. Microsoft malware classification challenge (big 2015) (2017) https://www.kaggle.com/c/malware-classification. Accessed 30 Sept 2019

  9. Microsoft malware classification challenge (big 2015) first place team: Say no to overfitting. http://blog.kaggle.com/2015/05/26/. Accessed 20 Nov 2019

  10. Ahmadi M, Ulyanov D, Semenov S, Trofimov M, Giacinto G Novel feature extraction, selection and fusion for effective malware family classification. In: Proceedings of the sixth ACM conference

  11. Gibert D (2016) Convolutional neural networks for malware classification. Universitat de Barcelona

  12. Cui Z, Xue F (2018) Detection of malicious code variants based on deep learning. IEEE Trans Ind Informat 14(7)

  13. Kabanga EK, Kim CH (2018) Malware images classification using convolutional neural network. J Comput Commun 6:153–158. https://doi.org/10.4236/jcc.2018.61016

  14. Elleuch M, Maalej R, Kherallah M (2016) A new design based-SVM of the CNN classifier architecture with dropout for offline Arabic handwritten recognition. Proc Comput Sci 80:1712–1723

  15. Intro to convolutional neural networks (2019) https://web.stanford.edu/class/cs231a/lectures/intro_cnn

  16. Lin M, Chen Q, Yan S (2014) Network in network. In: ICLR

  17. A comparison of different classifiers’ accuracy & performance for high-dimensional data. https://www.freecodecamp.org/news/multi-class-classification-with-sci-kit-learn-xgboost-a-case-study-using-brainwave-data-363d7fca5f69/. Published 9 May 2019. Accessed 20 Oct 2019

  18. A comprehensive guide to boosting machine learning algorithms. https://www.edureka.co/blog/boosting-machine-learning/. Accessed Apr 2020

Author information

Corresponding author

Correspondence to Sumaya Saadat.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Saadat, S., Joseph Raymond, V. (2021). Malware Classification Using CNN-XGBoost Model. In: Hemanth, D., Vadivu, G., Sangeetha, M., Balas, V. (eds) Artificial Intelligence Techniques for Advanced Computing Applications. Lecture Notes in Networks and Systems, vol 130. Springer, Singapore. https://doi.org/10.1007/978-981-15-5329-5_19
