Abstract
Malware classification continues to be exceedingly difficult due to the exponential growth in the number and variants of malicious files. It is crucial to classify malicious files based on their intent, activity, and threat to have a robust malware protection and post-attack recovery system in place. This paper proposes a novel deep learning-based model, S-DCNN, to classify malware binary files into their respective malware families efficiently. S-DCNN uses the image-based representation of the malware binaries and leverages the concepts of transfer learning and ensemble learning. The model incorporates three deep convolutional neural networks, namely ResNet50, Xception, and EfficientNet-B4. The ensemble technique is used to combine these component models’ predictions and a multilayered perceptron is used as a meta classifier. The ensemble technique fuses the diverse knowledge of the component models, resulting in high generalizability and low variance of the S-DCNN. Further, it eliminates the use of feature engineering, reverse engineering, disassembly, and other domain-specific techniques earlier used for malware classification. To establish S-DCNN’s robustness and generalizability, the performance of proposed model is evaluated on the Malimg dataset, a dataset collected from VirusShare, and packed malware dataset counterparts of both Malimg and VirusShare datasets. The proposed method achieves a state-of-the-art 10-fold accuracy of 99.43% on the Malimg dataset and an accuracy of 99.65% on the VirusShare dataset.
Similar content being viewed by others
References
Alsulami B, Mancoridis S (2018) Behavioral malware classification using convolutional recurrent neural networks. In: 2018 13th international conference on malicious and unwanted software (MALWARE), pp 103–111. https://doi.org/10.1109/MALWARE.2018.8659358
Beek C, Dunton T, Fokker J, Grobman S, Hux T, Polzer T, Lopez MR, Roccia T, Saavedra-Morales J, Samani R, Sherstobitof R (2019) Accessed on December 27, 2020 Mcafee labs threats report. https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-aug-2019.pdf
Bhowmik A, Kumar S, Bhat N (2019) Eye disease prediction from optical coherence tomography images with transfer learning. In: International conference on engineering applications of neural networks. Springer, pp 104–114
Bhowmik A, Kumar S, Bhat N (2021) Evolution of automatic visual description techniques-a methodological survey. Multimed Tools Appl, 1–45
Çayır A, Ünal U, Dağ H (2021) Random capsnet forest model for imbalanced malware type classification task. Comput Secur 102:102133. https://doi.org/10.1016/j.cose.2020.102133
Chaudhary P, Gupta DK, Singh S (2021) Outcome prediction of patients for different stages of sepsis using machine learning models. In: Advances in communication and computational technology. Springer, Singapore, pp 1085–1098
Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on computer vision and pattern recognition (CVPR), pp 1800–1807. https://doi.org/10.1109/CVPR.2017.195
Cui Z, Xue F, Cai X, Cao Y, ge Wang G, Chen J (2018) Detection of malicious code variants based on deep learning. IEEE Trans Industr Inform 14(7):3187–3196. https://doi.org/10.1109/tii.2018.2822680
D’Angelo G, Ficco M, Palmieri F (2021) Association rule-based malware classification using common subsequences of api calls. Appl Soft Comput 105:107234. https://doi.org/10.1016/j.asoc.2021.107234
Gao X, Hu C, Shan C, Liu B, Niu Z, Xie H (2020) Malware classification for the cloud via semi-supervised transfer learning. J Inform Secur Applic 55:102661. https://doi.org/10.1016/j.jisa.2020.102661
Gibert D, Mateu C, Planes J, Vicens R (2018) Using convolutional neural networks for classification of malware represented as images. J Comput Virol Hack Techniques 15(1):15–28. https://doi.org/10.1007/s11416-018-0323-0
Gibert D, Mateu C, Planes J (2020) Hydra: a multimodal deep learning framework for malware classification. Comput Secur 95:101873. https://doi.org/10.1016/j.cose.2020.101873
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Jain M, Andreopoulos W, Stamp M (2020) Convolutional neural networks and extreme learning machines for malware classification. J Comput Virol Hack Techniques 16(3):229–244
Kalash M, Rochan M, Mohammed N, Bruce NDB, Wang Y, Iqbal F (2018) Malware classification with deep convolutional neural networks. In: 2018 9th IFIP international conference on new technologies, mobility and security (NTMS), pp 1–5. https://doi.org/10.1109/NTMS.2018.8328749
Kaspersky (2020) Accessed on December 27, 2020 Protecting your personal data online at every point. https://media.kasperskydaily.com/wp-content/uploads/sites/92/2020/01/27103216/International-Privacy-Day-2020-Kaspersky-report.pdf
Katyal S, Kumar S, Sakhuja R, Gupta S (2018) Object detection in foggy conditions by fusion of saliency map and yolo. In: In 2018 12th international conference on sensing technology (ICST), pp 154–159. https://doi.org/10.1109/ICSensT.2018.8603632
Kaur G, Singh S, Rani R, Kumar R, Malik A (2021) High-quality reversible data hiding scheme using sorting and enhanced pairwise pee. IET Image Processing. https://doi.org/10.1049/ipr2.12212
Kolosnjaji B, Zarras A, Webster G, Eckert C (2016) Deep learning for classification of malware system call sequences. In: AI 2016: advances in artificial intelligence. Springer International Publishing, pp 137–149. https://doi.org/10.1007/978-3-319-50127-7_11
Kumar M, Gupta DK, Singh S (2021) Extreme event forecasting using machine learning models. In: Advances in communication and computational technology. Springer, Singapore, pp 1503–1514
Kumar N, Kumar R, Malik A, Singh S (2021) Low bandwidth data hiding for multimedia systems based on bit redundancy. Multimed Tools Appl, 1–19
Kumar R, Chand S, Singh S (2019) An optimal high capacity reversible data hiding scheme using move to front coding for lzw codes. Multimed Tools Appl 78(16):22977–23001
Liu L, Wang B, Yu B, Zhong Q (2017) Automatic malware classification and new malware detection using machine learning. Front Inform Technol Electron Eng 18:1336–1347
Malik A, Kumar R, Singh S (2018) A new image steganography technique based on pixel intensity and similarity in secret message. In: International Conference on Advances in Computing, Communication Control and Networking (ICACCCN). IEEE, pp 828–831
Moskovitch R, Feher C, Tzachar N, Berger E, Gitelman M, Dolev S, Elovici Y (2008) Unknown malcode detection using OPCODE representation. In: Intelligence and security informatics. Springer, Berlin, pp 204–215. https://doi.org/10.1007/978-3-540-89900-6_21
Naeem H (2019) Detection of malicious activities in internet of things environment based on binary visualization and machine intelligence. Wirel Pers Commun 108 (4):2609–2629. https://doi.org/10.1007/s11277-019-06540-6
Naeem H, Guo B, Naeem MR (2018) A light-weight malware static visual analysis for iot infrastructure. In: 2018 International conference on artificial intelligence and big data (ICAIBD), pp 240–244. https://doi.org/10.1109/ICAIBD.2018.8396202
Naeem H, Guo B, Naeem MR, Ullah F, Aldabbas H, Javed MS (2019) Identification of malicious code variants based on image visualization. Comput Electr Eng 76:225–237. https://doi.org/10.1016/j.compeleceng.2019.03.015
Narayanan BN, Djaneye-Boundjou O, Kebede TM (2016) Performance analysis of machine learning and pattern recognition algorithms for malware classification. In: 2016 IEEE national aerospace and electronics conference (NAECON) and ohio innovation summit (OIS), pp 338–342. https://doi.org/10.1109/NAECON.2016.7856826
Nataraj L, Karthikeyan S, Jacob G, Manjunath BS (2011) Malware images: visualization and automatic classification. In: the 8th International symposium on visualization for cyber security. Association for Computing Machinery, New York, NY, USA, VizSec ’11. https://doi.org/10.1145/2016904.2016908
Nataraj L, Yegneswaran V, Porras P, Zhang J (2011) A comparative assessment of malware classification using binary texture analysis and dynamic analysis. In: Proceedings of the 4th ACM workshop on security and artificial intelligence. Association for Computing Machinery, New York, NY, USA, AISec ’11, pp 21–30. https://doi.org/10.1145/2046684.2046689
Nisa M, Shah JH, Kanwal S, Raza M, Khan MA, Damaševičius R, Blažauskas T (2020) Hybrid malware classification method using segmentation-based fractal texture analysis and deep convolution neural network features. Appl Sci 10(14):4966. https://doi.org/10.3390/app10144966
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175
Pascanu R, Stokes JW, Sanossian H, Marinescu M, Thomas A (2015) Malware classification with recurrent networks. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 1916–1920. https://doi.org/10.1109/ICASSP.2015.7178304
Saadat S, Raymond VJ (2020) Malware classification using CNN-XGBoost model. In: Artificial intelligence techniques for advanced computing applications. https://doi.org/10.1007/978-981-15-5329-5_19. Springer, Singapore, pp 191–202
Saxe J, Berlin K (2015) Deep neural network based malware detection using two dimensional binary program features. In: 2015 10th international conference on malicious and unwanted software (MALWARE), pp 11–20. https://doi.org/10.1109/MALWARE.2015.7413680
Saxena A, Gupta DK, Singh S (2021) An animal detection and collision avoidance system using deep learning. In: Advances in communication and computational technology. Springer, Singapore, pp 1069–1084
Schultz MG, Eskin E, Zadok F, Stolfo SJ (2001) Data mining methods for detection of new malicious executables. In: Proceedings 200 IEEE symposium on security and privacy S P 2001, pp 38–49. https://doi.org/10.1109/SECPRI.2001.924286
Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th international conference on machine learning, PMLR, proceedings of machine learning research, vol 97, pp 6105–6114
Vasan D, Alazab M, Wassan S, Naeem H, Safaei B, Zheng Q (2020) Imcfn: image-based malware classification using fine-tuned convolutional neural network architecture. Comput Netw 171:107138. https://doi.org/10.1016/j.comnet.2020.107138
Vasan D, Alazab M, Wassan S, Safaei B, Zheng Q (2020) Image-based malware classification using ensemble of cnn architectures (imcec). Comput Secur 92:101748. https://doi.org/10.1016/j.cose.2020.101748
Venkatraman S, Alazab M, Vinayakumar R (2019) A hybrid deep learning image-based analysis for effective malware detection. J Inform Secur Applic 47:377–389. https://doi.org/10.1016/j.jisa.2019.06.006
Verma V, Muttoo SK, Singh VB (2020) Multiclass malware classification via first- and second-order texture statistics. Comput Secur 97:101895. https://doi.org/10.1016/j.cose.2020.101895
Yuan B, Wang J, Liu D, Guo W, Wu P, Bao X (2020) Byte-level malware classification based on markov images and deep learning. Comput Secur 92:101740. https://doi.org/10.1016/j.cose.2020.101740
Zhang H, Xiao X, Mercaldo F, Ni S, Martinelli F, Sangaiah AK (2019) Classification of ransomware families with machine learning based on n-gram of opcodes. Futur Gener Comput Syst 90:211–221. https://doi.org/10.1016/j.future.2018.07.052
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Parihar, A.S., Kumar, S. & Khosla, S. S-DCNN: stacked deep convolutional neural networks for malware classification. Multimed Tools Appl 81, 30997–31015 (2022). https://doi.org/10.1007/s11042-022-12615-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-022-12615-7