Abstract
Context
Technological advances have led to a tremendous increase in the complexity and volume of specialized malware affecting computing devices across the globe. Alongside malware targeting Windows devices, IoT devices, which have less computational power, have also been hit by malware attacks in the recent past. Because up-to-date malware datasets are scarce, malware recognition and classification have become harder, particularly in IoT environments where few malware samples are available. Identifying a malware family can reveal the underlying intent of the malware, and traditional machine learning algorithms have performed well in this area. However, since such methods require a large amount of feature engineering, deep learning algorithms for malware recognition and classification have been developed. In particular, malware visualization-based approaches, which have shown decent success in the past, leave scope for improvement, which the current study exploits.
Objectives
The current work aims to use malware images (grayscale, RGB, Markov) and deep CNNs for effective Windows and IoT malware recognition and classification, using both traditional learning and transfer learning approaches.
Methods and Design
First, grayscale, RGB and Markov images were created from malware binaries. The idea behind generating Markov images from a Markov probability matrix is to retain the global statistics of malware bytes, which are generally lost during image transformation operations. A Gabor filter-based approach is then used to extract textures, and two models are employed to classify the malware images into families: a custom-built deep CNN and a pretrained Xception CNN, trained on 1.5 million images from the ImageNet dataset and fine-tuned for malware images.
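As an illustration of the image-generation step, the sketch below shows one common way to turn a binary into a grayscale image and into a Markov image built from byte-to-byte transition probabilities. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the fixed row width of 256, the zero padding, and the choice to leave probabilities unscaled (rather than mapping them to pixel intensities) are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations


def grayscale_image(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay raw bytes out row by row as pixel intensities in 0..255.

    Assumption: a fixed row width, with the last row zero-padded.
    """
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def markov_image(data: bytes) -> list[list[float]]:
    """Build a 256x256 byte-transition probability matrix.

    Entry [i][j] estimates the probability that byte value j
    immediately follows byte value i in the file. Rows for byte
    values that never occur as a predecessor are left as zeros.
    """
    counts = [[0] * 256 for _ in range(256)]
    # Count every adjacent (current, next) byte pair in the file.
    for cur, nxt in zip(data, data[1:]):
        counts[cur][nxt] += 1

    image = []
    for row in counts:
        total = sum(row)
        # Normalize each row so that non-empty rows sum to 1.
        image.append([c / total if total else 0.0 for c in row])
    return image


if __name__ == "__main__":
    sample = bytes([0, 1, 0, 1, 0, 2])  # toy stand-in for a malware binary
    print(grayscale_image(sample, width=4))
    print(markov_image(sample)[0][1])
```

Because the Markov image has a fixed 256x256 size regardless of file length, it summarizes the whole byte stream, whereas the grayscale image depends on the chosen width and on any later resizing.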
Results and Conclusions
To assess the effectiveness of the suggested framework, two public benchmark Windows malware image datasets, one custom-built Windows malware image dataset and one custom-built IoT malware image dataset were used. In particular, the methods demonstrate excellent classification results for the 500 GB Microsoft Malware Challenge dataset. A comparison of the suggested solutions with state-of-the-art methods indicates the effectiveness and low computational cost of our malware recognition and classification solution.
Data Availability
Microsoft dataset (Ronen et al., 2018): https://www.kaggle.com/c/malware-classification. Malimg dataset (Nataraj et al., 2011): https://www.kaggle.com/datasets/keerthicheepurupalli/malimg-dataset9010. Custom Windows malware dataset sources: https://virusshare.com/, https://github.com/ytisf/theZoo, https://vx-underground.org/archive/VxHeaven/index.html. Custom IoT malware dataset sources: https://vx-underground.org/archive/VxHeaven/index.html, https://github.com/ytisf/theZoo. Malware can damage computing environments; therefore, caution must be taken before downloading it.
Code Availability
Code is available on request at https://forms.gle/mp9GihTmsAzAUNpT7.
References
Amer, E., & Zelinka, I. (2020). A dynamic Windows malware detection and prediction method based on contextual understanding of API call sequence. Computers & Security, 92, 101760. https://doi.org/10.1016/j.cose.2020.101760
Amin, M., Tanveer, T. A., Tehseen, M., Khan, M., Khan, F. A., & Anwar, S. (2020). Static malware detection and attribution in android byte-code through an end-to-end deep system. Future Generation Computer Systems, 102, 112–126. https://doi.org/10.1016/j.future.2019.07.070
Amin, M., Shehwar, D., Ullah, A., Guarda, T., Tanveer, T. A., & Anwar, S. (2020). A deep learning system for health care IoT and smartphone malware detection. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-05429-x
Anandhi, V., Vinod, P., & Menon, V. G. (2021). Malware visualization and detection using DenseNets. Personal and Ubiquitous Computing. https://doi.org/10.1007/s00779-021-01581-w
Andresini, G., Appice, A., De Rose, L., & Malerba, D. (2021). GAN augmentation to deal with imbalance in imaging-based intrusion detection. Future Generation Computer Systems, 123, 108–127. https://doi.org/10.1016/j.future.2021.04.017
Bai, Y., Xing, Z., Ma, D., Li, X., & Feng, Z. (2021). Comparative analysis of feature representations and machine learning methods in Android family classification. Computer Networks, 184, 107639. https://doi.org/10.1016/j.comnet.2020.107639
Bakour, K., & Ünver, H. M. (2021). VisDroid: Android malware classification based on local and global image features, bag of visual words and machine learning techniques. Neural Computing and Applications, 33(8), 3133–3153. https://doi.org/10.1007/s00521-020-05195-w
Chollet, F. (2017). “Xception: Deep Learning with Depthwise Separable Convolutions,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807. https://doi.org/10.1109/CVPR.2017.195.
Dai, Y., Li, H., Qian, Y., & Lu, X. (2018). A malware classification method based on memory dump grayscale image. Digital Investigation, 27, 30–37. https://doi.org/10.1016/j.diin.2018.09.006
Darabian, H., et al. (2020). Detecting Cryptomining Malware: A Deep Learning Approach for Static and Dynamic Analysis. Journal of Grid Computing, 18(2), 293–303. https://doi.org/10.1007/s10723-020-09510-6
Darem, A., Abawajy, J., Makkar, A., Alhashmi, A., & Alanazi, S. (2021). Visualization and deep-learning-based malware variant detection using OpCode-level features. Future Generation Computer Systems, 125, 314–323. https://doi.org/10.1016/j.future.2021.06.032
De Lorenzo, A., Martinelli, F., Medvet, E., Mercaldo, F., & Santone, A. (2020). Visualizing the outcome of dynamic analysis of Android malware with VizMal. Journal of Information Security and Applications, 50, 102423. https://doi.org/10.1016/j.jisa.2019.102423
Dehkordy, D. T., & Rasoolzadegan, A. (2021). A new machine learning-based method for android malware detection on imbalanced dataset. Multimedia Tools and Applications, 80(16), 24533–24554. https://doi.org/10.1007/s11042-021-10647-z
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. https://doi.org/10.1109/CVPR.2009.5206848.
Dhalaria, M., & Gandotra, E. (2020). “CSForest: an approach for imbalanced family classification of android malicious applications,” p. 13. https://doi.org/10.1007/s41870-021-00661-7.
Ding, Y., Zhang, X., Hu, J., & Xu, W. (2020). “Android malware detection method based on bytecode image.” Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-020-02196-4.
Escudero García, D., & DeCastro-García, N. (2021). Optimal feature configuration for dynamic malware detection. Computers & Security, 105, 102250. https://doi.org/10.1016/j.cose.2021.102250
Farrokhmanesh, M., & Hamzeh, A. (2019). Music classification as a new approach for malware detection. Journal of Computer Virology and Hacking Techniques, 15(2), 77–96. https://doi.org/10.1007/s11416-018-0321-2
Ganesh, M., Pednekar, P., Prabhuswamy, P., Nair, D. S., Park, Y., & Jeon, H. (2017). “CNN-Based Android Malware Detection,” in 2017 International Conference on Software Security and Assurance (ICSSA), Altoona, PA, pp. 60–65. https://doi.org/10.1109/ICSSA.2017.18.
Gibert, D., Mateu, C., Planes, J., & Vicens, R. (2019). Using convolutional neural networks for classification of malware represented as images. Journal of Computer Virology and Hacking Techniques, 15(1), 15–28. https://doi.org/10.1007/s11416-018-0323-0
Gibert, D., Mateu, C., & Planes, J. (2020). HYDRA: A multimodal deep learning framework for malware classification. Computers & Security, 95, 101873. https://doi.org/10.1016/j.cose.2020.101873
He, K., Zhang, X., Ren, S., & Sun, J. (2016). “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. Accessed: Nov. 09, 2021. [Online]. Available: https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html
Jain, M., Andreopoulos, W., & Stamp, M. (2020). Convolutional neural networks and extreme learning machines for malware classification. Journal of Computer Virology and Hacking Techniques, 16(3), 229–244. https://doi.org/10.1007/s11416-020-00354-y
Li, Z., Qin, Z., Huang, K., Yang, X., & Ye, S. (2017). “Intrusion Detection Using Convolutional Neural Networks for Representation Learning.” In D. Liu, S. Xie, Y. Li, D. Zhao, & E.-S. M. El-Alfy (Eds.), Neural Information Processing, (vol. 10638, pp. 858–866). Springer International Publishing. https://doi.org/10.1007/978-3-319-70139-4_87.
Liu, L., & Wang, B. (2017). “Automatic Malware Detection Using Deep Learning Based on Static Analysis,” in Data Science, Singapore, pp. 500–507. https://doi.org/10.1007/978-981-10-6385-5_42.
“Malware Statistics & Trends Report | AV-TEST.” (2022). https://www.av-test.org/en/statistics/malware/ (accessed May 14, 2022).
Mercaldo, F., & Santone, A. (2020). Deep learning for image-based mobile malware detection. Journal of Computer Virology and Hacking Techniques, 16(2), 157–171. https://doi.org/10.1007/s11416-019-00346-7
Moti, Z., et al. (2021). Generative adversarial network to detect unseen Internet of Things malware. Ad Hoc Networks, 122, 102591. https://doi.org/10.1016/j.adhoc.2021.102591
Moti, Z., Hashemi, S., & Jahromi, A. N. (2020). “A Deep Learning-based Malware Hunting Technique to Handle Imbalanced Data,” in 2020 17th International ISC Conference on Information Security and Cryptology (ISCISC), Tehran, Iran, pp. 48–53. https://doi.org/10.1109/ISCISC51277.2020.9261913.
Naeem, H., et al. (2020). Malware detection in industrial internet of things based on hybrid image visualization and deep learning model. Ad Hoc Networks, 105, 102154. https://doi.org/10.1016/j.adhoc.2020.102154
Nataraj, L., Karthikeyan, S., Jacob, G., & Manjunath, B. S. (2011). “Malware images: visualization and automatic classification,” in Proceedings of the 8th International Symposium on Visualization for Cyber Security - VizSec ’11, Pittsburgh, Pennsylvania, pp. 1–7. https://doi.org/10.1145/2016904.2016908.
Pei, X., Yu, L., & Tian, S. (2020). AMalNet: A deep learning framework based on graph convolutional networks for malware detection. Computers & Security, 93, 101792. https://doi.org/10.1016/j.cose.2020.101792
Pundir, S., Obaidat, M. S., Wazid, M., Das, A. K., Singh, D. P., & Rodrigues, J. J. P. C. (2021). “MADP-IIME: malware attack detection protocol in IoT-enabled industrial multimedia environment using machine learning approach,” Multimedia Systems. https://doi.org/10.1007/s00530-020-00743-9.
Ren, Z., Chen, G., & Lu, W. (2020). Malware visualization methods based on deep convolution neural networks. Multimedia Tools and Applications, 79(15–16), 10975–10993. https://doi.org/10.1007/s11042-019-08310-9
Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., & Ahmadi, M. (2018). “Microsoft Malware Classification Challenge,” arXiv:1802.10135 [cs], Accessed: Feb. 12, 2022. [Online]. Available: http://arxiv.org/abs/1802.10135
Stamp, M., Chandak, A., Wong, G., & Ye, A. (2021). “On Ensemble Learning,” arXiv:2103.12521 [cs], Accessed: Jan. 22, 2022. [Online]. Available: http://arxiv.org/abs/2103.12521
Sudhakar & Kumar, S. (2021). “MCFT-CNN: Malware classification with fine-tune convolution neural networks using traditional and transfer learning in Internet of Things.” Future Generation Computer Systems, 125, 334–351. https://doi.org/10.1016/j.future.2021.06.029.
ytisf. (2022). theZoo - A Live Malware Repository. Accessed: May 14, 2022. [Online]. Available: https://github.com/ytisf/theZoo
Tuncer, T., Ertam, F., & Dogan, S. (2021). Automated malware identification method using image descriptors and singular value decomposition. Multimedia Tools and Applications, 80(7), 10881–10900. https://doi.org/10.1007/s11042-020-10317-6
Vasan, D., Alazab, M., Wassan, S., Safaei, B., & Zheng, Q. (2020a). Image-Based malware classification using ensemble of CNN architectures (IMCEC). Computers & Security, 92, 101748. https://doi.org/10.1016/j.cose.2020.101748
Vasan, D., Alazab, M., Wassan, S., Naeem, H., Safaei, B., & Zheng, Q. (2020b). IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Computer Networks, 171, 107138. https://doi.org/10.1016/j.comnet.2020.107138
Verma, V., Muttoo, S. K., & Singh, V. B. (2020). Multiclass malware classification via first- and second-order texture statistics. Computers & Security, 97, 101895. https://doi.org/10.1016/j.cose.2020.101895
“VirusShare.com.” https://virusshare.com/ (accessed May 14, 2022).
“VirusTotal - Stats.” https://www.virustotal.com/gui/stats (accessed May 14, 2022).
“vx-underground.” https://www.vx-underground.org/archive/VxHeaven/index.html (accessed May 14, 2022).
Xiao, G., Li, J., Chen, Y., & Li, K. (2020). MalFCS: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks. Journal of Parallel and Distributed Computing, 141, 49–58. https://doi.org/10.1016/j.jpdc.2020.03.012
Yuan, B., Wang, J., Liu, D., Guo, W., Wu, P., & Bao, X. (2020). Byte-level malware classification based on markov images and deep learning. Computers & Security, 92, 101740. https://doi.org/10.1016/j.cose.2020.101740
Zhang, J., et al. (2021). Malware Detection Based on Multi-level and Dynamic Multi-feature Using Ensemble Learning at Hypervisor. Mobile Networks and Applications, 26(4), 1668–1685. https://doi.org/10.1007/s11036-019-01503-4
Author information
Contributions
All authors contributed equally to this manuscript.
Ethics declarations
Conflict of interest
The authors state that they have no known competing financial interests or personal ties that could have appeared to affect the work reported in this study.
Consent to participate
Not Applicable.
Human and Animal Ethics
No humans or animals were harmed in any way.
Consent for publication
Not Applicable.
Credit authorship contribution statement
All authors contributed equally to this study.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sharma, O., Sharma, A. & Kalia, A. Windows and IoT malware visualization and classification with deep CNN and Xception CNN using Markov images. J Intell Inf Syst 60, 349–375 (2023). https://doi.org/10.1007/s10844-022-00734-4
DOI: https://doi.org/10.1007/s10844-022-00734-4