Abstract
In this paper, we propose Peekaboo, a multiple-feature-based lenient hybrid analysis for malware detection and classification. Our solution uses application programming interface (API) calls extracted dynamically and operational codes (opcodes) extracted statically as behavioral features, and uses a Recurrent Neural Network (RNN) to model both static and dynamic malicious behaviors. Peekaboo performs dynamic analysis on only a subset of samples and static analysis on all samples in a large corpus, hence the term lenient hybrid analysis. Peekaboo's novelty lies in reducing the computational overhead of dynamic analysis while also utilizing multiple features to improve model performance, which makes it lightweight and suitable for real-world, large-scale malware detection and classification.
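The abstract states that an RNN models both the opcode (static) and API call (dynamic) sequences. As a minimal sketch, assuming a plain Elman RNN with randomly initialised, untrained weights (chosen only for brevity; the paper's actual architecture, layer sizes, and training procedure are not given here), an integer-encoded token sequence of either kind can be mapped to a maliciousness score as follows. All function names are hypothetical.

```python
import math
import random


def make_rnn(vocab_size, embed_dim, hidden_dim, seed=0):
    """Randomly initialise a toy, untrained Elman RNN classifier (hypothetical helper)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "E": mat(vocab_size, embed_dim),     # one embedding row per token id
        "W_x": mat(hidden_dim, embed_dim),   # input-to-hidden weights
        "W_h": mat(hidden_dim, hidden_dim),  # hidden-to-hidden (recurrent) weights
        "b_h": [0.0] * hidden_dim,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)],
        "b_out": 0.0,
    }


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_score(params, token_ids):
    """Apply h_t = tanh(W_x e_t + W_h h_{t-1} + b_h) over the sequence, then map
    the final hidden state to a probability of 'malicious' with a sigmoid."""
    h = [0.0] * len(params["b_h"])
    for t in token_ids:
        e = params["E"][t]
        pre = [a + b + c for a, b, c in
               zip(matvec(params["W_x"], e), matvec(params["W_h"], h), params["b_h"])]
        h = [math.tanh(z) for z in pre]
    logit = sum(w * x for w, x in zip(params["w_out"], h)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


params = make_rnn(vocab_size=6, embed_dim=4, hidden_dim=8)
print(rnn_score(params, [3, 2, 1]))  # a value in (0, 1); the weights are untrained
```

In practice the weights would be learned from labelled sequences; the sketch only shows how the recurrence carries information from earlier tokens to the final score.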
We conducted multiple sets of experiments, training and evaluating Peekaboo on a large dataset. Our results show 99.67% accuracy for binary classification (benign vs. malicious) and 96.30% accuracy for multi-class classification (classifying samples into malware classes), with a false positive rate (FPR) as low as 0.45%. Compared with our baseline model, Peekaboo increases accuracy by more than 1% for binary classification and 5% for multi-class classification. In addition, when tested on unseen malware classes, Peekaboo improved accuracy by almost 4% over the baseline.
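The reported figures are accuracy and false positive rate. The sketch below shows how these are conventionally computed, assuming "malicious" is the positive class; the counts and labels are invented for illustration and are not the paper's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy and false positive rate (FPR), with 'malicious' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn)  # share of benign samples wrongly flagged as malicious
    return accuracy, fpr


def multiclass_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted malware class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical counts for illustration only (not the paper's results).
acc, fpr = binary_metrics(tp=990, fp=5, tn=995, fn=10)
print(acc, fpr)  # 0.9925 0.005
print(multiclass_accuracy(["trojan", "worm", "adware", "worm"],
                          ["trojan", "worm", "worm", "worm"]))  # 0.75
```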
Notes
1. VirusTotal: https://www.virustotal.com/.
2. Radare2 version 3.9.0: https://www.radare.org/n/radare2.html.
3. R2pipe version 4.0.0: https://github.com/radareorg/radare2-r2pipe.
4. Softpedia: https://www.softpedia.com/.
5. AVClass2 source code: https://github.com/malicialab/avclass.
Acknowledgment
This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Corporate Laboratory@University Scheme, National University of Singapore, and Singapore Telecommunications Ltd.
A Background
Static Analysis refers to the analysis of a binary file without executing it.
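Because static analysis never runs the binary, opcode features come from its disassembly. The Notes list Radare2 and R2pipe as tooling; to avoid assuming any particular tool's output format, this sketch takes a plain-text listing in a hypothetical `address  mnemonic operands` layout and keeps only the mnemonics, which form the opcode sequence a sequence model would consume. The function name is hypothetical.

```python
def opcode_sequence(listing):
    """Extract opcode mnemonics from a hypothetical 'address mnemonic operands' listing."""
    ops = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].startswith("0x"):
            continue  # skip blank lines, labels and comments
        ops.append(parts[1].lower())  # normalise case so 'CALL' and 'call' match
    return ops


listing = """
0x00401000  push ebp
0x00401001  mov ebp, esp
; function prologue ends
0x00401003  CALL 0x00402000
"""
print(opcode_sequence(listing))  # ['push', 'mov', 'call']
```

Dropping the operands is a deliberate simplification here: it keeps the vocabulary small, at the cost of discarding register and address information.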
Dynamic Analysis refers to the analysis of a binary file by executing it in a controlled and well-monitored environment, e.g., a virtual machine or sandbox.
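During dynamic analysis, the API calls a sample makes are recorded in execution order. The sandbox and its log format are not specified here, so this sketch assumes a hypothetical JSON-lines trace in which every event has a `type` field and API call events carry an `api` field; the optional `max_len` cap is likewise an illustrative assumption.

```python
import json


def api_call_sequence(trace_lines, max_len=None):
    """Collect API names, in call order, from a hypothetical JSON-lines sandbox trace."""
    calls = []
    for line in trace_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "api_call":
            calls.append(event["api"])
            if max_len is not None and len(calls) >= max_len:
                break  # truncate long traces to a fixed budget
    return calls


trace = [
    '{"type": "api_call", "api": "CreateFileW", "pid": 1234}',
    '{"type": "process_start", "pid": 1234}',
    '{"type": "api_call", "api": "WriteFile", "pid": 1234}',
    '{"type": "api_call", "api": "RegSetValueExW", "pid": 1234}',
]
print(api_call_sequence(trace))             # ['CreateFileW', 'WriteFile', 'RegSetValueExW']
print(api_call_sequence(trace, max_len=2))  # ['CreateFileW', 'WriteFile']
```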
Terminology. In the malware analysis context, a feature usually means a type of data extracted from the samples that can characterize maliciousness. This usage differs from the usual machine learning setting, where features are the attributes of the observations. The term "multiple features" therefore refers to multiple kinds of features in the malware analysis sense; for example, Peekaboo uses API calls and opcodes as features.
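Because each kind of feature has its own token space, one natural representation keeps a separate vocabulary per feature kind before sequences are fed to a model. The sketch below assumes that design (it is not confirmed as Peekaboo's preprocessing), with hypothetical sample data and id 0 reserved for tokens unseen during training.

```python
def build_vocab(sequences, min_count=1):
    """Map each token seen at least min_count times to an integer id; 0 means unknown."""
    counts = {}
    for seq in sequences:
        for tok in seq:
            counts[tok] = counts.get(tok, 0) + 1
    vocab = {"<unk>": 0}
    for tok in sorted(counts):  # sorted for deterministic ids
        if counts[tok] >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(seq, vocab):
    """Replace each token by its id, falling back to 0 for unknown tokens."""
    return [vocab.get(tok, 0) for tok in seq]


# Hypothetical samples, each carrying two kinds of features.
samples = {
    "s1": {"opcodes": ["push", "mov", "call"], "api_calls": ["CreateFileW", "WriteFile"]},
    "s2": {"opcodes": ["mov", "xor", "ret"], "api_calls": ["RegSetValueExW"]},
}
vocabs = {kind: build_vocab(s[kind] for s in samples.values())
          for kind in ("opcodes", "api_calls")}
print(encode(samples["s1"]["opcodes"], vocabs["opcodes"]))  # [3, 2, 1]
```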
Few-Shot Learning (FSL) is a learning strategy that can improve a model's ability to generalize when the sample size is small. FSL is essential to Peekaboo when training the model on the API call dataset, since only a small portion of the entire corpus of samples is selected for dynamic analysis.
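FSL covers many methods, and this section does not say which one Peekaboo uses. As an illustrative stand-in, the sketch below shows the nearest-prototype rule from prototypical networks: each class is summarised by the mean embedding of its few labelled (support) examples, and a query is assigned to the closest prototype. The 2-D embeddings and class names are hypothetical; in practice they would come from a trained encoder.

```python
import math


def prototypes(support):
    """Average the embeddings of the few labelled (support) examples of each class."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(query, protos):
    """Assign the query to the class whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


support = {  # hypothetical 2-D embeddings, two examples per class
    "ransomware": [[0.9, 0.1], [1.1, -0.1]],
    "trojan": [[-1.0, 0.8], [-0.8, 1.2]],
}
protos = prototypes(support)
print(classify([0.7, 0.2], protos))  # ransomware
```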
Multi-view Learning is a learning strategy for data consisting of different views, where a view is a set of features obtained from one domain. In our setting, one view is the API calls collected during dynamic analysis and the other is the opcodes obtained from static analysis. Multi-view learning aims to integrate the views for model training, or to use custom learning strategies that teach learners to consume data from different views and perform well on a common task. Partial multi-view learning specializes in handling missing views; in Peekaboo, samples that were not selected for dynamic analysis lack the API call view.
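One simple way to cope with a missing view is late fusion over whichever views are present. The sketch below averages per-view malicious-class probabilities and skips a missing view instead of imputing it; this is an illustrative stand-in, not Peekaboo's actual partial multi-view method, and the scores are hypothetical.

```python
def fuse_views(view_scores):
    """Average the malicious-class probabilities of whichever views are present.
    A missing view (None) is skipped rather than imputed."""
    available = [p for p in view_scores.values() if p is not None]
    if not available:
        raise ValueError("sample has no available view")
    return sum(available) / len(available)


# Every sample has the static (opcode) view; only some have the dynamic (API call) view.
with_both = {"opcodes": 0.75, "api_calls": 0.25}
static_only = {"opcodes": 0.75, "api_calls": None}
print(fuse_views(with_both))    # 0.5
print(fuse_views(static_only))  # 0.75
```

Skipping is the simplest policy; alternatives such as imputing the missing view from the available one trade extra complexity for potentially better use of the partial data.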
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, M., Sachidananda, V., Peng, H., Patil, R., Muneeswaran, S., Gurusamy, M. (2022). Peekaboo: Hide and Seek with Malware Through Lightweight Multi-feature Based Lenient Hybrid Approach. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_29
DOI: https://doi.org/10.1007/978-3-031-15777-6_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15776-9
Online ISBN: 978-3-031-15777-6
eBook Packages: Computer Science, Computer Science (R0)