Peekaboo: Hide and Seek with Malware Through Lightweight Multi-feature Based Lenient Hybrid Approach

  • Conference paper
Information and Communications Security (ICICS 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13407))

Abstract

In this paper, we propose Peekaboo, a multi-feature, lenient hybrid analysis approach for malware detection and classification. Peekaboo uses application programming interface (API) calls extracted dynamically and operation codes (opcodes) extracted statically as behavioral features, and uses a Recurrent Neural Network (RNN) to model both static and dynamic malicious behaviors. Peekaboo carries out dynamic analysis for only a subset of samples and static analysis for all samples in a large corpus, which is what makes the hybrid analysis lenient. The novelty of Peekaboo lies in reducing the computational overhead of dynamic analysis while still using multiple features to improve model performance, which makes it lightweight and suitable for real-world deployment of malware detection and classification at a large scale.

We conducted multiple sets of experiments, training and evaluating Peekaboo on a large dataset. Our results show 99.67% accuracy for binary classification (benign vs. malicious) and 96.30% accuracy for multi-class classification (assigning samples to malware classes), with a false positive rate (FPR) as low as 0.45%. Compared with our baseline model, Peekaboo increases accuracy by more than 1% for binary classification and 5% for multi-class classification. In addition, we tested Peekaboo on unseen malware classes, where it improved accuracy by almost 4% over our baseline.
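For readers less familiar with the two metrics quoted above, the following minimal sketch shows how accuracy and FPR are computed from binary predictions. The function name `binary_metrics` and the toy labels are illustrative assumptions; they are not the paper's evaluation code or data.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, false positive rate) for labels where 1 = malicious, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # FPR is the share of benign samples wrongly flagged as malicious.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy labels only (hypothetical, not taken from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, fpr = binary_metrics(y_true, y_pred)
```

A low FPR matters in deployment because every false positive is a benign file flagged as malware.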

Notes

  1. VirusTotal: https://www.virustotal.com/.

  2. Radare2 version 3.9.0: https://www.radare.org/n/radare2.html.

  3. R2pipe version 4.0.0: https://github.com/radareorg/radare2-r2pipe.

  4. Softpedia: https://www.softpedia.com/.

  5. AVClass2 source code: https://github.com/malicialab/avclass.

Acknowledgment

This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Corporate Laboratory@University Scheme, National University of Singapore, and Singapore Telecommunications Ltd.

Author information

Corresponding author

Correspondence to Mingchang Liu.

A Background

Static Analysis refers to the analysis of a binary file without executing it.

Dynamic Analysis refers to the analysis of a binary file by executing it in a controlled and well-monitored environment, e.g., a virtual machine or sandbox.
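Because Peekaboo applies static analysis to all samples and dynamic analysis to only a subset, the split can be pictured as a per-sample analysis plan. The sketch below is illustrative only. The function name `plan_lenient_hybrid`, the `dynamic_fraction` parameter, and the uniform random selection are assumptions, since this text does not state how the dynamically analyzed subset is chosen.

```python
import random


def plan_lenient_hybrid(sample_ids, dynamic_fraction=0.2, seed=0):
    """Mark every sample for static analysis and a random subset for dynamic analysis.

    Uniform random selection is an assumption made for illustration.
    """
    if not 0.0 <= dynamic_fraction <= 1.0:
        raise ValueError("dynamic_fraction must be in [0, 1]")
    ids = list(sample_ids)
    rng = random.Random(seed)  # seeded so the plan is reproducible
    n_dynamic = round(len(ids) * dynamic_fraction)
    dynamic_ids = set(rng.sample(ids, n_dynamic))
    return {sid: {"static": True, "dynamic": sid in dynamic_ids} for sid in ids}


# Hypothetical corpus of ten sample identifiers.
plan = plan_lenient_hybrid([f"sample_{i}" for i in range(10)], dynamic_fraction=0.3)
```

Only the samples flagged `dynamic` would be executed in a sandbox, which is where the savings in computational overhead come from.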

Terminology. In the malware analysis context, a feature often means a type of data extracted from samples that can characterize maliciousness. This usage differs from the usual machine learning setting, where features represent the attributes of the observations. The term "multiple features" here refers to multiple kinds of features in the malware analysis sense. For example, Peekaboo uses API calls and opcodes as features.

Few-Shot Learning (FSL) is a learning strategy that can improve a model's generalization ability when the sample size is small. FSL is essential to Peekaboo when training the model on the API call dataset, since we select only a small portion of the entire corpus of samples for dynamic analysis.

Multi-view Learning is a learning strategy that deals with data consisting of different views. A view can be a set of features obtained from one domain. In our setting, one view is the API calls collected during dynamic analysis and the other is the opcodes from static analysis. Multi-view learning aims either to integrate the views for model training or to use custom learning strategies so that learners can consume data from different views and perform well on a common task. Partial multi-view learning specializes in handling missing views. In our setting, the API call view is missing for every sample that did not undergo dynamic analysis.
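To make the missing-view situation concrete, the sketch below arranges samples into an opcode view, an API call view, and a mask that records whether the API view is present. Padding the absent view and carrying a presence mask is one common way to represent partial multi-view data. It is shown here as an assumption and is not claimed to be Peekaboo's method. The names `build_views`, `api_len`, and `pad_id` are hypothetical.

```python
def build_views(samples, api_len=4, pad_id=0):
    """Split samples into (opcode_view, api_view, api_mask).

    Each sample is a dict with an "opcodes" list and an optional "api_calls" list.
    A missing API view is filled with pad_id and flagged with mask value 0.
    """
    opcode_view, api_view, api_mask = [], [], []
    for sample in samples:
        opcode_view.append(list(sample["opcodes"]))
        api_calls = sample.get("api_calls")
        if api_calls is None:
            # View is absent: emit padding and mark it as missing.
            api_view.append([pad_id] * api_len)
            api_mask.append(0)
        else:
            # View is present: pad or truncate to a fixed length.
            padded = (list(api_calls) + [pad_id] * api_len)[:api_len]
            api_view.append(padded)
            api_mask.append(1)
    return opcode_view, api_view, api_mask


# Hypothetical token ids: the second sample was never run dynamically.
samples = [
    {"opcodes": [1, 2, 3], "api_calls": [7, 8]},
    {"opcodes": [4, 5], "api_calls": None},
]
opcode_view, api_view, api_mask = build_views(samples)
```

A downstream model could use `api_mask` to ignore the padded API view for samples that have only static features.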

Copyright information

© 2022 Springer Nature Switzerland AG

Cite this paper

Liu, M., Sachidananda, V., Peng, H., Patil, R., Muneeswaran, S., Gurusamy, M. (2022). Peekaboo: Hide and Seek with Malware Through Lightweight Multi-feature Based Lenient Hybrid Approach. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_29

  • DOI: https://doi.org/10.1007/978-3-031-15777-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15776-9

  • Online ISBN: 978-3-031-15777-6

  • eBook Packages: Computer Science, Computer Science (R0)
