Integrating a Rule-Based Approach to Malware Detection with an LSTM-Based Feature Selection Technique

  • Original Research
  • Published in SN Computer Science

Abstract

Advances in technology have amplified malware activity, affecting networks and users alike. Network traffic must therefore be analysed dynamically for malware before it is forwarded to the next host. By exploiting network vulnerabilities, attackers gain control of a system and install their own network rules to let malicious traffic through. YARA (Yet Another Recursive Acronym) rules are an effective string- and pattern-matching approach to malware analysis, and their effectiveness depends on the quality and number of rules used. A YARA rule evaluates its condition against a suspicious sample and either fires or does not. This binary verdict can limit the usefulness of the analysis and its results. The proposed approach therefore selects malware features using a long short-term memory (LSTM) model, and combines rule-based traffic analysis with LSTM-based feature selection to strengthen the malware detection model. To assess model integrity, this research compares performance results with and without LSTM-based feature (parameter) selection. With LSTM-based feature selection, the model achieved its best accuracy of 97%, indicating its suitability for malware detection on diverse datasets drawn from different network environments.
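The binary nature of rule matching described in the abstract can be illustrated with a minimal sketch. It is written in plain Python and does not use the YARA engine or its rule syntax. The class `StringRule`, the function `scan`, the rule name, and the byte patterns are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class StringRule:
    """Hypothetical stand-in for a string-matching rule: named byte patterns plus a condition."""
    name: str
    patterns: Dict[str, bytes]
    # The condition receives the identifiers of the patterns found and returns True or False.
    condition: Callable[[Set[str]], bool]

    def matches(self, sample: bytes) -> bool:
        # Collect the identifiers of every pattern that occurs somewhere in the sample.
        hits = {ident for ident, pattern in self.patterns.items() if pattern in sample}
        return self.condition(hits)


def scan(sample: bytes, rules: List[StringRule]) -> List[str]:
    """Return the names of the rules whose condition fires on the sample."""
    return [rule.name for rule in rules if rule.matches(sample)]


# Illustrative rule (assumed patterns): fires only when both patterns are present.
demo_rule = StringRule(
    name="demo_dropper",
    patterns={"$a": b"cmd.exe /c", "$b": b"powershell -enc"},
    condition=lambda hits: hits == {"$a", "$b"},
)

full_sample = b"...cmd.exe /c start powershell -enc AAAA..."
partial_sample = b"...cmd.exe /c dir..."

print(scan(full_sample, [demo_rule]))     # ['demo_dropper']
print(scan(partial_sample, [demo_rule]))  # []
```

In this sketch the second sample contains one of the two patterns, yet the result is the same as for a sample containing neither. This is the kind of all-or-nothing verdict that the abstract identifies as a limitation of purely rule-based analysis.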

Data availability

This research work uses publicly available datasets that have been cited and referenced.

References

  1. Tahir R. A study on malware and malware detection techniques. Int J Edu Mgmt Engg. 2018;8(2):20. https://doi.org/10.5815/ijeme.2018.02.03.

  2. Faruk MJ, Miner P, Coughlan R, Masum M, Shahriar H, Clincy V, Cetinkaya C. Smart connected aircraft: towards security, privacy, and ethical hacking. In: International Conference on Security of Information and Networks (SIN). IEEE. 2021; p. 1–5. https://doi.org/10.1109/SIN54109.2021.9699243

  3. Zhang K. A machine learning based approach to identify SQL injection vulnerabilities. In: IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE. 2019; p. 1286–1288. https://doi.org/10.1109/ASE.2019.00164

  4. Kim MS. A Study on the Attack Index Packet Filtering Algorithm Based on Web Vulnerability. In: Big Data, cloud computing, and data science engineering. Springer; 2023, p. 145–152. https://doi.org/10.1007/978-3-031-19608-9_12

  5. Shandilya SK, Ganguli C, Izonin I, Nagar AK. Cyber attack evaluation dataset for deep packet inspection and analysis. Data Brief. 2023;46:108771. https://doi.org/10.1016/j.dib.2022.108771.

  6. Catal C, Giray G, Tekinerdogan B. Applications of deep learning for mobile malware detection: A systematic literature review. Neur Comp Appl. 2022; 1–26. https://doi.org/10.1007/s00521-021-06597-0

  7. Naik N, Jenkins P, Savage N, Yang L. Cyberthreat Hunting-Part 1: triaging ransomware using fuzzy hashing, import hashing and YARA rules. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE. 2019; p. 1–6. https://doi.org/10.1109/FUZZ-IEEE.2019.8858803

  8. Mira F, Huang W. Performance evaluation of string based malware detection methods. In: International Conference on Automation and Computing (ICAC). IEEE. 2018; p. 1–6. https://doi.org/10.23919/IConAC.2018.8749096

  9. Xiao X, Zhang S, Mercaldo F, Hu G, Sangaiah AK. Android malware detection based on system call sequences and LSTM. Multim Tls Appl. 2019;78:3979–99. https://doi.org/10.1007/s11042-017-5104-0.

  10. Zhang J. A practical logic obfuscation technique for hardware security. IEEE Trans VLSI sys. 2015;24(3):1193–7. https://doi.org/10.1109/TVLSI.2015.2437996.

  11. Cesare S, Xiang Y, Zhou W. Malwise—an effective and efficient classification system for packed and polymorphic malware. IEEE Trans Compu. 2012;62(6):1193–206. https://doi.org/10.1109/TC.2012.65.

  12. Abbas MFB, Srikanthan T. Low-complexity signature-based malware detection for IoT devices. In: Applications and Techniques in Information Security: International Conference, ATIS 2017, Auckland, New Zealand, Proceedings. Springer. 2017; p. 181–189. https://doi.org/10.1007/978-981-10-5421-1_15

  13. Naik N, Jenkins P, Savage N, Yang L, Naik K, Song J. Embedding fuzzy rules with YARA rules for performance optimisation of malware analysis. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE. 2020; p. 1–7. https://doi.org/10.1109/FUZZ48607.2020.9177856

  14. Culling C. Which YARA rules rule: basic or advanced? In: GIAC (GCIA) Gold Certification and RES. 2018; p. 5500.

  15. Brengel M, Rossow C. YARIX: scalable YARA-based Malware Intelligence. In: USENIX Security Symposium. 2021. p. 3541–3558.

  16. VirusTotal. YARA in a nutshell. 2019; [Online]. https://virustotal.github.io/yara/. Accessed 3 Feb 2023.

  17. Andrade EDO, Viterbo J, Vasconcelos CN, Guérin J, Bernardini FC. A model based on LSTM neural networks to identify five different types of malware. Proc Compu Sci. 2019;159:182–91. https://doi.org/10.1016/j.procs.2019.09.173.

  18. Akbar F, Hussain M, Mumtaz R, Riaz Q, Wahab AWA, Jung KH. Permissions-based detection of android malware using machine learning. Symmetry. 2022;14(4):718. https://doi.org/10.3390/sym14040718.

  19. Kambar MEZN, Esmaeilzadeh A, Kim Y, Taghva K. A survey on mobile malware detection methods using machine learning. In: IEEE Annual Computing and Communication Workshop and Conference (CCWC). IEEE. 2022; p. 0215–0221. https://doi.org/10.1109/CCWC54503.2022.9720753

  20. Yakura H, Shinozaki S, Nishimura R, Oyama Y, Sakuma J. Neural malware analysis with attention mechanism. Comp Secur. 2019;87:101592. https://doi.org/10.1016/j.cose.2019.101592.

  21. Nadeem MW, Goh HG, Ponnusamy V, Aun Y. DDoS detection in SDN using machine learning techniques. Comput Mater Contin. 2022;71(1):771–89. https://doi.org/10.32604/cmc.2022.021669.

  22. Mirjalili S, Mirjalili SM, Yang XS. Binary bat algorithm. Neural Comput Appl. 2014;25:663–81. https://doi.org/10.1007/s00521-013-1525-5.

  23. Nakamura RY, Pereira LA, Costa KA, Rodrigues D, Papa JP, Yang XS. BBA: a binary bat algorithm for feature selection. In: SIBGRAPI conference on graphics, patterns and images. IEEE. p. 291–297. https://doi.org/10.1109/SIBGRAPI.2012.47

  24. Alotaibi FM, Vassilakis VG. SDN-based detection of self-propagating ransomware: the case of BadRabbit. IEEE Access. 2021;9:28039–58. https://doi.org/10.1109/ACCESS.2021.3058897.

  25. Masum M, Faruk MJ, Shahriar H, Qian K, Lo D, Adnan MI. Ransomware classification and detection with machine learning algorithms. In: IEEE Annual Computing and Communication Workshop and Conference (CCWC). IEEE. 2022; p. 0316–0322. https://doi.org/10.1109/CCWC54503.2022.9720869.

  26. Şahín CB, Dírí B. Robust feature selection with LSTM recurrent neural networks for artificial immune recognition system. IEEE Access. 2019;7:24165–78. https://doi.org/10.1109/ACCESS.2019.2900118.

  27. Ghanei H, Manavi F, Hamzeh A. A novel method for malware detection based on hardware events using deep neural networks. Compu Viro Hack Tech. 2021;17(4):319–31. https://doi.org/10.1007/s11416-021-00386-y.

  28. Burks R, Islam KA, Lu Y, Li J. Data augmentation with generative models for improved malware detection: A comparative study. In: IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE. 2019; p. 0660–0665. https://doi.org/10.1109/UEMCON47517.2019.8993085.

  29. Lu J, Liu A, Dong F, Gu F, Gama J, Zhang G. Learning under concept drift: a review. IEEE trans knowl Data Engg. 2018;31(12):2346–63. https://doi.org/10.1109/TKDE.2018.2876857.

  30. Thabtah F, Hammoud S, Kamalov F, Gonsalves A. Data imbalance in classification: experimental evaluation. Info Sci. 2020;513:429–41. https://doi.org/10.1016/j.ins.2019.11.004.

  31. Fahy C, Yang S, Gongora M. Scarcity of labels in non-stationary data streams: A survey. ACM Comp Surv (CSUR). 2022;55(2):1–39. https://doi.org/10.1145/3494832.

  32. Duch W, Wieczorek T, Biesiada J, Blachnik M. Comparison of feature ranking methods based on information entropy. In: IEEE International Joint Conference on Neural Networks. IEEE. 2004; p. 1415–1419. https://doi.org/10.1109/IJCNN.2004.1380157.

  33. Jaramillo LES. Malware detection and mitigation techniques: Lessons learned from Mirai DDOS attack. Info Sys Eng Manag. 2018;3(3):19. https://doi.org/10.20897/jisem/2655.

  34. Wireshark Tool. https://www.wireshark.org/. Accessed 27 July 2023

  35. Mukhiya SK, Ahmed U. Hands-On Exploratory Data Analysis with Python: Perform EDA techniques to understand, summarize, and investigate your data. Packt Publishing Ltd; 2020.

  36. Vinesmsuic dataset [Available online] at: https://www.kaggle.com/code/vinesmsuic/malware-detection-using-deeplearning/data. Accessed 8 Feb 2023.

  37. Android Malware Dataset [Available online] at: https://www.kaggle.com/datasets/shashwatwork/android-malware-dataset-for-machine-learning. Accessed 8 Feb 2023.

  38. IoT Malware Dataset [Available online]: https://www.kaggle.com/datasets/efecastaneda/malware-iot-log-file. Accessed 8 Feb 2023.

  39. Elsayed MS, Le-Khac NA, Jurcut AD. InSDN: A novel SDN intrusion dataset. IEEE access. 2020;8:165263–84. https://doi.org/10.1109/ACCESS.2020.3022633.

  40. Reddy KV, Ambati SR, Reddy YS, Reddy AN. AdaBoost for Parkinson's disease detection using robust scaler and SFS from acoustic features. In: Smart Technologies, Communication and Robotics (STCR). IEEE. p. 1–6. https://doi.org/10.1109/STCR51658.2021.9588906

  41. Mashru D, Mangipudi GM, Swamy H, Halangali S, Sushma E. A decentralised instant messaging application with end-to-end encryption. In: 2023 20th Learning and Technology Conference (L&T) (pp. 48–53). IEEE; 2023.

  42. Si Q, Xu H, Tong Y, Zhou Y, Liang J, Cui L, Hao Z. Malware Detection Using Automated Generation of Yara Rules on Dynamic Features. In: Science of Cyber Security: 4th International Conference, SciSec 2022, Matsue, Japan, August 10–12, 2022, Revised Selected Papers (pp. 315-330). Cham: Springer International Publishing; 2022.

  43. Liu H, Patras P. NetSentry: a deep learning approach to detecting incipient large-scale network attacks. Comput Commun. 2022;191:119–32.

  44. Kim S, Kim J, Nam S, Kim D. WebMon: ML-and YARA-based malicious webpage detection. Comput Netw. 2018;137:119–31.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Sonam Bhardwaj.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh, and S. Karthikeyan.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bhardwaj, S., Dave, M. Integrating a Rule-Based Approach to Malware Detection with an LSTM-Based Feature Selection Technique. SN COMPUT. SCI. 4, 737 (2023). https://doi.org/10.1007/s42979-023-02177-2

  • DOI: https://doi.org/10.1007/s42979-023-02177-2
