Abstract
Advances in technology have amplified malware activity, affecting both networks and their users. Network traffic should therefore be analysed dynamically for malware before it is forwarded to the next host. By exploiting network vulnerabilities, attackers can gain control of a system and install their own network rules to let malicious traffic through. YARA (Yet Another Recursive Acronym) rules are an effective string- and pattern-matching approach to malware analysis, and the effectiveness of the analysis depends on the quality and number of the rules used. A YARA rule is activated for a suspicious sample only if the sample satisfies the rule's condition, so the analysis yields a binary conclusion, which may limit its use and results. The proposed approach therefore selects malware features using a machine-learning-based long short-term memory (LSTM) model. Combining rule-based traffic analysis with LSTM-based feature selection strengthens the malware detection model. The research assesses model integrity by comparing performance results with and without LSTM-based feature (parameter) selection. With LSTM-based feature selection, the model achieved its best accuracy of 97%, indicating its suitability for malware detection on diverse datasets from different network environments.
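The binary nature of a rule verdict mentioned in the abstract can be illustrated with a minimal Python sketch. It does not use the YARA engine or the authors' rules: `ToyRule`, its patterns, the `min_matches` threshold, and both byte samples are hypothetical, chosen only to show that a rule either fires or does not, while the count of matched patterns is discarded by the verdict.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ToyRule:
    """Hypothetical stand-in for a YARA-style rule (not the YARA engine)."""
    name: str
    patterns: Tuple[bytes, ...]   # byte strings to search for
    min_matches: int              # condition: at least this many must occur

    def matched_patterns(self, sample: bytes) -> List[bytes]:
        # Plain substring search; real YARA also supports hex and regex strings.
        return [p for p in self.patterns if p in sample]

    def fires(self, sample: bytes) -> bool:
        # The verdict is binary: the condition holds or it does not.
        return len(self.matched_patterns(sample)) >= self.min_matches


# Illustrative rule and samples (assumptions, not taken from the paper).
rule = ToyRule(
    name="toy_downloader",
    patterns=(b"cmd.exe", b"URLDownloadToFile", b"powershell -enc"),
    min_matches=2,
)
near_miss = b"... cmd.exe /c dir ..."
hit = b"call URLDownloadToFile; then cmd.exe /c run"

for label, sample in (("near_miss", near_miss), ("hit", hit)):
    # Print the binary verdict next to the match count it hides.
    print(label, rule.fires(sample), len(rule.matched_patterns(sample)))
```

In this sketch the near-miss sample matches one pattern yet receives the same negative verdict as a sample matching none, which is the kind of information loss that motivates complementing rules with learned feature selection.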
Data availability
This research work uses publicly available datasets that have been cited and referenced.
References
Tahir R. A study on malware and malware detection techniques. Int J Edu Mgmt Engg. 2018;8(2):20. https://doi.org/10.5815/ijeme.2018.02.03.
Faruk MJ, Miner P, Coughlan R, Masum M, Shahriar H, Clincy V, Cetinkaya C. Smart connected aircraft: towards security, privacy, and ethical hacking. In: International Conference on Security of Information and Networks (SIN). IEEE. 2021; p. 1–5. https://doi.org/10.1109/SIN54109.2021.9699243
Zhang K. A machine learning based approach to identify SQL injection vulnerabilities. In: IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE. 2019; p. 1286–1288. https://doi.org/10.1109/ASE.2019.00164
Kim MS. A Study on the Attack Index Packet Filtering Algorithm Based on Web Vulnerability. In: Big Data, cloud computing, and data science engineering. Springer; 2023, p. 145–152. https://doi.org/10.1007/978-3-031-19608-9_12
Shandilya SK, Ganguli C, Izonin I, Nagar AK. Cyber attack evaluation dataset for deep packet inspection and analysis. Data Brief. 2023;46:108771. https://doi.org/10.1016/j.dib.2022.108771.
Catal C, Giray G, Tekinerdogan B. Applications of deep learning for mobile malware detection: A systematic literature review. Neur Comp Appl. 2022; 1–26. https://doi.org/10.1007/s00521-021-06597-0
Naik N, Jenkins P, Savage N, Yang L. Cyberthreat Hunting-Part 1: triaging ransomware using fuzzy hashing, import hashing and YARA rules. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE. 2019; p. 1–6. https://doi.org/10.1109/FUZZ-IEEE.2019.8858803
Mira F, Huang W. Performance evaluation of string based malware detection methods. In: International Conference on Automation and Computing (ICAC). IEEE. 2018; p. 1–6. https://doi.org/10.23919/IConAC.2018.8749096
Xiao X, Zhang S, Mercaldo F, Hu G, Sangaiah AK. Android malware detection based on system call sequences and LSTM. Multim Tls Appl. 2019;78:3979–99. https://doi.org/10.1007/s11042-017-5104-0.
Zhang J. A practical logic obfuscation technique for hardware security. IEEE Trans VLSI sys. 2015;24(3):1193–7. https://doi.org/10.1109/TVLSI.2015.2437996.
Cesare S, Xiang Y, Zhou W. Malwise—an effective and efficient classification system for packed and polymorphic malware. IEEE Trans Compu. 2012;62(6):1193–206. https://doi.org/10.1109/TC.2012.65.
Abbas MFB, Srikanthan T. Low-complexity signature-based malware detection for IoT devices. In: Applications and Techniques in Information Security: International Conference, ATIS 2017, Auckland, New Zealand, Proceedings. Springer. 2017; p. 181–189. https://doi.org/10.1007/978-981-10-5421-1_15
Naik N, Jenkins P, Savage N, Yang L, Naik K, Song J. Embedding fuzzy rules with YARA rules for performance optimisation of malware analysis. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE. 2020; p. 1–7. https://doi.org/10.1109/FUZZ48607.2020.9177856
Culling C. Which YARA rules rule: basic or advanced? In: GIAC (GCIA) Gold Certification and RES. 2018; p. 5500.
Brengel M, Rossow C. YARIX: scalable YARA-based Malware Intelligence. In: USENIX Security Symposium. 2021. p. 3541–3558.
VirusTotal. YARA in a nutshell. 2019; [Online]. https://virustotal.github.io/yara/. Accessed 3 Feb 2023.
Andrade EDO, Viterbo J, Vasconcelos CN, Guérin J, Bernardini FC. A model based on LSTM neural networks to identify five different types of malware. Proc Compu Sci. 2019;159:182–91. https://doi.org/10.1016/j.procs.2019.09.173.
Akbar F, Hussain M, Mumtaz R, Riaz Q, Wahab AWA, Jung KH. Permissions-based detection of android malware using machine learning. Symmetry. 2022;14(4):718. https://doi.org/10.3390/sym14040718.
Kambar MEZN, Esmaeilzadeh A, Kim Y, Taghva K. A survey on mobile malware detection methods using machine learning. In: IEEE Annual Computing and Communication Workshop and Conference (CCWC). IEEE. 2022; p. 0215–0221. https://doi.org/10.1109/CCWC54503.2022.9720753
Yakura H, Shinozaki S, Nishimura R, Oyama Y, Sakuma J. Neural malware analysis with attention mechanism. Comp Secur. 2019;87:101592. https://doi.org/10.1016/j.cose.2019.101592.
Nadeem MW, Goh HG, Ponnusamy V, Aun Y. DDoS detection in SDN using machine learning techniques. Comput Mater Contin. 2022;71(1):771–89. https://doi.org/10.32604/cmc.2022.021669.
Mirjalili S, Mirjalili SM, Yang XS. Binary bat algorithm. Neural Comput Appl. 2014;25:663–81. https://doi.org/10.1007/s00521-013-1525-5.
Nakamura RY, Pereira LA, Costa KA, Rodrigues D, Papa JP, Yang XS. BBA: a binary bat algorithm for feature selection. In: SIBGRAPI conference on graphics, patterns and images. IEEE. p. 291–297. https://doi.org/10.1109/SIBGRAPI.2012.47
Alotaibi FM, Vassilakis VG. SDN-based detection of self-propagating ransomware: the case of BadRabbit. IEEE Access. 2021;9:28039–58. https://doi.org/10.1109/ACCESS.2021.3058897.
Masum M, Faruk MJ, Shahriar H, Qian K, Lo D, Adnan MI. Ransomware classification and detection with machine learning algorithms. In: IEEE Annual Computing and Communication Workshop and Conference (CCWC). IEEE. 2022; p. 0316–0322. https://doi.org/10.1109/CCWC54503.2022.9720869.
Şahín CB, Dírí B. Robust feature selection with LSTM recurrent neural networks for artificial immune recognition system. IEEE Access. 2019;7:24165–78. https://doi.org/10.1109/ACCESS.2019.2900118.
Ghanei H, Manavi F, Hamzeh A. A novel method for malware detection based on hardware events using deep neural networks. Compu Viro Hack Tech. 2021;17(4):319–31. https://doi.org/10.1007/s11416-021-00386-y.
Burks R, Islam KA, Lu Y, Li J. Data augmentation with generative models for improved malware detection: A comparative study. In: IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE. 2019. p. 0660–0665. https://doi.org/10.1109/UEMCON47517.2019.8993085.
Lu J, Liu A, Dong F, Gu F, Gama J, Zhang G. Learning under concept drift: a review. IEEE Trans Knowl Data Engg. 2018;31(12):2346–63. https://doi.org/10.1109/TKDE.2018.2876857.
Thabtah F, Hammoud S, Kamalov F, Gonsalves A. Data imbalance in classification: experimental evaluation. Info Sci. 2020;513:429–41. https://doi.org/10.1016/j.ins.2019.11.004.
Fahy C, Yang S, Gongora M. Scarcity of labels in non-stationary data streams: A survey. ACM Comp Surv (CSUR). 2022;55(2):1–39. https://doi.org/10.1145/3494832.
Duch W, Wieczorek T, Biesiada J, Blachnik M. Comparison of feature ranking methods based on information entropy. In: IEEE International Joint Conference on Neural Networks. IEEE. 2004. p. 1415–1419. https://doi.org/10.1109/IJCNN.2004.1380157.
Jaramillo LES. Malware detection and mitigation techniques: Lessons learned from Mirai DDOS attack. Info Sys Eng Manag. 2018;3(3):19. https://doi.org/10.20897/jisem/2655.
Wireshark Tool. https://www.wireshark.org/. Accessed 27 July 2023
Mukhiya SK, Ahmed U. Hands-On Exploratory Data Analysis with Python: Perform EDA techniques to understand, summarize, and investigate your data. Packt Publishing Ltd; 2020.
Vinesmsuic dataset [Available online] at: https://www.kaggle.com/code/vinesmsuic/malware-detection-using-deeplearning/data. Accessed 8 Feb 2023.
Android Malware Dataset [Available online] at: https://www.kaggle.com/datasets/shashwatwork/android-malware-dataset-for-machine-learning. Accessed 8 Feb 2023.
IoT Malware Dataset [Available online]: https://www.kaggle.com/datasets/efecastaneda/malware-iot-log-file. Accessed 8 Feb 2023.
Elsayed MS, Le-Khac NA, Jurcut AD. InSDN: A novel SDN intrusion dataset. IEEE Access. 2020;8:165263–84. https://doi.org/10.1109/ACCESS.2020.3022633.
Reddy KV, Ambati SR, Reddy YS, Reddy AN. AdaBoost for Parkinson's disease detection using robust scaler and SFS from acoustic features. In: Smart Technologies, Communication and Robotics (STCR). IEEE. p. 1–6. https://doi.org/10.1109/STCR51658.2021.9588906
Mashru D, Mangipudi GM, Swamy H, Halangali S, Sushma E. A decentralised instant messaging application with end-to-end encryption. In: 2023 20th Learning and Technology Conference (L&T) (pp. 48–53). IEEE; 2023.
Si Q, Xu H, Tong Y, Zhou Y, Liang J, Cui L, Hao Z. Malware Detection Using Automated Generation of Yara Rules on Dynamic Features. In: Science of Cyber Security: 4th International Conference, SciSec 2022, Matsue, Japan, August 10–12, 2022, Revised Selected Papers (pp. 315-330). Cham: Springer International Publishing; 2022.
Liu H, Patras P. NetSentry: a deep learning approach to detecting incipient large-scale network attacks. Comput Commun. 2022;191:119–32.
Kim S, Kim J, Nam S, Kim D. WebMon: ML-and YARA-based malicious webpage detection. Comput Netw. 2018;137:119–31.
Funding
Not applicable.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with animals performed by any of the authors.
Additional information
This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh, and S. Karthikeyan.
About this article
Cite this article
Bhardwaj, S., Dave, M. Integrating a Rule-Based Approach to Malware Detection with an LSTM-Based Feature Selection Technique. SN COMPUT. SCI. 4, 737 (2023). https://doi.org/10.1007/s42979-023-02177-2
DOI: https://doi.org/10.1007/s42979-023-02177-2