Deep learning-aided runtime opcode-based Windows malware detection

Parildi, Enes Sinan; Hatzinakos, Dimitrios; Lawryshyn, Yuri

doi:10.1007/s00521-021-05861-7

Original Article
Published: 21 March 2021

Volume 33, pages 11963–11983, (2021)
Neural Computing and Applications

Enes Sinan Parildi ORCID: orcid.org/0000-0002-6220-8056¹,
Dimitrios Hatzinakos¹ &
Yuri Lawryshyn²

Abstract

Thousands of new malware samples are developed every day. Signature-based methods, which common malware detectors employ, are susceptible to code obfuscation and to novel malware. In this paper, we present an alternative method for malware detection that uses assembly opcode sequences obtained at runtime. First, we apply natural language processing and deep learning techniques to the sequential opcode data to extract deeper behavioral features. Because it relies on these features, the method can be impervious to code obfuscation and effective against novel malware. Finally, the features are fed to various machine learning algorithms for classification. Experiments on a more class-balanced dataset of 26,869 samples demonstrated that an MCC (Matthews correlation coefficient) score as high as 0.95 is achievable with this approach. The MCC scores for the experiments conducted on imbalanced and artificially balanced datasets are 0.81 and 0.83, respectively.
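
The abstract reports results as MCC rather than accuracy, and compares balanced and imbalanced datasets. The following is a minimal sketch of the standard MCC formula for binary labels, not the authors' evaluation code. The function names `confusion_counts`, `mcc`, and `accuracy`, the label convention (1 = malware, 0 = benign), and the toy data are all assumptions made for illustration.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical imbalanced toy set: 95 benign samples (0) and 5 malware samples (1).
y_true = [0] * 95 + [1] * 5

# A degenerate detector that labels everything benign.
always_benign = [0] * 100
counts_degenerate = confusion_counts(y_true, always_benign)

# A detector that labels every sample correctly.
counts_perfect = confusion_counts(y_true, y_true)

acc_degenerate = accuracy(*counts_degenerate)
mcc_degenerate = mcc(*counts_degenerate)
mcc_perfect = mcc(*counts_perfect)
```

On this toy set the always-benign detector reaches an accuracy of 0.95 while its MCC is 0, which illustrates why MCC is a more informative score than accuracy when classes are imbalanced.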

Availability of data and material

All the data used in this study can be made available upon request by contacting any of the authors.

Code availability

The relevant code base is stored in a private repository. It can be made available upon request by contacting any of the authors.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Toronto, 27 King’s College Cir, Toronto, ON, Canada
Enes Sinan Parildi & Dimitrios Hatzinakos
Department of Chemical Engineering & Applied Chemistry, University of Toronto, 27 King’s College Cir, Toronto, ON, Canada
Yuri Lawryshyn

Contributions

All authors contributed equally.

Corresponding author

Correspondence to Enes Sinan Parildi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Parildi, E.S., Hatzinakos, D. & Lawryshyn, Y. Deep learning-aided runtime opcode-based Windows malware detection. Neural Comput & Applic 33, 11963–11983 (2021). https://doi.org/10.1007/s00521-021-05861-7

Received: 13 September 2020
Accepted: 19 February 2021
Published: 21 March 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s00521-021-05861-7
