Malware detection based on deep learning algorithm

Yuxin, Ding; Siyi, Zhu

doi:10.1007/s00521-017-3077-6

Original Article
Published: 25 July 2017

Neural Computing and Applications, Volume 31, pages 461–472 (2019)

Abstract

In this study we represent malware as opcode sequences and detect it using a deep belief network (DBN). Compared with traditional shallow neural networks, DBNs can use unlabeled data to pretrain a multi-layer generative model, which can better represent the characteristics of data samples. We compare the performance of DBNs with that of three baseline malware detection models, which use support vector machines, decision trees, and the k-nearest neighbor algorithm as classifiers. The experiments demonstrate that the DBN model provides more accurate detection than the baseline models. When additional unlabeled data are used for DBN pretraining, the DBNs perform better than the other detection models. We also use the DBNs as an autoencoder to extract the feature vectors of executables. The experiments indicate that the autoencoder can effectively model the underlying structure of input data and significantly reduce the dimensions of feature vectors.
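
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes opcode bigrams encoded as binary presence vectors (the abstract says only that executables are represented as opcode sequences), and it trains a single restricted Boltzmann machine (RBM) with one step of contrastive divergence, whereas a DBN stacks several such layers and is then fine-tuned. The opcode sequences, layer size, learning rate, and all function and class names are hypothetical.

```python
import math
import random
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count the opcode n-grams (as tuples) occurring in one sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def binary_features(sequences, n=2):
    """Encode each sequence as a 0/1 vector over all n-grams in the corpus."""
    counts = [opcode_ngrams(seq, n) for seq in sequences]
    vocab = sorted(set().union(*counts))
    return vocab, [[1.0 if g in c else 0.0 for g in vocab] for c in counts]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """One binary RBM layer; a DBN would stack several of these."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_step(self, v0, lr=0.1):
        """One contrastive-divergence (CD-1) update; no labels are needed."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # reconstruction of the input
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])


# Toy, made-up opcode sequences standing in for disassembled executables.
sequences = [
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "xor", "jmp", "call"],
    ["mov", "xor", "xor", "jmp", "ret"],
]
vocab, X = binary_features(sequences, n=2)

rbm = RBM(n_visible=len(vocab), n_hidden=3, rng=random.Random(0))
for _ in range(50):            # unsupervised pretraining epochs
    for v in X:
        rbm.cd1_step(v)

# Hidden activations serve as lower-dimensional feature vectors.
codes = [rbm.hidden_probs(v) for v in X]
```

Here each executable's bigram vector is compressed to three hidden activations, which mirrors the dimensionality reduction the abstract attributes to the autoencoder. In a full detector these codes, or the pretrained weights, would feed a supervised classifier trained on labeled samples.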


Acknowledgements

This work was partially supported by the Scientific Research Foundation in Shenzhen (Grant Nos. JCYJ20160525163756635, JCYJ20140627163809422), the Natural Science Foundation of Guangdong Province (Grant No. 2016A030313664), the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, and the Key Laboratory of Network Oriented Intelligent Computation (Shenzhen).

Author information

Authors and Affiliations

Harbin Institute of Technology Shenzhen Graduate School, Shenzhen University Town, Shenzhen, China
Ding Yuxin & Zhu Siyi

Corresponding author

Correspondence to Ding Yuxin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Yuxin, D., Siyi, Z. Malware detection based on deep learning algorithm. Neural Comput & Applic 31, 461–472 (2019). https://doi.org/10.1007/s00521-017-3077-6

Received: 10 May 2015
Accepted: 15 June 2017
Published: 25 July 2017
Issue Date: 14 February 2019
DOI: https://doi.org/10.1007/s00521-017-3077-6
