Skip to main content
Log in

Malware detection based on deep learning algorithm

  • Original Article
  • Published:
Neural Computing and Applications Aims and scope Submit manuscript

Abstract

In this study we represent malware as opcode sequences and detect it using a deep belief network (DBN). Compared with traditional shallow neural networks, DBNs can use unlabeled data to pretrain a multi-layer generative model, which can better represent the characteristics of data samples. We compare the performance of DBNs with that of three baseline malware detection models, which use support vector machines, decision trees, and the k-nearest neighbor algorithm as classifiers. The experiments demonstrate that the DBN model provides more accurate detection than the baseline models. When additional unlabeled data are used for DBN pretraining, the DBNs perform better than the other detection models. We also use the DBNs as an autoencoder to extract the feature vectors of executables. The experiments indicate that the autoencoder can effectively model the underlying structure of input data and significantly reduce the dimensions of feature vectors.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  1. Ahmed F, Hameed H, Shafiq MZ, Farooq M (2009) Using spatio-temporal information in API calls with machine learning algorithms for malware detection. In: AISec ‘09 Proceedings of the 2nd ACM workshop on Security and artificial intelligence, pp 55–62

  2. Christodorescu M, Jha S (2004) Testing malware detectors. In: ACM SIGSOFT international symposium on software testing and analysis (ISSTA ‘04), Boston, USA, pp 34–44

  3. Dahl GE, Yu D, Deng L, Acero A (2012) Context-dependent pretrained deep neural networks for large-vocabulary speech recognition. IEEE Trans Audio Speech Lang Process 20(1):30–41

    Article  Google Scholar 

  4. Ding Y, Dai W, Yan S et al (2014) Control flow-based opcode behavior analysis for malware detection. Comput Secur 44(1):64–82

    Google Scholar 

  5. Elhadi AAE, Maarof MA, Barry BIA, Hamza H (2014) Enhancing the detection of metamorphic malware using call graphs. Comput Secur 46:62–78

    Article  Google Scholar 

  6. Erhan D, Bengio Y, Courville A, Manzagol P, Vincent P (2010) Why does unsupervised pre-training help deep learning? J Mach Learn Res 11:625–660

    MathSciNet  MATH  Google Scholar 

  7. Eskandari M, Hashemi S (2012) A graph mining approach for detecting unknown malwares. J Visu Lang Comput 23(3):154–162

    Article  Google Scholar 

  8. Hex-Rays SA (2009) IDA pro Introduction. http://www.hex-rays.com/products.shtml/. Accessed 23 Mar 2010

  9. Henchiri O, Japkowicz N (2006) A feature selection and evaluation scheme for computer virus detection. In: Proceedings ofICDM-2006, Hong Kong, pp 891–895

  10. Hinton G et al (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 29(6):82–97

    Article  Google Scholar 

  11. Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18:1527–1554

    Article  MathSciNet  MATH  Google Scholar 

  12. Islam R et al (2013) Classification of malware based on integrated static and dynamic features. J Netw Comput Appl 36:646–656

    Article  Google Scholar 

  13. Kolter JZ, Maloof MA (2004) Learning to detect malicious executables in the wild. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, NY, pp 470–478

  14. Manuel E, Theodoor S, Engin K, Christopher K (2012) A survey on automated dynamic malware-analysis techniques and tools. ACM Comput Surv 44(2):1–42

    Google Scholar 

  15. Mitchell TM (1997) Machine learning. McGraw-Hill, New York. ISBN: 0070428077

  16. Moskovitch R, Feher C, Zachar N, Berger E, Gitelman M, Dolev S, et al (2008a) Unknown malcode detection using OPCODE representation. In: European conference on intelligence and security informatics 2008 (EuroISI08), Esbjerg, Denmark, pp 204–215

  17. Moskovitch R, Stopel D, Feher C, Nissim N, Elovici Y (2008b) Unknown malcode detection via text categorization and the imbalance problem. In: IEEE intelligence and security informatics, Taiwan, pp 156–161

  18. Peid (2007) Peid v0.94. http://www.peid.info/. Accessed 23 Mar 2010

  19. Salakhutdinov R, Hinton G (2012) An efficient learning procedure for deep Boltzmann machines. Neural Comput 24:1967–2006

    Article  MathSciNet  MATH  Google Scholar 

  20. Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24:513–523

    Article  Google Scholar 

  21. Santos I, Brezo F, Ugarte-pedrero X, Bringas PG (2013) Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf Sci 231:64–82

    Article  MathSciNet  Google Scholar 

  22. Sarikaya R, Hinton GE, Deoras A (2014) Application of deep belief networks for natural language understanding. IEEE Trans Audio Speech Lang Process 22(4):778–784

    Article  Google Scholar 

  23. Saxe J, Berlin K (2015) Deep neural network based malware detection using two dimensional binary program features. In: International conference on malicious & unwanted software, pp 11–20

  24. Shabtai A, Moskovitch R, Elovici Y, Glezer C (2009) Detection of malicious code by applying machine learning classifiers on static features—a state-of-the-art survey. Inf Secur Tech Rep 14(1):16–29

    Article  Google Scholar 

  25. Schultz MG, Eskin E, Zadok E, Stolfo SJ (2001) Data mining methods for detection of new malicious executables. In: Proceedings of the IEEE symposium on security and privacy, Oakland USA, pp 38–49

  26. Stopel D, Boger Z, Moskovitch R, Shahar Y, Elovici Y (2006a) Application of Artificial Neural Networks Techniques to Computer Worm Detections. In: Proceedings of IEEE international joint conference on neural networks, Vancouver

  27. Stopel D, Boger Z, Moskovitch R, Shahar Y, Elovici Y (2006b) Improving worm detection with artificial neural networks through feature selection and temporal analysis techniques. In: Proceedings of the third international conference on neural networks, Barcelona

  28. Tian R, Islam R, Batten L, Versteeg S (2010) Differentiating malware from cleanware using behavioral analysis. In: Proceedings of the 5th international conference on malicious and unwanted software: MALWARE 2010, pp 23–30

  29. Yeung DY, Ding Y (2003) Host-based intrusion detection using dynamic and static behavioral models. Pattern Recognit 36(1):229–243

    Article  MATH  Google Scholar 

  30. Yuan MY (2014) Data mining and machine learning: WEKA applied technology and practice. Tsinghua University Press. ISBN: 978-7302371748

  31. Zhao Z, Wang J, Bai J (2014) Malware detection method based on the control-flow construct feature of software. Inf Secur IET 8(1):18–24

    Article  Google Scholar 

Download references

Acknowledgements

This work was partially supported by Scientific Research Foundation in Shenzhen (Grant Nos. JCYJ20160525163756635, JCYJ20140627163809422), Natural Science Foundation of Guangdong Province (Grant No. 2016A030313664), State Key Laboratory of Computer Architecture, Institute of Computing Technology,and Chinese Academy of Sciences and Key Laboratory of Network Oriented Intelligent Computation (Shenzhen).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ding Yuxin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yuxin, D., Siyi, Z. Malware detection based on deep learning algorithm. Neural Comput & Applic 31, 461–472 (2019). https://doi.org/10.1007/s00521-017-3077-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-017-3077-6

Keywords

Navigation