Neural Computing and Applications, Volume 31, Issue 2, pp 461–472

Malware detection based on deep learning algorithm

  • Ding Yuxin
  • Zhu Siyi
Original Article

Abstract

In this study we represent malware as opcode sequences and detect it using a deep belief network (DBN). Compared with traditional shallow neural networks, DBNs can use unlabeled data to pretrain a multi-layer generative model, which can better represent the characteristics of data samples. We compare the performance of DBNs with that of three baseline malware detection models, which use support vector machines, decision trees, and the k-nearest neighbor algorithm as classifiers. The experiments demonstrate that the DBN model provides more accurate detection than the baseline models. When additional unlabeled data are used for DBN pretraining, the DBNs perform better than the other detection models. We also use the DBNs as an autoencoder to extract the feature vectors of executables. The experiments indicate that the autoencoder can effectively model the underlying structure of input data and significantly reduce the dimensions of feature vectors.
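The abstract does not say how an opcode sequence becomes a fixed-length input for the DBN. One common choice, assumed here purely for illustration, is to count opcode n-grams and keep the relative frequencies of the most common ones. The function names, the value of `n`, and the vocabulary size below are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Ngram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int = 3) -> Counter:
    """Count the contiguous n-grams in one opcode sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def build_vocabulary(samples: Sequence[Sequence[str]], n: int = 3,
                     top_k: int = 400) -> List[Ngram]:
    """Keep the top_k most frequent n-grams across all samples (assumed selection rule)."""
    totals: Counter = Counter()
    for opcodes in samples:
        totals.update(opcode_ngrams(opcodes, n))
    return [gram for gram, _ in totals.most_common(top_k)]


def to_feature_vector(opcodes: Sequence[str], vocabulary: Sequence[Ngram],
                      n: int = 3) -> List[float]:
    """Map one opcode sequence to relative n-gram frequencies over the vocabulary."""
    counts = opcode_ngrams(opcodes, n)
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than n: no n-grams, so return an all-zero vector.
        return [0.0] * len(vocabulary)
    return [counts[gram] / total for gram in vocabulary]


if __name__ == "__main__":
    # Toy opcode sequences standing in for disassembled executables.
    samples = [
        ["push", "mov", "call", "push", "mov", "call"],
        ["mov", "add", "jmp", "mov", "add", "ret"],
    ]
    vocab = build_vocabulary(samples, n=2, top_k=5)
    for sample in samples:
        print(to_feature_vector(sample, vocab, n=2))
```

Vectors of this kind could serve as the visible-layer input to a DBN or to the baseline classifiers. Labeled and unlabeled executables can share one vocabulary, which is what makes unsupervised pretraining on the extra unlabeled data possible.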

Keywords

Malware detection, Opcode, Deep learning, Neural network, Security

Notes

Acknowledgements

This work was partially supported by the Scientific Research Foundation in Shenzhen (Grant Nos. JCYJ20160525163756635, JCYJ20140627163809422), the Natural Science Foundation of Guangdong Province (Grant No. 2016A030313664), the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, and the Key Laboratory of Network Oriented Intelligent Computation (Shenzhen).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© The Natural Computing Applications Forum 2017

Authors and Affiliations

  1. Harbin Institute of Technology Shenzhen Graduate School, Shenzhen University Town, Shenzhen, China