Using convolutional neural networks for classification of malware represented as images

  • Daniel Gibert
  • Carles Mateu
  • Jordi Planes
  • Ramon Vicens
Original Paper

Abstract

Millions of malicious files are detected every year. One of the main reasons for such high volumes of distinct files is that malware authors introduce mutations in order to evade detection. That is, malicious files belonging to the same family, with the same malicious behavior, are constantly modified or obfuscated using several techniques so that they look like different files. To analyze and classify such large numbers of files effectively, we need to be able to categorize them into groups and identify their respective families on the basis of their behavior. In this paper, malicious software is visualized as grayscale images, since this representation captures minor changes while retaining the global structure, which helps to detect variations. Motivated by the visual similarity between malware samples of the same family, we propose a file-agnostic deep learning approach for malware categorization that efficiently groups malicious software into families based on a set of discriminant patterns extracted from their visualization as images. The suitability of our approach is evaluated against two benchmarks: the MalImg dataset and the Microsoft Malware Classification Challenge dataset. Experimental comparison demonstrates its superior performance with respect to state-of-the-art techniques.
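The idea of viewing a binary as a grayscale image can be illustrated with a minimal sketch. The abstract does not specify how the image is built, so the fixed row width, the zero padding of the last row, and the function name `bytes_to_grayscale` below are assumptions made for illustration only, not the authors' procedure. Each byte is simply treated as one pixel with an intensity between 0 and 255.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Reshape a byte string into rows of `width` pixels, one byte per pixel (0-255).

    Hypothetical helper: the width and the zero padding are illustrative choices.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        # Zero-pad the final row so that every row has the same length.
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # A few arbitrary bytes standing in for the contents of a file.
    sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03, 0xFF])
    for row in bytes_to_grayscale(sample, width=4):
        print(row)
```

Under this layout, a small local modification to a file changes only a few pixels while the overall arrangement of the image is preserved, which is the property the abstract relies on.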

Keywords

  • Malware visualization
  • Malware classification
  • Convolutional neural network
  • Deep learning

Notes

Acknowledgements

We would like to thank the Blueliv Labs team, especially Daniel Solís and Àngel Puigventós, for their support and the feedback provided during the development of this work. This work has been partially funded by the Spanish MICINN Projects TIN2014-53234-C2-2-R and TIN2015-71799-C2-2-P, and by AGAUR DI-2016-091.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© Springer-Verlag France SAS, part of Springer Nature 2018

Authors and Affiliations

  1. Blueliv, Leap in Value, Barcelona, Spain
  2. University of Lleida, Lleida, Spain
