Malicious Software Classification Using VGG16 Deep Neural Network’s Bottleneck Features

Rezende, Edmar; Ruppert, Guilherme; Carvalho, Tiago; Theophilo, Antonio; Ramos, Fabio; Geus, Paulo de

doi:10.1007/978-3-319-77028-4_9

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 738)

Abstract

Malicious software (malware) has been extensively employed for illegal purposes, and thousands of new samples are discovered every day. The ability to classify samples with similar characteristics into families makes it possible to create mitigation strategies that work for a whole class of programs. In this paper, we present a malware family classification approach using the bottleneck features of a VGG16 deep neural network. Malware samples are represented as byteplot grayscale images, and the convolutional layers of a VGG16 deep neural network pre-trained on the ImageNet dataset are used for bottleneck feature extraction. These features are used to train an SVM classifier for the malware family classification task. The experimental results on a dataset comprising 10,136 samples from 20 different families showed that our approach can effectively be used to classify malware families with an accuracy of 92.97%, outperforming similar approaches proposed in the literature which require feature engineering and considerable domain expertise.
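The byteplot representation mentioned above can be illustrated with a minimal sketch: each byte of a file becomes one grayscale pixel (0 is black, 255 is white), and the bytes are laid out row by row. The abstract does not state the image width or how a partial last row is handled, so the function name `byteplot`, the default width of 256, and the zero padding are assumptions made for illustration only. The later steps (VGG16 feature extraction and SVM training) are not shown here.

```python
from typing import List


def byteplot(data: bytes, width: int = 256) -> List[List[int]]:
    """Arrange raw bytes as rows of grayscale pixel values in 0..255.

    `width` and the zero padding of the last row are assumptions;
    the abstract does not specify either.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        # Iterating over a bytes slice yields integers in 0..255.
        row = list(data[start:start + width])
        # Pad a short final row with zeros so the image is rectangular.
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


# Stand-in for the bytes of an executable; a real sample would be read from disk.
sample = bytes(range(10))
image = byteplot(sample, width=4)
```

With the ten-byte stand-in and a width of 4, `image` holds three rows, the last one padded with two zeros. A resulting array would typically be resized and replicated across three channels before being passed to an ImageNet-pretrained network, but those preprocessing details are likewise not given in the abstract.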

Notes

1. Available at http://www.virussign.com.
2. Available at http://www.virustotal.com.

Acknowledgements

This work has been partially supported by the Brazilian National Council for Scientific and Technological Development (grants 302923/2014-4 and 313152/2015-2). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPUs used for this research.

Author information

Authors and Affiliations

University of Campinas, Campinas, SP, Brazil
Edmar Rezende
Center for Information Technology Renato Archer, Campinas, SP, Brazil
Edmar Rezende
Center for Information Technology Renato Archer, Campinas, SP, Brazil
Guilherme Ruppert & Antonio Theophilo
Federal Institute of São Paulo, Campinas, SP, Brazil
Tiago Carvalho
University of Sydney, Sydney, NSW, Australia
Fabio Ramos
University of Campinas, Campinas, SP, Brazil
Paulo de Geus

Editor information

Editors and Affiliations

Department of Electrical & Computer Engineering, University of Nevada, Las Vegas, Las Vegas, Nevada, USA
Shahram Latifi

Cite this paper

Rezende, E., Ruppert, G., Carvalho, T., Theophilo, A., Ramos, F., Geus, P.d. (2018). Malicious Software Classification Using VGG16 Deep Neural Network’s Bottleneck Features. In: Latifi, S. (eds) Information Technology - New Generations. Advances in Intelligent Systems and Computing, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-319-77028-4_9

DOI: https://doi.org/10.1007/978-3-319-77028-4_9
Published: 13 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77027-7
Online ISBN: 978-3-319-77028-4
eBook Packages: Engineering, Engineering (R0)
