Opcode-Sequence-Based Semi-supervised Unknown Malware Detection

Santos, Igor; Sanz, Borja; Laorden, Carlos; Brezo, Felix; Bringas, Pablo G.

doi:10.1007/978-3-642-21323-6_7

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 6694)

Abstract

Malware is any computer software that is potentially harmful to computers and networks. The amount of malware grows every year and poses a serious global security threat. Signature-based detection is the most widespread method in commercial antivirus software; however, it consistently fails to detect new malware. Supervised machine learning has been adopted to address this issue, but its usefulness is limited because it requires a large number of malicious executables and benign programs to be identified and labelled beforehand. In this paper, we propose a new malware detection method that adopts a well-known semi-supervised learning approach to detect unknown malware. The method examines the frequencies of appearance of opcode sequences to build a semi-supervised machine-learning classifier from a set of labelled (either malware or legitimate software) and unlabelled instances. We performed an empirical validation demonstrating that the labelling effort is lower than with supervised learning while the system maintains a high accuracy rate.
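
The pipeline summarised in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not state the sequence length, the normalisation, or which semi-supervised algorithm is used. The sketch uses opcode bigrams, relative frequencies, and a simple iterative label-spreading scheme over a Gaussian similarity graph. The function names (opcode_ngram_freqs, to_vectors, spread_labels), the parameters alpha and sigma, and the toy opcode lists are all hypothetical.

```python
from collections import Counter
import math


def opcode_ngram_freqs(opcodes, n=2):
    """Relative frequency of each length-n opcode sequence in one program."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return {g: c / total for g, c in counts.items()} if total else {}


def to_vectors(freq_dicts):
    """Align sparse frequency dicts onto one shared, sorted vocabulary."""
    vocab = sorted({g for d in freq_dicts for g in d})
    return [[d.get(g, 0.0) for g in vocab] for d in freq_dicts]


def spread_labels(vectors, labels, alpha=0.9, sigma=0.5, iters=50):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y over a similarity graph.

    labels: 1 = malware, 0 = benign, None = unlabelled (assumed encoding).
    """
    m = len(vectors)
    # Gaussian affinity between every pair of programs, zero on the diagonal.
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(m)] for i in range(m)]
    # One-hot rows for labelled programs, all-zero rows for unlabelled ones.
    Y = [[1.0 if lab == c else 0.0 for c in (0, 1)] for lab in labels]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(m))
              + (1 - alpha) * Y[i][c] for c in range(2)] for i in range(m)]
    # Pick the class with the larger score for every program.
    return [1 if f[1] > f[0] else 0 for f in F]


# Toy opcode traces (invented for illustration, not real samples).
programs = [
    ["xor", "xor", "jmp", "xor", "xor", "jmp"],           # labelled malware
    ["xor", "xor", "jmp", "xor", "xor", "jmp", "xor"],    # unlabelled
    ["push", "mov", "call", "push", "mov", "call"],       # labelled benign
    ["push", "mov", "call", "push", "mov", "call", "push"],  # unlabelled
]
labels = [1, None, 0, None]

vectors = to_vectors([opcode_ngram_freqs(p) for p in programs])
predicted = spread_labels(vectors, labels)
print(predicted)
```

Only two of the four toy programs carry a label; the other two receive theirs from their neighbours in the similarity graph, which is the sense in which labelling effort is reduced. On real data the sequence length, the similarity kernel, and alpha would all need tuning.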

Author information

Authors and Affiliations

S3Lab, DeustoTech - Computing, Deusto Institute of Technology, University of Deusto, Avenida de las Universidades 24, 48007, Bilbao, Spain
Igor Santos, Borja Sanz, Carlos Laorden, Felix Brezo & Pablo G. Bringas

Editor information

Editors and Affiliations

Departamento de Ingeniería Civil, Universidad de Burgos, Francisco de Vitoria s/n, 09006, Burgos, Spain
Álvaro Herrero
Departamento de Informática y Automática, Universidad de Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain
Emilio Corchado

Cite this paper

Santos, I., Sanz, B., Laorden, C., Brezo, F., Bringas, P.G. (2011). Opcode-Sequence-Based Semi-supervised Unknown Malware Detection. In: Herrero, Á., Corchado, E. (eds) Computational Intelligence in Security for Information Systems. Lecture Notes in Computer Science, vol 6694. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21323-6_7

DOI: https://doi.org/10.1007/978-3-642-21323-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21322-9
Online ISBN: 978-3-642-21323-6
eBook Packages: Computer Science, Computer Science (R0)