Idea: Opcode-Sequence-Based Malware Detection

  • Igor Santos
  • Felix Brezo
  • Javier Nieves
  • Yoseba K. Penya
  • Borja Sanz
  • Carlos Laorden
  • Pablo G. Bringas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5965)


Malware is any malicious code that has the potential to harm a computer or network. The amount of malware grows faster every year and poses a serious security threat, which has made malware detection a critical topic in computer security. Signature-based detection is currently the most widespread method in commercial antivirus software. However, it can only detect a threat after the malware has already caused damage and been registered, so it fails to detect new variants of known malware. In this paper, we propose a new method to detect variants of known malware families. The method is based on the frequency of appearance of opcode sequences. Furthermore, we describe a method to mine the relevance of each opcode and thereby weight the frequency of each opcode sequence. We show that this method provides an effective way to detect variants of known malware families.
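
The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: opcode sequences are taken as fixed-length n-grams (length 2 here), the weight of a sequence is the product of the relevance values of its opcodes, and two executables are compared with cosine similarity. The function names, the relevance values, and the opcode lists are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def opcode_ngrams(opcodes, n=2):
    """Return every contiguous opcode sequence of length n."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def weighted_frequencies(opcodes, relevance, n=2):
    """Map each opcode sequence to its relative frequency times a weight.

    Assumption: the weight of a sequence is the product of the relevance
    of its opcodes; opcodes without a relevance value count as 0.
    """
    grams = opcode_ngrams(opcodes, n)
    if not grams:
        return {}
    total = len(grams)
    weighted = {}
    for gram, count in Counter(grams).items():
        weight = 1.0
        for op in gram:
            weight *= relevance.get(op, 0.0)
        weighted[gram] = (count / total) * weight
    return weighted


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical relevance values and opcode listings, for illustration only.
relevance = {"push": 0.2, "mov": 0.5, "call": 0.9, "add": 0.4}
known_sample = ["push", "mov", "call", "add", "mov", "call"]
suspect = ["push", "mov", "call", "mov", "call", "add"]

similarity = cosine_similarity(
    weighted_frequencies(known_sample, relevance),
    weighted_frequencies(suspect, relevance),
)
print(f"similarity to known sample: {similarity:.3f}")
```

In a sketch like this, a suspect file would be flagged as a likely variant when its similarity to a known family member exceeds some threshold; how that threshold is chosen is not stated in the abstract.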


malware detection, computer security, machine learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Igor Santos (1)
  • Felix Brezo (1)
  • Javier Nieves (1)
  • Yoseba K. Penya (2)
  • Borja Sanz (1)
  • Carlos Laorden (1)
  • Pablo G. Bringas (1)

  1. S3 Lab
  2. eNergy Lab, University of Deusto, Bilbao, Spain
