A New N-gram Feature Extraction-Selection Method for Malicious Code

  • Hamid Parvin
  • Behrouz Minaei
  • Hossein Karshenas
  • Akram Beigi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6594)


N-grams are the basic features commonly used in sequence-based malicious code detection methods in computer virology research. Empirical results from previous work suggest that, while short n-grams are easier to extract, longer n-grams better represent the characteristics of the underlying executables. However, as the n-gram length increases, the feature space grows exponentially and demands substantial storage and computational resources. Feature selection has therefore become the most challenging step in building an accurate detection system based on byte n-grams. In this paper we propose an efficient feature extraction method that uses both adjacent and non-adjacent bi-grams in order to capture more information. Additionally, we present a novel boosting feature selection method based on a genetic algorithm. Our experimental results indicate that the proposed detection system detects virus programs far more accurately than the best previously known methods.
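The idea of combining adjacent and non-adjacent bi-grams can be illustrated with a minimal Python sketch. The abstract does not define how non-adjacent bi-grams are formed, so this sketch assumes a non-adjacent bi-gram is a pair of bytes separated by a fixed number of skipped bytes. The function names `gapped_bigrams` and `bigram_features`, the `gap` and `max_gap` parameters, and the sample bytes are hypothetical and are not taken from the paper.

```python
from collections import Counter


def gapped_bigrams(data: bytes, gap: int = 0) -> Counter:
    """Count byte pairs (data[i], data[i + gap + 1]).

    gap=0 yields ordinary adjacent bi-grams; gap>0 skips `gap` bytes
    between the two members of the pair (assumed definition).
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    step = gap + 1
    # Iterating over bytes yields ints, so each key is a pair of ints.
    return Counter(zip(data, data[step:]))


def bigram_features(data: bytes, max_gap: int = 2) -> dict:
    """Merge counts for gaps 0..max_gap, keyed by (gap, first, second)."""
    features = {}
    for gap in range(max_gap + 1):
        for (first, second), count in gapped_bigrams(data, gap).items():
            features[(gap, first, second)] = count
    return features


# Hypothetical byte sequence used only for illustration.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])
adjacent = gapped_bigrams(sample, gap=0)  # pairs of neighbouring bytes
skipped = gapped_bigrams(sample, gap=1)   # pairs with one byte in between
```

Under this assumption, each gap value contributes at most 256² = 65,536 distinct features, whereas contiguous byte 4-grams already span 256⁴ (about 4.3 billion) possible values, which is the exponential growth the abstract refers to.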


Keywords: Malicious Code, N-gram Analysis, Feature Selection





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hamid Parvin (1)
  • Behrouz Minaei (1)
  • Hossein Karshenas (1)
  • Akram Beigi (1)
  1. School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran
