A New N-gram Feature Extraction-Selection Method for Malicious Code

Parvin, Hamid; Minaei, Behrouz; Karshenas, Hossein; Beigi, Akram

doi:10.1007/978-3-642-20267-4_11


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6594)

Included in the following conference series:

International Conference on Adaptive and Natural Computing Algorithms


Abstract

N-grams are the basic features commonly used in sequence-based malicious code detection methods in computer virology research. Empirical results from previous work suggest that, while short n-grams are easier to extract, longer n-grams better represent the characteristics of the underlying executables. However, the feature space grows exponentially with n-gram length, demanding considerably more memory and computation. Feature selection has therefore become the most challenging step in building an accurate detection system based on byte n-grams. In this paper we propose an efficient feature extraction method that uses both adjacent and non-adjacent bi-grams in order to capture more information. Additionally, we present a novel boosting feature selection method based on a genetic algorithm. Our experimental results indicate that the proposed detection system detects virus programs far more accurately than the best previously known methods.
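The trade-off described in the abstract can be made concrete with a small sketch. Because the abstract does not define non-adjacent bi-grams precisely, the sketch assumes that a bi-gram at gap d pairs byte i with byte i + d (d = 1 being the ordinary adjacent case). The function names and the `max_gap` parameter are hypothetical, and nothing here reproduces the authors' actual extractor or their genetic-algorithm feature selection. The sketch counts gapped byte bi-grams and compares feature-space sizes: contiguous byte n-grams can take 256^n distinct values, whereas each additional gap adds only 256^2 = 65,536 possible bi-gram features.

```python
from collections import Counter


def ngram_space_size(n: int, alphabet: int = 256) -> int:
    """Number of distinct contiguous byte n-grams: grows as alphabet**n."""
    return alphabet ** n


def gapped_bigrams(data: bytes, max_gap: int) -> Counter:
    """Count byte bi-grams (data[i], data[i + d]) for every gap d in 1..max_gap.

    Assumption (not taken from the paper): d == 1 gives the ordinary adjacent
    bi-grams, and d > 1 gives non-adjacent bi-grams that skip d - 1 bytes.
    Keys are (d, first_byte, second_byte); indexing bytes yields ints.
    """
    counts = Counter()
    for d in range(1, max_gap + 1):
        for i in range(len(data) - d):
            counts[(d, data[i], data[i + d])] += 1
    return counts


def gapped_space_size(max_gap: int, alphabet: int = 256) -> int:
    """Possible gapped bi-gram features: linear in max_gap, not exponential."""
    return max_gap * alphabet ** 2


if __name__ == "__main__":
    sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])  # made-up sample bytes
    feats = gapped_bigrams(sample, max_gap=2)
    print(feats[(1, 0x4D, 0x5A)])  # 2: adjacent "4D 5A" occurs twice
    print(feats[(2, 0x4D, 0x90)])  # 1: "4D ?? 90" occurs once
    print(ngram_space_size(3), gapped_space_size(2))  # 16777216 131072
```

With `max_gap=2` the gapped bi-grams cover the same three-byte window as a 3-gram but draw from 131,072 possible features instead of 16,777,216. They do not preserve the full joint pattern of a 3-gram, which is one reason a separate feature selection step still matters.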

Author information

Authors and Affiliations

School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran
Hamid Parvin, Behrouz Minaei, Hossein Karshenas & Akram Beigi

Editor information

Editors and Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, 1000, Ljubljana, Slovenia
Andrej Dobnikar, Uroš Lotrič & Branko Šter

About this paper

Cite this paper

Parvin, H., Minaei, B., Karshenas, H., Beigi, A. (2011). A New N-gram Feature Extraction-Selection Method for Malicious Code. In: Dobnikar, A., Lotrič, U., Šter, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2011. Lecture Notes in Computer Science, vol 6594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20267-4_11

DOI: https://doi.org/10.1007/978-3-642-20267-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20266-7
Online ISBN: 978-3-642-20267-4
eBook Packages: Computer Science, Computer Science (R0)
