IFIP Annual Conference on Data and Applications Security and Privacy

DBSec 2012: Data and Applications Security and Privacy XXVI, pp 193–206


Code Type Revealing Using Experiments Framework

  • Rami Sharon &
  • Ehud Gudes

Part of the Lecture Notes in Computer Science book series (LNISA, volume 7371)

Abstract

Identifying the type of a piece of code, whether it arrives in a file or in a byte stream, is a challenge facing many software companies. Many applications, security-related and otherwise, base their behavior on the type of code they receive as input.

Traditional identification methods rely on file extensions, magic numbers, proprietary headers and trailers, or type-specific identification rules. All of these are vulnerable to content tampering, and discovering such tampering requires long and tedious hours of work by professionals. This study aims to find a method for identifying the best settings with which to automatically create type signatures that effectively overcome the content manipulation problem.
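
To illustrate why a magic-number check is fragile, the minimal sketch below identifies a type from the leading bytes only. The `MAGIC` table and `guess_type_by_magic` are hypothetical names chosen for this example and are not part of the paper's framework; the prefixes are the commonly documented PNG, PDF and ZIP signatures.

```python
# Hypothetical magic-number lookup: maps a leading byte prefix to a type label.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def guess_type_by_magic(data: bytes) -> str:
    """Return the label whose magic prefix matches, or 'unknown'."""
    for prefix, label in MAGIC.items():
        if data.startswith(prefix):
            return label
    return "unknown"


# Script content with a forged PDF prefix is still reported as a PDF,
# because only the first few bytes are inspected.
tampered = b"%PDF-" + b"<script>alert(1)</script>"
print(guess_type_by_magic(tampered))
```

Because the decision depends on a handful of bytes that an attacker controls, prepending a forged prefix is enough to change the reported type.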

In this paper we lay out a framework for creating type signatures based on byte N-Grams. The framework allows setting various parameters such as N-Gram sizes and windows, selecting statistical tests, and defining rules for score calculations. It serves as a test lab for finding the parameters that satisfy a predefined threshold of type identification accuracy. We demonstrate the framework using basic settings that achieved an F-Measure of 0.996 on 1400 test files.
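
The general idea of a byte N-Gram type signature can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `top_k` cutoff, and the use of cosine similarity as the scoring rule are all choices made for this example, since the abstract does not say which statistical tests or score rules the framework uses.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Iterable


def ngram_profile(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping byte n-gram in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def build_signature(samples: Iterable[bytes], n: int = 2,
                    top_k: int = 50) -> Dict[bytes, float]:
    """Relative frequencies of the top_k most common n-grams over all samples.

    top_k is an assumed tuning parameter for this sketch.
    """
    total = Counter()
    for sample in samples:
        total.update(ngram_profile(sample, n))
    count_sum = sum(total.values()) or 1
    return {gram: c / count_sum for gram, c in total.most_common(top_k)}


def score(data: bytes, signature: Dict[bytes, float], n: int = 2) -> float:
    """Cosine similarity between data's n-gram frequencies and a signature.

    Cosine similarity is an assumed stand-in for the framework's score rule.
    """
    profile = ngram_profile(data, n)
    size = sum(profile.values()) or 1
    dot = sum(w * profile[gram] / size for gram, w in signature.items())
    norm_sig = sqrt(sum(w * w for w in signature.values()))
    norm_prof = sqrt(sum((c / size) ** 2 for c in profile.values()))
    if norm_sig == 0 or norm_prof == 0:
        return 0.0
    return dot / (norm_sig * norm_prof)


# Toy training data; a real experiment would use many files per type.
html_sig = build_signature([b"<html><body><p>hi</p></body></html>",
                            b"<html><head></head><body></body></html>"])
py_sig = build_signature([b"def f(x):\n    return x + 1\n",
                          b"def g(y):\n    return y * 2\n"])

unknown = b"<html><body><div>x</div></body></html>"
best = max([("html", html_sig), ("python", py_sig)],
           key=lambda item: score(unknown, item[1]))
print(best[0])
```

Because the score draws on N-Gram frequencies from the whole input rather than a fixed header, changing a few leading bytes has limited effect on the result. Varying `n`, `top_k` and the scoring function is the kind of parameter exploration the framework is described as supporting.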

Keywords

  • File Type
  • Content type revealing framework
  • Code type
  • Byte N-Gram statistical analysis


Author information

Authors and Affiliations

  1. The Open University, Ra’anana, Israel

    Rami Sharon

  2. Ben-Gurion University, Beer-Sheva, Israel

    Ehud Gudes

Editor information

Editors and Affiliations

  1. Télécom Bretagne, Campus de Rennes 2, rue de la Châtaigneraie, 35512, Cesson Sévigné Cedex, France

    Nora Cuppens-Boulahia, Frédéric Cuppens & Joaquin Garcia-Alfaro

Copyright information

© 2012 IFIP International Federation for Information Processing

About this paper

Cite this paper

Sharon, R., Gudes, E. (2012). Code Type Revealing Using Experiments Framework. In: Cuppens-Boulahia, N., Cuppens, F., Garcia-Alfaro, J. (eds) Data and Applications Security and Privacy XXVI. DBSec 2012. Lecture Notes in Computer Science, vol 7371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31540-4_15

  • DOI: https://doi.org/10.1007/978-3-642-31540-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31539-8

  • Online ISBN: 978-3-642-31540-4

  • eBook Packages: Computer Science, Computer Science (R0)
