Code Type Revealing Using Experiments Framework

Sharon, Rami; Gudes, Ehud

doi:10.1007/978-3-642-31540-4_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7371)

Included in the following conference series:

IFIP Annual Conference on Data and Applications Security and Privacy

Abstract

Identifying the type of a piece of code, whether it arrives in a file or in a byte stream, is a challenge that many software companies face. Many applications, in security and in other domains, base their behavior on the type of code they receive as input.

Traditional identification methods rely on file extensions, magic numbers, proprietary headers and trailers, or type-specific identification rules. All of these are vulnerable to content tampering, and discovering such tampering requires long and tedious hours of work by professionals. This study aims to find a method for identifying the best settings with which to automatically create type signatures that effectively overcome the content manipulation problem.
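
The weakness of header-based identification can be illustrated with a minimal sketch. The lookup table and the function `identify_by_magic` are hypothetical and are not taken from the paper; only the byte prefixes themselves are the well-known signatures of the PDF, PNG and ZIP formats.

```python
# Illustrative magic-number table (an assumption for this sketch).
# The prefixes are the standard leading bytes of each format.
MAGIC_NUMBERS = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
}


def identify_by_magic(data: bytes) -> str:
    """Return the type whose magic number prefixes `data`, else 'unknown'."""
    for magic, type_name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return type_name
    return "unknown"


# A shell script is not recognised by the table at all ...
script = b"#!/bin/sh\necho hello\n"

# ... but prepending a PDF header is enough to make the check report "pdf",
# even though the rest of the content is unchanged.
tampered = b"%PDF-1.4\n" + script

if __name__ == "__main__":
    print(identify_by_magic(script))
    print(identify_by_magic(tampered))
```

Because the check inspects only the first few bytes, a single prepended header changes the verdict, which is the kind of content manipulation the abstract refers to.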

In this paper we lay out a framework for creating type signatures based on byte N-Grams. The framework allows setting various parameters such as N-Gram sizes and windows, selecting statistical tests, and defining rules for score calculations. It serves as a test lab for finding the parameters that satisfy a predefined threshold of type identification accuracy. We demonstrate the framework using basic settings that achieved an F-Measure of 0.996 on 1400 test files.
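
The general idea of a byte N-Gram signature can be sketched as follows. This is not the authors' framework: the abstract does not say which statistical tests or scoring rules were used, so the sketch assumes relative-frequency profiles, a top-k cutoff, and cosine similarity as the score. All function names and parameters (`n`, `step`, `top_k`) are hypothetical, and the F-Measure is assumed to be the balanced F1 form.

```python
from collections import Counter
import math


def byte_ngrams(data: bytes, n: int, step: int = 1):
    """Yield byte n-grams from a window that advances `step` bytes at a time."""
    for i in range(0, len(data) - n + 1, step):
        yield data[i:i + n]


def ngram_profile(data: bytes, n: int = 2, step: int = 1) -> dict:
    """Map each n-gram in `data` to its relative frequency."""
    counts = Counter(byte_ngrams(data, n, step))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def build_signature(samples, n: int = 2, step: int = 1, top_k: int = 100) -> dict:
    """Average the profiles of the training samples; keep the top_k n-grams."""
    summed = {}
    for sample in samples:
        for gram, freq in ngram_profile(sample, n, step).items():
            summed[gram] = summed.get(gram, 0.0) + freq
    top = sorted(summed.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return {gram: freq / len(samples) for gram, freq in top}


def cosine_score(profile: dict, signature: dict) -> float:
    """Cosine similarity between a profile and a signature (assumed score rule)."""
    dot = sum(profile.get(g, 0.0) * w for g, w in signature.items())
    norm_p = math.sqrt(sum(v * v for v in profile.values()))
    norm_s = math.sqrt(sum(v * v for v in signature.values()))
    return dot / (norm_p * norm_s) if norm_p and norm_s else 0.0


def classify(data: bytes, signatures: dict, n: int = 2, step: int = 1) -> str:
    """Return the type whose signature scores highest against `data`."""
    profile = ngram_profile(data, n, step)
    return max(signatures, key=lambda t: cosine_score(profile, signatures[t]))


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-Measure: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy training data, invented for illustration only.
signatures = {
    "html": build_signature([b"<html><body><p>hi</p></body></html>",
                             b"<div><p>text</p></div>"]),
    "python": build_signature([b"def f(x):\n    return x\n",
                               b"import os\nprint(os.name)\n"]),
}

if __name__ == "__main__":
    print(classify(b"<p><b>bold</b></p>", signatures))
```

Since the signature is drawn from the whole content rather than from a fixed header, changing a few leading bytes shifts the profile only slightly. In a test-lab setting, `n`, `step`, `top_k` and the scoring function would be the knobs to vary until `f_measure` on a held-out set reaches the required threshold.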

Author information

Authors and Affiliations

The Open University, Ra’anana, Israel
Rami Sharon
Ben-Gurion University, Beer-Sheva, Israel
Ehud Gudes

Editor information

Editors and Affiliations

Télécom Bretagne, Campus de Rennes 2, rue de la Châtaigneraie, 35512, Cesson Sévigné Cedex, France
Nora Cuppens-Boulahia, Frédéric Cuppens & Joaquin Garcia-Alfaro

About this paper

Cite this paper

Sharon, R., Gudes, E. (2012). Code Type Revealing Using Experiments Framework. In: Cuppens-Boulahia, N., Cuppens, F., Garcia-Alfaro, J. (eds) Data and Applications Security and Privacy XXVI. DBSec 2012. Lecture Notes in Computer Science, vol 7371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31540-4_15

DOI: https://doi.org/10.1007/978-3-642-31540-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31539-8
Online ISBN: 978-3-642-31540-4
eBook Packages: Computer Science, Computer Science (R0)
