Fast Content-Based File Type Identification
Digital forensic examiners often need to identify the type of a file or file fragment based on its content. Content-based file type identification schemes typically combine a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, an approach that is inefficient and time-consuming. This paper proposes two techniques for reducing classification time. The first selects a subset of features based on their frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
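The abstract names three ingredients: a byte frequency distribution (BFD) as the feature vector, frequency-based feature selection, and random sampling of file blocks. The sketch below illustrates how they fit together. It is a minimal illustration under stated assumptions, not the authors' implementation. The selection rule (keep the k byte values that occur most often in the training data), the block size and sample count, and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity. The paper's own statistical learner is not reproduced here, and all function names are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def byte_frequency(data: bytes) -> List[float]:
    """Normalized 256-bin byte frequency distribution (BFD) of `data`."""
    counts = Counter(data)      # iterating over bytes yields ints 0..255
    total = len(data) or 1      # avoid division by zero on empty input
    return [counts[b] / total for b in range(256)]


def select_frequent_bytes(training: Sequence[bytes], k: int) -> List[int]:
    """Hypothetical selection rule: keep the k most frequent byte values overall."""
    totals: Counter = Counter()
    for data in training:
        totals.update(data)
    return sorted(b for b, _ in totals.most_common(k))


def sample_blocks(data: bytes, block_size: int, num_blocks: int,
                  seed: Optional[int] = None) -> bytes:
    """Concatenate `num_blocks` randomly chosen blocks instead of the whole file."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if num_blocks >= len(blocks):
        return data             # file too small to sample: use all of it
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(blocks)), num_blocks))
    return b"".join(blocks[i] for i in chosen)


def features(data: bytes, selected: Sequence[int]) -> List[float]:
    """Project the full BFD onto the selected byte values only."""
    bfd = byte_frequency(data)
    return [bfd[b] for b in selected]


def train_centroids(labeled: Dict[str, Sequence[bytes]],
                    selected: Sequence[int]) -> Dict[str, List[float]]:
    """Stand-in classifier (nearest centroid); NOT the paper's learner."""
    centroids = {}
    for label, files in labeled.items():
        vectors = [features(f, selected) for f in files]
        centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return centroids


def classify(data: bytes, centroids: Dict[str, List[float]],
             selected: Sequence[int]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = features(data, selected)
    return min(centroids, key=lambda label: sum(
        (a - c) ** 2 for a, c in zip(vec, centroids[label])))


if __name__ == "__main__":
    rng = random.Random(0)

    def text(n: int) -> bytes:      # synthetic "text-like" content
        return bytes(rng.choice(b"etaoin shrdlu\n") for _ in range(n))

    def noise(n: int) -> bytes:     # synthetic high-entropy content
        return bytes(rng.randrange(256) for _ in range(n))

    training = {"text": [text(4096) for _ in range(5)],
                "random": [noise(4096) for _ in range(5)]}
    selected = select_frequent_bytes(
        [f for files in training.values() for f in files], k=32)
    centroids = train_centroids(training, selected)

    unknown = text(64 * 1024)
    sample = sample_blocks(unknown, block_size=512, num_blocks=8, seed=1)
    print(len(sample), classify(sample, centroids, selected))
```

In this sketch, sampling bounds the bytes read per file at `num_blocks * block_size` regardless of file size, and feature selection shrinks each vector from 256 to `k` dimensions. How much accuracy these shortcuts cost depends on how representative the sampled blocks and retained byte values are. That trade-off is what the paper's experiments measure.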
- Book Title: Advances in Digital Forensics VII
- Book Subtitle: 7th IFIP WG 11.9 International Conference on Digital Forensics, Orlando, FL, USA, January 31 – February 2, 2011, Revised Selected Papers
- Book Part: PART II
- Pages: 65-75
- Series Title: IFIP Advances in Information and Communication Technology
- Publisher: Springer Berlin Heidelberg
- Copyright Holder: IFIP International Federation for Information Processing
- Keywords: File type identification, file content classification, byte frequency
- Editor Affiliations:
  1. Air Force Institute of Technology, Wright-Patterson Air Force Base
  2. Department of Computer Science, University of Tulsa
- Author Affiliations:
  3. Information Security Institute, Queensland University of Technology, Brisbane, Australia
  4. Ajou University, Suwon, South Korea