Performance Issues About Context-Triggered Piecewise Hashing
A hash function is a well-known method in computer science for mapping arbitrarily large data to bit strings of a fixed, short length. Computer forensics uses this property to identify known files on the basis of their hash values. In current practice, hash values of known files are generated in a preprocessing step and stored in a database; typically a cryptographic hash function such as MD5 or SHA-1 is used. Later, the investigator computes hash values of the files found on a storage medium and looks them up in that database. Because of their security properties, cryptographic hash functions cannot be used to identify similar files: even a small change to the input yields an unrelated hash value. Jesse Kornblum therefore proposed a similarity-preserving hash function for identifying similar files. This paper discusses the efficiency of Kornblum’s approach. We present some enhancements that increase the performance of his algorithm by 55% when applied to a real-life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system that are relevant to the performance of Kornblum’s approach.
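The known-file lookup described above can be illustrated with a minimal Python sketch. It is not the tooling used in the paper: the file name, the file contents, and the helper names `sha1_hex`, `build_database`, and `lookup` are hypothetical, and an in-memory dictionary stands in for the hash database. The sketch also shows why a cryptographic hash cannot identify similar files: flipping a single bit makes the lookup fail.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()


def build_database(known_files: dict[str, bytes]) -> dict[str, str]:
    """Preprocessing step: map each digest to the name of a known file."""
    return {sha1_hex(content): name for name, content in known_files.items()}


def lookup(database: dict[str, str], content: bytes):
    """Investigation step: return the known file name, or None if unknown."""
    return database.get(sha1_hex(content))


# Hypothetical known file; the content is placeholder data.
known = {"notepad.exe": b"example binary content" * 100}
db = build_database(known)

exact_copy = known["notepad.exe"]

# Flip one bit in the first byte to create a "similar" file.
modified = bytearray(exact_copy)
modified[0] ^= 0x01
modified = bytes(modified)

print(lookup(db, exact_copy))  # notepad.exe
print(lookup(db, modified))    # None
```

The exact copy is found, while the nearly identical file is reported as unknown. This is the gap that a similarity-preserving hash function is meant to close.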
Keywords: Digital forensics techniques and tools, context-triggered piecewise hash functions, fuzzy-hashing, efficiency of ssdeep, subtleties of fuzzy-hashing