Abstract
A hash function is a well-known method in computer science to map arbitrary large data to bit strings of a fixed short length. This property is used in computer forensics to identify known files on base of their hash value. As of today, in a pre-step process hash values of files are generated and stored in a database; typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. Due to security properties of cryptographic hash functions, they can not be used to identify similar files. Therefore Jesse Kornblum proposed a similarity preserving hash function to identify similar files. This paper discusses the efficiency of Kornblum’s approach. We present some enhancements that increase the performance of his algorithm by 55% if applied to a real life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system, which are relevant for the performance of Kornblum’s approach.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
National Institute of Standards and Technology, National Software Reference Library (July 2011), http://www.nsrl.nist.gov
Kornblum, J.: Identifying almost identical files using context triggered piecewise hashing. Digital Investigation 3S, 91–97 (2006), http://www.dfrws.org/2006/proceedings/12-Kornblum.pdf
Tridgell, A.: Spamsum. Readme (2002), http://samba.org/ftp/unpacked/junkcode/spamsum/README
Roussev, V., Richard, G.G., Marziale, L.: Multi-resolution similarity hashing. Digital Investigation 4S, 105–113 (2007)
Chen, L., Wang, G.: An efficient piecewise hashing method for computer forensics. In: Proceedings of the International Workshop on Knowledge Discovery and Data Mining, pp. 635–638 (2008)
Baier, H., Breitinger, F.: Security aspects of piecewise hashing in computer forensics. In: 6th International Conference on IT Security Incident Management & IT Forensics (May 2011)
Roussev, V., Chen, Y., Bourg, T., Rechard, G.G.: md5bloom: Forensic filesystem hashing revisited. Digital Investigation 3S, 82–90 (2006)
Roussev, V.: Data fingerprinting with similarity digests. IFIP, vol. 337, pp. 207–226 (2010)
Seo, K., Lim, K., Choi, J., Chang, K., Lee, S.: Detecting similar files based on hash and statistical analysis for digital forensic investigation. In: 2nd International Conference on Computer Science and its Applications, CSA 2009, pp. 1–6 (December 2009)
Menezes, A., Oorschot, P., Vanstone, S.: Handbook of Applied Cryptography. CRC Press (1997)
Tridgell, A.: Spamsum. Readme (2002), http://samba.org/ftp/unpacked/junkcode/spamsum/README
Kornblum, J.: ssdeep. Sourcecode and Documentation (September 2010), http://ssdeep.sourceforge.net/
Walter, C.: Kryder’s law, http://www.scientificamerican.com/article.cfm?id=kryders-law&ref=sciam
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Breitinger, F., Baier, H. (2012). Performance Issues About Context-Triggered Piecewise Hashing. In: Gladyshev, P., Rogers, M.K. (eds) Digital Forensics and Cyber Crime. ICDF2C 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 88. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35515-8_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-35515-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35514-1
Online ISBN: 978-3-642-35515-8
eBook Packages: Computer ScienceComputer Science (R0)