Performance Issues About Context-Triggered Piecewise Hashing

Breitinger, Frank; Baier, Harald

doi:10.1007/978-3-642-35515-8_12

Conference paper

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 88)

Abstract

A hash function is a well-known method in computer science for mapping arbitrarily large data to bit strings of a fixed, short length. This property is used in computer forensics to identify known files on the basis of their hash values. Today, hash values of files are generated in a preprocessing step and stored in a database; typically a cryptographic hash function such as MD5 or SHA-1 is used. Later, the investigator computes the hash values of the files found on a storage medium and performs lookups in this database. Due to the security properties of cryptographic hash functions, they cannot be used to identify similar files. Therefore, Jesse Kornblum proposed a similarity-preserving hash function to identify similar files. This paper discusses the efficiency of Kornblum’s approach. We present some enhancements that increase the performance of his algorithm by 55% when applied to a real-life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system that are relevant to the performance of Kornblum’s approach.
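The known-file workflow described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the file names, the byte contents, and the helper names `sha1_hex` and `lookup` are assumptions made for illustration, and an in-memory dictionary stands in for the hash database. The sketch shows why an exact-match lookup on a cryptographic hash finds a known file but misses a copy that differs by a single byte.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


# Preprocessing step: hash the known files and store the digests.
# The names and contents here are hypothetical placeholders.
known_files = {
    "report.txt": b"Quarterly report: all systems nominal.",
    "readme.txt": b"Installation notes for the sample system.",
}
known_hashes = {sha1_hex(content): name for name, content in known_files.items()}


def lookup(data: bytes):
    """Return the name of the known file with the same digest, or None."""
    return known_hashes.get(sha1_hex(data))


original = known_files["report.txt"]
modified = original + b"!"  # a near-identical file, one byte longer

print(lookup(original))  # exact copy: found in the database
print(lookup(modified))  # similar file: digest differs, so no match
```

Because the two digests are unrelated even though the inputs are nearly identical, a database of cryptographic hashes can only answer "is this exactly a known file?". This is the gap that a similarity-preserving hash function is meant to close.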

Author information

Authors and Affiliations

Center for Advanced Security Research Darmstadt (CASED) and Department of Computer Science, Hochschule Darmstadt, Mornewegstr. 32, D – 64293, Darmstadt, Germany
Frank Breitinger & Harald Baier

Editor information

Editors and Affiliations

School of Computer Science and Informatics, University College Dublin, 4, Dublin, Ireland
Pavel Gladyshev
Department of Computer and Information Technology, Purdue University, 47907, West Lafayette, IN, USA
Marcus K. Rogers

About this paper

Cite this paper

Breitinger, F., Baier, H. (2012). Performance Issues About Context-Triggered Piecewise Hashing. In: Gladyshev, P., Rogers, M.K. (eds) Digital Forensics and Cyber Crime. ICDF2C 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 88. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35515-8_12

DOI: https://doi.org/10.1007/978-3-642-35515-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35514-1
Online ISBN: 978-3-642-35515-8
eBook Packages: Computer Science, Computer Science (R0)
