Performance Issues About Context-Triggered Piecewise Hashing

  • Frank Breitinger
  • Harald Baier
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 88)


A hash function is a well-known method in computer science for mapping arbitrarily large data to bit strings of a fixed, short length. This property is used in computer forensics to identify known files on the basis of their hash values. Today, hash values of files are generated in a preparatory step and stored in a database; typically a cryptographic hash function such as MD5 or SHA-1 is used. Later, the investigator computes hash values of the files found on a storage medium and looks them up in the database. Because of the security properties of cryptographic hash functions, they cannot be used to identify similar files. Jesse Kornblum therefore proposed a similarity-preserving hash function to identify similar files. This paper discusses the efficiency of Kornblum’s approach. We present some enhancements that increase the performance of his algorithm by 55% when applied to a real-life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system that are relevant to the performance of Kornblum’s approach.
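The known-file workflow described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the in-memory dictionary standing in for the hash database, and the file name `known_file.bin` are all hypothetical, and SHA-1 is used only because the abstract names it as a typical choice. The sketch also shows why a cryptographic hash cannot identify similar files: changing a single byte yields an unrelated digest, so the lookup fails.

```python
import hashlib
from typing import Dict, Optional


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def build_known_db(known_files: Dict[str, bytes]) -> Dict[str, str]:
    """Preparatory step: map each known file's digest to its name.

    A plain dict stands in for the hash database (an assumption).
    """
    return {sha1_hex(content): name for name, content in known_files.items()}


def identify(db: Dict[str, str], content: bytes) -> Optional[str]:
    """Investigation step: hash a found file and look the digest up."""
    return db.get(sha1_hex(content))


# Hypothetical known file; the name and content are made up for illustration.
known = {"known_file.bin": b"example known file content " * 100}
db = build_known_db(known)

exact_copy = known["known_file.bin"]
near_copy = exact_copy[:-1] + b"!"  # differs from the original in one byte

print(identify(db, exact_copy))  # the exact copy is recognised
print(identify(db, near_copy))   # the near copy is not found
```

A similarity-preserving hash such as Kornblum's is meant to close exactly this gap, at the price of additional computation, which is the cost the paper examines.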


Keywords: digital forensics techniques and tools, context-triggered piecewise hash functions, fuzzy-hashing, efficiency of ssdeep, subtleties of fuzzy-hashing





Copyright information

© ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering 2012

Authors and Affiliations

  • Frank Breitinger (1)
  • Harald Baier (1)
  1. Center for Advanced Security Research Darmstadt (CASED) and Department of Computer Science, Hochschule Darmstadt, Darmstadt, Germany
