Performance Issues About Context-Triggered Piecewise Hashing

  • Conference paper

Abstract

A hash function is a well-known method in computer science for mapping arbitrarily large data to bit strings of a fixed, short length. Computer forensics uses this property to identify known files on the basis of their hash values. Today, hash values of files are generated in a preprocessing step and stored in a database; typically a cryptographic hash function such as MD5 or SHA-1 is used. Later, the investigator computes the hash values of the files found on a storage medium and looks them up in the database. Because of the security properties of cryptographic hash functions, they cannot be used to identify similar files. Jesse Kornblum therefore proposed a similarity-preserving hash function for this purpose. This paper discusses the efficiency of Kornblum’s approach. We present enhancements that increase the performance of his algorithm by 55% when applied to a real-life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system that are relevant to the performance of Kornblum’s approach.
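
The known-file workflow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`sha1_hex`, `build_known_db`, `lookup`, `differing_bits`) and the sample file content are assumptions made for illustration, and an in-memory dictionary stands in for the hash database. The sketch shows why a cryptographic hash finds an exact copy but misses a file that differs by a single byte.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


def build_known_db(known_files: dict) -> dict:
    """Pre-step: map each known file's hash value to its name."""
    return {sha1_hex(content): name for name, content in known_files.items()}


def lookup(db: dict, data: bytes):
    """Investigation step: return the known file's name, or None if unknown."""
    return db.get(sha1_hex(data))


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the SHA-1 digests of a and b differ."""
    digest_a = hashlib.sha1(a).digest()
    digest_b = hashlib.sha1(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))


# Hypothetical sample data: one known file and a near-identical variant.
known = {"report.txt": b"The quick brown fox jumps over the lazy dog"}
db = build_known_db(known)
exact = known["report.txt"]
similar = exact + b"."  # one appended byte

print(lookup(db, exact))    # the exact copy is found: report.txt
print(lookup(db, similar))  # the similar file is not found: None
print(differing_bits(exact, similar))  # number of differing bits out of 160
```

The one-byte change yields an unrelated digest, so the lookup fails even though the two inputs are almost identical. This is the gap that a similarity-preserving hash function such as Kornblum's is meant to close.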



© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Breitinger, F., Baier, H. (2012). Performance Issues About Context-Triggered Piecewise Hashing. In: Gladyshev, P., Rogers, M.K. (eds) Digital Forensics and Cyber Crime. ICDF2C 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 88. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35515-8_12

  • DOI: https://doi.org/10.1007/978-3-642-35515-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35514-1

  • Online ISBN: 978-3-642-35515-8

  • eBook Packages: Computer Science, Computer Science (R0)
