
Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2

  • Conference paper
Digital Forensics and Cyber Crime (ICDF2C 2012)

Abstract

Hash functions are a widespread class of functions in computer science and are used in many applications, e.g. in computer forensics to identify known files. One basic property of cryptographic hash functions is the avalanche effect: a slight change to the input produces a significantly different output. Since some applications also need to identify similar files (e.g. spam and virus detection), this has raised the need for Similarity Preserving Hashing. In recent years several approaches have emerged, each with its own naming, properties, strengths, and weaknesses, owing to the lack of a common definition.
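The avalanche effect can be made concrete with a short Python sketch. It flips a single bit of an input and counts how many bits of the SHA-256 digest change. For a good cryptographic hash, roughly half of the 256 output bits are expected to flip. SHA-256, the sample string, and the helper `bit_difference` are illustrative choices and are not taken from the paper.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count differing bits between two byte strings of equal length."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Illustrative input (an assumption, not from the paper).
original = b"The quick brown fox jumps over the lazy dog"

# Flip the lowest bit of the last byte: 'g' (0x67) becomes 'f' (0x66).
modified = bytearray(original)
modified[-1] ^= 0x01
modified = bytes(modified)

h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(modified).digest()
diff = bit_difference(h1, h2)

print("input bits changed: ", bit_difference(original, modified))
print("digest bits changed:", diff, "of", len(h1) * 8)
```

A one-bit edit changes a large share of the digest, so comparing cryptographic digests can only answer "identical or not". That is the gap Similarity Preserving Hashing is meant to fill.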

Based on the properties and use cases of traditional hash functions, this paper discusses a uniform naming scheme and a set of properties, which is a first step towards a suitable definition of Similarity Preserving Hashing. Additionally, we extend the Similarity Preserving Hashing algorithm MRSH to its successor MRSH-v2, which has three distinguishing features. First, it fulfills all of our proposed defining properties. Second, it outperforms existing approaches, especially with respect to run-time performance. Third, it has two detection modes. The regular mode of MRSH-v2 is used to identify similar files, whereas the f-mode is best suited to fragment detection, i.e. identifying similar parts of a file.
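The abstract does not describe how MRSH-v2 works internally. The following Python sketch therefore only illustrates the general idea behind chunk-based similarity-preserving hashes, under stated assumptions. A rolling hash over a small window selects content-defined chunk boundaries. Each chunk is then hashed with FNV-1a, and two digests are compared by how many chunk hashes they share. The sketch also mimics the two detection modes: a symmetric (Jaccard-style) score for whole files and a containment score for fragments.

The window size, block size, rolling-hash base, and the function names `digest`, `compare`, and `compare_fragment` are hypothetical. They are not the authors' parameters or implementation. Tools in this family typically pack chunk hashes into Bloom filters to obtain a compact digest, whereas this sketch keeps plain sets for clarity.

```python
import random

# Illustrative parameters: assumptions for this sketch, not MRSH-v2's actual values.
WINDOW = 7            # bytes covered by the rolling hash
BLOCK_SIZE = 64       # boundary when hash % BLOCK_SIZE == BLOCK_SIZE - 1 (avg. chunk ~64 bytes)
BASE = 0x01000193     # multiplier of the polynomial rolling hash
MASK = 0xFFFFFFFF     # keep the rolling hash within 32 bits


def chunk_boundaries(data: bytes) -> list[int]:
    """Content-defined chunking: end offsets (exclusive) of every chunk."""
    top = pow(BASE, WINDOW - 1, 1 << 32)  # weight of the byte leaving the window
    boundaries, h = [], 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) & MASK  # drop the oldest byte
        h = (h * BASE + byte) & MASK                 # append the newest byte
        if i >= WINDOW - 1 and h % BLOCK_SIZE == BLOCK_SIZE - 1:
            boundaries.append(i + 1)
    if not boundaries or boundaries[-1] != len(data):
        boundaries.append(len(data))                 # trailing chunk
    return boundaries


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def digest(data: bytes) -> set[int]:
    """Hypothetical similarity digest: the set of FNV-1a hashes of all chunks."""
    if not data:
        return set()
    hashes, start = set(), 0
    for end in chunk_boundaries(data):
        hashes.add(fnv1a_64(data[start:end]))
        start = end
    return hashes


def compare(a: set[int], b: set[int]) -> int:
    """Regular-mode analogue: Jaccard overlap of two digests, scaled to 0..100."""
    if not a or not b:
        return 0
    return round(100 * len(a & b) / len(a | b))


def compare_fragment(fragment: set[int], whole: set[int]) -> int:
    """Fragment-mode analogue: share of the fragment's chunks found in the whole, 0..100."""
    if not fragment:
        return 0
    return round(100 * len(fragment & whole) / len(fragment))


# Demo on pseudo-random data (seeded so the run is reproducible).
rng = random.Random(2012)
original = rng.randbytes(64 * 1024)
edited = original[:30000] + b"INSERTED BYTES" + original[30000:]
fragment = original[20000:24000]
unrelated = rng.randbytes(64 * 1024)

d_orig = digest(original)
print("edited copy:       ", compare(d_orig, digest(edited)))
print("unrelated file:    ", compare(d_orig, digest(unrelated)))
print("fragment (f-mode): ", compare_fragment(digest(fragment), d_orig))
print("fragment (regular):", compare(digest(fragment), d_orig))
```

With these settings the edited copy should score close to 100 and the unrelated file close to 0. The 4 KiB fragment scores high only in the containment comparison, because a symmetric overlap score is dominated by the much larger file. That asymmetry is why a separate fragment mode is useful.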

Copyright information

© 2013 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Breitinger, F., Baier, H. (2013). Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2. In: Rogers, M., Seigfried-Spellar, K.C. (eds) Digital Forensics and Cyber Crime. ICDF2C 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39891-9_11

  • DOI: https://doi.org/10.1007/978-3-642-39891-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39890-2

  • Online ISBN: 978-3-642-39891-9

  • eBook Packages: Computer Science, Computer Science (R0)
