Using Randomization to Attack Similarity Digests

  • Jonathan Oliver
  • Scott Forman
  • Chun Cheng
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 490)


There has been considerable research into, and use of, similarity digests and Locality Sensitive Hashing (LSH) schemes: hashing schemes in which small changes in a file result in small changes in the digest. These schemes are useful in security and forensic applications. We examine how well three similarity digest schemes (Ssdeep, Sdhash and TLSH) work when exposed to random change. Various file types are tested by randomly manipulating source code, HTML, text and executable files. In addition, we test for similarities in modified image files (spam images) that were generated by cybercriminals to defeat fuzzy hashing schemes. The experiments expose shortcomings in the Sdhash and Ssdeep schemes that can be exploited in straightforward ways. The results suggest that the TLSH scheme is more robust to the attacks and random changes considered.


Keywords: Locality Sensitive Hash, similarity digests, Ssdeep, Sdhash, TLSH





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Jonathan Oliver (1)
  • Scott Forman (1)
  • Chun Cheng (1)
  1. Trend Micro, Melbourne, Australia
