How Cuckoo Filter Can Improve Existing Approximate Matching Techniques

Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 157)


In recent years, approximate matching algorithms have become an important component of digital forensic research and have been adopted in other areas as well. Several approaches currently exist, but sdhash and mrsh-v2 in particular attract the community's attention because of their good overall performance (runtime, compression, and detection rates). Although the two approaches proceed quite differently, their final outputs (the similarity digests) are very similar, as both utilize Bloom filters. This data structure was presented in 1970 and has been in use ever since. Recently, a new data structure was proposed that is claimed to be faster and to have a smaller memory footprint than the Bloom filter: the Cuckoo filter.

In this paper we analyze the feasibility of the Cuckoo filter for approximate matching algorithms and present a prototype implementation called mrsh-cf, which is based on a special version of mrsh-v2 called mrsh-net. We demonstrate that using a Cuckoo filter yields a runtime improvement of approximately 37% and a significantly better false positive rate. The memory footprint of mrsh-cf is 8 times smaller than that of mrsh-net, while the compression rate is twice that of the Bloom filter based fingerprint.
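For readers unfamiliar with the data structure, the following is a minimal sketch of a generic Cuckoo filter using partial-key cuckoo hashing. It is not the mrsh-cf implementation: the class name `CuckooFilter`, the use of SHA-1 as the hash, the 16-bit fingerprints, the bucket size of 4, and the kick limit are all assumptions chosen for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Illustrative cuckoo filter; all parameters are assumed, not taken from mrsh-cf."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index in range
        # and makes the mapping between the two candidate buckets symmetric.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        # Assumed hash: first 8 bytes of SHA-1, read as an integer.
        return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

    def _fingerprint(self, item: bytes) -> int:
        # 16-bit fingerprint; zero is mapped to 1 so a fingerprint is never 0.
        return (self._hash(b"fp" + item) & 0xFFFF) or 1

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on the
        # current bucket index and the stored fingerprint.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.num_buckets

    def _candidates(self, item: bytes):
        fp = self._fingerprint(item)
        i1 = self._hash(item) % self.num_buckets
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter is considered full; the last evicted fingerprint is dropped.
        return False

    def contains(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Unlike a plain Bloom filter, this structure stores short fingerprints rather than setting bits, which is what allows deletion. Lookups inspect at most two buckets, and false positives arise only when a different item shares both a fingerprint and a candidate bucket.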


Keywords: Approximate matching, Similarity hashing, Bloom filter, Cuckoo filter, Fuzzy hashing, mrsh-v2, mrsh-net



Copyright information

© Institute for Computer Sciences, Social informatics and Telecommunication Engineering 2015

Authors and Affiliations

  1. Netskope, Inc., Los Altos, USA
  2. Cyber Forensics Research and Education Group (UNHcFREG), Tagliatela College of Engineering, University of New Haven, West Haven, USA
