
How Cuckoo Filter Can Improve Existing Approximate Matching Techniques

  • Conference paper
Digital Forensics and Cyber Crime (ICDF2C 2015)

Abstract

In recent years, approximate matching algorithms have become an important component of digital forensic research and have been adopted in other areas as well. Several approaches exist, but sdhash and mrsh-v2 in particular attract the attention of the community because of their good overall performance (runtime, compression and detection rates). Although the two approaches proceed quite differently, their final outputs (the similarity digests) are very similar, as both use Bloom filters. This data structure was introduced in 1970 and has been in use ever since. Recently, a new data structure, the Cuckoo filter, was proposed and claimed to be faster and to have a smaller memory footprint than the Bloom filter.
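To make the shared building block concrete, the Bloom filter behind both similarity digests can be sketched in a few lines. This is a minimal illustration, not the sdhash or mrsh-v2 code: the class name `BloomFilter` and the use of SHA-256 with double hashing to derive the k bit positions are assumptions chosen for this sketch.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k index functions."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _indices(self, item: bytes) -> list[int]:
        # Assumption for this sketch: derive k bit positions from one
        # SHA-256 digest via double hashing, index_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes) -> None:
        # Set every bit addressed by the item.
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely not inserted"; True may be a false positive.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indices(item))
```

In an approximate matching setting, each chunk (feature) of an input would be added to the filter, and comparing two inputs amounts to testing how many chunks of one are reported as present in the other's filter.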

In this paper we analyze the feasibility of Cuckoo filter for approximate matching algorithms and present a prototype implementation called mrsh-cf, which is based on a special version of mrsh-v2 called mrsh-net. We demonstrate that using Cuckoo filter yields a runtime improvement of approximately 37 % and a significantly better false positive rate. The memory footprint of mrsh-cf is 8 times smaller than that of mrsh-net, while its compression rate is twice that of the Bloom filter based fingerprint.
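The false positive behaviour of the two structures can be compared with their standard textbook approximations: (1 - e^{-kn/m})^k for a Bloom filter with m bits, n items and k hash functions, and the upper bound 1 - (1 - 2^{-f})^{2b} for a Cuckoo filter with f-bit fingerprints and b entries per bucket (a lookup compares against at most 2b stored fingerprints). The parameters below are illustrative assumptions, not the configuration evaluated in the paper, and the paper's reported rates come from its own measurements.

```python
import math


def bloom_fpr(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Standard approximation (1 - e^{-kn/m})^k for a Bloom filter."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def cuckoo_fpr_bound(fingerprint_bits: int, bucket_size: int) -> float:
    """Upper bound 1 - (1 - 2^-f)^(2b): a lookup checks two buckets of b
    entries, and each stored fingerprint matches with probability 2^-f."""
    return 1.0 - (1.0 - 2.0 ** -fingerprint_bits) ** (2 * bucket_size)


n = 1_000_000
# Illustrative Bloom filter: 12 bits per item, k = round(12 * ln 2) = 8.
print(f"Bloom  : {bloom_fpr(12 * n, n, 8):.5f}")
# Illustrative Cuckoo filter: 12-bit fingerprints, 4 entries per bucket.
print(f"Cuckoo : {cuckoo_fpr_bound(12, 4):.5f}")
```

Note that the Cuckoo filter's space per item is f divided by its load factor, so a fair comparison fixes bits per item on both sides rather than f alone.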


Notes

  1. http://roussev.net/t5/t5.html (last accessed 2015-04-10).

  2. https://en.wikipedia.org/wiki/Amortized_analysis (last accessed 2015-04-10).

  3. I.e., this overcomes the limitation of chained hash tables, where the worst-case lookup time is linear, O(n).

  4. Since a Cuckoo filter stores only a hash of each item (the entry) and not the item itself, it is not possible to rehash a stored item to identify its other bucket. Therefore, the Cuckoo filter authors defined the location hash functions (\(h_1\) and \(h_2\)) so that the alternate location can be derived from the current location and the entry alone: \(h_1(x) = hash(x)\) and \(h_2(x) = h_1(x) \oplus hash(f_h(x))\), where hash is any hash function and \(f_h(x)\) is the entry (fingerprint) of \(x\). Because XOR is its own inverse, \(h_1(x) = h_2(x) \oplus hash(f_h(x))\) holds as well.

  5. https://github.com/efficient/cuckoofilter (last accessed 2015-04-10).

  6. http://www.fbreitinger.de/?page_id=218 (last accessed 2015-04-10).

  7. https://code.google.com/p/smhasher/wiki/MurmurHash2 (last accessed 2015-04-10).

  8. http://man7.org/linux/man-pages/man1/time.1.html (last accessed 2015-04-10).


Author information

Correspondence to Frank Breitinger.


Copyright information

© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Gupta, V., Breitinger, F. (2015). How Cuckoo Filter Can Improve Existing Approximate Matching Techniques. In: James, J., Breitinger, F. (eds) Digital Forensics and Cyber Crime. ICDF2C 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 157. Springer, Cham. https://doi.org/10.1007/978-3-319-25512-5_4


  • DOI: https://doi.org/10.1007/978-3-319-25512-5_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25511-8

  • Online ISBN: 978-3-319-25512-5

  • eBook Packages: Computer Science, Computer Science (R0)
