An Optimization Problem Related to Bloom Filters with Bit Patterns

  • Peter Damaschke
  • Alexander Schliep
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10706)

Abstract

Bloom filters are hash-based data structures for membership queries without false negatives, widely used across many application domains. They have also become a central data structure in bioinformatics. In genomics applications and DNA sequencing, the number of items and the number of queries are frequently measured in the hundreds of billions. Consequently, cache behavior and hash function overhead become pressing issues. Blocked Bloom filters with bit patterns are a variant that copes better with cache misses and reduces the amount of hashing. In this work we state an optimization problem concerning the minimum false positive rate for given numbers of memory bits, stored elements, and patterns. The aim is to initiate the study of pattern designs best suited for use in Bloom filters. We provide partial results about the structure of optimal solutions and a link to two-stage group testing.
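
As a rough illustration of the data structure the abstract refers to, the following Python sketch shows one common way a blocked Bloom filter with bit patterns can be organized. It is not taken from the paper. The class name `PatternBlockedBloomFilter`, all parameter names, the use of SHA-256, and the choice of random patterns with a fixed number of set bits are assumptions made for this example. Choosing good patterns is precisely the optimization problem the paper poses.

```python
import hashlib
import random


class PatternBlockedBloomFilter:
    """Toy blocked Bloom filter that ORs one precomputed bit pattern per element."""

    def __init__(self, num_blocks, block_bits, num_patterns, bits_per_pattern, seed=0):
        rng = random.Random(seed)
        self.num_blocks = num_blocks
        # Each block is a Python int used as a bit field of width block_bits.
        self.blocks = [0] * num_blocks
        # Assumption: patterns are random subsets of bit positions of fixed size.
        self.patterns = []
        for _ in range(num_patterns):
            positions = rng.sample(range(block_bits), bits_per_pattern)
            self.patterns.append(sum(1 << p for p in positions))

    def _locate(self, item):
        # A single hash value selects both the block and the pattern,
        # so only one hash computation is needed per operation.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        block_index = h % self.num_blocks
        pattern = self.patterns[(h // self.num_blocks) % len(self.patterns)]
        return block_index, pattern

    def add(self, item):
        block_index, pattern = self._locate(item)
        self.blocks[block_index] |= pattern

    def __contains__(self, item):
        block_index, pattern = self._locate(item)
        # Report membership only if every bit of the pattern is set in the block.
        return (self.blocks[block_index] & pattern) == pattern


bf = PatternBlockedBloomFilter(num_blocks=64, block_bits=512,
                               num_patterns=128, bits_per_pattern=4)
for kmer in ["ACGT", "TTGA", "GGCA"]:
    bf.add(kmer)
print("ACGT" in bf)  # an inserted element is always reported as present
```

In this sketch each insertion or query touches a single block, which is what limits cache misses, and a false positive occurs when the pattern of a non-stored element is covered by the union of patterns already written to its block.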

Keywords

Bloom filter, Genomics, Antichain, Group testing, Disjunct matrix

Notes

Acknowledgment

We are grateful to the reviewers for encouragement and for careful comments that helped improve the notation and fix a calculation mistake.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Chalmers University, Gothenburg, Sweden
  2. Department of Computer Science and Engineering, University of Gothenburg, Gothenburg, Sweden