Cardinality Computing: A New Step Towards Fully Representing Multi-sets by Bloom Filters

  • Jiakui Zhao
  • Dongqing Yang
  • Lijun Chen
  • Jun Gao
  • Tengjiao Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4255)


Bloom Filters are space- and time-efficient randomized data structures for representing (multi-)sets with certain allowable errors, and are widely used in many applications. Previous work on Bloom Filters considered how to support insertions, deletions, membership queries, and multiplicity queries over (multi-)sets. In this paper, we introduce two novel algorithms for computing the cardinalities of multi-sets represented by Bloom Filters. These algorithms extend the functionality of the Bloom Filter and thus make it usable in a variety of new applications. The Bloom structure presented in the previous work is used without any modification, and our algorithms have no influence on the previous functionality. Because Bloom Filters can now support cardinality computing in addition to insertions, deletions, membership queries, and multiplicity queries simultaneously, our work is a new step towards fully representing multi-sets by Bloom Filters. Performance analysis and experimental results show the differences between the two algorithms and show that our algorithms perform well in most cases.
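The abstract does not spell out either of the two cardinality algorithms. What follows is therefore only a hedged Python sketch under stated assumptions, not the authors' method. It assumes a counter-based Bloom structure, with one integer counter per position as in counting or spectral Bloom filters. The names `CountingBloomFilter`, `total_cardinality`, and `distinct_estimate` are hypothetical. The sketch shows how a single counter array can serve insertions, deletions, membership queries, and multiplicity queries. It also yields two kinds of cardinality:

- The total number of elements counted with multiplicity. This is exact, because every insertion adds exactly k to the counter sum.
- The number of distinct elements. This is estimated from the number of zero counters z with the standard formula n ≈ -(m/k) ln(z/m).

```python
import hashlib
import math


class CountingBloomFilter:
    """Hypothetical counter-array Bloom Filter for multi-sets.

    Assumption: each of the k hash positions holds an integer counter that is
    incremented on insert and decremented on delete (counting/spectral style).
    This is an illustrative sketch, not the paper's algorithms.
    """

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # number of counters
        self.k = k                # number of hash functions
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Derive k positions from salted SHA-256 digests (illustrative choice).
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def delete(self, item: str) -> None:
        # Assumes the item was inserted earlier; otherwise counters drift.
        for p in self._positions(item):
            self.counters[p] -= 1

    def multiplicity(self, item: str) -> int:
        # Minimum selection: never underestimates, may overestimate.
        return min(self.counters[p] for p in self._positions(item))

    def contains(self, item: str) -> bool:
        # Membership query; false positives are possible.
        return self.multiplicity(item) > 0

    def total_cardinality(self) -> int:
        # Each insert adds exactly k to the counter sum (a position hit twice
        # by the same item is counted twice) and each delete removes k,
        # so sum / k is the exact multi-set size.
        return sum(self.counters) // self.k

    def distinct_estimate(self) -> float:
        # Zero-fraction estimator: n ~ -(m / k) * ln(zeros / m).
        zeros = self.counters.count(0)
        if zeros == 0:
            return float("inf")   # filter saturated; no estimate possible
        return -(self.m / self.k) * math.log(zeros / self.m)


if __name__ == "__main__":
    bf = CountingBloomFilter(m=4096, k=4)
    for word in ["a", "b", "b", "c", "c", "c"]:
        bf.insert(word)
    bf.delete("c")
    print(bf.total_cardinality())       # 5: six inserts minus one delete
    print(bf.multiplicity("b") >= 2)    # True: min selection never underestimates
    print(bf.distinct_estimate())       # close to 3 distinct items
```

Deriving the k positions from salted SHA-256 digests is an arbitrary illustrative choice, and any family of independent hash functions would do. The distinct-element estimate becomes less reliable as the filter fills up. `distinct_estimate` returns infinity once no zero counters remain.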







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiakui Zhao (1)
  • Dongqing Yang (1)
  • Lijun Chen (1)
  • Jun Gao (1)
  • Tengjiao Wang (1)
  1. School of EECS, Peking University, Beijing, China
