
Loglog Counting of Large Cardinalities

(Extended Abstract)
  • Marianne Durand
  • Philippe Flajolet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2832)

Abstract

Using an auxiliary memory smaller than the size of this abstract, the LogLog algorithm makes it possible to estimate, in a single pass and to within a few percent, the number of different words in the whole of Shakespeare’s works. In general, the LogLog algorithm uses m “small bytes” of auxiliary memory to estimate in a single pass the number of distinct elements (the “cardinality”) in a file, with an accuracy of the order of $1/\sqrt{m}$. The “small bytes” needed to count cardinalities up to $N_{\max}$ comprise about $\log\log N_{\max}$ bits, so cardinalities well into the billions can be determined using only one or two kilobytes of memory. The basic version of the LogLog algorithm is validated by a complete analysis. An optimized version, super-LogLog, is also engineered and tested on real-life data. The algorithm parallelizes optimally.
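To make the mechanism in the abstract concrete, here is a minimal Python sketch of a basic LogLog counter, written under stated assumptions rather than as the authors' code. It assumes m = 2^k registers, a 64-bit hash (BLAKE2b from `hashlib`, truncated to 8 bytes, standing in for an ideal hash function), bucket selection by the first k hash bits, and a register update with ρ, the position of the leftmost 1-bit in the remaining bits. The estimate uses the form α·m·2^(mean register value) with α fixed at the asymptotic constant of about 0.39701; using that constant for every m is a simplification. The class name `LogLog` and its methods are hypothetical.

```python
import hashlib

# Asymptotic LogLog bias-correction constant (about 0.39701). Using it for every m
# is a simplifying assumption; the finite-m constant differs slightly.
ALPHA_INF = 0.39701


class LogLog:
    """Hypothetical LogLog sketch with m = 2**k registers ("small bytes")."""

    def __init__(self, k: int = 10) -> None:
        self.k = k
        self.m = 1 << k
        self.registers = [0] * self.m  # one small counter per bucket

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: BLAKE2b truncated to 64 bits stands in for an ideal hash.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        h = self._hash64(item)
        rest_bits = 64 - self.k
        j = h >> rest_bits               # first k bits pick the bucket
        w = h & ((1 << rest_bits) - 1)   # remaining 64 - k bits
        # rho(w): 1-based position of the leftmost 1-bit (rest_bits + 1 if w == 0)
        rho = rest_bits - w.bit_length() + 1
        if rho > self.registers[j]:
            self.registers[j] = rho      # keep only the maximum seen in this bucket

    def merge(self, other: "LogLog") -> None:
        # Sketches built on separate parts of the data combine by register-wise max,
        # which is why the method parallelizes without extra memory.
        if other.k != self.k:
            raise ValueError("sketches must use the same number of registers")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        # E = alpha * m * 2^(arithmetic mean of the registers)
        mean = sum(self.registers) / self.m
        return ALPHA_INF * self.m * 2.0 ** mean


if __name__ == "__main__":
    sketch = LogLog(k=10)
    for _ in range(2):                   # every word appears twice
        for i in range(100_000):
            sketch.add(f"word-{i}")
    print(round(sketch.estimate()))      # close to 100000, not 200000
```

Repeated items leave the registers unchanged, so the estimate tracks distinct elements only. With m = 1024, the expected error is of the order of 1/√1024 ≈ 3%. Register values stay near log2 of the cardinality, so for counts in the billions 6 bits per register (values up to 63) are ample, giving 1024 × 6 = 6144 bits = 768 bytes; this sketch stores Python ints in a list for readability instead of packing them.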




Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Marianne Durand (1)
  • Philippe Flajolet (1)
  1. Algorithms Project, INRIA–Rocquencourt, Le Chesnay, France
