Applying Approximate Counting for Computing the Frequency Moments of Long Data Streams
Abstract
This paper takes up a remark in the well-known paper of Alon, Matias, and Szegedy (J. Comput. Syst. Sci. 58(1):137–147, 1999) about the computation of the frequency moments of data streams and shows in detail how any $F_k$ with $k \ge 1$ can be approximately computed using space $O\bigl(k\,m^{1-1/k}\,(k + \log m + \log\log n)\bigr)$ based on approximate counting. An important building block for this, which may be interesting in its own right, is a new approximate variant of reservoir sampling using space $O(\log\log n)$ for constant error parameters.
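To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of the classic textbook versions: a Morris-style approximate counter (reference 13) and the basic sampling estimator for $F_k$ of Alon, Matias, and Szegedy (reference 1) driven by ordinary reservoir sampling. It is not the paper's algorithm. In particular, the estimator below keeps exact counters, whereas the paper replaces them with approximate ones. The names `MorrisCounter` and `ams_basic_estimate` are hypothetical, and the sketch assumes that $n$ denotes the stream length, as the $\log\log n$ term suggests.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only an exponent (about log log n bits)."""

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Raise the exponent with probability 2^(-exponent).
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # 2^exponent - 1 is an unbiased estimate of the number of increments.
        return 2 ** self.exponent - 1


def ams_basic_estimate(stream, k, rng=None):
    """One-pass basic estimator X with E[X] = F_k (exact counters, for illustration).

    A uniformly random stream position is chosen by reservoir sampling;
    r counts occurrences of the sampled item from that position onward.
    """
    rng = rng if rng is not None else random.Random()
    sample = None
    r = 0
    n = 0  # stream length seen so far (assumed meaning of n)
    for item in stream:
        n += 1
        if rng.random() < 1.0 / n:
            # Replace the sample with the current item with probability 1/n.
            sample, r = item, 1
        elif item == sample:
            r += 1
    return n * (r ** k - (r - 1) ** k)
```

A single run of either routine has high variance; in practice many independent copies are averaged (and, in the Alon-Matias-Szegedy scheme, medians of averages are taken). For the stream `a, b, a, c, a, b` and $k = 2$, averaging the estimator over all six possible sample positions gives exactly $F_2 = 3^2 + 2^2 + 1^2 = 14$.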
Keywords
Data stream; Input stream; Total variation distance; Reservoir sampling; Online phase
References
- 1. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58(1), 137–147 (1999)
- 2. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of 21st PODS, pp. 1–16 (2002)
- 3. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci. 68(4), 702–732 (2004)
- 4. Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler algorithm for estimating frequency moments of data streams. In: Proceedings of 17th SODA, pp. 708–713 (2006)
- 5. Chakrabarti, M., Khot, S., Sun, X.: Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In: Proceedings of 18th Conference on Computational Complexity, pp. 107–117 (2003)
- 6. Coppersmith, D., Kumar, R.: An improved data stream algorithm for frequency moments. In: Proceedings of 15th SODA, pp. 151–156 (2004)
- 7. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
- 8. Flajolet, P.: Approximate counting: a detailed analysis. In: BIT, pp. 113–134 (1985)
- 9. Ganguly, S.: Estimating frequency moments of update streams using random linear combinations. In: Proceedings of 8th RANDOM, pp. 369–380 (2004)
- 10. Ganguly, S.: A hybrid technique for estimating frequency moments over data streams, 2004. Manuscript, available at http://www.cse.iitk.ac.in/users/sganguly/HybridFk.pdf
- 11. Hofri, M., Kechris, N.: Probabilistic counting of a large number of events—revisited, 1995. Manuscript, available at http://www.cs.wpi.edu/~hofri/
- 12. Indyk, P., Woodruff, D.: Optimal approximations of the frequency moments. In: Proceedings of 37th STOC, pp. 202–208 (2005)
- 13. Morris, R.: Counting large numbers of events in small registers. Commun. ACM 21(10), 840–842 (1978)
- 14. Muthukrishnan, S.: Data streams: algorithms and applications. In: Proceedings of 14th SODA, pp. 413–413 (2003). Online version: http://athos.rutgers.edu/~muthu/stream-1-1.ps
- 15. Muthukrishnan, S.: Data streams: Algorithms and Applications. Now Publishers, Hanover (2005)
- 16. Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)
- 17. Wegener, I.: The Complexity of Boolean Functions. Wiley–Teubner, Stuttgart (1987)
- 18. Woodruff, D.: Optimal space lower bounds for all frequency moments. In: Proceedings of 15th SODA, pp. 167–175 (2004)
Copyright information
© Springer Science+Business Media, LLC 2007