Abstract
We consider updates to an n-dimensional frequency vector f of a data stream; that is, the vector f is updated coordinate-wise by insertions and deletions arriving in arbitrary order. A fundamental problem in this model is to recall the vector approximately, that is, to return an estimate \(\hat{f}\) of f such that
\(\|\hat{f} - f\|_\infty \le \epsilon \|f\|_p\),
where ε is an accuracy parameter and p is the index of the \(\ell_p\) norm used to calculate the norm \(\|f\|_p\). This problem is fundamental in data stream processing and is used to solve a number of other problems, such as heavy hitters, approximate range queries and quantiles, and approximate histograms.
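To make the problem statement concrete, the following is a minimal Python sketch, not taken from the paper. It assumes the guarantee is read as "every coordinate of the estimate lies within ε times the \(\ell_p\) norm of f". It stores f exactly, so it only illustrates the update model and the accuracy condition, not a small-space algorithm. The function names `apply_updates`, `lp_norm` and `meets_guarantee` are hypothetical.

```python
from collections import defaultdict


def apply_updates(updates):
    """Build the frequency vector f from (index, delta) updates.

    A positive delta is an insertion, a negative delta a deletion;
    updates may arrive in any order.
    """
    f = defaultdict(int)
    for index, delta in updates:
        f[index] += delta
    return f


def lp_norm(f, p):
    """Return the l_p norm of a sparse vector stored as a dict."""
    return sum(abs(v) ** p for v in f.values()) ** (1.0 / p)


def meets_guarantee(f, f_hat, eps, p):
    """Check the assumed guarantee: max_i |f_hat[i] - f[i]| <= eps * ||f||_p."""
    bound = eps * lp_norm(f, p)
    indices = set(f) | set(f_hat)
    return all(abs(f_hat.get(i, 0) - f.get(i, 0)) <= bound for i in indices)


if __name__ == "__main__":
    stream = [(0, 5), (1, 3), (0, -2), (2, 1)]   # illustrative data only
    f = apply_updates(stream)
    estimate = {0: 3, 1: 2, 2: 0}                # off by at most 1 per coordinate
    print(meets_guarantee(f, estimate, eps=0.2, p=1))
    print(meets_guarantee(f, estimate, eps=0.2, p=2))
```

Because \(\|f\|_p\) does not increase as p grows, the same estimate can satisfy the condition for p = 1 and fail it for p = 2, which is one way to see why larger p makes the problem harder.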
Suppressing poly-logarithmic factors in n and \(1/\epsilon\), for p = 1 the problem is known to have \(\tilde{\Theta}(1/\epsilon)\) randomized space complexity [2,4] and \(\tilde{\Theta}(1/\epsilon^2)\) deterministic space complexity [6,7]. However, the deterministic space complexity of this problem for any value of p > 1 is not known. In this paper, we show that the deterministic space complexity of the problem is \(\tilde{\Theta}(n^{2-2/p}/\epsilon^2)\) for 1 < p < 2, and \(\Theta(n)\) for p ≥ 2.
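As a rough aid for reading these bounds, the sketch below evaluates only the leading term of the deterministic space complexity stated above for a given n, ε and p. It is an illustration under assumptions: constants and poly-logarithmic factors are dropped, so the returned numbers show how the bound scales rather than actual memory sizes. The function name `deterministic_space_leading_term` is hypothetical.

```python
def deterministic_space_leading_term(n, eps, p):
    """Leading term of the deterministic space bound from the abstract.

    Constants and poly-logarithmic factors in n and 1/eps are ignored
    (an assumption of this sketch), so only the scaling is meaningful.
    """
    if p < 1:
        raise ValueError("the abstract only covers p >= 1")
    if eps <= 0:
        raise ValueError("eps must be positive")
    if p == 1:
        return 1.0 / eps ** 2                      # Theta~(1/eps^2)
    if p < 2:
        return n ** (2.0 - 2.0 / p) / eps ** 2     # Theta~(n^(2-2/p)/eps^2)
    return float(n)                                # Theta(n) for p >= 2


if __name__ == "__main__":
    n, eps = 10 ** 6, 0.1                          # illustrative values only
    for p in (1, 1.25, 1.5, 2, 3):
        print(p, deterministic_space_leading_term(n, eps, p))
```

The exponent 2 - 2/p rises from 0 at p = 1 to 1 at p = 2, so the dependence on n grows steadily across the range 1 < p < 2.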
References
1. Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler Algorithm for Estimating Frequency Moments of Data Streams. In: Proceedings of ACM Symposium on Discrete Algorithms (SODA), pp. 708–713 (2006)
2. Bose, P., Kranakis, E., Morin, P., Tang, Y.: Bounds for Frequency Estimation of Packet Streams. In: Sibeyn, J.F. (ed.) Proceedings of the 10th International Colloquium on Structural Information Complexity, Informatics 17 Carleton Scientific, pp. 33–42 (2003)
3. Charikar, M., Chen, K., Farach-Colton, M.: Finding Frequent Items in Data Streams. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 693–703. Springer, Heidelberg (2002)
4. Cormode, G., Muthukrishnan, S.: An Improved Data Stream Summary: The Count-Min Sketch and its Applications. J. Algorithms 55(1), 58–75 (2005)
5. Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency Estimation of Internet Packet Streams with Limited Space. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002)
6. Ganguly, S.: Lower bounds for Frequency Estimation over Data Streams. In: Hirsch, E.A., Razborov, A.A., Semenov, A., Slissenko, A. (eds.) Computer Science – Theory and Applications. LNCS, vol. 5010, pp. 204–215. Springer, Heidelberg (2008)
7. Ganguly, S., Majumder, A.: CR-precis: A Deterministic Summary Structure for Update Streams. In: Chen, B., Paterson, M., Zhang, G. (eds.) ESCAPE 2007. LNCS, vol. 4614, pp. 48–59. Springer, Heidelberg (2007)
8. Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Fast Small-space Algorithms for Approximate Histogram Maintenance. In: Proceedings of ACM STOC, pp. 152–161 (2002)
9. Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.J.: Surfing Wavelets on Streams: One-pass Summaries for Approximate Aggregate Queries. In: Proceedings of VLDB, pp. 79–88 (2001)
10. Guha, S., Indyk, P., Muthukrishnan, S., Strauss, M.: Histogramming Data Streams with Fast Per-Item Processing. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 681–692. Springer, Heidelberg (2002)
11. Indyk, P., Woodruff, D.: Optimal Approximations of the Frequency Moments. In: Proceedings of ACM Symposium on Theory of Computing (STOC), Baltimore, Maryland, USA, pp. 202–208 (2005)
12. Karp, R.M., Shenker, S., Papadimitriou, C.H.: A Simple Algorithm for Finding Frequent Elements in Streams and Bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)
13. Lee, L.K., Ting, H.F.: A Simpler and More Efficient Deterministic Scheme for Finding Frequent Items over Sliding Windows. In: Proceedings of ACM International Symposium on Principles of Database Systems (PODS), pp. 263–272 (2006)
14. Manku, G., Motwani, R.: Approximate Frequency Counts over Data Streams. In: Proceedings of VLDB, pp. 346–357 (2002)
15. Misra, J., Gries, D.: Finding Repeated Elements. Sci. Comput. Programm. 2, 143–152 (1982)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ganguly, S. (2009). Deterministically Estimating Data Stream Frequencies. In: Du, DZ., Hu, X., Pardalos, P.M. (eds) Combinatorial Optimization and Applications. COCOA 2009. Lecture Notes in Computer Science, vol 5573. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02026-1_28
DOI: https://doi.org/10.1007/978-3-642-02026-1_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02025-4
Online ISBN: 978-3-642-02026-1
eBook Packages: Computer Science, Computer Science (R0)