Abstract
We consider the problem of computing information-theoretic functions, such as entropy, on a data stream using sublinear space.
Our first result deals with a measure we call the “entropy norm” of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic-space one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold.
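The sketch below illustrates the kind of sampling estimator this setting suggests. It assumes the entropy norm is \(F_H = \sum_i m_i \lg m_i\), where \(m_i\) is the number of occurrences of item \(i\). That definition comes from the full paper and is not stated in this abstract. The sketch applies the basic estimator of Alon, Matias and Szegedy to \(f(r) = r \lg r\). It picks a uniformly random stream position \(J\), counts the occurrences \(R\) of that item from \(J\) onward, and outputs \(m\,(f(R) - f(R-1))\). This telescopes to an unbiased estimate of \(\sum_i f(m_i)\). It is a minimal illustration only, not the authors' algorithm. It omits the averaging and median steps and the stream conditions under which polylogarithmic space suffices. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Hashable, Iterable, Sequence


def f(r: int) -> float:
    """Per-item term r * lg r, with f(0) = 0 (assumed entropy-norm summand)."""
    return r * math.log2(r) if r > 0 else 0.0


def entropy_norm_exact(stream: Sequence[Hashable]) -> float:
    """Exact F_H = sum_i m_i lg m_i; linear space, used only as a reference."""
    return sum(f(c) for c in Counter(stream).values())


def ams_basic_estimate(stream: Sequence[Hashable], j: int) -> float:
    """AMS-style estimate for a fixed sampled position j (0-based)."""
    m = len(stream)
    # R = occurrences of stream[j] at positions j, j+1, ..., m-1.
    r = sum(1 for x in stream[j:] if x == stream[j])
    return m * (f(r) - f(r - 1))


def ams_one_pass(stream: Iterable[Hashable], rng: random.Random) -> float:
    """One pass, O(1) words: reservoir-sample a position and count its later repeats."""
    sample = None
    r = 0  # occurrences of the sampled item from the sampled position onward
    m = 0  # stream length seen so far
    for x in stream:
        m += 1
        if rng.randrange(m) == 0:
            # With probability 1/m the current position becomes the sample,
            # so after the pass every position is equally likely.
            sample, r = x, 1
        elif x == sample:
            r += 1
    return m * (f(r) - f(r - 1)) if m else 0.0
```

A single estimate has high variance. In practice, one would average many independent copies, for example `sum(ams_one_pass(s, random.Random(i)) for i in range(1000)) / 1000`, and possibly take a median of such averages.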
Our second group of results concerns estimating the empirical entropy of an input stream. We first present a sublinear-space one-pass algorithm for this problem. For a stream of m items and a given real parameter α, our algorithm uses space \(\tilde{O}(m^{2\alpha})\) and provides an approximation of 1/α in the worst case and (1 + ε) in “most” cases. We then present a two-pass polylogarithmic-space (1 + ε)-approximation algorithm. All our algorithms are quite simple.
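Consider a stream of length \(m\) in which item \(i\) occurs \(m_i\) times. Its empirical entropy is \(H = \sum_i \frac{m_i}{m} \lg \frac{m}{m_i}\). Expanding the logarithm gives \(H = \lg m - \frac{1}{m} \sum_i m_i \lg m_i\), which makes the connection to an entropy norm of the form \(\sum_i m_i \lg m_i\) explicit. The sketch below computes both sides exactly as an offline, linear-space baseline. It is not one of the paper's sublinear algorithms, and the function names are hypothetical. Both functions assume a non-empty stream.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def empirical_entropy(stream: Sequence[Hashable]) -> float:
    """H = sum_i (m_i / m) * lg(m / m_i), in bits; assumes a non-empty stream."""
    m = len(stream)
    return sum((c / m) * math.log2(m / c) for c in Counter(stream).values())


def entropy_via_norm(stream: Sequence[Hashable]) -> float:
    """Same H via the identity H = lg m - (1/m) * sum_i m_i lg m_i."""
    m = len(stream)
    norm = sum(c * math.log2(c) for c in Counter(stream).values())
    return math.log2(m) - norm / m
```

For example, `empirical_entropy(list("aabbccdd"))` returns `2.0`, since four equally frequent items carry 2 bits of entropy. A stream containing a single repeated item has entropy `0.0`.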
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chakrabarti, A., Do Ba, K., Muthukrishnan, S. (2006). Estimating Entropy and Entropy Norm on Data Streams. In: Durand, B., Thomas, W. (eds) STACS 2006. Lecture Notes in Computer Science, vol 3884. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11672142_15
DOI: https://doi.org/10.1007/11672142_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32301-3
Online ISBN: 978-3-540-32288-7
eBook Packages: Computer Science, Computer Science (R0)