Data Stream Management, pp. 149–165

# The Sliding-Window Computation Model and Results

## Abstract

We present results on small-space computation over sliding windows in the data-stream model. Most research in the data-stream model, including results presented in some of the other chapters, assumes that all data elements seen so far in the stream are equally important, and that the synopses, statistics, or models that are built should reflect the entire data set. For many applications this assumption does not hold, particularly those that ascribe more importance to recent data items. One way to discount old data items and consider only recent ones for analysis is the sliding-window model: data elements arrive at every time instant; each data element expires after exactly N time steps; and the portion of data relevant to gathering statistics or answering queries is the set of the last N elements to arrive. The sliding window is the window of active data elements at a given time instant, and the window size is N. This chapter presents a general technique, called the Exponential Histogram (EH) technique, that can be used to solve a wide variety of problems in the sliding-window model, typically problems that require us to maintain statistics. We showcase this technique through solutions to basic counting problems, as well as other applications.
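To make the model concrete, the following is a minimal sketch of an exponential-histogram-style counter for the basic counting problem (how many 1s occur among the last N bits of a stream). It is an illustration under stated assumptions, not the chapter's exact construction: the class name `ExponentialHistogram`, the cap of k + 1 buckets per size with k = ⌈1/ε⌉, and the "subtract half of the oldest bucket" estimate are choices made here so that the error bound is easy to check, and the constants used in the chapter may differ.

```python
import math
from collections import deque


class ExponentialHistogram:
    """Approximate count of 1s among the last `window` bits of a stream."""

    def __init__(self, window: int, eps: float):
        self.window = window
        # Assumption: allow up to k + 1 buckets per size, with k = ceil(1/eps).
        self.limit = math.ceil(1 / eps) + 1
        self.buckets = []  # (timestamp of newest 1, size), oldest bucket first
        self.time = 0
        self.total = 0     # sum of all bucket sizes

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest bucket once its newest 1 has left the window.
        while self.buckets and self.buckets[0][0] <= self.time - self.window:
            self.total -= self.buckets.pop(0)[1]
        if not bit:
            return
        self.buckets.append((self.time, 1))
        self.total += 1
        # Cascade: whenever one size has too many buckets, merge its two oldest.
        size, i = 1, len(self.buckets) - 1
        while True:
            j = i
            while j >= 0 and self.buckets[j][1] == size:
                j -= 1
            if i - j <= self.limit:
                break
            newer_ts = self.buckets[j + 2][0]
            self.buckets[j + 1:j + 3] = [(newer_ts, 2 * size)]
            size, i = 2 * size, j + 1

    def estimate(self) -> int:
        if not self.buckets:
            return 0
        # Only the oldest bucket can straddle the window boundary,
        # so count half of it (a size-1 bucket is counted in full).
        return self.total - self.buckets[0][1] // 2


eps, window = 0.1, 50
eh = ExponentialHistogram(window, eps)
exact = deque(maxlen=window)  # exact baseline that stores all N bits
worst = 0.0
for t in range(2000):
    bit = 1 if (t * t + 3 * t) % 7 < 4 else 0  # arbitrary deterministic test stream
    eh.add(bit)
    exact.append(bit)
    true = sum(exact)
    if true:
        worst = max(worst, abs(eh.estimate() - true) / true)
print(f"worst relative error: {worst:.3f} (target eps = {eps})")
```

The exact baseline stores every bit in the window, whereas the sketch keeps only a short list of buckets whose sizes grow as powers of two toward the old end of the window. Uncertainty is confined to the single oldest bucket, which is why the estimate stays within the target relative error in this run.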

## Keywords

Window Size, Data Stream, Time Instant, Data Item, Data Element
