The Sliding-Window Computation Model and Results

  • Mayur Datar
  • Rajeev Motwani
Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

We present some results related to small-space computation over sliding windows in the data-stream model. Most research in the data-stream model, including results presented in some of the other chapters, assumes that all data elements seen so far in the stream are equally important, and that the synopses, statistics, or models that are built should reflect the entire data set. However, this assumption does not hold for many applications, particularly those that ascribe more importance to recent data items. One way to discount old data items and consider only recent ones for analysis is the sliding-window model: data elements arrive at every time instant; each data element expires after exactly N time steps; and the portion of data that is relevant to gathering statistics or answering queries is the set of the last N elements to arrive. The sliding window refers to the window of active data elements at a given time instant, and the window size refers to N. This chapter presents a general technique, called the Exponential Histogram (EH) technique, that can be used to solve a wide variety of problems in the sliding-window model, typically problems that require us to maintain statistics. We showcase this technique through solutions to basic counting problems, as well as other applications.
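To make the basic counting problem concrete, the following is a minimal sketch of an exponential-histogram-style counter that estimates the number of 1s among the last N bits of a stream. It is an illustration under stated assumptions, not the chapter's own code: the class name `ExponentialHistogram`, the parameter `max_per_size`, and the method names are hypothetical, and the merge rule (keep at most `max_per_size` buckets of each power-of-two size, merging the two oldest when that limit is exceeded) is one common way of presenting the idea.

```python
class ExponentialHistogram:
    """Approximate count of 1s among the last `window` bits of a stream.

    Illustrative sketch only; names and the `max_per_size` parameter
    are assumptions, not taken from the chapter.
    """

    def __init__(self, window, max_per_size=3):
        self.window = window              # N, the sliding-window size
        self.max_per_size = max_per_size  # larger value: more space, less error
        self.time = 0
        # Buckets ordered newest first; each is [timestamp of newest 1, size].
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its newest 1 has left the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, [self.time, 1])
        self._merge()

    def _merge(self):
        size = 1
        while True:
            idx = [i for i, b in enumerate(self.buckets) if b[1] == size]
            if len(idx) <= self.max_per_size:
                return
            # Merge the two oldest buckets of this size; the merged bucket
            # keeps the timestamp of the newer of the two.
            newer, older = idx[-2], idx[-1]
            self.buckets[newer][1] = 2 * size
            del self.buckets[older]
            size *= 2  # the merge may overflow the next size up

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only the oldest bucket can straddle the window boundary,
        # so count half of it.
        return total - self.buckets[-1][1] // 2


if __name__ == "__main__":
    eh = ExponentialHistogram(window=100, max_per_size=3)
    for i in range(1000):
        eh.add(1 if i % 3 else 0)
    print("estimated 1s in last 100 bits:", eh.estimate())
```

The sketch stores only a short list of buckets rather than the window itself, and the only uncertainty in the estimate comes from the oldest bucket, whose 1s may lie partly outside the window.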

Keywords

Window Size; Data Stream; Time Instant; Data Item; Data Element
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Google Inc., Mountain View, USA
  2. Department of Computer Science, Stanford University, Stanford, USA
