Stream Data Management, pp. 35–58

Part of the Advances in Database Systems book series (ADBS, volume 30)

Filtering, Punctuation, Windows and Synopses

  • David Maier
  • Peter A. Tucker
  • Minos Garofalakis

Abstract

This chapter addresses some of the problems raised by the high-volume, nonterminating nature of many data streams. We begin by outlining challenges for query processing over such streams, such as workloads that outstrip CPU or memory resources, operators that block until the end of input, and unbounded query state. We then consider various techniques for meeting those challenges. Filtering attempts to reduce stream volume in order to save system resources. Punctuations embed knowledge about the structure of a stream in the stream itself, and can help unblock query operators and reduce the state they must retain. Windowing modifies a query so that processing takes place on finite subsets of otherwise unbounded streams. Synopses are compact, efficiently maintained summaries of data that can provide approximate answers to particular queries.
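To make the punctuation idea concrete, here is a minimal sketch of how a punctuation can unblock a grouped aggregate and let it discard state. It assumes a hypothetical stream of `Record(key, value)` items interleaved with `Punctuation(key)` markers, where a punctuation asserts that no further records with that key will arrive. The class names, the key-equality form of punctuation, and the `grouped_sum` operator are illustrative assumptions, not the chapter's own design.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class Record:
    """Hypothetical ordinary stream element: a grouping key and a value."""
    key: str
    value: int


@dataclass(frozen=True)
class Punctuation:
    """Hypothetical marker: no further Records with this key will arrive."""
    key: str


StreamItem = Union[Record, Punctuation]


def grouped_sum(stream: Iterable[StreamItem]) -> Iterator[Tuple[str, int]]:
    """Group-by SUM that emits a group as soon as it is punctuated.

    Without punctuation, a grouped aggregate must wait for the end of
    input before emitting anything. Here, a Punctuation for key k lets
    the operator output k's total immediately and drop k's state.
    """
    state: Dict[str, int] = {}
    for item in stream:
        if isinstance(item, Record):
            state[item.key] = state.get(item.key, 0) + item.value
        elif item.key in state:
            # The group for item.key is complete: emit it now and
            # purge its entry, so retained state stays bounded.
            yield item.key, state.pop(item.key)
    # Only reached if the input is finite: flush unpunctuated groups.
    for key, total in state.items():
        yield key, total


if __name__ == "__main__":
    stream = [Record("a", 3), Record("b", 5), Record("a", 4),
              Punctuation("a"), Record("b", 1)]
    print(list(grouped_sum(stream)))  # [('a', 7), ('b', 6)]
```

Because `grouped_sum` is a generator, the total for `"a"` is produced as soon as its punctuation is consumed, even if the input never terminates; groups that are never punctuated still accumulate state, which is why punctuation is typically combined with windowing or synopses in practice.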

Keywords

data stream processing, disordered data, stream filtering, stream punctuation, stream synopses, window queries



Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  • David Maier, OGI School of Science & Engineering at OHSU, Beaverton
  • Peter A. Tucker, Whitworth College, Spokane
  • Minos Garofalakis, Bell Laboratories, Lucent Technologies, Murray Hill
