Filtering, Punctuation, Windows and Synopses

Chapter in: Stream Data Management

Part of the book series: Advances in Database Systems (ADBS, volume 30)

Abstract

This chapter addresses some of the problems raised by the high-volume, nonterminating nature of many data streams. We begin by outlining challenges for query processing over such streams: input that outstrips CPU or memory resources, operators that block while waiting for the end of input, and unbounded query state. We then consider various techniques for meeting those challenges. Filtering attempts to reduce stream volume in order to save on system resources. Punctuations incorporate semantics on the structure of a stream into the stream itself, and can help unblock query operators and reduce the state they must retain. Windowing modifies a query so that processing takes place on finite subsets of full streams. Synopses are compact, efficiently maintained summaries of data that can provide approximate answers to particular queries.
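As a rough illustration of how a punctuation can unblock an operator and let it release state, the sketch below counts tuples per key and emits a group's count as soon as a punctuation announces that no further tuples with that key will arrive. It is only one possible reading of the idea and is not taken from the chapter: the `Punctuation` class, the `grouped_count` function, and the `(key, value)` tuple layout are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class Punctuation:
    """Hypothetical marker: no later tuple in the stream will carry this key."""
    key: Hashable


StreamItem = Union[Tuple[Hashable, object], Punctuation]


def grouped_count(stream: Iterable[StreamItem]) -> Iterator[Tuple[Hashable, int]]:
    """Count (key, value) tuples per key, emitting a key's count once it is punctuated."""
    counts: Dict[Hashable, int] = {}
    for item in stream:
        if isinstance(item, Punctuation):
            # The group is known to be complete: emit its result and drop its state.
            if item.key in counts:
                yield item.key, counts.pop(item.key)
        else:
            key, _value = item
            counts[key] = counts.get(key, 0) + 1
    # Reached only if the stream actually ends; flush the groups still held.
    yield from counts.items()


events = [("a", 1), ("b", 2), ("a", 3), Punctuation("a"), ("b", 4)]
results = list(grouped_count(events))
```

Without the punctuation, a grouped count over a nonterminating stream could never emit a final result for any key and would have to keep every group's counter indefinitely. Here the result for key `"a"` is produced, and its counter discarded, while the stream is still running.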




Copyright information

© 2005 Springer Science+Business Media, Inc.

About this chapter

Cite this chapter

Maier, D., Tucker, P.A., Garofalakis, M. (2005). Filtering, Punctuation, Windows and Synopses. In: Chaudhry, N.A., Shaw, K., Abdelguerfi, M. (eds) Stream Data Management. Advances in Database Systems, vol 30. Springer, Boston, MA. https://doi.org/10.1007/0-387-25229-0_3

  • DOI: https://doi.org/10.1007/0-387-25229-0_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-24393-1

  • Online ISBN: 978-0-387-25229-2

  • eBook Packages: Computer Science, Computer Science (R0)
