Sketch-Based Multi-query Processing over Data Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2992)

Abstract

Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed.

Randomized techniques, based on computing small “sketch” synopses for each stream, have recently been shown to be a very effective tool for approximating the result of a single SQL query over streaming data tuples. In this paper, we investigate the problems arising when data-stream sketches are used to process multiple such queries concurrently. We demonstrate that, in the presence of multiple query expressions, intelligently sharing sketches among concurrent query evaluations can result in substantial improvements in the utilization of the available sketching space and the quality of the resulting approximation error guarantees. We provide necessary and sufficient conditions for multi-query sketch sharing that guarantee the correctness of the result-estimation process. We also prove that optimal sketch sharing typically gives rise to \(\mathcal{NP}\)-hard questions, and we propose novel heuristic algorithms for finding good sketch-sharing configurations in practice. Results from our experimental study with realistic workloads verify the effectiveness of our approach, clearly demonstrating the benefits of our sketch-sharing methodology.
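
As a rough illustration of the idea in the abstract, the following Python sketch uses an AMS-style "tug-of-war" sketch, one well-known randomized synopsis for join-size estimation, to answer two COUNT join queries that both involve stream R. It is a toy under stated assumptions and not the paper's algorithm: the helper names (`make_signs`, `Sketch`, `estimate_join_size`, `exact_join_size`), the tiny integer domain, and the example streams are all hypothetical, fully random sign tables stand in for four-wise independent hash families, and the paper's sharing conditions and heuristics are not implemented.

```python
import random
from collections import Counter
from statistics import mean, median


def make_signs(domain_size, num_counters, seed=0):
    """Hypothetical helper: one random +/-1 table per counter.

    Simplification: fully random signs stand in for the four-wise
    independent hash families that AMS-style sketches normally use.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(domain_size)]
            for _ in range(num_counters)]


class Sketch:
    """Counter j holds sum_v f(v) * signs[j][v] for the stream seen so far."""

    def __init__(self, signs):
        self.signs = signs
        self.counters = [0] * len(signs)

    def update(self, value, weight=1):
        # A negative weight models deletions; the sketch is linear in the frequencies.
        for j, row in enumerate(self.signs):
            self.counters[j] += weight * row[value]


def estimate_join_size(a, b, group_size):
    """Median of group means of counter products; both sketches must share signs."""
    if a.signs is not b.signs:
        raise ValueError("sketches must be built over the same sign tables")
    products = [x * y for x, y in zip(a.counters, b.counters)]
    means = [mean(products[i:i + group_size])
             for i in range(0, len(products), group_size)]
    return median(means)


def exact_join_size(r, s):
    """Exact COUNT of the equi-join, used only to compare against the estimate."""
    fr, fs = Counter(r), Counter(s)
    return sum(fr[v] * fs[v] for v in fr)


# Toy workload: Q1 = COUNT(R join S) and Q2 = COUNT(R join T) on one attribute.
signs = make_signs(domain_size=16, num_counters=60, seed=1)
streams = {
    "R": [1, 1, 2, 3, 3, 3, 7],
    "S": [1, 3, 3, 9],
    "T": [2, 2, 7, 7, 7],
}
sketches = {name: Sketch(signs) for name in streams}
for name, values in streams.items():
    for v in values:
        sketches[name].update(v)

# The single sketch of R serves both queries: three sketches instead of four.
q1 = estimate_join_size(sketches["R"], sketches["S"], group_size=10)
q2 = estimate_join_size(sketches["R"], sketches["T"], group_size=10)
print("Q1 estimate", q1, "exact", exact_join_size(streams["R"], streams["S"]))
print("Q2 estimate", q2, "exact", exact_join_size(streams["R"], streams["T"]))
```

Because the sketch of R is maintained once and read by both queries, the counters that a second, private copy of R would have consumed can instead go toward more counters per sketch, which is the kind of space saving the abstract refers to. Whether such reuse preserves the estimator's guarantees in general depends on the conditions established in the paper.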

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dobra, A., Garofalakis, M., Gehrke, J., Rastogi, R. (2004). Sketch-Based Multi-query Processing over Data Streams. In: Bertino, E., et al. Advances in Database Technology - EDBT 2004. EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_32

  • DOI: https://doi.org/10.1007/978-3-540-24741-8_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21200-3

  • Online ISBN: 978-3-540-24741-8

  • eBook Packages: Springer Book Archive
