
Querying Distributed Data Streams

(Invited Keynote Talk)

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 8716)

Abstract

Effective Big Data analytics pose several difficult challenges for modern data management architectures. One such key challenge arises from the streaming nature of Big Data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, in network-operations monitoring at large ISPs, where usage information from numerous sites must be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure. In this talk, we introduce the distributed data streaming model and discuss recent work on tracking complex queries over massive distributed streams, as well as new research directions in this space.

Keywords

  • Data Stream
  • Remote Site
  • Large Data Base
  • Streaming Data
  • Local Query

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/978-3-319-10933-6_1
  • Chapter length: 10 pages
Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Garofalakis, M. (2014). Querying Distributed Data Streams. In: Manolopoulos, Y., Trajcevski, G., Kon-Popovska, M. (eds) Advances in Databases and Information Systems. ADBIS 2014. Lecture Notes in Computer Science, vol 8716. Springer, Cham. https://doi.org/10.1007/978-3-319-10933-6_1

  • DOI: https://doi.org/10.1007/978-3-319-10933-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10932-9

  • Online ISBN: 978-3-319-10933-6

  • eBook Packages: Computer Science, Computer Science (R0)