Encyclopedia of Database Systems

Living Edition
Editors: Ling Liu, M. Tamer Özsu

Distributed Data Streams

Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_137-2

Definition

A majority of today’s data is constantly evolving and fundamentally distributed in nature. Data for almost any large-scale data-management task is collected continuously over a wide area, and at a much greater rate than ever before. Compared to traditional, centralized stream processing, querying such large-scale, evolving data collections poses new challenges, due mainly to the physical distribution of the streaming data and the communication constraints of the underlying network. Distributed stream processing algorithms should guarantee efficiency not only in terms of space and processing time (as conventional streaming techniques do), but also in terms of the communication load imposed on the network infrastructure.
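To make the communication cost concrete, here is a minimal Python sketch that is not taken from the entry. It assumes a simple setting in which several remote sites each observe local updates and a central coordinator tracks the global count. The names `Site`, `run`, and `slack` are hypothetical and chosen only for illustration. Each site stays silent until it has accumulated `slack` unreported updates, so the coordinator trades a bounded underestimate for fewer messages.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A remote site that buffers local updates before reporting (illustrative)."""
    slack: int            # report only after this many unreported updates
    unreported: int = 0   # updates seen since the last report

    def observe(self) -> int:
        """Process one local update; return the count to send (0 means stay silent)."""
        self.unreported += 1
        if self.unreported >= self.slack:
            sent, self.unreported = self.unreported, 0
            return sent
        return 0


def run(stream, num_sites, slack):
    """Feed a stream of site ids through the sites; return (estimate, messages)."""
    sites = [Site(slack) for _ in range(num_sites)]
    estimate = 0   # coordinator's view of the global count
    messages = 0   # communication cost: messages sent to the coordinator
    for site_id in stream:
        sent = sites[site_id].observe()
        if sent:
            estimate += sent
            messages += 1
    return estimate, messages


if __name__ == "__main__":
    stream = [i % 4 for i in range(1000)]       # 1000 updates spread over 4 sites
    print(run(stream, num_sites=4, slack=1))    # every update is forwarded
    print(run(stream, num_sites=4, slack=10))   # reports are batched
```

Under these assumptions, the coordinator's estimate never exceeds the true count and falls short of it by less than `num_sites * slack`, while the number of messages drops by roughly a factor of `slack`. This is only one simple way to trade accuracy for communication, offered as an illustration rather than as the method the entry describes.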

Historical Background

The prevailing paradigm in database systems has been understanding the management of centralized data: how to organize, index, access, and query data that is held centrally on a single machine or a small number of closely linked machines....

Copyright information

© Springer Science+Business Media LLC 2016

Authors and Affiliations

  1. Technical University of Crete, Chania, Greece