Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Types of Stream Processing Algorithms

  • Lukasz Golab
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_193

Synonyms

Definitions

A stream processing algorithm operates over a continuous and potentially unbounded stream of data, arriving at a possibly very high speed, one item or one batch of items at a time, and does so in limited time per item and using limited working storage. At any point in time, a stream algorithm can produce an answer over the prefix of the stream observed so far or over a sliding window of recent data. Stream processing algorithms are used to answer continuous queries, also known as standing queries.
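To make the two answer modes concrete, the following is a minimal Python sketch, not taken from the entry itself, that maintains a mean over the whole prefix and a mean over a count-based sliding window. The function name `running_means` and the parameter `window` are illustrative assumptions. The prefix state is two numbers, the window state holds at most `window` items, and each arriving item is handled in constant time.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def running_means(stream: Iterable[float], window: int) -> Iterator[Tuple[float, float]]:
    """Yield (prefix_mean, window_mean) after each arriving item.

    Hypothetical helper for illustration: `window` is the number of most
    recent items covered by the sliding window (a count-based window).
    """
    count = 0          # number of items seen so far (prefix state)
    total = 0.0        # sum of all items seen so far (prefix state)
    recent = deque()   # the most recent items, at most `window` of them
    recent_sum = 0.0   # sum of the items currently in `recent`

    for x in stream:
        # Update the prefix state in O(1).
        count += 1
        total += x

        # Update the window state in O(1): add the new item, evict the oldest.
        recent.append(x)
        recent_sum += x
        if len(recent) > window:
            recent_sum -= recent.popleft()

        yield total / count, recent_sum / len(recent)


# Usage: an answer is available after every item, without rereading the stream.
results = list(running_means([2.0, 4.0, 6.0, 8.0], window=2))
```

An exact window aggregate of this kind needs storage proportional to the window size, which is one reason approximate summaries are of interest when windows are large.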

Stream processing algorithms can be categorized according to (1) what output they compute (e.g., what function is being computed, is the answer exact or approximate) and (2) how they compute the output (e.g., sampling vs. hashing, single-threaded vs. distributed, one-pass vs. several passes).
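As one illustration of this categorization, the sketch below shows reservoir sampling, a well-known technique that is sampling-based, single-threaded, and one-pass, and that supports approximate answers computed from the sample. It is chosen here only as an example and is not necessarily how the entry presents the topic. The function name `reservoir_sample` and the parameters `k` and `seed` are assumptions made for this sketch.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream in one pass.

    Working storage is O(k) regardless of the stream length.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i (0-indexed) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 10 items from a stream of 1000 without storing the stream.
sample = reservoir_sample(range(1000), k=10, seed=1)
```

A hashing-based summary would sit at a different point in the same categorization: it also works in one pass with limited storage, but it maps items to a fixed set of counters or registers instead of retaining a subset of the items.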

Overview

Stream processing algorithms operate sequentially over unbounded input streams and produce output streams. The input stream is...

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Waterloo, Waterloo, Canada