Encyclopedia of Big Data Technologies

2019 Edition
| Editors: Sherif Sakr, Albert Y. Zomaya

Sliding-Window Aggregation Algorithms

  • Kanat Tangwongsan
  • Martin Hirzel
  • Scott Schneider
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_157

Abstract

Sliding-window aggregation summarizes a collection of recent streaming data, capturing the most recent happenings as well as some history. Algorithms for this problem must maintain an aggregate value as new data items are inserted into the window when they arrive and old data items are evicted from the window when they expire. Supporting this efficiently poses algorithmic challenges, especially for non-invertible aggregation functions such as max, for which there is no way to “subtract off” expiring items. This chapter provides a brief overview of this area of research and explores a number of sliding-window aggregation algorithms, ranging from simple to sophisticated. Real-world use cases are also given to showcase problem scenarios where sliding-window aggregation is applicable.
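To make the contrast between invertible and non-invertible aggregation concrete, the following is a minimal Python sketch, not code taken from the chapter. The class and method names (SubtractOnEvictSum, TwoStacksMax, insert, evict, query) are hypothetical names chosen for illustration. The sum window relies on subtraction to remove the oldest item. The max window uses a two-stack queue with running maxima, one well-known way to support eviction without an inverse; it is shown here as an assumption about what such an algorithm can look like.

```python
from collections import deque


class SubtractOnEvictSum:
    """Sliding-window sum: works because + has an inverse (-)."""

    def __init__(self):
        self.items = deque()  # window contents, oldest on the left
        self.total = 0

    def insert(self, value):
        self.items.append(value)
        self.total += value

    def evict(self):
        # "Subtract off" the oldest item; impossible for max.
        self.total -= self.items.popleft()

    def query(self):
        return self.total


class TwoStacksMax:
    """Sliding-window max without an inverse, using two stacks.

    Each stack entry is (value, max of this entry and all entries below it),
    so the top entry of a stack summarizes the whole stack.
    """

    def __init__(self):
        self.front = []  # oldest item on top; evictions pop from here
        self.back = []   # newest item on top; insertions push here

    @staticmethod
    def _push(stack, value):
        running = value if not stack else max(value, stack[-1][1])
        stack.append((value, running))

    def insert(self, value):
        self._push(self.back, value)

    def evict(self):
        if not self.front:
            # Flip: move all items to the front stack so the oldest ends on top.
            while self.back:
                value, _ = self.back.pop()
                self._push(self.front, value)
        self.front.pop()  # raises IndexError if the window is empty

    def query(self):
        tops = [s[-1][1] for s in (self.front, self.back) if s]
        return max(tops)  # raises ValueError if the window is empty
```

In this sketch, insert and query take constant time, and evict takes amortized constant time, because each item is moved from the back stack to the front stack at most once.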


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Kanat Tangwongsan (1)
  • Martin Hirzel (2)
  • Scott Schneider (2)

  1. Mahidol University International College, Salaya, Thailand
  2. IBM Research AI, Yorktown Heights, USA