In-order sliding-window aggregation in worst-case constant time

Tangwongsan, Kanat; Hirzel, Martin; Schneider, Scott

doi:10.1007/s00778-021-00668-3

Regular Paper
Published: 03 June 2021

Volume 30, pages 933–957, (2021)

The VLDB Journal

Abstract

Sliding-window aggregation is a widely used approach for extracting insights from the most recent portion of a data stream. While aggregations of interest can usually be expressed as binary operators that are associative, they are not necessarily commutative or invertible. Non-invertible operators, however, are difficult to support efficiently. DABA is the first algorithm for sliding-window aggregation with worst-case constant time. Prior to DABA, the best published algorithms would require \(O(\log n)\) aggregation steps per window operation for a window of size \(n\). For strictly in-order streams, this bound could be improved to \(O(1)\) aggregation steps in the amortized sense, but it was not known how to achieve an \(O(1)\) bound in the worst case, which is critical for latency-sensitive applications. In this article, besides describing DABA in more detail, we introduce a new variant, DABA Lite, which achieves the same time bounds in less memory. Whereas DABA requires space for storing \(2n\) partial aggregates, DABA Lite only requires space for \(n+2\) partial aggregates. Our experiments on synthetic and real data support the theoretical findings.
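
To make the amortized bound for in-order streams concrete, here is a minimal sketch of the well-known two-stacks technique for FIFO aggregation. It is not DABA or DABA Lite, and it is not taken from the article: the class name `TwoStacksWindow` and its methods `insert`, `evict`, and `query` are hypothetical names chosen for illustration. The sketch only assumes an associative operator with an identity element, so it works for operators that are neither commutative nor invertible. An eviction occasionally moves every element from one stack to the other, which is exactly the worst-case spike that the abstract says matters for latency-sensitive applications.

```python
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class TwoStacksWindow(Generic[T]):
    """FIFO window over an associative operator; amortized O(1) per operation."""

    def __init__(self, combine: Callable[[T, T], T], identity: T) -> None:
        self.combine = combine
        self.identity = identity
        # front: older elements, oldest on top (end of list). Each entry stores
        # (value, aggregate of this value and all newer values in front).
        self.front: List[Tuple[T, T]] = []
        # back: newer elements, newest on top. Each entry stores
        # (value, aggregate of all values in back up to and including this one).
        self.back: List[Tuple[T, T]] = []

    def _agg(self, stack: List[Tuple[T, T]]) -> T:
        return stack[-1][1] if stack else self.identity

    def insert(self, value: T) -> None:
        """Append the newest element: one combine step."""
        self.back.append((value, self.combine(self._agg(self.back), value)))

    def evict(self) -> None:
        """Remove the oldest element; O(n) when front is empty, O(1) otherwise."""
        if not self.front:
            if not self.back:
                raise IndexError("evict from empty window")
            # Flip: move everything from back to front, newest first, so the
            # oldest element ends up on top of front.
            while self.back:
                value, _ = self.back.pop()
                self.front.append((value, self.combine(value, self._agg(self.front))))
        self.front.pop()

    def query(self) -> T:
        """Aggregate of the whole window in arrival order: one combine step."""
        return self.combine(self._agg(self.front), self._agg(self.back))
```

With `max` as the operator (associative but not invertible), inserting 5, 1, 4 and then evicting once leaves a window whose `query()` is 4. With string concatenation (associative but not commutative), inserting "a", "b", "c", evicting once, and inserting "d" gives "bcd", which shows that arrival order is preserved.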

Notes

https://github.com/IBM/sliding-window-aggregators.
Amortized is sufficient but worst-case guarantees will do as well.
This choice may appear counterintuitive; however, on our hardware, using a comparison is experimentally faster than using modulo. Bit masking is indeed the fastest option but only works when the size is a power of two.
We call the C++ function with the parameter .
Available at https://github.com/IBM/sliding-window-aggregators. Our experiments use the C++ implementations and benchmarks, as well as the Python scripts from commit .
Our implementation performs an optimization where the same stack is reused across queries. This is safe because the stack is always empty at the end of a query. For dynamic windows, the number of indices involved can be non-constant. Avoiding the recreation of the stack and reusing the same memory makes about a 20% difference in throughput, but does not change FlatFIT’s overall comparative performance.
In our benchmarking framework, the code that initializes the aggregator has no knowledge of the actual window size, so we use an upper bound on the window sizes.
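
The note above on comparison, modulo, and bit masking does not say what is being wrapped. Assuming it refers to advancing an index in a fixed-size circular buffer, the three options can be sketched as follows. The function names are hypothetical, and the sketch only illustrates their behavior: it says nothing about their relative speed, which the authors measured in C++ on their own hardware.

```python
def next_index_compare(i: int, capacity: int) -> int:
    """Advance i by one slot, wrapping with a comparison."""
    i += 1
    return 0 if i == capacity else i


def next_index_modulo(i: int, capacity: int) -> int:
    """Advance i by one slot, wrapping with the modulo operator."""
    return (i + 1) % capacity


def next_index_mask(i: int, capacity: int) -> int:
    """Advance i by one slot, wrapping with a bit mask.

    Only correct when capacity is a power of two.
    """
    return (i + 1) & (capacity - 1)
```

For a capacity of 8, all three functions agree on every index from 0 to 7. For a capacity of 6, the comparison and modulo versions still wrap index 5 to 0, while the masked version returns 4, which is why masking is restricted to power-of-two sizes.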

Author information

Authors and Affiliations

Mahidol University International College, Nakhon Pathom, Thailand
Kanat Tangwongsan
IBM Research, New York, USA
Martin Hirzel & Scott Schneider

Corresponding author

Correspondence to Kanat Tangwongsan.

About this article

Cite this article

Tangwongsan, K., Hirzel, M. & Schneider, S. In-order sliding-window aggregation in worst-case constant time. The VLDB Journal 30, 933–957 (2021). https://doi.org/10.1007/s00778-021-00668-3

Received: 09 August 2020
Revised: 25 January 2021
Accepted: 10 April 2021
Published: 03 June 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s00778-021-00668-3
