Abstract
The number of real-time information sources, or so-called streams, has rapidly increased, leading to a greater demand for complex analyses over streams. Although many stream analysis methods exist, aggregation is fundamental to ascertain higher levels of knowledge from raw data. In particular, sliding-window aggregation, where aggregations over sliding windows are repeatedly computed, is useful in many real-life applications. Two stacks is the state-of-the-art method to compute sliding-window aggregations incrementally with a O(1) time complexity. However, its performance seriously degrades as the window size increases due to the high overhead to maintain its index. To address this problem, this paper proposes a linear bidirectional index (L-BiX) that exploits two kinds of partial aggregations. Specifically, forward (old-to-new) and backward (new-to-old) aggregations allow efficient computations in an incremental manner. The proposed algorithm requires the same time complexity as two stacks (O(1)). Our experimental evaluation shows that the throughput of L-BiX can be faster by up to 1.71 times than that of two stacks with a 50% smaller memory footprint.
Similar content being viewed by others
References
Abadi DJ et al (2003) Aurora: a new model and architecture for data stream management. VLDB J 2(12):120–139
Abadi DJ et al (2005) The design of the borealis stream processing engine. In: Proceedings of the CIDR, pp 277–289
Akidau T et al (2013) MillWheel: fault-tolerant stream processing at internet scale. Proc VLDB Endow 11(6):1033–1044
Motwani R et al (2003) Query processing, approximation, and resource management in a data stream management system. In: Proceedings of the CIDR
Toshniwal A et al (2014) Storm@twitter. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 147–156
Murray DG et al (2013) Naiad: a timely dataflow system. In: Proceedings of the ACM symposium on operating systems principles, pp 439–455
Ruxton GD (2006) The unequal variance t test is an underused alternative to student’s t test and the Mann Whitney U test. Behav Ecol 17(4):688–690
Shein AU, Chrysanthis PK, Labrinidis A (2015) Processing of aggregate continuous queries in a distributed environment. Real-time business intelligence and analytics - international workshops, BIRTE 2015, Kohala Coast, HI, USA, August 31, 2015, BIRTE 2016, New Delhi, India, September 5, 2016, BIRTE 2017, Munich, Germany, August 28, 2017, Revised Selected Papers, pp 45–62. https://doi.org/10.1007/978-3-030-24124-7_4
Gulisano V et al (2012) Streamcloud: an elastic and scalable data streaming system. IEEE Trans Parallel Distrib Syst 12(23):2351–2351
Arasu A et al (2006) The CQL continuous query language: semantic foundations and query execution. VLDB J 2(15):121–142
Li J et al (2005) Semantics and evaluation techniques for window aggregates in data streams. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 311–322
Akidau T et al (2015) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proc VLDB Endow 12(8):1792–1803
Gedik B (2014) Generic windowing support for extensible stream processing systems. Softw Pract Exp 9(44):1105–1128
Ali MH et al (2010) Spatio-temporal stream processing in microsoft streaminsight. Softw Pract Exp 2(33):69–74
Yang J, Widom J (2003) Incremental computation and maintenance of temporal aggregates. VLDB J 3(12):262–283
Moon B et al (2000) Scalable algorithms for large temporal aggregation. In: International conference on data engineering (ICDE), pp 145–154
Soisalon-Soininen E, Widmayer P (2003) Single and bulk updates in stratified trees: an amortized and worst-case analysis. In: Goos G, Hartmanis J, van Leeuwen J (eds) Computer science in perspective: essays dedicated to Thomas Ottmann, LNCS (Lecture Notes in Computer Science), vol 2598. Springer, Berlin, pp 278–292
Bou S et al (2019) Streamingcube-based analytical framework for environmental data analysis. In: BigComp 2019, pp 1–8
Bou S et al (2019) Scalable keyword search over relational data streams by aggressive candidate network consolidation. Inf Syst 81:117–135
Bou S et al (2014) Keyword search with path-based filtering over XML streams. In: The 33rd IEEE symposium on reliable distributed systems (SRDS 2014), pp 337–338
Bou S et al (2014) Filtering XML streams by XPath and Keywords. In: Proceedings of the 16th international conference on information integration and web-based applications and services (iiWAS 2014), pp 410–419
Bou S et al (2015) Path-based keyword search over XML streams. Int J Web Inf Syst (IJWIS) 11(3):347–369
Bou S et al (2016) An improved method of keyword search over relational data streams by aggressive candidate network consolidation. In: Database and expert systems applications, pp 336–351
Qlik. https://www.qlik.com/. Accessed 18 Mar 2019
Versive. https://jp.cloudera.com/solutions/gallery/versive-security-engine.html. Accessed 18 Mar 2019
Ayasdi. https://www.ayasdi.com/. Accessed 18 Mar 2019
CDS. https://www.himss.org/clinical-decision-support-cds. Accessed 18 Mar 2019
Blount M et al (2010) Real-time analysis for intensive care: development and deployment of the artemis analytic system. IEEE Eng Med Biol Mag 29:110–118
EHRs. https://www.capterra.com/infographics/top-emr-software. Accessed 18 Mar 2019
Gray J et al (1997) Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub totals. Data Min Knowl Discov 1(1):29–53
Wesley R, Xu F (2016) Incremental computation of common windowed holistic aggregates. Proc VLDB Endow 9(12):1221–1232
Li J (2005) No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD Rec 34(1):39–44
Krishnamurthy S et al (2006) On-the-fly sharing for streamed aggregation. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 623–634
Carbone P et al (2016) Cutty: aggregate sharing for user-defined windows. In: Proceedings of the ACM international on conference on information and knowledge management, pp 1201–1210
Bhatotia P et al (2014) Slider: incremental sliding window analytics. In: Proceedings of the 15th international middleware conference, pp 61–72
Shein AU et al (2017) Flatfit: accelerated incremental sliding-window aggregation for real-time analytics. In: Proceedings of international conference on scientific and statistical database management, vol 5, pp 1–12
Tangwongsan K et al (2015) General incremental sliding-window aggregation. Proc VLDB Endow 8(7):702–713
Tangwongsan K et al (2017) Low-latency sliding-window aggregation in worst-case constant time. In: Proceedings of the 11th ACM international conference on distributed and event-based systems, pp 66–77
Arasu A, Widom J (2004) Resource sharing in continuous sliding-window aggregates. In: Proceedings of the VLDB conference, pp 336–347
Arasu A, Manku GS (2004) Approximate counts and quantiles over sliding windows. In: Proceedings of the ACM SIGMOD–SIGACT–SIGART symposium on principles of database systems, pp 286–296
Bulut A, Singh AK (2003) SWAT: hierarchical stream summarization in large networks. In: International conference on data engineering (ICDE), pp 303–314
Jiao Y (2006) Maintaining stream statistics over multiscale sliding windows. ACM Trans Database Syst (TODS) 31(4):1305–1334
Gibbons PB, Tirthapura S et al (2002) Distributed streams algorithms for sliding windows. In: Proceedings of the ACM symposium on parallel algorithms and architectures, pp 63–72
Guirguis S et al (2011) Optimized processing of multiple aggregate continuous queries. In: Proceedings of the ACM international conference on information and knowledge management, pp 1515–1524
Shein AU et al (2015) F1: accelerating the optimization of aggregate continuous queries. In: Proceedings of the ACM international conference on information and knowledge management, pp 1151–1160
Jerzak Z et al (2012) The DEBS 2012 grand challenge. In: Proceedings of the international conference on distributed event-based systems, pp 393–398
Benchmark T (2015) TPC-H benchmark dataset. http://www.tpc.org/tpch/. Accessed 18 Mar 2019
Acknowledgements
This work has been partly supported by the NICT BigClouT Project and JSPS KAKENHI Grant Number JP19H04114.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Bou, S., Kitagawa, H. & Amagasa, T. L-BiX: incremental sliding-window aggregation over data streams using linear bidirectional aggregating indexes. Knowl Inf Syst 62, 3107–3131 (2020). https://doi.org/10.1007/s10115-020-01444-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-020-01444-5