Processing of Aggregate Continuous Queries in a Distributed Environment

  • Conference paper
Real-Time Business Intelligence and Analytics (BIRTE 2015, BIRTE 2016, BIRTE 2017)

Abstract

Data Stream Management Systems (DSMSs) performing online analytics rely on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). In this paper, we study the problem of generating high-quality execution plans for ACQs in DSMSs deployed on multi-node (multi-core and multi-processor) distributed environments. Towards this goal, we classify optimizers based on how they partition the workload among computing nodes and on their usage of the concept of Weavability, which the state-of-the-art WeaveShare optimizer uses to selectively combine ACQs and produce low-cost execution plans for single-node environments. For each category, we propose an optimizer that either adopts an existing strategy or develops a new one for grouping ACQs and assigning them to computing nodes. We implement and experimentally compare all of our proposed optimizers in terms of (1) keeping the total cost of the ACQ execution plan low and (2) balancing the load among the computing nodes. Our extensive experimental evaluation shows that our newly developed Weave-Group to Nodes (\(WG_{TN}\)) and Weave-Group Inserted (\(WG_{I}\)) optimizers produce plans of significantly higher quality than the rest of the optimizers. \(WG_{TN}\) minimizes the total cost, making it more suitable from a client perspective, and \(WG_{I}\) achieves load balancing, making it more suitable from a system perspective.
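
The two evaluation criteria named in the abstract, total plan cost and load balance across nodes, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `assign_least_loaded` and `plan_metrics` are hypothetical, each query carries a fixed cost, and the placement rule is a generic greedy least-loaded heuristic. It is not \(WG_{TN}\), \(WG_{I}\), or WeaveShare, and it ignores Weavability entirely, so the cost of a query does not change with the queries it shares a node with.

```python
from typing import Dict, List, Tuple


def assign_least_loaded(costs: Dict[str, float], num_nodes: int) -> List[List[str]]:
    """Generic greedy baseline (an assumption, not one of the paper's optimizers).

    Queries are visited from most to least expensive, and each one is placed
    on the node whose accumulated load is currently the smallest.
    """
    nodes: List[List[str]] = [[] for _ in range(num_nodes)]
    loads = [0.0] * num_nodes
    for name in sorted(costs, key=costs.get, reverse=True):
        target = loads.index(min(loads))  # first node with the minimum load
        nodes[target].append(name)
        loads[target] += costs[name]
    return nodes


def plan_metrics(costs: Dict[str, float], nodes: List[List[str]]) -> Tuple[float, float]:
    """Return (total cost, maximum node load) for a given assignment.

    Total cost is the sum of all node loads; the maximum node load is a
    simple proxy for how well the load is balanced.
    """
    loads = [sum(costs[q] for q in node) for node in nodes]
    return sum(loads), max(loads)


# Hypothetical per-query costs, chosen only to make the example concrete.
example_costs = {"q1": 5.0, "q2": 4.0, "q3": 3.0, "q4": 2.0}
example_plan = assign_least_loaded(example_costs, num_nodes=2)
total_cost, max_load = plan_metrics(example_costs, example_plan)
```

With fixed per-query costs the total is the same for every assignment, so only the maximum load varies. The tension the abstract describes between minimizing total cost and balancing load arises only once grouping queries on a node changes their combined cost, which this sketch deliberately leaves out.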

Acknowledgments

We would like to thank Cory Thoma, Nikolaos Katsipoulakis, and the anonymous reviewers for the insightful feedback and Mark Silvis for his help with copyediting. This work was supported in part by NSF award CBET-1250171, a gift from EMC/Greenplum and an ACM SoCC 2015 Student Scholarship.

Author information

Corresponding author

Correspondence to Anatoli U. Shein.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Shein, A.U., Chrysanthis, P.K., Labrinidis, A. (2019). Processing of Aggregate Continuous Queries in a Distributed Environment. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_4

  • DOI: https://doi.org/10.1007/978-3-030-24124-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24123-0

  • Online ISBN: 978-3-030-24124-7

  • eBook Packages: Computer Science, Computer Science (R0)
