
Processing of Aggregate Continuous Queries in a Distributed Environment

  • Anatoli U. Shein
  • Panos K. Chrysanthis
  • Alexandros Labrinidis
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 337)

Abstract

Data Stream Management Systems (DSMSs) performing online analytics rely on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). In this paper, we study the problem of generating high-quality execution plans for ACQs in DSMSs deployed on multi-node (multi-core and multi-processor) distributed environments. Towards this goal, we classify optimizers based on how they partition the workload among computing nodes and on their use of the concept of Weavability, which the state-of-the-art WeaveShare optimizer uses to selectively combine ACQs and produce low-cost execution plans for single-node environments. For each category, we propose an optimizer that either adopts an existing strategy or develops a new one for grouping ACQs and assigning them to computing nodes. We implement and experimentally compare all of our proposed optimizers in terms of (1) keeping the total cost of the ACQ execution plan low and (2) balancing the load among the computing nodes. Our extensive experimental evaluation shows that our newly developed Weave-Group to Nodes (\(WG_{TN}\)) and Weave-Group Inserted (\(WG_{I}\)) optimizers produce plans of significantly higher quality than the rest of the optimizers. \(WG_{TN}\) minimizes the total cost, making it more suitable from a client perspective, while \(WG_{I}\) achieves load balancing, making it more suitable from a system perspective.
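The two evaluation criteria named in the abstract, total plan cost and load balance across nodes, can pull in opposite directions. The sketch below illustrates that tension with a toy model only. Everything in it is an assumption: `group_cost`, the sharing factor `ALPHA`, and the query costs are hypothetical stand-ins for the real weavability-based cost model, which the abstract does not specify, and neither `weave_all` nor `balance_greedy` is the paper's \(WG_{TN}\) or \(WG_{I}\) algorithm.

```python
from typing import List, Tuple

# Hypothetical sharing factor (assumption): when ACQs are woven together on one
# node, every query except the most expensive adds only ALPHA of its standalone cost.
ALPHA = 0.4


def group_cost(costs: List[float]) -> float:
    """Toy cost of running a woven group of ACQs on a single node (assumed model)."""
    if not costs:
        return 0.0
    top = max(costs)
    return top + ALPHA * (sum(costs) - top)


def evaluate(plan: List[List[float]]) -> Tuple[float, float]:
    """Return (total cost over all nodes, load of the busiest node)."""
    loads = [group_cost(node) for node in plan]
    return sum(loads), max(loads)


def weave_all(costs: List[float], n_nodes: int) -> List[List[float]]:
    """Cost-minimizing extreme in this toy model: one woven group on node 0."""
    plan: List[List[float]] = [[] for _ in range(n_nodes)]
    plan[0] = list(costs)
    return plan


def balance_greedy(costs: List[float], n_nodes: int) -> List[List[float]]:
    """Balance-oriented baseline: place the largest query next onto the least-loaded node."""
    plan: List[List[float]] = [[] for _ in range(n_nodes)]
    for c in sorted(costs, reverse=True):
        target = min(range(n_nodes), key=lambda i: group_cost(plan[i]))
        plan[target].append(c)
    return plan


if __name__ == "__main__":
    queries = [4.0, 3.0, 3.0, 2.0]  # made-up standalone ACQ costs
    for name, plan in [("weave_all", weave_all(queries, 2)),
                       ("balance_greedy", balance_greedy(queries, 2))]:
        total, peak = evaluate(plan)
        print(f"{name}: total={total:.1f} peak={peak:.1f}")
```

In this toy model, weaving every query on one node yields the lowest total cost (7.2) but puts the entire load on a single node, while the greedy spread raises the total cost (9.0) and lowers the peak node load (4.8). This mirrors, only qualitatively, the client-versus-system distinction the abstract draws between minimizing total cost and achieving load balance.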

Notes

Acknowledgments

We would like to thank Cory Thoma, Nikolaos Katsipoulakis, and the anonymous reviewers for the insightful feedback and Mark Silvis for his help with copyediting. This work was supported in part by NSF award CBET-1250171, a gift from EMC/Greenplum and an ACM SoCC 2015 Student Scholarship.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Anatoli U. Shein (1)
  • Panos K. Chrysanthis (1)
  • Alexandros Labrinidis (1)

  1. Department of Computer Science, University of Pittsburgh, Pittsburgh, USA
