Datenbank-Spektrum, Volume 15, Issue 3, pp 203–211

Placement-Safe Operator-Graph Changes in Distributed Heterogeneous Data Stream Systems

  • Niko Pollner
  • Christian Steudtner
  • Klaus Meyer-Wegener

Focus article (Schwerpunktbeitrag)

Abstract

Data stream processing systems enable querying continuous data without first storing it. Data stream queries may combine data from distributed sources, such as the different sensors of an environmental sensing application. This suggests distributed query processing, which can reduce the amount of transferred data and makes more processing resources available.

However, distributed query processing on possibly heterogeneous platforms complicates query optimization. This article investigates query optimization through operator graph changes and its interaction with operator placement on heterogeneous distributed systems. Operator graph changes applied before placement may rule out certain operator placements, so the resource consumption of query execution may increase unexpectedly. Based on the operator placement problem modeled as a task assignment problem (TAP), we prove that it is NP-hard to decide in general whether an arbitrary operator graph change may negatively influence the best possible TAP solution. We present conditions for several specific operator graph changes that guarantee that the best possible TAP solution is preserved.
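The interaction between a graph change and placement can be illustrated with a toy example. The following is a minimal sketch, not the article's formulation: it assumes a simple additive cost model (execution cost per operator and node, plus a communication cost for every edge that crosses nodes) and solves the resulting assignment problem by brute force. The function `best_placement`, the operator and node names, and all cost figures are hypothetical. In this example, fusing `filter` and `aggregate` lowers the pure execution cost, but the fused operator can no longer run on the sensor node, so the best achievable placement becomes more expensive.

```python
from itertools import product

INF = float("inf")

def best_placement(operators, nodes, exec_cost, edges, comm_cost_per_unit):
    """Brute-force the cheapest assignment of operators to nodes.

    exec_cost[(op, node)] is the cost of running op on node; a missing
    entry means the node cannot run the operator (assumed cost: infinity).
    edges[(a, b)] is the data volume sent from operator a to operator b.
    Communication is free when both operators share a node (assumption).
    """
    best_cost, best_assignment = INF, None
    for choice in product(nodes, repeat=len(operators)):
        placement = dict(zip(operators, choice))
        cost = sum(exec_cost.get((op, placement[op]), INF) for op in operators)
        cost += sum(
            volume * comm_cost_per_unit
            for (a, b), volume in edges.items()
            if placement[a] != placement[b]
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, placement
    return best_cost, best_assignment

nodes = ["sensor", "server"]

# Original graph: source -> filter -> aggregate.
# The source is pinned to the sensor node; the sensor cannot aggregate.
original_cost, original_placement = best_placement(
    operators=["source", "filter", "aggregate"],
    nodes=nodes,
    exec_cost={
        ("source", "sensor"): 0,
        ("filter", "sensor"): 1,
        ("filter", "server"): 1,
        ("aggregate", "server"): 2,
    },
    edges={("source", "filter"): 10, ("filter", "aggregate"): 1},
    comm_cost_per_unit=1,
)

# Changed graph: filter and aggregate fused into one operator that is
# cheaper to execute (2 instead of 1 + 2) but only runs on the server.
fused_cost, fused_placement = best_placement(
    operators=["source", "filter_aggregate"],
    nodes=nodes,
    exec_cost={
        ("source", "sensor"): 0,
        ("filter_aggregate", "server"): 2,
    },
    edges={("source", "filter_aggregate"): 10},
    comm_cost_per_unit=1,
)

print(original_cost, original_placement)
print(fused_cost, fused_placement)
```

With these assumed numbers, the original graph allows filtering on the sensor node before the data crosses the network, whereas the fused graph forces the unfiltered stream to the server. Exhaustive search is used only because the example is tiny; it does not scale to realistic operator graphs.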

Keywords

Data stream systems; Query optimization; Operator placement; Distributed data stream processing; Heterogeneous systems

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Niko Pollner (1)
  • Christian Steudtner (2)
  • Klaus Meyer-Wegener (1)

  1. Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Computer Science 6 (Data Management), Erlangen, Germany
  2. Deutsche Anwaltshotline AG, Nürnberg, Germany
