Optimal Broadcasting Strategies for Conjunctive Queries over Distributed Data

Theory of Computing Systems

Abstract

In a distributed context where data is dispersed over many computing nodes, monotone queries can be evaluated in an eventually consistent and coordination-free manner through a simple but naive broadcasting strategy that makes all data available on every computing node. In this paper, we investigate more economical broadcasting strategies for full conjunctive queries without self-joins, which transmit only the part of the local data necessary to evaluate the query at hand. We consider oblivious broadcasting strategies, which determine which local facts to broadcast independently of the data at other computing nodes. We introduce the notion of broadcast dependency set (BDS) as a sound and complete formalism to represent locally optimal oblivious broadcasting functions. We provide algorithms to construct a BDS for a given conjunctive query and study the complexity of various decision problems related to these algorithms.
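To make the contrast between naive and oblivious broadcasting concrete, the following Python sketch simulates both on a toy query. Everything in it is an illustrative assumption rather than the paper's construction: the query H(x, y) :- R(x, y), S(x, y), the function names, and the selective rule itself, which is one correct oblivious strategy for this particular query and is neither a BDS nor claimed to be optimal. A node skips a fact when the matching fact of the other relation is stored locally, because the only output that fact can contribute to is then already derived on that node.

```python
from typing import Callable, Dict, Set, Tuple

Fact = Tuple[str, str, str]  # (relation name, first value, second value)


def evaluate(facts: Set[Fact]) -> Set[Tuple[str, str]]:
    """Evaluate the toy query H(x, y) :- R(x, y), S(x, y) over a set of facts."""
    r = {(a, b) for rel, a, b in facts if rel == "R"}
    s = {(a, b) for rel, a, b in facts if rel == "S"}
    return r & s


def broadcast_all(local: Set[Fact]) -> Set[Fact]:
    """Naive strategy: send every local fact to every node."""
    return set(local)


def broadcast_selective(local: Set[Fact]) -> Set[Fact]:
    """Hypothetical oblivious strategy: the decision uses local data only.

    A fact is kept back when its partner fact (same values, other relation)
    is also local, since the single output it can produce is derived here.
    """
    partner = {"R": "S", "S": "R"}
    return {(rel, a, b) for rel, a, b in local
            if (partner[rel], a, b) not in local}


def run(distribution: Dict[str, Set[Fact]],
        strategy: Callable[[Set[Fact]], Set[Fact]]):
    """Simulate one broadcast round; return (union of outputs, facts sent)."""
    sent = {node: strategy(local) for node, local in distribution.items()}
    messages = set().union(*sent.values())
    output = set()
    for local in distribution.values():
        # Each node evaluates the query over its own data plus what it received.
        output |= evaluate(local | messages)
    return output, sum(len(facts) for facts in sent.values())


if __name__ == "__main__":
    distribution = {
        "n1": {("R", "a", "b"), ("S", "a", "b"), ("R", "c", "d")},
        "n2": {("S", "c", "d"), ("R", "e", "f")},
    }
    for strategy in (broadcast_all, broadcast_selective):
        output, cost = run(distribution, strategy)
        print(strategy.__name__, sorted(output), cost)
```

On this small distribution both strategies yield the same query result, while the selective one transmits fewer facts. The rule relies on each fact of this query fixing the complete valuation; for queries such as H(x, y, z) :- R(x, y), S(y, z) it would not be sound, which is the kind of query-dependent reasoning the BDS formalism is meant to capture.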

Notes

  1. Actually, this observation is the straightforward part of the CALM-conjecture [14]. It is the converse direction which is more surprising: that every query which can be evaluated in an eventually consistent and coordination-free manner has to be monotone [6].

  2. To simplify notation, in the definition of B and eval, we do not mention I and \(\mathcal {N}\) as they are implied by H.

  3. Notice that we abuse the notation and interpret variables as values.

  4. We use a sequence rather than a set \(\mathcal {R}\) to keep BDS-BUILD deterministic.

  5. For convenience we represent atomic types here by partial atomic types with sufficient (but not complete) conditions; e.g., we write \((C, x=y)\) to denote \((C, x=y \wedge y=x)\). Nevertheless, all of the listed pairs indeed correspond to a single (complete) atomic type.

References

  1. Afrati, F.N., Koutris, P., Suciu, D., Ullman, J.D.: Parallel skyline queries. In: International Conference on Database Theory (ICDT), pp 274–284 (2012)

  2. Afrati, F.N., Ullman, J.D.: Optimizing joins in a map-reduce environment. In: International Conference on Extending Database Technology (EDBT), pp 99–110 (2010)

  3. Alvaro, P., Conway, N., Hellerstein, J., Marczak, W.R.: Consistency analysis in Bloom: a CALM and collected approach. In: Conference on Innovative Data Systems Research (CIDR), pp 249–260 (2011)

  4. Alvaro, P., Conway, N., Hellerstein, J.M., Maier, D.: Blazes: Coordination analysis for distributed programs. In: International Conference on Data Engineering (ICDE), pp 52–63. IEEE (2014)

  5. Ameloot, T.J., Ketsman, B., Neven, F., Zinn, D.: Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the CALM-conjecture. In: Symposium on Principles of Database Systems (PODS), pp 64–75. ACM (2014)

  6. Ameloot, T.J., Neven, F., Bussche, J.V.d.: Relational transducers for declarative networking. J. ACM 60(2), 15 (2013)

  7. Beame, P., Koutris, P., Suciu, D.: Communication steps for parallel query processing. In: Symposium on Principles of Database Systems (PODS), pp 273–284 (2013)

  8. Beame, P., Koutris, P., Suciu, D.: Skew in parallel query processing. In: Symposium on Principles of Database Systems (PODS), pp 212–223 (2014)

  9. Buneman, P., Cheney, J., Tan, W.C., Vansummeren, S.: Curated databases. In: Symposium on Principles of Database Systems (PODS), pp 1–12. ACM (2008)

  10. Buneman, P., Khanna, S., Tan, W.C.: Why and where: A characterization of data provenance. In: International Conference on Database Theory (ICDT), volume 1973 of Lecture Notes in Computer Science, pp 316–330. Springer (2001)

  11. Conway, N., Marczak, W.R., Alvaro, P., Hellerstein, J.M., Maier, D.: Logic and lattices for distributed programming. In: Symposium on Cloud Computing (SoCC), p 1. ACM (2012)

  12. Fan, W., Geerts, F., Libkin, L.: On scale independence for querying big data. In: Symposium on Principles of Database Systems (PODS), pp 51–62. ACM (2014)

  13. Ganguly, S., Silberschatz, A., Tsur, S.: Parallel bottom-up processing of datalog queries. J. Log. Program. 14(1&2), 101–126 (1992)

  14. Hellerstein, J.M.: The declarative imperative: experiences and conjectures in distributed logic. SIGMOD Rec. 39(1), 5–19 (2010)

  15. Ketsman, B., Neven, F.: Optimal broadcasting strategies for conjunctive queries over distributed data. In: International Conference on Database Theory (ICDT), pp 291–307 (2015)

  16. Koutris, P., Suciu, D.: Parallel evaluation of conjunctive queries. In: Symposium on Principles of Database Systems (PODS), pp 223–234 (2011)

  17. Meliou, A., Gatterbauer, W., Halpern, J.Y., Koch, C., Moore, K.F., Suciu, D.: Causality in databases. IEEE Data Engineering Bulletin 33(3), 59–67 (2010)

  18. Meliou, A., Gatterbauer, W., Moore, K.F., Suciu, D.: The complexity of causality and responsibility for query answers and non-answers. Proceedings of the VLDB Endowment (PVLDB) 4(1), 34–45 (2010)

  19. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: USENIX Symposium on Networked Systems Design and Implementation (NSDI), pp 15–28. USENIX Association (2012)

  20. Zinn, D., Green, T.J., Ludäscher, B.: Win-move is coordination-free (sometimes). In: International Conference on Database Theory (ICDT), pp 99–113 (2012)

Acknowledgment

We thank Phokion Kolaitis for raising the question of whether it is always necessary to broadcast all the data in the context of the work in [5]. We thank the reviewers for their in-depth comments and numerous suggestions for improving the presentation of the results.

Author information

Corresponding author

Correspondence to Bas Ketsman.

Additional information

Bas Ketsman is a PhD Fellow of the Research Foundation - Flanders (FWO).

About this article

Cite this article

Ketsman, B., Neven, F. Optimal Broadcasting Strategies for Conjunctive Queries over Distributed Data. Theory Comput Syst 61, 233–260 (2017). https://doi.org/10.1007/s00224-016-9719-8

  • DOI: https://doi.org/10.1007/s00224-016-9719-8
