Distribution Policies for Datalog

  • Bas Ketsman
  • Aws Albarghouthi
  • Paraschos Koutris
Article
Part of the following topical collections:
  1. Special Issue on Database Theory (2018)

Abstract

Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.
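To make the notion of a distribution policy concrete, the following is a minimal sketch for the transitive-closure program `T(x,y) :- E(x,y)` and `T(x,z) :- T(x,y), E(y,z)`. It is an illustration under stated assumptions, not the framework defined in the paper: the function names (`centralized_tc`, `distributed_tc`, `by_source`, `by_target`), the hash-based routing, and the round structure are all hypothetical choices made for this example. A policy here maps each fact to the set of servers that receive it, and the evaluation alternates a communication phase with a purely local computation phase.

```python
def centralized_tc(edges):
    """Single-node fixpoint for T(x,y) :- E(x,y).  T(x,z) :- T(x,y), E(y,z)."""
    t = set(edges)
    while True:
        new = {(x, z) for (x, y) in t for (y2, z) in edges if y == y2} - t
        if not new:
            return t
        t |= new


def by_source(fact, n):
    """Hypothetical policy: route a binary fact by its first attribute."""
    return {hash(fact[0]) % n}


def by_target(fact, n):
    """Hypothetical policy: route a binary fact by its second attribute."""
    return {hash(fact[1]) % n}


def distributed_tc(edges, n, policy_e, policy_t):
    """Round-based evaluation on n servers (a sketch, not the paper's model).

    policy_e / policy_t map an E-fact / T-fact to the servers that store it.
    Returns the union of all T-facts found and the number of rounds in which
    some server received a new fact.
    """
    local_e = [set() for _ in range(n)]
    local_t = [set() for _ in range(n)]
    for fact in edges:
        for s in policy_e(fact, n):
            local_e[s].add(fact)

    # Base rule T(x,y) :- E(x,y), applied locally at every server.
    derived = [set(local_e[s]) for s in range(n)]
    rounds = 0
    while True:
        # Communication phase: ship derived T-facts according to the policy.
        changed = False
        for facts in derived:
            for fact in facts:
                for s in policy_t(fact, n):
                    if fact not in local_t[s]:
                        local_t[s].add(fact)
                        changed = True
        if not changed:
            break
        rounds += 1
        # Computation phase: the recursive rule sees local facts only.
        derived = [
            {(x, z) for (x, y) in local_t[s] for (y2, z) in local_e[s] if y == y2}
            for s in range(n)
        ]
    return set().union(*local_t), rounds


edges = {(1, 2), (2, 3), (3, 4)}
expected = centralized_tc(edges)

# E routed by source and T routed by target co-locate T(x,y) with E(y,z).
good, _ = distributed_tc(edges, 2, by_source, by_target)
# Routing T by source instead separates the join partners on this instance.
bad, _ = distributed_tc(edges, 2, by_source, by_source)

print("matches centralized result:", good == expected)
print("facts missed by the second policy:", sorted(expected - bad))
```

On this input, the first policy pair reproduces the centralized result, while the second one misses derived facts because the two atoms of the recursive rule never meet on the same server. The check covers a single instance only; parallel-correctness as discussed in the abstract is a property that must hold for every input instance.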

Keywords

Datalog queries, Distributed evaluation, Distribution policies

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Vrije Universiteit Brussel, Brussels, Belgium
  2. University of Wisconsin-Madison, Madison, USA
