
Scaling up the performance of more powerful Datalog systems on multicore machines


Extending RDBMS technology to achieve performance and scalability for queries that are much more powerful than those of SQL-2 has been the goal of deductive database research for more than thirty years. The \(\mathcal {D}e\mathcal {A}\mathcal {L}\mathcal {S}\) system has made major progress toward this goal by (1) providing Datalog extensions that support the more powerful recursive queries needed in advanced applications, (2) achieving superior performance on both traditional recursive queries and those made possible by the new extensions, and (3) delivering performance competitive with commercial RDBMSs on non-recursive queries. In this paper, we focus on the techniques used to support the in-memory evaluation of Datalog programs on multicore machines. In \(\mathcal {D}e\mathcal {A}\mathcal {L}\mathcal {S}\), a Datalog program is represented as an AND/OR tree, and during parallel evaluation multiple copies of the same AND/OR tree are used to access the database tables concurrently. We describe compilation techniques that (1) recognize when a given program is lock-free, (2) transform a locking program into a lock-free program, and (3) find an efficient parallel plan that correctly evaluates the program while minimizing the use of locks and other overhead required for parallel evaluation. Extensive experiments demonstrate the effectiveness of the proposed techniques.




  1. The actual name “Datalog” was introduced by David Maier several years later.

  2. Another way to implement this query is to use recursive common table expressions, but the WHILE-loop approach performs significantly better in our experiments.

  3. There are other possible partitioning strategies, and the choice will be discussed later in the section.

  4. Currently, the user determines when to force the materialization of a relation with an annotation in the program.

  5. The idea of this optimization is that, when certain conditions are satisfied, there is no need to consider any value other than the maximum value produced by the mcount goals, i.e., the current count values. \(\mathcal {D}e\mathcal {A}\mathcal {L}\mathcal {S}\) uses simple sufficient conditions that can be easily checked by a compiler, including (i) the values produced by the mcount goals are tested against monotonic Boolean conditions that evaluate to true iff they are true for the max values, or (ii) the values produced by the mcount term are fed to the final extraction rule, which disregards all values but the max ones. Similar conditions apply for msum.

  6. For a predicate p, R(p) denotes the relation that stores all tuples corresponding to facts about p; \(p[\overline{X}]\) denotes the tuple of arity \(|\overline{X}|\) obtained by retrieving the arguments of p whose positions belong to \(\overline{X}\), and it is treated as a multiset of arguments in equality checks.

  7. count(distinct) is replaced with count in \(\mathtt{query16}\), and order by and limit are ignored in our program. The evaluation time would not change significantly if we added these constructs, since most queries, with the exception of \(\mathtt{query3}\) and \(\mathtt{query10}\), return very few results.

  8. On the TPC-H benchmark, \(\mathcal {D}e\mathcal {A}\mathcal {L}\mathcal {S}\) is about 2\(\times \) faster than the version used in [55], owing to a function-inlining optimization.

  9. We treat the graph as a directed graph for \(\mathtt{4cycle}\), and as an undirected graph for \(\mathtt{3clique}\) and \(\mathtt{4clique}\).

  10. The single-processor version of DLV was downloaded from [10]. Although a parallel version [9] is available, it is either much slower than the single-processor version or it fails, because it is a 32-bit executable that cannot use the more than 4 GB of memory required by the evaluation.
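Condition (i) in note 5 can be illustrated with a minimal Python sketch. The function names, the sample counts, and the two predicates below are hypothetical and chosen only for illustration; this is not how the system implements the check. The sketch compares testing a condition against every intermediate count with testing it against the maximum only: the two agree for a monotonic condition and may disagree otherwise, which is why the sufficient condition is needed.

```python
from typing import Callable, Iterable, List


def holds_for_some(values: Iterable[int], cond: Callable[[int], bool]) -> bool:
    """Naive check: test the condition against every intermediate count."""
    return any(cond(v) for v in values)


def holds_for_max(values: List[int], cond: Callable[[int], bool]) -> bool:
    """Optimized check: test the condition against the maximum count only."""
    return cond(max(values))


# Hypothetical intermediate counts, as a monotonic count aggregate might produce them.
counts = [1, 2, 3, 4]


def at_least_3(c: int) -> bool:
    # Monotonic: once true for some count, it stays true for every larger count.
    return c >= 3


def exactly_2(c: int) -> bool:
    # Not monotonic: true for 2 but false for larger counts.
    return c == 2


# For the monotonic condition, looking at the max alone gives the same answer.
monotone_agrees = holds_for_some(counts, at_least_3) == holds_for_max(counts, at_least_3)

# For the non-monotonic condition, the two checks disagree on this input.
non_monotone_agrees = holds_for_some(counts, exactly_2) == holds_for_max(counts, exactly_2)
```

With a threshold-style test such as `at_least_3`, discarding all values except the maximum loses nothing; with `exactly_2` the shortcut would change the result, so a compiler would have to rule such conditions out before applying the optimization.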


  1. Aref, M., ten Cate, B., Green, T.J., Kimelfeld, B., et al.: Design and implementation of the LogicBlox system. In: SIGMOD, pp. 1371–1382. ACM, New York (2015)
  2. Arni, F., Ong, K., Tsur, S., Wang, H., Zaniolo, C.: The deductive database system LDL++. TPLP 3(1), 61–94 (2003)
  3. Bell, D.A., Shao, J., Hull, M.E.C.: A pipelined strategy for processing recursive queries in parallel. Data Knowl. Eng. 6(5), 367–391 (1991)
  4. Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. CIDR 5, 225–237 (2005)
  5. Bravenboer, M., Smaragdakis, Y.: Strictly declarative specification of sophisticated points-to analyses. In: OOPSLA, pp. 243–262. ACM, New York (2009)
  6. Chimenti, D., Gamboa, R., Krishnamurthy, R., Naqvi, S., et al.: The LDL system prototype. TKDE 2(1), 76–90 (1990)
  7. Cohen, S., Wolfson, O.: Why a single parallelization strategy is not enough in knowledge bases. In: PODS, pp. 200–216. ACM, New York (1989)
  8. Deductive application language system.
  9. DLV (parallel version).
  10. DLV (single-processor version).
  11. DLV with recursive aggregates.
  12. Dees, J., Sanders, P.: Efficient many-core query execution in main memory column-stores. In: ICDE, pp. 350–361. IEEE, New York (2013)
  13. Eisner, J., Filardo, N.W.: Dyna: extending Datalog for modern AI. In: Datalog Reloaded, pp. 181–220. Springer, Berlin (2011)
  14. Fogel, A., Fung, S., Pedrosa, L., Walraed-Sullivan, M., et al.: A general approach to network configuration analysis. In: NSDI, pp. 469–483 (2015)
  15. Ganguly, S., Silberschatz, A., Tsur, S.: Parallel bottom-up processing of Datalog queries. J. Logic Program. 14(1), 101–126 (1992)
  16. Ganguly, S., Silberschatz, A., Tsur, S.: Mapping Datalog program execution to networks of processors. TKDE 7(3), 351–361 (1995)
  17. Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Clingo=ASP+Control: preliminary report. arXiv preprint arXiv:1405.3694
  18. Hulin, G.: Parallel processing of recursive queries in distributed architectures. In: VLDB, pp. 87–96. Morgan Kaufmann, Los Altos (1989)
  19. Lattner, C.: LLVM and Clang: next generation compiler technology. In: The BSD Conference, pp. 1–2 (2008)
  20. Leone, N., Pfeifer, G., Faber, W., Eiter, T., et al.: The DLV system for knowledge representation and reasoning. TOCL 7(3), 499–562 (2006)
  21. Leskovec, J., Krevl, A.: SNAP datasets: Stanford large network dataset collection. (2014)
  22. Mazuran, M., Serra, E., Zaniolo, C.: Extending the power of Datalog recursion. VLDB J. 22(4), 471–493 (2013)
  23. Mazuran, M., Serra, E., Zaniolo, C.: A declarative extension of Horn clauses, and its significance for Datalog and its applications. TPLP 13(4–5), 609–623 (2013)
  24. Morris, K., Ullman, J.D., Van Gelder, A.: Design overview of the NAIL! system. In: ICLP, pp. 554–568. Springer, Berlin (1986)
  25. Nguyen, D., Aref, M., Bravenboer, M., Kollias, G., et al.: Join Processing for Graph Patterns: An Old Dog with New Tricks. arXiv preprint arXiv:1503.04169 (2015)
  26. Perri, S., Ricca, F., Sirianni, M.: Parallel instantiation of ASP programs: techniques and experiments. TPLP 13(02), 253–278 (2013)
  27. Ramakrishnan, R., Srivastava, D., Sudarshan, S.: CORAL—control, relations and logic. In: VLDB, pp. 238–250. Morgan Kaufmann, Los Altos (1992)
  28. Raschid, L., Su, S.Y.W.: A parallel processing strategy for evaluating recursive queries. In: VLDB, pp. 412–419. Morgan Kaufmann, Los Altos (1986)
  29. Ross, K.A., Sagiv, Y.: Monotonic aggregation in deductive databases. In: PODS, pp. 114–126. ACM, New York (1992)
  30.
  31. SPEC® CINT2006 Result. Cisco Systems: Cisco UCS C460 M4 (Intel Xeon E7-4890 v2, 2.80 GHz).
  32. SPEC® CINT2006 Result. Dell Inc.: PowerEdge R720 (Intel Xeon E5-2690, 2.90 GHz).
  33. SPEC® CINT2006 Result. Supermicro: Supermicro A+ Server 2042G-6RF (AMD Opteron 6376, 2.30 GHz).
  34. SQL Server 2014.
  35. Seib, J., Lausen, G.: Parallelizing Datalog programs by generalized pivoting. In: PODS, pp. 241–251. ACM, New York (1991)
  36. Selman, B., Kautz, H.: Domain-independent extensions to GSAT: Solving large structured satisfiability problems. In: IJCAI, pp. 290–295. Morgan Kaufmann, Los Altos (1993)
  37. Selman, B., Kautz, H., Cohen, B.: Local search strategies for satisfiability testing. Cliques Color. Satisf.: Second DIMACS Implement. Chall. 26, 521–532 (1993)
  38. Selman, B., Levesque, H.J., Mitchell, D.G.: A new method for solving hard satisfiability problems. In: AAAI, pp. 440–446. AAAI Press/MIT Press, Cambridge (1992)
  39. Seo, J., Guo, S., Lam, M.S.: SociaLite: Datalog extensions for efficient social network analysis. In: ICDE, pp. 278–289. IEEE, New York (2013)
  40. Seo, J., Park, J., Shin, J., Lam, M.S.: Distributed socialite: a Datalog-based language for large-scale graph analysis. PVLDB 6(14), 1906–1917 (2013)
  41. Shkapsky, A., Yang, M., Interlandi, M., Chiu, H., Condie, T., Zaniolo, C.: Big data analytics with Datalog queries on Spark. In: SIGMOD, pp. 1135–1149. ACM, New York (2016)
  42. Shkapsky, A., Yang, M., Zaniolo, C.: Optimizing recursive queries with monotonic aggregates in DeALS. In: ICDE, pp. 867–878. IEEE, New York (2015)
  43. Shkapsky, A., Zeng, K., Zaniolo, C.: Graph queries in a next-generation Datalog system. PVLDB 6(12), 1258–1261 (2013)
  44. Spears, W.M.: Simulated annealing for hard satisfiability problems. Cliques Color. Satisf.: Second DIMACS Implement. Chall. 26, 533–558 (1993)
  45.
  46. TPC-H Result on Cisco UCS C460 M4 Server.
  47. TPC-H Result on Dell PowerEdge R720.
  48. Ullman, J.D.: Implementation of logical query languages for databases. TODS 10(3), 289–321 (1985)
  49.
  50. Van Gelder, A.: Foundations of aggregation in deductive databases. In: DOOD, pp. 13–34. Springer, Berlin (1993)
  51. Veldhuizen, T.L.: Triejoin: A simple, worst-case optimal join algorithm. In: ICDT, pp. 96–106 (2014)
  52. Wang, J., Balazinska, M., Halperin, D.: Asynchronous and fault-tolerant recursive Datalog evaluation in shared-nothing engines. PVLDB 8(12), 1542–1553 (2015)
  53. Wolfson, O.: Sharing the load of logic-program evaluation. In: DPDS, pp. 46–55. IEEE, New York (1988)
  54. Wolfson, O., Silberschatz, A.: Distributed processing of logic programs. In: SIGMOD, pp. 329–336. ACM, New York (1988)
  55. Yang, M., Shkapsky, A., Zaniolo, C.: Parallel bottom-up evaluation of logic programs: DeALS on shared-memory multicore machines. In: Technical Communications of ICLP (2015)
  56. Yang, M., Zaniolo, C.: Main memory evaluation of recursive queries on multicore machines. In: IEEE BigData, pp. 251–260. IEEE, New York (2014)
  57. Zaniolo, C.: Logical foundations of continuous query languages for data streams. In: Datalog in Academia and Industry, pp. 177–189. Springer, Berlin (2012)
  58. Zhang, W., Wang, K., Chau, S.C.: Data partition and parallel evaluation of Datalog programs. TKDE 7(1), 163–176 (1995)



This work was supported by NSF Grants IIS 1218471 and IIS 1118107. We would like to thank the reviewers and Matteo Interlandi for their comments. We also thank LogicBlox, especially Martin Bravenboer, Dung Nguyen, and Yannis Smaragdakis, for their assistance with the LogicBlox comparison.

Author information



Corresponding author

Correspondence to Mohan Yang.


About this article


Cite this article

Yang, M., Shkapsky, A. & Zaniolo, C. Scaling up the performance of more powerful Datalog systems on multicore machines. The VLDB Journal 26, 229–248 (2017).



  • Parallel
  • Bottom-up evaluation
  • Datalog
  • Multicore
  • AND/OR tree