Theory of Computing Systems, Volume 62, Issue 4, pp 810–853

It’s All a Matter of Degree

Using Degree Information to Optimize Multiway Joins
Article
Part of the following topical collections:
  1. Special Issue on Database Theory

Abstract

We optimize multiway equijoins on relational tables using degree information. We give a new bound that uses degree information to more tightly bound the maximum output size of a query. On real data, our bound on the number of triangles in a social network can be up to 95 times tighter than existing worst-case bounds. We show that, using only a constant amount of degree information, we can obtain join algorithms whose running time has a smaller exponent than that of existing algorithms, for any database instance. We also show that this degree information can be obtained in nearly linear time, which yields asymptotically faster algorithms in the serial setting and lower communication cost algorithms in the MapReduce setting. In the serial setting, the data complexity of join processing can be expressed as a function O(IN^x + OUT) of the input size IN and the output size OUT, in which x depends on the query. An upper bound for x is given by the fractional hypertreewidth. We are interested in situations in which we can get algorithms for which x is strictly smaller than the fractional hypertreewidth. We say that a join can be processed in subquadratic time if x < 2. Building on the AYZ algorithm for processing cycle joins in quadratic time, for a restricted class of joins which we call 1-series-parallel graphs, we obtain a complete decision procedure for identifying subquadratic solvability (subject to the 3-SUM problem requiring quadratic time). Our 3-SUM based quadratic lower bound is tight, making it the only known tight bound for joins that does not require any assumption about the matrix multiplication exponent ω. We also give a MapReduce algorithm that meets our improved communication bound and handles essentially optimal parallelism.
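To illustrate how degree information can tighten an output-size bound, the following minimal sketch compares two bounds for the triangle join R(A,B) ⋈ S(B,C) ⋈ T(A,C). It is not the bound introduced in this article. It assumes a simple, well-known degree-based bound (for each tuple (a, b) in R, the number of matching values c is at most the smaller of the degree of b in S and the degree of a in T) and sets it against the size-only bound sqrt(|R|·|S|·|T|). The function names are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def size_only_bound(R, S, T):
    """Worst-case triangle bound that uses only relation sizes."""
    return sqrt(len(R) * len(S) * len(T))


def degree_bound(R, S, T):
    """Illustrative degree-based bound (an assumption, not the article's bound).

    deg_S[b] counts tuples (b, c) in S; deg_T[a] counts tuples (a, c) in T.
    Every output (a, b, c) extends some (a, b) in R by a value c that occurs
    with b in S and with a in T, so (a, b) has at most
    min(deg_S[b], deg_T[a]) extensions.
    """
    deg_S = Counter(b for b, _ in S)  # one linear pass over S
    deg_T = Counter(a for a, _ in T)  # one linear pass over T
    return sum(min(deg_S[b], deg_T[a]) for a, b in R)


def triangle_join(R, S, T):
    """Reference evaluation of the join, used to check the bounds."""
    s_by_b = defaultdict(list)
    for b, c in S:
        s_by_b[b].append(c)
    t_set = set(T)
    return [(a, b, c) for a, b in R for c in s_by_b[b] if (a, c) in t_set]


# A star graph: hub 0 with four leaves, and every edge stored as (hub, leaf).
star = {(0, i) for i in range(1, 5)}
```

On the star instance, every leaf has degree zero in S, so the degree-based bound collapses to zero while the size-only bound does not. Skewed inputs of this kind are where degree statistics help most.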

Keywords

Joins, Degree

Notes

Acknowledgements

The authors would like to thank Atri Rudra for pointing out the connection to submodular width. CR gratefully acknowledges the support of the Defense Advanced Research Projects Agency (DARPA) XDATA Program under No. FA8750-12-2-0335 and DEFT Program under No. FA8750-13-2-0039, DARPA's MEMEX program under No. FA8750-14-2-0240, the National Science Foundation (NSF) under CAREER Award No. IIS-1353606, Award No. CCF-1356918 and EarthCube Award under No. ACI-1343760, the Office of Naval Research (ONR) under awards No. N000141210041 and No. N000141310129, the Sloan Research Fellowship, the Moore Foundation Data Driven Investigator award, and gifts from American Family Insurance, Google, Lightspeed Ventures, and Toshiba.

References

  1. Afrati, F., Joglekar, M., Ré, C., Salihoglu, S., Ullman, J.: GYM: A multiround join algorithm in mapreduce. CoRR (2014). arXiv:1410.4156
  2. Afrati, F.N., Ullman, J.D.: Optimizing Multiway Joins in a Map-Reduce Environment. IEEE TKDE 23 (2011)
  3. Alon, N., Yuster, R., Zwick, U.: Finding and counting given length cycles (extended abstract). In: Proceedings of the 2nd Annual European Symposium on Algorithms. ESA ’94, pp. 354–364. Springer, London (1994). http://dl.acm.org/citation.cfm?id=647904.739463
  4. Alon, N., Newman, I., Shen, A., Tardos, G., Vereshchagin, N.: Partitioning multi-dimensional sets in a small number of “uniform” parts. Eur. J. Comb. 28, 134–144 (2007)
  5. Atserias, A., Grohe, M., Marx, D.: Size Bounds and Query Plans for Relational Joins. SIAM J. Comput. 42, 1737–1767 (2013)
  6. Baran, I., Demaine, E., Patrascu, M.: Subquadratic algorithms for 3sum. In: Dehne, F., Lopez-Ortiz, A., Sack, J. (eds.) Algorithms and Data Structures, Lecture Notes in Computer Science, vol. 3608, pp. 409–421. Springer, Berlin (2005). https://doi.org/10.1007/11534273_36
  7. Beame, P., Koutris, P., Suciu, D.: Communication Steps for Parallel Query Processing. In: PODS (2013)
  8. Beame, P., Koutris, P., Suciu, D.: Skew in Parallel Query Processing. In: PODS (2014)
  9. Björklund, A., Pagh, R., Williams, V.V., Zwick, U.: Listing triangles. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) Automata, Languages, and Programming, Lecture Notes in Computer Science, vol. 8572, pp. 223–234. Springer, Berlin (2014). https://doi.org/10.1007/978-3-662-43948-7_19
  10. Chekuri, C., Rajaraman, A.: Conjunctive Query Containment Revisited. TCS 239 (2000)
  11. Gottlob, G., Grohe, M., Musliu, N., Samer, M., Scarcello, F.: Hypertree Decompositions: Structure, Algorithms, and Applications. In: WG (2005)
  12. Gross, J., Yellen, J., Zhang, P.: Handbook of Graph Theory, 2nd edn. Chapman & Hall/CRC, London (2013)
  13. Joglekar, M., Ré, C.: It’s all a matter of degree: Using degree information to optimize multiway joins. CoRR (2015). arXiv:1508.01239
  14. Koutris, P., Suciu, D.: Parallel evaluation of conjunctive queries. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. PODS ’11, pp. 223–234. ACM, New York (2011). https://doi.org/10.1145/1989284.1989310
  15. Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection. (2014). http://snap.stanford.edu/data
  16. Marx, D.: Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J. ACM 60, 1–51 (2013)
  17. Ngo, H., Porat, E., Ré, C., Rudra, A.: Worst-case optimal join algorithms: [extended abstract]. In: Proceedings of the 31st Symposium on Principles of Database Systems. PODS ’12, pp. 37–48. ACM, New York (2012). https://doi.org/10.1145/2213556.2213565
  18. Ngo, H., Ré, C., Rudra, A.: Skew Strikes Back: New Developments in the Theory of Join Algorithms. SIGMOD 42 (2014)
  19. Veldhuizen, T.: Leapfrog triejoin: a worst-case optimal join algorithm. CoRR (2012). arXiv:1210.0481
  20. Yannakakis, M.: Algorithms for Acyclic Database Schemes. In: VLDB (1981)

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Stanford University, Stanford, USA
