Abstract
Triangle enumeration is a building block for solving harder graph problems related to social networks, the Internet and transportation, to name a few applications. The problem is well studied in the theory literature, but remains open for big data. In this paper, we defend the idea of solving triangle enumeration with SQL queries that evaluate the steps of a new adaptive algorithm. Such an SQL approach provides scalability beyond RAM limits, automatic parallel processing and, more importantly, linear speedup as more machines are added. We present theory results and experimental validation showing that our solution works well with large graphs analyzed on a parallel cluster with many machines, producing a balanced workload even when vertex degrees are highly skewed. We consider two types of distributed systems: (1) a parallel DBMS that evaluates SQL queries, and (2) a parallel HPC cluster that uses the MPI library (called from Python). Extensive benchmark experiments with large graphs show that our SQL solution offers many advantages over MPI and competing graph analytic systems.
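To make the idea of enumerating triangles with SQL concrete, the following is a minimal single-machine sketch, not the adaptive parallel algorithm of the paper. It assumes a hypothetical edge table `E(i, j)` that stores each undirected edge once with `i < j`, and it uses the textbook three-way self-join; the table name, column names and the function `enumerate_triangles` are illustrative assumptions. SQLite, accessed through Python's standard `sqlite3` module, stands in for a parallel DBMS.

```python
import sqlite3


def enumerate_triangles(edges):
    """Return all triangles (u, v, w) with u < v < w, sorted.

    `edges` is an iterable of undirected edges (a, b). This is the
    classic self-join formulation, used here only as an illustration.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per undirected edge, stored with i < j.
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER, PRIMARY KEY (i, j))")

    # Normalize edge direction, drop self-loops and duplicate edges.
    rows = {(min(a, b), max(a, b)) for a, b in edges if a != b}
    conn.executemany("INSERT INTO E VALUES (?, ?)", sorted(rows))

    # E1 gives edge (u, v), E2 extends it to the path u-v-w,
    # and E3 checks that the closing edge (u, w) exists.
    # Because every stored edge has i < j, each triangle appears once.
    query = """
        SELECT E1.i AS u, E1.j AS v, E2.j AS w
        FROM E AS E1
        JOIN E AS E2 ON E1.j = E2.i
        JOIN E AS E3 ON E3.i = E1.i AND E3.j = E2.j
        ORDER BY u, v, w
    """
    triangles = conn.execute(query).fetchall()
    conn.close()
    return triangles


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (4, 5)]
    print(enumerate_triangles(sample))
```

Storing each edge in one direction only is what keeps a triangle from being reported six times. How the joins are partitioned across machines to balance skewed degrees is the subject of the paper and is not modeled in this sketch.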
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
Conceptualization: AF, CO; Methodology: CO; Formal analysis and investigation: AF, XZ; Writing (original draft preparation): AF, XZ; Writing (review and editing): AF, XZ, CO, LB; Funding acquisition: MM; Resources: LB; Supervision: CO, LB, MM.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Farouzi, A., Zhou, X., Bellatreche, L. et al. Balanced parallel triangle enumeration with an adaptive algorithm. Distrib Parallel Databases 42, 103–141 (2024). https://doi.org/10.1007/s10619-023-07437-x
DOI: https://doi.org/10.1007/s10619-023-07437-x