Balanced parallel triangle enumeration with an adaptive algorithm

Distributed and Parallel Databases

Abstract

Triangle enumeration is a building block for solving harder graph problems related to social networks, the Internet and transportation, to name a few applications. The problem is well studied in the theory literature, but it remains open for big data. In this paper, we defend the idea of solving triangle enumeration with SQL queries that evaluate the steps of a new adaptive algorithm. Such an SQL approach provides scalability beyond RAM limits, automatic parallel processing and, more importantly, linear speedup as more machines are added. We present theory results and experimental validation showing that our solution works well with large graphs analyzed on a parallel cluster with many machines, producing a balanced workload even when vertex degrees are highly skewed. We consider two types of distributed systems: (1) a parallel DBMS that evaluates SQL queries, and (2) a parallel HPC cluster that uses the MPI library, called from Python. Extensive benchmark experiments with large graphs show that our SQL solution offers many advantages over MPI and competing graph analytic systems.
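To make the idea of enumerating triangles with SQL concrete, the following is a minimal single-machine sketch using Python's built-in `sqlite3` module. It is not the adaptive parallel algorithm of the paper: the table name `E`, the column names `i` and `j`, and the function `enumerate_triangles` are assumptions introduced only for illustration. Each undirected edge is stored once with `i < j`, so a three-way self-join reports every triangle exactly once.

```python
import sqlite3


def enumerate_triangles(edges):
    """Return each triangle once as a tuple (u, v, w) with u < v < w.

    Illustrative sketch only: table E(i, j) and this function name are
    hypothetical, not taken from the paper.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER, PRIMARY KEY (i, j))")

    # Orient every undirected edge from the smaller to the larger vertex id,
    # dropping self-loops and duplicate edges.
    oriented = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    conn.executemany("INSERT INTO E VALUES (?, ?)", sorted(oriented))

    # e1 = (u, v), e2 = (v, w), e3 = (u, w) closes the triangle.
    rows = conn.execute(
        """
        SELECT e1.i AS u, e1.j AS v, e2.j AS w
        FROM E AS e1
        JOIN E AS e2 ON e1.j = e2.i
        JOIN E AS e3 ON e3.i = e1.i AND e3.j = e2.j
        ORDER BY u, v, w
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # One triangle (1, 2, 3) plus a dangling edge (3, 4).
    print(enumerate_triangles([(1, 2), (2, 3), (3, 1), (3, 4)]))
```

In a parallel DBMS the same join would be evaluated over partitioned edge tables; how the paper partitions the edges to balance the workload is not shown in this sketch.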

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Contributions

Conceptualization: AF, CO; Methodology: CO; Formal analysis and investigation: AF, XZ; Writing - original draft preparation: AF, XZ; Writing - review and editing: AF, XZ, CO, LB; Funding acquisition: MM; Resources: LB; Supervision: CO, LB, MM.

Corresponding author

Correspondence to Abir Farouzi.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Farouzi, A., Zhou, X., Bellatreche, L. et al. Balanced parallel triangle enumeration with an adaptive algorithm. Distrib Parallel Databases 42, 103–141 (2024). https://doi.org/10.1007/s10619-023-07437-x

  • DOI: https://doi.org/10.1007/s10619-023-07437-x
