Balanced parallel triangle enumeration with an adaptive algorithm

Farouzi, Abir; Zhou, Xiantian; Bellatreche, Ladjel; Malki, Mimoun; Ordonez, Carlos

doi:10.1007/s10619-023-07437-x

Published: 13 July 2023

Volume 42, pages 103–141, (2024)

Distributed and Parallel Databases

Abir Farouzi¹,³,
Xiantian Zhou²,
Ladjel Bellatreche¹,
Mimoun Malki³ &
Carlos Ordonez²

Abstract

Triangle enumeration is a building block for solving harder graph problems related to social networks, the Internet and transportation, to name a few applications. The problem is well studied in the theory literature, but remains open for big data. In this paper, we defend the idea of solving triangle enumeration with SQL queries that evaluate the steps of a new adaptive algorithm. Such an SQL approach provides scalability beyond RAM limits, automatic parallel processing and, more importantly, linear speedup as more machines are added. We present theoretical results and experimental validation showing that our solution works well with large graphs analyzed on a parallel cluster with many machines, producing a balanced workload even when vertex degrees are highly skewed. We consider two types of distributed systems: (1) a parallel DBMS that evaluates SQL queries, and (2) a parallel HPC cluster using the MPI library (called from Python). Extensive benchmark experiments with large graphs show that our SQL solution offers many advantages over MPI and competing graph analytic systems.
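
To make the idea of enumerating triangles with SQL queries concrete, the following is a minimal sketch of the standard self-join formulation on an edge table. It is not the adaptive algorithm of the paper, which this page does not describe; the table name `E`, its columns `i` and `j`, and the helper `enumerate_triangles` are assumptions introduced only for illustration, and SQLite stands in for a parallel DBMS.

```python
import sqlite3


def enumerate_triangles(edges):
    """Return each triangle once as a tuple (u, v, w) with u < v < w.

    Illustrative only: a plain three-way self-join, run on one machine.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per undirected edge, stored as (i, j) with i < j.
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER)")
    # Orient every edge from the smaller to the larger vertex id and drop
    # self-loops and duplicates, so that each triangle is reported exactly once.
    oriented = {(min(a, b), max(a, b)) for a, b in edges if a != b}
    conn.executemany("INSERT INTO E VALUES (?, ?)", sorted(oriented))
    rows = conn.execute(
        """
        SELECT E1.i, E1.j, E2.j
        FROM E AS E1
        JOIN E AS E2 ON E1.j = E2.i                   -- path u -> v -> w
        JOIN E AS E3 ON E3.i = E1.i AND E3.j = E2.j   -- closing edge u -> w
        ORDER BY 1, 2, 3
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 1)]
    for triangle in enumerate_triangles(sample):
        print(triangle)
```

Orienting edges by vertex id is one common way to avoid reporting the same triangle several times. It does nothing to balance the work across machines when vertex degrees are skewed, which is the problem the abstract says the adaptive algorithm addresses.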

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations

ISAE-ENSMA, Poitiers, France
Abir Farouzi & Ladjel Bellatreche
University of Houston, Houston, USA
Xiantian Zhou & Carlos Ordonez
Ecole Supérieure en Informatique, Sidi Bel Abbès, Algeria
Abir Farouzi & Mimoun Malki

Contributions

Conceptualization: AF, CO; Methodology: CO; Formal analysis and investigation: AF, XZ; Writing – original draft preparation: AF, XZ; Writing – review and editing: AF, XZ, CO, LB; Funding acquisition: MM; Resources: LB; Supervision: CO, LB, MM.

Corresponding author

Correspondence to Abir Farouzi.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Farouzi, A., Zhou, X., Bellatreche, L. et al. Balanced parallel triangle enumeration with an adaptive algorithm. Distrib Parallel Databases 42, 103–141 (2024). https://doi.org/10.1007/s10619-023-07437-x

Accepted: 25 June 2023
Published: 13 July 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10619-023-07437-x
