Parallelizing Maximal Clique Enumeration Over Graph Data

Chen, Qun; Fang, Chao; Wang, Zhuo; Suo, Bo; Li, Zhanhuai; Ives, Zachary G.

doi:10.1007/978-3-319-32049-6_16


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9643)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications


Abstract

The detection of dense subgraphs (cliques) is an essential component of a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology. Unfortunately, this problem is NP-complete and thus computationally intensive at scale. Techniques are therefore needed to distribute the computation across multiple machines, so that a computation that is too time-consuming on a single machine can be performed efficiently on a sufficiently large machine cluster.

In this paper, we first propose a new approach for maximal clique enumeration, which identifies cliques by recursive graph partitioning. Given a connected graph \(G=(V,E)\), it has a space complexity of \(O(|E|)\) and a time complexity of \(O(|E|\mu (G))\), where \(\mu (G)\) represents the number of different cliques existing in \(G\). It recursively divides a graph until each task is sufficiently small to be processed in parallel. We then develop parallel solutions and demonstrate how graph partitioning can enable effective load balancing. Finally, we evaluate the performance of the proposed approach on real and synthetic graph data and show that it performs considerably better than existing approaches in both centralized and parallel settings. Our parallel algorithms are implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can be easily generalized to other shared-nothing or shared-memory parallel frameworks.
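The idea of splitting maximal clique enumeration into independent subtasks can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper, whose details are not given in this abstract. As a stand-in, it uses the classical Bron-Kerbosch procedure with pivoting and assigns one task per vertex, so that each maximal clique is reported only by the task of its lowest-ranked vertex. The function names (`bron_kerbosch`, `make_tasks`, `enumerate_maximal_cliques`) and the adjacency-set graph representation are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

# Assumed representation: undirected graph as vertex -> set of neighbours,
# with no self-loops.
Graph = Dict[int, Set[int]]
Task = Tuple[int, Set[int], Set[int]]


def bron_kerbosch(graph: Graph, r: Set[int], p: Set[int],
                  x: Set[int]) -> Iterator[FrozenSet[int]]:
    """Yield every maximal clique that extends r using candidates p,
    skipping cliques that could be extended by a vertex in x."""
    if not p and not x:
        yield frozenset(r)
        return
    # Pivoting: only branch on candidates that are not neighbours of the pivot.
    pivot = max(p | x, key=lambda u: len(graph[u] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}
        x = x | {v}


def make_tasks(graph: Graph) -> List[Task]:
    """Split the problem into one independent task per vertex.

    The task for v covers exactly the maximal cliques whose lowest-ranked
    vertex is v: later neighbours are candidates, earlier ones are excluded.
    """
    rank = {v: i for i, v in enumerate(sorted(graph))}
    tasks = []
    for v in sorted(graph):
        later = {u for u in graph[v] if rank[u] > rank[v]}
        earlier = {u for u in graph[v] if rank[u] < rank[v]}
        tasks.append((v, later, earlier))
    return tasks


def enumerate_maximal_cliques(graph: Graph) -> List[FrozenSet[int]]:
    """Run all tasks sequentially; since tasks share no state, they could
    instead be handed to separate workers."""
    cliques: List[FrozenSet[int]] = []
    for v, candidates, excluded in make_tasks(graph):
        cliques.extend(bron_kerbosch(graph, {v}, candidates, excluded))
    return cliques


if __name__ == "__main__":
    # Two triangles joined by the edge 2-3.
    sample: Graph = {
        0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
        3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
    }
    for clique in enumerate_maximal_cliques(sample):
        print(sorted(clique))
```

Because the tasks are independent, a parallel version would only need to distribute the output of `make_tasks` across workers. Balancing the load among them, which the abstract names as a central concern, is not addressed by this sketch.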



Author information

Authors and Affiliations

School of Computing, Northwestern Polytechnical University, Xi’an, China
Qun Chen, Chao Fang, Zhuo Wang, Bo Suo & Zhanhuai Li
Department of Computer and Information Systems, University of Pennsylvania, Philadelphia, PA, USA
Zachary G. Ives


Corresponding author

Correspondence to Qun Chen.

Editor information

Editors and Affiliations

Georgia Institute of Technology, Atlanta, Georgia, USA
Shamkant B. Navathe
University of Texas at Dallas, Richardson, Texas, USA
Weili Wu
University of Minnesota, Minneapolis, Minnesota, USA
Shashi Shekhar
Renmin University, Beijing, China
Xiaoyong Du
Fudan University, Shanghai, China
Sean X. Wang
Rutgers, The State University of New Jer, New Brunswick, New Jersey, USA
Hui Xiong


About this paper

Cite this paper

Chen, Q., Fang, C., Wang, Z., Suo, B., Li, Z., Ives, Z.G. (2016). Parallelizing Maximal Clique Enumeration Over Graph Data. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, S., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9643. Springer, Cham. https://doi.org/10.1007/978-3-319-32049-6_16


DOI: https://doi.org/10.1007/978-3-319-32049-6_16
Published: 25 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32048-9
Online ISBN: 978-3-319-32049-6
eBook Packages: Computer Science, Computer Science (R0)
