
Parallelizing Maximal Clique Enumeration Over Graph Data

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9643)


Abstract

In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs (cliques) is an essential component. Unfortunately, this problem is NP-complete and therefore computationally intensive at scale. Hence, there is a need for techniques that distribute the computation across multiple machines, so that a computation too time-consuming for a single machine can be performed efficiently on a sufficiently large cluster.

In this paper, we first propose a new approach for maximal clique enumeration, which identifies cliques by recursive graph partitioning. Given a connected graph \(G=(V,E)\), it has a space complexity of \(O(|E|)\) and a time complexity of \(O(|E|\mu (G))\), where \(\mu (G)\) represents the number of different cliques existing in \(G\). It recursively divides a graph until each task is small enough to be processed in parallel. We then develop parallel solutions and demonstrate how graph partitioning can enable effective load balancing. Finally, we evaluate the performance of the proposed approach on real and synthetic graph data and show that it performs considerably better than existing approaches in both centralized and parallel settings. Our parallel algorithms are implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can easily be generalized to other shared-nothing or shared-memory parallel frameworks.
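The abstract does not spell out the partitioning procedure, so the following is only a hedged illustration of the general idea of splitting maximal clique enumeration into independent tasks that can run in parallel. It is a minimal Python sketch, assuming an undirected graph stored as symmetric adjacency sets without self-loops. It uses the well-known vertex-ordering decomposition with Bron-Kerbosch pivoting (the outer loop popularized by Eppstein, Löffler and Strash), not necessarily the authors' recursive partitioning. The function names `make_tasks`, `run_task`, and `maximal_cliques`, and the ascending-degree vertex ordering, are hypothetical choices made for this sketch.

```python
"""Sketch: split maximal clique enumeration into independent per-vertex tasks.

Assumption: this is the vertex-ordering decomposition plus Bron-Kerbosch
with pivoting, used here as a stand-in, not the paper's own algorithm.
"""
from typing import Callable, Dict, FrozenSet, Iterator, List, Set, Tuple

Graph = Dict[int, Set[int]]                   # undirected, symmetric, no self-loops
Task = Tuple[int, Graph, Set[int], Set[int]]  # (v, adjacency inside N(v), P, X)


def bron_kerbosch_pivot(adj: Graph, r: Set[int], p: Set[int],
                        x: Set[int]) -> Iterator[FrozenSet[int]]:
    """Yield every maximal clique that extends r with vertices from p and avoids x."""
    if not p and not x:
        yield frozenset(r)
        return
    # Tomita-style pivot: maximize |P ∩ N(u)| to prune branches.
    u = max(p | x, key=lambda w: len(p & adj[w]))
    for v in list(p - adj[u]):
        yield from bron_kerbosch_pivot(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}  # v has been fully explored
        x = x | {v}  # later branches must not re-report cliques containing v


def make_tasks(graph: Graph) -> List[Task]:
    """One task per vertex; each maximal clique is found only in the task
    of its earliest vertex in the ordering."""
    # Hypothetical ordering heuristic: ascending degree, ties broken by id.
    order = sorted(graph, key=lambda w: (len(graph[w]), w))
    pos = {w: i for i, w in enumerate(order)}
    tasks: List[Task] = []
    for v in order:
        nbrs = graph[v]
        # Only adjacency among N(v) is needed, so a task is self-contained
        # and could be shipped to a worker on its own.
        local = {w: graph[w] & nbrs for w in nbrs}
        later = {w for w in nbrs if pos[w] > pos[v]}
        earlier = nbrs - later
        tasks.append((v, local, later, earlier))
    return tasks


def run_task(task: Task) -> List[FrozenSet[int]]:
    """Enumerate the maximal cliques whose earliest vertex is task[0]."""
    v, local, p, x = task
    return list(bron_kerbosch_pivot(local, {v}, p, x))


def maximal_cliques(graph: Graph, map_fn: Callable = map) -> Set[FrozenSet[int]]:
    """Run all tasks through map_fn; passing, e.g., ProcessPoolExecutor().map
    instead of the built-in map would execute them in parallel."""
    results: Set[FrozenSet[int]] = set()
    for cliques in map_fn(run_task, make_tasks(graph)):
        results.update(cliques)
    return results


if __name__ == "__main__":
    g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6},
         5: {4, 6}, 6: {4, 5}, 7: set()}
    print(sorted(sorted(c) for c in maximal_cliques(g)))
    # [[1, 2, 3], [3, 4], [4, 5, 6], [7]]
```

Because every task carries only the adjacency inside one vertex's neighborhood, tasks could serve as independent map inputs in a MapReduce-style job. How evenly the work spreads then depends on how neighborhood sizes are distributed, which is the load-balancing concern the abstract says graph partitioning is meant to address. Replacing the degree ordering assumed here with a degeneracy ordering yields the sparse-graph time bound proved by Eppstein et al.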



Author information


Corresponding author

Correspondence to Qun Chen.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, Q., Fang, C., Wang, Z., Suo, B., Li, Z., Ives, Z.G. (2016). Parallelizing Maximal Clique Enumeration Over Graph Data. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, S., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9643. Springer, Cham. https://doi.org/10.1007/978-3-319-32049-6_16


  • DOI: https://doi.org/10.1007/978-3-319-32049-6_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32048-9

  • Online ISBN: 978-3-319-32049-6

  • eBook Packages: Computer Science, Computer Science (R0)
