Parallelizing Maximal Clique Enumeration Over Graph Data

  • Qun Chen (email author)
  • Chao Fang
  • Zhuo Wang
  • Bo Suo
  • Zhanhuai Li
  • Zachary G. Ives
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9643)

Abstract

In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs (cliques) is an essential component. Unfortunately, this problem is NP-complete and thus computationally intensive at scale. Hence there is a need for techniques that distribute the computation across multiple machines, so that a computation that is too time-consuming on a single machine can be performed efficiently on a sufficiently large machine cluster.

In this paper, we first propose a new approach for maximal clique enumeration, which identifies cliques by recursive graph partitioning. Given a connected graph \(G=(V,E)\), it has a space complexity of \(O(|E|)\) and a time complexity of \(O(|E|\mu (G))\), where \(\mu (G)\) is the number of distinct cliques in \(G\). It recursively divides a graph until each task is sufficiently small to be processed in parallel. We then develop parallel solutions and demonstrate how graph partitioning can enable effective load balancing. Finally, we evaluate the performance of the proposed approach on real and synthetic graph data and show that it performs considerably better than existing approaches in both centralized and parallel settings. Our parallel algorithms are implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can be generalized easily to other shared-nothing or shared-memory parallel frameworks.
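The abstract does not spell out the partitioning scheme itself, so the following is only a minimal sketch of the general idea, not the algorithm proposed in the paper. It assumes a simple vertex-ordering partition: each vertex defines one independent task that reports exactly the maximal cliques whose lowest-ranked vertex it is, and each task is solved with the classical Bron-Kerbosch recursion with pivoting as a stand-in solver. The names `build_graph`, `partition_tasks`, and `enumerate_maximal_cliques` are hypothetical and chosen for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

Graph = Dict[int, Set[int]]
Task = Tuple[int, Set[int], Set[int]]  # (seed vertex, candidates P, excluded X)


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bron_kerbosch(graph: Graph, r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
    """Yield maximal cliques that extend r with vertices of p and avoid x."""
    if not p and not x:
        yield frozenset(r)
        return
    # Pivot on the vertex covering the most candidates to prune branches.
    pivot = max(p | x, key=lambda u: len(graph[u] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}
        x = x | {v}


def partition_tasks(graph: Graph) -> List[Task]:
    """Assumed partitioning: one independent task per vertex.

    The task for v only considers neighbours ranked after v as candidates,
    and uses neighbours ranked before v to reject non-maximal cliques.
    """
    rank = {v: i for i, v in enumerate(sorted(graph))}
    tasks: List[Task] = []
    for v in sorted(graph):
        later = {u for u in graph[v] if rank[u] > rank[v]}
        earlier = {u for u in graph[v] if rank[u] < rank[v]}
        tasks.append((v, later, earlier))
    return tasks


def enumerate_maximal_cliques(graph: Graph) -> Set[FrozenSet[int]]:
    """Solve every task; the loop body has no cross-task dependencies."""
    cliques: Set[FrozenSet[int]] = set()
    for v, p, x in partition_tasks(graph):
        cliques.update(bron_kerbosch(graph, {v}, p, x))
    return cliques


if __name__ == "__main__":
    g = build_graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)])
    print(sorted(sorted(c) for c in enumerate_maximal_cliques(g)))
    # → [[1, 2, 3], [3, 4], [4, 5, 6]]
```

Because the tasks share only the read-only graph, the loop in `enumerate_maximal_cliques` could in principle be handed to a pool of workers or to mappers in a MapReduce job. Task sizes vary with the size of each candidate set, which is where a finer recursive split of large tasks, as the abstract describes, would matter for load balancing.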

Keywords

Maximal clique enumeration · Parallel graph processing · MapReduce

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Qun Chen (1, email author)
  • Chao Fang (1)
  • Zhuo Wang (1)
  • Bo Suo (1)
  • Zhanhuai Li (1)
  • Zachary G. Ives (2)

  1. School of Computing, Northwestern Polytechnical University, Xi’an, China
  2. Department of Computer and Information Systems, University of Pennsylvania, Philadelphia, USA
