The VLDB Journal, Volume 25, Issue 2, pp 171–196

Diversified top-k clique search

  • Long Yuan
  • Lu Qin
  • Xuemin Lin
  • Lijun Chang
  • Wenjie Zhang
Regular Paper

DOI: 10.1007/s00778-015-0408-z

Cite this article as:
Yuan, L., Qin, L., Lin, X. et al. The VLDB Journal (2016) 25: 171. doi:10.1007/s00778-015-0408-z

Abstract

Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, maximal clique enumeration is time-consuming on large graphs and typically returns an enormous number of cliques with large overlaps. Motivated by this, we study the diversified top-k clique search problem, which is to find k cliques that together cover the largest number of nodes in the graph. Diversified top-k clique search has many applications, including community search, motif discovery, and anomaly detection in large graphs. A naive solution is to keep all maximal cliques in memory and then select k of them that cover the most nodes using the approximate greedy max k-cover algorithm. However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm that maintains k candidates during maximal clique enumeration. Our algorithm has a limited memory footprint and achieves a guaranteed approximation ratio. We also introduce a novel lightweight \(\mathsf{PNP}\)-\(\mathsf{Index}\), based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies that avoid enumerating all maximal cliques and thus greatly reduce the computational cost. In addition, we develop an I/O-efficient algorithm for the case where the input graph cannot fit in main memory. We conduct extensive performance studies on real and synthetic graphs; one of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.
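To make the naive baseline mentioned in the abstract concrete, the following is a minimal sketch, not the authors' algorithm: it enumerates every maximal clique with a basic Bron-Kerbosch recursion, keeps them all in memory, and then applies the standard greedy max k-cover heuristic. The function names (`build_graph`, `maximal_cliques`, `greedy_top_k_cover`) and the toy edge list are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency map from an edge list (hypothetical helper)."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def maximal_cliques(graph: Graph) -> List[Set[int]]:
    """Enumerate all maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(set(r))  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v is fully explored
            x = x | {v}  # exclude v from later branches

    expand(set(), set(graph), set())
    return cliques


def greedy_top_k_cover(cliques: List[Set[int]], k: int) -> Tuple[List[Set[int]], Set[int]]:
    """Greedy max k-cover: repeatedly pick the clique adding the most uncovered nodes."""
    chosen: List[Set[int]] = []
    covered: Set[int] = set()
    remaining = list(cliques)
    for _ in range(k):
        best = max(remaining, key=lambda c: len(c - covered), default=None)
        if best is None or not (best - covered):
            break  # nothing left that covers a new node
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen, covered


# Toy graph (assumed for illustration): two triangles sharing node 3, plus edge 5-6.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6)]
graph = build_graph(edges)
cliques = maximal_cliques(graph)
chosen, covered = greedy_top_k_cover(cliques, k=2)
print(len(cliques), "maximal cliques;", len(covered), "nodes covered by", len(chosen), "cliques")
```

The greedy step is the classical max k-cover heuristic, which is known to give a (1 - 1/e) approximation. The weakness of this baseline is the `cliques` list itself: it holds every maximal clique at once, which is the memory cost the abstract describes as impractical for large graphs.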

Keywords

Graph, Diversified top-k search, Clique, I/O efficient

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Long Yuan (1)
  • Lu Qin (2)
  • Xuemin Lin (1)
  • Lijun Chang (1)
  • Wenjie Zhang (1)
  1. The University of New South Wales, Sydney, Australia
  2. Centre for QCIS, University of Technology, Sydney, Australia