Advantage of Overlapping Clusters for Minimizing Conductance

  • Rohit Khandekar
  • Guy Kortsarz
  • Vahab Mirrokni
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7256)

Abstract

Graph clustering is an important problem with applications to bioinformatics, community discovery in social networks, distributed computing, etc. While most of the research in this area has focused on clustering using disjoint clusters, many real datasets have inherently overlapping clusters. We compare overlapping and non-overlapping clusterings in graphs in the context of minimizing their conductance. It is known that allowing clusters to overlap gives better results in practice. We prove that overlapping clustering may be significantly better than non-overlapping clustering with respect to conductance, even in a theoretical setting.
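The abstract does not restate the definition of conductance. A common one is assumed here; the paper may use a variant, for instance normalizing by vol(S) alone. For a graph G = (V, E) and a cluster S ⊆ V:

```latex
\phi(S) = \frac{|E(S, V \setminus S)|}{\min\{\mathrm{vol}(S),\ \mathrm{vol}(V \setminus S)\}},
\qquad
\mathrm{vol}(S) = \sum_{v \in S} \deg(v)
```

Here E(S, V∖S) is the set of edges with exactly one endpoint in S. A low value means S has few outgoing edges relative to its volume, so it is well separated from the rest of the graph.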

For minimizing the maximum conductance over the clusters, we give examples demonstrating that allowing overlaps can yield significantly better clusterings, namely, ones with a much smaller optimum. In addition, for the min-max variant, the overlapping version admits a simple approximation algorithm, while our algorithm for the non-overlapping version is complex and yields a worse approximation ratio because of the additional disjointness constraint. Somewhat surprisingly, for the problem of minimizing the sum of conductances, we find that allowing overlap does not really help. We show how to apply a general technique to transform any overlapping clustering into a non-overlapping one with only a modest increase in the sum of conductances. This uncrossing technique is of independent interest and may find further applications in the future.
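The two objectives contrasted above, min-max and min-sum of conductances, can be made concrete with a small sketch. It makes several assumptions. The graph is undirected and unweighted, given as an adjacency dict. Conductance is computed as cut(S)/min(vol(S), vol(V∖S)). A clustering is a list of vertex sets that must cover V, and an overlapping clustering simply lets a vertex appear in several sets. All helper names (from_edges, conductance, check_cover, is_disjoint, max_conductance, sum_conductance) are hypothetical. The sketch only evaluates the objective functions and is not the authors' approximation or uncrossing algorithm.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Undirected, unweighted graph as an adjacency dict: v in g[u] iff u in g[v].
Graph = Dict[Hashable, Set[Hashable]]


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build a symmetric adjacency dict from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def conductance(g: Graph, s: Set[Hashable]) -> float:
    """cut(S) / min(vol(S), vol(V - S)), the assumed standard definition."""
    cut = sum(1 for u in s for v in g[u] if v not in s)
    vol_s = sum(len(g[u]) for u in s)
    vol_total = sum(len(nbrs) for nbrs in g.values())
    denom = min(vol_s, vol_total - vol_s)
    # Degenerate case (S = V, or no edge touches S or V - S): nothing is cut,
    # so this sketch reports 0 by convention.
    return cut / denom if denom > 0 else 0.0


def check_cover(g: Graph, clusters: List[Set[Hashable]]) -> None:
    """A clustering, overlapping or not, must cover exactly the vertex set V."""
    if set().union(*clusters) != set(g):
        raise ValueError("clusters must cover exactly the vertex set V")


def is_disjoint(clusters: List[Set[Hashable]]) -> bool:
    """True iff no vertex lies in two clusters (a non-overlapping clustering)."""
    seen: Set[Hashable] = set()
    for c in clusters:
        if seen & c:
            return False
        seen |= c
    return True


def max_conductance(g: Graph, clusters: List[Set[Hashable]]) -> float:
    """Min-max objective: conductance of the worst cluster."""
    check_cover(g, clusters)
    return max(conductance(g, c) for c in clusters)


def sum_conductance(g: Graph, clusters: List[Set[Hashable]]) -> float:
    """Min-sum objective: total conductance over all clusters."""
    check_cover(g, clusters)
    return sum(conductance(g, c) for c in clusters)


if __name__ == "__main__":
    # Toy graph: triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
    g = from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
    parts = [{0, 1, 2}, {3, 4, 5}]
    print(is_disjoint(parts), max_conductance(g, parts), sum_conductance(g, parts))
```

On this toy graph each triangle has one cut edge and volume 7, so the disjoint clustering scores 1/7 under the min-max objective and 2/7 under the min-sum objective. Without further constraints, the single cluster V trivially has conductance 0 in this sketch. Such trivial solutions would need to be excluded by extra constraints (e.g. on the number or size of clusters), which are not modeled here.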

Keywords

graph clustering, overlapping clustering, tree decomposition, dynamic programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Rohit Khandekar, IBM T.J. Watson Research Center, USA
  • Guy Kortsarz, Rutgers University, Camden, USA
  • Vahab Mirrokni, Google Research, New York, USA