DB-CSC: A Density-Based Approach for Subspace Clustering in Graphs with Feature Vectors

  • Stephan Günnemann
  • Brigitte Boden
  • Thomas Seidl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)

Abstract

Data sources representing attribute information in combination with network information are widely available in today’s applications. To realize the full potential for knowledge extraction, mining techniques like clustering should consider both information types simultaneously. Recent clustering approaches combine subspace clustering with dense subgraph mining to identify groups of objects that are similar in subsets of their attributes as well as densely connected within the network. While those approaches successfully circumvent the problem of full-space clustering, their cluster definitions are limited to clusters of certain shapes.

In this work, we introduce a density-based cluster definition taking the attribute similarity in subspaces and the graph density into account. This novel cluster model enables us to detect clusters of arbitrary shape and size. We avoid redundancy in the result by selecting only the most interesting non-redundant clusters. Based on this model, we introduce the clustering algorithm DB-CSC. In thorough experiments we demonstrate the strength of DB-CSC in comparison to related approaches.
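The abstract does not give the formal cluster definition, so the following is only an illustrative sketch of how attribute similarity in a subspace and graph proximity could be combined into one density test, in the style of classic density-based clustering. Everything in it is an assumption made for illustration: the function names, the parameters `eps`, `k` and `min_pts`, the use of the maximum norm, and the choice of a k-hop graph neighborhood are hypothetical and should not be read as the DB-CSC definition itself.

```python
from collections import deque


def k_hop_neighbors(adj, start, k):
    """Return all vertices reachable from `start` via at most k edges (start included)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if hops[v] == k:
            continue  # do not expand beyond k hops
        for w in adj.get(v, ()):
            if w not in hops:
                hops[w] = hops[v] + 1
                queue.append(w)
    return set(hops)


def combined_neighborhood(adj, features, v, subspace, eps, k):
    """Objects near v in the graph AND similar to v in the given attribute subspace.

    Assumption: similarity is the maximum norm restricted to `subspace`.
    """
    def dist(a, b):
        return max(abs(features[a][d] - features[b][d]) for d in subspace)

    return {u for u in k_hop_neighbors(adj, v, k) if dist(u, v) <= eps}


def is_core(adj, features, v, subspace, eps, k, min_pts):
    """Hypothetical density test: enough objects in the combined neighborhood."""
    return len(combined_neighborhood(adj, features, v, subspace, eps, k)) >= min_pts


# Toy graph: a path 0 - 1 - 2 - 3 with two attributes per vertex.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = {0: (1, 50), 1: (2, 90), 2: (3, 10), 3: (50, 11)}

# Vertices 0, 1, 2 agree on attribute 0 only, so they are dense in subspace (0,)
# but not in the full attribute space (0, 1).
dense_in_subspace = is_core(adj, features, 1, (0,), eps=1, k=1, min_pts=3)
dense_in_full_space = is_core(adj, features, 1, (0, 1), eps=1, k=1, min_pts=3)
```

In this toy example the same vertex passes the density test in a one-dimensional subspace and fails it in the full space, which is the kind of situation that motivates searching subspaces rather than clustering on all attributes at once.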



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Stephan Günnemann (1)
  • Brigitte Boden (1)
  • Thomas Seidl (1)
  1. RWTH Aachen University, Germany
