Data Mining and Knowledge Discovery, Volume 25, Issue 2, pp 243–269

Finding density-based subspace clusters in graphs with feature vectors


Abstract

Data sources representing attribute information in combination with network information are widely available in today's applications. To realize the full potential for knowledge extraction, mining techniques like clustering should consider both information types simultaneously. Recent clustering approaches combine subspace clustering with dense subgraph mining to identify groups of objects that are similar in subsets of their attributes as well as densely connected within the network. While those approaches successfully circumvent the problem of full-space clustering, their cluster definitions are restricted to clusters of certain shapes. In this work we introduce a density-based cluster definition that takes into account the attribute similarity in subspaces as well as a local graph density, and thus enables us to detect clusters of arbitrary shape and size. Furthermore, we avoid redundancy in the result by selecting only the most interesting non-redundant clusters. Based on this model, we introduce the clustering algorithm DB-CSC, which uses a fixed point iteration method to efficiently determine the clustering solution. We prove the correctness and complexity of this fixed point iteration analytically. In thorough experiments we demonstrate the strength of DB-CSC in comparison to related approaches.
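The abstract describes the cluster model only at a high level. The Python sketch below illustrates one plausible reading of its two ingredients, a density notion that combines subspace similarity with local graph density and a fixed point iteration. It is not the DB-CSC algorithm from the article: it fixes a single subspace, omits the subspace search and the redundancy elimination, and every name (`eps`, `k`, `min_pts`, `combined_neighborhood`, `fixed_point_cores`, `find_clusters`) is hypothetical. Assumptions: the graph is undirected, a vertex's neighborhood is the set of vertices within `k` hops inside the current vertex set that are also within `eps` of it under the maximum norm on the chosen dimensions, and a vertex is "core" if that neighborhood (including itself) has at least `min_pts` members. Because removing vertices can only shrink neighborhoods, repeatedly discarding non-core vertices is monotone and stops at the largest vertex set in which every vertex is core, much like k-core peeling.

```python
from collections import deque


def k_hop_neighbors(adj, members, start, k):
    """Vertices reachable from `start` in at most k hops, walking only through `members`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # do not expand beyond k hops
        for w in adj.get(v, ()):
            if w in members and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)


def subspace_distance(x, y, dims):
    # Assumption: maximum norm restricted to the (non-empty) list of dimensions `dims`.
    return max(abs(x[d] - y[d]) for d in dims)


def combined_neighborhood(adj, feats, members, o, dims, eps, k):
    # Hypothetical combined neighborhood: graph-close (k hops inside `members`)
    # AND attribute-close (within eps in subspace `dims`). Includes `o` itself.
    return {p for p in k_hop_neighbors(adj, members, o, k)
            if subspace_distance(feats[o], feats[p], dims) <= eps}


def fixed_point_cores(adj, feats, dims, eps, k, min_pts):
    # Start from all vertices and drop non-core vertices until nothing changes.
    # Neighborhoods only shrink as `members` shrinks, so the loop terminates.
    members = set(feats)
    while True:
        keep = {o for o in members
                if len(combined_neighborhood(adj, feats, members, o, dims, eps, k)) >= min_pts}
        if keep == members:
            return members
        members = keep


def find_clusters(adj, feats, dims, eps, k, min_pts):
    # Clusters = connected components of the symmetric neighborhood relation
    # among the core vertices that survive the fixed point iteration.
    cores = fixed_point_cores(adj, feats, dims, eps, k, min_pts)
    clusters, seen = [], set()
    for start in cores:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            o = stack.pop()
            component.add(o)
            for p in combined_neighborhood(adj, feats, cores, o, dims, eps, k):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    # Toy undirected graph: triangle 0-1-2, pendant vertex 3, separate edge 4-5.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: {5}, 5: {4}}
    feats = {0: [1.0, 9.0], 1: [1.1, 0.0], 2: [0.9, 5.0],
             3: [5.0, 5.0], 4: [1.0, 1.0], 5: [1.05, 1.0]}
    print(find_clusters(adj, feats, dims=[0], eps=0.3, k=1, min_pts=3))  # [{0, 1, 2}]
```

In the toy run, vertices 0, 1 and 2 form a triangle and agree in dimension 0, so they survive the iteration; vertex 3 is adjacent but far away in that dimension, and vertices 4 and 5 are similar but their neighborhoods contain only two vertices. Evaluating the same data in the subspace `[0, 1]` yields no cluster, since dimension 1 separates the triangle's vertices. This is the kind of subspace dependence that motivates searching over subspaces rather than clustering in the full space.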

Keywords

Graph clustering, Dense subgraphs, Networks

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Stephan Günnemann (1)
  • Brigitte Boden (1)
  • Thomas Seidl (1)

  1. Data Management and Data Exploration Group, RWTH Aachen University, Aachen, Germany
