Clustering large attributed information networks: an efficient incremental computing approach

Cheng, Hong; Zhou, Yang; Huang, Xin; Yu, Jeffrey Xu

doi:10.1007/s10618-012-0263-0

Published: 04 March 2012

Volume 25, pages 450–477 (2012)

Data Mining and Knowledge Discovery

Hong Cheng¹, Yang Zhou², Xin Huang¹ & Jeffrey Xu Yu¹


Abstract

In recent years, many information networks have become available for analysis, including social networks, road networks, sensor networks, and biological networks. Graph clustering has proven effective for analyzing and visualizing large networks. The goal of graph clustering is to partition the vertices of a large graph into clusters based on criteria such as vertex connectivity or neighborhood similarity. Many existing graph clustering methods focus mainly on topological structure and largely ignore vertex properties, which are often heterogeneous. Recently, a new graph clustering algorithm, SA-Cluster, has been proposed, which combines structural and attribute similarities through a unified distance measure. SA-Cluster performs matrix multiplication to calculate the random walk distances between graph vertices. As part of the clustering refinement, the graph edge weights are iteratively adjusted to balance the relative importance of structural and attribute similarities. As a consequence, matrix multiplication is repeated in each iteration of the clustering process to recalculate the random walk distances, which are affected by the edge weight update. To improve the efficiency and scalability of SA-Cluster, in this paper we propose an efficient algorithm, Inc-Cluster, that incrementally updates the random walk distances given the edge weight increments. Complexity analysis is provided to estimate how much runtime cost Inc-Cluster can save. We further design parallel matrix computation techniques on a multicore architecture. Experimental results demonstrate that Inc-Cluster achieves significant speedup over SA-Cluster on large graphs, while achieving exactly the same clustering quality in terms of intra-cluster structural cohesiveness and attribute value homogeneity.
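The idea of updating random walk distances from an increment, rather than recomputing them, can be illustrated with a small sketch. It is not the Inc-Cluster algorithm itself, which this page does not spell out. It assumes a random walk matrix of the form R = sum over l = 1..L of c(1-c)^l P^l, where P is a row-stochastic transition matrix, c is a restart probability and L is a maximum walk length; that form, and all function and variable names below, are assumptions made for illustration. With N = P + Δ and D_l = N^l - P^l, the identity D_l = P^(l-1) Δ + D_(l-1) N (with D_1 = Δ) lets the new powers be derived from the stored old ones.

```python
# Illustrative sketch only: propagate a change DELTA in the transition
# matrix P through the stored powers P^1..P^L, instead of recomputing them.
# The formula for R and every name here are assumptions, not the paper's code.

def matmul(a, b):
    """Dense matrix product of two lists-of-lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def matadd(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def powers(p, max_len):
    """Return [P^1, ..., P^max_len] by repeated multiplication."""
    out = [p]
    for _ in range(max_len - 1):
        out.append(matmul(out[-1], p))
    return out

def random_walk_matrix(pows, c):
    """Assumed form: R = sum_{l=1..L} c * (1 - c)^l * P^l."""
    n = len(pows[0])
    r = [[0.0] * n for _ in range(n)]
    for l, p_l in enumerate(pows, start=1):
        w = c * (1 - c) ** l
        r = matadd(r, [[w * x for x in row] for row in p_l])
    return r

def incremental_powers(old_pows, delta):
    """Powers of N = P + delta, using D_l = P^(l-1) delta + D_(l-1) N."""
    n_mat = matadd(old_pows[0], delta)
    d = delta                      # D_1 = delta
    new_pows = [n_mat]
    for l in range(2, len(old_pows) + 1):
        # old_pows[l - 2] is P^(l-1); old_pows[l - 1] is P^l.
        d = matadd(matmul(old_pows[l - 2], delta), matmul(d, n_mat))
        new_pows.append(matadd(old_pows[l - 1], d))
    return new_pows

# Toy 3-vertex example; rows of DELTA sum to zero so P + DELTA stays stochastic.
P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
DELTA = [[0.0, 0.1, -0.1], [0.0, 0.0, 0.0], [-0.1, 0.1, 0.0]]

R_FULL = random_walk_matrix(powers(matadd(P, DELTA), 4), 0.2)
R_INC = random_walk_matrix(incremental_powers(powers(P, 4), DELTA), 0.2)
MAX_DIFF = max(abs(x - y) for rf, ri in zip(R_FULL, R_INC) for x, y in zip(rf, ri))
```

On dense matrices this update is no cheaper than recomputation, since each step still performs two matrix products. Any saving would have to come from structure in the increment, for example when only a small set of edge weights changes so that Δ is sparse.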



Author information

Authors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Hong Cheng, Xin Huang & Jeffrey Xu Yu
College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Yang Zhou

Corresponding author

Correspondence to Hong Cheng.

Additional information

Responsible editor: Fei Wang, Hanghang Tong, Phillip Yu, Charu Aggarwal.


About this article

Cite this article

Cheng, H., Zhou, Y., Huang, X. et al. Clustering large attributed information networks: an efficient incremental computing approach. Data Min Knowl Disc 25, 450–477 (2012). https://doi.org/10.1007/s10618-012-0263-0


Received: 30 April 2011
Accepted: 16 February 2012
Published: 04 March 2012
Issue Date: November 2012
DOI: https://doi.org/10.1007/s10618-012-0263-0
