Abstract
Community structure is a common, non-trivial topological feature of many complex real-world networks. Existing methods for identifying community structure are generally based on statistical properties such as degree centrality, shortest-path betweenness centrality, and modularity. However, the form of the community structure may vary widely even when the numbers of vertices and edges are fixed. Consequently, the exact number of clusters in a network is difficult to determine in advance, and clustering schemes that require this number to be specified beforehand often misjudge the community structure and perform poorly as a result. Accordingly, the present study proposes a clustering algorithm, designated the Weighted-Spectral Clustering Algorithm, that detects the community structure of a network without prior knowledge of the number of clusters. The proposed method is tested on both computer-generated networks and several real-world networks whose community structures are already known. The results confirm that the algorithm partitions each network into an appropriate number of clusters in every case.
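The abstract does not describe how the Weighted-Spectral Clustering Algorithm chooses the number of clusters. As a point of reference only, here is a minimal sketch of one standard way to estimate the cluster count from a graph's spectrum without fixing it in advance: the eigengap heuristic on the symmetric normalized Laplacian. A partition is then scored with Newman-Girvan modularity, one of the statistical properties the abstract mentions. This is not the authors' method. The function names, the `max_k` cap, and the ring-of-cliques test graph are illustrative assumptions.

```python
import numpy as np

# Illustrative only: eigengap heuristic + modularity scoring,
# NOT the paper's Weighted-Spectral Clustering Algorithm.


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    degrees = adj.sum(axis=1)
    # Leave isolated vertices (degree 0) at zero to avoid dividing by zero.
    inv_sqrt = np.zeros_like(degrees, dtype=float)
    nonzero = degrees > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degrees[nonzero])
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def estimate_k_eigengap(adj: np.ndarray, max_k: int = 10) -> int:
    """Pick k where the gap between consecutive smallest eigenvalues is largest."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(adj))  # ascending order
    max_k = min(max_k, len(eigvals) - 1)
    gaps = np.diff(eigvals[: max_k + 1])  # gaps[i] = lambda_{i+1} - lambda_i
    return int(np.argmax(gaps)) + 1


def modularity(adj: np.ndarray, labels: np.ndarray) -> float:
    """Newman-Girvan modularity Q of a hard partition of an undirected graph."""
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()
    same_cluster = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m  # null-model edge weights
    return float(((adj - expected) * same_cluster).sum() / two_m)


def ring_of_cliques(n_cliques: int, size: int) -> np.ndarray:
    """Hypothetical test graph: cliques joined in a ring by single bridge edges."""
    n = n_cliques * size
    adj = np.zeros((n, n))
    for c in range(n_cliques):
        block = slice(c * size, (c + 1) * size)
        adj[block, block] = 1.0
        # Bridge: last vertex of clique c to first vertex of clique c + 1.
        u, v = c * size + size - 1, ((c + 1) % n_cliques) * size
        adj[u, v] = adj[v, u] = 1.0
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


if __name__ == "__main__":
    adj = ring_of_cliques(3, 5)
    k = estimate_k_eigengap(adj)
    labels = np.repeat(np.arange(3), 5)  # ground-truth clique membership
    print(k, round(modularity(adj, labels), 4))
```

On this toy graph the three smallest eigenvalues are well separated from the rest, so the heuristic recovers three clusters. The ground-truth partition has modularity 19/33. On real networks the eigengap is often much less pronounced. That weakness motivates methods, such as the one proposed here, that do not rely on a user-supplied cluster count.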
Acknowledgments
The authors would like to thank the Ministry of Science and Technology, ROC, for the financial support of this study under Grant No. MOST 103-2221-E-006-147-MY3.
About this article
Cite this article
Wang, TS., Lin, HT. & Wang, P. Weighted-spectral clustering algorithm for detecting community structures in complex networks. Artif Intell Rev 47, 463–483 (2017). https://doi.org/10.1007/s10462-016-9488-4
DOI: https://doi.org/10.1007/s10462-016-9488-4