Abstract
Community detection aims to partition the nodes of a network into sets whose members are more similar to each other than to nodes outside the set, according to criteria such as neighborhood similarity or vertex connectivity. Most present-day community detection methods concentrate mainly on the topological structure and largely ignore the heterogeneous attributes of the vertices. This paper proposes a new community detection model for large-scale social networks, based on the possibilistic c-means model, that uses both structural and attribute similarities. In most real social networks, different clusters share nodes, which results in overlapping communities. The proposed model, based on structural and attribute similarity (PCMSA), is a fuzzy community detection model that addresses the overlapping community detection problem and detects communities such that each community is a densely connected subgraph with homogeneous attribute values. The performance of the proposed model is assessed by the trade-off between intra-cluster and inter-cluster density and homogeneity. To validate the proposed community detection algorithm (PCMSA) and its results, an index compatible with the proposed model is defined. To assess the efficiency of the proposed fuzzy community detection, experimental results are reported on real social networks ranging from very small to very large, and the results are compared with other community detection models such as FCAN, CODICIL, SA-cluster, K-SNAP and PCM. The experimental findings show the superiority of the new model over the others, together with promising scalability and computational complexity.
References
Adamic LA, Glance N (2004) The political blogosphere and the 2004 US election. In: Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem
Andersen R, Chung F, Lang K (2006) Local graph partitioning using PageRank vectors. In: Proc Annu IEEE Symp Found Comput Sci (FOCS), pp 475–483
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York
Bezdek JC, Keller J, Krishnapuram R, Pal NR (1999) Fuzzy models and algorithms for pattern recognition and image processing. Kluwer Academic Publishers, Boston, London, Dordrecht
Bu Z et al (2019) Graph K-means based on leader identification, dynamic game, and opinion dynamics. IEEE Trans Knowl Data Eng 32(7):1348–1361
Cao J, Bu Z, Wang Y, Yang H, Jiang J, Li H (2019) Detecting prosumer-community groups in smart grids from the multiagent perspective. IEEE Trans Syst Man Cybern Syst 49(8):1652–1664
Dunn JC (1974) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57
Flake GW, Lawrence S, Giles CL (2000) Efficient identification of Web communities. In: Proc 6th ACM SIGKDD Int Conf Knowl Discov Data Min (KDD ’00), pp 150–160
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Fu X, Liu L, Wang C (2013) Detection of community overlap according to belief propagation and conflict. Phys A Stat Mech Its Appl 392(4):941–952
Höppner F (ed) (1999) Fuzzy cluster analysis: methods for classification, data analysis and image recognition
Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826
Golsefid SMM, Fazel Zarandi MH, Bastani S (2015) Fuzzy duocentric community detection model in social networks. Soc Networks 43:177–189
Granovetter MS (1977) The strength of weak ties. Academic Press
Gustafson DE, Kessel WC (1978) Fuzzy clustering with a fuzzy covariance matrix. IEEE
Hu L, Chan KCC (2016) Fuzzy clustering in a complex network based on content relevance and link structures. IEEE Trans Fuzzy Syst 24(2):456–470
Kadushin C (2004) Understanding social network
Kelley CT (1999) Iterative methods for optimization
Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1(2):98–110
Mendel JM (2017) Uncertain rule-based fuzzy systems
Malek Mohamadi Golsefid S, Fazel Zarandi MH, Bastani S (2015) Fuzzy community detection model in social networks. Int J Intell Syst 30:1227–1244
Pathak N, Delong C, Banerjee A (2008) Social topic models for community extraction. In: 2nd SNA-KDD Workshop, pp 565–574
Ruan Y, Fuhry D, Parthasarathy S (2013) Efficient community detection in large networks using content and links. In: WWW ’13: Proceedings of the 22nd International Conference on World Wide Web, pp 1089–1098
Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64
Sun Y, Han J, Gao J, Yu Y (2009) iTopicModel: information network-integrated topic modeling. In: 2009 Ninth IEEE International Conference on Data Mining
Tan WW, Chua TW (2007) Book review, February
Tian Y, Hankins RA, Patel JM (2008) Efficient aggregation for graph summarization. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp 567–580
Traud AL, Mucha PJ, Porter MA (2011) Social structure of Facebook networks. arXiv preprint arXiv:1102.2166; Phys A 391(16):4165–4180
Valente de Oliveira J, Pedrycz W (2007) Advances in fuzzy clustering and its applications. Wiley, Chichester
Wang W, Liu D, Liu X, Pan L (2013) Fuzzy overlapping community detection based on local random walk and multidimensional scaling. Phys A 392(24):6578–6586
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press
Yang MS (1993) A survey of fuzzy clustering. Math Comput Model 18(11):1–16
Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proc 15th ACM SIGKDD Int Conf Knowl Discov Data Min, pp 927–936
Zarandi MHF, Razaee ZS (2010) A fuzzy clustering model for fuzzy data with outliers. Int J Fuzzy Syst Appl 1(2):1–18
Zarandi MHF, Faraji MR, Karbasian M (2010) An exponential cluster validity index for fuzzy clustering with crisp and fuzzy data. Sci Iran 17(2):95–110
Zarinbal M, Fazel Zarandi MH, Turksen IB (2014) Relative entropy fuzzy c-means clustering. Inf Sci (Ny) 260:74–97
Zhang S, Wang R-S, Zhang X-S (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A Stat Mech Its Appl 374:483–490
Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. Proc VLDB Endow 2(1):718–729
Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: Proc IEEE Int Conf Data Min (ICDM), pp 689–698
Appendices
Appendix 1
Proof of Theorem
Theorem 1
All \(u_{ik}\) in \(U\), \(\forall i,k\), are independent. Hence, minimizing \(J(U,V)\) with respect to U is equivalent to minimizing \(J(u_{ik}, v_{i})\) with respect to each \(u_{ik}\). Setting the gradient of \(J(u_{ik}, v_{i})\) with respect to \(u_{ik}\) to zero yields the first-order necessary conditions for optimality:
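For intuition, consider the classical PCM objective of Krishnapuram and Keller (1993), used here only as a stand-in because the PCMSA objective is not reproduced in this appendix. Its per-element term is \(u_{ik}^{m} d_{ik}^{2} + \eta_{i} (1 - u_{ik})^{m}\); setting the derivative \(m u_{ik}^{m-1} d_{ik}^{2} - m \eta_{i} (1 - u_{ik})^{m-1}\) to zero gives \(u_{ik} = 1 / \left(1 + (d_{ik}^{2} / \eta_{i})^{1/(m-1)}\right)\). The sketch below (function names are hypothetical) evaluates this closed form and checks it against a brute-force grid search.

```python
import numpy as np

def pcm_membership(d2, eta, m=2.0):
    """Closed-form minimiser of u**m * d2 + eta * (1 - u)**m on [0, 1].

    d2  : squared distance d_ik^2 (assumed >= 0)
    eta : PCM scale parameter eta_i (assumed > 0)
    m   : fuzzifier (assumed > 1)
    """
    return 1.0 / (1.0 + (np.asarray(d2, dtype=float) / eta) ** (1.0 / (m - 1.0)))

def pcm_term(u, d2, eta, m=2.0):
    # Per-element term of the classical PCM objective (stand-in for J(u_ik, v_i)).
    return u**m * d2 + eta * (1.0 - u) ** m

# Compare the closed form with a brute-force search over a fine grid of u values.
grid = np.linspace(0.0, 1.0, 100001)
checks = []
for d2, eta, m in [(1.0, 1.0, 2.0), (4.0, 1.0, 2.0), (4.0, 1.0, 3.0)]:
    u_closed = float(pcm_membership(d2, eta, m))
    u_grid = float(grid[np.argmin(pcm_term(grid, d2, eta, m))])
    checks.append((u_closed, u_grid))
    print(f"d2={d2}, eta={eta}, m={m}: closed form {u_closed:.5f}, grid {u_grid:.5f}")
```

For \(m = 2\), \(d_{ik}^{2} = 4\) and \(\eta_{i} = 1\), the closed form gives 0.2: a node that is far from the centre relative to \(\eta_{i}\) receives a low typicality.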
To select the most suitable node as the center of cluster i, the node that is structurally closest to the other members of the cluster, weighted by their membership values \(u_{ik}\), is chosen. As a result, the center of cluster i is defined as follows:
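A minimal sketch of this selection rule, assuming that structural closeness between nodes is supplied as a dissimilarity matrix S (here, hypothetically, shortest-path hop counts) and that members are weighted by \(u_{ik}^{m}\) as in fuzzy c-medoids; the paper's own structural measure and weighting may differ. The centre is taken as the node j minimising \(\sum_{k} u_{ik}^{m} S_{kj}\).

```python
import numpy as np

def weighted_medoid(S, u, m=2.0):
    """Return the index j minimising sum_k u[k]**m * S[k, j], plus all costs.

    S : (n, n) structural dissimilarity matrix (hypothetical input)
    u : length-n memberships of every node in cluster i
    m : fuzzifier used to weight the memberships (assumption)
    """
    w = np.asarray(u, dtype=float) ** m
    cost = w @ np.asarray(S, dtype=float)  # cost[j] = sum_k w[k] * S[k, j]
    return int(np.argmin(cost)), cost

# Hop distances on the path graph 0 - 1 - 2 - 3.
S = np.array([[0, 1, 2, 3],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [3, 2, 1, 0]])

center_uniform, _ = weighted_medoid(S, [1.0, 1.0, 1.0, 1.0])
center_skewed, costs = weighted_medoid(S, [0.1, 0.2, 0.9, 1.0])
print(center_uniform, center_skewed, costs)
```

With equal memberships the two interior nodes tie and `np.argmin` returns the first of them (node 1); when nodes 2 and 3 dominate the memberships, the centre moves to node 3.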
Theorem 2
From (18), setting the partial derivatives to zero gives the following necessary conditions:
Note that \(\frac{\partial }{\partial x}(x^{T} Hx) = 2Hx\) in which H is symmetric and is not a function of x.
and finally,
The identities \(\frac{\partial }{\partial H}(x^{T} Hx) = xx^{T}\) and \(\frac{\partial }{\partial H}\left| H \right| = \left| H \right|(H^{ - 1})^{T}\), which equals \(\left| H \right|H^{ - 1}\) when H is symmetric, are used for a non-singular matrix H and any compatible vector x (Gustafson and Kessel 1978).
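These matrix-calculus identities can be checked numerically. The sketch below compares central finite differences with the closed forms on a random symmetric positive-definite H; it treats the entries of H as independent variables (no symmetry constraint), which is the convention under which \(\partial |H| / \partial H = |H| (H^{-1})^{T}\). The helper name `num_grad` is hypothetical.

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central finite-difference gradient of a scalar function f at array X."""
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2.0 * h)
    return G

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
H = A @ A.T + 3.0 * np.eye(3)  # symmetric positive definite, hence non-singular
x = rng.normal(size=3)

g_x = num_grad(lambda v: v @ H @ v, x)          # d/dx (x^T H x)
g_H_quad = num_grad(lambda M: x @ M @ x, H)     # d/dH (x^T H x)
g_H_det = num_grad(np.linalg.det, H)            # d/dH |H|

print(np.allclose(g_x, 2 * H @ x, atol=1e-5))
print(np.allclose(g_H_quad, np.outer(x, x), atol=1e-5))
print(np.allclose(g_H_det, np.linalg.det(H) * np.linalg.inv(H).T, atol=1e-4))
```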
Appendix 2
The convergence condition for the algorithm proposed in Fig. 2 is satisfied when:
The iterative formula for \(u_{ik}\) is derived from the classical gradient descent method (Kelley 1999), with \(J_{m}\) as the error function to be minimized, namely:
where \(\tau^{(t)}\) is a positive learning-rate parameter and \(\frac{\partial J_{m} (U,\Omega ,\delta )^{(t - 1)}}{\partial u_{ik}}\) is the gradient of \(J_{m}\) with respect to \(u_{ik}\) at iteration \(t-1\). Rewriting (36) in matrix form for U gives:
By considering \(\tau^{(t)} = \frac{{\psi^{(t)} }}{{\left\| {J_{m} (U,\Omega ,\delta )^{(t - 1)} } \right\|\left\| {\frac{{\partial J_{m} (U,\Omega ,\delta )^{(t - 1)} }}{\partial U}} \right\|^{ - 1} }}\), (38) becomes:
where \(\psi^{(t)} = \psi_{0} /t\), in which \(\psi_{0}\) is a positive constant, so that \(\psi^{(t)} \to 0\) as \(t \to \infty\). Hence, (B.5) is proved and, as a result, the proposed algorithm converges.
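A minimal sketch of this update rule, assuming the classical PCM objective as a stand-in for \(J_{m}(U,\Omega,\delta)\) (the PCMSA objective and its parameters \(\Omega\) and \(\delta\) are not reproduced here), fixed squared distances, and clipping of U to [0, 1] after each step, which the appendix does not specify. The step size follows \(\tau^{(t)} = \psi^{(t)} \lVert \partial J_{m} / \partial U \rVert / \lVert J_{m} \rVert\) with \(\psi^{(t)} = \psi_{0}/t\); all function names and numeric values are hypothetical.

```python
import numpy as np

def pcm_objective(U, D2, eta, m=2.0):
    # Classical PCM objective, used only as a stand-in for J_m (assumption).
    return float(np.sum(U**m * D2) + np.sum(eta[:, None] * (1.0 - U) ** m))

def pcm_gradient(U, D2, eta, m=2.0):
    # Element-wise dJ/du_ik of the stand-in objective.
    return m * U ** (m - 1) * D2 - m * eta[:, None] * (1.0 - U) ** (m - 1)

def descend(U0, D2, eta, m=2.0, psi0=0.5, iters=200):
    """Gradient descent on U with tau^(t) = psi^(t) * ||dJ/dU|| / ||J||, psi^(t) = psi0 / t."""
    U = np.array(U0, dtype=float)
    history = [pcm_objective(U, D2, eta, m)]
    for t in range(1, iters + 1):
        G = pcm_gradient(U, D2, eta, m)
        J = history[-1]
        g_norm = np.linalg.norm(G)  # Frobenius norm of the gradient matrix
        if g_norm == 0.0 or J == 0.0:
            break  # stationary point (or zero objective): nothing left to do
        psi = psi0 / t                       # diminishing schedule, psi^(t) -> 0
        tau = psi * g_norm / abs(J)          # tau = psi / (||J|| * ||dJ/dU||^-1)
        U = np.clip(U - tau * G, 0.0, 1.0)   # step, then keep memberships in [0, 1] (assumption)
        history.append(pcm_objective(U, D2, eta, m))
    return U, history

# Squared distances of 3 nodes to 2 cluster centres (illustrative values).
D2 = np.array([[0.2, 1.0, 2.0],
               [1.5, 0.5, 0.8]])
eta = np.array([1.0, 1.0])
U, history = descend(np.full((2, 3), 0.5), D2, eta)
print(f"J_m: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because \(\tau^{(t)}\) shrinks with both \(\psi^{(t)}\) and the gradient norm, progress near the optimum is slow; the sketch only checks that the objective does not increase on this small example, not that it reaches the minimiser.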
Appendix 3
NMI is an index that measures how well the communities \(C = \{ C_{k} \}\) identified by an algorithm match the expected communities. Let \(F = \{ F_{k} \},\ 1 \le k \le c\), denote the expected communities. NMI is defined as follows (Hu and Chan 2016):
where \(n_{{C_{k} }}\) is the number of nodes in \(C_{k}\), \(n_{{F_{k} }}\) is the number of nodes in \(F_{k}\), and \(n_{{C_{{k_{1} }} ,F_{{k_{2} }} }}\) is the number of nodes discovered in both \(C_{{k_{1} }}\) and \(F_{{k_{2} }}\).
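As an illustration, the sketch below computes NMI for crisp (hard) partitions using the widely used form \(\mathrm{NMI}(C,F) = \frac{-2\sum_{k_1}\sum_{k_2} n_{C_{k_1},F_{k_2}} \log\left(\frac{n\, n_{C_{k_1},F_{k_2}}}{n_{C_{k_1}}\, n_{F_{k_2}}}\right)}{\sum_{k_1} n_{C_{k_1}} \log\frac{n_{C_{k_1}}}{n} + \sum_{k_2} n_{F_{k_2}} \log\frac{n_{F_{k_2}}}{n}}\), where n is the total number of nodes. This form is assumed to match the definition in Hu and Chan (2016); fuzzy or overlapping memberships would first have to be defuzzified (for example by assigning each node to its highest-membership community), which is also an assumption. The function names are hypothetical.

```python
import numpy as np

def contingency(pred, truth):
    """n_{C_k1, F_k2}: nodes shared by detected community k1 and expected community k2."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    cs, fs = np.unique(pred), np.unique(truth)
    N = np.array([[np.sum((pred == c) & (truth == f)) for f in fs] for c in cs], dtype=float)
    return N, cs, fs

def nmi(pred, truth):
    N, _, _ = contingency(pred, truth)
    n = N.sum()
    n_c, n_f = N.sum(axis=1), N.sum(axis=0)   # n_{C_k} and n_{F_k}
    nz = N > 0                                 # 0 * log(0) is taken as 0
    num = -2.0 * np.sum(N[nz] * np.log(n * N[nz] / np.outer(n_c, n_f)[nz]))
    den = np.sum(n_c * np.log(n_c / n)) + np.sum(n_f * np.log(n_f / n))
    if den == 0.0:
        return 1.0  # both partitions consist of a single community
    return float(num / den)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```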
For the Accuracy measure (Hu and Chan 2016), a mapping function \(Z:C_{k_{1}} \to F_{k_{2}}\) is needed. To find Z, \(n_{C_{k_{1}},F_{k_{2}}}\) is computed for all combinations of \(C_{k_{1}}\) in C and \(F_{k_{2}}\) in F. Z is then built iteratively: in each iteration, starting from the largest \(n_{C_{k_{1}},F_{k_{2}}}\), the \(C_{k_{1}}\) that matches \(F_{k_{2}}\) is determined, the mapping \(C_{k_{1}} \to F_{k_{2}}\) is added to Z, and \(C_{k_{1}}\) and \(F_{k_{2}}\) are excluded from subsequent iterations. The process ends when every \(C_{k_{1}}\) in C has been matched in F. The measure is defined as follows:
The values of both NMI and Accuracy increase as the detected communities C match the expected communities F more closely.
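The greedy matching procedure for Accuracy can be sketched as follows for crisp labels. The final score, \(\mathrm{Accuracy} = \frac{1}{n}\sum_{k_1} n_{C_{k_1}, Z(C_{k_1})}\) (the fraction of nodes covered by the matched pairs), is an assumed form, and the function name is hypothetical.

```python
import numpy as np

def greedy_accuracy(pred, truth):
    """Greedy mapping Z from detected to expected communities, then matched fraction."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    cs, fs = np.unique(pred), np.unique(truth)
    # N[a, b] = n_{C_a, F_b}: nodes shared by detected community a and expected community b.
    N = np.array([[np.sum((pred == c) & (truth == f)) for f in fs] for c in cs], dtype=float)
    W = N.copy()
    Z, matched = {}, 0.0
    while len(Z) < min(len(cs), len(fs)):
        a, b = np.unravel_index(np.argmax(W), W.shape)  # largest remaining overlap
        Z[cs[a].item()] = fs[b].item()
        matched += N[a, b]
        W[a, :] = -1.0  # exclude C_a and F_b from later iterations
        W[:, b] = -1.0
    return matched / pred.size, Z

acc, Z = greedy_accuracy([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
print(acc, Z)
```

Here detected community 1 is matched first (overlap 3) and community 0 second (overlap 2), so 5 of the 6 nodes are covered. A greedy match follows the procedure described above but is not guaranteed to maximise the total overlap.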
Cite this article
Naderipour, M., Fazel Zarandi, M.H. & Bastani, S. Fuzzy community detection on the basis of similarities in structural/attribute in large-scale social networks. Artif Intell Rev 55, 1373–1407 (2022). https://doi.org/10.1007/s10462-021-09987-x