Fuzzy community detection on the basis of similarities in structural/attribute in large-scale social networks

Artificial Intelligence Review

Abstract

Community detection aims to partition a set of nodes so that nodes are more similar within a community than outside it, based on criteria such as neighborhood similarity or vertex connectivity. Most present-day community detection methods concentrate principally on topological structure and largely ignore the heterogeneous attributes of the vertices. This paper proposes a new community detection model, based on the possibilistic c-means model, that uses both structural and attribute similarities in large-scale social networks. In most real social networks, different clusters share nodes, resulting in overlapping communities. The proposed possibilistic c-means model based on structural and attribute similarity (PCMSA) is a fuzzy community detection model that addresses the overlapping community detection problem and detects communities such that each community forms a densely connected sub-graph with homogeneous attribute values. The objective of the proposed model is assessed by a trade-off between intra-cluster and inter-cluster density and homogeneity. To validate the proposed community detection algorithm (PCMSA) and its results, a validity index compatible with the proposed model is defined, and the efficiency of the proposed fuzzy community detection is evaluated on real social networks ranging from very small to very large. The results are compared with other community detection models, including FCAN, CODICIL, SA-cluster, K-SNAP, and PCM. The experimental findings show the superiority of the proposed model and its promising scalability and computational complexity.

References

  • Adamic LA, Glance N (2004) The political blogosphere and the 2004 US election. In: Proceedings of the WWW-2005 workshop on the weblogging ecosystem

  • Andersen R, Chung F, Lang K (2006) Local graph partitioning using PageRank vectors. In: Proceedings of the annual IEEE symposium on foundations of computer science (FOCS), pp 475–483

  • Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York

  • Bezdek JC, Keller J, Krisnapuram R, Pal NR (1999) Fuzzy models and algorithms for pattern recognition and image processing. Kluwer Academic Publishers, Boston

  • Bu Z et al (2019) Graph K-means based on leader identification, dynamic game and opinion dynamics. IEEE Trans Knowl Data Eng 32(7):1348–1361

  • Cao J, Bu Z, Wang Y, Yang H, Jiang J, Li H (2019) Detecting prosumer-community groups in smart grids from the multiagent perspective. IEEE Trans Syst Man Cybern Syst 49(8):1652–1664

  • Dunn JC (1974) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57

  • Flake GW, Lawrence S, Giles CL (2000) Efficient identification of Web communities. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD '00), pp 150–160

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

  • Fu X, Liu L, Wang C (2013) Detection of community overlap according to belief propagation and conflict. Phys A Stat Mech Appl 392(4):941–952

  • Höppner F (ed) (1999) Fuzzy cluster analysis: methods for classification, data analysis and image recognition. Wiley

  • Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826

  • Golsefid SMM, Fazel Zarandi MH, Bastani S (2015) Fuzzy duocentric community detection model in social networks. Soc Networks 43:177–189

  • Granovetter MS (1977) The strength of weak ties. Academic Press

  • Gustafson DE, Kessel WC (1978) Fuzzy clustering with a fuzzy covariance matrix. In: Proceedings of the IEEE conference on decision and control

  • Hu L, Chan KCC (2016) Fuzzy clustering in a complex network based on content relevance and link structures. IEEE Trans Fuzzy Syst 24(2):456–470

  • Kadushin C (2004) Understanding social networks

  • Kelley CT (1999) Iterative methods for optimization. SIAM, Philadelphia

  • Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1(2):98–110

  • Mendel JM (2017) Uncertain rule-based fuzzy systems: introduction and new directions. Springer

  • Malek Mohamadi Golsefid S, Fazel Zarandi MH, Bastani S (2015) Fuzzy community detection model in social networks. Int J Intell Syst 30:1227–1244

  • Pathak N, Delong C, Banerjee A (2008) Social topic models for community extraction. In: Proceedings of the 2nd SNA-KDD workshop, pp 565–574

  • Ruan Y, Fuhry D, Parthasarathy S (2013) Efficient community detection in large networks using content and links. In: Proceedings of the 22nd international conference on World Wide Web (WWW '13), pp 1089–1098

  • Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64

  • Sun Y, Han J, Gao J, Yu Y (2009) iTopicModel: information network-integrated topic modeling. In: Proceedings of the ninth IEEE international conference on data mining

  • Tan WW, Chua TW (2007) Book review

  • Tian Y, Hankins RA, Patel JM (2008) Efficient aggregation for graph summarization. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, pp 567–580

  • Traud AL, Mucha PJ, Porter MA (2011) Social structure of Facebook networks. arXiv:1102.2166; Phys A 391(16):4165–4180

  • Valente de Oliveira J, Pedrycz W (2007) Advances in fuzzy clustering and its applications. Wiley, Chichester

  • Wang W, Liu D, Liu X, Pan L (2013) Fuzzy overlapping community detection based on local random walk and multidimensional scaling. Phys A 392(24):6578–6586

  • Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press

  • Yang MS (1993) A survey of fuzzy clustering. Math Comput Model 18(11):1–16

  • Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 927–936

  • Zarandi MHF, Razaee ZS (2010) A fuzzy clustering model for fuzzy data with outliers. Int J Fuzzy Syst Appl 1(2):1–18

  • Zarandi MHF, Faraji MR, Karbasian M (2010) An exponential cluster validity index for fuzzy clustering with crisp and fuzzy data. Sci Iran 17(2):95–110

  • Zarinbal M, Fazel Zarandi MH, Turksen IB (2014) Relative entropy fuzzy c-means clustering. Inf Sci 260:74–97

  • Zhang S, Wang R-S, Zhang X-S (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A Stat Mech Appl 374:483–490

  • Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. Proc VLDB Endow 2(1):718–729

  • Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 689–698

Author information

Corresponding author

Correspondence to Mohammad Hossein Fazel Zarandi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Proofs of Theorems

Theorem 1

All \(u_{ik}\) in \(U\), \(\forall i,k\), are independent. Hence, minimizing \(J(U,V)\) with respect to U reduces to minimizing \(J(u_{ik}, v_{i})\) with respect to each \(u_{ik}\). Setting the gradient of \(J(u_{ik}, v_{i})\) with respect to \(u_{ik}\) to zero yields the first-order necessary condition for optimality:

$$\frac{\partial J}{{\partial u_{ik} }} = m{\text{ u}}_{ik}^{m - 1} D_{ik} - m \, \Delta_{i} (1 - u_{ik} )^{m - 1} = 0 \Rightarrow u_{ik} = \left( {1 + \left( {\frac{{D_{ik} }}{{\Delta_{i} }}} \right)^{1/(m - 1)} } \right)^{ - 1}$$
(30)

To find the most favorable node to serve as the cluster center, the node that is structurally closest to the other members of the cluster, weighted by their membership values \(u_{ik}\), should be selected. As a result, the center of cluster i is defined as follows:

$$v_{i}^{ * } = \mathop {\arg \min }\limits_{{v_{i} \in [1,n]}} (\sum\limits_{k = 1}^{n} {\sum\limits_{j = 1}^{n} {u_{ik}^{m} D_{ik} } } )$$
(31)
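As an illustration, the following minimal NumPy sketch applies the membership update of Eq. (30) and a membership-weighted medoid reading of the center selection in Eq. (31). The function names, the pre-computed distance inputs, and the medoid interpretation of Eq. (31) are assumptions for illustration, not part of the published algorithm.

```python
import numpy as np

def membership_update(D, delta, m=2.0):
    """Possibilistic membership update in the form of Eq. (30).

    D     : (c, n) array of structural distances D_ik (assumed pre-computed)
    delta : (c,) array of the reference terms Delta_i
    m     : fuzzifier, m > 1
    """
    D = np.asarray(D, dtype=float)
    delta = np.asarray(delta, dtype=float)
    ratio = (D / delta[:, None]) ** (1.0 / (m - 1.0))
    return 1.0 / (1.0 + ratio)

def medoid_centers(U, D_pair, m=2.0):
    """Medoid-style center selection in the spirit of Eq. (31).

    U      : (c, n) membership matrix
    D_pair : (n, n) pairwise structural distance matrix between nodes
    Returns, for each cluster, the index of the node minimizing the
    membership-weighted total distance to all nodes.
    """
    U = np.asarray(U, dtype=float)
    D_pair = np.asarray(D_pair, dtype=float)
    cost = (U ** m) @ D_pair   # cost[i, j] = sum_k u_ik^m * D_pair[k, j]
    return cost.argmin(axis=1)
```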

Theorem 2

From (18), setting the partial derivatives to zero gives the following necessary conditions:

$$\begin{gathered} \frac{\partial J}{{\partial u_{ik} }} = m(u_{ik} )^{m - 1} d_{ik} (\Omega_{i} ) + m\delta_{i} (1 - u_{ik} )^{m - 1} = 0 \hfill \\ u_{ik}^{*} = \left( {1 + \left( {\frac{{d_{ik} (\Omega_{i} )}}{{ - \delta_{i} }}} \right)^{1/(m - 1)} } \right)^{ - 1} \hfill \\ \end{gathered}$$
(32)
$$\begin{gathered} \left. {\frac{\partial J}{{\partial v_{i} }}} \right|_{*} = - 2\sum\limits_{k = 1}^{n} {(u_{ik} )^{m} H_{i} (x_{k} - v_{i}^{*} )} = 0 \, ; \, i = 1,2,...,c \hfill \\ v_{i}^{*} = \frac{{\sum\limits_{k = 1}^{n} {u_{ik}^{m} } x_{k} }}{{\sum\limits_{k = 1}^{n} {u_{ik}^{m} } }} \hfill \\ \end{gathered}$$
(33)

Note that \(\frac{\partial }{\partial x}(x^{T} Hx) = 2Hx\) in which H is symmetric and is not a function of x.

and finally,

$$\begin{gathered} \left. {\frac{\partial J}{{\partial H_{i} }}} \right|_{*} = \sum\limits_{k = 1}^{n} {u_{ik}^{m} (x_{k} - v_{i} )(x_{k} - v_{i} )^{T} + \lambda_{i} \left| {H_{i}^{*} } \right|} \, H_{i}^{{*^{ - 1} }} = 0 \hfill \\ H_{i}^{{*^{ - 1} }} = \frac{1}{{\lambda_{i} \left| {H_{i}^{*} } \right|}}\sum\limits_{k = 1}^{n} {u_{ik}^{m} } (x_{k} - v_{i}^{*} )(x_{k} - v_{i}^{*} )^{T} \hfill \\ \end{gathered}$$
(34)

The identities \(\frac{\partial }{\partial H}(x^{T} Hx) = xx^{T} { , }\frac{\partial }{\partial H}\left| H \right| = \left| H \right|H^{ - 1}\) are used for a non-singular matrix H and any compatible vector x (Gustafson and Kessel 1978).
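The sketch below illustrates the attribute-side updates of Theorem 2 under the standard Gustafson–Kessel scheme cited above: the weighted centers of Eq. (33) and a norm-inducing matrix derived from the weighted scatter in Eq. (34). The function name, the volume constant rho (standing in for the \(\lambda_i |H_i|\) factor), and the assumption of non-singular scatter matrices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attribute_updates(X, U, m=2.0, rho=1.0):
    """Attribute-side updates sketched after Theorem 2 (Gustafson-Kessel style).

    X   : (n, p) matrix of attribute vectors x_k
    U   : (c, n) membership matrix
    rho : assumed volume constant playing the role of lambda_i |H_i| in Eq. (34)
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    Um = U ** m

    # Eq. (33): membership-weighted cluster centers
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)            # shape (c, p)

    p = X.shape[1]
    H = []
    for i in range(U.shape[0]):
        diff = X - V[i]                                      # (n, p)
        S = (Um[i][:, None] * diff).T @ diff                 # weighted scatter of Eq. (34)
        # Gustafson-Kessel normalization: the resulting H_i has determinant rho
        H.append((rho * np.linalg.det(S)) ** (1.0 / p) * np.linalg.inv(S))
    return V, np.array(H)
```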

Appendix 2

The required condition for convergence of the algorithm proposed in Fig. 2 is met when:

$$\lim_{t \to \infty } \left\| {U^{(t)} - U^{(t - 1)} } \right\| = 0$$
(35)

The iterative formula for \(u_{ik}\) originates from the classical Newton–Raphson gradient-based update (Kelley 1999), with \(J_{m}\) as the error function to be minimized, namely:

$$u_{ik}^{(t)} = u_{ik}^{(t - 1)} - \tau^{(t)} (J_{m} (u_{ik} ,\Omega_{i} ,\delta_{i} )^{(t - 1)} )\left( {\frac{{\partial J_{m} (u_{ik} ,\Omega_{i} ,\delta_{i} )^{(t - 1)} }}{{\partial u_{ik} }}} \right)^{ - 1}$$
(36)

where \(\tau^{(t)}\) denotes a positive learning-rate parameter and \(\frac{\partial J_{m} (u_{ik} ,\Omega_{i} ,\delta_{i} )^{(t - 1)} }{\partial u_{ik} }\) is the gradient of \(J_{m}\) with respect to \(u_{ik}\) at iteration (t−1). Rewriting (36) for U gives:

$$U^{(t)} - U^{(t - 1)} = - \tau^{(t)} (J_{m} (U,\Omega ,\delta )^{(t - 1)} )\left( {\frac{{\partial J_{m} (U,\Omega ,\delta )^{(t - 1)} }}{\partial U}} \right)^{ - 1}$$
(37)

Now, taking norms in (37) and considering the limit in (35):

$$\lim_{t \to \infty } \left\| {U^{(t)} - U^{(t - 1)} } \right\| = \lim_{t \to \infty } \left( {\left\| {\tau^{(t)} } \right\|\left\| {J_{m} (U,\Omega ,\delta )^{(t - 1)} } \right\|\left\| {\frac{{\partial J_{m} (U,\Omega ,\delta )^{(t - 1)} }}{\partial U}} \right\|^{ - 1} } \right)$$
(38)

By considering \(\tau^{(t)} = \frac{{\psi^{(t)} }}{{\left\| {J_{m} (U,\Omega ,\delta )^{(t - 1)} } \right\|\left\| {\frac{{\partial J_{m} (U,\Omega ,\delta )^{(t - 1)} }}{\partial U}} \right\|^{ - 1} }}\), (38) becomes:

$$\lim_{t \to \infty } \left\| {U^{(t)} - U^{(t - 1)} } \right\| = \lim_{t \to \infty } \left\| {\psi^{(t)} } \right\| = 0$$
(39)

where \(\psi^{(t)} = \psi_{0} /t\), in which \(\psi_{0}\) is a constant, so that \(\psi^{(t)} \to 0\) as \(t \to \infty\). Hence, condition (35) is satisfied and, as a result, the proposed algorithm is convergent.
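In practice, the limit condition (35) translates into a stopping rule on the change in U. The loop below is a minimal sketch of such a scheme built around an assumed update_step callable (for instance, the closed-form membership update of Eq. (32)); it is an illustration of the convergence criterion, not the paper's implementation.

```python
import numpy as np

def iterate_memberships(update_step, U0, eps=1e-6, max_iter=500):
    """Iterate the membership update until the change in U is negligible.

    update_step : assumed callable returning U^(t) from U^(t-1)
    U0          : initial membership matrix
    Stops once ||U^(t) - U^(t-1)|| < eps, the practical counterpart of (35).
    """
    U_prev = np.asarray(U0, dtype=float)
    for t in range(1, max_iter + 1):
        U = update_step(U_prev)
        if np.linalg.norm(U - U_prev) < eps:   # condition (35)
            return U, t
        U_prev = U
    return U_prev, max_iter
```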

Appendix 3

Normalized mutual information (NMI) measures the degree of agreement between the communities identified by an algorithm and the expected (ground-truth) communities. Let \(F = \{ F_{k} \}\ (1 \le k \le c)\) denote the expected communities. NMI is defined as follows (Hu and Chan 2016):

$$NMI = \frac{{\sum\limits_{{k_{1} = 1}}^{c} {\sum\limits_{{k_{2} = 1}}^{c} {n_{{C_{{k_{1} }} ,F_{{k_{2} }} }} \log \left(\frac{{n \, *n_{{C_{{k_{1} }} ,F_{{k_{2} }} }} }}{{n_{{C_{{k_{1} }} }} *n_{{F_{{k_{2} }} }} }}\right)} } }}{{\sqrt {\left(\sum\limits_{k = 1}^{c} {n_{{C_{k} }} \log \frac{{n_{{C_{k} }} }}{n}} \right)\left(\sum\limits_{k = 1}^{c} {n_{{F_{k} }} \log \frac{{n_{{F_{k} }} }}{n}} \right)} }}$$
(40)

where \(n_{{C_{k} }}\) is the number of nodes in \(C_{k}\), \(n_{{F_{k} }}\) is the number of nodes in \(F_{k}\), and \(n_{{C_{{k_{1} }} ,F_{{k_{2} }} }}\) is the number of nodes discovered in both \(C_{{k_{1} }}\) and \(F_{{k_{2} }}\).
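A minimal sketch of Eq. (40) follows, assuming the overlaps are supplied as a contingency matrix and that all detected and expected communities are non-empty; the function and argument names are illustrative.

```python
import numpy as np

def nmi(contingency):
    """NMI of Eq. (40) from a contingency matrix, where contingency[k1, k2]
    is the number of nodes found in both the detected community C_{k1}
    and the expected community F_{k2}."""
    N = np.asarray(contingency, dtype=float)
    n = N.sum()
    nC = N.sum(axis=1)                  # sizes of detected communities
    nF = N.sum(axis=0)                  # sizes of expected communities

    num = 0.0
    for k1, k2 in zip(*np.nonzero(N)):  # skip empty cells: 0 * log(0) := 0
        num += N[k1, k2] * np.log(n * N[k1, k2] / (nC[k1] * nF[k2]))

    den = np.sqrt((nC * np.log(nC / n)).sum() * (nF * np.log(nF / n)).sum())
    return num / den
```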

For the Accuracy measure (Hu and Chan 2016), a mapping function \(Z:C_{k_{1}} \to F_{k_{2}}\) is needed. To find Z, \(n_{C_{k_{1}},F_{k_{2}}}\) is determined for all combinations of \(C_{k_{1}}\) and \(F_{k_{2}}\) in C and F, respectively. This is an iterative process: in each iteration, starting from the largest \(n_{C_{k_{1}},F_{k_{2}}}\), the \(C_{k_{1}}\) that matches \(F_{k_{2}}\) is determined; the mapping \(C_{k_{1}} \to F_{k_{2}}\) is then added to Z, and \(C_{k_{1}}\) and \(F_{k_{2}}\) are excluded from future iterations. The process ends when every \(C_{k_{1}}\) in C has found a match in F. The measure is defined as follows:

$$Accuracy = \frac{1}{n}\sum\limits_{k = 1}^{c} {n_{{C_{k} ,Z(C_{k} )}} }$$
(41)

Therefore, the values of NMI and Accuracy are larger when the detected communities \(C_{k}\) match the expected result F more closely.
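The sketch below implements Eq. (41) with the greedy matching procedure described above, again assuming a contingency matrix of detected-versus-expected overlaps; the names are illustrative. It consumes the same input as the NMI sketch, so both measures can be computed from a single tabulation of the community assignments.

```python
import numpy as np

def accuracy(contingency):
    """Accuracy of Eq. (41) via greedy matching: repeatedly take the largest
    remaining overlap, record the match C_{k1} -> F_{k2}, and exclude both
    communities from later iterations."""
    N = np.asarray(contingency, dtype=float)
    n = N.sum()
    free_C = set(range(N.shape[0]))
    free_F = set(range(N.shape[1]))
    matched = 0.0
    while free_C and free_F:
        k1, k2 = max(((i, j) for i in free_C for j in free_F),
                     key=lambda ij: N[ij])
        matched += N[k1, k2]
        free_C.remove(k1)
        free_F.remove(k2)
    return matched / n
```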

About this article

Cite this article

Naderipour, M., Fazel Zarandi, M.H. & Bastani, S. Fuzzy community detection on the basis of similarities in structural/attribute in large-scale social networks. Artif Intell Rev 55, 1373–1407 (2022). https://doi.org/10.1007/s10462-021-09987-x
